Statistics: Shaheena Bashir

This document discusses hypothesis testing and provides examples of hypothesis tests for a single population mean using a z-test. It explains the concepts of the null and alternative hypotheses, type 1 and type 2 errors, test statistics, critical regions, p-values, and how to set up and conduct hypothesis tests for single means when the population standard deviation is known and the sample size is large. Examples are provided to illustrate hypothesis testing for a single population mean.

Uploaded by

Qasim Rafi
Statistics

Inferential Statistics
Hypothesis Testing

Shaheena Bashir

FALL, 2019
Outline

Hypothesis Testing
Tests for Single Population Mean: Tests involving Normal Distribution
    Test for Single Population Mean: Large Sample
    Test for Single Population Mean: Small Sample, σ unknown
Tests for Proportion
    Z-test for Population Proportion
Tests based on χ²-distribution
    χ²-test for Variance
    Contingency Table
    χ²-test for Independence in Contingency Table
    χ²-test for Homogeneity of Proportions

Hypothesis Testing

Statistical Inference

- Point Estimation
- Interval Estimation
- Hypothesis Testing

Hypothesis

Hypothesis Testing: A Case Study

- An automobile engine emits 100 mg of nitrogen oxides per second on average.
- A modified engine design has been proposed that may reduce the emissions.
- A random sample of n = 50 modified engines gave an average emission of 92 mg/sec with a standard deviation of 21 mg/sec.
- Isn’t 92 far enough below 100 for us to say the modified engine is better?

Concepts

Hypothesis testing asks which value of the parameter is most consistent with the data. It is useful not only in statistics but in all sciences, e.g.,
- In a diagnostic blood test for cholesterol level, the hypothesis being tested is that the patient has the problem.
- In the pharmaceutical industry, the interest may be in finding out whether a new drug is more effective in treating hypertension than the standard drug.

Concepts of Hypothesis Testing

Hypothesis testing is a method of statistical inference that uses sample data in a scientific study.
- Statistical hypothesis: A conjecture about a population parameter. The conjecture may or may not be true, e.g., a higher than normal cholesterol level.
- Null Hypothesis Ho: A statement about the value of a population parameter (e.g., mean, median, mode, variance, standard deviation, proportion, etc.), e.g., Ho: µ = 100
- Alternative Hypothesis Ha (or research hypothesis): Considered to be true if the null hypothesis is rejected, e.g., Ha: µ ≠ 100

Exercise

Formulate the null & the alternative hypothesis for each case:
- The automobile engine emission is 100 mg/sec.
- The average score of hockey players in a year is less than 10.
- The average age of accountants is greater than 25.4 years.
- The average pulse rate of marathon runners is less than 70 beats per minute.
- The average yield of wheat crop is 5000 kg.

Concepts of Hypothesis Testing Cont’d

- Statistical Test: A statistical formula that uses the data from the sample to decide whether the null hypothesis should be rejected, e.g.,

$$ z = \frac{\bar{x} - \mu_o}{\sigma/\sqrt{n}} $$
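
As a worked illustration, the statistic can be evaluated for the modified-engine case study (µo = 100, x̄ = 92, n = 50). This sketch assumes the sample standard deviation s = 21 may stand in for σ, a common large-sample approximation that the slides do not state explicitly at this point:

```latex
z = \frac{\bar{x} - \mu_o}{\sigma/\sqrt{n}}
  \approx \frac{92 - 100}{21/\sqrt{50}}
  \approx \frac{-8}{2.970}
  \approx -2.69
```

Under that assumption the sample mean lies roughly 2.7 standard errors below the hypothesized mean.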

Concepts of Hypothesis Testing Cont’d: Errors

In hypothesis testing, the decision can go wrong:
- Consider a blood test for cholesterol level.
- The test result shows an elevated cholesterol level (outcome).
- In reality, the person does not have higher cholesterol, i.e., Ho is true.
- The test result can therefore be wrong.

Concepts of Hypothesis Testing Cont’d

                         Decision
             Reject Ho                Do not reject Ho
Ho True      Type I error (α)         Correct decision
             False positive           True negative
Ho False     Correct decision         Type II error (β)
             True positive (1 − β)    False negative

- Power of test (Sensitivity): The probability of not committing a Type II error, i.e., 1 − β
- Specificity: The probability of not committing a Type I error, i.e., 1 − α

Concepts of Hypothesis Testing Cont’d

- Level of Significance (α): Maximum probability of committing a Type I error, e.g., α = 0.05 is assumed meaning that ..................
- Critical Region: A range of values of the test statistic that indicates a significant difference, so that Ho should be rejected.
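
One way to make α concrete is to simulate many samples from a population in which Ho really is true and count how often a two-tailed z-test lands in the critical region. The sketch below is illustrative only: the population values (mean 100, standard deviation 21, n = 50) are borrowed from the case study, while the normal population, the number of trials, and the seed are assumptions made for the simulation.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(mu0=100.0, sigma=21.0, n=50, alpha=0.05,
                      trials=5_000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma / sqrt(n)                          # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where Ho (mu = mu0) holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / se
        if abs(z) > z_crit:                       # z falls in the critical region
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should settle near the chosen α as the number of trials grows, which is what "maximum probability of a Type I error" means in the long run.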

Concepts of Hypothesis Testing Cont’d

Given that the true population mean emission is 100 mg/s, what is P(x̄ ≤ 92)?
- P-value: Used in hypothesis testing to quantify the statistical significance of the evidence.
- It is the probability of getting a sample statistic at least as extreme as the one that was actually observed, assuming Ho is true. Equivalently, it is the smallest level of significance α at which Ho would be rejected for the observed data.
- The p-value measures the strength of evidence against Ho. The smaller the p-value (an indication of extreme results if Ho were true), the more statistical evidence exists to support the alternative hypothesis.
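
The question P(x̄ ≤ 92) can be sketched numerically with Python's standard library. This is a rough illustration that assumes the sample standard deviation (21 mg/sec) is an adequate stand-in for σ and that the sampling distribution of x̄ is approximately normal for n = 50; the variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

# Case-study values from the slides; s is used in place of sigma (assumption).
mu0, xbar, s, n = 100.0, 92.0, 21.0, 50

se = s / sqrt(n)                    # standard error of the sample mean
z_obs = (xbar - mu0) / se           # observed z statistic
p_value = NormalDist().cdf(z_obs)   # lower-tail probability P(Z <= z_obs)

print(f"z = {z_obs:.3f}, P(xbar <= 92 | mu = 100) = {p_value:.4f}")
```

A probability this small says that a sample mean of 92 or less would be rare if the true mean were still 100.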

Concepts of Hypothesis Testing Cont’d


                          One-Tailed Test
Two-tailed test      Right-tailed test    Left-tailed test
Ho: µ = µo           Ho: µ ≤ µo           Ho: µ ≥ µo
Ha: µ ≠ µo           Ha: µ > µo           Ha: µ < µo

Exercise

The average production of peanuts is 3000 pounds per acre. A new fertilizer is tested on 60 individual plots of land. The mean yield with the new fertilizer is 3120 pounds per acre with a standard deviation of 578 pounds. Can we conclude that average production has increased?

Exercise

A fertilizer is helpful in improving the wheat yield. A test of hypothesis on sample data concluded that the treatment is not helpful in improving the yield. Can you identify the type of error committed?

Tests for Single Population Mean: Tests involving Normal Distribution
Test for Single Population Mean: Large Sample

Z-Test for a Population Mean: Large Sample

- Large sample, test the mean of a population
- State the null & the alternative hypothesis.
  - Ho: µ = µo
  - Ha: µ ≠ µo or Ha: µ > µo or Ha: µ < µo
- Specify the test statistic to be used

$$ z = \frac{\bar{x} - \mu_o}{\sigma/\sqrt{n}} $$

- Set α (the risk of a Type I error you are willing to take) at a pre-specified level.
- Critical Region
  - For Ha: µ ≠ µo, use |z| > zα/2 (two-tailed test)
  - For Ha: µ > µo, use z > zα (upper-tailed test)
  - For Ha: µ < µo, use z < −zα (lower-tailed test)
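
The critical-region rules above can be sketched as a small helper. The function name `z_test_reject` and its parameters are hypothetical, chosen for this illustration; the usage line applies it to the peanut-yield exercise from earlier, treating the sample standard deviation (578) as σ, which is an assumption justified only by the large sample.

```python
from math import sqrt
from statistics import NormalDist


def z_test_reject(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, reject) using the critical-region rule for a z-test.

    tail: "two" for Ha: mu != mu0, "upper" for Ha: mu > mu0,
          "lower" for Ha: mu < mu0.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)   # |z| > z_{alpha/2}
    elif tail == "upper":
        reject = z > std_normal.inv_cdf(1 - alpha)            # z > z_alpha
    elif tail == "lower":
        reject = z < -std_normal.inv_cdf(1 - alpha)           # z < -z_alpha
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, reject


# Peanut exercise: Ho: mu <= 3000 against Ha: mu > 3000 (upper-tailed test).
z, reject = z_test_reject(xbar=3120, mu0=3000, sigma=578, n=60, tail="upper")
print(f"z = {z:.3f}, reject Ho at alpha = 0.05: {reject}")
```

With these inputs z is about 1.61, just below the upper-tail critical value of about 1.645, so Ho is not rejected at α = 0.05.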

Z-test for a Mean Cont’d

- p-value approach: From the sample data, compute the p-value, the probability that one would obtain a sample result at least as extreme as the observed one given Ho is true.
  - For Ha: µ ≠ µo, p-value = P(|z| > |zobs|) (two-tailed test)
  - For Ha: µ > µo, p-value = P(z > zobs) (upper-tailed test)
  - For Ha: µ < µo, p-value = P(z < zobs) (lower-tailed test)
- Conclusion

Example

A drug manufacturer claimed that the mean potency of one of its antibiotics was 80%. A random sample of 100 capsules tested produced a mean of x̄ = 79.7% with a standard deviation of s = 0.8%. Do the data present sufficient evidence to refute the manufacturer’s claim?
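
A possible way to work this example with the p-value approach is sketched below. It assumes a two-tailed alternative (Ha: µ ≠ 80, since the question is whether the claim can be refuted in either direction) and uses s in place of σ because n = 100 is large; the helper name `z_test_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(xbar, mu0, sigma, n, tail="two"):
    """Return (z_obs, p_value) for a one-sample z-test."""
    z_obs = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        p = 2 * (1 - std_normal.cdf(abs(z_obs)))   # P(|Z| > |z_obs|)
    elif tail == "upper":
        p = 1 - std_normal.cdf(z_obs)              # P(Z > z_obs)
    elif tail == "lower":
        p = std_normal.cdf(z_obs)                  # P(Z < z_obs)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z_obs, p


# Potency example: Ho: mu = 80 against Ha: mu != 80.
z_obs, p = z_test_p_value(xbar=79.7, mu0=80.0, sigma=0.8, n=100)
print(f"z = {z_obs:.2f}, two-tailed p-value = {p:.5f}")
```

Here z works out to −3.75, and the two-tailed p-value is far below 0.05, so under these assumptions the data give strong evidence against the claimed mean of 80%.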

Relationship Between Tests and Confidence Intervals

There is a close connection between confidence intervals and two-sided tests: if a 100(1 − α)% confidence interval is constructed and a hypothesized parameter value is not in the interval, we reject that value of the parameter at significance level α using a two-sided test.
- Thus values of a parameter in a confidence interval are consistent with the data in the sense that they would not be rejected if used as a value for the null hypothesis.
- Equivalently, values of the parameter not in the confidence interval are inconsistent with the data since they would be rejected if used as a value for the null hypothesis.
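
The link can be checked on the antibiotic potency example (x̄ = 79.7, s = 0.8, n = 100). The sketch below builds a 95% z-interval, again assuming s may replace σ for a large sample, and tests whether the claimed mean of 80 falls inside it.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 79.7, 0.8, 100   # potency example from the slides
mu0 = 80.0                    # hypothesized mean (manufacturer's claim)
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
margin = z_crit * s / sqrt(n)                  # half-width of the interval
lower, upper = xbar - margin, xbar + margin
inside = lower <= mu0 <= upper

print(f"95% CI: ({lower:.3f}, {upper:.3f}); contains 80? {inside}")
```

Because 80 lies outside the interval, the two-sided test at α = 0.05 rejects Ho: µ = 80, in agreement with the statement above.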

Test for Single Population Mean: Small Sample σ unknown

Student’s t-distribution

- If the parent population is normal, then x̄ ∼ N(µ, σ/√n) and

$$ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) $$

- For a non-normal population, x̄ ≈ N(µ, σ/√n) and Z is approximately N(0, 1) if n is large.
- For small sample size & σ unknown,

$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1} $$

[Figure: Comparison of t Distributions. Density curves for df = 1, 3, 8, 30 and the normal distribution.]

t-Test for a Population Mean: Small Sample & σ unknown

- State the null & the alternative hypothesis.
  - Ho: µ = µo
  - Ha: µ ≠ µo or Ha: µ > µo or Ha: µ < µo
- Specify the test statistic to be used

$$ t = \frac{\bar{x} - \mu_o}{s/\sqrt{n}} $$

- Set α (the risk of a Type I error you are willing to take) at a pre-specified level.
- Critical Region
- p-value approach: From the sample data, compute the p-value, the probability that one would obtain a sample result at least as extreme as the observed one given Ho is true.
  - For Ha: µ ≠ µo, p-value = P(|t| > |tobs|) (two-tailed test)
  - For Ha: µ > µo, p-value = P(t > tobs) (upper-tailed test)
  - For Ha: µ < µo, p-value = P(t < tobs) (lower-tailed test)
- Conclusion

Example

In a study of the effect of cigarette smoking on the CO diffusing capacity (DL) of the lung, researchers found that current smokers had CO readings significantly lower than those of nonsmokers. The CO diffusing capacities for a random sample of 20 current smokers are listed below:

103.70    88.60    73.00   123.09    91.05
 92.30    61.68    90.68    84.02    76.01
100.61    88.02    71.21    82.11    89.22
102.75   108.58    73.15   106.75    90.48

Do the data indicate that the mean DL reading for current smokers is lower than 100 DL, the average for nonsmokers?
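
The t statistic for these data can be computed with the standard library, as in the sketch below. It assumes a lower-tailed test (Ho: µ ≥ 100 against Ha: µ < 100) at α = 0.05, which the slides leave open, and takes the critical value t(0.05, 19) = 1.729 from a standard t table, since Python's standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# CO diffusing capacity (DL) readings for 20 current smokers, from the slides.
dl = [103.7, 88.6, 73, 123.09, 91.05,
      92.3, 61.68, 90.68, 84.02, 76.01,
      100.61, 88.02, 71.21, 82.11, 89.22,
      102.75, 108.58, 73.15, 106.75, 90.48]

mu0 = 100.0                 # hypothesized mean (average for nonsmokers)
n = len(dl)
xbar = mean(dl)
s = stdev(dl)               # sample standard deviation (n - 1 in the denominator)
t_obs = (xbar - mu0) / (s / sqrt(n))

T_CRIT = 1.729              # t table value for alpha = 0.05, df = 19 (assumed alpha)
reject = t_obs < -T_CRIT    # lower-tailed critical region

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}, t = {t_obs:.2f}, reject Ho: {reject}")
```

The sample mean is about 89.85, and the resulting t statistic falls well inside the lower-tail critical region, so under these assumptions the data support a mean DL below 100 for smokers.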

Tests for Proportion
Z-test for Population Proportion

Z-Test for a Proportion

- A hypothesis test involving proportions can be considered as a binomial experiment
- State the null & the alternative hypothesis.
  - Ho: p = po
  - Ha: p ≠ po or Ha: p > po or Ha: p < po
- Specify the test statistic to be used

$$ z = \frac{\hat{p} - p_o}{\sqrt{p_o q_o / n}} $$

  where qo = 1 − po.
- Set the level of significance α
- From the sample data, compute the p-value
- Conclusion

Z-Test for a Proportion: Assumptions

- n is large enough so that the sampling distribution of p̂ can be approximated by a normal distribution.
- npo > 5 and nqo > 5

Example

A peony plant with red petals was crossed with another plant having streaky petals. A geneticist states that 75% of the offspring resulting from this cross will have red flowers. To test this claim, 100 seeds from this cross were collected and germinated, and 58 plants had red petals. Test the geneticist’s claim.
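
A sketch of the computation for this example follows. It assumes a two-tailed alternative (Ha: p ≠ 0.75), since the exercise only asks to test the claim, and it checks the normal-approximation conditions npo > 5 and nqo > 5 before computing the statistic.

```python
from math import sqrt
from statistics import NormalDist

n, red, p0 = 100, 58, 0.75     # sample size, plants with red petals, claimed proportion
q0 = 1 - p0

# Normal-approximation conditions from the slides: n*p0 > 5 and n*q0 > 5.
approximation_ok = n * p0 > 5 and n * q0 > 5

p_hat = red / n                                # sample proportion
z_obs = (p_hat - p0) / sqrt(p0 * q0 / n)       # z statistic for a proportion
p_value = 2 * NormalDist().cdf(-abs(z_obs))    # two-tailed p-value (assumed Ha: p != p0)

print(f"approximation ok: {approximation_ok}, z = {z_obs:.3f}, p-value = {p_value:.5f}")
```

The observed proportion of 0.58 is almost four standard errors below 0.75, so under these assumptions the geneticist's claim is rejected at any conventional level of significance.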

Tests based on χ²-distribution

χ²-distribution

- The χ² distribution is positively skewed
- It depends on the number of degrees of freedom
- Uses of the χ² test
  - Test whether the variance of a population is equal to a specified value, i.e., Ho: σ² = σo²
  - Test of independence of two criteria of classification of qualitative data
  - Test of homogeneity of proportions
  - Test for goodness of fit of an observed distribution to a theoretical one

χ²-test for Variance

- State the null & the alternative hypothesis.
  - Ho: σ² = σo²
  - Ha: σ² ≠ σo² or Ha: σ² > σo² or Ha: σ² < σo²
- Specify the test statistic to be used

$$ \chi^2 = \frac{(n-1)s^2}{\sigma_o^2} $$

- Under Ho, the test value follows a χ² distribution with n − 1 degrees of freedom.
- Set α at a level determined by how great a risk of a Type I error you are willing to take. Traditional values of α are .05 and .01.
- Conclusion

Example

A dairy processing company claims that the variance of the amount of fat in the whole milk processed by the company is no more than 0.25. To test this claim, a random sample of 41 milk containers gave a variance of 0.27. At α = 0.05, is there enough evidence to reject the company’s claim? Assume the population is normally distributed.
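
One way to work this example is sketched below. It reads "no more than 0.25" as Ho: σ² ≤ 0.25 against Ha: σ² > 0.25 (an upper-tailed test, which is an interpretation of the wording), and takes the critical value for α = 0.05 with 40 degrees of freedom from a standard χ² table:

```latex
\chi^2 = \frac{(n-1)s^2}{\sigma_o^2}
       = \frac{(41-1)(0.27)}{0.25}
       = 43.2,
\qquad
\chi^2_{0.05,\,40} \approx 55.76
```

Since 43.2 does not exceed 55.76, the test value is outside the critical region, and under this reading there is not enough evidence at α = 0.05 to reject the company's claim.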

Contingency Table

          Absent   Present   Total
Female        12        45      57
Male          10       114     124
Total         22       159     181

- Classify each observation according to two categorical variables to generate bivariate data
- Contingency tables show the frequencies produced by cross-classifying observations
- In the analysis of contingency tables, the objective is to determine whether one method of classification is independent of the other

Example
- Is there any association between gender & presence in the office the day after Eid?
- The proportions of employees present are 45/57 = 0.79 & 114/124 = 0.92 for females & males respectively.
- Do female employees take the day off after Eid more often than male employees?

χ²-test for Independence in Contingency Table

χ²-test for Independence

- State the null & the alternative hypothesis.
  - Ho: Taking a day off is independent of gender
  - Ha: There is an association between gender & taking the day off after Eid
- Specify the test statistic to be used

$$ \chi^2 = \sum \frac{(O - E)^2}{E}, \qquad E = \frac{\text{row sum} \times \text{column sum}}{\text{grand total}} $$

- The degrees of freedom for any contingency table are d.f. = (R − 1)(C − 1)
- Set α at a level determined by how great a risk of a Type I error you are willing to take. Traditional values of α are .05 and .01.
- Conclusion
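
The procedure can be sketched for the gender-by-attendance table (cell counts 12, 45 for females and 10, 114 for males). The function name `chi_square_independence` is hypothetical, the sketch applies no continuity correction, and the critical value 3.841 for α = 0.05 with 1 degree of freedom is taken from a standard χ² table; α = 0.05 itself is an assumed choice.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for an R x C table of observed counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    # Expected count for each cell: row sum * column sum / grand total.
    expected = [[r * c / grand_total for c in col_sums] for r in row_sums]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


# Rows: Female, Male; columns: Absent, Present (counts from the slides).
observed = [[12, 45],
            [10, 114]]
chi2, df, expected = chi_square_independence(observed)

CHI2_CRIT = 3.841   # chi-square table value for alpha = 0.05, df = 1
print(f"chi2 = {chi2:.2f}, df = {df}, reject Ho: {chi2 > CHI2_CRIT}")
```

The statistic comes out near 6.17, which exceeds 3.841, so at α = 0.05 the sketch rejects independence between gender and taking the day off.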

χ²-test for Homogeneity of Proportions

Test of Homogeneity of Proportions

- Ho: p1 = p2 = · · · = pk
- Ha: At least one proportion is different from the others
- The computational procedure is the same as for the test of independence

Example
Migraine headache patients took part in a double-blind clinical trial to assess experimental surgery. 75 patients were randomly assigned to real surgery on migraine trigger sites (n1 = 49) or sham surgery (n2 = 26), in which an incision was made but nothing else. The surgeons hoped that patients would experience "a substantial reduction in migraine headaches". Does real surgery substantially reduce migraine headache?

                                             Real Surgery   Sham Surgery   Total
Substantial reduction in migraine headache             41             15      56
No reduction in migraine headache                       8             11      19
Total                                                  49             26      75
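
A worked sketch of the computation for this table follows, using the same formulas as the test of independence. It assumes α = 0.05 and no continuity correction, and takes the critical value 3.841 (1 degree of freedom) from a standard χ² table:

```latex
\hat{p}_1 = \frac{41}{49} \approx 0.837, \qquad \hat{p}_2 = \frac{15}{26} \approx 0.577

E_{11} = \frac{56 \times 49}{75} \approx 36.59, \quad
E_{12} = \frac{56 \times 26}{75} \approx 19.41, \quad
E_{21} = \frac{19 \times 49}{75} \approx 12.41, \quad
E_{22} = \frac{19 \times 26}{75} \approx 6.59

\chi^2 = \sum \frac{(O - E)^2}{E}
       \approx 0.53 + 1.00 + 1.57 + 2.96
       \approx 6.06,
\qquad \text{d.f.} = (2-1)(2-1) = 1
```

Since 6.06 exceeds 3.841, Ho: p1 = p2 is rejected at α = 0.05 under these assumptions. The χ² test itself is non-directional; the sample proportions indicate that the higher rate of substantial reduction is in the real-surgery group.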