Biostat Hypothesis Testing
Hypothesis Testing
10/15/2019 1
Learning objectives
At the end of this chapter the student will be
able to:
1. Define hypothesis
2. Understand the concepts of null and
alternative hypotheses
3. Explain the meaning and application of
statistical significance
Hypothesis Testing
Hypothesis
• It is a testable statement that describes the nature of the proposed
relationship between two or more variables of interest.
• The purpose of the study is to collect data which will allow the
researcher to test the hypothesis.
• It is usually concerned with the parameters of the population.
e.g. A hospital administrator may want to test the hypothesis that the
average length of stay of patients admitted to the hospital is 5 days.
Idea of hypothesis testing
Steps in Hypothesis-Testing
• Identify the parameter of interest and describe it in the
context of the problem situation.
1. Formulate hypothesis
2. Set the decision rule
3. Choose the appropriate test statistic
4. Compute the test statistic
5. Make statistical decision.
6. Conclusion
1. Formulate a hypothesis
Type of statistical hypotheses
1. Null hypothesis H0: It is the hypothesis to be tested. It is a
statement about the value of the population parameter. It
claims that there is no difference between the hypothesized
value and the population value.
(The effect of interest is zero = no difference)
• Begin with the assumption that H0 is true
• Always contains an “=”, “≤” or “≥” sign
• May or may not be rejected
2. Alternative hypothesis HA: It is a statement of what we believe
is true if our sample data cause us to reject the null hypothesis.
• Is a statement that disagrees with (opposes) H0
(The effect of interest is not zero)
• Never contains an “=”, “≤” or “≥” sign
• May or may not be accepted
2. The decision rule (significance level) = α
• The significance level α is the probability of rejecting a null
hypothesis that is in fact true; it is fixed before the data are examined.
• It is not always possible to make a correct decision since
we are dealing with random samples.
• Two possible errors can be made in any type of hypothesis testing.
– Type I error (α): made when H0 is true but is rejected
(e.g. convicting an innocent person)
– Type II error (β): made when H0 is false but we fail to reject it
(e.g. failing to convict a guilty person)
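The meaning of α as a Type I error rate can be checked by simulation. The sketch below is illustrative only: the function name `type_i_error_rate`, the standard deviation of 2 days, the sample size of 25, the number of simulations and the seed are all assumptions, not values from the text. Only the hypothesized mean of 5 days echoes the length-of-stay example.

```python
import math
import random
from statistics import NormalDist


def type_i_error_rate(mu0, sigma, n, alpha, n_sims, seed=0):
    """Estimate how often a two-sided Z test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical value
    rejections = 0
    for _ in range(n_sims):
        # Draw a sample from a population where H0 (mu = mu0) really holds.
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a true H0 was rejected: Type I error
    return rejections / n_sims


# Hypothetical setting: H0 says the mean length of stay is 5 days.
rate = type_i_error_rate(mu0=5.0, sigma=2.0, n=25, alpha=0.05, n_sims=20000)
print(f"Estimated Type I error rate: {rate:.3f}")
```

Under these assumptions the estimated rate should land close to the chosen α, which is what "significance level" means in practice.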
5. Make statistical decision.
We can use two different approaches:
(a). Rejection region approach: typically used when computing
the statistic manually
• The rejection region is the region above or below the critical
(tabulated) value; it is used to refute the null hypothesis.
• It consists of the values of the test statistic for which we reject
the null hypothesis in favor of the alternative hypothesis.
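The rejection region for a Z test can be sketched in code instead of being read from a table. This is a minimal illustration using Python's standard library; the helper names `z_critical_values` and `in_rejection_region` and the `tail` labels are hypothetical, chosen for this example.

```python
from statistics import NormalDist


def z_critical_values(alpha, tail="two-sided"):
    """Return (lower, upper) standard normal critical values; None means no bound."""
    std_normal = NormalDist()
    if tail == "two-sided":
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tail == "left":
        return (std_normal.inv_cdf(alpha), None)
    if tail == "right":
        return (None, std_normal.inv_cdf(1 - alpha))
    raise ValueError("tail must be 'two-sided', 'left' or 'right'")


def in_rejection_region(z, alpha, tail="two-sided"):
    """True when the test statistic z falls in the rejection region."""
    lower, upper = z_critical_values(alpha, tail)
    below = lower is not None and z < lower
    above = upper is not None and z > upper
    return below or above


print(z_critical_values(0.05))          # two-sided critical values
print(z_critical_values(0.05, "left"))  # left-tailed critical value
```

At α = 0.05 these reproduce the familiar tabulated values of about ±1.96 (two-sided) and -1.645 (left-tailed).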
(b). The P-value approach: generally used
with a computer and statistical software
– Reject H0 if P-value < α
– Fail to reject H0 if P-value ≥ α
P-Value
• The p-value of a test is the probability of observing a
test statistic at least as extreme as the one computed,
given that the null hypothesis is true.
• The smaller the p-value, the stronger the evidence
against the null hypothesis.
• When the p-value is less than 0.05, we often say that
the result is statistically significant: we have
sufficient evidence for rejecting the null hypothesis.
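The P-value rule can be sketched for a Z statistic as follows. The function names `z_p_value` and `decide` are hypothetical, and the statistic z = 2.30 is an arbitrary illustrative value rather than one taken from the examples.

```python
from statistics import NormalDist


def z_p_value(z, tail="two-sided"):
    """P-value of a Z statistic: probability of a result at least as extreme under H0."""
    cdf = NormalDist().cdf
    if tail == "two-sided":
        return 2 * (1 - cdf(abs(z)))
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    raise ValueError("tail must be 'two-sided', 'left' or 'right'")


def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 if P-value < alpha, otherwise fail to reject."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p = z_p_value(2.30)  # illustrative two-sided test
print(f"p-value = {p:.4f}: {decide(p)}")
```

Note that a P-value exactly equal to α leads to "fail to reject H0", matching the rule stated above.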
6. Conclusion
If H0 is rejected, we conclude in favor of HA:
there is a statistically significant association.
If H0 is not rejected, we conclude that H0 may
be true. In this case we do not have sufficient
statistical evidence to conclude that there is an association.
1. Hypothesis Testing for Single Mean
Example-1
- A simple random sample of 10 people from a certain population has
a mean age of 27. The variance is known to be 20. Let α = 0.05.
Can we conclude that the mean age of the population is not 30?
Answer: "Yes we can, if we can reject the H0 that it is 30."
1. Formulate appropriate statistical hypotheses
Specify H0 and HA
H0: µ = µ0 (µ0 = 30)
HA: µ ≠ µ0
2. Specify the desired level of significance (decision rule): α = 0.05
3. Select appropriate test statistic
- Since the population variance is known, we use Z as the test statistic.
4. Compute the test statistic
n = 10, sample mean = 27, σ² = 20, α = 0.05
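With these values the computation can be written out as follows. This is a sketch that assumes the usual one-sample Z statistic for a known population variance; the symbols \bar{x} (sample mean), \mu_0 (hypothesized mean) and \sigma (population standard deviation) are notation chosen here.

```latex
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
  = \frac{27 - 30}{\sqrt{20 / 10}}
  = \frac{-3}{1.414}
  \approx -2.12
```

For a two-sided alternative at α = 0.05 the critical values are ±1.96, so Z ≈ -2.12 falls in the rejection region and H0: µ = 30 is rejected.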
• Can we conclude that the mean age of the population is < 30?
1. Hypotheses
H0: µ ≥ 30, HA: µ < 30
2. Decision rule (α = 0.05)
• The entire rejection region is in the left tail. The critical value
is Z = -1.645. Reject H0 if Z < -1.645.
3. Select appropriate test statistic
- Since the population variance is known, we use Z as the test statistic.
4. Compute the test statistic
n = 10, sample mean = 27, σ² = 20, so Z = (27 - 30) / √(20/10) = -2.12
5. Make statistical decision
- We reject H0 because Z = -2.12 < -1.645.
P-value = 0.0170
6. Conclusion
We have statistically significant evidence that µ < 30.
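The left-tailed test above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_sample_z` is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, variance, n):
    """Z statistic for a single mean when the population variance is known."""
    return (sample_mean - mu0) / math.sqrt(variance / n)


alpha = 0.05
z = one_sample_z(sample_mean=27, mu0=30, variance=20, n=10)
z_crit = NormalDist().inv_cdf(alpha)  # left-tail critical value
p_value = NormalDist().cdf(z)         # P(Z <= z) when H0 is true

print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z < z_crit else "fail to reject H0")
```

Both approaches lead to the same decision: Z lies below the critical value, and the P-value (about 0.017) is below α.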
Example-2
• A simple random sample of 14 people from a certain population gives a sample
mean body mass index (BMI) of 30.5 and a standard deviation (sd) of 10.64. Can we
conclude that the mean BMI is not 35 at α = 5%?
1. H0: µ = 35, HA: µ ≠ 35
2. Decision rule
– The critical t values with 13 df are -2.1604 and 2.1604.
– We reject H0 if t ≤ -2.1604 or t ≥ 2.1604.
3. Appropriate test statistic: t test, because the population variance is unknown and the sample size is < 30
4. Compute
t = (30.5 - 35) / (10.64 / √14) = -1.58
5. Decision
Do not reject H0 because -1.58 is not in the rejection region.
6. Conclusion
Since H0 is not rejected, we don’t have statistical evidence to say µ ≠ 35.
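The t statistic for this example can be checked with a short sketch. The function name `one_sample_t` is hypothetical, and the critical value is simply the tabulated figure quoted in the decision rule, not something computed here (Python's standard library has no t distribution).

```python
import math


def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for H0: mu = mu0 when the population variance is unknown."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


T_CRITICAL_13_DF = 2.1604  # two-sided, alpha = 0.05, 13 df (value from the text)

t = one_sample_t(sample_mean=30.5, mu0=35, sample_sd=10.64, n=14)
reject = abs(t) >= T_CRITICAL_13_DF

print(f"t = {t:.2f}, reject H0: {reject}")
```

The statistic stays inside the interval between the two critical values, so H0 is not rejected.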
2. Two Population Means of Independent Samples
• Researchers wish to know whether there is a difference in mean serum uric acid
(SUA) levels between normal individuals and individuals with
Down’s syndrome. The mean SUA levels in 12 individuals with
Down’s syndrome and 15 normal individuals are 4.5 and 3.4
mg/100 ml, respectively, with variances σ1² = 1 and σ2² = 1.5,
respectively. Is there a difference between the means of the two
groups at α = 5%?
1. Hypotheses:
H0: µ1 - µ2 = 0 or H0: µ1 = µ2
HA: µ1 - µ2 ≠ 0 or HA: µ1 ≠ µ2
2. Decision rule: α = 0.05, the critical values of Z are -1.96 and
+1.96. We reject H0 if Z < -1.96 or Z > +1.96
3. Appropriate test statistic: Z test
4. Compute the test statistic
Z = (4.5 - 3.4) / √(1/12 + 1.5/15) = 2.57
5. Statistical decision
Reject H0 because 2.57 > 1.96.
6. Conclusion
We have statistical evidence that the two population means are not equal.
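The two-sample computation can be sketched as below, assuming the stated variances are known population variances (which is what justifies a Z test). The function name `two_sample_z` is hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Z statistic for H0: mu1 = mu2 with independent samples and known variances."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Group 1: Down's syndrome (n = 12); group 2: normal individuals (n = 15).
z = two_sample_z(mean1=4.5, mean2=3.4, var1=1.0, var2=1.5, n1=12, n2=15)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided, alpha = 0.05

print(f"Z = {z:.2f}, critical value = ±{z_crit:.2f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```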
3. Single Population Proportion
Example: We are interested in the probability of developing asthma
over a given one-year period for children 0 to 4 years of age whose
mothers smoke in the home. In the general population of 0 to 4-
year-olds, the annual incidence of asthma is 1.4%. If 10 cases of
asthma are observed over a single year in a sample of 500 children
whose mothers smoke, can we conclude that this is different from
the underlying probability of p = 0.014? α = 5%
1. Hypotheses
– H0 : P = 0.014
– HA: P ≠ 0.014
2. Decision rule: α = 0.05; reject H0 if Z < -1.96 or Z > 1.96
3. Appropriate test statistic: Z test
4. Compute
Sample proportion = 10/500 = 0.02; Z = (0.02 - 0.014) / √(0.014 × 0.986 / 500) = 1.14
5. Statistical decision
Fail to reject H0 since Z calculated (1.14) < Z critical (1.96);
P-value = 0.2548
6. Conclusion
We do not have sufficient evidence to conclude that
the probability of developing asthma for children
whose mothers smoke in the home is different from
the probability in the general population
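The asthma example can be reproduced with the normal approximation to the binomial. This is a sketch under that assumption; the function name `one_proportion_z` is hypothetical, and the P-value it prints may differ from the quoted figure in the third decimal depending on rounding.

```python
import math
from statistics import NormalDist


def one_proportion_z(events, n, p0):
    """Z statistic for H0: p = p0, using the standard error implied by p0."""
    p_hat = events / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error


z = one_proportion_z(events=10, n=500, p0=0.014)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

print(f"Z = {z:.2f}, two-sided p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The standard error uses p0 rather than the sample proportion because the statistic is computed assuming H0 is true.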
4. Difference Between Two Population Proportions
Where X1 = the observed number of events in the first sample
and X2 = the observed number of events in the second sample
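A sketch of the test statistic commonly used in this setting, assuming the pooled-proportion form of the two-sample Z test. Here n_1 and n_2 (the two sample sizes) and \hat{p} (the pooled proportion) are notation introduced for this illustration.

```latex
\hat{p}_1 = \frac{X_1}{n_1}, \qquad
\hat{p}_2 = \frac{X_2}{n_2}, \qquad
\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}

Z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
```

Pooling is appropriate because, under H0: p1 = p2, both samples estimate the same proportion.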
Example-1
• A study was conducted to investigate the possible cause
of gastroenteritis outbreak following a lunch served in a
high school cafeteria. Among 225 students who ate the
sandwiches, 109 became ill, while among 38 students
who did not eat the sandwiches, 4 became ill. Is there a
significant difference between the two groups at α = 5%?
1. Hypotheses
H0: p1 = p2
HA: p1 ≠ p2
2. Decision rule: α = 5%; reject H0 if Z < -1.96 or Z > 1.96
3. Appropriate test statistic: Z test
5. Statistical decision
We reject H0 since Z calculated > Z critical (4.36 > 1.96).
6. Conclusion
The proportion of students who became ill differs in the two groups;
those who ate the prepared sandwiches were more likely to develop
gastroenteritis.
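The gastroenteritis example can be checked with a short sketch, assuming the pooled two-proportion Z test. The function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 = p2 (x = number of events, n = sample size)."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion estimated under H0
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Group 1 ate the sandwiches (109 of 225 ill); group 2 did not (4 of 38 ill).
z = two_proportion_z(x1=109, n1=225, x2=4, n2=38)
Z_CRITICAL = 1.96  # two-sided, alpha = 0.05

print(f"Z = {z:.2f}")
print("reject H0" if abs(z) > Z_CRITICAL else "fail to reject H0")
```

The result agrees with the value of about 4.36 quoted above, up to rounding.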
Chi-square test (χ²)
• Used to test association between categorical variables
To use the χ² test:
– At least 80% of the expected values should be > 5
– All the expected values should be > 1
– There should not be an observed value of zero
Second criterion    First criterion of classification →
↓                   1      2      3      …      c      Total
1                   N11    N12    N13    …      N1c    N1.
2                   N21    N22    N23    …      N2c    N2.
3                   N31    N32    N33    …      N3c    N3.
.                   .      .      .      …      .      .
r                   Nr1    Nr2    Nr3    …      Nrc    Nr.
Observed versus Expected Frequencies
• Oij: The frequencies in the ith row and jth column of a
contingency table are called observed frequencies; they result
from the cross-classification according to the two criteria.
• eij: Expected frequencies under the assumption of independence of
the two criteria of classification.
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$
where the summation is over all r × c = k cells.
D.F.: the degrees of freedom = (r-1)(c-1)
Chi-square test for association in RxC table
• H0: There is no association between row and columns
$$\text{expected}_{ij} = \frac{\text{Total}_{\text{ROW}} \times \text{Total}_{\text{COL}}}{\text{Total}_{\text{OVERALL}}}$$
Smoking status    Disease (lung ca)    Total
                  Yes       No
Yes               40        60         100
No                10        140        150
Solution
1. Hypotheses
H0: Educational status has no effect on smoking habit
HA: Educational status has an effect on smoking habit
2. Decision rule: α = 5%
Reject H0 if the calculated value is greater than the critical
value $\chi^2_{\alpha,\,(r-1)(c-1)} = \chi^2_{0.05,\,1} = 3.841$
4. Compute the test statistic
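The computation can be sketched as follows, assuming the 2×2 smoking status by lung cancer table shown earlier supplies the observed counts for this solution. That pairing is an assumption, since the hypotheses here refer to educational status. The function names are hypothetical, and no continuity correction is applied.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: smokers, non-smokers; columns: disease yes, disease no (assumed data).
observed = [[40, 60], [10, 140]]
chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
CHI2_CRITICAL = 3.841  # alpha = 0.05, 1 df, as stated in the decision rule

print(f"expected = {expected_frequencies(observed)}")
print(f"chi-square = {chi2:.2f} with {df} df")
print("reject H0" if chi2 > CHI2_CRITICAL else "fail to reject H0")
```

Under this assumption the statistic is far above the critical value, so H0 of no association would be rejected.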