Lecture 4-Statistical Inferences
February, 2025
Addis Ababa, Ethiopia
Outline
• Introduction
• Types of Estimation
• Confidence intervals
• Types of errors
• Power of a test and significance level
• Application of different test statistics
Introduction
• Estimation and hypothesis testing are the two forms of
statistical inference
[Figure: a sample is drawn from the population; the sample statistic is used to estimate the corresponding population parameter.]
Introduction…
Definitions
• Population: the entire collection of individuals of interest
Estimation
• The objective of estimation is to determine the value of
a population parameter on the basis of a sample
statistic.
Estimate and Estimator
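• An estimator is the rule or formula (e.g., the sample mean X̄) used to estimate a population parameter.
• A single value computed from a sample with an estimator is an estimate.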
Properties of good estimator
• Unbiasedness
– If the mean (expected value) of the sampling distribution of a statistic is equal to the
population parameter, the statistic is an unbiased estimator of that parameter.
Example:
– The mean of the sampling distribution of means is equal to the population
mean; hence the sample mean is an unbiased estimator of the population mean.
• Minimum variance
– An estimator whose sampling distribution has the minimum standard error
Example:
– The standard error is the standard deviation of the sampling distribution of the
statistic, e.g. the standard deviation of the means of many repeated samples.
– In some skewed distributions, the median can have a smaller standard error than the mean
Cont’d…
• Efficiency: an estimator is efficient if it has the smallest standard error compared
with other estimators of the same parameter. E.g., for normally distributed data, the
mean has a smaller standard error than the median.
Types of estimation
• There are two types of estimation: point estimation and
interval estimation
Point Estimation
– A single numerical value used to estimate the
corresponding population parameter
Point estimation , Example
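• The mean stay of 2000 randomly selected inpatients in hospital B
is found to be 5 days, with a standard deviation of 2 days. Here
x̄ = 5 days is a point estimate of µ, and s = 2 days is a point estimate of σ.
Sample statistic (point estimate) | Population parameter
x̄ | µ
s² | σ²
s | σ
p | π (or P)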
Interval Estimation
• It is a statement that a population parameter has a value lying
between two specified limits, with a certain level of
confidence.
• A point estimate gives no indication of how far away from it
the parameter may lie.
• A more useful method of estimation is to compute an interval
that has a high probability of containing the parameter.
This leads to the concept of the confidence interval
Confidence interval
• Is an interval estimate of a population parameter
Cont’d…
A larger confidence level produces a wider confidence interval
Cont’d…
• A confidence interval has the form
Point estimate ± (critical value) × (standard error of the point estimate)
Cont’d…
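• Example: estimating a population mean
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \quad\text{i.e.}\quad \left(\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\; \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right)$$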
Cont’d…
• A confidence interval provides a range of plausible values that is
likely to include the “true” (population) value with a given
probability (the confidence level).
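The following table shows the standard errors of the estimates of the different
population parameters that will be used
Parameter | Estimate | Standard error
Mean, $\mu$ | $\bar{X}$ | $\sqrt{\sigma^2/n}$
Difference in means, $\mu_1 - \mu_2$ | $\bar{X}_1 - \bar{X}_2$ | $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Proportion, $\pi$ | $p$ | $\sqrt{\pi(1-\pi)/n}$
Difference in proportions, $\pi_1 - \pi_2$ | $p_1 - p_2$ | $\sqrt{\pi_1(1-\pi_1)/n_1 + \pi_2(1-\pi_2)/n_2}$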
Estimation for Single Population
1. CI for a Single Population Mean (normally
distributed)
A. Known variance (large sample size)
• Consider the task of computing a CI estimate of μ for a
population distribution that is normal with σ known.
• Data are available from a random sample of size n.
Assumptions
Population standard deviation (σ) is known
Population is normally distributed
If the population is not normal, use a large sample
• A 100(1 − α)% CI for µ is:
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
• Commonly used confidence levels (CLs) are 90%, 95%, and 99%
Finding the Critical Value
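The critical value Z_{α/2} is the point that leaves an area of α/2 in the upper tail of the standard normal distribution. As a minimal sketch (standard library only; the function name `z_critical` is hypothetical), it can be computed instead of read from a table:

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Upper alpha/2 quantile of the standard normal distribution N(0, 1)
    return NormalDist().inv_cdf(1 - alpha / 2)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%} confidence: Z = {z_critical(cl):.3f}")
```

This prints 1.645, 1.960 and 2.576, the values usually read from the Z table for the three common confidence levels.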
Example:
• The mean weight of 100 five-year-old children in
a certain locality is found to be 14 kg. A clinician wants
to estimate the mean weight of all such children in that
locality with a 95% confidence interval, given
that the SD for all children is 4 kg.
Cont’d…
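Given:
CI = 95%, so α = 0.05, α/2 = 0.025, and the value
of Z at α/2 is 1.96
n = 100, σ = 4, and x̄ = 14
• Inserting the given values into the formula:
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 14 \pm 1.96 \times \frac{4}{\sqrt{100}}$$
• The result is 14 ± 0.784, i.e. (13.22, 14.78) kg
• Interpretation: we are 95% confident that the mean weight of all
5-year-old children in the locality lies between 13.22 and 14.78 kg.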
Cont’d…
• Example: suppose a survey was conducted on a
representative sample of 900 newborn babies in A/A, and
their average birth weight was found to be 3.5 kg,
with an SD of 0.5 kg. Estimate the mean birth weight of newborn babies in
A/A at the 95% level of confidence.
• Solution:
Given n = 900, x̄ = 3.5 kg, s = 0.5 kg, level of confidence = 95%, µ = ?
Z at α/2 = 0.025 is 1.96, because α = 0.05 (the sample is large, so Z is used with s in place of σ)
= 3.5 ± 1.96 × (0.5/√900) = 3.5 ± 0.033
= (3.467, 3.533) kg
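Both examples above apply the same formula, x̄ ± Z_{α/2}·σ/√n. A minimal sketch in Python (standard library only; `z_interval_mean` is a hypothetical helper name) reproduces the two intervals:

```python
import math
from statistics import NormalDist

def z_interval_mean(xbar, sd, n, confidence=0.95):
    """CI for a mean: xbar +/- Z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# Children's weight example: xbar = 14 kg, sigma = 4 kg, n = 100
lo, hi = z_interval_mean(14, 4, 100)
print(f"Weight: ({lo:.2f}, {hi:.2f}) kg")

# Birth-weight example: xbar = 3.5 kg, s = 0.5 kg, n = 900 (large n, so s replaces sigma)
lo, hi = z_interval_mean(3.5, 0.5, 900)
print(f"Birth weight: ({lo:.3f}, {hi:.3f}) kg")
```

The code uses the exact critical value (1.95996...) rather than 1.96, which does not change the rounded limits (13.22, 14.78) and (3.467, 3.533).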
B. Unknown variance (small sample size, n ≤ 30)
• What if σ for the underlying population is unknown and the
sample size is small?
• As an alternative, we use Student’s t distribution.
Student’s t Distribution
• The t is a family of distributions
• Bell Shaped
• Symmetric about zero (the mean)
• Flatter than the standard normal N(0, 1). This means
– The variability of t is greater than that of Z, which is
N(0, 1)
– Thus, there is more area in the tails and less at the center
– Because variability is greater, the resulting confidence
intervals will be wider.
• Note: t approaches z as n increases
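To see numerically that t approaches z, the sketch below computes two-sided 95% critical values of t by integrating the t density with Simpson's rule and inverting the CDF by bisection. This numerical approach is an assumption chosen only to avoid third-party libraries (in practice a statistics package, e.g. SciPy's `scipy.stats.t.ppf`, would be used), and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # lgamma avoids overflow of math.gamma for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the area from 0 to x (Simpson's rule)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ...
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, confidence=0.95):
    """Two-sided critical value t_{alpha/2, df}, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(45):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (5, 9, 19, 29, 100, 1000):
    print(f"df = {df:4d}: t = {t_critical(df):.3f}")
print("Z = 1.960 for comparison")
```

The printed values should match the t table (e.g. about 2.262 for 9 df and 2.093 for 19 df) and shrink toward 1.96 as the degrees of freedom grow.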
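What happens to the CI as the sample gets larger?
$$\bar{x} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}} \qquad\text{vs.}\qquad \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
• For large samples, Z and t values become almost identical, so the CIs are almost identical.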
Degrees of Freedom (df)
df = n-1
Student’s t Table
t distribution values
• With comparison to the Z value
Example 2
• A sample of 20 houses was studied to estimate the mean
sprayable area of houses for the control of a malaria
epidemic. The result was x̄ = 22.9 m², with SD = 6.0 m².
Construct a 95% CI for the mean sprayable area of houses in the
population.
• Solution: given x̄ = 22.9 m², SD = 6.0 m², α = 0.05, α/2 = 0.025, degrees of
freedom (n − 1) = 19, t = 2.09
= 22.9 ± 2.09(6/√20) = 22.9 ± 2.09(1.34)
= 22.9 ± 2.80 = (22.9 − 2.80, 22.9 + 2.80)
= (20.10, 25.70) m²
• We are 95% confident that the mean sprayable area of houses in the population
is between 20.1 and 25.7 m².
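The t-based interval is the same computation as the Z interval, with the critical value taken from the t table with n − 1 df. A minimal sketch (the helper name `t_interval_mean` is hypothetical; the critical value 2.093 for 19 df is passed in rather than computed):

```python
import math

def t_interval_mean(xbar, s, n, t_crit):
    """CI for a mean with unknown sigma: xbar +/- t * s / sqrt(n).

    t_crit is the two-sided critical value with n - 1 df, read from a t table.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Sprayable-area example: xbar = 22.9 m^2, s = 6.0, n = 20, t_{0.025, 19} = 2.093
lo, hi = t_interval_mean(22.9, 6.0, 20, t_crit=2.093)
print(f"95% CI: ({lo:.1f}, {hi:.1f}) m^2")
```

Rounded to one decimal, the limits are 20.1 and 25.7 m².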
2. CI for a single population proportion
$$p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}}, \qquad q = 1 - p$$
Example 1
• A random sample of 100 people shows that 25 are
left-handed. Form a 95% CI for the true proportion of
left-handers.
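Applying the formula p ± Z_{α/2}√(pq/n) to this example gives p = 25/100 = 0.25. A minimal sketch (standard library only; `proportion_ci` is a hypothetical name) that computes the interval:

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation CI for a proportion: p +/- Z * sqrt(p * q / n)."""
    p = successes / n
    q = 1 - p
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * q / n)
    return p - margin, p + margin

lo, hi = proportion_ci(25, 100)
print(f"95% CI for the proportion of left-handers: ({lo:.3f}, {hi:.3f})")
```

With n = 100 and p = 0.25, the SE is about 0.043, so the interval is roughly 0.165 to 0.335.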
Interpretation
Estimation for Two Populations
3. CI for the difference between population
means (normally distributed)
A. Known variances (2 independent samples)
Illustration
A researcher performs a drug trial involving two
independent groups.
– A control group is treated with a placebo, while, separately,
– the intervention group is treated with an active agent.
Example
• Researchers are interested in the difference between serum
uric acid levels in patients with and without Down’s syndrome.
• Patients with Down’s syndrome
– n = 12, sample mean = 4.5 mg/100 ml, σ² = 1.0
• Patients without Down’s syndrome
– n = 15, sample mean = 3.4 mg/100 ml, σ² = 1.5
• Calculate the 95% CI for the difference between the means.
• SE = √(1.0/12 + 1.5/15) = 0.43, 95% CI = 1.1 ± 1.96(0.43) = (0.26, 1.94)
• We are 95% confident that the true difference between the
two population means is between 0.26 and 1.94 mg/100 ml.
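With known variances, the SE of the difference is √(σ₁²/n₁ + σ₂²/n₂). A minimal sketch (standard library only; `diff_means_ci_known_var` is a hypothetical name) that reproduces the SE and the interval for the uric acid example:

```python
import math
from statistics import NormalDist

def diff_means_ci_known_var(x1, var1, n1, x2, var2, n2, confidence=0.95):
    """CI for mu1 - mu2 when both population variances are known."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = x1 - x2
    return se, (diff - z * se, diff + z * se)

se, (lo, hi) = diff_means_ci_known_var(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"SE = {se:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

This gives SE ≈ 0.43 and a 95% CI of about (0.26, 1.94) mg/100 ml.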
B. Unknown variances (Independent samples)
I. Population variances equal (large sample)
• Assumptions:
– Samples are randomly and independently drawn
– Both sample sizes are ≥30
– Population standard deviations are unknown
Forming confidence estimates:
• Use the sample standard deviation s to estimate σ, and
• the critical value is a z-value:
$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
II. Population variances equal (small sample)
• Assumptions:
– Populations are normally distributed
– The populations have equal variances
– Samples are independent
– Both sample sizes are <30
– Population standard deviations are unknown
Forming confidence estimates:
• The population variances are assumed equal, so the two
sample standard deviations are pooled to
estimate σ
• The critical value is a t value with (n1 + n2 − 2) degrees
of freedom
• The pooled estimate (s²p) is the weighted average of
the two sample variances.
• The pooled standard deviation is:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
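The pooled estimate weights each sample variance by its degrees of freedom (n − 1), so the larger sample has more influence. A minimal sketch (the function name `pooled_sd` is hypothetical, and the inputs below are made-up numbers for illustration, not data from the lecture):

```python
import math

def pooled_sd(n1, s1, n2, s2):
    """Pooled SD: sample variances weighted by their degrees of freedom (n - 1)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2)

# Hypothetical inputs, for illustration only (not from the lecture):
print(pooled_sd(10, 2.0, 10, 2.0))  # equal SDs give back the common SD, 2.0
print(pooled_sd(5, 1.0, 21, 3.0))   # pulled toward 3.0, the SD of the larger sample
```

The resulting s_p is then used with a t critical value on (n1 + n2 − 2) df.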
III. Population variances unequal (small sample)
C. Paired Samples
Tests Means of 2 Related Populations
∆ Paired or matched samples
∆ Repeated measures (before/after)
∆ Use difference between paired values:
d = x1-x2
Eliminates variation among subjects
Assumptions:
Both populations are normally distributed,
Or, if not normal, use large samples.
• The CI for the mean difference µd is $\bar{d} \pm t_{\alpha/2}\, s_d/\sqrt{n}$,
where tα/2 has n − 1 df.
Example
• Ten hypertensive patients are given methyldopa for
their condition.
• They are asked to come back 1 week later and have
their blood pressures measured again. Suppose the
initial and follow-up SBPs (mm Hg) of the patients are
given below.
Example…
4. Two Population Proportions
• We are often interested in comparing proportions
from 2 populations:
• Is the incidence of disease A the same in two
populations?
• Patients are treated with either drug D, or with
placebo. Is the proportion “improved” the same in
both groups?
Confidence Interval for Two Population Proportions
• SE of the difference =
$$\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Example
• In a clinical trial for a new drug to treat hypertension,
n1 = 50 patients were randomly assigned to receive
the new drug, and n2 = 50 patients to receive a
placebo. 34 of the patients receiving the drug
showed improvement, while 15 of those receiving
placebo showed improvement.
– Compute a 95% CI estimate for the difference between
proportions improved.
Example…
• p1 = 34/50 = 0.68, p2 = 15/50 = 0.30
• The point estimate for the difference is:
p1 − p2 = 0.68 − 0.30 = 0.38
• SE of the difference = √(0.68 × 0.32/50 + 0.30 × 0.70/50) = 0.0925
• 95% CI
– Lower = ( point estimate ) - (Zα/2) (SE)
= 0.38 – (1.96)(0.0925) = 0.20
– Upper = ( point estimate ) + (Zα/2) (SE)
= 0.38 + (1.96)(0.0925) = 0.56
• 95% CI = (0.20, 0.56)
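The same steps (point estimate, unpooled SE, ± Z_{α/2}·SE) can be sketched in Python with the standard library; `diff_proportions_ci` is a hypothetical name:

```python
import math
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """CI for p1 - p2 using the unpooled SE sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)

diff, se, (lo, hi) = diff_proportions_ci(34, 50, 15, 50)
print(f"difference = {diff:.2f}, SE = {se:.4f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

This reproduces the difference 0.38, SE 0.0925, and 95% CI (0.20, 0.56).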
Hypothesis Testing
Hypothesis
• A hypothesis is a statement made about one or more population
parameters
Examples of Research Hypotheses
Population Mean
• The average length of stay of patients admitted to the hospital
is five days
• The mean birth weight of babies delivered by mothers with low
SES is lower than that of babies delivered by mothers with higher SES.
Population Proportion
• The proportion of adult smokers in Addis Ababa City is p = 0.40
• The prevalence of HIV among non-married adults is higher than
that in married adults, etc
Cont’d…
There are five ingredients to any statistical test
Null Hypothesis
Alternative Hypothesis
Test Statistic
Rejection/Critical Region
Conclusion
Types of Hypothesis
1. The Null Hypothesis, H0
· Is a statement claiming that there is no difference between
the hypothesized value and the population value.
· (The effect of interest is zero = no difference)
· States the assumption (hypothesis) to be tested
· H0 is a statement of agreement (or no difference)
· H0 is always about a population parameter, not about a
sample statistic
Cont’d…
• Begin with the assumption that the Ho is true
– Similar to the notion of innocent until proven guilty
2. The Alternative Hypothesis, HA
• Is a statement of what we will believe is true if our sample data
cause us to reject Ho.
• Is generally the hypothesis that the researcher believes (or needs to
support)
• Is a statement that contradicts Ho
(The effect of interest is not zero)
· Never contains an “=”, “≤”, or “≥” sign
• May or may not be supported by the data
Steps in Hypothesis Testing
1. Formulate the appropriate statistical hypotheses clearly
Specify HO and HA
H0: µ = µ0 vs. HA: µ ≠ µ0 (two-tailed)
H0: µ ≤ µ0 vs. HA: µ > µ0 (one-tailed)
H0: µ ≥ µ0 vs. HA: µ < µ0 (one-tailed)
2. State the assumptions necessary for computing probabilities
• The distribution is approximately normal (Gaussian)
• The variance is known or unknown
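Cont’d…
3. Select a sample and collect data
• Categorical, continuous
4. Decide on the appropriate test statistic for the
hypothesis. E.g., for one population mean:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\ (\sigma\ \text{known}) \quad\text{OR}\quad t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}\ (\sigma\ \text{unknown},\ n - 1\ \text{df})$$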
Cont’d…
5. Specify the desired level of significance for the
statistical test and determine the critical value
(α = 0.05, 0.01, etc.)
– The critical value is the value the test statistic must attain to be declared
significant.
Cont’d…
6. Obtain sample evidence and compute the test statistic
7. Reach a decision and draw the conclusion
• If Ho is rejected, we conclude that HA is true (i.e., HA is
supported).
• If Ho is not rejected, we conclude that Ho may
be true.
Significance level
• The significance level of a statistical hypothesis test is a fixed
probability of wrongly rejecting the null hypothesis H0, if it is in
fact true.
Another way to state conclusion
• Reject Ho if P-value < α
• Do not reject Ho if P-value ≥ α
Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject Ho, we risk committing an error.
• Two types of errors can be committed.
– Type I Error
– Type II Error
Type I Error
• The error committed when a true Ho is rejected
• Considered a serious type of error
• The probability of a type I error is the probability of rejecting
the Ho when it is true
• The probability of a type I error is α, which is called the level of
significance of the test
• Set by the researcher in advance
Type II Error
• The error committed when a false Ho is not rejected
• The probability of a Type II error is β
• Usually unknown, and often larger than α
Power
• The probability of rejecting the Ho when it is false.
Power = 1 – β = 1- probability of type II error
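Action (Conclusion) | Reality: Ho True | Reality: Ho False
Do not reject Ho | Correct decision (1 − α) | Type II error (β)
Reject Ho | Type I error (α) | Correct decision (power = 1 − β)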
Factors Affecting Type II Error
Factors Affecting the Power of the Test
Hypothesis Test for One Sample
• Test for single mean
• Test for single proportion
1. Hypothesis Testing of a Single Mean
(Normally Distributed)
1.1 Known Variance
Example: Two-Tailed Test
1. A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude that the
mean age of the population is not 30? The variance is known
to be 20. Let CL = .95.
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population
C. Hypotheses
Ho: µ = 30
HA: µ ≠ 30
D. Test statistic
As the population variance is known, we use Z as the
test statistic.
E. Decision Rule
• Reject Ho if the Z value falls in the rejection region.
• Don’t reject Ho if the Z value falls in the non-rejection region.
• Because of the structure of Ho it is a two tail test. Therefore,
reject Ho if Z ≤ -1.96 or Z ≥ 1.96.
F. Calculation of test statistic
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = \frac{-3}{1.4142} = -2.12$$
G. Statistical decision
We reject Ho because Z = −2.12 is in the rejection region. The
value is significant at α = 5%.
H. Conclusion
We conclude that µ is not 30. P-value =
2 × (1 − 0.9830) = 2 × 0.0170 = 0.0340
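Steps F to H can be checked with a short sketch (standard library only; `one_sample_z_test` is a hypothetical name). The two-sided P-value is 2 × (1 − Φ(|Z|)):

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, variance, n):
    """Z test for H0: mu = mu0 with known population variance; two-sided P-value."""
    z = (xbar - mu0) / math.sqrt(variance / n)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

z, p = one_sample_z_test(27, 30, 20, 10)
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```

The code does not round z to 2.12 before looking up the normal CDF, so its P-value (about 0.0339) differs slightly from the table-based 0.0340; the decision at α = 0.05 is the same.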
Example: Two-Tailed Test
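• A simple random sample of 14 people from a certain
population gives a sample mean body mass index (BMI) of 30.5
and an SD of 10.64. Can we conclude that the mean BMI is not 35 at
α = 5%?
• Ho: µ = 35, HA: µ ≠ 35
• Test statistic (σ unknown, so t with n − 1 = 13 df):
$$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{30.5 - 35}{10.64/\sqrt{14}} \approx -1.58$$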
• Decision rule
– We have a two tailed test. With α = 0.05 it means that each tail is
0.025. The critical t values with 13 df are -2.1604 and 2.1604.
– We reject Ho if the t ≤ -2.160 or t ≥ 2.160.
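Applying this decision rule to the BMI data is a one-line computation. A minimal sketch (the helper name `one_sample_t` is hypothetical; the critical value 2.1604 is taken from the slide rather than computed):

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """t statistic for H0: mu = mu0 when sigma is unknown (n - 1 df)."""
    return (xbar - mu0) / (s / math.sqrt(n))

t = one_sample_t(30.5, 35, 10.64, 14)
t_crit = 2.1604  # two-tailed critical value with 13 df, alpha = 0.05 (from the slide)
reject = abs(t) >= t_crit
print(f"t = {t:.2f}, reject H0: {reject}")
```

With these numbers t ≈ −1.58, which lies between −2.160 and 2.160, so Ho would not be rejected at the 5% level.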
2. Hypothesis Testing about the Difference Between
Two Population Means
Independent Samples (Normally Distributed)
Two Sample Means,
2.1 Known Variances (Independent Samples)
• When two independent samples are drawn from
normally distributed populations with known variances,
the test statistic for testing the Ho of equal population
means is:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
Example:
The mean serum uric acid (SUA) levels in 12 individuals with Down’s
syndrome and 15 normal individuals are 4.5 and 3.4 mg/100
ml, respectively, with known variances σ1² = 1 and σ2² = 1.5. Is
there a difference between the means of the two groups at α
= 5%?
• Hypotheses:
Ho: µ1- µ2 = 0 or Ho: µ1 = µ2
HA: µ1 - µ2 ≠ 0 or HA: µ1 ≠ µ2
• With α = 0.05, the critical values of Z are -1.96 and +1.96.
We reject Ho if Z < -1.96 or Z > +1.96.
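The test statistic for the SUA example can be computed directly. A minimal sketch (standard library only; `two_sample_z` is a hypothetical name) that returns Z and the two-sided P-value:

```python
import math
from statistics import NormalDist

def two_sample_z(x1, var1, n1, x2, var2, n2):
    """Z test for H0: mu1 = mu2 with known population variances."""
    z = (x1 - x2) / math.sqrt(var1 / n1 + var2 / n2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = two_sample_z(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```

Here Z = 1.1/0.43 ≈ 2.57 > 1.96, so the rule above leads to rejecting Ho at α = 0.05.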
2.2 Unknown Variances
i. Equal variances (Independent samples)
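• With equal population variances, we can obtain a pooled
value from the sample variances.
• The test statistic for µ1 − µ2 is:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$
• where t has (n1 + n2 − 2) df, and
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$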
Example:
• We wish to know if we may conclude, at the 5% level of
significance, that smokers, in general, have greater
lung damage than do non-smokers.
• Calculation of pooled
variance
Example:
• Hypotheses:
Ho: µ1 ≤ µ2 (i.e., µ1 − µ2 ≤ 0), HA: µ1 > µ2
• With α = 0.05 and df = 23, the critical value of t is 1.714. We reject Ho if
t > 1.714.
• Test statistic
ii. Unequal variances (Independent samples)
Hypothesis Testing for Paired Samples
• Two samples are paired when each data point of the first sample
is matched and is related to a unique data point of the second
sample.
• Tests means of 2 related populations
– Paired or matched samples
– Repeated measures (before/after)
• Longitudinal or follow-up study
• Assumptions:
– Both populations are normally distributed
– Or, if not normal, use large samples
The Paired t Test
• For Ho: µd = 0, with n − 1 df:
$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
Example:
• The following data show the SBP levels (mm Hg) in 10 women
while not using (baseline) and while using (follow-up) oral
contraceptives. Can we conclude that there is a difference
between mean baseline and follow-up SBP at α = 5%? Let di =
baseline − follow-up.
Example…
d̄ = (13 + 3 + … + 2)/10 = 4.80
s²d = [(13 − 4.8)² + … + (2 − 4.8)²]/9 = 20.844
sd = √20.844 = 4.566
t = 4.80/(4.566/√10) = 4.80/1.44 = 3.32
• From the t table, t9,α/2 = 2.262
• Since t (= 3.32) > t9,α/2 (= 2.262), Ho is rejected
• 0.005 < P-value < 0.01 (two-tailed).
• Since 3.32 falls in the rejection region, there is a
significant difference between the population mean
SBP while not using and while using OCs.
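The paired t test is a one-sample t test on the differences d_i. A minimal sketch (standard library only; both function names are hypothetical): `paired_t` takes the raw differences, and `paired_t_from_summary` uses the summary values from the slide, since the individual differences are not all listed here.

```python
import math
import statistics

def paired_t(differences):
    """Paired t statistic from the individual differences d_i (n - 1 df)."""
    n = len(differences)
    d_bar = statistics.mean(differences)
    s_d = statistics.stdev(differences)  # sample SD, divisor n - 1
    return d_bar / (s_d / math.sqrt(n))

def paired_t_from_summary(d_bar, s_d, n):
    """Same statistic when only the mean and SD of the differences are known."""
    return d_bar / (s_d / math.sqrt(n))

# Oral-contraceptive example, using the summary values on the slide
t = paired_t_from_summary(4.80, math.sqrt(20.844), 10)
t_crit = 2.262  # t_{9, 0.025} from the t table
print(f"t = {t:.2f}, reject H0: {abs(t) > t_crit}")
```

Given the full list of ten differences, `paired_t` would return the same statistic, because `statistics.stdev` uses the n − 1 divisor as in the s²d calculation above.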
Hypothesis Tests for Proportions
• Involves categorical values
• Two possible outcomes
– “Success” (possesses a certain
characteristic)
– “Failure” (does not possess that
characteristic)
• The fraction or proportion of the population in the
“success” category is denoted by p
Proportions
3. Hypothesis Testing about a Single Population Proportion
(Normal Approximation to Binomial Distribution)
Example
• In the general population of 0 to 4-year-olds, the
annual incidence of asthma is 1.4%. If 10 cases of
asthma are observed over a single year in a sample
of 500 children whose mothers smoke, can we
conclude that this is different from the underlying
probability of p0 = 0.014? (CL = 95%, i.e. α = 0.05)
H0 : p = 0.014
HA: p ≠ 0.014
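• The test statistic is given by:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{10/500 - 0.014}{\sqrt{0.014 \times 0.986/500}} = \frac{0.020 - 0.014}{0.00525} \approx 1.14$$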
Example…
• The critical value of Zα/2 at α=5% is ±1.96.
• Don’t reject Ho, since Z (= 1.14) lies in the non-rejection region
between ±1.96.
• P-value = 2 × (1 − 0.8729) = 0.2542
• We do not have sufficient evidence to conclude that the
probability of developing asthma for children whose mothers
smoke in the home is different from the probability in the
general population
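The asthma example can be reproduced with a short sketch (standard library only; `one_proportion_z` is a hypothetical name). Note that the SE uses p0, the value under Ho, not the sample proportion:

```python
import math
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """Z test for H0: p = p0 using the normal approximation to the binomial."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # SE computed under H0
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = one_proportion_z(10, 500, 0.014)
print(f"z = {z:.2f}, two-sided P = {p:.3f}")
```

This gives Z ≈ 1.14 and a P-value of about 0.25, matching the table-based result up to rounding.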
4. Hypothesis Tests about the Difference Between
Two Population Proportions
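$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
Where X1 = the observed number of events in the first sample
and X2 = the observed number of events in the second sample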
Example
• Among the 225 students who ate the sandwiches, 109
became ill, while among the 38 students who did not eat the
sandwiches, 4 became ill. Is there a significant difference
between the two groups at α = 5%?
• We wish to test
Ho: p1 = p2 against the alternative
HA: p1 ≠ p2
• Assume that the sample sizes are large enough, and
the normal approximation to the binomial
distribution is valid.
• If the Ho is true, then p1 = p2 = p
Since z_calc (≈ 4.37) > z_tab (1.96), we reject H0 at the 0.05 level.
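The sandwich example uses the pooled proportion p̄ = (X1 + X2)/(n1 + n2) in the SE, because under Ho the two groups share one common p. A minimal sketch (standard library only; `two_proportion_z` is a hypothetical name):

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z test for H0: p1 = p2 using the pooled proportion in the SE."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion under H0: p1 = p2
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = two_proportion_z(109, 225, 4, 38)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")
```

Here p̂1 ≈ 0.484, p̂2 ≈ 0.105, p̄ = 113/263 ≈ 0.430, and Z ≈ 4.37, well beyond the critical value 1.96.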
Thank you!