6A. Intro To Stat Inference
By
Erastus K Njeru
Introduction
• Foundation for statistical inference (for
means and proportions)
– Population distribution curve
– Probability & Probability distributions
– Normal distribution
– Sampling distribution of the mean
– Decision errors
From Sample to population
• Recall: histogram for continuous data
– When constructing a histogram, we may use as many bars as we like without distorting the picture
– If the number of observations is infinite, we can have an infinite number of bars, and the histogram becomes a smooth curve
– This curve is called the population distribution curve
• If the curve is symmetric and bell-shaped, the population follows a normal distribution.
Population distribution curve and
histogram
Normal Distribution
Characteristics of the Normal Curve
[Figure: normal curves of height (inches) for women (µ = 63.6, σ = 2.5) and men (µ = 69.0, σ = 2.8)]
Normal Curve
• Mean ± 1 SD limits include 68.27% of observations
• Mean ± 2 SD limits include 95.45%
• Mean ± 1.96 SD limits include 95%
• Mean ± 3 SD limits include 99.73%
• Mean ± 2.58 SD limits include 99%
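The coverage figures above can be checked numerically. The following is a minimal sketch, assuming a perfectly normal population; the function name `coverage_within` is chosen here for illustration and does not come from the slides. It uses the identity P(|Z| < k) = erf(k / √2).

```python
import math


def coverage_within(k: float) -> float:
    """Proportion of a normal population lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))


# Multipliers listed on the slide
for k in (1, 1.96, 2, 2.58, 3):
    print(f"mean ± {k} SD: {100 * coverage_within(k):.2f}%")
```

The multipliers 1.96 and 2.58 are rounded values, so the computed coverage is close to, but not exactly, 95% and 99%.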
Standardization
• Each combination of mean μ and variance σ² gives a different normal distribution
• To compare different populations, we define the standard normal distribution

$$Z_i = \frac{x_i - \mu}{\sigma}$$

i.e. from every observation xi, subtract the population mean and divide by the standard deviation. The resulting Zi value is known as the z-score or standard normal deviate.
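As a rough illustration of why standardization helps when comparing populations, the sketch below applies the formula to two people from different populations. The means and SDs are the women's and men's height parameters from the figure earlier in these slides; the two observed heights (66 and 72 inches) and the function name `z_score` are hypothetical, chosen only for the example.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize x: subtract the population mean, divide by the SD."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical observations; population parameters from the height figure
woman_z = z_score(66.0, mu=63.6, sigma=2.5)
man_z = z_score(72.0, mu=69.0, sigma=2.8)

print(f"woman: z = {woman_z:.2f}")
print(f"man:   z = {man_z:.2f}")
```

Once both heights are expressed in SD units, they can be compared directly even though they come from populations with different means and spreads.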
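Standardizing the Normal Distribution
[Figure: a normal distribution of X, with mean μ and standard deviation σ, is transformed into the standardized normal distribution of Z, with mean 0 and standard deviation 1]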
Standard Scores
To convert any value x to a z-score:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$
Z-Score
• If an observed height is 165 cm, the mean is 160 cm and the SD is 5 cm, then Z = (165 − 160) / 5 = +1
• The proportion of observations above +1 SD is about 16% (from the normal curve)
• Thus the probability of a height above 165 cm is about 0.16
• Z is tabulated in books as the "Table of Unit Normal Distribution" (Normal Probability Integral)
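The table lookup in this example can be reproduced in code. This is a minimal sketch, assuming heights are exactly normally distributed; `upper_tail` is an illustrative name, and the tail area is computed with the complementary error function, P(Z > z) = ½ · erfc(z / √2).

```python
import math


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Values from the slide: observation 165 cm, mean 160 cm, SD 5 cm
z = (165 - 160) / 5
print(f"z = {z:+.0f}")
print(f"P(height > 165 cm) = {upper_tail(z):.4f}")  # 0.1587, i.e. about 16%
```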
The Empirical Rule
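Standard Normal Distribution: µ = 0 and σ = 1
• 68% of data are within 1 standard deviation of the mean
• 95% are within 2 standard deviations
• 99.7% are within 3 standard deviations
[Figure: normal curve with the horizontal axis marked at x̄ − 3s, x̄ − 2s, x̄ − s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s. On each side of the mean the areas are 34% (mean to 1s), 13.5% (1s to 2s), 2.4% (2s to 3s) and 0.1% (beyond 3s)]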
Standard Scores (z-score)
Once the mean (µ) and SD (σ) have been specified, finding probabilities for a normal random variable is a simple process:
convert the endpoints of the interval of interest to z-scores, then look up the probabilities associated with those z-scores.
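The two steps described above (convert the endpoints to z-scores, then look up the probabilities) might be sketched as follows. The standard library's `statistics.NormalDist` stands in for the printed table here; the function name `prob_between` and the interval endpoints are assumptions made for illustration, while the mean and SD are the women's height parameters from the earlier figure.

```python
from statistics import NormalDist


def prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a < X < b) for X normal with mean mu and SD sigma."""
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    z_a = (a - mu) / sigma     # step 1: convert endpoints to z-scores
    z_b = (b - mu) / sigma
    # step 2: "look up" the cumulative probabilities and subtract
    return std_normal.cdf(z_b) - std_normal.cdf(z_a)


# Illustrative interval: mean ± 1 SD for mu = 63.6, sigma = 2.5
p = prob_between(61.1, 66.1, mu=63.6, sigma=2.5)
print(f"P(61.1 < X < 66.1) = {p:.4f}")
```

Because the chosen endpoints sit one SD either side of the mean, the result should agree with the roughly 68% figure from the empirical rule.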
Example Standard Scores for Height
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{62 - 65}{2.7} = -1.11$$
Decision Errors
                         True situation (in population)
Test decision            H0 true          H0 false
Accept H0                OK               Type II error
Reject H0                Type I error     OK
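The decision table above can also be read as a simple rule. The sketch below encodes it directly; the function name `classify_decision` and its boolean arguments are hypothetical, introduced only to illustrate the four possible outcomes.

```python
def classify_decision(h0_is_true: bool, reject_h0: bool) -> str:
    """Label a test decision against the true situation in the population."""
    if reject_h0 and h0_is_true:
        return "Type I error"   # rejected a true H0
    if not reject_h0 and not h0_is_true:
        return "Type II error"  # accepted a false H0
    return "OK"                 # decision matches the true situation


# Walk through all four cells of the table
for h0_is_true in (True, False):
    for reject_h0 in (False, True):
        outcome = classify_decision(h0_is_true, reject_h0)
        print(f"H0 true={h0_is_true}, reject={reject_h0}: {outcome}")
```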