5-Normal Distribution
Objectives
At the end of this lecture students will be able to:
• Explain what the normal distribution and the standard
normal distribution are
• Explain the properties of the normal distribution and
test for normality
• Find the z-scores of a data set
• Explain the distribution of the sample mean and
the standard error
• Calculate the standard error
Normal Distribution
The normal distribution is perhaps
the most important of all statistical
distributions
Normal Distribution
The reason why the normal
distribution plays such a key role in
statistics is that countless phenomena
follow (or closely approximate) the
normal distribution
Most statistical theories and
methods are developed around the
assumption that the data are
normally distributed
Characteristics of Normal
Distribution
Has a bell-shaped curve
It is symmetric around the mean: the two
halves of the curve are the same
(mirror images)
Characteristics of Normal
Distribution Cont’d
Hence Mean = Median = mode
[Figure: normal curve; about 68% of the data lie within 1 SD of the mean]
How good is the rule for real
data?
Check some example data:
The mean weight of the women
= 127.8 lbs
The standard deviation (SD) = 15.5 lbs
68% of 120 = 0.68 x 120 ≈ 82 runners
In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean.
[Figure: histogram of the runners' weights; x-axis POUNDS (80 to 160), y-axis Percent]
95% of 120 = 0.95 x 120 = 114 runners
In fact, 115 runners fall within 2 SDs of the mean.
[Figure: histogram of the runners' weights; x-axis POUNDS (80 to 160), y-axis Percent]
99.7% of 120 = 0.997 x 120 ≈ 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.
[Figure: histogram of the runners' weights; x-axis POUNDS (80 to 160), y-axis Percent]
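The three checks on the runners data can be reproduced in a few lines of Python. This is a minimal sketch that uses only the figures quoted on the slides (120 runners, mean 127.8 lbs, SD 15.5 lbs, observed counts 79, 115 and 120); the names `interval` and `expected_count` are illustrative, not part of the lecture.

```python
# Empirical (68-95-99.7) rule checked against the runners example.
N_RUNNERS = 120
MEAN_LBS = 127.8
SD_LBS = 15.5

# Rule-of-thumb share of a normal distribution within k SDs of the mean.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}
# Counts reported on the slides for the real data.
OBSERVED = {1: 79, 2: 115, 3: 120}


def interval(k):
    """Return the (low, high) weight range within k SDs of the mean."""
    return (MEAN_LBS - k * SD_LBS, MEAN_LBS + k * SD_LBS)


def expected_count(k):
    """Number of runners the rule predicts within k SDs of the mean."""
    return RULE[k] * N_RUNNERS


for k in (1, 2, 3):
    low, high = interval(k)
    print(f"within {k} SD ({low:.1f} to {high:.1f} lbs): "
          f"expected {expected_count(k):.1f}, observed {OBSERVED[k]}")
```

The expected and observed counts are close but not identical, which is what "closely approximates the normal distribution" looks like in real data.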
z-scores
When a set of data values is
normally distributed, we can
standardize each score by converting
it into a z-score.
z-score formula
z = (x - μ) / σ
Where x represents an element of the
data set, the mean is represented by μ
and the standard deviation by σ.
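The formula translates directly into code. The sketch below is one possible way to write it; the function names and the five-value data set are made up for illustration, and `pstdev` assumes the data are the whole population (use `statistics.stdev` for a sample).

```python
from statistics import mean, pstdev


def z_score(x, mu, sigma):
    """Standardize one value: z = (x - mu) / sigma."""
    return (x - mu) / sigma


def z_scores(data):
    """Standardize every value in a data set using its own mean and SD.

    pstdev divides by N (population SD); this is an assumption of the sketch.
    """
    mu = mean(data)
    sigma = pstdev(data)
    return [z_score(x, mu, sigma) for x in data]


# Hypothetical data set, chosen only for illustration.
scores = [40, 50, 60, 70, 80]
for x, z in zip(scores, z_scores(scores)):
    print(f"x = {x}: z = {z:+.2f}")
```

Values below the mean get negative z-scores, values above it get positive ones, and the mean itself maps to 0.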
Example
Example: Ahmad gets a 50 on his
Statistics midterm and a 50 on his
Epidemiology midterm. Did he do
equally well on these two exams?
Example 1
[Figure: grade distributions for Statistics and Epidemiology; x-axis GRADE, 0 to 100]
• In one case, Ahmad’s
Example 1
Ahmad in Statistics:
(50 - 40)/10 = 1
(one SD above the mean)
Ahmad in Epidemiology:
(50 - 60)/10 = -1
(one SD below the mean)
[Figure: grade distributions for Statistics and Epidemiology; x-axis GRADE, 0 to 100]
The means are identical,
but the two sets of
scores have different
spreads
Ahmad’s Statistics z-score:
(50 - 40)/5 = 2
Ahmad’s Epidemiology z-score:
(50 - 40)/20 = 0.5
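Ahmad's four z-scores can be checked with a short script. This sketch uses the class means and SDs shown on the slides; the dictionary layout and names are illustrative choices.

```python
def z_score(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma


AHMAD_SCORE = 50

# (class mean, class SD) for each exam, as given on the slides.
scenarios = {
    "different means, same spread": {
        "Statistics": (40, 10),
        "Epidemiology": (60, 10),
    },
    "same mean, different spreads": {
        "Statistics": (40, 5),
        "Epidemiology": (40, 20),
    },
}

for label, classes in scenarios.items():
    print(label)
    for course, (mu, sigma) in classes.items():
        z = z_score(AHMAD_SCORE, mu, sigma)
        print(f"  {course}: z = {z:+.1f}")
```

The same raw score of 50 gives z-scores of +1, -1, +2 and +0.5, so a raw score cannot be interpreted without the mean and spread of its class.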
Three Properties of Standard
Scores
1. The mean of a set of z-scores is
always zero
2. The standard deviation of a set of
z-scores is always one
Three Properties of Standard
Scores cont.
3. The distribution of a set of
standardized scores has the same
shape as the unstandardized scores
Beware of the “normalization”
misinterpretation: standardizing does not
make the scores normally distributed
The shape is the same
(but the scaling or metric is different)
[Figure: histograms of the UNSTANDARDIZED and STANDARDIZED scores; the shape is identical, only the axis scale differs]
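The properties of standard scores can be verified numerically. The sketch below standardizes a small right-skewed data set (hypothetical values, chosen only for illustration) and checks that the z-scores have mean 0 and SD 1 while the skewness, a simple measure of shape, is unchanged.

```python
from statistics import mean, pstdev


def standardize(data):
    """Convert every value to a z-score using the data's own mean and SD."""
    mu, sigma = mean(data), pstdev(data)
    return [(x - mu) / sigma for x in data]


def skewness(data):
    """Population skewness: the average cubed z-score (0 for symmetric data)."""
    mu, sigma = mean(data), pstdev(data)
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)


# Hypothetical right-skewed data: most values small, a few large.
raw = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
z = standardize(raw)

print(f"mean of z-scores: {mean(z):.6f}")
print(f"SD of z-scores:   {pstdev(z):.6f}")
print(f"skewness before:  {skewness(raw):.4f}")
print(f"skewness after:   {skewness(z):.4f}")
```

The data stay just as skewed after standardizing, which is why converting to z-scores does not "normalize" a non-normal distribution.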
Advantages of Standard
Scores
1. We can use standard scores to find
centile scores: the proportion of people
with scores less than or equal to a
particular score. Centile scores are
intuitive ways of summarizing a person’s
location in a larger set of scores.
The area under a normal curve
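[Figure: standard normal curve; x-axis SCORE from -4 to 4. Approximate areas: 50% on each side of the mean, 34% between the mean and 1 SD on each side, 14% between 1 and 2 SDs on each side, 2% beyond 2 SDs on each side]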
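Centile scores follow from the area under the curve to the left of a z-score. A minimal sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `centile` is illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def centile(z):
    """Proportion of a normal population scoring at or below z."""
    return STANDARD_NORMAL.cdf(z)


for z in (-2, -1, 0, 1, 2):
    print(f"z = {z:+d}: centile = {centile(z):.1%}")
```

A z-score of 1, for example, sits at roughly the 84th centile: 50% below the mean plus about 34% between the mean and 1 SD.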
Advantages of Standard
Scores cont.
2. Standard scores provide a way to
standardize or equate different metrics.
We can now interpret Ahmad’s scores in
Statistics and Epidemiology on the same
metric (the z-score metric). (Each score
comes from a distribution with the same
mean [zero] and the same standard
deviation [1].)
Disadvantages of Standard
Scores
1. Because a person’s score is expressed
relative to the group (X - M), the same person
can have different z-scores when assessed in
different samples
Example: If Ahmad had taken his statistics
exam in a class in which everyone knew
statistics well, his z-score would be well below
zero (below the mean). If the class didn’t know statistics
very well, however, Ahmad would be above the
mean. Ahmad’s z-score depends on everyone
else’s scores.
Disadvantages of Standard
Scores cont.
2. If the absolute score is meaningful or
of psychological interest, it will be
obscured by transforming it to a relative
metric.
Question?
Suppose biology scores among college
students are normally distributed with a
mean of 50 and a standard deviation of
10. If a student scores a 70, what would
be her z-score?
Answer Now
Answer
Suppose biology scores among college students are
normally distributed with a mean of 50 and a
standard deviation of 10. If a student scores a 70,
what would be her z-score?
Z = (70 - 50)/10 = 2
Her z-score would be 2, which means
her score is two standard deviations
above the mean.
Question?
• A set of math test scores has a mean
of 70 and a standard deviation of 8.
Answer Now
Answer
What will be the miles per gallon for a Toyota
Camry when the average mpg is 23, it has a z
value of 1.5 and a standard deviation of 2?
Using the formula for z-scores: z = (x - μ)/σ
1.5 = (x - 23)/2
3 = x - 23
x = 26
The Toyota Camry would be expected
to get 26 mpg.
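Solving the z-score formula for x gives x = μ + zσ, which can be written as a one-line helper. The function name `raw_score` is an illustrative choice.

```python
def raw_score(z, mu, sigma):
    """Invert z = (x - mu) / sigma to recover the raw value x."""
    return mu + z * sigma


# Toyota Camry example from the slide: mean 23 mpg, SD 2, z = 1.5.
camry_mpg = raw_score(z=1.5, mu=23, sigma=2)
print(f"Expected mileage: {camry_mpg} mpg")
```

With the slide's numbers this reproduces the 26 mpg obtained by hand.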
Question?
A group of data with normal distribution
has a mean of 45. If one element of the
data is 60, will the z-score be positive
or negative?
Answer Now
Answer
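A group of data with normal distribution has
a mean of 45. If one element of the data is
60, will the z-score be positive or negative?
The z-score will be positive, because 60 is
above the mean of 45 (x - μ = 15 > 0) and the
standard deviation is always positive.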
T Score
T scores have a mean of 50 and a
standard deviation of 10.
A T score is computed by multiplying
the Z score by 10 and adding 50.
T = 10(Z) + 50
It is often used for personality
inventories
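The conversion T = 10(Z) + 50 is simple enough to sketch in code. The function names below are illustrative, and the raw-score example reuses the biology question from earlier in the lecture (mean 50, SD 10, score 70).

```python
def t_score(z):
    """Convert a z-score to a T score: T = 10 * z + 50."""
    return 10 * z + 50


def t_score_from_raw(x, mu, sigma):
    """Standardize a raw value first, then rescale it to the T metric."""
    return t_score((x - mu) / sigma)


for z in (-1, 0, 2):
    print(f"z = {z:+d} -> T = {t_score(z)}")

print(t_score_from_raw(70, mu=50, sigma=10))
```

Because T scores are a linear rescaling of z-scores, they carry the same information while avoiding negative values and decimals for most scores.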
Exercise
Let’s assume that the heart
rate of healthy persons is normally
distributed
Mean = 70, SD = 10 beats/min
Many statistics books have z-score
tables, giving us this information:
z      (a) Area between mean and z   (b) Area beyond z
0.00   0.0000                        0.5000
0.01   0.0040                        0.4960
0.02   0.0080                        0.4920
:      :                             :
1.00   0.3413                        0.1587
:      :                             :
2.00   0.4772                        0.0228
:      :                             :
3.00   0.4987                        0.0013
10/27/2013
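The two table columns can also be computed instead of looked up. A minimal sketch with `statistics.NormalDist` (Python 3.8 or later); the function names are illustrative, and the table is assumed to refer to non-negative z.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def area_mean_to_z(z):
    """Column (a): area between the mean and z, for z >= 0."""
    return STANDARD_NORMAL.cdf(z) - 0.5


def area_beyond_z(z):
    """Column (b): area in the tail beyond z, for z >= 0."""
    return 1 - STANDARD_NORMAL.cdf(z)


print("z     (a)     (b)")
for z in (0.00, 0.01, 0.02, 1.00, 2.00, 3.00):
    print(f"{z:4.2f}  {area_mean_to_z(z):.4f}  {area_beyond_z(z):.4f}")
```

For any z the two columns add up to 0.5, the area on one side of the mean.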
Diagram of Exercise # 1
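[Figure: normal curve of heart rate; x-axis from -3 to 3 SDs around μ, with 70, 80, 90 and 100 beats/min at μ, +1, +2 and +3 SDs; 34.13% of the area lies between μ and +1 SD; marked area beyond +1 SD = 0.1587]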
Second Exercise
Then:
Diagram of Exercise # 2
[Figure: normal curve; x-axis from -3 to 3 SDs around μ; marked area = 0.0228]
Third Exercise
Then:
Diagram of Exercise # 3
[Figure: normal curve; x-axis from -3 to 3 SDs around μ; marked area = 0.954]
Fourth Exercise
Then:
Diagram of 4th exercise
[Figure: normal curve; x-axis from -3 to 3 SDs around μ; marked area = 0.0013]
Fifth Exercise
Diagram of 5th exercise
[Figure: normal curve; x-axis from -3 to 3 SDs around μ; marked area = 0.0013 in each tail]
Solution/Answers
1) 15.9% or 0.159
2) 2.28% or 0.0228
3) 95.4% or 0.954
4) 0.13% or 0.0013
[Figure: standard normal curve with Z = 1.51 marked; the area to the left of Z = 1.51 is 93.45%]
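The answers can be cross-checked numerically. The exercise statements are not reproduced in this text, so the questions in the sketch below are assumptions inferred from the diagrams (areas beyond 1, 2 and 3 SDs and within 2 SDs for a heart rate with mean 70 and SD 10 beats/min); treat them as hypothetical.

```python
from statistics import NormalDist

# Heart rate model from the exercise: mean 70, SD 10 beats/min.
heart_rate = NormalDist(mu=70, sigma=10)

# Assumed questions, inferred from the diagrams (not stated in the slides).
answers = {
    "1) P(rate > 80)": 1 - heart_rate.cdf(80),
    "2) P(rate > 90)": 1 - heart_rate.cdf(90),
    "3) P(50 < rate < 90)": heart_rate.cdf(90) - heart_rate.cdf(50),
    "4) P(rate > 100)": 1 - heart_rate.cdf(100),
}

for question, p in answers.items():
    print(f"{question} = {p:.4f}")

# Area to the left of z = 1.51 on the standard normal curve.
print(f"Area below z = 1.51: {NormalDist().cdf(1.51):.4f}")
```

Under these assumptions exercise 2 evaluates to about 0.0228, matching the z-table entry for z = 2.00.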
Testing Normal Distribution
1. Coefficient of variation less than 30%
2. Skewness and kurtosis values: both are 0 for a
perfectly normal distribution (check the SPSS output)
[Figure: SPSS output showing the skewness and kurtosis values]
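The first two checks can be sketched without a statistics package. The code below uses simple moment-based formulas, which is an assumption of this sketch: packages such as SPSS apply small-sample corrections, so their skewness and kurtosis output can differ slightly. The data values are hypothetical.

```python
from statistics import mean, pstdev, stdev


def coefficient_of_variation(data):
    """Sample SD expressed as a percentage of the mean."""
    return 100 * stdev(data) / mean(data)


def skewness(data):
    """Third standardized moment; 0 for a symmetric distribution."""
    mu, sigma = mean(data), pstdev(data)
    return sum(((x - mu) / sigma) ** 3 for x in data) / len(data)


def excess_kurtosis(data):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    mu, sigma = mean(data), pstdev(data)
    return sum(((x - mu) / sigma) ** 4 for x in data) / len(data) - 3


# Hypothetical symmetric data set, for illustration only.
values = [2, 4, 4, 5, 5, 5, 6, 6, 8]
print(f"CV = {coefficient_of_variation(values):.1f}%")
print(f"skewness = {skewness(values):.3f}")
print(f"excess kurtosis = {excess_kurtosis(values):.3f}")
```

Values near 0 for skewness and excess kurtosis are consistent with normality; they do not prove it, so the graphical checks remain useful.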
Testing Normal Distribution cont.
3. Histogram
Frequency and shape
Testing Normal Distribution cont.
Skewed data
Testing Normal Distribution cont.
4. Normal Q–Q plot
Data value plotted against the value that
would be expected if the data came from
a normal distribution
If the variable were normally distributed,
the points would fall directly on the
straight line
Any deviations from the straight line
indicate some degree of non-normality
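The coordinates of a normal Q-Q plot can be computed by pairing each sorted data value with the value expected under a fitted normal distribution. This is a sketch under stated assumptions: the plotting positions (i - 0.5)/n are one common convention (software packages differ), and the heart-rate values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(data):
    """Return (expected normal value, observed value) pairs for a Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1.
    return [
        (fitted.inv_cdf((i - 0.5) / n), observed)
        for i, observed in enumerate(ordered, start=1)
    ]


# Hypothetical heart rates (beats/min), for illustration only.
heart_rates = [62, 65, 68, 70, 70, 72, 75, 78]
for expected, observed in qq_points(heart_rates):
    print(f"expected {expected:6.2f}   observed {observed}")
```

Plotting observed against expected values gives the Q-Q plot; points close to the line y = x suggest the data are approximately normal.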
Outlier Cutoffs
“Large” outliers:
values > upper hinge + 1.5*IQR
i.e. > 75th percentile + 1.5*(75th percentile –
25th percentile)
“Small” outliers:
values < lower hinge – 1.5*IQR
i.e. < 25th percentile – 1.5*(75th percentile –
25th percentile)
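The 1.5*IQR cutoffs can be sketched with `statistics.quantiles` (Python 3.8 or later). Two assumptions are made here: the lower cutoff mirrors the upper one (25th percentile minus 1.5*IQR, the usual convention), and the quartiles use the `inclusive` interpolation method, which can differ slightly from hinges computed by hand or by other software. The data are hypothetical.

```python
from statistics import quantiles


def outlier_cutoffs(data):
    """Return (low, high) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data):
    """Values falling outside the cutoffs."""
    low, high = outlier_cutoffs(data)
    return [x for x in data if x < low or x > high]


# Hypothetical data with one unusually large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(outlier_cutoffs(values))
print(find_outliers(values))
```

Only the value 30 lies beyond the upper cutoff in this example, so it is the only one flagged as a "large" outlier.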
Population
[Figure: frequency histogram of the population scores; x-axis 1 to 9]
Distribution of Sample Means
from Samples of Size n = 2
Sample #   Scores   Mean (X̄)
1          2, 2     2
2          2, 4     3
3          2, 6     4
4          2, 8     5
5          4, 2     3
6          4, 4     4
7          4, 6     5
8          4, 8     6
9          6, 2     4
10         6, 4     5
11         6, 6     6
12         6, 8     7
13         8, 2     5
14         8, 4     6
15         8, 6     7
16         8, 8     8
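The table of 16 samples can be generated by enumerating every ordered pair drawn with replacement from the population {2, 4, 6, 8}. A minimal sketch; the function name `sample_means` is illustrative.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

POPULATION = [2, 4, 6, 8]


def sample_means(n):
    """Means of every ordered sample of size n drawn with replacement."""
    return [mean(sample) for sample in product(POPULATION, repeat=n)]


means = sample_means(2)
print(f"number of samples: {len(means)}")

# Frequency of each sample mean (the heights of the histogram bars).
for value, count in sorted(Counter(means).items()):
    print(f"mean {value}: {count} sample(s)")

print(f"mean of sample means = {mean(means)}")
print(f"SD of sample means   = {pstdev(means):.2f}")
```

The sample means pile up around 5 (four of the 16 samples have a mean of exactly 5), and their SD of about 1.58 is smaller than the population SD of about 2.24.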
Distribution of Sample Means
from Samples of Size n = 2
[Figure: frequency histogram of the sample means; x-axis sample mean, 1 to 9]
Distribution of Individuals in
Population
μ = 5, σ = 2.24
Distribution of sample means (n = 2): μ_X̄ = 5, σ_X̄ = 1.58
[Figure: two frequency histograms, the population scores (left) and the sample means (right); x-axes 1 to 9]
Distribution of Sample Means from
Samples of Size n = 3
μ_X̄ = 5, σ_X̄ = 1.29
[Figure: frequency histogram of the sample means; x-axis sample mean, 1 to 9]
Distribution of Sample Means
As the sample gets bigger, the sampling
distribution…
Central Limit Theorem
For any population with mean μ and
standard deviation σ, the distribution
of sample means for sample size n
1. will have a mean of μ
2. will have a standard deviation of σ/√n
3. will approach a normal distribution as n
approaches infinity (∞)
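The theorem can be illustrated by simulation. The sketch below draws repeated samples from a strongly skewed population, an exponential distribution with mean 1 and SD 1 (a hypothetical choice made only for this illustration), and compares the mean and SD of the sample means with μ and σ/√n. The sample counts and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, pstdev

MU, SIGMA = 1.0, 1.0  # mean and SD of the exponential(1) population


def simulate_sample_means(n, n_samples=5000, seed=0):
    """Draw n_samples samples of size n and return the mean of each one."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(n_samples)
    ]


for n in (1, 4, 30):
    means = simulate_sample_means(n)
    print(f"n = {n:2d}: mean of sample means = {mean(means):.3f} (mu = {MU}), "
          f"SD of sample means = {pstdev(means):.3f} "
          f"(sigma/sqrt(n) = {SIGMA / sqrt(n):.3f})")
```

The simulated values only approximate the theoretical ones because a finite number of samples is drawn; a histogram of the means for n = 30 would also look far more symmetric than the population itself.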
Notation
The mean of the sampling distribution: μ_X̄
Standard Error
The “standard error” of the mean is:
The standard deviation of the distribution
of sample means.
The standard error measures the standard
amount of difference between X̄ and
μ that is reasonable to expect simply by
chance.
SE = σ/√n
Standard Error
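The Law of Large Numbers states:
The larger the sample size, the closer
the sample mean tends to be to the
population mean.
Correspondingly, the larger the sample
size, the smaller the standard error.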
The Standard Error of M
Consider the changes in the Standard
Error of M as n increases from 1
to 4 and then to 100 for a
normal population with a mean of
80 (μ=80) and a standard
deviation of 20 (σ=20)
Do NOT confuse “standard deviations” with “standard errors”
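The change in the standard error for this population can be worked out with SE = σ/√n. A minimal sketch; the function name `standard_error` is illustrative.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


SIGMA = 20  # population SD from the slide (mu = 80 does not affect the SE)

for n in (1, 4, 100):
    print(f"n = {n:3d}: SE = {standard_error(SIGMA, n)}")
```

With n = 1 the standard error equals the standard deviation (20); it halves to 10 at n = 4 and falls to 2 at n = 100, while the population standard deviation itself stays at 20 throughout.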
Distribution of Individuals in Population
μ = 5, σ = 2.24
Distribution of sample means (n = 2): μ_X̄ = 5, σ_X̄ = 1.58
σ_X̄ = σ/√n = 2.24/√2 = 1.58
[Figure: two frequency histograms, the population scores (left) and the sample means (right); x-axes 1 to 9]
Sampling Distribution (n=3)
μ_X̄ = 5, σ_X̄ = 1.29
σ_X̄ = σ/√n = 2.24/√3 = 1.29
[Figure: frequency histogram of the sample means; x-axis sample mean, 1 to 9]
Clarifying Formulas
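                     Population               Sample                        Distribution of Sample Means
Mean                 μ = ΣX / N               X̄ = ΣX / n                    μ_X̄ = μ
Standard deviation   σ = √( Σ(X - μ)² / N )   s = √( Σ(X - X̄)² / (n - 1) )   σ_X̄ = √( σ² / n ) = σ/√n
(notice the n - 1 in the sample standard deviation)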
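The three sets of formulas map onto different functions in Python's `statistics` module: `pstdev` divides by N, `stdev` divides by n - 1, and the standard error divides σ by √n. The sketch below uses the lecture's population {2, 4, 6, 8}; the sample of three scores is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

population = [2, 4, 6, 8]
sample = [2, 6, 8]  # hypothetical sample of n = 3 scores

# Population parameters: divide by N.
mu = mean(population)
sigma = pstdev(population)

# Sample statistics: the SD divides by n - 1.
x_bar = mean(sample)
s = stdev(sample)

# Distribution of sample means: SD is sigma / sqrt(n).
se = sigma / sqrt(len(sample))

print(f"population: mu = {mu}, sigma = {sigma:.2f}")
print(f"sample:     x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"sample means (n = {len(sample)}): SE = {se:.2f}")
```

Mixing these up is a common mistake: s describes the spread of scores in one sample, while the standard error describes the spread of sample means across repeated samples.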
Exercise
[Figure: frequency histogram; x-axis 1 to 13]
N = 7
n = 4
SE = ?
SE = SD/√n
Thanks for ……..