5-Normal Distribution

Lecture notes from Kabul Medical University, presented by Dr. Obaidullah Fahim

Uploaded by

Fatah Hamidi
Copyright
© © All Rights Reserved

Normal Distribution

Objectives
At the end of this lecture, students will be able to:
• Explain what the normal distribution and the standard normal distribution are
• Explain the properties of the normal distribution and test for normality
• Find the z-scores of a data set
• Explain the distribution of the sample mean and the standard error
• Calculate the standard error

Normal Distribution
• The normal distribution is perhaps the most important of all statistical distributions
• It was first described by the French mathematician Abraham de Moivre in 1733

Normal Distribution
• The normal distribution plays a key role in statistics because many phenomena follow (or closely approximate) it
• Many statistical theories and methods are developed around the assumption that the data are normally distributed

Characteristics of Normal Distribution
• It has a bell-shaped curve
• It is symmetric around the mean: the two halves of the curve are mirror images

Characteristics of Normal Distribution cont.
• Hence mean = median = mode
• The total area under the curve is 1 (or 100%)
• Every normal distribution has the same shape as the standard normal distribution once its values are standardized; only the mean and standard deviation differ

Distinguishing Features
• The mean ± 1 standard deviation covers about 68% of the area under the curve
• The mean ± 2 standard deviations covers about 95% of the area under the curve
• The mean ± 3 standard deviations covers about 99.7% of the area under the curve

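The three coverage figures above can be checked numerically. The sketch below is only an illustration: the function name `area_within` is made up for this example, and it relies on the standard identity that the area within k standard deviations of the mean equals erf(k / sqrt(2)).

```python
import math

def area_within(k: float) -> float:
    """Area under a normal curve within k standard deviations of the mean."""
    # For any normal distribution, P(|Z| <= k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {area_within(k):.1%} of the area")
```

The printed values round to the 68%, 95%, and 99.7% quoted on the slide.
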
68-95-99.7 Rule
[Figure: normal curve with nested intervals marking 68%, 95%, and 99.7% of the data]

How good is the rule for real data?
• Check some example data: the weights of 120 women runners
• Mean weight = 127.8 lb
• Standard deviation (SD) = 15.5 lb

68% of 120 = 0.68 x 120 = ~82 runners
In fact, 79 runners fall within 1 SD (15.5 lb) of the mean.
[Figure: histogram of weight in POUNDS (80 to 160) against percent, with 112.3, 127.8, and 143.3 marked (mean ± 1 SD)]

95% of 120 = 0.95 x 120 = ~114 runners
In fact, 115 runners fall within 2 SDs of the mean.
[Figure: histogram of weight in POUNDS (80 to 160) against percent, with 96.8, 127.8, and 158.8 marked (mean ± 2 SD)]

99.7% of 120 = 0.997 x 120 = 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.
[Figure: histogram of weight in POUNDS (80 to 160) against percent, with 81.3, 127.8, and 174.3 marked (mean ± 3 SD)]

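The comparison on the three slides above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted on the slides (120 runners, mean 127.8 lb, SD 15.5 lb, and the observed counts 79, 115, 120); the variable and function names are illustrative.

```python
n_runners = 120
mean_lb, sd_lb = 127.8, 15.5

rule = {1: 0.68, 2: 0.95, 3: 0.997}      # proportions from the 68-95-99.7 rule
observed = {1: 79, 2: 115, 3: 120}       # counts reported on the slides

def interval(k: int) -> tuple:
    """Lower and upper bounds of mean ± k SD, rounded to 0.1 lb."""
    return (round(mean_lb - k * sd_lb, 1), round(mean_lb + k * sd_lb, 1))

for k in (1, 2, 3):
    low, high = interval(k)
    expected = rule[k] * n_runners
    print(f"±{k} SD: {low} to {high} lb, expected {expected:.1f}, observed {observed[k]}")
```
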
z-scores
When a set of data values is normally distributed, we can standardize each score by converting it into a z-score.
z-scores make it easier to compare data values measured on different scales.

z-scores
A z-score reflects how many standard deviations above or below the mean a raw score is.
The z-score is positive if the data value lies above the mean and negative if the data value lies below the mean.

z-score formula
$$z = \frac{x - \mu}{\sigma}$$
where x represents an element of the data set, μ the mean, and σ the standard deviation.

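The formula translates directly into code. The sketch below is illustrative only: the function name `z_score` and the example numbers (a score of 85 or 55 against a mean of 70 and an SD of 10) are hypothetical and do not come from the lecture.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical values, chosen only to show the sign convention.
print(z_score(85, mean=70, sd=10))   # above the mean, so positive
print(z_score(55, mean=70, sd=10))   # below the mean, so negative
```
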
Example
• Ahmad gets a 50 on his Statistics midterm and a 50 on his Epidemiology midterm. Did he do equally well on these two exams?

Example 1
[Figure: histograms of Statistics and Epidemiology grades on a 0 to 100 GRADE axis; mean Statistics = 40, mean Epidemiology = 60]
• In one case, Ahmad’s exam score is 10 points above the mean
• In the other case, Ahmad’s exam score is 10 points below the mean
• In an important sense, we must interpret Ahmad’s grade relative to the average performance of the class

Example 2
[Figure: histograms of Statistics and Epidemiology grades on a 0 to 100 GRADE axis]
• Both distributions have the same mean (40), but different standard deviations (5 vs. 20)
• Thus, how we evaluate Ahmad’s performance depends on how much variability there is in the exam scores

Example 1
[Figure: histograms of Statistics and Epidemiology grades on a 0 to 100 GRADE axis; mean Statistics = 40, mean Epidemiology = 60]
Ahmad in Statistics:
(50 - 40)/10 = 1
(one SD above the mean)
Ahmad in Epidemiology:
(50 - 60)/10 = -1
(one SD below the mean)

Example 2
[Figure: histograms of Statistics and Epidemiology grades on a 0 to 100 GRADE axis]
An example where the means are identical, but the two sets of scores have different spreads
Ahmad’s Statistics z-score: (50 - 40)/5 = 2
Ahmad’s Epidemiology z-score: (50 - 40)/20 = 0.5

Three Properties of Standard Scores
• 1. The mean of a set of z-scores is always zero
• 2. The SD of a set of standardized scores is always 1

Three Properties of Standard Scores cont.
• 3. The distribution of a set of standardized scores has the same shape as the unstandardized scores
• Beware of the “normalization” misinterpretation: standardizing does not make a skewed distribution normal

The shape is the same
(but the scaling or metric is different)
[Figure: two histograms with identical shape, UNSTANDARDIZED (scores from about 0.4 to 1.0) and STANDARDIZED (z-scores from about -6 to 2)]

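The three properties can be checked on a small data set. The sketch below is an illustration under stated assumptions: the data are hypothetical right-skewed values, the helper names `standardize` and `skewness` are made up for this example, and the population SD (dividing by n) is used throughout.

```python
import statistics

def standardize(values):
    """Convert raw scores to z-scores using the population SD."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def skewness(values):
    """Average cubed z-score: a simple measure of asymmetry."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

raw = [1, 1, 2, 2, 2, 3, 4, 7, 12]        # hypothetical right-skewed scores
z = standardize(raw)

print(statistics.fmean(z))                # property 1: approximately 0
print(statistics.pstdev(z))               # property 2: approximately 1
print(skewness(raw), skewness(z))         # property 3: same shape, same skewness
```

The skewness stays positive after standardizing, which is the point of the “normalization” warning: z-scores rescale the data but do not make them normal.
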
Advantages of Standard
Scores
1. We can use standard scores to find
centile scores: the proportion of people
with scores less than or equal to a
particular score. Centile scores are
intuitive ways of summarizing a person’s
location in a larger set of scores.

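The area under a normal curve
[Figure: standard normal curve over SCORE from -4 to 4; 50% of the area lies on each side of the mean, with about 34% in each band between the mean and ±1 SD, 14% in each band between ±1 and ±2 SD, and 2% in each tail beyond ±2 SD]
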
Advantages of Standard Scores cont.
2. Standard scores provide a way to standardize or equate different metrics. We can now interpret Ahmad’s scores in Statistics and Epidemiology on the same metric (the z-score metric). (Each score comes from a distribution with the same mean [zero] and the same standard deviation [1].)

Disadvantages of Standard Scores
1. Because a person’s score is expressed relative to the group (X - M), the same person can have different z-scores when assessed in different samples.
Example: If Ahmad had taken his statistics exam in a class in which everyone knew statistics well, his z-score would be negative (well below the mean). If the class didn’t know statistics very well, however, Ahmad would be above the mean. Ahmad’s z-score depends on everyone else’s scores.

Disadvantages of Standard
Scores cont.
2. If the absolute score is meaningful or
of psychological interest, it will be
obscured by transforming it to a relative
metric.

Question?
Suppose biology scores among college
students are normally distributed with a
mean of 50 and a standard deviation of
10. If a student scores a 70, what would
be her z-score?

Answer
$$z = \frac{70 - 50}{10} = 2$$
Her z-score would be 2, which means her score is two standard deviations above the mean.

Question?
• A set of math test scores has a mean of 70 and a standard deviation of 8.
• A set of English test scores has a mean of 74 and a standard deviation of 16.
For which test would a score of 78 have a higher standing?

Answer
To solve: Find the z-score for each test.
$$z_{\text{math}} = \frac{78 - 70}{8} = 1 \qquad z_{\text{English}} = \frac{78 - 74}{16} = 0.25$$
The math score would have the higher standing, since it is 1 standard deviation above the mean while the English score is only 0.25 standard deviation above the mean.

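The same comparison can be written as a short script. This is a sketch using the numbers from the question; the dictionary layout and the names `tests`, `z_by_test`, and `best` are illustrative choices.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standardize x against the given mean and standard deviation."""
    return (x - mean) / sd

score = 78
tests = {"math": (70, 8), "English": (74, 16)}   # (mean, SD) from the question

# z-score of the same raw score on each test.
z_by_test = {name: z_score(score, mean, sd) for name, (mean, sd) in tests.items()}
best = max(z_by_test, key=z_by_test.get)

print(z_by_test)
print("Higher standing:", best)
```
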
Question?
What will be the miles per gallon for a
Toyota Camry when the average mpg is
23, it has a z value of 1.5 and a
standard deviation of 2?

Answer
Using the formula for z-scores:
$$z = \frac{x - \mu}{\sigma}$$
$$1.5 = \frac{x - 23}{2} \quad\Rightarrow\quad 3 = x - 23 \quad\Rightarrow\quad x = 26$$
The Toyota Camry would be expected to get 26 miles per gallon.

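Solving the z-score formula for x gives x = mean + z × SD, which is the step used in the answer above. A minimal sketch, with the hypothetical helper name `raw_score`:

```python
def raw_score(z: float, mean: float, sd: float) -> float:
    """Invert the z-score formula: x = mean + z * sd."""
    return mean + z * sd

mpg = raw_score(z=1.5, mean=23, sd=2)
print(mpg)
```
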
Question?
A group of data with normal distribution
has a mean of 45. If one element of the
data is 60, will the z-score be positive
or negative?

Answer
The z-score must be positive, since the element of the data set (60) is above the mean (45).

T Score
• T scores have a mean of 50 and a standard deviation of 10.
• A T score is computed by multiplying the z-score by 10 and adding 50.
• T = 10(Z) + 50
• It is often used for personality inventories

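The conversion is a single linear step. The sketch below applies T = 10(Z) + 50 to a few z-scores; the function name `t_score` is illustrative.

```python
def t_score(z: float) -> float:
    """Convert a z-score to a T score (mean 50, SD 10)."""
    return 10 * z + 50

for z in (-1, 0, 2):
    print(f"z = {z:+} -> T = {t_score(z)}")
```
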
Exercise
• Let’s assume that the heart rate of healthy persons is normally distributed
• Mean = 70, SD = 10 beats/min
1) What area under the curve is above 80 beats/min?

Many statistics books have z-score tables, giving us this information:

z       (a) Area between mean and z    (b) Area beyond z
0.00    0.0000                         0.5000
0.01    0.0040                         0.4960
0.02    0.0080                         0.4920
:       :                              :
1.00    0.3413                         0.1587
:       :                              :
2.00    0.4772                         0.0228
:       :                              :
3.00    0.4987                         0.0013

10/27/2013

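Diagram of Exercise #1
[Figure: normal curve with the axis marked 70, 80, 90, 100 at z = 0, 1, 2, 3 (μ = 70); 34.13% of the area lies between 70 and 80, and the shaded area above 80 is 0.1587]
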
Second Exercise
2) What area under the curve is above 90 beats/min?

Diagram of Exercise #2
[Figure: normal curve with the area above 90 beats/min (z = 2) shaded; shaded area = 0.0228]

Third Exercise
3) What area under the curve is between 50 and 90 beats/min?

Diagram of Exercise #3
[Figure: normal curve with the area between 50 and 90 beats/min (z = -2 to z = 2) shaded; shaded area = 0.954]

Fourth Exercise
4) What area under the curve is above 100 beats/min?

Diagram of Exercise #4
[Figure: normal curve with the area above 100 beats/min (z = 3) shaded; shaded area = 0.0013]

Fifth Exercise
5) What area under the curve is below 40 beats/min or above 100 beats/min?

Diagram of Exercise #5
[Figure: normal curve with both tails shaded, below 40 beats/min (z = -3) and above 100 beats/min (z = 3); each shaded area = 0.0013]

Solution/Answers
1) 15.9% or 0.159
2) 2.28% or 0.0228
3) 95.4% or 0.954
4) 0.13% or 0.0013
5) 0.26% or 0.0026 (0.0013 for each tail)

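These answers can be cross-checked without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the exercise's mean of 70 and SD of 10; the variable names are illustrative. Small differences from the table values, such as in exercise 5, come from the table's rounding to four decimals.

```python
from statistics import NormalDist

heart_rate = NormalDist(mu=70, sigma=10)   # beats/min, as stated in the exercise

above_80 = 1 - heart_rate.cdf(80)                       # exercise 1
above_90 = 1 - heart_rate.cdf(90)                       # exercise 2
between_50_90 = heart_rate.cdf(90) - heart_rate.cdf(50) # exercise 3
above_100 = 1 - heart_rate.cdf(100)                     # exercise 4
below_40_or_above_100 = heart_rate.cdf(40) + above_100  # exercise 5

for label, area in [
    ("above 80", above_80),
    ("above 90", above_90),
    ("between 50 and 90", between_50_90),
    ("above 100", above_100),
    ("below 40 or above 100", below_40_or_above_100),
]:
    print(f"{label}: {area:.4f}")
```
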
Looking up probabilities in the standard normal table
What is the area to the left of Z = 1.51 in a standard normal curve?
[Figure: standard normal curve with the area to the left of Z = 1.51 shaded]
Area is 93.45%

Testing Normal Distribution
1. Coefficient of variation less than 30%
2. Skewness and kurtosis values (SPSS output): values of 0 indicate a perfectly normal distribution
• As a rule of thumb, the skewness and kurtosis values should not be more than two times their standard errors.

Testing Normal Distribution cont.
3. Histogram
• Frequency and shape
• Gaps in the data and outlying values
• Skewed data

Testing Normal Distribution cont.
4. Normal Q–Q plot
• Each data value is plotted against the value that would be expected if the data came from a normal distribution
• If the variable is normally distributed, the points fall close to the straight line
• Systematic deviations from the straight line indicate some degree of non-normality

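The points of a normal Q–Q plot can be computed by hand, which makes the idea concrete. The sketch below rests on several assumptions: the sample is hypothetical, the plotting positions (i - 0.5)/n are one common convention among several, and the correlation between observed and expected values is used only as a rough summary of how straight the plot is. It is not the SPSS procedure.

```python
from statistics import NormalDist, fmean, pstdev

def qq_points(data):
    """Return (expected value under normality, observed value) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), pstdev(ordered))
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]

def correlation(pairs):
    """Pearson correlation between expected and observed values."""
    xs, ys = zip(*pairs)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

sample = [62, 65, 67, 68, 70, 70, 71, 73, 75, 79]   # hypothetical heart rates
pairs = qq_points(sample)
print(round(correlation(pairs), 3))   # close to 1 when points lie near a line
```

Plotting the pairs, with expected values on one axis and observed values on the other, gives the Q–Q plot itself.
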
Outlier Cutoffs
• “Large” outliers: values > upper hinge + 1.5*IQR
  i.e. > 75th percentile + 1.5*(75th percentile – 25th percentile)
• “Small” outliers: values < lower hinge - 1.5*IQR
  i.e. < 25th percentile - 1.5*(75th percentile – 25th percentile)

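The cutoffs above can be computed as follows. This is a sketch under assumptions: the data are hypothetical lengths of stay, and `statistics.quantiles` with its default method stands in for the hinges. Statistical packages define quartiles in slightly different ways, so cutoffs may differ a little between tools.

```python
from statistics import quantiles

def outlier_cutoffs(values):
    """Return (lower cutoff, upper cutoff) using the 1.5 * IQR rule."""
    q1, _median, q3 = quantiles(values, n=4)   # 25th, 50th, 75th percentiles
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

stays = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]    # hypothetical lengths of stay (days)
low, high = outlier_cutoffs(stays)
outliers = [v for v in stays if v < low or v > high]

print(low, high)
print(outliers)
```
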
Anatomy of a boxplot:
Total Length of Stay, 2011
Claims With at Least One Inpatient Visit
[Figure: boxplot of Length of Stay (Days) on a 0 to 40 axis, labelled from top to bottom: (large) outliers; largest non-outlying value (upper tail); 75th percentile (upper hinge); median (50th percentile); 25th percentile (lower hinge); smallest non-outlying value (lower tail). The interquartile range (IQR) spans the box between the two hinges.]

Testing Normal Distribution cont.
5. Normality tests
• A P value less than 0.05 indicates that the distribution is significantly different from normal
• Look at the value of the test statistic as well
• Shapiro-Wilk is better for small samples (n < 50)

Distribution of Sample Means
A distribution of sample means is:
the collection of sample means for all
the possible random samples of a
particular size (n) that can be obtained
from a population.
All possible sample =

Population
[Figure: frequency histogram of the population scores; x-axis 1 to 9, y-axis 1 to 6]

Distribution of Sample Means from Samples of Size n = 2

Sample #   Scores   Mean (X̄)
1          2, 2     2
2          2, 4     3
3          2, 6     4
4          2, 8     5
5          4, 2     3
6          4, 4     4
7          4, 6     5
8          4, 8     6
9          6, 2     4
10         6, 4     5
11         6, 6     6
12         6, 8     7
13         8, 2     5
14         8, 4     6
15         8, 6     7
16         8, 8     8

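The table can be regenerated by listing every ordered pair drawn with replacement. The sketch below assumes the population consists of the four scores 2, 4, 6, 8, which is inferred from the table rather than stated on the slide; the variable names are illustrative.

```python
from itertools import product
from statistics import fmean, pstdev

population = [2, 4, 6, 8]   # assumed population, inferred from the table

# All ordered samples of size 2 drawn with replacement (4 x 4 = 16 samples).
samples = list(product(population, repeat=2))
means = [fmean(sample) for sample in samples]

for number, (sample, mean) in enumerate(zip(samples, means), start=1):
    print(number, sample, mean)

print("mean of sample means:", fmean(means))
print("SD of sample means:", round(pstdev(means), 2))
```
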
Distribution of Sample Means from Samples of Size n = 2
[Figure: frequency histogram of the 16 sample means; x-axis sample mean (1 to 9), y-axis 1 to 6]

Distribution of Individuals in Population
[Figure: histogram of individual scores, x-axis 1 to 9; $\mu = 5$, $\sigma = 2.24$]

Distribution of Sample Means
[Figure: histogram of sample means, x-axis sample mean (1 to 9); $\mu_{\bar{X}} = 5$, $\sigma_{\bar{X}} = 1.58$]

A key distinction
Population Distribution – distribution of all individual scores in the population
Sample Distribution – distribution of all the scores in your sample
Sampling Distribution – distribution of all the possible sample means when taking samples of size n from the population. Also called “the distribution of sample means”.

Distribution of Sample Means: Things to Notice
[Figure: frequency histogram of sample means; x-axis sample mean (1 to 9), y-axis 1 to 6]
1. The sample means tend to pile up around the population mean.
2. The distribution of sample means is approximately normal in shape, even though the population distribution was not.
3. The distribution of sample means has less variability than does the population distribution.

What if we took a larger sample?

Distribution of Sample Means from Samples of Size n = 3
[Figure: frequency histogram of sample means; x-axis sample mean (1 to 9), y-axis 2 to 24; $\mu_{\bar{X}} = 5$, $\sigma_{\bar{X}} = 1.29$]

Distribution of Sample Means
As the sample gets bigger, the sampling distribution…
1. stays centered at the population mean.
2. becomes less variable.
3. becomes more normal.

Central Limit Theorem
For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n
1. will have a mean of μ
2. will have a standard deviation of $\sigma / \sqrt{n}$
3. will approach a normal distribution as n approaches infinity (∞)

Notation
The mean of the sampling distribution:
$$\mu_{\bar{X}} = \mu$$
The standard deviation of the sampling distribution (“standard error of the mean”):
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Standard Error
The “standard error” of the mean is the standard deviation of the distribution of sample means.
The standard error measures the standard amount of difference between x-bar and μ that is reasonable to expect simply by chance.
$$SE = \frac{\sigma}{\sqrt{n}}$$

Standard Error
In keeping with the Law of Large Numbers:
The larger the sample size, the smaller the standard error, so sample means cluster more tightly around the population mean.
This makes sense from the formula for standard error …

The Standard Error of M
Consider the changes in the standard error of M as n increases from 1 to 4 and then to 100 for a normal population with a mean of 80 (μ=80) and a standard deviation of 20 (σ=20).
Do NOT confuse “standard deviations” with “standard errors”.

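The change described above follows directly from SE = σ/√n. A minimal sketch with the slide's values (σ = 20; n = 1, 4, 100); the function name `standard_error` is illustrative.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for samples of size n."""
    return sigma / math.sqrt(n)

sigma = 20   # population standard deviation from the slide
for n in (1, 4, 100):
    print(f"n = {n:>3}: SE = {standard_error(sigma, n)}")
```

With n = 1 the standard error equals the standard deviation itself. It halves at n = 4 and falls to a tenth of σ at n = 100, while the population standard deviation stays at 20 throughout.
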
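Distribution of Individuals in Population
[Figure: histogram of individual scores, x-axis 1 to 9; $\mu = 5$, $\sigma = 2.24$]

Distribution of Sample Means
[Figure: histogram of sample means (n = 2), x-axis sample mean (1 to 9); $\mu_{\bar{X}} = 5$, $\sigma_{\bar{X}} = 1.58$]
$$\sigma_{\bar{X}} = \frac{2.24}{\sqrt{2}} = 1.58$$
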
Sampling Distribution (n=3)
[Figure: frequency histogram of sample means; x-axis sample mean (1 to 9), y-axis 2 to 24; $\mu_{\bar{X}} = 5$, $\sigma_{\bar{X}} = 1.29$]
$$\sigma_{\bar{X}} = \frac{2.24}{\sqrt{3}} = 1.29$$

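The agreement between the formula and the sampling distributions can be confirmed by brute force. The sketch below assumes the population is the four scores 2, 4, 6, 8 (inferred from μ = 5 and σ = 2.24, not stated on these slides) and that samples are drawn with replacement; the helper name `sd_of_sample_means` is illustrative.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [2, 4, 6, 8]        # assumed population (mu = 5, sigma ≈ 2.24)
sigma = pstdev(population)

def sd_of_sample_means(n: int) -> float:
    """SD of the means of all ordered samples of size n, drawn with replacement."""
    means = [fmean(sample) for sample in product(population, repeat=n)]
    return pstdev(means)

for n in (2, 3):
    enumerated = sd_of_sample_means(n)
    formula = sigma / sqrt(n)
    print(f"n = {n}: enumerated = {enumerated:.2f}, sigma/sqrt(n) = {formula:.2f}")
```
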
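Clarifying Formulas
Population:
$$\mu = \frac{\sum X}{N} \qquad \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Sample:
$$\bar{X} = \frac{\sum X}{n} \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Distribution of Sample Means:
$$\mu_{\bar{X}} = \mu \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
Notice:
$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$
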
Exercise
[Figure: frequency histogram; x-axis 1 to 13, y-axis 1 to 6]
N = 7
n = 4
SE = ?
$$SE = \frac{SD}{\sqrt{n}}$$

Thanks for ……..
