
Lecture (Chapter 5) : The Normal Curve: Ernesto F. L. Amaral

The document summarizes key concepts about the normal distribution from Chapter 5 of Healey's textbook, including:
• The normal curve is a theoretical, bell-shaped distribution in which the mean, median, and mode are equal.
• Z scores standardize raw scores and allow areas under the normal curve to be found from tables.
• Areas under the curve can be expressed as probabilities to estimate how likely events are.
• A variable's distribution can be assessed for normality using histograms, boxplots, and quantile-normal plots, and power transformations can be applied if needed.

Lecture (chapter 5):

The normal curve

Ernesto F. L. Amaral

February 7, 2018
Advanced Methods of Social Research (SOCI 420)

Source: Healey, Joseph F. 2015. “Statistics: A Tool for Social Research.” Stamford: Cengage
Learning. 10th edition. Chapter 5 (pp. 122–142).
Chapter learning objectives
• Define and explain the concept of the normal
curve
• Convert empirical scores to Z scores
• Use Z scores and the normal curve table
(Appendix A) to find areas above, below, and
between points on the curve
• Express areas under the curve in terms of
probabilities

Properties of the normal curve
• Theoretical
• Bell-shaped
• Unimodal
• Smooth
• Symmetrical
• Unskewed
• Tails extend to infinity
• Mode, median, and mean are same value

Standard normal distribution
• Normal distribution with $\bar{X} = 0$ and $s = 1$
• Distances along the horizontal axis, measured in standard deviations from the mean, always cut off the same proportion of the total area

• ±1s = 68.26%
• ±2s = 95.44%
• ±3s = 99.72%

• Between mean & 1s = 34.13%
• Between mean & 2s = 47.72%
• Between mean & 3s = 49.86%
Source: Healey 2015, p.125.
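The areas listed above can be cross-checked numerically. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` as a stand-in for the normal curve table; the helper names `area_within` and `area_mean_to` are illustrative and not from the textbook.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Proportion of the total area within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def area_mean_to(k):
    """Proportion of the total area between the mean and k standard deviations above it."""
    return std_normal.cdf(k) - 0.5

for k in (1, 2, 3):
    print(f"±{k}s: {area_within(k):.2%}   mean to {k}s: {area_mean_to(k):.2%}")
```

The computed values can differ from the slide in the last digit (for example, about 68.27% rather than 68.26% for ±1s), because the slide's figures are built from table entries rounded to four decimals.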
[Figure: distributions of IQ scores for females ($\bar{X} = 100$, s = 10, N = 1000) and males ($\bar{X} = 100$, s = 20, N = 1000)]
Source: Healey 2015, p.123–124.




Z scores
• Z scores are scores that have been
standardized to the theoretical normal curve
• Z scores represent how different a raw score is
from the mean in standard deviation units
• To find areas, first compute Z scores
• The Z score formula changes a raw score to a standardized score

$$Z = \frac{X_i - \bar{X}}{s}$$
IQ for males

$$Z = \frac{X_i - \bar{X}}{s} = \frac{120 - 100}{20} = +1.00$$

• An IQ score of 120 falls one standard deviation above (to the right of) the mean
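The Z score computation can be sketched in a few lines of Python. The function name `z_score` is hypothetical. The second call applies the same raw score to the female distribution from the earlier slide (mean 100, s = 10), to show how the standard deviation changes the result.

```python
def z_score(raw_score, mean, std_dev):
    """Distance of a raw score from the mean, in standard deviation units."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - mean) / std_dev

print(z_score(120, 100, 20))  # IQ for males: 1.0
print(z_score(120, 100, 10))  # IQ for females: 2.0
```

The same IQ score of 120 is one standard deviation above the mean for males but two standard deviations above it for females, because the female scores are less spread out.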
Estimated date of delivery
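[Figure: normal curve of delivery dates centered on April 7. Vertical axis: % (0.0 to 4.0). Horizontal axis, in steps of one standard deviation: Feb 14, Feb 27, Mar 12, Mar 25, Apr 7, Apr 20, May 3, May 16, May 29. Marked areas: 68.26% within ±1s, 13.59% in each band between 1s and 2s, 95.44% within ±2s.]
s = 13 days (based on Naegele’s rule)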
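The dates on the horizontal axis are the estimated date plus or minus whole multiples of s = 13 days. The sketch below reproduces them in Python, assuming the year 2018 (the slide gives only the day and month; any non-leap year yields the same labels).

```python
from datetime import date, timedelta

due_date = date(2018, 4, 7)  # assumption: the year is not given on the slide
s = timedelta(days=13)       # standard deviation from the slide

# Dates from 4 standard deviations before to 4 after the estimated date
for k in range(-4, 5):
    print(f"{k:+d}s: {(due_date + k * s).strftime('%b %d')}")
```

Under this reading, about 68% of deliveries would fall between Mar 25 and Apr 20 (±1s) and about 95% between Mar 12 and May 3 (±2s).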
Area under the normal curve
• Compute the Z score
• Draw a picture of the normal curve and shade in the area in which you are interested
• Find your Z score in Column A...
Source: Healey 2015, Appendix A, p.443.
Positive score
• Find your Z score in Column A
• To find area below a positive score
– Add column b area to 0.50
• To find area above a positive score
– Look in column c
Source: Healey 2015, Appendix A, p.443.
Area below Z = 0.85
• Finding the area below a positive Z score:
• Z = +0.85
• Area from column b = 0.3023
• 0.50 + 0.3023 = 0.8023 or 80.23%

Command in Stata
(normal shows area below Z)

display normal(0.85)

.80233746

Source: Healey 2015, p.129.


Area above Z = 0.40
• Finding the area above a positive Z score
• Z = +0.40
• Area from column c = 0.3446 or 34.46%

Command in Stata
(normal shows area below Z)

di 1-normal(0.4)

.34457826

Source: Healey 2015, p.130.


Negative score
• Find your Z score in Column A
• To find area below a negative score
– Look in column c
• To find area above a negative score
– Add column b area to 0.50
Source: Healey 2015, Appendix A, p.443.
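The lookup rules for positive and negative scores can be written as one small set of functions. This is a sketch in Python that computes the column b and column c areas directly instead of reading them from Appendix A; the function names are illustrative, and `statistics.NormalDist` is assumed as the source of the areas.

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal: mean 0, standard deviation 1

def column_b(z):
    """Area between the mean and Z (column b of the table)."""
    return _std.cdf(abs(z)) - 0.5

def column_c(z):
    """Area beyond Z, in the tail away from the mean (column c of the table)."""
    return 1.0 - _std.cdf(abs(z))

def area_below(z):
    # Positive score: add column b to 0.50. Negative score: look in column c.
    return 0.5 + column_b(z) if z >= 0 else column_c(z)

def area_above(z):
    # Positive score: look in column c. Negative score: add column b to 0.50.
    return column_c(z) if z >= 0 else 0.5 + column_b(z)

print(round(area_below(0.85), 4))
print(round(area_above(0.40), 4))
print(round(area_below(-1.35), 4))
```

Because the curve is symmetrical, the table lists only positive Z values, which is why both helper functions work with the absolute value of Z.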
Area below Z = –1.35
• Finding the area below a negative Z score
• Z = –1.35
• Area from column c = 0.0885 or 8.85%

Command in Stata
(normal shows area below Z)

di normal(-1.35)

.08850799

Source: Healey 2015, p.129.


Between scores, opposite sides of mean
• Find your Z scores in Column A
• To find area between two scores on opposite sides of the mean
– Find the areas between each score and the mean from column b
– Add the two areas
Source: Healey 2015, Appendix A, p.443.
Area between two scores,
opposite sides of mean
• Finding the area between Z scores on different sides
of the mean
• Z = –0.35, area from column b = 0.1368
• Z = +0.60, area from column b = 0.2257
• Area = 0.1368 + 0.2257 = 0.3625 or 36.25%

Command in Stata
(normal shows area below Z)

di normal(0.6)-normal(-0.35)

.36257753

Source: Healey 2015, p.131.


Between scores, same side of mean
• Find your Z scores in Column A
• To find area between two scores on the same side of the mean
– Find the area between each score and the mean from column b
– Subtract the smaller area from the larger area
Source: Healey 2015, Appendix A, p.443.
Area between two scores,
same side of mean
• Finding the area between Z scores on the same
side of the mean
• Z = +0.65, area from column b = 0.2422
• Z = +1.05, area from column b = 0.3531
• Area = 0.3531 – 0.2422 = 0.1109 or 11.09%

Command in Stata
(normal shows area below Z)

di normal(1.05)-normal(0.65)

.11098705

Source: Healey 2015, p.131.


Estimating probabilities
• Areas under the curve can also be expressed as probabilities
• Probabilities are proportions
– They range from 0.00 to 1.00
• The higher the value, the greater the probability
– The more likely the event
Example
• If a distribution has a mean equal to 13 and a standard deviation equal to 4
• What is the probability of randomly selecting a score of 19 or more?

$$Z = \frac{X_i - \bar{X}}{s} = \frac{19 - 13}{4} = \frac{6}{4} = 1.5$$

• Command in Stata (normal shows area below Z)
di 1-normal(1.5)
p = 0.0668072
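As a cross-check outside Stata, the same probability can be computed from the raw scores without converting to Z first. This is a minimal Python sketch, assuming `statistics.NormalDist` parameterized with the mean and standard deviation from the example; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal distribution with the example's mean and standard deviation
scores = NormalDist(mu=13, sigma=4)

# cdf(19) is the area below 19, so the area at or above 19 is its complement
p_19_or_more = 1 - scores.cdf(19)
print(round(p_19_or_more, 7))
```

The result should agree with the Stata value on the slide, since a raw score of 19 in this distribution corresponds to Z = 1.5.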
Determining normality
• Some statistical methods require that respondents be randomly selected from a population in which the variables are normally distributed
• We can examine histograms, boxplots, outliers, and quantile-normal plots to assess whether variables have a normal distribution
Histogram of income

Source: 2016 General Social Survey.


Boxplot of income

Source: 2016 General Social Survey.


Quantile-normal plots
• A quantile-normal plot is a scatter plot
– One axis has quantiles of the original data
– The other axis has quantiles of the normal distribution
• If the points do not form a straight line or if the points have a non-linear symmetric pattern
– The variable does not have a normal distribution
• If the pattern of points is roughly straight
– The variable has a distribution close to normal
• If the variable has a normal distribution
– The points would exactly overlap the diagonal line
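The points of a quantile-normal plot can be computed by hand to see what the plot compares. The sketch below is in Python and rests on two labeled assumptions: the normal quantiles come from a normal distribution with the same mean and standard deviation as the data, and the plotting position is (i − 0.5)/n, one common convention that may differ from what a given statistics package uses. The function name and the income values are made up for illustration.

```python
from statistics import NormalDist, mean, stdev

def quantile_normal_points(values):
    """Return (normal quantile, data quantile) pairs for a quantile-normal plot."""
    data = sorted(values)
    n = len(data)
    # Reference normal distribution with the data's own mean and standard
    # deviation, so normally distributed data fall near the diagonal y = x.
    reference = NormalDist(mean(data), stdev(data))
    # Plotting position (i - 0.5) / n is an assumption of this sketch.
    return [(reference.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(data, start=1)]

incomes = [12, 15, 18, 20, 22, 25, 30, 45, 80, 150]  # made-up, right-skewed values
for normal_q, data_q in quantile_normal_points(incomes):
    print(f"{normal_q:8.1f} {data_q:8.1f}")
```

For skewed values like these, the pairs drift away from the diagonal at both ends, which is the curved pattern that signals a non-normal distribution.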
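Quantile-normal plots reflect distribution shapes
[Figure: example quantile-normal plots for different distribution shapes; surviving panel labels: (discrete values), (bimodal)]
Source: Hamilton 1992, p.16.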


Quantile-normal plot of income

Source: 2016 General Social Survey.


Power transformation
• Lawrence Hamilton (“Regression with Graphics”, 1992, p.18–19)
$Y^{3}$ → q = 3
$Y^{2}$ → q = 2
$Y^{1}$ → q = 1
$Y^{0.5}$ → q = 0.5
$\log(Y)$ → q = 0
$-(Y^{-0.5})$ → q = –0.5
$-(Y^{-1})$ → q = –1
• q>1: reduce concentration on the right (reduce negative skew)
• q=1: original data
• q<1: reduce concentration on the left (reduce positive skew)
• log(x+1) may be applied when x takes the value 0, because log(0) is undefined. If log(x+1) has a normal distribution, then x+1 is said to have a lognormal distribution
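The ladder of powers can be expressed as a single function of q. This is a sketch in Python with a hypothetical function name; it assumes positive values of Y and uses the natural logarithm for q = 0 (the choice of log base changes the scale, not the shape, of the transformed distribution).

```python
import math

def power_transform(y, q):
    """Transform a positive value y using the power q from the ladder of powers."""
    if y <= 0:
        raise ValueError("y must be positive; consider log(y + 1) when y can be 0")
    if q > 0:
        return y ** q
    if q == 0:
        return math.log(y)  # natural log: an assumption of this sketch
    # For negative powers the result is negated so that larger y still
    # gives a larger transformed value (the order of cases is preserved).
    return -(y ** q)

for q in (3, 2, 1, 0.5, 0, -0.5, -1):
    print(q, power_transform(4, q))
```

For a positively skewed variable such as income, moving down the ladder (q < 1) pulls in the long right tail. The log transformation (q = 0) is the step applied in the income plots.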
Histogram of log of income

Source: 2016 General Social Survey.


Boxplot of log of income

Source: 2016 General Social Survey.


Quantile-normal plot of log of income

Source: 2016 General Social Survey.


Points to remember
• Cases with scores close to the mean are common and those with scores far from the mean are rare
• The normal curve is essential for understanding inferential statistics in Part II of the textbook
