Lecture 2-Statistics The Normal Distribution and The Central Limit Theorem

The document provides an overview of the normal distribution and how to calculate probabilities using the normal distribution. Some key points: - The normal distribution is symmetric and bell-shaped, with the mean, median, and mode being equal. It is defined by its mean (μ) and standard deviation (σ). - About 68% of values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. - To calculate a probability, the value is converted to a z-score by subtracting the mean and dividing by the standard deviation, then the area under the normal curve is found. - Probabilities

Uploaded by

Dstorm
Copyright
© © All Rights Reserved

Lecture 2-Statistics

The Normal Distribution and the Central Limit Theorem
Continuous Probability Distributions

◼ A continuous random variable is a variable that can assume any value on an interval
◼ thickness of an item, weight, length
◼ time required to complete a task
◼ temperature
◼ height, in cm
◼ km per litre
Probability Density

◼ A function f(x) ≥ 0 that shows the more likely and less likely intervals of variable X.
◼ P(a ≤ X ≤ b) = area under f(x) from a to b

P(a ≤ X ≤ b) = P(a < X < b)
(Note that the probability of any individual value is zero)

Probability is the area under the density curve

[Figure: density curve f(X) with the area between a and b shaded]
The Normal Distribution
◼ ‘Bell Shaped’
◼ Symmetrical
◼ Mean, Median and Mode are Equal
Location is determined by the mean, μ
Spread is determined by the standard deviation, σ
The random variable has an infinite theoretical range: −∞ to +∞
[Figure: bell-shaped density f(X) centred at μ with spread σ, where Mean = Median = Mode]
The Normal Distribution
Density Function

◼ The formula for the normal probability density function is

$$f(X) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}}$$

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
X = any value of the continuous variable
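
The density formula above can be evaluated directly. The sketch below is only an illustration: the function name `normal_pdf` and the parameters μ = 100, σ = 50 are assumptions chosen for the example, and the result is compared against Python's standard-library `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal density: exp(-0.5 * ((x - mu) / sigma)**2) / (sigma * sqrt(2*pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Illustrative parameters (an assumption for this sketch): mu = 100, sigma = 50
peak = normal_pdf(100, 100, 50)              # density at the mean
library_peak = NormalDist(100, 50).pdf(100)  # same quantity from the standard library
print(peak, library_peak)
```

The density is highest at the mean and takes equal values at points equally far above and below it, which is the symmetry described earlier.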
Many Normal Distributions

By varying the parameters μ and σ, we obtain different normal distributions
The Normal Distribution
Shape

Changing μ shifts the distribution left or right.

Changing σ increases or decreases the spread.

[Figure: normal curve f(X) centred at μ with spread σ]
Probability as
Area Under the Curve
The total area under the curve is 1.0, and the curve is
symmetric, so half is above the mean, half is below

P(−∞ < X < μ) = 0.5
P(μ < X < ∞) = 0.5
P(−∞ < X < ∞) = 1.0

[Figure: normal curve f(X) with area 0.5 on each side of μ]
Empirical Rules

What can we say about the distribution of values around the mean? For any normal distribution:

μ ± 1σ encloses about 68.26% of X’s

[Figure: normal curve f(X) with the area between μ−1σ and μ+1σ shaded, 68.26%]
The Empirical Rule
(continued)

◼ μ ± 2σ covers about 95% of X’s (95.44%)
◼ μ ± 3σ covers about 99.7% of X’s (99.73%)

[Figure: two normal curves, one with μ ± 2σ shaded and one with μ ± 3σ shaded]
**The beauty of the normal curve:

No matter what μ and σ are, the area between μ−σ and μ+σ is about 68%; the area between μ−2σ and μ+2σ is about 95%; and the area between μ−3σ and μ+3σ is about 99.7%. Almost all values fall within 3 standard deviations.
68-95-99.7 Rule

[Figure: normal curve with nested bands holding 68% of the data, 95% of the data, and 99.7% of the data]
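
The 68-95-99.7 rule can be checked numerically. This is a minimal sketch using the standard-library `statistics.NormalDist`; the helper name `area_within` and the second parameter pair (μ = 275, σ = 43) are assumptions picked to show that the areas do not depend on μ and σ.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    # Same area for the standard normal and for an arbitrary normal distribution
    print(k, round(area_within(k), 4), round(area_within(k, mu=275, sigma=43), 4))
```

The computed areas are close to the table-based figures quoted on the slides; small differences in the last digit come from rounding in printed tables.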


The Standardized Normal

◼ Any normal distribution (with any mean and standard deviation combination) can be transformed into the standardized normal distribution (Z)

◼ Need to transform X units into Z units

◼ The standardized normal distribution (Z) has a mean of 0 and a standard deviation of 1
Translation to the Standardized
Normal Distribution

◼ Translate from X to the standardized normal (the “Z” distribution) by subtracting the mean of X and dividing by its standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

The Z distribution always has mean = 0 and
standard deviation = 1
The Standardized Normal
Probability Density Function
◼ The formula for the standardized normal probability density function is

$$f(Z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}$$

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
Z = any value of the standardized normal distribution
The Standardized
Normal Distribution
◼ Also known as the “Z” distribution
◼ Mean is 0
◼ Standard Deviation is 1
[Figure: standard normal curve f(Z) centred at 0 with standard deviation 1]
Values above the mean have positive Z-values,
values below the mean have negative Z-values
Standard Normal Model

Z ~ N(0, 1) denotes the standard normal model, with μ = 0 and σ = 1

[Figure: standard normal curve over Z from -3 to 3, with area .5 on each side of 0]
Standardizing

◼ (X − μ)/σ is also a normal model; we will denote it by Z:
Z = (X − μ)/σ
◼ Z has mean 0 and standard deviation 1: μ = 0; σ = 1, so Z ~ N(0, 1).
◼ The normal model Z is called the standard normal model.

[Figure: a normal curve with µ = 6 and σ = 2 is mapped by (X − 6)/2 onto the standard normal curve with µ = 0 and σ = 1, which has area .5 on each side of 0]
Example

◼ If X is distributed normally with mean of 100 and standard deviation of 50, the Z value for X = 200 is

$$Z = \frac{X - \mu}{\sigma} = \frac{200 - 100}{50} = 2.0$$

◼ This says that X = 200 is two standard
deviations (2 increments of 50 units) above
the mean of 100.
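
The standardization in this example can be sketched in a few lines. The function names `z_score` and `from_z` are hypothetical helpers introduced here for illustration.

```python
def z_score(x, mu, sigma):
    """Convert a value x in original units to standardized (Z) units."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def from_z(z, mu, sigma):
    """Convert a Z value back to original units: X = mu + Z * sigma."""
    return mu + z * sigma


z = z_score(200, 100, 50)       # the slide's example: X = 200, mu = 100, sigma = 50
x_back = from_z(z, 100, 50)     # converting back recovers the original X
print(z, x_back)
```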
Comparing X and Z units

[Figure: the same normal curve on two scales; X = 100 and X = 200 (μ = 100, σ = 50) correspond to Z = 0 and Z = 2.0 (μ = 0, σ = 1)]

Note that the shape of the distribution is the same, only the scale has changed. We can express the problem in original units (X) or in standardized units (Z)
Example
◼ Area between 0 and 1.27 = ?

Example
◼ Area to the left of -1.85 = ?

Example
◼ Area between -2.24 and 0 = ?

Example
◼ Area between -1.18 and 2.73 = ? (area A1 from -1.18 to 0 plus area A2 from 0 to 2.73)

Example
◼ Area between -1 and +1 = ?
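
The slides leave these areas as exercises for the standard normal table. As a hedged cross-check, the sketch below computes them with the standard-library `statistics.NormalDist`; the helper `area_between` and the dictionary labels are assumptions made for this illustration, and table lookups may differ in the last digit because of rounding.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_between(a, b):
    """Area under the standard normal curve between z = a and z = b."""
    return Z.cdf(b) - Z.cdf(a)


areas = {
    "between 0 and 1.27": area_between(0, 1.27),
    "left of -1.85": Z.cdf(-1.85),
    "between -2.24 and 0": area_between(-2.24, 0),
    "between -1.18 and 2.73": area_between(-1.18, 2.73),
    "between -1 and +1": area_between(-1, 1),
}
for label, area in areas.items():
    print(f"{label}: {area:.4f}")
```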


Finding Normal Probabilities

◼ Let X represent the time it takes to download an image file from the internet.
◼ Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6)

[Figure: normal curve with the area to the left of X = 8.6 shaded; mean at X = 8.0]
Finding Normal Probabilities
(continued)
◼ Let X represent the time it takes to download an image file from the internet.
◼ Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6)

$$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8.0}{5.0} = 0.12$$

[Figure: the area P(X < 8.6) under the X curve (μ = 8, σ = 5) equals the area P(Z < 0.12) under the Z curve (μ = 0, σ = 1)]



Finding Normal
Upper Tail Probabilities
◼ Suppose X is normal with mean 8.0 and standard deviation 5.0.
◼ Now Find P(X > 8.6)

[Figure: normal curve with the area to the right of X = 8.6 shaded; mean at X = 8.0]
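
The slides pose P(X < 8.6) and P(X > 8.6) without printing the table values. A minimal sketch of both calculations, assuming the standard-library `statistics.NormalDist` in place of a table lookup (the variable names are illustrative):

```python
from statistics import NormalDist

download_time = NormalDist(mu=8.0, sigma=5.0)  # X ~ N(8.0, 5.0) from the slide

p_below = download_time.cdf(8.6)  # P(X < 8.6), the same as P(Z < 0.12)
p_above = 1.0 - p_below           # P(X > 8.6), by the complement rule
print(round(p_below, 4), round(p_above, 4))
```

The upper-tail probability is one minus the lower-tail probability because the total area under the curve is 1.0.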
Finding a Normal Probability
Between Two Values

◼ Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(8 < X < 8.6)

Calculate Z-values:

$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 8}{5} = 0$$

$$Z = \frac{X - \mu}{\sigma} = \frac{8.6 - 8}{5} = 0.12$$

P(8 < X < 8.6)
= P(0 < Z < 0.12)
Probabilities in the Lower Tail

◼ Suppose X is normal with mean 8.0 and standard deviation 5.0.
◼ Now Find P(7.4 < X < 8)

[Figure: normal curve with the area between X = 7.4 and X = 8.0 shaded]
Probabilities in the Lower Tail
(continued)

Now Find P(7.4 < X < 8)…

P(7.4 < X < 8)
= P(-0.12 < Z < 0)
= P(Z < 0) – P(Z ≤ -0.12)
= 0.5000 - 0.4522 = 0.0478

The Normal distribution is symmetric, so this probability is the same as P(0 < Z < 0.12)

[Figure: area 0.0478 between X = 7.4 and X = 8.0 (Z = -0.12 to 0), with area 0.4522 to the left of X = 7.4]
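
The subtraction of two cumulative areas, and the symmetry argument, can be reproduced as follows. This is a sketch only; `prob_between` is a hypothetical helper, and `statistics.NormalDist` stands in for the standard normal table.

```python
from statistics import NormalDist

X = NormalDist(mu=8.0, sigma=5.0)


def prob_between(dist, a, b):
    """P(a < X < b) as the difference of two cumulative areas."""
    return dist.cdf(b) - dist.cdf(a)


lower = prob_between(X, 7.4, 8.0)  # P(7.4 < X < 8) = P(-0.12 < Z < 0)
upper = prob_between(X, 8.0, 8.6)  # P(8 < X < 8.6) = P(0 < Z < 0.12)
print(round(lower, 4), round(upper, 4))
```

Both intervals have the same width and sit on opposite sides of the mean, so the two probabilities agree.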
Example:
P(2.9 < X < 7.1) = ?

Normal distribution with µ = 5 and σ = 10

[Figure: normal curve with the area between X = 2.9 and X = 7.1 shaded]
Example: P(X  8) = ?

Normal distribution with μ = 5 and σ = 10

[Figure: normal curve with μ = 5 and X = 8 marked on the X axis]
Example
What is Zα if P(Z > Zα) = 0.005? If P(Z > Zα) = 0.025? If P(Z > Zα) = 0.05?
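
One way to find these critical values without a table is the inverse of the cumulative distribution. The sketch below assumes the standard-library `statistics.NormalDist.inv_cdf`; the helper name `upper_tail_z` is hypothetical.

```python
from statistics import NormalDist


def upper_tail_z(alpha):
    """Z value with area alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1.0 - alpha)


for alpha in (0.005, 0.025, 0.05):
    print(alpha, round(upper_tail_z(alpha), 3))
```

An upper-tail area of α corresponds to a cumulative (left) area of 1 − α, which is why the function inverts the distribution at 1 − α.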


Given a Normal Probability
Find the X Value

◼ Steps to find the X value for a known probability:
1. Find the Z value for the known probability
2. Convert to X units using the formula:

X = μ + Zσ
Finding the X value for a
Known Probability (continued)
Example:
◼ Let X represent the time it takes (in seconds) to download an image file from the internet.
◼ Suppose X is normal with mean 8.0 and standard deviation 5.0
◼ Find X such that 20% of download times are less than X.

[Figure: lower-tail area 0.2000 to the left of the unknown X value; mean at X = 8.0, Z = 0]
Find the Z value for
20% in the Lower Tail
1. Find the Z value for the known probability
◼ 20% area in the lower tail is consistent with a Z value of -0.84

[Figure: lower-tail area 0.2000 to the left of Z = -0.84 (the unknown X value); mean at X = 8.0, Z = 0]
Finding the X value

2. Convert to X units using the formula:

X = μ + Zσ
= 8.0 + (−0.84)5.0
= 3.80

So 20% of the values from a distribution with mean 8.0 and standard deviation 5.0 are less than 3.80
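
The two steps above can be sketched in code. Here the Z value comes from `statistics.NormalDist().inv_cdf` rather than a table, so it is not rounded to -0.84 and the result is about 3.79 instead of the slide's 3.80; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 8.0, 5.0

# Step 1: Z value with 20% of the area to its left (about -0.84)
z = NormalDist().inv_cdf(0.20)

# Step 2: convert to X units with X = mu + Z * sigma
x = mu + z * sigma
print(round(z, 2), round(x, 2))
```

The same answer is available in one call as `NormalDist(mu, sigma).inv_cdf(0.20)`; splitting it into two steps mirrors the table-based procedure.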
Example

The average household income in some country is 900 coins, and the standard deviation is 200 coins. Assuming the Normal distribution of incomes,
(a) Compute the proportion of “the middle class,” whose income is between 600 and 1200 coins.
(b) The government decides to issue food stamps to the poorest 3% of households. Below what income will families receive food stamps?
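
The slides do not print a solution to this example. The following is one possible numerical sketch, assuming incomes follow N(900, 200) exactly as stated and using `statistics.NormalDist` instead of a table; the variable names are illustrative.

```python
from statistics import NormalDist

income = NormalDist(mu=900, sigma=200)

# (a) P(600 < X < 1200): both limits are 1.5 standard deviations from the mean
middle_class = income.cdf(1200) - income.cdf(600)

# (b) income below which the poorest 3% of households fall (the 3rd percentile)
food_stamp_cutoff = income.inv_cdf(0.03)

print(round(middle_class, 4), round(food_stamp_cutoff, 1))
```

Part (a) is a "probability between two values" problem, while part (b) is a "find the X value for a known probability" problem, so it uses the inverse of the cumulative distribution.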
N(275, 43); find k so that area to the left is .9846

.9846 = area to the left of k under the N(275, 43) curve
= area to the left of z = (k − 275)/43 under the N(0, 1) curve
⇒ (k − 275)/43 = 2.16 (from standard normal table)
⇒ k = 2.16(43) + 275 = 367.88
Example

◼ Regulate blue dye for mixing paint; machine can be set to discharge an average of μ ml./can of paint.
◼ Amount discharged: N(μ, .4 ml). If more than 6 ml. discharged into paint can, shade of blue is unacceptable.
◼ Determine the setting μ so that only 1% of the cans of paint will be unacceptable
Solution
X = amount of dye discharged into can
X ~ N(μ, .4); determine μ so that the area to the right of x = 6 is .01.
.01 = area to the right of x = 6
= area to the right of z = (6 − μ)/.4
⇒ (6 − μ)/.4 = 2.33 (from standard normal table)
⇒ μ = 6 − 2.33(.4) = 5.068
Evaluating Normality

◼ Not all continuous distributions are normal
◼ It is important to evaluate how well the data set is approximated by a normal distribution.
◼ Normally distributed data should approximate the theoretical normal distribution:
  ◼ The normal distribution is bell shaped (symmetrical) where the mean is equal to the median.
  ◼ The empirical rule applies to the normal distribution.
Evaluating Normality
(continued)
Comparing data characteristics to theoretical properties
◼ Construct charts or graphs
  ◼ For small- or moderate-sized data sets, construct a stem-and-leaf display or a boxplot to check for symmetry
  ◼ For large data sets, does the histogram or polygon appear bell-shaped?
◼ Compute descriptive summary measures
  ◼ Do the mean, median and mode have similar values?
  ◼ Is the range approximately 6σ?
Evaluating Normality
(continued)

Comparing data characteristics to theoretical properties
◼ Observe the distribution of the data set
  ◼ Do approximately 2/3 of the observations lie within mean ±1 standard deviation?
  ◼ Do approximately 80% of the observations lie within mean ±1.28 standard deviations?
  ◼ Do approximately 95% of the observations lie within mean ±2 standard deviations?
◼ Evaluate normal probability plot
  ◼ Is the normal probability plot approximately linear (i.e. a straight line) with positive slope?
Constructing
A Normal Probability Plot

◼ Normal probability plot
  ◼ Arrange data into ordered array
  ◼ Find corresponding standardized normal quantile values (Z)
  ◼ Plot the pairs of points with observed data values (X) on the vertical axis and the standardized normal quantile values (Z) on the horizontal axis
  ◼ Evaluate the plot for evidence of linearity
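
The construction steps above can be sketched as code that produces the (Z, X) pairs to plot. Two things here are assumptions, not taken from the slides: the plotting position (i − 0.5)/n, which is one common convention among several, and the made-up data in `sample`.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (Z quantile, observed X) pairs for a normal probability plot."""
    ordered = sorted(data)                # step 1: ordered array
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(ordered, start=1):
        # step 2: standardized normal quantile for the i-th smallest value
        # (plotting position (i - 0.5) / n is an assumed convention)
        z = std_normal.inv_cdf((i - 0.5) / n)
        points.append((z, x))             # step 3: Z horizontal, X vertical
    return points


sample = [62, 48, 71, 55, 90, 30, 66, 59, 77, 44]  # made-up data for illustration
points = normal_plot_points(sample)
for z, x in points:
    print(f"{z:6.2f}  {x}")
```

Plotting these pairs with any charting tool and checking whether they fall near a straight line is the final step.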
The Normal Probability Plot
Interpretation
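A normal probability plot for data from a normal distribution will be approximately linear:

[Figure: normal probability plot with X (30 to 90) on the vertical axis and Z (-2 to 2) on the horizontal axis; the points lie close to a straight line]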
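Normal Probability Plots (cont)
◼ Nearly Normal data have a histogram and a Normal probability plot that look somewhat like this example:
[Figure: histogram and Normal probability plot for nearly Normal data]
Normal Probability Plots (cont)
◼ A skewed distribution might have a histogram and Normal probability plot like this:
[Figure: histogram and Normal probability plot for a skewed distribution]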
Sampling Distributions

◼ A sampling distribution is a distribution of a statistic computed from a sample of size n.

◼ A sample is random, collected from a population. Hence, all statistics computed from it are random variables.
Sampling Distribution of the
Sample Mean
Sample mean has the following mean and standard deviation:

$$\mu_{\bar{X}} = E(\bar{X}) = \mu = \text{population mean}$$

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\text{population standard deviation}}{\sqrt{\text{sample size}}}$$

Standard deviation of the sample mean is also called the standard error of the sample mean. It decreases as the sample size increases.
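
A short sketch of how the standard error shrinks with sample size. The population standard deviation of 5.0 and the sample sizes are assumptions chosen for illustration, and `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)


# Quadrupling the sample size halves the standard error
for n in (4, 16, 64):
    print(n, standard_error(5.0, n))
```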

Sample Mean for a Normal Population

◼ If a population is normal with mean μ and standard deviation σ, the sampling distribution of X̄ is also normal with

$$\mu_{\bar{X}} = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Sampling Distribution Properties

μ_x̄ = μ (i.e. x̄ is unbiased)

[Figure: Normal Population Distribution centred at μ, and Normal Sampling Distribution centred at μ_x̄ (has the same mean)]
Sampling Distribution Properties
(continued)

As n increases, σ_x̄ decreases

[Figure: sampling distributions of x̄ centred at μ; the larger sample size gives the narrower curve, the smaller sample size the wider one]
Sample Mean
for non-Normal Populations

◼ Central Limit Theorem:
  ◼ Even if the population is not normal,
  ◼ …sample means are approximately normal as long as the sample size is large enough.
The Central Limit Theorem
(for the sample mean x̄)
◼ If a random sample of n observations is selected from a population (any population), then when n is sufficiently large, the sampling distribution of x̄ will be approximately normal.
(The larger the sample size, the better will be the normal approximation to the sampling distribution of x̄.)
Central Limit Theorem

As the sample size gets large enough (n↑), the sampling distribution of the sample mean becomes almost normal regardless of shape of population

[Figure: sampling distribution of x̄ approaching a normal curve as n increases]
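
The Central Limit Theorem can be illustrated with a small simulation. Everything specific in this sketch is an assumption made for the illustration: the skewed population (exponential with mean 1 and standard deviation 1), the sample size n = 36, the 5000 repetitions, and the fixed random seed.

```python
import random
import statistics


def sample_means(n, trials, rng):
    """Draw `trials` samples of size n from an exponential population; return their means."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(n=36, trials=5000, rng=rng)

center = statistics.fmean(means)  # should be near the population mean, 1
spread = statistics.stdev(means)  # should be near sigma / sqrt(n) = 1/6
within_one = sum(abs(m - center) <= spread for m in means) / len(means)
print(center, spread, within_one)
```

Even though the population is strongly skewed, the sample means cluster around μ with spread close to σ/√n, and roughly two thirds of them fall within one standard error, as a normal model would suggest.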
Sample Mean
if the Population is not Normal
(continued)

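Sampling distribution properties:
◼ Central Tendency: μ_x̄ = μ
◼ Variation: σ_x̄ = σ/√n

[Figure: Population Distribution (not normal) centred at μ, and Sampling Distribution of x̄ centred at μ_x̄ (becomes normal as n increases); the larger sample size gives the narrower curve, the smaller sample size the wider one]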
The Importance of the Central
Limit Theorem
◼ When we select simple random samples of size n, the sample means x̄ will vary from sample to sample. We can model the distribution of these sample means with a probability model that is …

$$N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
How Large is Large Enough?

◼ For most distributions, n > 30 will give a sampling distribution that is nearly normal
◼ For fairly symmetric distributions, n > 5
◼ For normal population distributions, the sampling distribution of the mean is always normally distributed
Z-value for Sampling Distribution
of the Mean
◼ Z-value for the sampling distribution of X̄:

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

where: X̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
Example

◼ Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected.

◼ What is the probability that the sample mean is between 7.8 and 8.2?
Example
(continued)

Solution:
◼ Even if the population is not normally distributed, the central limit theorem can be used (n > 30)
◼ … so the sampling distribution of x̄ is approximately normal
◼ … with mean μ_x̄ = 8
◼ … and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
Example
(continued)
Solution (continued):

$$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right)$$

$$= P(-0.4 < Z < 0.4) = 0.6554 - 0.3446 = 0.3108$$

[Figure: Population Distribution (μ = 8, shape unknown), then "Sample" to the Sampling Distribution of x̄ (μ_X̄ = 8, interval 7.8 to 8.2), then "Standardize" to the Standard Normal Distribution (μ_z = 0, interval -0.4 to 0.4)]
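
The calculation above can be reproduced numerically. This sketch assumes the normal approximation from the central limit theorem and uses `statistics.NormalDist` in place of the table; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 8.0, 3.0, 36

standard_error = sigma / math.sqrt(n)      # 3 / 6 = 0.5
xbar = NormalDist(mu, standard_error)      # approximate model for the sample mean

p = xbar.cdf(8.2) - xbar.cdf(7.8)          # P(7.8 < sample mean < 8.2)
print(standard_error, round(p, 4))
```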
Example
◼ The probability distribution of annual incomes of account executives has mean $20,000 and standard deviation $5,000.
Example 2 (cont.)
◼ b) n = 64 account executives are randomly selected. What is the probability that the sample mean exceeds $20,500?
answer
E(X) = $20,000
SD(X) = $5,000
E(X̄) = $20,000

$$SD(\bar{X}) = \frac{SD(X)}{\sqrt{n}} = \frac{5{,}000}{\sqrt{64}} = 625$$

By CLT, X̄ ~ N(20,000, 625)

$$P(\bar{X} > 20{,}500) = P\left(\frac{\bar{X} - 20{,}000}{625} > \frac{20{,}500 - 20{,}000}{625}\right) = P(z > .8) = 1 - .7881 = .2119$$
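
The same answer can be checked with a small general-purpose helper. The function name `prob_sample_mean_exceeds` is hypothetical, and the sketch relies on the CLT normal approximation stated in the solution.

```python
import math
from statistics import NormalDist


def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold) under the normal model N(mu, sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return 1.0 - NormalDist(mu, standard_error).cdf(threshold)


# Account executives: mu = 20,000, sigma = 5,000, n = 64, threshold = 20,500
p = prob_sample_mean_exceeds(20_500, mu=20_000, sigma=5_000, n=64)
print(round(p, 4))
```

With n = 64 the standard error is 625, so the threshold sits 0.8 standard errors above the mean, matching the Z value in the worked answer.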
