Normal Distribution
The normal distribution is the most important probability distribution in statistics because it
fits many natural phenomena. For example, heights, blood pressure, measurement error, and IQ
scores approximately follow a normal distribution. It is also known as the Gaussian distribution or the bell curve.
The normal distribution is a probability function that describes how the values of a variable are
distributed. It is a symmetric distribution where most of the observations cluster around the central
peak and the probabilities for values further away from the mean taper off equally in both directions.
Extreme values in both tails of the distribution are similarly unlikely.
As with any probability distribution, the parameters for the normal distribution define its shape and
probabilities entirely. The normal distribution has two parameters, the mean and standard deviation.
The normal distribution does not have just one form. Instead, the shape changes based on
the parameter values.
Mean
The mean is the central tendency of the distribution. It defines the location of the peak for normal
distributions. Most values cluster around the mean. On a graph, changing the mean shifts the entire
curve left or right on the X-axis.
Standard deviation
The standard deviation is a measure of variability. It defines the width of the normal distribution. The
standard deviation determines how far away from the mean the values tend to fall. It represents the
typical distance between the observations and the average.
On a graph, changing the standard deviation either tightens or spreads out the width of the
distribution along the X-axis. Larger standard deviations produce distributions that are more spread
out.
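As a rough illustration of how the two parameters change the curve, the sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8+). This is an illustrative tool, not something the notes rely on, and the values μ = 0, 5 and σ = 1, 2 are arbitrary.

```python
from statistics import NormalDist

base = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=5, sigma=1)  # same S.D., mean moved right by 5
wide = NormalDist(mu=0, sigma=2)     # same mean, S.D. doubled

# Changing the mean only moves the peak: the height at the mean is unchanged.
print(base.pdf(0), shifted.pdf(5))

# The peak height is 1 / (sigma * sqrt(2*pi)), so doubling sigma halves it
# and the area spreads further out along the x-axis.
print(base.pdf(0), wide.pdf(0))

# Share of the area within 1 unit of the mean: smaller for the wider curve.
print(base.cdf(1) - base.cdf(-1), wide.cdf(1) - wide.cdf(-1))
```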
• The mean and standard deviation are parameter values that apply to entire populations. For the
normal distribution, statisticians signify the parameters by using the Greek symbol μ (mu) for
the population mean and σ (sigma) for the population standard deviation.
• Unfortunately, population parameters are usually unknown because it’s generally impossible to
measure an entire population. However, you can use random samples to calculate estimates of
these parameters. Statisticians represent sample estimates of these parameters using x̅ for the
sample mean and s for the sample standard deviation.
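A minimal sketch of estimating parameters from a sample, assuming a hypothetical population N(170, 10²) (think heights in cm) that we can sample by simulation. `random.gauss` draws the sample, and `statistics.mean` / `statistics.stdev` give x̅ and s.

```python
import random
from statistics import mean, stdev

# Hypothetical population parameters (normally unknown in practice).
MU, SIGMA = 170, 10

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(MU, SIGMA) for _ in range(10_000)]

x_bar = mean(sample)  # sample mean x̅, an estimate of mu
s = stdev(sample)     # sample standard deviation s (n - 1 divisor), an estimate of sigma
print(f"x̅ = {x_bar:.2f}, s = {s:.2f}")
```

With a large random sample, x̅ and s land close to μ and σ; with small samples they can be noticeably off.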
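Let X be a normal random variable, X ~ N(μ, σ²). Then
Mean, $E(X) = \mu$
Variance, $V(X) = \sigma^2$
[Figure: normal curve of X centered at μ, on an x-axis running from −∞ to ∞]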
There is a whole family of normal distributions. Each distribution may have a different mean (μ) or
standard deviation (σ). The number of normal distributions is, therefore, unlimited. [It would be
physically impossible to provide a table of probabilities (such as those for the Binomial and Poisson) for each
combination of μ and σ.]
Fortunately, one member of the family of normal distributions can be used for all cases where the
normal distribution is applicable. It has a mean of 0 and a standard deviation of 1 and is called the
Standard Normal Distribution. Any normal distribution can be converted into the "Standard Normal
Distribution" by subtracting the mean from each observation and dividing by the standard deviation.
First it is necessary to convert, or standardize, the actual distribution to a standard normal distribution
using a z value, also called a z score, a z statistic, the standard normal deviate, or just the normal
deviate.
Z value: The distance between a selected value, designated X, and the mean, μ, divided by the
standard deviation (S.D.), σ. That is, let
$$Z = \frac{X - \mu}{\sigma}$$
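Then, Mean, $E(Z) = 0$ and Variance, $V(Z) = 1$.
[Figure: standard normal curve centered at 0, on a z-axis running from −∞ to ∞]
Z has the probability density function (pdf)
$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}, \quad -\infty < z < \infty$$
We write this as $Z \sim N(0, 1)$.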
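The sketch below puts the two definitions above into Python code using only the standard library. The helper names `z_score` and `std_normal_pdf` are hypothetical, and the small data set is made up. Standardizing it (using its own mean and population S.D.) gives values with mean 0 and S.D. 1, and the hand-written pdf is cross-checked against `statistics.NormalDist().pdf`.

```python
import math
from statistics import NormalDist, mean, pstdev

def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma

def std_normal_pdf(z: float) -> float:
    """f(z) = exp(-z**2 / 2) / sqrt(2*pi), the N(0, 1) density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

# Made-up data, treated as a whole population: mean 5, S.D. 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = mean(data), pstdev(data)
z = [z_score(x, mu, sigma) for x in data]
print(z)                   # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(mean(z), pstdev(z))  # standardized values: mean 0, S.D. 1

# Cross-check the formula against the standard library's N(0, 1) density.
Z = NormalDist(mu=0, sigma=1)
for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(v, std_normal_pdf(v), Z.pdf(v))

# Total area under f(z) is 1 (midpoint rule on [-8, 8]; the tails beyond are negligible).
step = 0.001
area = sum(std_normal_pdf(-8 + (i + 0.5) * step) * step for i in range(16_000))
print(round(area, 6))  # 1.0
```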
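Let us see how the mean $E(X) = \mu$ of the random variable X transforms into the mean $E(Z) = 0$ of the standard
normal variable Z:
$$E(Z) = E\left(\frac{X - \mu}{\sigma}\right) = E\left(\frac{X}{\sigma} - \frac{\mu}{\sigma}\right) = E\left(\frac{X}{\sigma}\right) - \frac{\mu}{\sigma} = \frac{1}{\sigma}E(X) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0 \qquad [\text{As } E(X) = \mu]$$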
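The notes state $V(Z) = 1$ but only derive $E(Z)$. The matching step for the variance follows the same pattern, using the standard rule $V(aX + b) = a^2 V(X)$ (adding a constant does not change the spread):

```latex
V(Z) = V\left(\frac{X - \mu}{\sigma}\right)
     = \frac{1}{\sigma^2} V(X - \mu)
     = \frac{1}{\sigma^2} V(X)
     = \frac{\sigma^2}{\sigma^2} = 1
\qquad [\text{As } V(X) = \sigma^2]
```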
3. Bell-shaped curve
4. The area under the curve lying between μ±σ is 68.27% of the total area
5. The area under the curve lying between μ±2σ is 95.45% of the total area
6. The area under the curve lying between μ±3σ is 99.73% of the total area
[Figure: normal curve with the central 68.27% of the area, between μ − σ and μ + σ, shaded]
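The three percentages above can be reproduced numerically. This sketch uses Python's `statistics.NormalDist` for the standard normal cumulative probability Φ(z) = P[Z ≤ z]. This is an illustrative choice rather than the table method of these notes. Because every normal distribution standardizes to Z, the same areas hold for any μ and σ.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

# P(mu - k*sigma < X < mu + k*sigma) = P(-k < Z < k) = Phi(k) - Phi(-k)
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"mu ± {k}σ: {area:.2%}")  # 68.27%, 95.45%, 99.73%
```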
For example, suppose that in some area the age of the population follows a normal distribution, Age ~ N(35, 10).
• Normal distribution table provides probabilities for N(0,1) i.e. for standard normal
distribution
• Usually, a normal table gives P[0 < Z < z] for positive values of z.
• For other values, we can use the symmetry of the standard normal distribution about 0 (its mean
and median); for example, P[−z < Z < 0] = P[0 < Z < z].
• To find probabilities for a normal random variable X, we can transform the probability
statement about X in terms of probability statement for Z and then calculate the probability
using the standard normal distribution table or Z-table
$$P[X < a] = P\left[\frac{X - \mu}{\sigma} < \frac{a - \mu}{\sigma}\right] = P\left[Z < \frac{a - \mu}{\sigma}\right]$$
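A software stand-in for the Z-table can make the procedure above concrete. In this sketch, the hypothetical helpers `table_area` and `prob_less_than` are built on `statistics.NormalDist().cdf`. `table_area` gives the P[0 < Z < z] value that a table like the one described above would list. `prob_less_than` standardizes and then uses symmetry for negative z. The values μ = 29, σ = 5, a = 25 in the usage line are only an example.

```python
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)

def table_area(z: float) -> float:
    """P[0 < Z < z] for z >= 0, the kind of value a Z-table lists."""
    return Z.cdf(z) - 0.5

def prob_less_than(a: float, mu: float, sigma: float) -> float:
    """P[X < a] for X ~ N(mu, sigma^2), via z = (a - mu) / sigma."""
    z = (a - mu) / sigma
    if z >= 0:
        return 0.5 + table_area(z)
    # Symmetry about 0: P[Z < -c] = P[Z > c] = 0.5 - P[0 < Z < c]
    return 0.5 - table_area(-z)

print(round(table_area(1.0), 4))  # 0.3413
print(prob_less_than(25, 29, 5))  # P[Z < -0.8]
```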
Example 1:
The number of viewers of a TV show per week has a mean of 29 million with a standard deviation of
5 million. Assume that the number of viewers of that show follows a normal distribution.
Solution:
Let X be the number of viewers (in millions). Then $X \sim N(\mu, \sigma^2)$ with μ = 29 and σ = 5.
a. The probability that next week’s show will have between 30 and 34 million viewers:
$$P[30 \le X \le 34] = P\left[\frac{30 - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{34 - \mu}{\sigma}\right] = P\left[\frac{30 - 29}{5} \le Z \le \frac{34 - 29}{5}\right]$$
$$= P[0.20 \le Z \le 1] = P[0 \le Z \le 1] - P[0 \le Z \le 0.2] = 0.3413 - 0.0793 = 0.262$$
[Figure: standard normal curve with the area between z = 0.2 and z = 1 shaded]
b. The probability that next week’s show will have at least 23 million viewers:
$$P[X \ge 23] = P\left[\frac{X - \mu}{\sigma} \ge \frac{23 - \mu}{\sigma}\right] = P\left[Z \ge \frac{23 - 29}{5}\right] = P[Z \ge -1.2]$$
$$= P[-1.2 \le Z \le 0] + P[Z \ge 0] = 0.3849 + 0.5 = 0.8849$$
c. The probability that next week’s show will exceed 40 million viewers:
$$P[X > 40] = P\left[\frac{X - \mu}{\sigma} > \frac{40 - \mu}{\sigma}\right] = P\left[Z > \frac{40 - 29}{5}\right] = P[Z > 2.2]$$
$$= P[Z \ge 0] - P[0 \le Z \le 2.2] = 0.5 - 0.4861 = 0.0139$$
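[Figures: standard normal curves with the area to the right of z = −1.2 shaded (part b) and the area to the right of z = 2.2 shaded (part c)]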
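To check the three answers of Example 1 without a table, the sketch below evaluates the exact normal probabilities with `statistics.NormalDist(mu=29, sigma=5)` (X in millions). The exact values agree with the table-based answers to about three decimal places. For example, part a comes out near 0.2621 rather than 0.2620 because the table entries are rounded to four decimals. For a continuous variable, ≤ and < give the same probability, so `cdf` serves for both.

```python
from statistics import NormalDist

viewers = NormalDist(mu=29, sigma=5)  # X, number of viewers in millions

p_a = viewers.cdf(34) - viewers.cdf(30)  # a. P[30 <= X <= 34]
p_b = 1 - viewers.cdf(23)                # b. P[X >= 23]
p_c = 1 - viewers.cdf(40)                # c. P[X > 40]
print(f"{p_a:.4f} {p_b:.4f} {p_c:.4f}")
```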
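Example 2:
a. For what value of ‘a’ is P[Z ≤ a] = 0.95?
b. For what value of ‘a’ is P[Z ≥ a] = 0.05?
Solution:
a. P[Z ≤ a] = 0.95 ⇒ P[0 ≤ Z ≤ a] = 0.95 − 0.5 = 0.45 ⇒ a ≈ 1.645 (from the table, midway between 1.64 (0.4495) and 1.65 (0.4505))
b. P[Z ≥ a] = 0.05 ⇒ P[0 ≤ Z ≤ a] = 0.5 − 0.05 = 0.45 ⇒ a ≈ 1.645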
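Reading a table backwards corresponds to the inverse cumulative probability. The sketch below uses `NormalDist.inv_cdf` from Python's standard library, which returns the value a with P[Z ≤ a] = p. Part b is rewritten as P[Z < a] = 1 − 0.05, so both parts lead to the same a.

```python
from statistics import NormalDist

Z = NormalDist()  # N(0, 1)

a_part_a = Z.inv_cdf(0.95)      # P[Z <= a] = 0.95
a_part_b = Z.inv_cdf(1 - 0.05)  # P[Z >= a] = 0.05, i.e. P[Z < a] = 0.95
print(round(a_part_a, 3), round(a_part_b, 3))  # 1.645 1.645
```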
Example 3: Suppose that the height of UCLA female students has a normal distribution with mean 62
inches and standard deviation 8 inches.
a. Find the height below which the shortest 30% of the female students fall.
b. Find the height above which the tallest 5% of the female students fall.
Solution:
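One way to get, or check, both answers is to invert the cumulative probability directly. This sketch uses `NormalDist.inv_cdf`, which returns the height h with P[X ≤ h] = p. That is equivalent to finding z from the table and computing h = μ + zσ. It gives about 57.80 inches for (a) and 75.16 inches for (b). A table lookup with z ≈ −0.52 and z ≈ 1.645 gives nearly the same heights.

```python
from statistics import NormalDist

height = NormalDist(mu=62, sigma=8)  # inches

shortest_30 = height.inv_cdf(0.30)  # a. P[X < h] = 0.30
tallest_5 = height.inv_cdf(0.95)    # b. P[X > h] = 0.05, i.e. P[X < h] = 0.95
print(f"{shortest_30:.2f} in, {tallest_5:.2f} in")  # 57.80 in, 75.16 in
```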
More Examples:
Example: Given a normal distribution with μ = 50 and σ = 10, find the probability that X assumes a
value between 45 and 62.
$$Z_1 = \frac{45 - 50}{10} = -0.5, \qquad Z_2 = \frac{62 - 50}{10} = 1.2$$
P(45 < X < 62) = P(−0.5 < Z < 1.2)
= P(Z < 1.2) − P(Z < −0.5)
= 0.8849 − 0.3085 = 0.5764
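A quick numerical check of this example, again using Python's `statistics.NormalDist` in place of the table:

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)

p = X.cdf(62) - X.cdf(45)  # P(45 < X < 62) = P(-0.5 < Z < 1.2)
print(round(p, 4))  # 0.5764
```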
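Example: The weekly incomes of the bankers of a bank follow a normal distribution with a mean
of $1,000 and a standard deviation of $100.
What is the likelihood of selecting a banker whose weekly income is between $1,000 and $1,100?
$$P(1000 < X < 1100) = P\left(\frac{1000 - 1000}{100} < Z < \frac{1100 - 1000}{100}\right)$$
= P(0 < Z < 1)
= 0.3413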
Example: With μ = 400 and σ = 50,
$$P(X > 482) = P\left[Z > \frac{482 - 400}{50}\right] = P(Z > 1.64) = 0.5 - 0.4495 = 0.0505$$
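The last two examples can be checked the same way. The sketch reads the bankers' question as incomes between $1,000 and $1,100, which is the interval that P(0 < Z < 1) corresponds to. It takes μ = 400 and σ = 50 for P(X > 482), matching the z value (482 − 400)/50 = 1.64.

```python
from statistics import NormalDist

income = NormalDist(mu=1000, sigma=100)  # weekly income in $
p_income = income.cdf(1100) - income.cdf(1000)  # P(0 < Z < 1)

X = NormalDist(mu=400, sigma=50)
p_482 = 1 - X.cdf(482)  # P(X > 482) = P(Z > 1.64)

print(round(p_income, 4), round(p_482, 4))  # 0.3413 0.0505
```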
Practice Problem:
1. The weekly incomes of some employees in an industry are normally distributed with a mean of
$1,000 and standard deviation of $100. What is the likelihood of selecting an employee whose
weekly income is -
a. between $1,000 and $1,100?
b. between $790 and $1,000?
c. less than $790?
d. between $840 and $1,200?
e. between $1,150 and $1,250?
2. A large group of students took a test in Physics and the final grades have a mean of 70 and a
standard deviation of 10. If we can approximate the distribution of these grades by a normal
distribution, what percent of the students
a. scored higher than 80?
b. should pass the test (grades≥60)?
c. should fail the test (grades<60)?
3. The annual salaries of employees in a large company are approximately normally distributed
with a mean of $50,000 and a standard deviation of $20,000.
a. What percent of people earn less than $40,000?
b. What percent of people earn between $45,000 and $65,000?
c. What percent of people earn more than $70,000?
4. A tire manufacturer wishes to set a minimum mileage guarantee on its new MX100 tire. Tests
reveal the mean mileage is 67,900 with a standard deviation of 2,050 miles and a normal
distribution. The manufacturer wants to set the minimum guaranteed mileage so that no more
than 4 percent of the tires will have to be replaced. What minimum guaranteed mileage should
the manufacturer announce?
[Answer: 64,312]
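As a check on the stated answer to Problem 4, this sketch solves P[X < g] = 0.04 directly with `NormalDist.inv_cdf`. The exact z is about −1.7507, which gives roughly 64,311 miles. The table value z = −1.75 gives 67,900 − 1.75 × 2,050 = 64,312.5, consistent with the answer of 64,312.

```python
from statistics import NormalDist

mileage = NormalDist(mu=67_900, sigma=2_050)

# Guarantee g such that only 4% of tires fall below it: P[X < g] = 0.04
guarantee = mileage.inv_cdf(0.04)
print(round(guarantee))  # 64311
```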