5-Random-Variables
We can derive the expected value from the distribution of 𝑌 in the same way that we derive the mean or average (𝑦̄ or 𝜇) of a list. For example, the average of the list (1, 0, 8, 6, 6, 1, 6) of 𝑛 = 7 numbers is
$$\frac{1+0+8+6+6+1+6}{7} = 0 \times \frac{1}{7} + 1 \times \frac{2}{7} + 6 \times \frac{3}{7} + 8 \times \frac{1}{7} = 4$$
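• Using Python, a quick check of this weighted-average view of the mean (a minimal sketch based on the list above):
import numpy as np

data = np.array([1, 0, 8, 6, 6, 1, 6])
print(data.mean())                      # 4.0, the ordinary average

values, counts = np.unique(data, return_counts=True)
probs = counts / len(data)              # empirical distribution of the list
print(np.sum(values * probs))           # 4.0 again, computed as a weighted average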
Example
• If a random variable 𝑌 can take two possible values, 𝑎 and 𝑏, with probabilities 𝑃(𝑎) and 𝑃(𝑏), then
• 𝐸(𝑌) = 𝑎𝑃(𝑎) + 𝑏𝑃(𝑏)
• where 𝑃(𝑎) + 𝑃(𝑏) = 1. This weighted average of 𝑎 and 𝑏 is a number between 𝑎 and 𝑏. The larger 𝑃(𝑎), the closer 𝐸(𝑌) is to 𝑎; the larger 𝑃(𝑏), the closer 𝐸(𝑌) is to 𝑏.
Variance
• If you predict the value of a random variable 𝑌 using its expected value 𝐸(𝑌) = 𝜇, you will be off by the random amount 𝑌 − 𝜇, which is known as the deviation
$$E(Y - \mu) = E(Y) - \mu = 0$$
• If you want to measure the size of the deviation, you need to consider either the absolute value or the square of 𝑌 − 𝜇. Algebraically it is easier to work with squared values than with absolute values, so you consider 𝐸[(𝑌 − 𝜇)²] and then take the square root to return to the same units as 𝑌
$$E\left[(Y-\mu)^2\right] = E\left[Y^2 - 2\mu Y + \mu^2\right]$$
$$\mathrm{Var}(Y) = E(Y^2) - 2\mu^2 + \mu^2 \quad \text{because } E(Y) = \mu$$
$$= E(Y^2) - \mu^2 = E(Y^2) - \left[E(Y)\right]^2$$
Example
• Let 𝑌 be a discrete random variable with probability function 𝑓(𝑦ᵢ) as given below

  𝑦ᵢ    𝑓(𝑦ᵢ)
  0     0.20
  1     0.15
  2     0.25
  3     0.35
  4     0.05

• Expected Value
• 𝐸(𝑌) = 0 × 0.20 + 1 × 0.15 + 2 × 0.25 + 3 × 0.35 + 4 × 0.05 = 1.90
• Variance
• Variance can be calculated in two ways
• 𝑉𝑎𝑟(𝑌) = (0 − 1.90)² × 0.20 + (1 − 1.90)² × 0.15 + (2 − 1.90)² × 0.25 + (3 − 1.90)² × 0.35 + (4 − 1.90)² × 0.05 = 1.49
• The second way is
• 𝐸(𝑌²) = 0² × 0.20 + 1² × 0.15 + 2² × 0.25 + 3² × 0.35 + 4² × 0.05 = 5.10
• 𝑉𝑎𝑟(𝑌) = 5.10 − 1.90² = 1.49
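• Using Python, both calculations can be verified with NumPy (a minimal sketch using the table values above):
import numpy as np

y = np.array([0, 1, 2, 3, 4])
f = np.array([0.20, 0.15, 0.25, 0.35, 0.05])

mu = np.sum(y * f)                       # expected value: 1.90
var_def = np.sum((y - mu) ** 2 * f)      # first way: 1.49
var_alt = np.sum(y ** 2 * f) - mu ** 2   # second way: 5.10 - 1.90^2 = 1.49
print(mu, var_def, var_alt)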
Binomial Distribution
• We need to find a formula for
finding the probability of getting 𝑘
successes in 𝑛 independent trials.
• For this, we consider a tree
diagram for 𝑛 = 4
• Each path down the 𝑛 steps
represents the possible outcomes
of the first 𝑛 trials.
• The 𝑘th node in the 𝑛th trial
represents 𝑘 successes in 𝑛 trials.
Binomial Distribution
• The expression in each node denotes
the probabilities of success (denoted
by 𝑝) and failure (denoted by 1 − 𝑝 =
𝑞) on each trial.
• The expression shows the sum of
probabilities of all paths leading to the
node.
• For example, in row 3 the probabilities of 𝑘 = 0, 1, 2, 3 successes in 𝑛 = 3 trials are given by $\binom{3}{k} p^k q^{3-k}$.
• More generally, the probability of 𝑘 successes in 𝑛 trials is $\binom{n}{k} p^k q^{n-k}$. For example, the probability of exactly 2 sixes in 9 rolls of a fair die is
$$\binom{9}{2}\left(\tfrac{1}{6}\right)^2\left(\tfrac{5}{6}\right)^7 = \frac{36 \times 5^7}{6^9} = 0.279$$
• For a fair coin 𝑝 = 𝑞 = 1/2, so each path of 𝑛 tosses has probability
$$p^k q^{n-k} = \left(\tfrac{1}{2}\right)^k \left(\tfrac{1}{2}\right)^{n-k} = \left(\tfrac{1}{2}\right)^n$$
• Probability of 𝑘 heads in 𝑛 fair coin tosses $= \binom{n}{k} \times \frac{1}{2^n}$, where 0 ≤ 𝑘 ≤ 𝑛
Exercise: create a function to calculate the probability mass function for binary (success/failure) events.
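One possible solution sketch for this exercise, using the binomial formula above (the function name binom_pmf is ours, not a library routine):
from math import comb

def binom_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials with success probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of 2 heads in 4 fair coin tosses: C(4,2) * (1/2)^4 = 0.375
print(binom_pmf(2, 4, 0.5))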
Properties of Binomial Distribution
• There are 𝑛 independent and identical trials for a Bernoulli (success-
failure) experiment.
• There are two possible outcomes of every experiment: Success or
failure
• The probability of success (denoted by 𝑝) remains constant for each
trial
• 𝑘 is a random variable that denotes the number of “successes”
observed during 𝑛 trials.
Mean and Standard Deviation
• Like any other probability distribution, a binomial probability distribution has a mean 𝜇 and a standard deviation 𝜎
$$\mu = np$$
$$\sigma = \sqrt{np(1-p)} = \sqrt{npq}$$
Mean and Standard Deviation
• Example: A company producing turf grass monitors the quality of the grass by taking a sample of 25 seeds at regular intervals. The germination rate of the seed is consistently 85%. Find the average number of seeds that will germinate in the sample of 25 seeds.
• 𝜇 = 𝑛𝑝 = 25 × 0.85 = 21.25
• 𝜎 = √(25 × 0.85 × 0.15) ≈ 1.785
• Using Python
>>> from scipy.stats import binom
>>> n, p = 25, 0.85
>>> binom.mean(n,p)
21.25
>>> binom.std(n,p)
1.7853571071357126
Binomial Probability Distribution
• We can plot the probability distribution; with 𝑛 = 25 and 𝑝 = 0.85 the distribution is skewed to the left
>>> import numpy as np
>>> from scipy.stats import binom
>>> import matplotlib.pyplot as plt
>>> x = np.arange(0,26)
>>> n, p = 25, 0.85
>>> fig, ax = plt.subplots(1, 1)
>>> ax.plot(x,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> ax.vlines(x,0,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> plt.title('Binomial Distribution n=25, p=0.85')
>>> plt.show()
Binomial Probability Distribution
• The binomial(𝑛, 𝑝) distributions have roughly the same bell shape irrespective of the values of 𝑛 and 𝑝. As 𝑛 and 𝑝 vary, binomial distributions differ in their mean and standard deviation
Distribution of the number of successes in n trials
• The binomial(100, 𝑝) distribution is shown for 𝑝 = 10% to 90% in steps of 10%. As 𝑝 increases, the distribution shifts to the right, because it is centered around the mean 100𝑝, which increases with 𝑝. The distribution is symmetric around 𝑝 = 0.5 and skewed near 𝑝 = 0 and 𝑝 = 1. The spread of the distribution increases with 𝑝 until 50%, where it is maximum, and then decreases. This follows from the formula for the standard deviation, √(𝑛𝑝(1 − 𝑝)), which increases with 𝑝 up to 50% and then decreases.
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6), (ax7, ax8, ax9)) = plt.subplots(3, 3)
x=np.arange(0,101)
ax1.plot(x,binom.pmf(x,n=100,p=0.1))
ax2.plot(x,binom.pmf(x,n=100,p=0.2))
ax3.plot(x,binom.pmf(x,n=100,p=0.3))
ax4.plot(x,binom.pmf(x,n=100,p=0.4))
ax5.plot(x,binom.pmf(x,n=100,p=0.5))
ax6.plot(x,binom.pmf(x,n=100,p=0.6))
ax7.plot(x,binom.pmf(x,n=100,p=0.7))
ax8.plot(x,binom.pmf(x,n=100,p=0.8))
ax9.plot(x,binom.pmf(x,n=100,p=0.9))
ax1.set_title("binomial(n=100, p=0.1)",fontsize=10)
ax2.set_title("binomial(n=100, p=0.2)",fontsize=10)
ax3.set_title("binomial(n=100, p=0.3)",fontsize=10)
ax4.set_title("binomial(n=100, p=0.4)",fontsize=10)
ax5.set_title("binomial(n=100, p=0.5)",fontsize=10)
ax6.set_title("binomial(n=100, p=0.6)",fontsize=10)
ax7.set_title("binomial(n=100, p=0.7)",fontsize=10)
ax8.set_title("binomial(n=100, p=0.8)",fontsize=10)
ax9.set_title("binomial(n=100, p=0.9)",fontsize=10)
fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Distribution of Number of Heads for n coin
tosses
• The binomial(𝑛, 0.5) distribution is shown for 𝑛 = 10 to 90 in steps of 10. As 𝑛 increases, the distribution shifts to the right, because it is centered around the mean 𝑛/2, which increases with 𝑛. The distribution is symmetric around the expected value 𝑛/2. The spread of the distribution increases with 𝑛, which follows from the formula for the standard deviation, √(𝑛𝑝(1 − 𝑝)), which increases with 𝑛. Due to the increase in spread, the distribution covers a wider range of values.
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3),(ax4, ax5, ax6),
(ax7, ax8, ax9)) = plt.subplots(3,3)
x=np.arange(0,91)
ax1.plot(x,binom.pmf(x,n=10,p=0.5))
ax2.plot(x,binom.pmf(x,n=20,p=0.5))
ax3.plot(x,binom.pmf(x,n=30,p=0.5))
ax4.plot(x,binom.pmf(x,n=40,p=0.5))
ax5.plot(x,binom.pmf(x,n=50,p=0.5))
ax6.plot(x,binom.pmf(x,n=60,p=0.5))
ax7.plot(x,binom.pmf(x,n=70,p=0.5))
ax8.plot(x,binom.pmf(x,n=80,p=0.5))
ax9.plot(x,binom.pmf(x,n=90,p=0.5))
ax1.set_title("binomial(n=10, p=0.5)",fontsize=10)
ax2.set_title("binomial(n=20, p=0.5)",fontsize=10)
ax3.set_title("binomial(n=30, p=0.5)",fontsize=10)
ax4.set_title("binomial(n=40, p=0.5)",fontsize=10)
ax5.set_title("binomial(n=50, p=0.5)",fontsize=10)
ax6.set_title("binomial(n=60, p=0.5)",fontsize=10)
ax7.set_title("binomial(n=70, p=0.5)",fontsize=10)
ax8.set_title("binomial(n=80, p=0.5)",fontsize=10)
ax9.set_title("binomial(n=90, p=0.5)",fontsize=10)
fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Example
For the US presidential elections, there are 4 races. In each race, the Republicans have a 60% chance of winning. If each race is independent of the others, what is the probability that
• The Republicans will win 0, 1, 2, 3, or all 4 races
• The Republicans will win at least one race
• The Republicans will win the majority of the races
Let 𝑋 equal the number of races the Republicans win, so 𝑋 is binomial with 𝑛 = 4, 𝑝 = 0.6 and 𝑞 = 0.4.
• $\binom{4}{0} p^0 q^4 = \frac{4!}{0!\,(4-0)!}(0.6)^0(0.4)^4 = 0.4^4 = 0.0256$
• $\binom{4}{1} p^1 q^3 = \frac{4!}{1!\,(4-1)!}(0.6)^1(0.4)^3 = 4 \times 0.6 \times 0.4^3 = 0.1536$
• $\binom{4}{2} p^2 q^2 = \frac{4!}{2!\,(4-2)!}(0.6)^2(0.4)^2 = 6 \times 0.6^2 \times 0.4^2 = 0.3456$
• $\binom{4}{3} p^3 q^1 = \frac{4!}{3!\,(4-3)!}(0.6)^3(0.4)^1 = 4 \times 0.6^3 \times 0.4 = 0.3456$
• $\binom{4}{4} p^4 q^0 = \frac{4!}{4!\,(4-4)!}(0.6)^4(0.4)^0 = 0.6^4 = 0.1296$
• The probability of winning at least one race is 1 − 𝑃(0) = 1 − 0.0256 = 0.9744. The probability of winning the majority is 𝑃(𝑋 ≥ 3) = 0.3456 + 0.1296 = 0.4752 = 1 − 𝑃(𝑋 ≤ 2), where 𝑃(𝑋 ≤ 2) can be computed as
>>> binom.cdf(k=2,n=4,p=0.6)
0.5247999999999999
• The random variable X has a binomial distribution with n = 10 and p = 0.5. Determine the following probabilities:
(a) P(X = 5) (b) P(X ≤ 2)
(c) P(X ≥ 9) (d) P(3 ≤ X < 5)
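One way to compute these probabilities with scipy.stats (a sketch of a possible solution):
from scipy.stats import binom

n, p = 10, 0.5
print(binom.pmf(5, n, p))                        # (a) P(X = 5)
print(binom.cdf(2, n, p))                        # (b) P(X <= 2)
print(1 - binom.cdf(8, n, p))                    # (c) P(X >= 9) = 1 - P(X <= 8)
print(binom.cdf(4, n, p) - binom.cdf(2, n, p))   # (d) P(3 <= X < 5) = P(X <= 4) - P(X <= 2)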
• A bank has issued 200 personal loans to its customers. Based on past
data, the probability of a customer defaulting on a loan is estimated
to be 0.05 (5%). The bank wants to assess the potential financial risk
involved by calculating the probability of a certain number of loan
defaults.
• Tasks:
1. Calculate the probability that exactly 10 customers will default on their loans out of the 200 loans issued.
2. Determine the probability that no more than 5 customers will default on their loans.
3. Assess the probability that 15 or more customers will default on their loans.
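A possible solution sketch using scipy.stats.binom:
from scipy.stats import binom

n, p = 200, 0.05
print(binom.pmf(10, n, p))       # 1. P(exactly 10 defaults)
print(binom.cdf(5, n, p))        # 2. P(no more than 5 defaults)
print(1 - binom.cdf(14, n, p))   # 3. P(15 or more defaults) = 1 - P(X <= 14)
# binom.sf(14, n, p) gives the same tail probability directly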
Poisson Distribution
• The Poisson distribution is another distribution for discrete random variables. When 𝑛 is large and 𝑝 is very close to 0 or 1, the binomial distribution is not even approximately symmetric. When 𝑝 is close to 0, 𝑞 is close to 1. In that case the standard deviation is
$$\sigma = \sqrt{npq} = \sqrt{\mu q} \approx \sqrt{\mu}$$
• where 𝜇 = 𝑛𝑝 is the mean. If we consider 𝑛 trials with probability of success 𝑝 = 1/𝑛, then 𝜇 = 1 and 𝜎 ≈ 1. This leads to a poor normal approximation no matter how large 𝑛 is.
Example - The binomial (10,0.1) distribution
A box contains 1 red ball and 9 white balls. The distribution of the number of red balls picked in 10 random draws with replacement is as follows
>>> x = np.arange(0,11)
>>> n, p = 10, 0.1
>>> fig, ax = plt.subplots(1, 1)
>>> ax.plot(x,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> ax.vlines(x,0,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> plt.title('b(10,1/10)')
>>> plt.show()
Binomial (1000,0.001) distribution
• A box contains 1 red ball and 999 white balls. The distribution of the number of red balls picked in 1000 random draws with replacement is as follows
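A sketch of the corresponding plot, mirroring the code used for the previous box example (n = 1000, p = 0.001 as stated above):
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

x = np.arange(0, 11)                  # only small counts have visible probability
n, p = 1000, 0.001
fig, ax = plt.subplots(1, 1)
ax.plot(x, binom.pmf(x, n, p), lw=5, alpha=0.5)
ax.vlines(x, 0, binom.pmf(x, n, p), lw=5, alpha=0.5)
plt.title('b(1000,1/1000)')
plt.show()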
Define – Poisson Distribution
• It can be shown that such binomial distributions remain concentrated around a small number of values with mean 𝜇 = 1, and the shape of the distribution approaches a limit as 𝑛 → ∞ and 𝑝 = 1/𝑛 → 0.
• More generally, when the expected value 𝜇 = 𝑛𝑝 is held constant and the binomial(𝑛, 𝑝) distribution approaches the limit 𝑛 → ∞, 𝑝 = 𝜇/𝑛 → 0, we get the Poisson distribution with parameter 𝜇.
• So, if 𝑛 is large and 𝑝 is small, the distribution of the number of successes in 𝑛 independent trials depends only on the value of 𝜇 = 𝑛𝑝. The Poisson approximation states
$$P(k \text{ successes}) = e^{-\mu}\frac{\mu^k}{k!}$$
• The Poisson distribution with parameter 𝜇, or Poisson(𝜇) distribution, is defined as the distribution of probabilities
$$P_{\mu}(k) = e^{-\mu}\frac{\mu^k}{k!}, \qquad k = 0, 1, 2, \ldots$$
Example
• A manufacturing process produces 1% defective items in the long run. What is the probability of getting 2 or more defective items in a sample of 200 items?
• The mean is 𝜇 = 200 × 0.01 = 2, so we can use the Poisson approximation
$$P(\text{2 or more defectives}) = 1 - P(0) - P(1) = 1 - e^{-2}\frac{2^0}{0!} - e^{-2}\frac{2^1}{1!} = 1 - 3e^{-2} = 0.594$$
>>> from scipy.stats import poisson
>>> mu = 2
>>> 1 - poisson.pmf(0, mu) - poisson.pmf(1, mu)
0.5939941502901619
>>> 1 - poisson.cdf(1,mu)
0.593994150290162
• For the case of the thin copper wire, suppose that the number of flaws follows a Poisson distribution with a
mean of 2.3 flaws per millimeter. Determine the probability of exactly two flaws in 1 millimeter of wire.
• Determine the probability of 10 flaws in 5 millimeters of wire. Let X denote the number of flaws in 5
millimeters of wire.
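A possible solution sketch using scipy.stats.poisson (for 5 millimeters the mean is 2.3 × 5 = 11.5 flaws):
from scipy.stats import poisson

print(poisson.pmf(2, mu=2.3))         # P(exactly 2 flaws in 1 mm), mean 2.3 per mm
print(poisson.pmf(10, mu=2.3 * 5))    # P(exactly 10 flaws in 5 mm), mean 11.5 per 5 mm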
Properties of Poisson Distribution
• For small values of 𝜇, the distribution is piled up near zero. As 𝜇 increases, the distribution shifts to the right and spreads out. As 𝜇 → ∞, the distribution approaches the normal distribution.
Properties of Poisson Distribution
import numpy as np
from scipy.stats import poisson
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6), (ax7, ax8, ax9)) = plt.subplots(3, 3)
x=np.arange(0,15)
ax1.plot(x,poisson.pmf(x,mu=0))
ax2.plot(x,poisson.pmf(x,mu=0.5))
ax3.plot(x,poisson.pmf(x,mu=1.0))
ax4.plot(x,poisson.pmf(x,mu=1.5))
ax5.plot(x,poisson.pmf(x,mu=2.0))
ax6.plot(x,poisson.pmf(x,mu=2.5))
ax7.plot(x,poisson.pmf(x,mu=3.0))
ax8.plot(x,poisson.pmf(x,mu=3.5))
ax9.plot(x,poisson.pmf(x,mu=4.0))
ax1.set_title("Poisson(mu=0)",fontsize=10)
ax2.set_title("Poisson(mu=0.5)",fontsize=10)
ax3.set_title("Poisson(mu=1.0)",fontsize=10)
ax4.set_title("Poisson(mu=1.5)",fontsize=10)
ax5.set_title("Poisson(mu=2.5)",fontsize=10)
ax6.set_title("Poisson(mu=2.0))",fontsize=10)
ax7.set_title("Poisson(mu=2.5)",fontsize=10)
ax8.set_title("Poisson(mu=3.0)",fontsize=10)
ax9.set_title(”Poisson(mu=0.9)",fontsize=10)
fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Mean and Variance
• For the Poisson(𝜇) distribution
$$E(Y_\mu) = \mu$$
$$\mathrm{Var}(Y_\mu) = \mu$$
$$\sigma(Y_\mu) = \sqrt{\mu}$$
ax1.set_title("n=1000, p=0.5")
ax2.set_title("n=1000, p=0.1")
ax3.set_title("n=1000, p=0.01")
ax4.set_title("n=1000, p=0.001")
plt.show()
Continuous Random Variable
• This section is about continuous random variables.
• For a continuous random variable, there are uncountably many real numbers in any given range.
• Because of this, it is impossible to assign a probability to each particular value the way we do for a discrete random variable.
• The probability of getting any one particular value is zero.
• So, for continuous random variables, we use a probability density function and use calculus to compute probabilities.
Probability Distribution Function
• In the case of histograms, we studied the concept of calculating the
probability of 𝑦 falling within an interval.
• As the number of intervals goes to infinity, and the width of each interval or bin goes to zero, the relative frequency histogram becomes almost a smooth curve.
• This smooth curve is called the probability density function.
• The height of the probability density function does not represent the probability at that point.
• In fact, at every point the probability is zero. The height of the curve instead measures how dense the probability is near that point.
Probability Distribution Function
• The total area under the curve is one, since the area under the curve can be calculated by integration
$$\int_{-\infty}^{\infty} f(y)\,dy = 1$$
Probability Distribution Function
• The area of the histogram that lies in the interval (𝑎, 𝑏) gives the proportion of the observations that lie in that interval. This proportion of the area represents the probability that the random variable falls in the interval (𝑎, 𝑏)
$$P(a < Y < b) = \int_{a}^{b} f(y)\,dy$$
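As an illustration, such an integral can be evaluated numerically with scipy.integrate.quad; the density below is hypothetical, chosen only for demonstration:
from math import exp
from scipy.integrate import quad

# Hypothetical density f(y) = 2*e^(-2y) for y >= 0, used only to illustrate the calculation
f = lambda y: 2 * exp(-2 * y)

total, _ = quad(f, 0, float('inf'))   # total area under the curve, should be 1
p_ab, _ = quad(f, 0.5, 1.5)           # P(0.5 < Y < 1.5)
print(total, p_ab)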
Probability Distribution Function
• The relative frequency distribution curve can
take several types of shapes.
• If we know the function that represents the curve, we can find the area under the whole curve, or between an interval, using integration.
• Fortunately, the functions for many of these
curves are known and ready to use.
• Example: The probability distribution function of
the score obtained by the students is known.
We can find the probability that a particular
student will score more than 80% by calculating
the shaded area.
Expected Value and Variance
• For a relative frequency histogram, as the number of bars increases without bound, the width of each bar gets closer and closer to zero. In the limit, the midpoint of the bar that contains 𝑦 gets closer and closer to 𝑦, and the height of that bar approaches 𝑓(𝑦). This height is also known as the relative frequency density, and in the limit it approaches the probability density. The expected value of the random variable is
$$E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy$$
• The expected value 𝐸[(𝑌 − 𝐸(𝑌))²] gives us the variance
$$\mathrm{Var}(Y) = E\left[(Y - E(Y))^2\right] = E(Y^2) - \left[E(Y)\right]^2 = \int_{-\infty}^{\infty} (y-\mu)^2 f(y)\,dy$$
Discrete vs Continuous Distribution
Discrete distribution: point probability, $P(X = x) = P(x)$, where 𝑃(𝑥) is the probability that the random variable 𝑋 has the integer value 𝑥.
Continuous distribution: infinitesimal probability, $P(X \in dx) = f(x)\,dx$, the probability per unit length (density 𝑓(𝑥)) for values near 𝑥.
Discrete vs Continuous Distribution
Interval probability
Discrete: $P(a \le X \le b) = \sum_{a \le x \le b} P(x)$, the relative area under the histogram between 𝑎 − 1/2 and 𝑏 + 1/2.
Continuous: $P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$, the area under the graph between 𝑎 and 𝑏.
Constraints
Discrete: non-negative with sum 1, $P(x) \ge 0$ for all 𝑥 and $\sum_{\text{all } x} P(x) = 1$.
Continuous: non-negative with total integral 1, $f(x) \ge 0$ for all 𝑥 and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Expectation
Discrete: $E(X) = \sum_{\text{all } x} x\,P(x)$.
Continuous: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$.
• The diameter of a particle of contamination (in micrometers) is modeled with the probability density function 𝑓(𝑥) = 2/𝑥³ for 𝑥 > 1. Determine the following:
(a) P(X < 2) (b) P(X > 5) (c) P(4 < X < 8) (d) P(X < 4 or X > 8) (e) x such that P(X < x) = 0.95
Uniform Distribution
• The random variable 𝑋 has a uniform(0, 1) distribution if its probability density function 𝑓(𝑥) is constant on [0, 1] and 0 everywhere else
$$f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1 \\ 0 & \text{for } x \notin [0,1] \end{cases}$$
• In the case of uniform(𝑎, 𝑏), the density is constant on (𝑎, 𝑏). The value of the density 𝑐 is 1/(𝑏 − 𝑎), because the total area under the density function must remain 1
$$(b-a)\,c = 1 \implies c = \frac{1}{b-a}$$
Uniform Distribution
• For a uniform distribution, the probabilities are just relative lengths. If 𝑋 has the uniform(𝑎, 𝑏) distribution, the probability that 𝑋 lies between 𝑥 and 𝑦 (for 𝑎 ≤ 𝑥 ≤ 𝑦 ≤ 𝑏) is
$$P(x \le X \le y) = \frac{y-x}{b-a}$$
Normal Distribution
The normal curve is described by the equation
$$y = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < +\infty$$
The equation contains two fundamental constants, 𝜋 = 3.14159265358… and the base of the natural logarithm 𝑒 = 2.7182818285…
The equation of the normal curve also contains two parameters: 𝜇, the mean, and 𝜎, the standard deviation.
The equation has the term √(2𝜋)𝜎 in the denominator so that the total area under the curve is 1.
The mean 𝜇 can be a positive or negative real number; 𝜇 determines the location of the curve.
The standard deviation 𝜎 can only be a positive number; it sets the horizontal scale and measures the spread of the distribution.
The curve is symmetric around 𝜇. From 𝜇 to 𝜇 ± 𝜎 the curve is concave; beyond the inflection points 𝜇 ± 𝜎 the curve becomes convex.
Normal Distribution Equation
• The term $e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ determines the shape of the curve.
• The term $\frac{1}{\sqrt{2\pi}\,\sigma}$ does not change the basic shape of the curve.
• It just scales the curve so that the total area under it equals 1.
• If we denote $k = \frac{1}{\sqrt{2\pi}\,\sigma}$, the equation becomes $k\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.
• The curve $y = k\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ for several values of 𝑘 is as follows
Normal Distribution Equation
• As mentioned before, the shape of the curve depends on the values
of 𝜇 and 𝜎.
• By changing the values of 𝜇 and 𝜎, we can alter the location and the
spread.
• Even as the values of 𝜇 and 𝜎 change, we always get a bell-shaped curve that is mounded around the mean.
• The peak of the normal distribution lies at the mean 𝜇.
Normal Distribution
• By changing the value of 𝜇, we can slide
the distribution along the 𝑥 − axis.
• For example, if we increase the value of 𝜇
by 4, the whole distribution shifts to right
by 4 points.
• For 𝜎² < 1, that is, smaller values of 𝜎, the curve is tall and thin and the values are piled up around 𝜇.
• For higher values of 𝜎, the values are more dispersed around 𝜇.
• For 𝜎² = 0, there is no spread at all: every deviation from 𝜇 is zero, and 𝑋 ∼ 𝑁(𝜇, 0) is the distribution of the constant value 𝜇, with probability one at 𝜇.
Normal Distribution
>>> import numpy as np
>>> from scipy.stats import norm
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-7,7,100)
>>> plt.plot(x,norm.pdf(x,loc=0,scale=1),label="mu=0,
sigma=1")
>>> plt.plot(x,norm.pdf(x,loc=0,scale=2),label="mu=0,
sigma=2")
>>>
plt.plot(x,norm.pdf(x,loc=0,scale=2/3),label="mu=0,
sigma=2/3")
>>>
plt.plot(x,norm.pdf(x,loc=4,scale=2/3),label="mu=4,
sigma=2/3")
>>> plt.legend(loc="best")
>>> plt.show()
Standard Normal Distribution
The equation of the normal curve with mean 𝜇 and variance 𝜎² can be written as
$$y = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}z^2}$$
where $z = \frac{x-\mu}{\sigma}$ shows how many standard deviations the value is away from the mean.
If the normal distribution has 𝜇 = 0 and 𝜎 = 1, we get the standard normal distribution. The standard normal curve is
$$y = \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$
The probability that a normal random variable with mean 𝜇 and standard deviation 𝜎 falls in the interval (𝑎, 𝑏) is
$$\Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)$$
From the symmetry of the normal curve,
$$\Phi(-z) = 1 - \Phi(z), \qquad -\infty < z < \infty$$
Standard Normal Distribution
The probability of the interval (𝑎, 𝑏) for the standard normal distribution can be denoted by
$$\Phi(a, b) = \Phi(b) - \Phi(a)$$
These formulas are used whenever we work with the normal distribution. While working with the normal distribution, sketch the standard normal curve and remember the definition of Φ(𝑧) as the proportion of the area under the curve that lies to the left of 𝑧.
The three most common standard normal probabilities are:
The probability within one standard deviation of the mean, Φ(−1, 1) ≈ 68.26%
The probability within two standard deviations of the mean, Φ(−2, 2) ≈ 95.44%
The probability within three standard deviations of the mean, Φ(−3, 3) ≈ 99.74%
Central Limit Theorem
• The appearance of the normal distribution in so many contexts can be explained by the central limit theorem.
• For independent random variables having the same distribution and finite variance, as the number of samples tends to infinity, the distribution of the standardized sum (or average) of the 𝑛 variables approaches the standard normal distribution.
• When we sum or average a large number of independent measurements, each of which is small relative to the whole, the distribution of the sum approaches the normal shape even if the individual measurements are not normally distributed.
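A small simulation sketch of the central limit theorem, using sums of uniform random variables (the sample sizes below are chosen only for illustration):
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 50                                        # number of (non-normal) uniform draws per sum
sums = rng.uniform(0, 1, size=(10000, n)).sum(axis=1)
z = (sums - n * 0.5) / np.sqrt(n / 12.0)      # standardize: mean n/2, variance n/12
plt.hist(z, bins=50, density=True)            # histogram looks close to the standard normal
plt.title("Standardized sums of %d uniform variables" % n)
plt.show()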
Example
For a normal distribution 𝑋 ∼ 𝑁(20, 2²), find the probability that a measurement will be less than 23.
First, we calculate the number of standard deviations the value is away from the mean
$$z = \frac{x-\mu}{\sigma} = \frac{23-20}{2} = 1.5$$
So 𝑥 lies 1.5 standard deviations above the mean. The area to the left of 𝑧 = 1.5 is Φ(1.5) ≈ 0.9332, so 𝑃(𝑋 < 23) ≈ 0.9332.
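Using Python, a minimal sketch of the same calculation with scipy.stats.norm:
from scipy.stats import norm

print(norm.cdf(1.5))                  # area to the left of z = 1.5, about 0.9332
print(norm.cdf(23, loc=20, scale=2))  # same probability computed directly from X ~ N(20, 2^2)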
Example
Suppose the milk production of a randomly chosen cow is normally distributed; the solutions below use a mean of 70 pounds and a standard deviation of 13 pounds.
• What is the probability that the milk production for a randomly chosen cow will be less than 60 pounds?
• What is the probability that the milk production for a randomly chosen cow will be greater than 90 pounds?
• What is the probability that the milk production for a randomly chosen cow will be between 60 and 90 pounds?
Solution (a)
𝑧 score is
60 − 70
𝑧= = −.77
13
>>> import scipy.stats as st
>>> st.norm.cdf(-.77)
0.2206499463
Area to the left is 0.2206. So the probability that a randomly chosen cow will produce less than 60 pounds is 0.2206.
Solution (b)
𝑧 score is
90 − 70
𝑧= = 1.54
13
Area to the left is 0.9382. So the probability that a randomly chosen cow will produce more than 90 pounds is 1 − 0.9382 = 0.0618.
Solution (c)
To find the probability between 60 and 90, we find the area between 60 and 90:
0.9382 − 0.2206 = 0.7176
So production is less than 60 pounds for 22.06% of cows, between 60 and 90 pounds for 71.76% of cows, and more than 90 pounds for 6.18% of cows.
Example
The score on an entrance exam follows the normal distribution 𝑋 ∼ 𝑁(500, 100²).
What proportion of the students taking the exam will score below 350? Also calculate the lower 10th percentile of all scores.
The 𝑧 value is
$$z = \frac{350-500}{100} = -1.5$$
so the proportion scoring below 350 is Φ(−1.5) ≈ 0.0668.
For the second part, we need the 𝑧 value corresponding to a probability of 10%, which is 𝑧 = −1.28, so
$$-1.28 = \frac{x-500}{100} \implies x = 500 - 128 = 372$$
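Using Python, a sketch of both parts with scipy.stats.norm:
from scipy.stats import norm

print(norm.cdf(-1.5))                      # proportion scoring below 350, about 0.0668
print(norm.ppf(0.10, loc=500, scale=100))  # lower 10th percentile of scores, about 372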
The 𝑡 statistic is
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
The degree of freedom
• For a dynamic system, the number of degrees of freedom is the minimum number of independent coordinates needed to completely specify the position of the system.
• In statistics, the number of degrees of freedom is the number of measurements that are free to vary in the calculation of a statistic.
• The degrees of freedom can also be defined as the number of independent measurements in a sample of data that we can use to estimate a parameter of the population from which the sample is drawn.
• For example, if we have 𝑛 measurements, then for the mean we have 𝑛 independent observations, so there are 𝑛 degrees of freedom. For the variance, one degree of freedom is lost in estimating the mean, so we have 𝑛 − 1 degrees of freedom.
Normal distribution vs t-distribution
• Early statistics focused on probability and inference using large samples and the normal distribution. The standard normal distribution gives a good bell-shaped approximation for large samples, but for small samples it understates the probability in the tails of the sampling distribution.
• The standard normal distribution and Student's t-distribution are both symmetric with a mean of zero. The standard normal distribution is bell shaped and has a standard deviation of one, whereas the t-distribution is unimodal and has a standard deviation that is not equal to one.
• The standard deviation of the t-distribution varies with the sample size. For a small sample size, the t-distribution is leptokurtic: compared to the standard normal distribution, its probability areas in the tails are larger. In other words, the probability density of the t-distribution is lower in the centre and heavier in the tails.
Normal distribution vs t-distribution
The following graph shows the density of the t-distribution as the degrees of freedom 𝜈 increase. With increasing 𝜈, the t-distribution becomes closer to the normal distribution
>>> import numpy as np
>>> from scipy.stats import norm
>>> from scipy.stats import t
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-5,5,100)
>>> plt.plot(x,norm.pdf(x),label="Normal")
>>> plt.plot(x,t.pdf(x,1), label="T DF =1")
>>> plt.plot(x,t.pdf(x,5), label="T DF =5")
>>> plt.plot(x,t.pdf(x,10), label="T DF =10")
>>> plt.plot(x,t.pdf(x,30), label="T DF =30")
>>> plt.legend(loc="best")
>>> plt.show()
Exercise: given X = 82, Mu = 6.5 and n = 100, construct a 90% confidence interval.
Exponential Distribution
• Let's study a continuous distribution that is closely related to the Poisson distribution.
• In a Poisson point process, we count the number of occurrences in a given interval.
• That count is a discrete random variable and follows the Poisson distribution.
• Along with the number of occurrences, the waiting time (or distance) between successive occurrences is also a random variable.
• Examples are the waiting time between the arrivals of emails, the waiting time between phone calls at a telephone exchange, or the locations of road accidents on a national highway.
Exponential Distribution
• The distance or span of time between two consecutive points is a continuous random variable 𝑋.
• This random variable can take any positive value.
• It follows the exponential distribution.
• The exponential distribution has only one parameter, 𝜆 > 0.
• The exponential distribution is often used to model the time to failure of objects.
• The failures form a Poisson process in time, and the time to the next failure is exponentially distributed.
Exponential Distribution
Exponential Distribution: the probability density function of a continuous random variable 𝑋 that follows the exponential distribution is
$$P(x \mid \lambda) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
where 𝜆 > 0 is a parameter
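Using Python, a minimal sketch with scipy.stats.expon; note that scipy parameterizes the exponential by scale = 1/λ, and the rate λ = 2 below is an illustrative value only:
from scipy.stats import expon

lam = 2.0                                    # rate parameter lambda
print(expon.pdf(0.5, scale=1/lam))           # density at x = 0.5
print(expon.cdf(1.0, scale=1/lam))           # P(X <= 1) = 1 - e^(-2)
print(expon.mean(scale=1/lam), expon.var(scale=1/lam))  # 1/lambda and 1/lambda^2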
Gamma family of distributions
The gamma(𝑎, 𝑏) distribution is used for a non-negative continuous random variable 0 ≤ 𝑥 < ∞. Its two parameters satisfy 𝑎 > 0 and 𝑏 > 0
$$P_{\gamma}(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} e^{-bx}$$
The shape of the curve is given by $x^{a-1}e^{-bx}$; the constant $\frac{b^a}{\Gamma(a)}$ is required to make the curve a probability density.
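A minimal sketch of a few gamma(a, b) densities with scipy.stats.gamma (the (a, b) values below are illustrative only; scipy uses a shape parameter a and scale = 1/b):
import numpy as np
from scipy.stats import gamma
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
fig, ax = plt.subplots(1, 1)
for a, b in [(1, 1), (2, 1), (3, 1), (2, 2)]:
    ax.plot(x, gamma.pdf(x, a, scale=1/b), label="gamma(a=%g, b=%g)" % (a, b))
ax.legend(loc="best")
plt.title("Gamma family of densities")
plt.show()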
Gamma family of distributions
Mean and variance of the gamma(𝑎, 𝑏) distribution:
Mean: 𝑎/𝑏
Variance: 𝑎/𝑏²
Chi-Square distributions
The chi-square 𝜒²(𝑟) distribution with 𝑟 degrees of freedom is a special case of the gamma distribution with 𝑎 = 𝑟/2 and 𝑏 = 1/2
$$P(x \mid r) = \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, x^{r/2 - 1} e^{-x/2}, \qquad 0 < x < \infty$$
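Using Python, a small sketch confirming that the chi-square density matches the corresponding gamma density (r = 4 and x = 3 are illustrative values; scipy's gamma uses scale = 1/b):
from scipy.stats import chi2, gamma

r = 4                                  # degrees of freedom
x = 3.0
print(chi2.pdf(x, r))                  # chi-square(r) density at x
print(gamma.pdf(x, a=r/2, scale=2))    # same value: gamma with a = r/2, b = 1/2, so scale = 2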