5-Random-Variables

The document provides an overview of random variables, including discrete and continuous types, their probability distributions, and key concepts such as probability mass functions, expectations, and variance. It explains how to calculate probabilities for discrete random variables and introduces the binomial distribution, detailing its properties and applications. Additionally, it includes examples and Python code snippets for practical implementation of these concepts.

Random Variables

Samatrix Consulting Pvt Ltd


Random Variables
• So far, we have studied the random experiments and events.
• We have also studied the concepts of probability for random events.
• We have seen how conditional probability can change our belief about an
unobserved event, given that we have new observed new evidence or related
event.
• In this section, we will study discrete and continuous random variables and
probability distributions.
• For example, in the case of four tosses of a coin, the number of heads could be
any one of the following possible values: 0, 1, 2, 3, 4.
• The random variable, in this case, is the number of heads that may be one of the
possible values, with a distribution of probabilities over this set of values.
• In other words, we can say that a random variable describes the outcome of the
experiment in terms of a number.
Random Variables
• When conducting a random experiment, the experimenter takes a measurement.
• This measurement is the outcome of a random variable.
• We are interested in the probability that the measurement falls in a set 𝐴, where 𝐴 is a
subset of the outcome space of 𝑋.
• If we know the probability of measurement for all subsets A, we know the probability
distribution of the random variable.
• Generally, we denote the random variables using capital letters 𝑋, 𝑌, 𝑍, etc.
• The values the random variable takes are denoted by lowercase letters, e.g., 𝑥, 𝑦, 𝑧, etc.
• For example, we can use 𝑋 for “the number obtained by rolling a die”, Y for "the number
of heads in four-coin tosses", and Z for "the suit of a card dealt from a well-shuffled
deck".
• The range of a random variable 𝑋 is the set of all the possible values that 𝑋 might
produce.
Random Variables and Their Range
Random Variable Description Range
X Number on a Die {1, 2, 3, 4, 5, 6}
Y Number of heads in 4 coin tosses {0, 1, 2, 3, 4}
Discrete Random Variables
Discrete Random Variables
• The dictionary meaning of discrete is distinct or separate.
• Going by this meaning, the random variables 𝑌 that can take on
distinct or separated values 𝑦𝑘 are called discrete random variables.
• The possible number of values can be finite, for example, the number
of heads in 5 tosses has possible values 0,1,2,3,4,5.
• There can also be a countably infinite number of possible values, for
example, the number of tosses until the first head has possible values
1, 2, 3, ⋯
• In the case of discrete random variables, the possible values are
separated by gaps.
Probability Mass Function
• For a discrete random variable, we can define the probability mass function
𝑓(𝑦𝑘) = 𝑃(𝑌 = 𝑦𝑘)
• For each of the countably many possible values 𝑦𝑘 of the discrete random
variable, the probability mass function 𝑓(𝑦𝑘) is positive:
𝑓(𝑦𝑘) ≥ 0 for 𝑘 = 1, 2, ⋯
𝑓(𝑦) = 0 for all other values of 𝑦
• Since 𝑌 must take on one of the values 𝑦𝑘,
∑_{𝑘=1}^{∞} 𝑓(𝑦𝑘) = 1
• We can present the probability mass function in a graphical format
Example
• Suppose we conduct an experiment by tossing 2 fair coins. If 𝑌 denotes the
number of heads that appear, the random variable 𝑌 can take one of the
values 0, 1, and 2. The probabilities are
𝑃(𝑌 = 0) = 𝑃({(𝑇, 𝑇)}) = 1/4
𝑃(𝑌 = 1) = 𝑃({(𝑇, 𝐻), (𝐻, 𝑇)}) = 1/2
𝑃(𝑌 = 2) = 𝑃({(𝐻, 𝐻)}) = 1/4
• We can present the probability mass function by plotting it in a graphical
format
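The two-coin probabilities above can be verified by enumerating the sample space (a minimal sketch using only the standard library; the variable names are illustrative):

```python
from itertools import product
from collections import Counter

# Enumerate the 4 equally likely outcomes of tossing 2 fair coins
outcomes = list(product("HT", repeat=2))

# Y = number of heads in each outcome
counts = Counter(outcome.count("H") for outcome in outcomes)

# PMF: f(y) = P(Y = y)
pmf = {y: counts[y] / len(outcomes) for y in sorted(counts)}
print(pmf)  # {0: 0.25, 1: 0.5, 2: 0.25}
```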
Example
• Similarly, we can represent the probability mass function of a random
variable representing the sum when we roll two dice as a part of our
experiment
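The PMF for the sum of two dice can likewise be tabulated by enumerating all 36 equally likely outcomes (a sketch, not from the original slides; exact fractions avoid rounding):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely ordered outcomes of rolling two dice
rolls = list(product(range(1, 7), repeat=2))

# f(s) = P(sum = s), accumulated as exact fractions
pmf = {}
for a, b in rolls:
    s = a + b
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

for s in sorted(pmf):
    print(s, pmf[s])  # e.g. the most likely sum, 7, has probability 1/6
```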
Discrete Distribution using Python
• We can use scipy.stats.rv_discrete() to construct a discrete distribution from a list of values and their corresponding probabilities.

>>> import numpy as np
>>> from scipy import stats
>>> import matplotlib.pyplot as plt
>>> yk = np.arange(7)
>>> pk = (0.1, 0.2, 0.3, 0.0, 0.1, 0.1, 0.2)
>>> custm = stats.rv_discrete(name='custm', values=(yk, pk))
>>> fig, ax = plt.subplots(1,1)
>>> ax.plot(yk, custm.pmf(yk),'ro',ms=12,mec='r')
[<matplotlib.lines.Line2D object at 0x7feb9d140fd0>]
>>> ax.vlines(yk,0,custm.pmf(yk),colors='r',lw=4)
<matplotlib.collections.LineCollection object at 0x7feb9d1551f0>
>>> plt.show()
Expectations
• We can define the expected value of a discrete random variable 𝑌 as the sum of each possible
value times its probability:
𝐸(𝑌) = ∑_{𝑘=1}^{∞} 𝑦𝑘 𝑓(𝑦𝑘)
• We also call the expected value of the random variable 𝑌 the mean of the random variable and
denote it by 𝜇. The sample mean of a random sample of size 𝑛 is
𝑦̄ = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑦𝑖

We can derive the expected value from the distribution of 𝑌 the same way as we derive the mean
or average (𝑦̄ or 𝜇) of a list. For example, the average of the list (1, 0, 8, 6, 6, 1, 6) of 𝑛 = 7 numbers is

(1 + 0 + 8 + 6 + 6 + 1 + 6)/7 = 0 × (1/7) + 1 × (2/7) + 6 × (3/7) + 8 × (1/7) = 4
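The equivalence between the plain average and the probability-weighted sum can be checked numerically (a sketch using the list from the slide):

```python
from collections import Counter

data = [1, 0, 8, 6, 6, 1, 6]
n = len(data)

# Plain average of the list
mean = sum(data) / n

# Weighted sum: each distinct value times its relative frequency
counts = Counter(data)
weighted = sum(y * (c / n) for y, c in counts.items())

print(mean, weighted)  # both equal 4.0
```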
Example
• If a random variable 𝑌 can take two possible values, 𝑎 and 𝑏, with
probabilities 𝑃(𝑎) and 𝑃(𝑏), then
• 𝐸(𝑌) = 𝑎𝑃(𝑎) + 𝑏𝑃(𝑏)
• where 𝑃(𝑎) + 𝑃(𝑏) = 1. The weighted average of 𝑎 and 𝑏 is a
number between 𝑎 and 𝑏. The larger 𝑃(𝑎), the closer 𝐸(𝑌) is to 𝑎;
the larger 𝑃(𝑏), the closer 𝐸(𝑌) is to 𝑏
Variance
• If you predict the value of a random variable 𝑌 using its expected value
𝐸(𝑌) = 𝜇, you will be off by the random amount 𝑌 − 𝜇, which is known as
the deviation

𝐸(𝑌 − 𝜇) = 𝐸(𝑌) − 𝜇 = 0
• If you want to measure the size of the deviation, you need to consider
either the absolute value or the square of 𝑌 − 𝜇. In algebra it is easier
to use squared values than absolute values, so you consider
𝐸[(𝑌 − 𝜇)²], then take the square root to get a value in the same units
as 𝑌
𝐸[(𝑌 − 𝜇)²] = 𝐸[𝑌² − 2𝜇𝑌 + 𝜇²]
𝑉𝑎𝑟(𝑌) = 𝐸(𝑌²) − 2𝜇² + 𝜇² because 𝐸(𝑌) = 𝜇
= 𝐸(𝑌²) − 𝜇² = 𝐸(𝑌²) − [𝐸(𝑌)]²
Example
• Let 𝑌 be a discrete random variable with probability function as given
below

𝑦𝑖    𝑓(𝑦𝑖)
0     0.20
1     0.15
2     0.25
3     0.35
4     0.05

• Expected Value
• 𝐸(𝑌) = 0 × 0.20 + 1 × 0.15 + 2 × 0.25 + 3 × 0.35 + 4 × 0.05 = 1.90
• Variance
• Variance can be calculated in two ways
• 𝑉𝑎𝑟(𝑌) = (0 − 1.90)² × 0.20 + (1 − 1.90)² × 0.15 + (2 − 1.90)² × 0.25 + (3 − 1.90)² × 0.35 + (4 − 1.90)² × 0.05 = 1.49
• The second way is
• 𝐸(𝑌²) = 0² × 0.20 + 1² × 0.15 + 2² × 0.25 + 3² × 0.35 + 4² × 0.05 = 5.10
• 𝑉𝑎𝑟(𝑌) = 5.10 − 1.90² = 1.49
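Both variance formulas from this example can be computed directly (a sketch in plain Python with the slide's table):

```python
ys = [0, 1, 2, 3, 4]
ps = [0.20, 0.15, 0.25, 0.35, 0.05]

# E(Y) = sum of y * f(y)
mu = sum(y * p for y, p in zip(ys, ps))  # 1.90

# First way: E[(Y - mu)^2]
var1 = sum((y - mu) ** 2 * p for y, p in zip(ys, ps))

# Second way: E(Y^2) - mu^2
var2 = sum(y ** 2 * p for y, p in zip(ys, ps)) - mu ** 2

print(mu, var1, var2)  # 1.90 and, both ways, approximately 1.49
```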
Binomial Distribution
Binomial Distribution
• We need to find a formula for
finding the probability of getting 𝑘
successes in 𝑛 independent trials.
• For this, we consider a tree
diagram for 𝑛 = 4
• Each path down the 𝑛 steps
represents the possible outcomes
of the first 𝑛 trials.
• The 𝑘th node in the 𝑛th trial
represents 𝑘 successes in 𝑛 trials.
Binomial Distribution
• The expression in each node denotes
the probabilities of success (denoted
by 𝑝) and failure (denoted by 1 − 𝑝 =
𝑞) on each trial.
• The expression shows the sum of
probabilities of all paths leading to the
node.
• For example in row 3, the probabilities
of 𝑘 = 0, 1, 2, 3 successes in 𝑛 = 3 can
be expressed by

(𝑝 + 𝑞)³ = 𝑞³ + 3𝑝𝑞² + 3𝑝²𝑞 + 𝑝³


Binomial Distribution
• The second term on the right-hand side, 3𝑝𝑞², denotes 1 success in 3 trials
(𝑘 = 1, 𝑛 = 3).
• The factor 3 arises because there are three ways to get one success in three trials:
𝐹𝐹𝑆, 𝐹𝑆𝐹, 𝑆𝐹𝐹.
• It also represents the three possible ways to reach the 𝑘 = 1 node in row 3.
• To achieve 𝑘 successes in 𝑛 trials, we move down to the right 𝑘 times
(corresponding to 𝑘 successes) and straight down 𝑛 − 𝑘 times
(corresponding to 𝑛 − 𝑘 failures).
• The probability of every path of 𝑘 successes in the 𝑛 trials is 𝑝𝑘 𝑞 𝑛−𝑘 .
• So, summing over all such paths, the probability mass function for 𝑘 successes in 𝑛 trials is
𝑓(𝑘) = (𝑛 choose 𝑘) 𝑝^𝑘 𝑞^(𝑛−𝑘)
Binomial Distribution
• where (𝑛 choose 𝑘) denotes the number of paths. It is read "𝑛 choose 𝑘" and
is given by the formula
(𝑛 choose 𝑘) = [𝑛(𝑛 − 1) ⋯ (𝑛 − 𝑘 + 1)] / [𝑘(𝑘 − 1) ⋯ 1] = 𝑛! / (𝑘!(𝑛 − 𝑘)!)

• The values 𝑛 and 𝑝 are fixed, whereas 𝑘 varies.


• The binomial probabilities thus define a probability distribution known as
Binomial Probability (𝑛, 𝑝) distribution.
• The binomial distribution is a distribution of number of successes in 𝑛
independent and identical trials with the probability of success for each
trial as 𝑝.
Example
Calculate the probability of getting 2 fives and 7 non-fives in 9 rolls of a die

(9 choose 2) (1/6)² (5/6)⁷ = (36 × 5⁷) / 6⁹ = 0.279

We can solve this problem by using scipy.stats.binom.pmf() function.


>>> from scipy.stats import binom
>>> k,n,p=2,9,1/6
>>> binom.pmf(k,n,p)
0.2790816472336532

Example 2: In the case of fair coin tossing, 𝑝 = 𝑞 = 1/2, so

𝑝^𝑘 𝑞^(𝑛−𝑘) = (1/2)^𝑘 (1/2)^(𝑛−𝑘) = (1/2)^𝑛

• Probability of 𝑘 heads in 𝑛 fair coin tosses = (𝑛 choose 𝑘) × 1/2^𝑛, where 0 ≤ 𝑘 ≤ 𝑛
Create a function to calculate the probability mass function for binary events.
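One possible answer to this prompt is a short hand-rolled PMF built on math.comb (a sketch; scipy.stats.binom.pmf gives the same values):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent trials, each with success probability p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# 2 fives in 9 die rolls, as in the worked example above
print(binom_pmf(2, 9, 1 / 6))  # approximately 0.2791

# For fair coins, the PMF reduces to comb(n, k) / 2**n
print(binom_pmf(3, 5, 0.5), comb(5, 3) / 2 ** 5)  # both 0.3125
```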
Properties of Binomial Distribution
• There are 𝑛 independent and identical trials for a Bernoulli (success-
failure) experiment.
• There are two possible outcomes of every experiment: Success or
failure
• The probability of success (denoted by 𝑝) remains constant for each
trial
• 𝑘 is a random variable that denotes the number of “successes”
observed during 𝑛 trials.
Mean and Standard Deviation
• Like any other probability distributions, a binomial probability
distribution also has a mean 𝜇 and standard deviation, 𝜎.

𝜇 = 𝑛𝑝

𝜎 = √(𝑛𝑝(1 − 𝑝)) = √(𝑛𝑝𝑞)
Mean and Standard Deviation
• Example: A company producing turf grass monitors the quality of the grass by taking a sample
of 25 seeds at regular intervals. The germination rate of the seed is consistent at 85%. Find the
average number of seeds that will germinate in the sample of 25 seeds
• 𝜇 = 𝑛𝑝 = 25 × 0.85 = 21.25
• 𝜎 = √(25 × 0.85 × 0.15) = 1.785

• Using Python
>>> from scipy.stats import binom
>>> n, p = 25, 0.85
>>> binom.mean(n,p)
21.25
>>> binom.std(n,p)
1.7853571071357126
Binomial Probability Distribution
• We can plot the probability distribution; for these parameters the
distribution is left-skewed
>>> import numpy as np
>>> from scipy.stats import binom
>>> import matplotlib.pyplot as plt
>>> x = np.arange(0,26)
>>> n, p = 25, 0.85
>>> fig, ax = plt.subplots(1, 1)
>>> ax.plot(x,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> ax.vlines(x,0,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> plt.title('Binomial Distribution n=25, p=0.85')
>>> plt.show()
Binomial Probability Distribution
• The binomial(𝑛, 𝑝) distributions have roughly the same bell shape
irrespective of the values of 𝑛 and 𝑝. As 𝑛 and 𝑝 vary, binomial
distributions differ in their mean and standard deviation
Distribution of number of success for n trials
• The binomial(100, 𝑝) distribution is shown for 𝑝 = 10% to 90% in steps of 10%. With
an increase in 𝑝, the distribution shifts to the right because it is centered around the
mean, 100𝑝, which increases with 𝑝. The distribution is symmetric around 𝑝 = 0.5 and
skewed near 𝑝 = 0 and 𝑝 = 1. The spread of the distribution increases with 𝑝 up to 50%,
where it is maximum, and then starts decreasing. This is justified by the formula for the
standard deviation, √(𝑛𝑝(1 − 𝑝)), which increases with 𝑝 up to 50% and then decreases.

import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6), (ax7, ax8, ax9)) = plt.subplots(3, 3)
x=np.arange(0,101)
ax1.plot(x,binom.pmf(x,n=100,p=0.1))
ax2.plot(x,binom.pmf(x,n=100,p=0.2))
ax3.plot(x,binom.pmf(x,n=100,p=0.3))
ax4.plot(x,binom.pmf(x,n=100,p=0.4))
ax5.plot(x,binom.pmf(x,n=100,p=0.5))
Distribution of number of success for n trials
ax6.plot(x,binom.pmf(x,n=100,p=0.6))
ax7.plot(x,binom.pmf(x,n=100,p=0.7))
ax8.plot(x,binom.pmf(x,n=100,p=0.8))
ax9.plot(x,binom.pmf(x,n=100,p=0.9))
ax1.set_title("binomial(n=100, p=0.1)",fontsize=10)
ax2.set_title("binomial(n=100, p=0.2)",fontsize=10)
ax3.set_title("binomial(n=100, p=0.3)",fontsize=10)
ax4.set_title("binomial(n=100, p=0.4)",fontsize=10)
ax5.set_title("binomial(n=100, p=0.5)",fontsize=10)
ax6.set_title("binomial(n=100, p=0.6)",fontsize=10)
ax7.set_title("binomial(n=100, p=0.7)",fontsize=10)
ax8.set_title("binomial(n=100, p=0.8)",fontsize=10)
ax9.set_title("binomial(n=100, p=0.9)",fontsize=10)
fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Distribution of Number of Heads for n coin
tosses
• The binomial(𝑛, 0.5) distribution is shown for 𝑛 = 10 to 90 in steps of 10. With an increase in 𝑛, the distribution shifts to the right
because it is centered around the mean, 𝑛/2, which increases with 𝑛. The distribution is symmetric around the expected
value 𝑛/2. The spread of the distribution increases with 𝑛. This is justified by the formula for the standard deviation, √(𝑛𝑝(1 − 𝑝)),
which increases with 𝑛. Due to the increase in spread, the distribution covers a wider range of values.

import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3),(ax4, ax5, ax6),
(ax7, ax8, ax9)) = plt.subplots(3,3)
x=np.arange(0,91)
ax1.plot(x,binom.pmf(x,n=10,p=0.5))
ax2.plot(x,binom.pmf(x,n=20,p=0.5))
ax3.plot(x,binom.pmf(x,n=30,p=0.5))
ax4.plot(x,binom.pmf(x,n=40,p=0.5))
Distribution of Number of Heads for n coin
tosses
ax5.plot(x,binom.pmf(x,n=50,p=0.5))
ax6.plot(x,binom.pmf(x,n=60,p=0.5))
ax7.plot(x,binom.pmf(x,n=70,p=0.5))
ax8.plot(x,binom.pmf(x,n=80,p=0.5))
ax9.plot(x,binom.pmf(x,n=90,p=0.5))
ax1.set_title("binomial(n=10, p=0.5)",fontsize=10)
ax2.set_title("binomial(n=20, p=0.5)",fontsize=10)
ax3.set_title("binomial(n=30, p=0.5)",fontsize=10)
ax4.set_title("binomial(n=40, p=0.5)",fontsize=10)
ax5.set_title("binomial(n=50, p=0.5)",fontsize=10)
ax6.set_title("binomial(n=60, p=0.5)",fontsize=10)
ax7.set_title("binomial(n=70, p=0.5)",fontsize=10)
ax8.set_title("binomial(n=80, p=0.5)",fontsize=10)
ax9.set_title("binomial(n=90, p=0.5)",fontsize=10)
fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Example
For the US presidential elections, there are 4 races. In each race, Republicans have a 60% chance of winning. If
the races are independent of each other, what is the probability that
• a. The Republicans will win 0, 1, 2, 3, or all 4 races
• b. The Republicans will win at least one race
• c. The Republicans will win the majority of the races
Let 𝑋 equal the number of races the Republicans win, with 𝑝 = 0.6 and 𝑞 = 0.4.
• (4 choose 0) 𝑝⁰𝑞⁴ = [4!/(0!(4−0)!)] × 0.6⁰ × 0.4⁴ = 0.4⁴ = 0.0256
• (4 choose 1) 𝑝¹𝑞³ = [4!/(1!(4−1)!)] × 0.6¹ × 0.4³ = 4 × 0.6 × 0.4³ = 0.1536
• (4 choose 2) 𝑝²𝑞² = [4!/(2!(4−2)!)] × 0.6² × 0.4² = 6 × 0.6² × 0.4² = 0.3456
• (4 choose 3) 𝑝³𝑞¹ = [4!/(3!(4−3)!)] × 0.6³ × 0.4¹ = 4 × 0.6³ × 0.4 = 0.3456
• (4 choose 4) 𝑝⁴𝑞⁰ = [4!/(4!(4−4)!)] × 0.6⁴ × 0.4⁰ = 0.6⁴ = 0.1296

• b. 𝑃(at least 1) = 1 − 𝑃(0) = 0.9744, or 𝑃(1) + 𝑃(2) + 𝑃(3) + 𝑃(4) = 0.9744

• c. 𝑃(Republicans win the majority) = 𝑃(3) + 𝑃(4) = 0.4752
Example – Using Python
>>> from scipy.stats import binom
>>> binom.pmf(k=0, n=4, p=0.6)
0.025600000000000008
>>> binom.pmf(k=1, n=4, p=0.6)
0.15360000000000007
>>> binom.pmf(k=2, n=4, p=0.6)
0.3456000000000001
>>> binom.pmf(k=3, n=4, p=0.6)
0.3456000000000001
>>> binom.pmf(k=4, n=4, p=0.6)
0.1296
>>> 1-binom.cdf(k=0,n=4,p=0.6)
0.9744
>>> 1- binom.cdf(k=2,n=4,p=0.6)
0.47520000000000007
Example
• We use 1 - binom.cdf(k=2,n=4,p=0.6) to get the probability of 3 successes or more,
whereas binom.cdf(k=2,n=4,p=0.6) gives the probability of 2 successes or fewer.

>>> binom.cdf(k=2,n=4,p=0.6)
0.5247999999999999

• This example demonstrates the difference between the cumulative distribution
function, cdf(), and the probability mass function, pmf().
• The probability mass function gives the mass (proportion of observations) at a
given number of successes 𝑘. The cumulative distribution function gives the
probability of achieving a number of successes within a certain range.
• Each sample of water has a 10% chance of containing a particular organic pollutant. Assume that the samples
are independent with regard to the presence of the pollutant.
1) Find the probability that in the next 18 samples, exactly 2 contain the pollutant.
2) Determine the probability that at least four samples contain the pollutant.

• The random variable X has a binomial distribution with n = 10 and p = 0.5. Determine the following
probabilities:
(a) P(X = 5) (b) P(X ≤ 2)
(c) P(X ≥ 9) (d) P(3 ≤ X < 5)
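For the binomial(10, 0.5) exercise, the probabilities reduce to counts over 2¹⁰ = 1024 equally likely outcomes, so they can be checked exactly (a sketch; scipy.stats.binom gives the same numbers):

```python
from math import comb

n = 10
total = 2 ** n  # 1024 equally likely coin-toss sequences

p5 = comb(n, 5) / total                            # P(X = 5)  = 252/1024
p_le2 = sum(comb(n, k) for k in range(3)) / total  # P(X <= 2) = 56/1024
p_ge9 = sum(comb(n, k) for k in (9, 10)) / total   # P(X >= 9) = 11/1024

print(p5, p_le2, p_ge9)
```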
• A bank has issued 200 personal loans to its customers. Based on past
data, the probability of a customer defaulting on a loan is estimated
to be 0.05 (5%). The bank wants to assess the potential financial risk
involved by calculating the probability of a certain number of loan
defaults.
• Tasks:
1.Calculate the probability that exactly 10 customers will default on
their loans out of the 200 loans issued.
2.Determine the probability that no more than 5 customers will default
on their loans.
3.Assess the probability that 15 or more customers will default on their
loans.
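The three loan-default tasks map directly onto binomial pmf/cdf calls (a sketch; requires scipy):

```python
from scipy.stats import binom

n, p = 200, 0.05  # 200 loans, 5% default probability per loan

p_exactly_10 = binom.pmf(10, n, p)      # Task 1: exactly 10 defaults
p_at_most_5 = binom.cdf(5, n, p)        # Task 2: no more than 5 defaults
p_15_or_more = 1 - binom.cdf(14, n, p)  # Task 3: 15 or more defaults

print(p_exactly_10, p_at_most_5, p_15_or_more)
```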
Poisson Distribution
Poisson Distribution
• The Poisson distribution is another distribution for discrete random
variables. When 𝑛 is large and 𝑝 is very close to 0 or 1, the binomial
distribution is not even approximately symmetric. When 𝑝 is close to
0, 𝑞 is close to 1. In such a case the standard deviation 𝜎 = √(𝑛𝑝𝑞) is

𝜎 = √(𝜇𝑞) ≈ √𝜇
• where 𝜇 = 𝑛𝑝 is the mean. If we consider 𝑛 trials with probability of
success 𝑝 = 1/𝑛, then 𝜇 = 1 and 𝜎 ≈ 1. This leads to a poor
normal approximation irrespective of how large 𝑛 is.
Example - The binomial (10,0.1) distribution
A box contains 1 red ball and 9 white balls. The distribution of the number of red
balls picked in 10 random draws with replacement is as follows

>>> import numpy as np
>>> from scipy.stats import binom
>>> import matplotlib.pyplot as plt
>>> x = np.arange(0,11)
>>> n, p = 10, 0.1
>>> fig, ax = plt.subplots(1, 1)
>>> ax.plot(x,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> ax.vlines(x,0,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> plt.title('b(10,0.1)')
>>> plt.show()
binomial (100,0.01) distribution
A box contains 1 red ball and 99 white balls. The
distribution of the number of red balls picked in
100 random draws with replacement is as follows

>>> x = np.arange(0,10)
>>> n, p = 100, 0.01
>>> fig, ax = plt.subplots(1, 1)
>>> ax.plot(x,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> ax.vlines(x,0,binom.pmf(x,n,p),lw=5,alpha=0.5)
>>> plt.title('b(100,1/100)')
>>> plt.show()
Binomial (1000,0.001) distribution
• A box contains 1 red ball and 999 white balls. The distribution of the
number of red balls picked in 1000 random draws with replacement is
as follows
Define – Poisson Distribution
• It can be shown that these binomial distributions are concentrated around a
small number of values, with mean value 𝜇 = 1. The shape of the distribution
approaches a limit as 𝑛 → ∞ and 𝑝 = 1/𝑛 → 0.
• When the expected value 𝜇 = 𝑛𝑝 is held constant and the binomial(𝑛, 𝑝)
approaches the limit 𝑛 → ∞, 𝑝 → 0, we get the Poisson distribution with parameter 𝜇.
• So, if 𝑛 is large and 𝑝 is small, the distribution of the number of successes in 𝑛
independent trials depends on the value of 𝜇 = 𝑛𝑝. The Poisson approximation states

𝑃(𝑘 successes) = 𝑒^(−𝜇) 𝜇^𝑘 / 𝑘!
• The Poisson distribution with parameter 𝜇, or Poisson(𝜇) distribution, is defined
as the distribution of probabilities 𝑃𝜇(𝑘) = 𝑒^(−𝜇) 𝜇^𝑘 / 𝑘!, where 𝑘 = 0, 1, 2, …
Example
• A manufacturing process produces 1% defective items in the long run. What
is the probability of getting 2 or more defective items in a sample of 200
items?
• Mean 𝜇 = 200 × 0.01 = 2. We can use the Poisson approximation
𝑃(2 or more defectives) = 1 − 𝑃(0) − 𝑃(1)
= 1 − 𝑒⁻² (2⁰/0!) − 𝑒⁻² (2¹/1!) = 1 − 3𝑒⁻² = 0.594
>>> from scipy.stats import poisson
>>> mu = 2
>>> 1 - poisson.pmf(0, mu) - poisson.pmf(1, mu)
0.5939941502901619
>>> 1 - poisson.cdf(1,mu)
0.593994150290162
• For the case of the thin copper wire, suppose that the number of flaws follows a Poisson distribution with a
mean of 2.3 flaws per millimeter. Determine the probability of exactly two flaws in 1 millimeter of wire.

• Determine the probability of 10 flaws in 5 millimeters of wire. Let X denote the number of flaws in 5
millimeters of wire.
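The flaw-count questions follow the same pattern: use the per-millimeter mean directly for the first part and scale it to 5 millimeters for the second (a sketch; requires scipy):

```python
from scipy.stats import poisson

# Mean of 2.3 flaws per millimeter
p_two_in_1mm = poisson.pmf(2, mu=2.3)

# Over 5 mm the mean scales to 5 * 2.3 = 11.5 flaws
p_ten_in_5mm = poisson.pmf(10, mu=11.5)

print(p_two_in_1mm, p_ten_in_5mm)
```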
Properties of Poisson Distribution
• For small value of 𝜇, the
distribution is piled up near
zero. With an increase in 𝜇,
the distribution shifts to right
and spreads out. As 𝜇 → ∞,
the distribution approaches
the normal distribution.
Properties of Poisson Distribution
import numpy as np
from scipy.stats import poisson
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6), (ax7, ax8, ax9)) = plt.subplots(3, 3)
x=np.arange(0,15)
ax1.plot(x,poisson.pmf(x,mu=0))
ax2.plot(x,poisson.pmf(x,mu=0.5))
ax3.plot(x,poisson.pmf(x,mu=1.0))
ax4.plot(x,poisson.pmf(x,mu=1.5))
ax5.plot(x,poisson.pmf(x,mu=2.0))
ax6.plot(x,poisson.pmf(x,mu=2.5))
ax7.plot(x,poisson.pmf(x,mu=3.0))
ax8.plot(x,poisson.pmf(x,mu=3.5))
ax9.plot(x,poisson.pmf(x,mu=4.0))
Properties of Poisson Distribution
ax1.set_title("Poisson(mu=0)",fontsize=10)
ax2.set_title("Poisson(mu=0.5)",fontsize=10)
ax3.set_title("Poisson(mu=1.0)",fontsize=10)
ax4.set_title("Poisson(mu=1.5)",fontsize=10)
ax5.set_title("Poisson(mu=2.0)",fontsize=10)
ax6.set_title("Poisson(mu=2.5)",fontsize=10)
ax7.set_title("Poisson(mu=3.0)",fontsize=10)
ax8.set_title("Poisson(mu=3.5)",fontsize=10)
ax9.set_title("Poisson(mu=4.0)",fontsize=10)
fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Mean and Variance
• For the Poisson(𝜇) distribution

𝐸(𝑌 | 𝜇) = 𝜇

𝑉𝑎𝑟(𝑌 | 𝜇) = 𝜇

𝜎(𝑌 | 𝜇) = √𝜇

The mean and variance of the Poisson(𝜇) distribution are both equal to 𝜇


Characteristics of Poisson Distribution
The Poisson distribution is a limiting case of the binomial distribution: if
𝑛 → ∞ and 𝑝 → 0 while the expected value 𝜇 = 𝑛𝑝 remains constant, we
get the Poisson approximation.
• In the binomial distribution, the probability of success is constant across all
the trials. For the Poisson, the instantaneous rate of occurrences per unit time
or space is constant. This means that the expected number of events during a
given time period is the same as during any other time period of the same
length.
• In case of binomial the trials are independent. The occurrences of Poisson
events in two non-overlapping intervals will be independent of each other.
The Poisson events occur randomly throughout the given period of time at
constant instantaneous rate.
Characteristics of Poisson Distribution
• In the case of the binomial, the possible outcome of a trial is either success or
failure. Poisson events occur one at a time.

Even though these conditions seem restrictive, many real-life
situations satisfy them. For example, the number of arrivals at
a ticket counter, bank teller counter, parking lot payment counter, or
toll booth during a given period of time, such as one minute, can be
approximated using the Poisson probability distribution.
Binomial vs Poisson vs Normal Distribution
• Even though it is easy to calculate the binomial coefficient, for large
values of 𝑛 the computation becomes challenging.
• We can use binomial approximation techniques to bypass such problems.
• The normal approximation to the binomial works very well when the variance 𝑛𝑝(1 − 𝑝)
is large.
• As we can see, when the value of 𝜇 = 𝑛𝑝 ≥ 10, the normal distribution
approximates the binomial distribution very well.
• However, for low values of 𝑝 such that 𝜇 = 𝑛𝑝 = 1 and 𝜎 = √(𝑛𝑝(1 − 𝑝)) ≈ 1,
the normal distribution is not able to approximate the binomial distribution.
• In this case, the Poisson distribution can approximate the binomial distribution
very well.
Binomial vs Poisson vs Normal Distribution
Binomial vs Poisson vs Normal Distribution
• A normal distribution remains symmetric for all values of 𝜇 and
𝜎, but the basic shape of the Poisson distribution changes with the
value of 𝜇.
• The Poisson distribution is highly skewed for lower values of 𝜇.
• Most of the values are piled up near 0.
• For higher values of 𝜇, the Poisson distribution appears to take a
symmetric shape, but technically it cannot, because for the Poisson
distribution the mean equals the variance.
Binomial vs Poisson vs Normal Distribution
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
fig.set_size_inches(10,7.5)
sns.distplot(random.normal(loc=500, scale=15.81, size=1000), hist=False, label='normal', ax=ax1)
sns.distplot(random.binomial(n=1000, p=0.5, size=1000), hist=False, label='binomial', ax=ax1)
sns.distplot(random.poisson(lam=500, size=1000), hist=False, label='poisson', ax=ax1)

sns.distplot(random.normal(loc=100, scale=9.48, size=1000), hist=False, label='normal', ax=ax2)
sns.distplot(random.binomial(n=1000, p=0.1, size=1000), hist=False, label='binomial', ax=ax2)
sns.distplot(random.poisson(lam=100, size=1000), hist=False, label='poisson', ax=ax2)
Binomial vs Poisson vs Normal Distribution
sns.distplot(random.normal(loc=10, scale=3.16, size=1000), hist=False, label='normal', ax=ax3)
sns.distplot(random.binomial(n=1000, p=0.01, size=1000), hist=False, label='binomial', ax=ax3)
sns.distplot(random.poisson(lam=10, size=1000), hist=False, label='poisson', ax=ax3)

sns.distplot(random.normal(loc=1, scale=1, size=1000), hist=False, label='normal', ax=ax4)
sns.distplot(random.binomial(n=1000, p=0.001, size=1000), hist=False, label='binomial', ax=ax4)
sns.distplot(random.poisson(lam=1, size=1000), hist=False, label='poisson', ax=ax4)

ax1.set_title("n=1000, p=0.5")
ax2.set_title("n=1000, p=0.1")
ax3.set_title("n=1000, p=0.01")
ax4.set_title("n=1000, p=0.001")
plt.show()
Continuous Random Variable
Continuous Random Variable
• This section is about continuous random variables.
• For continuous random variables, there are uncountably infinitely many
real numbers in a given range.
• Because of this, it is impossible to get the probability of a particular
value, as in the case of a discrete random variable.
• The probability of getting any particular value is zero.
• So, in the case of continuous random variables, we use a probability
density function and use calculus to compute probabilities.
Probability Density Function
• In the case of histograms, we studied the concept of calculating the
probability of 𝑦 falling within an interval.
• As the number of intervals goes to infinity and the width of each interval or
bin goes to zero, the relative frequency histogram becomes almost a
smooth curve.
• This smooth curve is called the probability density function.
• The height of the probability density function does not represent the probability at
that point.
• In fact, at every point the probability is zero. We can find how dense the
probability is at a given point by measuring the height of the curve.
Probability Density Function
• The total area under the curve is one. The area under the curve
can be calculated using integration

∫_{−∞}^{∞} 𝑓(𝑦) 𝑑𝑦 = 1
Probability Density Function
• The area of the histogram that lies between the interval (𝑎, 𝑏) gives
the proportion of the observations that lie in the interval. This
proportion of the area represents the probability that the random
variable falls in the interval (𝑎, 𝑏).

𝑃(𝑎 < 𝑌 < 𝑏) = ∫_𝑎^𝑏 𝑓(𝑦) 𝑑𝑦
Probability Density Function
• The relative frequency distribution curve can
take several types of shapes.
• If we know the function that represents the
curve, we can find the area under the whole
curve, or between an interval, using
integration.
• Fortunately, the functions for many of these
curves are known and ready to use.
• Example: The probability density function of
the scores obtained by students is known.
We can find the probability that a particular
student will score more than 80% by calculating
the shaded area.
Expected Value and Variance
• For a relative frequency histogram, as the number of bars increases without
bound, the width of each bar gets closer and closer to zero. In the limit, the
midpoint of the bar that contains 𝑦 gets closer and closer to 𝑦, and the height
of that bar approaches 𝑓(𝑦). This is also known as the
relative frequency density. In the limit, the relative frequency density
approaches the probability density. The expected value of the random variable is

𝐸(𝑌) = ∫_{−∞}^{∞} 𝑦 𝑓(𝑦) 𝑑𝑦

• The expected value 𝐸[(𝑌 − 𝐸(𝑌))²] gives us the variance

𝑉𝑎𝑟(𝑌) = 𝐸[(𝑌 − 𝐸(𝑌))²] = 𝐸(𝑌²) − [𝐸(𝑌)]² = ∫_{−∞}^{∞} (𝑦 − 𝜇)² 𝑓(𝑦) 𝑑𝑦
Discrete vs Continuous Distribution

Point Probability (Discrete): 𝑃(𝑋 = 𝑥) = 𝑃(𝑥) — the probability that the random variable 𝑋 has integer value 𝑥
Infinitesimal Probability (Continuous): 𝑃(𝑋 ∈ 𝑑𝑥) = 𝑓(𝑥)𝑑𝑥 — the probability per unit length (density 𝑓(𝑥)) for values near 𝑥

Interval Probability (Discrete): 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = ∑_{𝑎≤𝑥≤𝑏} 𝑃(𝑥) — relative area under the histogram between 𝑎 − 1/2 and 𝑏 + 1/2
Interval Probability (Continuous): 𝑃(𝑎 ≤ 𝑋 ≤ 𝑏) = ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥 — area under the graph between 𝑎 and 𝑏

Constraints (Discrete): non-negative with sum 1 — 𝑃(𝑥) ≥ 0 for all 𝑥 and ∑_{all 𝑥} 𝑃(𝑥) = 1
Constraints (Continuous): non-negative with total integral 1 — 𝑓(𝑥) ≥ 0 for all 𝑥 and ∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥 = 1

Expectation (Discrete): 𝐸(𝑋) = ∑_{all 𝑥} 𝑥 𝑃(𝑥)
Expectation (Continuous): 𝐸(𝑋) = ∫_{−∞}^{∞} 𝑥 𝑓(𝑥) 𝑑𝑥
• The diameter of a particle of contamination (in micrometers) is
modeled with the probability density function 𝑓(𝑥) = 2/𝑥³ for 𝑥 > 1.
Determine the following:
(a) P(X < 2) (b) P(X > 5) (c) P(4 < X < 8) (d) P(X < 4 or X > 8) (e) 𝑥 such
that P(X < 𝑥) = 0.95
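Several parts of this exercise can be checked by integrating the density numerically (a sketch; quad is from scipy, and the closed forms in the comments follow from F(x) = 1 − 1/x²):

```python
from scipy.integrate import quad

# Density f(x) = 2/x^3 for x > 1, 0 otherwise
f = lambda x: 2 / x ** 3

p_lt_2, _ = quad(f, 1, 2)              # (a) P(X < 2) = 1 - 1/4 = 0.75
p_gt_5, _ = quad(f, 5, float("inf"))   # (b) P(X > 5) = 1/25 = 0.04

# (e) Solve 1 - 1/x^2 = 0.95  ->  x = sqrt(20)
x95 = (1 / 0.05) ** 0.5

print(p_lt_2, p_gt_5, x95)
```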
Uniform Distribution
Uniform Distribution
• The random variable 𝑋 has a uniform(0,1) distribution if its probability
density function 𝑓(𝑥) is constant on [0,1] and 0 everywhere else.

𝑓(𝑥) = 1 for 0 ≤ 𝑥 ≤ 1, and 𝑓(𝑥) = 0 for 𝑥 ∉ [0,1]

• In the case of uniform(𝑎, 𝑏), the density is constant on (𝑎, 𝑏). The value of the
density 𝑐 is 1/(𝑏 − 𝑎), since the total area under the density function always
remains 1. So

(𝑏 − 𝑎)𝑐 = 1 ⟹ 𝑐 = 1/(𝑏 − 𝑎)
Uniform Distribution
• For a uniform distribution, probabilities are just relative lengths. If
𝑋 has a uniform(𝑎, 𝑏) distribution, the probability that 𝑋 is between 𝑥 and 𝑦 is

𝑃(𝑥 < 𝑋 < 𝑦) = length(𝑥, 𝑦) / length(𝑎, 𝑏) = (𝑦 − 𝑥)/(𝑏 − 𝑎)

• If 𝑋 has a uniform(0,2) distribution, the probability that 𝑋 is 1.23 correct
to two decimal places is

𝑃(1.225 < 𝑋 < 1.235) = (1.235 − 1.225)/2 = 0.01/2 = 0.5%
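The relative-length rule can be checked with scipy.stats.uniform, which parameterizes the distribution by loc=𝑎 and scale=𝑏−𝑎 (a sketch):

```python
from scipy.stats import uniform

a, b = 0, 2  # uniform(0, 2)
U = uniform(loc=a, scale=b - a)

# P(1.225 < X < 1.235) = interval length / total length = 0.01 / 2
p = U.cdf(1.235) - U.cdf(1.225)
print(p)  # 0.005, i.e. 0.5%
```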
Expected Value and Variance
Expected Value
The expected value of 𝑋 for the uniform(𝑎, 𝑏) distribution is given by (𝑎 + 𝑏)/2
For the uniform(0,1) distribution, the expected value of 𝑋 is
𝐸(𝑋) = 1/2
Variance
The variance of 𝑋 is given by (𝑏 − 𝑎)²/12
For the uniform(0,1) distribution, the variance of 𝑋 is
𝑉𝑎𝑟(𝑋) = 1/12
• The probability density function of the length of a cutting blade is 𝑓(𝑥) = 1.25 for 74.6 ≤ 𝑥 ≤ 75.4 millimeters.
• Determine the following:
(a) P(X < 74.8) (b) P(X < 74.8 or X > 75.2)
(c) If the specifications for this process are from 74.7 to 75.3
millimeters, what proportion of blades meets specifications?
• Suppose that the current measurements in a strip of wire are assumed to follow a normal distribution with a
mean of 10 milliamperes and a variance of 4 (milliamperes)². What is the probability that a measurement
exceeds 13 milliamperes?
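A sketch of the current-measurement calculation with scipy (a variance of 4 means σ = 2):

```python
from scipy.stats import norm

# X ~ N(10, 2^2); P(X > 13) = 1 - Phi((13 - 10)/2) = 1 - Phi(1.5)
p = norm.sf(13, loc=10, scale=2)
print(p)  # ≈ 0.0668
```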

• The life of a semiconductor laser at a constant power is normally
distributed with a mean of 7000 hours and a standard deviation of 600 hours.
• (a) What is the probability that a laser fails before 5000 hours?
• (b) What is the life in hours that 95% of the lasers exceed?
• (c) If three lasers are used in a product and they are assumed to fail
independently, what is the probability that all three are still operating
after 7000 hours?
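A sketch of the three laser parts using a frozen scipy.stats.norm distribution:

```python
from scipy.stats import norm

laser = norm(loc=7000, scale=600)
p_fail = laser.cdf(5000)        # (a) P(X < 5000), about 4.3e-4
life95 = laser.ppf(0.05)        # (b) life exceeded by 95% of lasers, ≈ 6013 h
p_three = laser.sf(7000) ** 3   # (c) each laser exceeds its mean with prob 0.5, so 0.5^3
print(p_fail, life95, p_three)
```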
Expected Value and Variance
>>> from scipy.stats import uniform
>>> uniform.mean()
0.5
>>> uniform.var()
0.08333333333333333
Using Python
import numpy as np
from scipy.stats import uniform
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
x = np.linspace(0.01, 0.99, 100)
ax.plot(x, uniform.pdf(x), 'r-', lw=5, alpha=0.6, label='uniform pdf')
r = uniform.rvs(size=1000)
ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
ax.legend(loc='best', frameon=False)
plt.show()
Normal Distribution
Normal Distribution
• In a large population, many variables follow a bell-shaped relative
frequency distribution.
• These bell-shaped relative frequency distributions are symmetric and
relatively higher in the middle than at the extremes.
• Examples of such distributions are price fluctuations of a commodity in
the market, scholastic aptitude test scores, physical measurements
(height, weight, length) of an organism, etc.
• Each of these bell-shaped curves can be approximated using a normal
curve.
Normal Distribution
• The normal distribution (also known as the Gaussian distribution) is used
to approximate a large number of probability distributions, so it is the
most widely used distribution in statistics.
• In general, the notation X ∼ N(μ, σ²) identifies a normal random
variable X with mean μ and variance σ².
• The normal distribution equation for N(μ, σ²) can be written as

y = (1/(√(2π) σ)) e^{−(x−μ)²/(2σ²)}

for −∞ < x < +∞.
Normal Distribution
The equation consists of two fundamental constants: π =
3.14159265358… and the base of the natural logarithm, e =
2.7182818285…
The equation of the normal curve also consists of two parameters: μ,
the mean, and σ, the standard deviation.
The equation has the term √(2π) σ in the denominator so that the total
area under the curve is 1.
The mean, μ, can be a positive or negative real number.
μ signifies the location of the curve.
The standard deviation, σ, can only be a positive number.
The standard deviation sets the horizontal scale and measures the
spread of the distribution.
The curve is symmetric around μ. Between μ − σ and μ + σ, the curve is
concave. Beyond the inflection points μ ± σ, the curve becomes
convex.
Normal Distribution Equation
• The term e^{−(x−μ)²/(2σ²)} determines the shape of the curve.
• The term 1/(√(2π) σ) does not change the basic shape of the curve;
it only scales the curve so that the total area under it equals 1.
• If we denote k = 1/(√(2π) σ), the equation becomes y = k e^{−(x−μ)²/(2σ²)}.
• The curve y = k e^{−(x−μ)²/(2σ²)} for several values of k is as follows.
Normal Distribution Equation
• As mentioned before, the shape of the curve depends on the values
of μ and σ.
• By changing the values of μ and σ, we can alter the location and the
spread.
• Whatever values μ and σ take, we always get a bell-shaped curve
mounded around the mean.
• The peak of the normal distribution lies at the mean μ.
Normal Distribution
• By changing the value of μ, we can slide
the distribution along the x-axis.
• For example, if we increase the value of μ
by 4, the whole distribution shifts to the right
by 4 points.
• For smaller values of σ, the curve is thin
and tall, and the values are piled up
around μ.
• For higher values of σ, the values are
more dispersed around μ.
• In the limit σ² → 0, all the values in the
data set are equal to μ: X ∼ N(μ, σ²)
degenerates to the distribution of the
constant value μ, with probability one at μ.
Normal Distribution
>>> import numpy as np
>>> from scipy.stats import norm
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-7,7,100)
>>> plt.plot(x,norm.pdf(x,loc=0,scale=1),label="mu=0, sigma=1")
>>> plt.plot(x,norm.pdf(x,loc=0,scale=2),label="mu=0, sigma=2")
>>> plt.plot(x,norm.pdf(x,loc=0,scale=2/3),label="mu=0, sigma=2/3")
>>> plt.plot(x,norm.pdf(x,loc=4,scale=2/3),label="mu=4, sigma=2/3")
>>> plt.legend(loc="best")
>>> plt.show()
Standard Normal Distribution
The equation of the normal curve with mean μ and variance σ² can be written as

y = (1/(√(2π) σ)) e^{−z²/2}

where z = (x − μ)/σ shows how many standard deviations the value lies from the mean.
If the normal distribution has μ = 0 and σ = 1, we get the standard normal
distribution. The standard normal curve is described by

y = φ(z) = (1/√(2π)) e^{−z²/2}

which is known as the standard normal density function.


Standard Normal Distribution cdf
The standard normal cdf Φ(z) gives the area under the
standard normal curve to the left of the value z:

Φ(z) = ∫_{−∞}^{z} φ(y) dy

For Normal(μ, σ²), the probability between a and b is

Φ((b − μ)/σ) − Φ((a − μ)/σ)

From the symmetry of the normal curve,

Φ(−z) = 1 − Φ(z) for −∞ < z < ∞
Standard Normal Distribution
The probability of the interval (a, b) for the standard normal distribution
can be denoted by

Φ(a, b) = Φ(b) − Φ(a)

These formulas are used whenever we work with the normal distribution.
While working with the normal distribution, sketch the standard
normal curve and remember the definition of Φ(z) as the proportion
of area under the curve to the left of z.
The three most common standard normal probabilities are
Central Limit Theorem
• The appearance of the normal distribution in several
contexts can be explained through the central limit theorem.
• For independent random variables having the same distribution and finite
variance, as the number of samples tends to infinity, the distribution of the
standardized sum (or average) of the n variables approaches the standard
normal distribution.
• When we sum or average a large number of independent measurements,
each of which contributes only a small part of the total, the distribution of
the sum approaches the normal shape even if the distribution of the
individual measurements does not.
The probability within one standard deviation of the mean: Φ(−1, 1) ≈ 68.26%
The probability within two standard deviations of the mean: Φ(−2, 2) ≈ 95.44%
The probability within three standard deviations of the mean: Φ(−3, 3) ≈ 99.74%
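These three probabilities can be verified directly from the standard normal cdf:

```python
from scipy.stats import norm

# Phi(-k, k) = Phi(k) - Phi(-k) for k = 1, 2, 3 standard deviations
for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))
```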
Example
For a normal distribution X ∼ N(20, 2²), find the probability that a measurement will be less
than 23.
First, we calculate the number of standard deviations the value lies from the mean:

z = (x − μ)/σ = (23 − 20)/2 = 1.5

So x lies 1.5 standard deviations from the mean. We can find the area corresponding to z = 1.5 as

>>> import scipy.stats as st


>>> st.norm.cdf(1.5)
0.9331927987311419

The probability that a measurement is less than 23 is 0.933


Example
For a milk dairy, the daily milk production per cow has a normal distribution
X ∼ N(70, 13²) (in pounds).

• What is the probability that the milk production for a randomly chosen cow
will be less than 60 pounds?
• What is the probability that the milk production for a randomly chosen cow
will be greater than 90 pounds?
• What is the probability that the milk production for a randomly chosen cow
will be between 60 and 90 pounds?
Solution (a)
The z score is
z = (60 − 70)/13 = −0.77
>>> import scipy.stats as st
>>> st.norm.cdf(-.77)
0.2206499463

Area to the left is 0.2206. So the probability that randomly chosen cow
will be less than 60 pounds is 0.2206
Solution (b)
The z score is
z = (90 − 70)/13 = 1.54

>>> import scipy.stats as st


>>> st.norm.cdf(1.54)
0.9382198232881881

Area to the left is 0.9382. So the probability that randomly chosen cow
will be more than 90 pounds is 1 − 0.9382 = 0.0618
Solution (c)
To find the probability between 60 and 90, we have to find area
between 60 and 90.
0.9382 − 0.2206 = 0.7176
So we can say that the production for 22.06% of the cows is less than 60
pounds, for 71.76% it is between 60 and 90 pounds, and for 6.18% it is more
than 90 pounds.
Example
The score for an entrance exam follows the normal distribution X ∼ N(500, 100²).
What proportion of the students taking the exam will score below 350? Also calculate
the lower 10th percentile of all scores.

The z value is
z = (350 − 500)/100 = −1.5

>>> import scipy.stats as st


>>> st.norm.cdf(-1.5)
0.06680720126885807
Solution
So 6.68% of the students who took the exam scored less than 350.

For second part, we need to find the 𝑧 value corresponding to probability 10%

>>> import scipy.stats as st


>>> st.norm.ppf(.10)
-1.2815515655446004

The z score is −1.28

z = −1.28 = (x − 500)/100

x = −1.28(100) + 500 = 372

So the 10th percentile of the scores is 372.
Student’s t-distribution
Test scores are normally distributed with a standard deviation of 5.4.
A sample of 50 random scores was taken, with a mean of 79. Calculate a
95% confidence interval for the population mean test score.
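Since the population standard deviation (5.4) is given here, a normal-based (z) interval applies rather than a t interval; a sketch:

```python
import numpy as np
from scipy.stats import norm

xbar, sigma, n = 79, 5.4, 50
se = sigma / np.sqrt(n)                  # standard error of the mean
lo, hi = norm.interval(0.95, loc=xbar, scale=se)
print(lo, hi)                            # ≈ (77.50, 80.50)
```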
Student’s t-distribution
• The Student’s t-distribution (or t-distribution) is a member of the family
of continuous probability distributions.
• This distribution is required when the sample size is small and we do
not know the population standard deviation.
• William Sealy Gosset, writing under the pseudonym "Student", developed
the t-distribution.
• For a sample of n measurements from a normal distribution, the t-
distribution with ν = n − 1 degrees of freedom is the distribution of the
deviation of the sample mean from the true mean, divided by the sample
standard deviation over √n, i.e. t = (X̄ − μ)/(S/√n).
Z-score and t-value
If X₁, …, Xₙ are independently and identically drawn from a normally distributed population with mean μ and variance σ², i.e. X ∼ N(μ, σ²):

The sample mean is
X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ

The sample variance is
S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²

When the population standard deviation is known, the z-score is given by

z = (X̄ − μ)/(σ/√n)

With the sample standard deviation and n − 1 degrees of freedom, the t-score is given by

t = (X̄ − μ)/(S/√n)
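The t-score formula can be checked against scipy's one-sample t-test, which computes the same statistic (the sample here is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=2, size=12)   # a small sample from N(5, 2^2)

# t = (xbar - mu) / (S / sqrt(n)), with S the sample standard deviation (ddof=1)
t_manual = (x.mean() - 5) / (x.std(ddof=1) / np.sqrt(len(x)))
t_scipy, p_value = stats.ttest_1samp(x, popmean=5)
print(t_manual, t_scipy)                  # the two values agree
```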
The degree of freedom
• For a dynamic system, the number of degrees of freedom is the minimum
number of independent coordinates needed to completely specify
the position of the system.
• In statistics, the number of degrees of freedom is the number of
measurements that are free to vary in the calculation of a statistic.
• The degrees of freedom can also be defined as the number of independent
pieces of information in a sample of data that we can use to estimate a
parameter of the population from which the sample is drawn.
• For example, if we have n measurements, then for the mean we have n
independent observations and n degrees of freedom. For the variance, one
degree of freedom is lost in estimating the mean, so we have n − 1 degrees
of freedom.
Normal distribution vs t-distribution
• Initially, statistics focused on finding probabilities and inference using
large samples and the normal distribution. The standard normal
distribution gives the familiar bell-shaped probabilities for large
samples, but for small samples it understates the probability in the
tails of the distribution.
• The standard normal distribution and Student’s t-distribution are both
symmetric and have a mean of zero. The standard normal distribution is
bell-shaped and has a standard deviation of one, whereas the t-distribution
is unimodal and has a standard deviation that is not equal to one.
• The standard deviation of the t-distribution varies. For a small sample size,
the t-distribution is more peaked (leptokurtic). Compared to the standard
normal distribution, the t-distribution’s probability in the tails is higher.
In other words, the probability density of the t-distribution is lower in the
centre and heavier in the tails.
Normal distribution vs t-distribution
Following graph shows the density of t-distribution with an increase in degrees of freedom 𝜈. With the increase
in the value of 𝜈, the t-distribution becomes closer to normal distribution
>>> import numpy as np
>>> from scipy.stats import norm
>>> from scipy.stats import t
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-5,5,100)
>>> plt.plot(x,norm.pdf(x),label="Normal")
>>> plt.plot(x,t.pdf(x,1), label="T DF =1")
>>> plt.plot(x,t.pdf(x,5), label="T DF =5")
>>> plt.plot(x,t.pdf(x,10), label="T DF =10")
>>> plt.plot(x,t.pdf(x,30), label="T DF =30")
>>> plt.legend(loc="best")
>>> plt.show()
Exercise: sample mean = 82, standard deviation = 6.5, n = 100.
Find the 90% confidence interval for the population mean.
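Reading the slide as a sample mean of 82, a population standard deviation of 6.5, and n = 100 (an assumption, since the labels are terse), the 90% interval is a z-interval:

```python
import numpy as np
from scipy.stats import norm

xbar, sigma, n = 82, 6.5, 100
z = norm.ppf(0.95)                    # two-sided 90% leaves 5% in each tail
margin = z * sigma / np.sqrt(n)
print(xbar - margin, xbar + margin)   # ≈ (80.93, 83.07)
```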
Exponential Distribution
• Let’s study a continuous distribution that is closely related to the
Poisson distribution.
• For a Poisson point process, we count the number of occurrences in a given
interval.
• This count is a discrete random variable and follows the Poisson distribution.
• Along with the number of occurrences, the waiting time (or distance) between
successive occurrences is also a random variable.
• For example, the waiting time between the arrival of emails; or the waiting
time between phone calls at a telephone exchange; or the locations of road
accidents on a national highway.
Exponential Distribution
• The distance or span of time between two consecutive points is a
continuous random variable X.
• The random variable can take any positive value.
• It follows the exponential distribution.
• For the exponential distribution there is only one parameter 𝜆 > 0
• The exponential distribution is often used to model the failure of the
objects.
• The failures form a Poisson process in time and the time to next
failure is exponentially distributed.
Exponential Distribution
Exponential Distribution: The probability density function of a continuous random
variable X that follows the exponential distribution is

P(x | λ) = λ e^{−λx} for x ≥ 0, and 0 otherwise

where λ > 0 is a parameter.

Mean and Variance of the Exponential Distribution

An exponential distribution with parameter λ has
Mean: 1/λ
Variance: 1/λ²
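scipy parameterizes the exponential distribution by scale = 1/λ, so the mean and variance formulas can be checked as:

```python
from scipy.stats import expon

lam = 2.0
print(expon.mean(scale=1/lam))   # 1/lambda = 0.5
print(expon.var(scale=1/lam))    # 1/lambda^2 = 0.25
```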
Exponential Distribution using Python
>>> import numpy as np
>>> from scipy.stats import expon
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(1, 1)
>>> x = np.linspace(0,10,100)
>>> ax.plot(x,expon.pdf(x),'r-',lw=5, alpha=0.6, label='expon pdf')
>>> r=expon.rvs(size=1000)
>>> ax.hist(r, density=True, histtype='stepfilled',alpha=0.2)
>>> ax.legend()
>>> plt.show()
Exponential PDF for different lambda
from scipy.stats import expon
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
x = np.linspace(0,5,100)
scale1, scale2, scale3 = 1/0.5, 1/1.0, 1/1.5   # scipy's scale = 1/lambda
ax.plot(x,expon.pdf(x,scale=scale1), label='Lambda 0.5')
ax.plot(x,expon.pdf(x,scale=scale2), label='Lambda 1')
ax.plot(x,expon.pdf(x,scale=scale3), label='Lambda 1.5')
ax.legend()
plt.show()
• In a large corporate computer network, user log-ons to the system can be modeled as a Poisson process with a
mean of 25 log-ons per hour. What is the probability that there are no log-ons in an interval of six minutes?
• What is the probability that the time until the next log-on is between two and three minutes?
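A worked sketch: with λ = 25 log-ons per hour, the waiting time T until the next log-on is exponential, so P(T > t) = e^{−λt}:

```python
import numpy as np

lam = 25.0                       # log-ons per hour
# (a) no log-ons in 6 minutes (= 0.1 hour): P(T > 0.1) = e^{-25 * 0.1}
p_none = np.exp(-lam * 0.1)
# (b) next log-on between 2 and 3 minutes: F(3/60) - F(2/60)
p_between = np.exp(-lam * 2/60) - np.exp(-lam * 3/60)
print(p_none, p_between)         # ≈ 0.082 and ≈ 0.148
```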
Beta family of distribution
The beta(α, β) distribution is another important distribution that is used for a
continuous random variable x in the range 0 ≤ x ≤ 1. The beta
distribution has a probability density function of the form

P(x | α, β) = (Γ(α + β)/(Γ(α) Γ(β))) x^{α−1} (1 − x)^{β−1}

where α > 0 and β > 0.
The shape of the curve is determined by the factor x^{α−1} (1 − x)^{β−1}.
The constant Γ(α + β)/(Γ(α) Γ(β)) is required to make the curve a probability
density function.
The curve can take different shapes depending on the values of α and β, which
makes beta(α, β) a family of distributions. The uniform(0,1) distribution is a
special case of beta(α, β) with α = 1 and β = 1.
Exercise: A biased coin is flipped 50 times, giving heads 30 times and
tails 20 times. Plot the resulting beta distribution.
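One reading of this exercise: starting from a uniform beta(1,1) prior for the probability of heads (an assumption about the intended prior), 30 heads and 20 tails give a beta(31, 21) posterior:

```python
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

a, b = 30 + 1, 20 + 1               # beta(1,1) prior + 30 heads, 20 tails
x = np.linspace(0, 1, 200)
plt.plot(x, beta.pdf(x, a, b), label="beta(31, 21)")
plt.axvline(beta.mean(a, b), linestyle="--", label="posterior mean")
plt.legend(loc="best")
plt.show()
```

The posterior mean a/(a + b) = 31/52 ≈ 0.60 sits close to the observed proportion of heads, 30/50.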


Beta family of distribution
• The figure shows the beta distribution
beta(a, b) for the different values
of a, b that the beta family can
take.
• For a < b, the density has more
weight in the lower half.
• When a > b, the opposite is true.
• For a = b, the beta(a, b) density is
symmetric.
• For a = 1/2, more weight is given
towards zero, and for b = 1/2, more
weight is given towards one.
• For a = 1, b = 1, you can see the
uniform distribution.
Beta Distribution using Python
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3, ax4), (ax5, ax6, ax7, ax8), (ax9, ax10, ax11, ax12),(ax13, ax14, ax15,
ax16)) = plt.subplots(4, 4)
fig.set_size_inches(10,7.5)
x = np.linspace(0,1,100)
ax1.plot(x,beta.pdf(x, 0.5, 0.5))
ax2.plot(x,beta.pdf(x, 0.5, 1.0))
ax3.plot(x,beta.pdf(x, 0.5, 2.0))
ax4.plot(x,beta.pdf(x, 0.5, 3.0))
ax5.plot(x,beta.pdf(x, 1.0, 0.5))
Beta Distribution using Python
ax6.plot(x,beta.pdf(x, 1.0, 1.0))
ax7.plot(x,beta.pdf(x, 1.0, 2.0))
ax8.plot(x,beta.pdf(x, 1.0, 3.0))
ax9.plot(x,beta.pdf(x, 2.0, 0.5))
ax10.plot(x,beta.pdf(x, 2.0, 1.0))
ax11.plot(x,beta.pdf(x, 2.0, 2.0))
ax12.plot(x,beta.pdf(x, 2.0, 3.0))
ax13.plot(x,beta.pdf(x, 3.0, 0.5))
ax14.plot(x,beta.pdf(x, 3.0, 1.0))
ax15.plot(x,beta.pdf(x, 3.0, 2.0))
ax16.plot(x,beta.pdf(x, 3.0, 3.0))
Beta Distribution using Python
ax1.set_title("beta(0.5,0.5)",fontsize=10)
ax2.set_title("beta(0.5,1.0)",fontsize=10)
ax3.set_title("beta(0.5,2.0)",fontsize=10)
ax4.set_title("beta(0.5,3.0)",fontsize=10)
ax5.set_title("beta(1.0,0.5)",fontsize=10)
ax6.set_title("beta(1.0,1.0)",fontsize=10)
ax7.set_title("beta(1.0,2.0)",fontsize=10)
ax8.set_title("beta(1.0,3.0)",fontsize=10)
ax9.set_title("beta(2.0,0.5)",fontsize=10)
ax10.set_title("beta(2.0,1.0)",fontsize=10)
ax11.set_title("beta(2.0,2.0)",fontsize=10)
ax12.set_title("beta(2.0,3.0)",fontsize=10)
ax13.set_title("beta(3.0,0.5)",fontsize=10)
ax14.set_title("beta(3.0,1.0)",fontsize=10)
ax15.set_title("beta(3.0,2.0)",fontsize=10)
ax16.set_title("beta(3.0,3.0)",fontsize=10)
fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Beta Distribution Mean and Variance
A beta distribution with parameters a, b has

Mean: a/(a + b)
Variance: ab/((a + b)² (a + b + 1))
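These formulas match scipy's beta distribution, e.g. for a = 2, b = 3:

```python
from scipy.stats import beta

a, b = 2.0, 3.0
print(beta.mean(a, b))   # a/(a + b) = 0.4
print(beta.var(a, b))    # ab/((a + b)^2 (a + b + 1)) = 6/150 = 0.04
```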
Gamma family of distributions
The gamma(a, b) distribution is used for a non-negative continuous random
variable 0 ≤ x < ∞. The two parameters satisfy a > 0, b > 0:

P(x | a, b) = (b^a / Γ(a)) x^{a−1} e^{−bx}

The shape of the curve is given by x^{a−1} e^{−bx}. The constant b^a/Γ(a) is
required to make the curve a probability density.

In scipy, the parameters a, b are given as

y1 = stats.gamma.pdf(x, a, scale=1/b)
Gamma family of distributions - Python
import numpy as np
from scipy.stats import gamma
import matplotlib.pyplot as plt
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6), (ax7, ax8, ax9)) = plt.subplots(3, 3)
fig.set_size_inches(10,7.5)
x = np.linspace(0,10,1000)
ax1.plot(x,gamma.pdf(x, 1.0, scale=1/1))
ax2.plot(x,gamma.pdf(x, 1.0, scale=1/5))
ax3.plot(x,gamma.pdf(x, 1.0, scale=1/15))
ax4.plot(x,gamma.pdf(x, 5.0, scale=1/1))
ax5.plot(x,gamma.pdf(x, 5.0, scale=1/5))
ax6.plot(x,gamma.pdf(x, 5.0, scale=1/15))
ax7.plot(x,gamma.pdf(x, 15.0, scale=1/1))
ax8.plot(x,gamma.pdf(x, 15.0, scale=1/5))
ax9.plot(x,gamma.pdf(x, 15.0, scale=1/15))
Gamma family of distributions - Python
ax1.set_title("gamma(1,1)",fontsize=10)
ax2.set_title("gamma(1,5)",fontsize=10)
ax3.set_title("gamma(1,15)",fontsize=10)
ax4.set_title("gamma(5,1)",fontsize=10)
ax5.set_title("gamma(5,5)",fontsize=10)
ax6.set_title("gamma(5,15)",fontsize=10)
ax7.set_title("gamma(15,1)",fontsize=10)
ax8.set_title("gamma(15,5)",fontsize=10)
ax9.set_title("gamma(15,15)",fontsize=10)

fig.subplots_adjust(hspace=0.7)
fig.subplots_adjust(wspace=0.7)
plt.show()
Gamma family of distributions
Mean and Variance of gamma(a, b)

Mean: a/b
Variance: a/b²
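With scipy's scale = 1/b parameterization, these can be checked for, say, a = 5, b = 2:

```python
from scipy.stats import gamma

a, b = 5.0, 2.0
print(gamma.mean(a, scale=1/b))   # a/b = 2.5
print(gamma.var(a, scale=1/b))    # a/b^2 = 1.25
```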
Chi-Square distributions
The chi-square χ²(r) distribution with r degrees of freedom is a special
case of the gamma distribution with a = r/2, b = 1/2:

P(x | r) = (1/(Γ(r/2) 2^{r/2})) x^{r/2 − 1} e^{−x/2}, 0 < x < ∞

Mean and Variance of the Chi-Square Distribution

Mean: r
Variance: 2r
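A quick check of the mean and variance with scipy, e.g. for r = 5:

```python
from scipy.stats import chi2

r = 5
print(chi2.mean(r))   # mean = r = 5
print(chi2.var(r))    # variance = 2r = 10
```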
Chi-Square distributions
import numpy as np
from scipy.stats import chi2
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 1)
x = np.linspace(0,10,1000)
ax.plot(x,chi2.pdf(x, 2),label="r=2")
ax.plot(x,chi2.pdf(x, 3),label="r=3")
ax.plot(x,chi2.pdf(x, 5),label="r=5")
ax.plot(x,chi2.pdf(x, 8),label="r=8")
ax.legend()
plt.show()
Joint Probability Distribution
• The joint probability distribution of two continuous random variables
𝑋 and 𝑌 can be specified by providing a method for calculating the
probability that 𝑋 and 𝑌 assume a value in any region 𝑅 of two-
dimensional space.
• The joint probability density function is defined over two-dimensional
space.
• The probability that (𝑋, 𝑌) assumes a value in region 𝑅 can be
calculated using double integral of 𝑓𝑋𝑌 (𝑥, 𝑦) over a region 𝑅.
• The integral can be interpreted as the volume under the surface over
the region 𝑅
Joint Probability Distribution
The joint probability density function for the continuous random
variables X and Y, denoted f_XY(x, y), satisfies the following properties:
1. f_XY(x, y) ≥ 0 for all x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) dx dy = 1
3. For any region R of two-dimensional space,
P((X, Y) ∈ R) = ∫∫_R f_XY(x, y) dx dy
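Property 2 and probabilities over rectangular regions can be evaluated numerically with scipy's dblquad. As a hypothetical example (not from the slides), take f_XY(x, y) = 6 e^{−2x−3y} for x, y ≥ 0, the joint density of two independent exponentials:

```python
import numpy as np
from scipy.integrate import dblquad

# dblquad integrates func(y, x) over x in [a, b] and y in [gfun, hfun]
f = lambda y, x: 6 * np.exp(-2 * x - 3 * y)

total, _ = dblquad(f, 0, np.inf, 0, np.inf)   # property 2: integrates to 1
p, _ = dblquad(f, 0, 1, 0, 1)                 # P(X < 1, Y < 1)
print(total, p)
```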
Quantile-Quantile Plot
• We use the Quantile-Quantile (Q-Q) plot to compare two probability
distributions by plotting their quantiles against each other.
• This is a technique to check whether two sets of sample points follow the
same distribution.
• We use Q-Q plots to check whether a sample of data follows a particular
distribution.
• In this scenario, one distribution is known and the other is
unknown.
• If the unknown distribution follows the given distribution, we get a
scatter plot in which the data points lie on a straight line.
• If the distributions are identical, the quantiles should be approximately
equal.
Quantile-Quantile Plot
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
nsample = 100
np.random.seed(7654321)

# t-plot for small degree of freedom


ax1 = plt.subplot(221)
x = stats.t.rvs(3, size=nsample)
res = stats.probplot(x, plot=plt)

# t-plot for large degree of freedom


ax2 = plt.subplot(222)
x = stats.t.rvs(25, size=nsample)
res = stats.probplot(x, plot=plt)
Quantile-Quantile Plot
# Two normal distributions with broadcasting
ax3 = plt.subplot(223)
x = stats.norm.rvs(loc=[0,5], scale=[1,1.5], size=(nsample//2,2)).ravel()
res = stats.probplot(x, plot=plt)
# Standard normal distribution
ax4 = plt.subplot(224)
x = stats.norm.rvs(loc=0, scale=1, size=nsample)
res = stats.probplot(x, plot=plt)
plt.show()
Quantile-Quantile Plot
# Loggamma distribution
fig = plt.figure()
ax = fig.add_subplot(111)
x = stats.loggamma.rvs(c=2.5, size=500)
res = stats.probplot(x, dist=stats.loggamma, sparams=(2.5,), plot=ax)
ax.set_title("Probplot for loggamma dist")
plt.show()
Thanks
Samatrix Consulting Pvt Ltd
