
Distributions

Motivation
• We shall look at certain types of random variables that occur over and over again
• The probability distributions of such random variables are given special names
• Here, we shall look at some such random variables, namely
• Bernoulli Random Variable
• Binomial Random Variable
• Poisson Random Variable
• Normal Random Variable
Bernoulli Random Variable
• This random variable is associated with experiments where only two outcomes are present
• We call these outcomes "success" and "failure"
• If the probability of "success" in the experiment is p, then the pmf of the random variable can be written as:

P(X = 1) = p
P(X = 0) = 1 − p

• Such a random variable is known as a Bernoulli random variable, after the Swiss mathematician James Bernoulli
• The expected value of a Bernoulli random variable is the probability that it takes the value 1, i.e., E[X] = p
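As a quick numerical illustration (a minimal sketch using scipy.stats; the value p = 0.3 is an arbitrary choice, not from the slides):

```python
from scipy import stats

p = 0.3
X = stats.bernoulli(p)   # Bernoulli random variable with P(X = 1) = p

print(X.pmf(1))   # P(X = 1) = p = 0.3
print(X.pmf(0))   # P(X = 0) = 1 - p = 0.7
print(X.mean())   # E[X] = p
print(X.var())    # Var(X) = p(1 - p) = 0.21
```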
Binomial Random Variable
• Suppose X represents the number of successes in n independent trials of this experiment, each having a constant probability of success p. Then X is said to be a binomial random variable with parameters (n, p).

• The probability mass function of this random variable X can be given as:

P(X = i) = C(n, i) p^i (1 − p)^(n−i),  i = 0, 1, …, n

where C(n, i) = n!/(i!(n − i)!) is the binomial coefficient.
Example
• It is known that disks produced by a certain company will be defective
with probability .01 independently of each other. The company sells
the disks in packages of 10 and offers a money-back guarantee that at
most 1 of the 10 disks is defective. What proportion of packages is
returned? If someone buys three packages, what is the probability
that exactly one of them will be returned?
Solution
Let X be the number of defective disks in a package; then X is binomial with parameters (10, 0.01), and a package is returned if more than one disk is defective:

P(X > 1) = 1 − P(X ≤ 1)
= 1 − [P(X = 0) + P(X = 1)]
= 1 − [C(10, 0)(0.99)^10 + C(10, 1)(0.01)(0.99)^9] ≈ 0.0043

The number of returned packages among the three bought is binomial with parameters (3, 0.0043), so the probability that exactly one is returned is

C(3, 1)(0.0043)(0.9957)^2 ≈ 0.0127
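The same numbers can be checked with scipy.stats (a minimal sketch, not part of the slides):

```python
from scipy import stats

# P(package returned) = P(more than 1 of 10 disks defective)
p_return = 1 - stats.binom.cdf(1, 10, 0.01)
print(p_return)                          # ~0.0043

# P(exactly one of three packages is returned)
print(stats.binom.pmf(1, 3, p_return))   # ~0.0127
```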
Mean and Variance
The n independent trials can be represented using n Bernoulli random variables, as below:

Xᵢ = 1 if the i-th trial is a success, and Xᵢ = 0 otherwise

In the above case,

E[Xᵢ] = p
Var(Xᵢ) = E[Xᵢ²] − p² = p − p² = p(1 − p)

(using Xᵢ² = Xᵢ, since Xᵢ is 0 or 1)
• The binomial random variable can be expressed as a sum of the n Bernoulli random variables, that is,

X = Σ_{i=1}^{n} Xᵢ

Therefore,

E[X] = Σ_{i=1}^{n} E[Xᵢ] = np

Var(X) = Σ_{i=1}^{n} Var(Xᵢ) = np(1 − p)
Binomial Distribution Function

P(X ≤ i) = Σ_{k=0}^{i} C(n, k) p^k (1 − p)^(n−k)

To compute this sum efficiently, note that

P(X = k) = C(n, k) p^k (1 − p)^(n−k)
P(X = k + 1) = C(n, k + 1) p^(k+1) (1 − p)^(n−k−1)

Taking the ratio of the above two equations, we get

P(X = k + 1) = [p/(1 − p)] · [(n − k)/(k + 1)] · P(X = k)

The above equation can be used to calculate the distribution function, as in the sketch below.
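A minimal sketch of this recursion (plain Python, not from the slides); it reproduces the binomial CDF used in the disk example above:

```python
def binom_cdf(i, n, p):
    """P(X <= i) for X ~ Binomial(n, p), built up via the pmf recursion."""
    pmf = (1 - p) ** n                            # P(X = 0)
    total = pmf
    for k in range(i):
        pmf *= (p / (1 - p)) * (n - k) / (k + 1)  # P(X = k+1) from P(X = k)
        total += pmf
    return total

print(binom_cdf(1, 10, 0.01))   # ~0.99573, so 1 - 0.99573 ~ 0.0043 as above
```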


Poisson Random Variable
• A random variable X taking values 0,1,2… is said to be a Poisson
random variable with parameter 𝜆, 𝜆 > 0, if the probability mass
function of X is of the form:

P(X = i) = e^(−λ) λ^i / i!,  i = 0, 1, …
Probability Distribution Function, Mean and Variance
The pmf sums to 1:

Σ_{i=0}^{∞} p(i) = e^(−λ) Σ_{i=0}^{∞} λ^i/i! = e^(−λ) e^λ = 1

The moment generating function is

φ(t) = E[e^(tX)] = Σ_{i=0}^{∞} e^(ti) e^(−λ) λ^i/i! = e^(−λ) Σ_{i=0}^{∞} (λe^t)^i/i! = e^(−λ) e^(λe^t) = exp{λ(e^t − 1)}

• After differentiation we get,

φ′(t) = λe^t exp{λ(e^t − 1)}
φ″(t) = (λe^t)² exp{λ(e^t − 1)} + λe^t exp{λ(e^t − 1)}

• Putting t = 0, we get

φ′(0) = E[X] = λ
Var(X) = φ″(0) − (E[X])² = (λ² + λ) − λ² = λ
Poisson Distribution Function
• If X is a Poisson random variable with mean λ, then

P(X = i + 1)/P(X = i) = [e^(−λ) λ^(i+1)/(i + 1)!] / [e^(−λ) λ^i/i!] = λ/(i + 1)
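The recursion P(X = i + 1) = λ/(i + 1) · P(X = i) gives a simple way to compute the Poisson distribution function (a sketch, not from the slides):

```python
import math

def poisson_cdf(i, lam):
    """P(X <= i) for X ~ Poisson(lam), via the pmf recursion."""
    pmf = math.exp(-lam)      # P(X = 0) = e^(-lam)
    total = pmf
    for k in range(i):
        pmf *= lam / (k + 1)  # P(X = k+1) from P(X = k)
        total += pmf
    return total

print(poisson_cdf(0, 3))   # e^(-3) ~ 0.0498, used in the accident example below
```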
Approximation of Binomial Random Variable
• The Poisson random variable is useful because it can be used as an approximation to a binomial random variable with parameters (n, p) when n is large and p is small
• The parameter of the approximating Poisson random variable in this case is λ = np
Example
• Suppose that the average number of accidents occurring weekly on a
particular stretch of a highway equals 3. Calculate the probability that
there is at least one accident this week.
• Let X = number of accidents in a week; then X is Poisson with λ = 3

P(X ≥ 1) = 1 − P(X = 0) = 1 − e^(−λ)λ⁰/0! = 1 − e^(−3) ≈ 0.9502
Example
• Suppose the probability that an item produced by a certain machine
will be defective is .1. Find the probability that a sample of 10 items
will contain at most one defective item. Assume that the quality of
successive items is independent.
• Exact (binomial): P(X ≤ 1) = P(X = 0) + P(X = 1) = C(10, 0)(0.9)^10 + C(10, 1)(0.1)(0.9)^9 ≈ 0.7361

• Poisson approximation (λ = np = 1): P(X ≤ 1) = P(X = 0) + P(X = 1) = e^(−λ) + λe^(−λ) = e^(−λ)(1 + λ) = 2e^(−1) ≈ 0.7358
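Comparing the exact binomial answer with the Poisson approximation numerically (a sketch using scipy.stats, not from the slides):

```python
from scipy import stats

n, p = 10, 0.1
exact = stats.binom.cdf(1, n, p)       # exact binomial P(X <= 1)
approx = stats.poisson.cdf(1, n * p)   # Poisson approximation, lambda = np = 1
print(exact, approx)                   # ~0.7361 vs ~0.7358
```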


Normal Random Variables
• A random variable is said to be normally distributed with parameters μ and σ², and we write X ~ N(μ, σ²), if its density is

f(x) = (1/(√(2π)σ)) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞

• The density is a bell-shaped curve that is symmetric about the mean μ = E[X] and attains its maximum value 1/(σ√(2π)) ≈ 0.399/σ at x = μ
Why is Normal Distribution important?
• It was first used to approximate probabilities associated with a binomial random variable when the parameter n is very large
• Later, this result was extended, and it was found that many random phenomena obey, at least approximately, a normal probability distribution
• Examples: the height of a person, the error made in a measurement, the velocity of a molecule in any direction, etc.
Mean and Variance
• The mean and variance of the normal random variable can be calculated with the help of the moment generating function
• We get the values as:

E[X] = φ′(0) = μ
E[X²] = φ″(0) = σ² + μ²

Therefore,

Mean = E[X] = μ
Variance = E[X²] − (E[X])² = σ²
Standard Normal Random Variable
• If X is a normal random variable with mean μ and variance σ², and we define another random variable Y = αX + β, then the following hold:
1. Y is also a normal random variable
2. Y has mean αμ + β and variance α²σ²

• The above result can be used to define a special type of normal random variable, known as the standard normal variable Z, which has mean 0 and variance 1, in the following manner:

Z = (X − μ)/σ = X/σ − μ/σ

• Here α = 1/σ and β = −μ/σ
• ⇒ αμ + β = 0, that is, the mean of the standard normal random variable is zero
• α²σ² = (1/σ²) × σ² = 1, that is, the variance of the standard normal random variable is 1
Distribution Function
• The distribution function of the standard or unit normal distribution is given as:

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^(−y²/2) dy,  −∞ < x < ∞
• The conversion to the standard normal variable allows us to write probability values for X in terms of Z
• For example, if we want to find the probability P(X < b), note that X < b holds if and only if (X − μ)/σ < (b − μ)/σ
• Therefore, the probability can be expressed as:

P(X < b) = P((X − μ)/σ < (b − μ)/σ) = P(Z < (b − μ)/σ) = Φ((b − μ)/σ)

• Similarly, P(a < X < b) can be written as:

P(a < X < b) = P((a − μ)/σ < Z < (b − μ)/σ) = Φ((b − μ)/σ) − Φ((a − μ)/σ)

• Thus, if we know the values of Φ(x) for different values of x, probabilities for any normal random variable can be calculated
• Values of Φ(x), calculated to some pre-defined accuracy, are tabulated in most texts, and we can use them to calculate probabilities for a normal random variable X
• Further, the probability Φ(−x) can be calculated by using the symmetry of the normal distribution about the mean
• We have,

Φ(−x) = P(Z < −x) = P(Z > x)  (by symmetry)  = 1 − Φ(x)
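In code, Φ is available as the standard normal CDF, and the standardization and symmetry identities can be checked directly (a sketch using scipy.stats, not from the slides):

```python
from scipy import stats

Phi = stats.norm.cdf   # standard normal distribution function

def normal_prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), via standardization."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

# Symmetry check: Phi(-x) = 1 - Phi(x)
print(Phi(-0.5), 1 - Phi(0.5))   # both ~0.3085
```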
Example
• If X is a normal random variable with mean μ = 3 and variance σ² = 16 (so σ = 4), find
• P(X < 11)
• P(X > −1)
• P(2 < X < 7)

P(X < 11) = P((X − 3)/4 < (11 − 3)/4) = P(Z < 2) = Φ(2) ≈ 0.9772

P(X > −1) = P((X − 3)/4 > (−1 − 3)/4) = P(Z > −1) = P(Z < 1)  (by symmetry)  = Φ(1) ≈ 0.8413

P(2 < X < 7) = P((2 − 3)/4 < (X − 3)/4 < (7 − 3)/4) = P(−0.25 < Z < 1)
= Φ(1) − Φ(−0.25) = Φ(1) − 1 + Φ(0.25) ≈ 0.8413 − 1 + 0.5987 = 0.4400
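The three answers can be verified with scipy.stats (a sketch, not from the slides):

```python
from scipy import stats

X = stats.norm(loc=3, scale=4)   # mu = 3, sigma = 4 (variance 16)

print(X.cdf(11))                 # P(X < 11)    ~0.9772
print(X.sf(-1))                  # P(X > -1)    ~0.8413
print(X.cdf(7) - X.cdf(2))       # P(2 < X < 7) ~0.4400
```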


Example
• The power W dissipated in a resistor is proportional to the square of
the voltage V. That is,
W = rV²
• where, r is a constant. If r = 3, and V can be assumed (to a very good
approximation) to be a normal random variable with mean 6 and
standard deviation 1, find
• E[W]
• P{W > 120}
Solution
E[W] = E[3V²] = 3E[V²] = 3(Var(V) + (E[V])²) = 3(1 + 36) = 111

P(W > 120) = P(3V² > 120) = P(V² > 40)
= P(V > √40)  (the event V < −√40 has negligible probability for V ~ N(6, 1))
= P(V − 6 > √40 − 6)
= P(Z > 0.3246)
= 1 − Φ(0.3246) ≈ 0.3727
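A quick Monte Carlo check of both answers (a sketch; the sample size and seed are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.normal(6, 1, size=1_000_000)   # V ~ N(6, 1)
W = 3 * V**2                           # W = rV^2 with r = 3

print(W.mean())          # ~111    = E[W]
print((W > 120).mean())  # ~0.3727 = P(W > 120)
```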
Sum of Normal Random Variables
• The sum of independent normal random variables is also a normal random variable
• Its mean and variance are:

μ = Σ_{i=1}^{n} μᵢ  and  σ² = Σ_{i=1}^{n} σᵢ²
Chi-Square Distribution
• If Z1, Z2, …, Zn are n independent standard normal random variables, then X, defined by

X = Z1² + Z2² + ⋯ + Zn²

is said to have a chi-square distribution with n degrees of freedom, and is denoted as

X ~ χ²_n
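A simulation sketch (not from the slides; n and the sample size are arbitrary choices) illustrating the definition: summing n squared standard normals produces a chi-square random variable with n degrees of freedom, whose mean is n:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5
Z = rng.standard_normal((100_000, n))   # 100,000 draws of (Z_1, ..., Z_n)
X = (Z**2).sum(axis=1)                  # X = Z_1^2 + ... + Z_n^2

print(X.mean(), stats.chi2(n).mean())   # both ~5
```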
Distribution of Sampling Statistics
• In order to draw conclusions about the population from samples, we assume that certain relations hold between the population and the sample
• One such assumption is that there exists an underlying probability distribution of the population, such that the measurable values from the population can be thought of as independent random variables having this distribution
• If the sample data are chosen randomly from this distribution, then we can assume that the samples are also independent random variables
Definition
• If X1, X2, … Xn are independent random variables having a common
distribution F, then we say that they constitute a sample from the
distribution F
• If the form of the underlying distribution is known and we are only interested in estimating its parameters, then the inference process is known as parametric inference
• If neither the parameters nor the form of the distribution is known, we call the inference non-parametric
Statistic
• A statistic is a random variable whose value is determined by the
sample data
• We are interested in finding probability distributions of certain
statistics
• Here, we shall discuss mean and variance
Sample Mean
• Suppose we are obtaining data regarding some numerical quantity of the population, such as height, age, annual income, etc.
• Then, the value obtained for any element of the population may be regarded as the value of a random variable with mean μ and variance σ², where μ and σ² are the population mean and population variance
• Let X1, X2, …, Xn be a sample of values from this population; then the sample mean is defined as

X̄ = (X1 + X2 + ⋯ + Xn)/n
Mean and Variance
• Since the sample mean as defined is also a random variable, we can calculate its mean and variance

E[X̄] = E[(X1 + X2 + ⋯ + Xn)/n] = E[X1 + X2 + ⋯ + Xn]/n = nμ/n = μ

Var(X̄) = Var((X1 + X2 + ⋯ + Xn)/n) = (1/n²)[Var(X1) + Var(X2) + ⋯ + Var(Xn)]  (because of independence)
= nσ²/n² = σ²/n

• Thus, the variability of the sample mean decreases as the size of the sample increases; see the sketch below
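A simulation sketch (not from the slides; the population and sample sizes are arbitrary choices) showing Var(X̄) = σ²/n shrinking with n:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0                            # population variance (sigma = 2)

for n in (10, 100, 1000):
    # 20,000 independent samples of size n; compute each sample mean
    means = rng.normal(0, 2, size=(20_000, n)).mean(axis=1)
    print(n, means.var(), sigma2 / n)   # empirical Var(Xbar) vs sigma^2/n
```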


The Central Limit Theorem
• Let X1, X2, …, Xn be a sequence of independent and identically distributed random variables, each having mean μ and variance σ². Then for large n, the distribution of

X = X1 + X2 + ⋯ + Xn

is approximately normal with mean nμ and variance nσ²
• This implies that the quantity

(X1 + X2 + ⋯ + Xn − nμ)/(σ√n)

is approximately a standard normal random variable. Thus, for large values of n,

P((X1 + X2 + ⋯ + Xn − nμ)/(σ√n) < x) ≈ P(Z < x)
Example
• An insurance company has 25,000 automobile policy holders. If the yearly claim of a policy holder is a random variable with mean 320 and standard deviation 540, approximate the probability that the total yearly claim exceeds 8.3 million.

Let X be the total yearly claim. By the central limit theorem, X is approximately normal with mean 25000 × 320 = 8 × 10⁶ and standard deviation 540√25000, so

P(X > 8.3 × 10⁶) = P((X − 25000 × 320)/(540√25000) > (8.3 × 10⁶ − 25000 × 320)/(540√25000))
= P(Z > 3.51) ≈ 0.00022
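The same calculation in code (a sketch using scipy.stats, not from the slides):

```python
import math
from scipy import stats

n, mu, sigma = 25_000, 320, 540
z = (8.3e6 - n * mu) / (sigma * math.sqrt(n))
print(z, stats.norm.sf(z))   # z ~ 3.51, P(Z > z) ~ 0.00022
```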


Binomial Approximation
• We now have two possible approximations for a binomial random variable:
• The Poisson approximation holds when n is large and p is small
• The normal approximation holds when np(1 − p) is large
• As a rule of thumb, the normal approximation is good when np(1 − p) ≥ 10
Approximate Distribution of Sample Mean
• Since a constant times a normal random variable is also a normal random variable, we can use the central limit theorem to conclude that the sample mean is approximately a normal random variable.
• Thus,

(X̄ − μ)/(σ/√n)

has an approximate standard normal distribution
Example
• The weights of a population of workers have mean 167 and standard
deviation 27. If a sample of 36 workers is chosen, approximate the
probability that the sample mean of their weights lies between 163
and 170.
P(163 < X̄ < 170) = ?

μ = 167, σ = 27, n = 36, so σ/√n = 27/√36 = 4.5

P(163 < X̄ < 170) = P((163 − 167)/4.5 < Z < (170 − 167)/4.5) = P(−0.89 < Z < 0.67)
= Φ(0.67) − Φ(−0.89) ≈ 0.7486 − 0.1867 ≈ 0.56
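A quick check of the same computation (a sketch using scipy.stats, not from the slides):

```python
from scipy import stats

mu, sigma, n = 167, 27, 36
se = sigma / n**0.5                   # sigma/sqrt(n) = 4.5
Xbar = stats.norm(mu, se)             # approximate distribution of the sample mean
print(Xbar.cdf(170) - Xbar.cdf(163))  # P(163 < Xbar < 170) ~ 0.56
```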
How large a sample is needed?
• In practice, no matter how non-normal the underlying population distribution is, the sample mean of a sample of size at least 30 will be approximately normal
• For most populations, the normal approximation is valid for even smaller sample sizes
Sample Variance
• We already know that the sample variance is denoted by the statistic S²:

S² = Σ_{i=1}^{n} (Xᵢ − X̄)²/(n − 1)

• It can be shown that E[S²] = σ²
Parameter Estimation
• Let X1, X2, …, Xn be a random sample from a distribution 𝐹𝜃 that is
specified up to a vector of unknown parameters 𝜃.
• The central problem is to make inferences about the unknown parameters
• One such method is maximum likelihood estimation, which gives point estimates of the unknown parameters
• We shall also look at an example of an interval estimate
Maximum Likelihood Estimators
• Any statistic used to estimate the value of an unknown parameter θ is said to be an estimator of θ.
• An estimate is the observed value of the estimator
• Let X1, X2, …, Xn be random variables whose joint distribution is known except for an unknown parameter θ; if they are independent, the joint density factors as
• f(x1, x2, …, xn) = f_X1(x1) f_X2(x2) … f_Xn(xn)
• f(x1, x2, …, xn | θ) represents the likelihood that the values x1, x2, …, xn will be observed when θ is the true value of the parameter
• The MLE θ̂ is the value of θ that maximizes f(x1, x2, …, xn | θ), where x1, x2, …, xn are the observed values
• f(x1, x2, …, xn | θ) is called the likelihood function
MLE Estimator of the Bernoulli Parameter
• Let us consider n independent trials of an experiment, each with a probability p of success; each observation then has pmf

P(X = x) = p^x (1 − p)^(1−x),  x = 0, 1

• The likelihood of observing x1, x2, …, xn is therefore p^(Σxᵢ)(1 − p)^(n − Σxᵢ), and maximizing its logarithm over p gives the MLE p̂ = Σ_{i=1}^{n} xᵢ/n, the observed fraction of successes
MLE Estimator of a Poisson Parameter
• Suppose X1, …, Xn are independent Poisson random variables, each having mean λ
• A similar calculation shows that the MLE of λ is the sample mean, λ̂ = Σ_{i=1}^{n} xᵢ/n
Example
• The number of traffic accidents in Berkeley, California, in 10 randomly
chosen non rainy days in 1998 is as follows:
4,0,6,5,2,1,2,0,4,3

• Use these data to estimate the proportion of non rainy days that had
2 or fewer accidents that year.
• Soln: First estimate the mean and then use it to calculate the
probability P{X<=2}
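A numerical check (a sketch using scipy.stats, not from the slides):

```python
import numpy as np
from scipy import stats

data = [4, 0, 6, 5, 2, 1, 2, 0, 4, 3]
lam_hat = np.mean(data)                # MLE of lambda: sample mean = 2.7
print(stats.poisson.cdf(2, lam_hat))   # P(X <= 2) ~ 0.494
```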
MLE of Normal Population
• The MLE estimators of μ and σ are given by

μ̂ = X̄  and  σ̂ = [Σ_{i=1}^{n} (Xᵢ − X̄)²/n]^(1/2)

• It should be noted that the MLE estimator of the standard deviation σ differs from the sample standard deviation S:

S = [Σ_{i=1}^{n} (Xᵢ − X̄)²/(n − 1)]^(1/2)

• However, for n of reasonable size, these two estimators of σ are approximately equal
Interval Estimates
• Suppose the population standard deviation σ is known, while the mean is to be estimated by the sample mean X̄
