Pattern - Recognition - Module - 2 Notes
where $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
Definition 29. The Cumulative Distribution Function (CDF) of the Binomial Distribution is
$P(X \le x) = \sum_{k=0}^{x} \binom{n}{k} p^{k} (1-p)^{n-k}$
$\mu = E(X) = np$
$\sigma^2 = \mathrm{Var}(X) = np(1-p)$
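The definition can be checked numerically. Below is a minimal Python sketch that builds the CDF by summing the PMF and reports the mean and variance; the function names `binom_pmf`, `binom_cdf` and `binom_mean_var` are illustrative, not from any library.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x): sum of the PMF for k = 0, 1, ..., x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

def binom_mean_var(n: int, p: float) -> tuple[float, float]:
    """Mean np and variance np(1 - p)."""
    return n * p, n * p * (1 - p)

# Usage with the parameters of the blood-disease example (n = 15, p = 0.4).
n, p = 15, 0.4
print(binom_cdf(5, n, p))        # P(X <= 5)
print(binom_mean_var(n, p))      # (mean, variance)
```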
Example: The probability that a certain kind of component will survive a shock test is 3/4. Find
the probability that exactly 2 of the next 4 components tested survive.
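Solution: Assuming that the tests are independent and p = 3/4 for each of the 4 tests, we obtain
$P(X = 2) = \binom{4}{2}\left(\frac{3}{4}\right)^{2}\left(\frac{1}{4}\right)^{2} = \frac{27}{128} \approx 0.211$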
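A minimal sketch that evaluates the shock-test probability exactly, using Python's `fractions` module so no rounding is involved:

```python
from fractions import Fraction
from math import comb

# Shock-test example: n = 4 independent tests, p = 3/4, exactly k = 2 survive.
n, k, p = 4, 2, Fraction(3, 4)
prob = comb(n, k) * p**k * (1 - p) ** (n - k)
print(prob, float(prob))  # 27/128 0.2109375
```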
Example: The probability that a patient recovers from a rare blood disease is 0.4. If 15 people are
known to have contracted this disease, what is the probability that
• The number of outcomes occurring in one time interval or specified region of space is independent of the number that occur in any other disjoint time interval or region. In this sense we say that the Poisson process has no memory.
• The probability that a single outcome will occur during a very short time interval or in a small
region is proportional to the length of the time interval or the size of the region and does not
depend on the number of outcomes occurring outside this time interval or region.
• The probability that more than one outcome will occur in such a short time interval or fall in
such a small region is negligible.
Definition 33. The probability mass function of the Poisson random variable X is
$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \dots$
where λ is the average number of outcomes per unit time, distance, area, or volume and e = 2.71828...
Definition 34. The Cumulative Distribution Function (CDF) of the Poisson Distribution is
$P(X \le x) = \sum_{k=0}^{x} \frac{e^{-\lambda}\lambda^{k}}{k!}$
$\mu = E(X) = \lambda$
$\sigma^2 = \mathrm{Var}(X) = \lambda$
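A minimal sketch of Definitions 33 and 34 in Python; the function names are illustrative. The PMF divides by `math.factorial(x)`, which is fine for moderate x but overflows the float conversion for x above about 170.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = e^(-lam) lam^x / x!, x = 0, 1, 2, ...

    Note: x! is converted to float here, which overflows for x above about 170.
    """
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x): sum of the PMF for k = 0, 1, ..., x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# Usage with lam = 3 (an average of 3 outcomes per unit time).
lam = 3.0
print(poisson_pmf(5, lam))       # P(X = 5)
print(poisson_cdf(2, lam))       # P(X <= 2)
```

Summing k·P(X = k) over a long enough range reproduces µ = λ, and likewise for the variance.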
Example: The number of calls arriving at a call center follows a Poisson distribution at 10 per hour.
Calculate the probability that the number of calls over a 3-hour period will exceed 30.
Solution:
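A minimal sketch of the calculation, assuming the rate of 10 calls per hour scales linearly, so that the number of calls X in 3 hours is Poisson with λ = 10 × 3 = 30. The required probability is P(X > 30) = 1 − P(X ≤ 30). The terms are accumulated with the recurrence P(X = k) = P(X = k − 1) · λ/k to avoid large factorials.

```python
from math import exp

# Assumption: the hourly rate scales linearly, so lam = 10 calls/hour * 3 hours = 30.
lam = 10 * 3

term = exp(-lam)          # P(X = 0)
cdf = term
for k in range(1, 31):    # add P(X = 1), ..., P(X = 30)
    term *= lam / k       # P(X = k) = P(X = k - 1) * lam / k
    cdf += term

p_exceed = 1 - cdf        # P(X > 30) = 1 - P(X <= 30)
print(round(p_exceed, 4))
```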
$P(X = x) = (1-p)^{x-1}\, p, \quad x = 1, 2, 3, \dots$
$P(X \le x) = 1 - (1-p)^{x}$
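These two formulas describe the geometric distribution: X is the trial on which the first success occurs, each trial succeeding independently with probability p. A minimal sketch that checks the closed-form CDF against a term-by-term sum of the PMF; `geom_pmf` and `geom_cdf` are illustrative names.

```python
def geom_pmf(x: int, p: float) -> float:
    """P(X = x) = (1 - p)^(x - 1) p: first success on trial x (x = 1, 2, 3, ...)."""
    return (1 - p) ** (x - 1) * p

def geom_cdf(x: int, p: float) -> float:
    """Closed form P(X <= x) = 1 - (1 - p)^x."""
    return 1 - (1 - p) ** x

p = 0.7
# The closed-form CDF should agree with summing the PMF for k = 1, ..., x.
for x in range(1, 6):
    direct = sum(geom_pmf(k, p) for k in range(1, x + 1))
    print(x, direct, geom_cdf(x, p))
```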
Solution:
Here, a = 0, b = 20
Example: A random variable X has a uniform distribution over (−5, 6). Find the value of its cumulative distribution function at x = 3.
Solution:
Here, a = -5, b = 6, x = 3
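This excerpt does not reproduce the uniform CDF, so the sketch below assumes the standard formula F(x) = (x − a)/(b − a) for a < x < b (0 below a, 1 above b). Exact fractions keep the result readable; `uniform_cdf` is an illustrative name.

```python
from fractions import Fraction

def uniform_cdf(x, a, b):
    """Assumed standard CDF of the continuous uniform distribution on (a, b)."""
    if x <= a:
        return 0
    if x >= b:
        return 1
    return (x - a) / (b - a)

# Example: X ~ Uniform(-5, 6), evaluated at x = 3.
print(uniform_cdf(Fraction(3), Fraction(-5), Fraction(6)))  # 8/11
```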
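Definition 46. A continuous random variable X is said to have an exponential distribution with parameter λ > 0 if its PDF is given by
$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$
The corresponding CDF is
$F(x) = 1 - e^{-\lambda x}, \quad x > 0$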
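A minimal sketch of Definition 46, with a numerical sanity check that integrating the PDF from 0 to x (midpoint rule) reproduces F(x). The value λ = 1.5 and the step count are arbitrary choices for illustration.

```python
from math import exp

def expon_pdf(x: float, lam: float) -> float:
    """f(x) = lam * e^(-lam * x) for x > 0, and 0 otherwise."""
    return lam * exp(-lam * x) if x > 0 else 0.0

def expon_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - e^(-lam * x) for x > 0, and 0 otherwise."""
    return 1 - exp(-lam * x) if x > 0 else 0.0

# Midpoint-rule integral of the PDF over (0, x) should match F(x).
lam, x, steps = 1.5, 2.0, 20_000
h = x / steps
area = sum(expon_pdf((i + 0.5) * h, lam) for i in range(steps)) * h
print(area, expon_cdf(x, lam))
```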
Note: No closed-form expression exists for the cumulative distribution function of a normal distribution. Various numerical approximations are used to find its values.
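One common route in code is the identity Φ(z) = ½[1 + erf(z/√2)], where the error function erf is itself evaluated numerically. A minimal sketch, cross-checked against `statistics.NormalDist` from the Python standard library (3.8+); `normal_cdf` is an illustrative name.

```python
from math import erf, sqrt
from statistics import NormalDist

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Phi((x - mu) / sigma), written in terms of the error function erf."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Cross-check against the standard library's NormalDist.
for x in (-1.96, 0.0, 1.0, 2.5):
    print(x, normal_cdf(x), NormalDist().cdf(x))
```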
• It is a two parameter distribution, where the parameter µ is the mean (location parameter)
and the parameter σ is the standard deviation (scale parameter).
• All normal distributions have a symmetric bell shape centered on the mean µ, so µ is also the median. µ is also the mode of the normal distribution; that is, µ is the mean, the median, and the mode.
• Any linear transformation of a normal random variable is also a normal random variable. That is, if X is a normal random variable, then the linear transformation AX + B (where A ≠ 0 and B are constants) is also a normal random variable.
• If X1 and X2 are two independent normal random variables with means µ1 and µ2 and variances σ1² and σ2², respectively, then X1 + X2 is also a normal random variable with mean µ1 + µ2 and variance σ1² + σ2².
• By the central limit theorem, the sampling distribution of the mean of a large sample drawn from a population with any distribution (with finite variance) is approximately normal.
Standard Normal Variable
Every normal distribution can be converted to the standard normal distribution by turning the
individual values into z-scores.
Definition 51. A normal random variable with mean µ = 0 and σ = 1 is called the standard normal
variable (distribution) and usually represented by Z. The probability density function of a standard
normal variable is given by
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}$
By using the following transformation, any normal random variable X can be converted into a
standard normal variable:
$Z = \frac{X - \mu}{\sigma}$
Example 1: According to a survey on use of smart phones in India, the smart phone users spend
68 minutes in a day on average in sending messages and the corresponding standard deviation is 12
minutes. Assume that the time spent in sending messages follows a normal distribution.
(a) What proportion of the smart phone users are spending more than 90 minutes in sending mes-
sages daily?
(b) What proportion of customers are spending less than 20 minutes?
(c) What proportion of customers are spending between 50 minutes and 100 minutes?
Solution:
(b) Proportion of customers spending less than 20 minutes is P(X ≤ 20) = F(20) = $3.1671 \times 10^{-5}$.
(c) Proportion of customers spending between 50 and 100 minutes is given by P (50 ≤ X ≤
100) = F (100) − F (50) = 0.9293.
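The three parts of Example 1 can be evaluated with `statistics.NormalDist` (Python 3.8+). A minimal sketch; part (a) is computed as 1 − F(90), and part (b) is also recomputed through the z-score transformation to show that both routes agree.

```python
from statistics import NormalDist

# Example 1: daily messaging time X ~ N(mu = 68, sigma = 12), in minutes.
X = NormalDist(mu=68, sigma=12)

p_a = 1 - X.cdf(90)             # (a) P(X > 90) = 1 - F(90)
p_b = X.cdf(20)                 # (b) P(X <= 20) = F(20)
p_c = X.cdf(100) - X.cdf(50)    # (c) P(50 <= X <= 100) = F(100) - F(50)

# Same value as (b) via the z-score: z = (20 - 68) / 12 = -4 on the standard normal.
z_b = (20 - 68) / 12
p_b_via_z = NormalDist().cdf(z_b)

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4e} (via z = {z_b}: {p_b_via_z:.4e})")
print(f"(c) {p_c:.4f}")
```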
$Z_1 = \frac{X_1 - \mu_1}{\sigma_1}$
Then,
$Z_1^{2} = \left(\frac{X_1 - \mu_1}{\sigma_1}\right)^{2}$
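is a chi-square distribution with one degree of freedom [χ²(1)]. Let X2 be a normal random variable, independent of X1, with mean µ2 and standard deviation σ2, and let Z2 be the corresponding standard normal variable. Then the random variable Z1² + Z2² given by
$Z_1^{2} + Z_2^{2} = \left(\frac{X_1 - \mu_1}{\sigma_1}\right)^{2} + \left(\frac{X_2 - \mu_2}{\sigma_2}\right)^{2}$
is a chi-square distribution with two degrees of freedom [χ²(2)].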
• The mean and standard deviation of a chi-square distribution are k and √(2k), respectively, where k is the degrees of freedom.
• As the degrees of freedom k increase, the probability density function of a chi-square distribution approaches a normal distribution.
• The chi-square goodness-of-fit test is one of the popular tests for checking whether data follow a specific probability distribution.
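The mean k and standard deviation √(2k) can be checked by simulation, using the fact that a χ²(k) variable is a sum of k squared independent standard normal variables (the same construction as the one- and two-variable cases). A minimal sketch with a fixed seed; k = 5 and the number of draws are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def chi_square_draw(k: int, rng: random.Random) -> float:
    """One draw of Z1^2 + ... + Zk^2 with independent standard normal Zi."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(42)          # fixed seed so the run is reproducible
k = 5
draws = [chi_square_draw(k, rng) for _ in range(50_000)]

print(f"sample mean {fmean(draws):.3f} vs k = {k}")
print(f"sample std  {pstdev(draws):.3f} vs sqrt(2k) = {sqrt(2 * k):.3f}")
```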
where X̄ and S are the mean and standard deviation estimated from the sample X1, X2, · · · , Xn. Then the random variable t defined by
$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
follows a t-distribution with (n − 1) degrees of freedom. Here one degree of freedom is lost because the standard deviation is estimated from the sample (the degrees of freedom equal the number of observations in the sample minus the number of restrictions or estimates made using the sample).
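A minimal sketch of the t statistic, using the sample standard deviation S with divisor n − 1 (as computed by `statistics.stdev`). The data values are hypothetical, made up for illustration, and `t_statistic` is an illustrative name.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample: list[float], mu: float) -> tuple[float, int]:
    """Return t = (x_bar - mu) / (S / sqrt(n)) and its degrees of freedom n - 1."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)            # sample standard deviation, divisor n - 1
    return (x_bar - mu) / (s / sqrt(n)), n - 1

# Hypothetical sample, for illustration only.
data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.3]
t, df = t_statistic(data, mu=10.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```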
Properties of t-Distribution:
3.2.7 F-Distribution
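F-distribution (short form of Fisher's distribution, named after statistician Ronald Fisher) is the distribution of a ratio of two independent chi-square random variables, each divided by its degrees of freedom. Let Y1 and Y2 be two independent chi-square random variables with k1 and k2 degrees of freedom, respectively. Then the random variable X defined as
$X = \frac{Y_1/k_1}{Y_2/k_2}$
follows an F-distribution with k1 and k2 degrees of freedom.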
Properties of F-Distribution:
• The F-distribution is non-symmetric, and its shape depends on the values of k1 and k2.
• The F-distribution is used in Analysis of Variance (ANOVA) to test whether the mean values of multiple groups are equal.
• VISUALIZATION https://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributio
• PLOTS http://www.cookbook-r.com/Graphs/Plotting_distributions_(ggplot2)/
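To illustrate the ANOVA use of the F-distribution: the one-way ANOVA statistic is the between-group mean square divided by the within-group mean square, and under the null hypothesis of equal group means (with normal, equal-variance groups) it follows an F-distribution with k − 1 and n − k degrees of freedom. A minimal sketch with hypothetical data; `one_way_anova_f` is an illustrative name, not a library function.

```python
from statistics import mean

def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = [SS_between / (k - 1)] / [SS_within / (n - k)], where k is the number
    of groups and n the total number of observations.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three groups, for illustration only.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova_f(groups))   # (27.0, 2, 6)
```

The resulting F value would then be compared with the upper tail of the F(k − 1, n − k) distribution.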
3.4 Problems
Problem 59. An employee is selected from a staff of 10 to supervise a certain project by selecting a tag at random from a box containing 10 tags numbered from 1 to 10. Find the formula for the probability distribution of X representing the number on the tag that is drawn. What is the probability that the number drawn is less than 4?
Problem 60. According to Chemical Engineering Progress (November 1990), approximately 30%
of all pipework failures in chemical plants are caused by operator error.
(a) What is the probability that out of the next 20 pipework failures at least 10 are due to operator
error?
(b) What is the probability that no more than 4 out of 20 such failures are due to operator error?
(c) Suppose, for a particular plant, that out of the random sample of 20 such failures, exactly 5 are due to operator error. Do you feel that the 30% figure stated above applies to this plant? Comment.
Problem 61. The probability that a patient recovers from a delicate heart operation is 0.9. What is the probability that exactly 5 of the next 7 patients having this operation survive?
Problem 62. In testing a certain kind of truck tire over rugged terrain, it is found that 25% of
the trucks fail to complete the test run without a blowout. Of the next 15 trucks tested, find the
probability that
(a) from 3 to 6 have blowouts;
(b) fewer than 4 have blowouts;
(c) more than 5 have blowouts.
Problem 63. The probability that a student pilot passes the written test for a private pilot’s license
is 0.7. Find the probability that a given student will pass the test
(a) on the third try;
(b) before the fourth try
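A sketch of how Problem 63 maps onto the geometric distribution, assuming each attempt is independent with the same pass probability p = 0.7, so that X is the try on which the student first passes:

```python
p = 0.7
p_third = (1 - p) ** 2 * p            # (a) fail, fail, pass: P(X = 3)
p_before_fourth = 1 - (1 - p) ** 3    # (b) pass within the first 3 tries: P(X <= 3)
print(round(p_third, 4), round(p_before_fourth, 4))
```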
Problem 64. On average, 3 traffic accidents per month occur at a certain intersection. What is
the probability that in any given month at this intersection
(a) exactly 5 accidents will occur?
(b) fewer than 3 accidents will occur?
(c) at least 2 accidents will occur?
Problem 65. On average, a textbook author makes two word processing errors per page on the first
draft of her textbook. What is the probability that on the next page she will make
(a) 4 or more errors?
(b) no errors?
Problem 72. Given a random variable X having a normal distribution with µ = 50 and σ = 10,
find the probability that X assumes a value between 45 and 62.
Problem 73. Given a random variable X having a normal distribution with µ = 300 and σ = 50,
find the probability that X assumes a value greater than 362.
Problem 74. Given a normal distribution with µ = 40 and σ = 6, find the value of x that has
(a) 45% of the area to the left and
(b) 14% of the area to the right.
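Problems 74 and 82 need the inverse of the normal CDF. A minimal sketch for Problem 74 using `NormalDist.inv_cdf` from the Python standard library (3.8+):

```python
from statistics import NormalDist

X = NormalDist(mu=40, sigma=6)
x_a = X.inv_cdf(0.45)        # (a) 45% of the area lies to the left of x
x_b = X.inv_cdf(1 - 0.14)    # (b) 14% to the right means 86% to the left
print(round(x_a, 2), round(x_b, 2))
```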
Problem 75. A certain type of storage battery lasts, on average, 3.0 years with a standard deviation
of 0.5 year. Assuming that battery life is normally distributed, find the probability that a given battery
will last less than 2.3 years.
Problem 76. An electrical firm manufactures light bulbs that have a life, before burn-out, that is
normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the
probability that a bulb burns between 778 and 834 hours.
Problem 77. Gauges are used to reject all components for which a certain dimension is not within
the specification 1.50 ± d. It is known that this measurement is normally distributed with mean 1.50
and standard deviation 0.2. Determine the value d such that the specifications “cover” 95% of the
measurements.
Problem 78. A certain machine makes electrical resistors having a mean resistance of 40 ohms and
a standard deviation of 2 ohms. Assuming that the resistance follows a normal distribution and can
be measured to any degree of accuracy, what percentage of resistors will have a resistance exceeding
43 ohms?
Problem 81. The daily amount of coffee, in liters, dispensed by a machine located in an airport
lobby is a random variable X having a continuous uniform distribution with A = 7 and B = 10.
Find the probability that on a given day the amount of coffee dispensed by this machine will be
(a) at most 8.8 liters;
(b) more than 7.4 liters but less than 9.5 liters;
(c) at least 8.5 liters.
Problem 82. Given a standard normal distribution, find the value of k such that
(a) P (Z > k) = 0.2946;
(b) P (Z < k) = 0.0427;
(c) P (−0.93 < Z < k) = 0.7235.
Problem 83. The lifetime T (years) of an electronic component is a continuous random variable
with a probability density function given by
Find the lifetime L which a typical component is 60components are sold to a manufacturer, find the
probability that at least one of them will have a lifetime less than L years.
Problem 84. Commonly, car cooling systems are controlled by electrically driven fans. Assuming
that the lifetime T in hours of a particular make of fan can be modelled by an exponential distribution
with λ = 0.0003 find the proportion of fans which will give at least 10000 hours service. If the fan
is redesigned so that its lifetime may be modelled by an exponential distribution with λ = 0.00035,
would you expect more fans or fewer to give at least 10000 hours service?
Problem 85. The time intervals between successive barges passing a certain point on a busy waterway have an exponential distribution with mean 8 minutes.
(a) Find the probability that the time interval between two successive barges is less than 5 minutes.
(b) Find a time interval t such that we can be 95% sure that the time interval between two successive
barges will be greater than t.
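A sketch for Problem 85, using the standard relation that an exponential distribution with mean 8 minutes has λ = 1/8 per minute, together with F(t) = 1 − e^{−λt}:

```python
from math import exp, log

mean_gap = 8.0
lam = 1 / mean_gap                  # standard relation: mean = 1 / lambda
p_a = 1 - exp(-lam * 5)             # (a) P(T < 5) = F(5)
t_b = -log(0.95) / lam              # (b) solve P(T > t) = e^(-lam t) = 0.95 for t
print(round(p_a, 4), round(t_b, 4))
```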