Random variables
A random variable is a variable whose value is determined by the outcome of
a random experiment. In other words, if a variable has a probability distribution
then it is called a random variable. Random variables can be either discrete or
continuous.
Example
Suppose one family is randomly selected from this population. The process of
randomly selecting a family is called a random experiment. Let x denote the
number of vehicles owned by the selected family. Then x can assume any of the
five possible values (0, 1, 2, 3, and 4) listed in the first column of Table 5.1. The
value assumed by x depends on which family is selected. Thus, this value
depends on the outcome of a random experiment. Consequently, x is called a
random variable.
Discrete random variable
A random variable that assumes countable (isolated) values is called a
discrete random variable.
Examples
1. The number of cars sold at a dealership during a given month
2. The number of customers who visit a bank during any given hour
3. The number of heads obtained in two tosses of a coin
4. The number of defects in a batch of 50 items
Continuous random variable
A random variable that can assume any value contained in one or more intervals
is called a continuous random variable.
Examples
1. The height of a person
2. The age of a person
3. The price of a house
Probability distribution
The probability distribution of a discrete random variable lists all the
possible values that the random variable can assume and their corresponding
probabilities.
Example
To begin our study of probability distributions, let’s go back to the idea of a fair
coin. Suppose we toss a fair coin twice. The possible outcomes are:
Possible outcomes from two tosses of a fair coin
First toss   Second toss   Number of heads on two tosses   Probability of the outcome
T            T             0                               0.5*0.5 = 0.25
T            H             1                               0.5*0.5 = 0.25
H            T             1                               0.5*0.5 = 0.25
H            H             2                               0.5*0.5 = 0.25
Total                                                      1.0
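The table above can be reproduced by enumerating the outcomes. The following is a minimal sketch in Python; the function name `heads_distribution` and the constant `P_HEAD` are illustrative names, not part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

P_HEAD = Fraction(1, 2)  # fair coin, as assumed in the example


def heads_distribution(n_tosses=2):
    """Return {number of heads: probability} by listing every outcome."""
    dist = Counter()
    for outcome in product("TH", repeat=n_tosses):
        prob = Fraction(1)
        for face in outcome:
            # Tosses are independent, so the probabilities multiply.
            prob *= P_HEAD if face == "H" else 1 - P_HEAD
        dist[outcome.count("H")] += prob
    return dict(dist)


dist = heads_distribution()
for heads, prob in sorted(dist.items()):
    print(heads, float(prob))
```

Collapsing the four equally likely outcomes by the number of heads gives probabilities 0.25, 0.5 and 0.25 for 0, 1 and 2 heads, which sum to 1.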
Characteristics of probability distribution
The probability distribution of a discrete random variable possesses the
following two characteristics.
1. 0 ≤ P(x) ≤ 1 for each value of x
2. ΣP(x) = 1
Types of probability distribution
Probability distributions fall into two groups, each with some commonly used members:
Discrete probability distribution     Continuous probability distribution
Bernoulli distribution                Uniform distribution
Binomial distribution                 Exponential distribution
Poisson distribution                  Normal distribution
Discrete probability distribution
The probability distribution of a discrete random variable lists all the possible
values that the random variable can assume and their corresponding
probabilities.
Example
The distribution of the month in which a person was born is discrete because
there are only 12 possible values.
Continuous probability distribution
In a continuous probability distribution, the variable under consideration can
take on any value within a given range. So, we cannot list all the possible values.
Example
Suppose we were examining the level of effluent in a variety of streams and we
measured the level of effluent by parts of effluent per million parts of water. We
would expect quite a continuous range of parts per million (ppm), all the way
from very low levels in clear mountain streams to extremely high levels in
polluted streams. We would call the distribution of this variable (ppm) a
continuous distribution.
Bernoulli distribution
Bernoulli trial
A random experiment whose outcomes have been classified into two categories,
namely “success” and “failure”, represented by the letters S and F respectively, is
called a Bernoulli trial.
Bernoulli distribution
A discrete random variable X is said to have a Bernoulli distribution if its
probability function is given by
$$f(x; p) = \begin{cases} p^{x} q^{1-x}, & \text{for } x = 0, 1 \\ 0, & \text{otherwise} \end{cases}$$
where p is the parameter of the distribution satisfying $0 \le p \le 1$ and $p + q = 1$.
Example
A coin is tossed in which the outcome “head” is a success and the probability of
head is p. Then $q = 1 - p$ is the probability of failure or tail. If the number of
heads (successes) is a random variable X, then X can take the value 0 or 1 according
to whether the outcome is tail (failure) or head (success). Then the probability function
of X is
$$f(x; p) = \begin{cases} p^{x} q^{1-x}, & \text{for } x = 0, 1 \\ 0, & \text{otherwise} \end{cases}$$
Binomial distribution
Introduction
The binomial distribution was first derived by the Swiss mathematician James Bernoulli
(1654-1705) and was first published posthumously in 1713, eight years after his
death.
Definition
A discrete random variable X is said to have a binomial distribution if its
probability function is defined by
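$$f(x; n, p) = \begin{cases} \binom{n}{x} p^{x} q^{n-x}, & \text{for } x = 0, 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$
where the two parameters n and p satisfy $0 \le p \le 1$ and $p + q = 1$, and n is a
positive integer. For a binomial experiment, the probability of exactly x
successes in n trials is given by the binomial formula
$$P(x) = \binom{n}{x} p^{x} q^{n-x}$$
where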
n = total number of trials
p = probability of success
q = 1 – p = probability of failure
x = number of successes in n trials
n - x = number of failures in n trials
Conditions for Binomial distribution
o There are n identical trials, where n is finite.
o Each trial has only two possible outcomes.
o The probabilities of the two outcomes remain constant.
o The trials are independent.
Mean of Binomial distribution
Mean = np, where n = number of trials and p = probability of success.
Variance of Binomial distribution
Variance = npq, where n = number of trials, p = probability of success and q =
probability of failure = 1 − p.
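The binomial formula and the expressions for the mean and variance can be checked numerically. The sketch below uses only the Python standard library; the function names and the illustrative values n = 5, p = 0.5 are assumptions made for this example.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); zero outside 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_mean_variance(n, p):
    """Return (np, npq) with q = 1 - p."""
    return n * p, n * p * (1 - p)


n, p = 5, 0.5  # illustrative values only
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean, variance = binomial_mean_variance(n, p)

# The mean computed directly from the distribution should agree with np.
mean_from_pmf = sum(x * prob for x, prob in enumerate(probs))
print(sum(probs), mean, mean_from_pmf, variance)
```

Summing x·P(x) over all x gives the same value as np, which is a quick sanity check on both the formula and the code.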
Example 1:
Five percent of all DVD players manufactured by a large electronics company
are defective. Three DVD players are randomly selected from the production
line of this company. The selected DVD players are inspected to determine
whether each of them is defective or good. Is this experiment a binomial
experiment?
1. This example consists of three identical trials.
2. Each trial has two outcomes: defective or good.
3. The probability p that a DVD player is defective is 0.05. The
probability q that a DVD player is good is 0.95.
4. Each trial (DVD player) is independent.
Because all four conditions of a binomial experiment are satisfied, this is
an example of a binomial experiment.
Example 2:
In a community, the probability that a newly born child will be a boy is $\frac{2}{5}$. Among
the 4 newly born children in that community, what is the probability that
(a) all four are boys,
(b) none is a boy,
(c) exactly one is a boy?
Solution
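Let us consider the event that a newly born child is a boy as a success in a Bernoulli
trial with probability of success $p = \frac{2}{5}$, so that $q = 1 - p = \frac{3}{5}$. Let the number of boys be a random variable
X. Then X can take values 0, 1, 2, 3, and 4.
According to the binomial law, the probability function of X is
$$f\left(x; 4, \tfrac{2}{5}\right) = \binom{4}{x} \left(\tfrac{2}{5}\right)^{x} \left(\tfrac{3}{5}\right)^{4-x} \quad \text{for } x = 0, 1, 2, 3, 4.$$
a) $P(\text{all boys}) = P(X = 4) = \binom{4}{4} \left(\tfrac{2}{5}\right)^{4} \left(\tfrac{3}{5}\right)^{4-4} = 0.0256$.
b) $P(\text{no boys}) = P(X = 0) = \binom{4}{0} \left(\tfrac{2}{5}\right)^{0} \left(\tfrac{3}{5}\right)^{4-0} = 0.1296$.
c) $P(\text{exactly one boy}) = P(X = 1) = \binom{4}{1} \left(\tfrac{2}{5}\right)^{1} \left(\tfrac{3}{5}\right)^{4-1} = 0.3456$.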
Example 3:
A fair coin is tossed 5 times. Find the probability of
a) exactly two heads
b) no head
Solution
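Let the number of heads be a random variable X which can take values 0, 1, 2,
3, 4 and 5. Then X is a binomial variate with $p = \frac{1}{2}$ and $n = 5$.
The probability function of X is
$$f\left(x; 5, \tfrac{1}{2}\right) = \binom{5}{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{5-x} \quad \text{for } x = 0, 1, 2, 3, 4, 5.$$
a) $P(\text{exactly two heads}) = P(X = 2) = \binom{5}{2} \left(\tfrac{1}{2}\right)^{2} \left(\tfrac{1}{2}\right)^{5-2} = 0.3125$.
b) $P(\text{no heads}) = P(X = 0) = \binom{5}{0} \left(\tfrac{1}{2}\right)^{0} \left(\tfrac{1}{2}\right)^{5-0} = 0.03125$.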
Example 4:
Determine the binomial distribution for which mean is 4 and variance is 3.
Solution
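Let X be a binomial variate with parameters n and p. Here, we have $np = 4$
and $npq = 3$. Thus $q = \frac{npq}{np} = \frac{3}{4}$ and $p = 1 - q = 1 - \frac{3}{4} = \frac{1}{4}$. Then $n = \frac{4}{p} = \frac{4}{1/4} = 16$.
Hence, the binomial distribution is
$$f(x; n, p) = \begin{cases} \binom{16}{x} \left(\tfrac{1}{4}\right)^{x} \left(\tfrac{3}{4}\right)^{16-x}, & \text{for } x = 0, 1, 2, \ldots, 16 \\ 0, & \text{otherwise} \end{cases}$$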
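The steps in this solution (divide the variance by the mean to get q, then recover p and n) can be sketched as a small helper. The function name `binomial_params` is hypothetical, and exact fractions are used so that n comes out as a whole number without rounding.

```python
from fractions import Fraction


def binomial_params(mean, variance):
    """Recover (n, p) from mean = np and variance = npq."""
    mean, variance = Fraction(mean), Fraction(variance)
    q = variance / mean  # npq / np
    if not 0 < q < 1:
        raise ValueError("variance must be positive and smaller than the mean")
    p = 1 - q
    n = mean / p
    if n.denominator != 1:
        raise ValueError("no whole number of trials matches these values")
    return int(n), p


n, p = binomial_params(4, 3)
print(n, p)
```

The two checks reflect the conditions on the parameters: q must lie strictly between 0 and 1, and n must be a positive integer.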
Example 5:
At the Express House Delivery Service, providing high-quality service to
customers is the top priority of the management. The company guarantees a
refund of all charges if a package it is delivering does not arrive at its
destination by the specified time. It is known from past data that despite all
efforts, 2% of the packages mailed through this company do not arrive at their
destinations within the specified time. Suppose a corporation mails 10 packages
through Express House Delivery Service on a certain day.
a) Find the probability that exactly one of these 10 packages will not arrive
at its destination within the specified time.
b) Find the probability that at most one of these 10 packages will not arrive
at its destination within the specified time.
Solution:
n=total number of packages mailed = 10
p = P (success) = 2% = 0.02
q = P (failure) = 1 – 0.02 = 0.98
a) We know that,
x = number of successes = 1
n – x = number of failures = 10 – 1 = 9
$$P(x = 1) = {}^{10}C_{1} (0.02)^{1} (0.98)^{9} = \frac{10!}{1!\,(10-1)!} (0.02)^{1} (0.98)^{9} = (10)(0.02)(0.83374776) = 0.1667$$
Thus, there is a 0.1667 probability that exactly one of the 10 packages mailed
will not arrive at its destination within the specified time.
b) At most one means x = 0 or x = 1.
$P(x \le 1) = P(x = 0) + P(x = 1)$
$= {}^{10}C_{0} (0.02)^{0} (0.98)^{10} + {}^{10}C_{1} (0.02)^{1} (0.98)^{9}$
$= (1)(1)(0.81707281) + (10)(0.02)(0.83374776)$
$= 0.8171 + 0.1667 = 0.9838$
Thus, the probability that at most one of the 10 packages mailed will not arrive
at its destination within the specified time is 0.9838.
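The two answers can be verified with a few lines of Python. This is only a check of the arithmetic above, with an illustrative helper name `binomial_pmf`; here a "success" is a package that does not arrive on time.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.02  # 10 packages, 2% arrive late

p_exactly_one = binomial_pmf(1, n, p)
p_at_most_one = binomial_pmf(0, n, p) + p_exactly_one

print(round(p_exactly_one, 4), round(p_at_most_one, 4))
```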
Example 6: The phone lines to an airline reservation system are occupied 40%
of the time. Assume that the events that the lines are occupied on successive
calls are independent. Assume that 10 calls are placed to the airline.
(a) What is the probability that for exactly three calls the lines are occupied?
(b) What is the probability that for at least one call the lines are not occupied?
(c) What is the expected number of calls in which the lines are all occupied?
Solution:
Let X be the number of calls for which the lines are occupied.
Then p = 40% = 0.40, q = 1 − p = 1 − 0.40 = 0.60 and n = 10.
According to the binomial law, the probability function of X is
$f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$
(a) $P(X = 3) = f(3) = {}^{10}C_{3} (0.4)^{3} (0.6)^{10-3} = 120 \times 0.064 \times 0.028 = 0.215$
(b) The lines are not occupied for at least one call exactly when they are not occupied for all 10 calls, so
$P(X \le 9) = 1 - P(X = 10) = 1 - {}^{10}C_{10} (0.4)^{10} (0.6)^{10-10} = 1 - 0.0001 = 0.9999$
(c) The expected number of calls for which the lines are occupied is
Mean = np = 10 × 0.40 = 4
Example 7 (self test)
Each sample of water has a 10% chance of containing a particular organic
pollutant. Assume that the samples are independent with regard to the presence
of the pollutant. Find the probability that in the next 18 samples (i) exactly 2
contain the pollutant; (ii) at least four contain the pollutant.
Example 8: The incidence of occupational disease in an industry is such that
the workers have 20% chance of suffering from it. What is the probability that
out of six workers
i. 4 or more will contract the disease?
ii. Exactly 3 will contract the disease?
iii. At most 2 will contract the disease?
iv. Find the mean and variance of the number of workers suffering
from the disease.
Example 9: Warranty records show that the probability that a new car needs a
warranty repair in the first 90 days is 0.05. If a sample of three new cars is
selected, what is the probability that in the first 90 days
i. None needs a warranty repair?
ii. More than one needs a warranty repair?
iii. At least one needs a warranty repair?
iv. What are the mean and standard deviation of the number of warranty
repairs?
Poisson distribution
Introduction
The Poisson distribution was developed by the French mathematician and physicist
Siméon Denis Poisson (1781-1840), who published it in 1837.
Definition
A discrete random variable X is said to have a Poisson distribution if its
probability function is given by
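$$f(x; \lambda) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!}, & \text{for } x = 0, 1, 2, \ldots, \infty \\ 0, & \text{otherwise} \end{cases}$$
where $e \approx 2.71828$ and $\lambda$ is the parameter of the distribution, which is the mean
number of successes, with $\lambda = np$.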
Note:
If X is a Poisson variate with parameter $\lambda$, then mean $= \lambda$ and variance $= \lambda$.
Hence, mean and variance of Poisson distribution are equal.
Examples
o The number of cars passing a certain street in time t.
o The number of suicides reported on a particular day.
o The number of faulty blades in a packet of 100.
o The number of printing mistakes on each page of a book.
o The number of air accidents in some unit of time.
o The number of deaths from a disease such as heart attack or cancer, or due to
snake bite.
o The number of telephone calls received at a particular telephone exchange in
some unit of time.
o The number of defective materials in a packing manufactured by a good
concern.
o The number of letters lost in the mail on a given day in a certain big city.
o The number of fish caught in a day in a certain city.
o The number of robbers caught on a given day in a certain city.
Approximation of the Binomial Distribution by the Poisson Distribution:
When $p \to 0$ (the success rate is very low) and
$n \to \infty$ (the number of trials is very large),
the binomial distribution can be approximated by the Poisson distribution.
Mathematically, $\text{Binom}(x; n, p) \approx \text{Pois}(x; \lambda)$, where $\lambda = np$.
N.B: As a rule of thumb, if n > 29 and np ≤ 7, the approximation is close enough
to use the Poisson distribution for binomial problems.
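How close the approximation is can be seen by computing both probabilities side by side. The sketch below assumes illustrative values n = 2000 and p = 0.001 (so that np = 2), and the helper names are not from the notes.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


n, p = 2000, 0.001  # large n, small p (assumed values)
lam = n * p

for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))

# Largest pointwise difference over x = 0..10
max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11)
)
print(max_gap)
```

For these values the two columns agree to about three decimal places, which is why the simpler Poisson formula is usually preferred when n is large and p is small.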
Example 1:
Suppose that the number of emergency patients in a given day at a certain
hospital is a Poisson variable X with parameter $\lambda = 20$. What is the probability
that in a given day there will be
a) 15 emergency patients?
b) at least 3 emergency patients?
c) more than 20 but fewer than 25 patients?
Solution
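We know that,
$$f(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, \ldots, \infty.$$
Here, $\lambda = 20$, so $f(x; 20) = \dfrac{e^{-20}\, 20^{x}}{x!}$ for $x = 0, 1, 2, \ldots, \infty$.
a) $P(\text{15 emergency patients}) = P(X = 15) = \dfrac{e^{-20}\, 20^{15}}{15!} = 0.0516$.
b) $P(\text{at least 3 patients}) = P(X \ge 3) = 1 - P(X < 3)$
$= 1 - \left[ P(X = 0) + P(X = 1) + P(X = 2) \right]$
$= 1 - \left[ \dfrac{e^{-20}\, 20^{0}}{0!} + \dfrac{e^{-20}\, 20^{1}}{1!} + \dfrac{e^{-20}\, 20^{2}}{2!} \right] \approx 1$.
c) $P(20 < X < 25) = P(X = 21) + P(X = 22) + P(X = 23) + P(X = 24)$
$= \dfrac{e^{-20}\, 20^{21}}{21!} + \dfrac{e^{-20}\, 20^{22}}{22!} + \dfrac{e^{-20}\, 20^{23}}{23!} + \dfrac{e^{-20}\, 20^{24}}{24!} = 0.2841$.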
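The three answers can be checked with the Poisson formula directly. The following sketch uses only the standard library; `poisson_pmf` is an illustrative helper name.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variate with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 20

p_15 = poisson_pmf(15, lam)
# "At least 3" is the complement of 0, 1 or 2 patients.
p_at_least_3 = 1 - sum(poisson_pmf(x, lam) for x in range(3))
# "More than 20 but fewer than 25" covers x = 21, 22, 23, 24.
p_21_to_24 = sum(poisson_pmf(x, lam) for x in range(21, 25))

print(round(p_15, 4), p_at_least_3, round(p_21_to_24, 4))
```

Part (b) comes out extremely close to 1 because 0, 1 or 2 patients is very unlikely when the mean is 20.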
Example 2:
The probability that a car passing along a very busy road has an accident in a given hour is
0.001. If 2000 cars pass along the road in one hour, what is the probability that
a) exactly 3 car accidents,
b) more than 2 car accidents happen on the road in that hour?
Solution
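We know that,
$$f(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, \ldots, \infty.$$
Here, $p = 0.001$, $n = 2000$
$\therefore \lambda = np = 2000 \times 0.001 = 2$.
$\therefore f(x; 2) = \dfrac{e^{-2}\, 2^{x}}{x!}$ for $x = 0, 1, 2, \ldots, \infty$.
a) $P(\text{exactly 3 accidents}) = P(X = 3) = \dfrac{e^{-2}\, 2^{3}}{3!} = 0.18$.
b) $P(\text{more than 2 accidents}) = P(X > 2) = 1 - P(X \le 2)$
$= 1 - \left[ P(X = 0) + P(X = 1) + P(X = 2) \right]$
$= 1 - \left[ \dfrac{e^{-2}\, 2^{0}}{0!} + \dfrac{e^{-2}\, 2^{1}}{1!} + \dfrac{e^{-2}\, 2^{2}}{2!} \right] = 0.323$.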
Example 3:
A factory produces blades in a packet of 10. The probability of a blade to be
defective is 0.2%. Find the number of packets having two defective blades in a
consignment of 10,000 packets.
Solution
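We know that,
$$f(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, \ldots, \infty.$$
Here, $p = 0.2\% = 0.002$, $n = 10$. $\therefore \lambda = np = 10 \times 0.002 = 0.02$.
$P(\text{2 defective blades}) = P(X = 2) = \dfrac{e^{-0.02}\, (0.02)^{2}}{2!} = 0.000196$.
Therefore, the total number of packets having two defective blades in a
consignment of 10,000 packets is $10000 \times 0.000196 = 1.96 \approx 2$.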
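The expected count is the number of packets multiplied by the probability for a single packet. A minimal sketch of that calculation, with illustrative variable names:

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variate with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


blades_per_packet = 10
p_defective = 0.002
packets = 10_000

lam = blades_per_packet * p_defective  # mean defectives per packet
p_two = poisson_pmf(2, lam)
expected_packets = packets * p_two

print(round(p_two, 6), round(expected_packets, 2))
```

The same pattern (probability for one unit times the number of units) applies whenever a question asks "how many" rather than "what is the probability".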
Example 4:
What probability model is appropriate to describe a situation where 100
misprints are distributed randomly throughout the 100 pages of a book? For this
model, what is the probability that a page observed at random will contain at
least three misprints?
Solution
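We know that,
$$f(x; \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, \ldots, \infty.$$
We have $p = \frac{1}{100} = 0.01$ (because there is only one mistake on the average in a page),
$n = 100$. $\therefore \lambda = np = 100 \times 0.01 = 1$.
$P(\text{at least 3 misprints}) = P(X \ge 3) = 1 - P(X < 3)$
$= 1 - \left[ P(X = 0) + P(X = 1) + P(X = 2) \right]$
$= 1 - \left[ \dfrac{e^{-1}\, 1^{0}}{0!} + \dfrac{e^{-1}\, 1^{1}}{1!} + \dfrac{e^{-1}\, 1^{2}}{2!} \right] = 0.0803$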
Example 5 (self-test)
On average, a household receives 9.5 telemarketing phone calls per week. Using
the Poisson distribution formula, find the probability that a randomly selected
household receives exactly 6 telemarketing phone calls during a given week.
Example 6:
A washing machine in a Laundromat breaks down an average of three times per
month. Using the Poisson probability distribution formula, find the probability
that during the next month this machine will have
a) exactly two breakdowns
b) at most one breakdown
Example 7:
An auto salesperson sells an average of 0.9 cars per day. Let x be the number of
cars sold by this salesperson on any given day. Find the mean, variance, and
standard deviation.
Normal Distribution
Normal distribution is the most important probability distribution in Statistics.
Definition
A continuous random variable X is said to have a normal distribution if its
density function is given by
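$$f(x; \mu, \sigma^{2}) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}; \quad -\infty < x < \infty \qquad (1)$$
where the parameters $\mu$ and $\sigma^{2}$ satisfy $-\infty < \mu < \infty$ and $\sigma^{2} > 0$.
The variable X whose density function is given in (1) is called a normal variate
with parameters $\mu$ and $\sigma^{2}$ and is denoted by $N(\mu, \sigma^{2})$. The parameters $\mu$ and $\sigma^{2}$
are actually the mean and variance of the normal variate X. The graph of the
normal curve is a symmetric, bell-shaped curve centred at $\mu$.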
Standard Normal variate:
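If X is a normal variate with parameters $\mu$ and $\sigma^{2}$, then $Z = \dfrac{X - \mu}{\sigma}$ is a standard
normal variate with mean zero and variance unity. The density function of Z is
$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}}; \quad -\infty < z < \infty$$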
Importance of Normal Distribution
o Most of the distributions occurring in practice can be approximated by
the normal distribution. Moreover, many sampling distributions, e.g.,
Student’s t, Snedecor’s F and Chi-square distributions, tend to normality for
large samples.
o The normal distribution finds wide application in Statistical Quality Control
in industry for setting control limits.
Note: Let X be a continuous random variable with cumulative distribution
function $F(x)$, and let a and b be two possible values of X, with $a < b$. The
probability that X lies between a and b is
$$P(a < X < b) = F(b) - F(a)$$
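For a normal variate, F can be evaluated without a printed table by using the error function. The sketch below is one possible way to do this with the standard library; the names `normal_cdf` and `prob_between` are illustrative.

```python
from math import erf, sqrt


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for X ~ N(mu, sigma^2)."""
    z = (x - mu) / sigma  # standardise first
    return 0.5 * (1 + erf(z / sqrt(2)))


def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b) = F(b) - F(a), assuming a < b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Probability that a standard normal variate lies within one
# standard deviation of its mean.
p_within_one_sd = prob_between(-1, 1)
print(round(p_within_one_sd, 4))
```

Values computed this way can differ slightly in the last digit from table lookups, because a table requires z to be rounded to two decimal places.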
Example 1:
A company produces light bulbs whose lifetimes follow a normal distribution
with mean 1200 hours and standard deviation 250 hours. If a light bulb is
chosen randomly from the company’s output, what is the probability that its
lifetime will be between 900 and 1300 hours?
Solution
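Let X represent lifetime in hours. Then
$P(900 < X < 1300) = P\left( \dfrac{900 - \mu}{\sigma} < \dfrac{X - \mu}{\sigma} < \dfrac{1300 - \mu}{\sigma} \right)$
$= P\left( \dfrac{900 - 1200}{250} < Z < \dfrac{1300 - 1200}{250} \right)$
$= P(-1.2 < Z < 0.4)$
$= P(Z < 0.4) - P(Z < -1.2)$
$= 0.65542 - 0.11507$ (by using the Normal table)
$= 0.54035$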
Hence, the probability is approximately 0.54 that a light bulb will last between
900 and 1300 hours.
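The table lookup can be cross-checked in Python. This sketch assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Lifetime in hours: mean 1200, standard deviation 250
lifetime = NormalDist(mu=1200, sigma=250)

# P(900 < X < 1300) = F(1300) - F(900)
p_between = lifetime.cdf(1300) - lifetime.cdf(900)

# The same probability via the standardised values z = -1.2 and z = 0.4
z = NormalDist()  # standard normal, mean 0 and standard deviation 1
p_via_z = z.cdf(0.4) - z.cdf(-1.2)

print(round(p_between, 5), round(p_via_z, 5))
```

Both routes give the same probability, which illustrates that standardising X does not change the answer.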
Example 2:
A very large group of students obtains test scores that are normally distributed
with mean 60 and standard deviation 15. What proportion of students obtained
scores
a) less than 85?
b) more than 90?
c) between 85 and 95?
Solution
Let X denote the test score. Then
a) $P(X < 85) = P\left( \dfrac{X - \mu}{\sigma} < \dfrac{85 - 60}{15} \right) = P(Z < 1.67)$
$= 0.9525$ (by using the Normal table).
That is, 95.25% of the students obtained scores less than 85.
b) $P(X > 90) = P\left( \dfrac{X - \mu}{\sigma} > \dfrac{90 - 60}{15} \right) = P(Z > 2) = 1 - P(Z \le 2)$
$= 1 - 0.9772 = 0.0228$ (by using the Normal table).
That is, 2.28% of the students obtained scores more than 90.
c) $P(85 < X < 95) = P\left( \dfrac{85 - 60}{15} < Z < \dfrac{95 - 60}{15} \right)$
$= P(1.67 < Z < 2.33)$
$= P(Z < 2.33) - P(Z < 1.67)$
$= 0.99010 - 0.95254$ (by using the Normal table)
$= 0.03756$
That is, 3.76% of the students obtained scores in the range 85 to 95.
Example 3:
The average daily sales of 500 branch offices were Tk. 150 thousand and the
standard deviation was Tk. 15 thousand. Assuming the distribution to be normal,
indicate how many branches have sales between
a) Tk. 120 thousand and Tk. 145 thousand.
b) Tk. 140 thousand and Tk. 165 thousand.
Solution
Let X be the daily sales (in thousand Tk.) of a branch office.
a) $P(120 < X < 145) = P\left( \dfrac{120 - 150}{15} < Z < \dfrac{145 - 150}{15} \right)$
$= P(-2 < Z < -0.33)$
$= P(Z < -0.33) - P(Z < -2)$
$= 0.3707 - 0.02275 = 0.34795$ (by using the Normal table)
Hence, the expected number of branches having sales between Tk. 120
thousand and Tk. 145 thousand is $0.34795 \times 500 = 173.975 \approx 174$.
b) $P(140 < X < 165) = P\left( \dfrac{140 - 150}{15} < Z < \dfrac{165 - 150}{15} \right)$
$= P(-0.67 < Z < 1)$
$= P(Z < 1) - P(Z < -0.67)$
$= 0.84134 - 0.25143 = 0.58991$ (by using the Normal table)
Hence, the expected number of branches having sales between Tk. 140
thousand and Tk. 165 thousand is
$0.58991 \times 500 = 294.955 \approx 295$.
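The expected number of branches is the interval probability multiplied by 500. A short sketch of that step, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `expected_branches` is hypothetical.

```python
from statistics import NormalDist

BRANCHES = 500
sales = NormalDist(mu=150, sigma=15)  # daily sales in thousand Tk.


def expected_branches(low, high):
    """Expected number of branches with sales between low and high."""
    return BRANCHES * (sales.cdf(high) - sales.cdf(low))


count_a = expected_branches(120, 145)
count_b = expected_branches(140, 165)
print(round(count_a, 1), round(count_b, 1))
```

Because the code does not round z to two decimal places, its results fall slightly below the table-based figures of 174 and 295, by less than one branch in each case.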
Example 4:
The life span of a calculator manufactured by Texas Instruments has a normal
distribution with a mean of 54 months and a standard deviation of 8 months.
The company guarantees that any calculator that starts malfunctioning within 36
months of the purchase will be replaced by a new one. About what percentage
of calculators made by this company are expected to be replaced?
Solution: [Hints: For x = 36, P(x < 36) = P (z < -2.25) = .0122, Hence, 1.22%
of the calculators are expected to be replaced]
Example 5:
The line width for semiconductor manufacturing is assumed to be normally
distributed with a mean of 0.5 micrometer and a standard deviation of 0.05
micrometer.
(a) What is the probability that a line width is greater than 0.62 micrometer?
(b) What is the probability that a line width is between 0.47 and 0.63
micrometer?
Solution: Try yourself [Hints: (a) For x = 0.62, P(x > 0.62) = P(z > 2.4) = 1 −
P(z ≤ 2.4) = 1 − 0.9918 = 0.0082]
Example 6:
The life of a semiconductor laser at a constant power is normally distributed
with a mean of 7000 hours and a standard deviation of 600 hours. What is the
probability that a laser fails before 5000 hours?
Solution: Try yourself [Hints: For x = 5000, P(x < 5000) ]
Example 7:
The time it takes a cell to divide (called mitosis) is normally distributed with an
average time of one hour and a standard deviation of 5 minutes.
(a) What is the probability that a cell divides in less than 45 minutes?
(b) What is the probability that it takes a cell more than 65 minutes to divide?
Solution: Try yourself [Hints: mean= 1 hour = 60 min]