
Theory Unit 4

This document provides an introduction to discrete probability distributions, focusing on random variables (r.v.), their types, and probability mass functions. It explains the distinction between discrete and continuous r.v.s, how to derive probability mass functions, and introduces cumulative distribution functions. Additionally, it covers the concepts of expectation and variance for discrete r.v.s, emphasizing their significance in summarizing distribution features.


INTRODUCTION TO STATISTICS

UNIT 4: DISCRETE PROBABILITY DISTRIBUTIONS


2024-25

STATISTICS (2024-25) UNIT 4 1 / 73


1. Random variables (r.v.): Definition and types

A random experiment where the possible outcomes are real numbers is

called a random variable, that is, a random experiment in which the
sample space Ω is a subset of R (it could also be all of R).
For example, if the random experiment is "flip a coin two times and
analyze the outcomes", then the possible outcomes are {HH, HT, TH,
TT} (where H is heads and T is tails). In this case, the random
experiment is not a random variable. However, if the random
experiment is "flip a coin two times and count the number of heads",
then the possible outcomes are {0, 1, 2}. In this case the random
experiment is a random variable.
Random variables are denoted by capital letters: X, Y, ...
Since we will be using the term "random variable" a lot, we will use
the abbreviation "r.v." for short.


1. Random variables (r.v.): Definition and types

Types of random variables:

A r.v. X is discrete if the set of possible outcomes of X is finite or
countably infinite.
A r.v. X is continuous if the set of possible outcomes of X is
uncountably infinite.

EXAMPLE 1: In the following situations, what type of random
variable (r.v.) is X?
If the possible outcomes of X are 1/2, 1, 3/2 and 2, then:
X is a discrete r.v. (there is a finite number of possible outcomes).


1. Random variables (r.v.): Definition and types

EXAMPLE 1 (Cont.):

If X denotes the number of customers that visit a store in a day, then:
X is a discrete r.v. (the possible outcomes of X are 0 and all the
natural numbers, so there is a countably infinite number of possible
outcomes).
If X denotes the time (in seconds) it takes an employee to perform a
task, then:
X is a continuous r.v. (because the possible outcomes of X are positive
real numbers, so there are uncountably infinitely many possible outcomes).
If X denotes how much a child born in a particular hospital will weigh
(in kg), then:
X is a continuous r.v. (because the set of possible outcomes of X
contains uncountably infinitely many possible outcomes).


2. Probability distributions for discrete random variables

If X is a discrete r.v., the probability mass function of X is the
function that gives the probability of each possible outcome xi. The
range (or support) of the r.v. is the set of all possible outcomes.
The probability mass function is usually denoted by fX, or simply f if
there is no ambiguity, that is:

f(xi) = P(X = xi), for each possible outcome xi

The range of the r.v. is usually denoted by RX, or simply R if there is
no ambiguity.
Properties of the probability mass function of a discrete r.v.
1 The values f(xi) are always in the interval (0, 1] because they are
probabilities of possible outcomes.
2 The sum of all the values of the probability mass function is 1, that is,
∑x∈R f(x) = 1, because this sum is the probability of
the entire sample space, and this probability must be 1.
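These two properties can be checked mechanically for any candidate probability mass function given as a table of outcomes and probabilities. A minimal sketch in Python (the function name is_valid_pmf and the numerical tolerance are our own choices, not from the slides):

```python
def is_valid_pmf(pmf):
    """Check the two pmf properties: every f(xi) lies in (0, 1] and the values sum to 1."""
    probs = pmf.values()
    return all(0 < p <= 1 for p in probs) and abs(sum(probs) - 1.0) < 1e-9

# pmf of "number of heads in two coin tosses" as a dict {outcome: probability}
coin_pmf = {0: 0.25, 1: 0.50, 2: 0.25}
bad_pmf = {0: 0.6, 1: 0.6}  # sums to 1.2, so it violates property 2
```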


2. Probability distributions for discrete random variables

How do we obtain the probability mass function of a discrete r.v. in


practice?
There are situations where we can deduce the probability mass function
of a r.v. by analyzing how its outcomes are generated. This occurs, for
example, in games of chance or in cases where it is possible to apply
Laplace’s rule.
There are other situations where it is not possible to deduce the
probability mass function based on reasoning. In these cases, we
assume a probability mass function for the r.v. (based on past
experience or subjective assessments) and, afterwards, we can analyze
whether the assumed probability mass function is appropriate or not if
a large number of outcomes of the r.v. is available (how to perform
this type of analysis is discussed in courses on statistical inference).



2. Probability distributions for discrete random variables

Note that the probability mass function characterizes a discrete r.v. in
the sense that if the probability mass function is known, it is possible
to deduce the probability of any event about the r.v.
It is also important to note that the probability mass function is not
defined for all real numbers, but only for real numbers that are
possible outcomes of the r.v. (therefore this function is different from
the functions that are usually studied in mathematics).
The probability mass function of a discrete r.v. is usually represented
with vertical segments on each of the possible outcomes, where the
height of each segment is equal to the probability of the
corresponding point. This graph shows the distribution of the
probabilities among the possible outcomes.


2. Probability distributions for discrete random variables
EXAMPLE 2: Consider a random experiment where we toss a coin
two times. We use X to denote the r.v. indicating the number of
heads obtained after two tosses. The possible outcomes of X are:

{0, 1, 2}
Now we obtain the probability mass function of X.
In this case, the possible outcomes of the experiment "toss the coin
two times" are HH, HT, TH, TT (where H is get a head and T is get a
tail). The probability of each of these four possible outcomes is 1/4.
Therefore, the probability mass function of X is:

f(0) = P(X = 0) = P(Number of heads = 0) = P(TT) = 1/4

f(1) = P(X = 1) = P(Number of heads = 1) = P(HT) + P(TH) = 1/2

f(2) = P(X = 2) = P(Number of heads = 2) = P(HH) = 1/4
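The same probability mass function can be obtained by enumerating the four equally likely outcomes and counting heads. A sketch using exact fractions (the variable names are our own):

```python
from fractions import Fraction
from itertools import product

# Enumerate the equally likely outcomes HH, HT, TH, TT and count heads.
pmf = {}
for tosses in product("HT", repeat=2):
    heads = tosses.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, 4)

# pmf assigns 1/4 to 0 heads, 1/2 to 1 head and 1/4 to 2 heads, matching f above
```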


2. Probability distributions for discrete random variables
EXAMPLE 2 (Cont.): The following graph shows the probability
mass function we have obtained:

[Figure: vertical segments at x = 0, 1, 2 with heights f(0) = 0.25, f(1) = 0.50 and f(2) = 0.25.]


2. Probability distributions for discrete random variables
EXAMPLE 3: Let X be the r.v. "number of cars of a particular
make sold in a day at a car dealership". Suppose we know that the
probability mass function of X is:

x      0    1    2    3
f(x)   0.70 0.25 0.03 λ

In this case, the probability mass function is already given but it is
incomplete because we do not know the value of λ that appears as
f(3), that is, we do not know the probability that X equals 3.
However, it is easy to deduce the value of λ because the sum of all the
values of the probability mass function is 1, so:

0.70 + 0.25 + 0.03 + λ = 1

Solving the above equation, we get that λ = 0.02. From here, we can
derive any probability about X. For example, the probability that this
car dealership will sell more than one car in a day is:

P(X > 1) = P(X = 2) + P(X = 3) = f(2) + f(3) = 0.03 + 0.02 = 0.05
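The same deduction can be scripted: impose that the probabilities sum to 1 to recover λ, then read off P(X > 1). A sketch (the variable names are our own):

```python
# Known part of the pmf of X = "cars sold in a day"
pmf = {0: 0.70, 1: 0.25, 2: 0.03}

# The probabilities must sum to 1, so lambda is whatever is missing.
lam = 1.0 - sum(pmf.values())
pmf[3] = lam

# P(X > 1) = f(2) + f(3)
p_more_than_one = pmf[2] + pmf[3]
```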


2. Probability distributions for discrete random variables

Another interesting tool that can be used to analyze a discrete r.v. is
the cumulative distribution function. For a discrete r.v. X, the
cumulative distribution function of X is the function that gives the
probability that the outcome of the r.v. X is less than or equal to x,
for every real number x. This function is denoted by F, or by FX if we
want to note the r.v. we are referring to. Thus:

F(x) = P(X ≤ x), for every real number x

In the notation, it is important to note the distinction between X
(capital letter: outcome of the random experiment) and x (lowercase:
a generic value of the function, that is, any real number).
Note that, unlike the probability mass function, the cumulative
distribution function is defined for all real numbers, and not only for
those which are possible outcomes of the r.v.


2. Probability distributions for discrete random variables

Properties of the cumulative distribution function of a discrete r.v.:

1 F(x) is always in the interval [0, 1] because F(x) is the probability of
the event X ≤ x, that is, it is the probability that the outcome of the
random experiment is less than or equal to the real number x.
2 F(x) is nondecreasing, that is, if a < b then F(a) ≤ F(b). We know
that this will be true because, using R to denote the set of all possible
outcomes of X, we have that:

F(a) = P(X ≤ a) = ∑u∈R, u≤a P(X = u) = ∑u∈R, u≤a f(u)

F(b) = P(X ≤ b) = ∑u∈R, u≤b P(X = u) = ∑u∈R, u≤b f(u)

Therefore, if there is no possible outcome in the interval (a, b], then
F(a) = F(b). If there is a possible outcome in the interval
(a, b], then F(a) < F(b), since F(b) will contain all the
terms contained in F(a) plus an additional term, which will be positive.
Therefore we can ensure that F(a) ≤ F(b).
2. Probability distributions for discrete random variables
EXAMPLE 2 (Cont.): The cumulative distribution function F(x)
yields the probability of the event “X ≤ x,” where x is any number.
In this example, X = “number of heads after two tosses.” We saw that
the possible outcomes of X are 0, 1 and 2, and their probabilities are
f(0) = 1/4, f(1) = 1/2 and f(2) = 1/4. Therefore:
If x < 0: F(x) = P(Number of heads ≤ x) = 0, because
no possible outcome of X is ≤ x.

If x ∈ [0, 1): F(x) = P(Number of heads ≤ x) = f(0) = 1/4, because
0 is the only possible outcome of X that is ≤ x.

If x ∈ [1, 2): F(x) = P(Number of heads ≤ x) = f(0) + f(1) = 3/4,
because 0, 1 are the possible outcomes of X that are ≤ x.

If x ≥ 2: F(x) = P(Number of heads ≤ x) = f(0) + f(1) + f(2)
= 1, because all possible outcomes of X are ≤ x.
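Because F(x) just accumulates the probabilities of the outcomes that are ≤ x, it is straightforward to evaluate for any real x. A sketch (the helper name cdf is our own):

```python
def cdf(pmf, x):
    """F(x) = P(X <= x): add up f(xi) over the possible outcomes xi that are <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

coin_pmf = {0: 0.25, 1: 0.50, 2: 0.25}  # number of heads in two tosses
# cdf(coin_pmf, -1) gives 0, cdf(coin_pmf, 0.5) gives 0.25,
# cdf(coin_pmf, 1.7) gives 0.75, cdf(coin_pmf, 5) gives 1.0
```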


2. Probability distributions for discrete random variables
EXAMPLE 2 (Cont.): The following graph shows the function
F (x ):

[Figure: step function F(x), equal to 0 for x < 0, 0.25 on [0, 1), 0.75 on [1, 2), and 1 for x ≥ 2.]


2. Probability distributions for discrete random variables
Using reasoning like in the previous example, it is easy to see that the
cumulative distribution function of a discrete r.v. with possible
outcomes {x1, x2, ..., xn}, where x1 < x2 < ... < xn, is always a step
function with discontinuities at x1, x2, ..., xn, and the jump at every
possible outcome xi is equal to the probability of xi, that is:

F(x) = 0                                if x < x1
F(x) = f(x1)                            if x ∈ [x1, x2)
F(x) = f(x1) + f(x2)                    if x ∈ [x2, x3)
...
F(x) = f(x1) + f(x2) + ... + f(xn-1)    if x ∈ [xn-1, xn)
F(x) = 1                                if x ≥ xn

The cumulative distribution function is useful when we want to find
the probability of a large number of possible outcomes; in this case,
knowing F(x) can help us to find probabilities more easily. In the last
section of this lesson we will study an example of this situation.
3. Expectation and variance of a discrete random variable

In this section we will study how to obtain numerical measures that

summarize the most important features of the distribution of a
discrete random variable. Specifically, we will define a measure that
indicates the average location of the possible outcomes (“expectation
of the r.v.”) and measures that indicate the degree of dispersion
(“variance and standard deviation of a r.v.”).

This section will be divided into three sub-sections:

1 Expectation of a discrete r.v. X
2 Expectation of a function of a discrete r.v. g(X)
3 Variance and standard deviation of a discrete r.v. X


3.1. Expectation of a discrete r.v.
The expectation of a discrete r.v. X whose possible outcomes are
{x1, ..., xn} is defined as the weighted average of the possible
outcomes, where the weights equal the probability of each possible
outcome. This value is denoted by E(X). Following this definition,
we have that:
E(X) = x1 f(x1) + ... + xn f(xn)
The expectation of X is also called the expected value of X, the
mean of X and the average value of X. The Greek letter µ is
sometimes used to denote the mean of the r.v. X (or µX to
emphasize which r.v. we are referring to).
Note that:
Expectation is defined as a weighted average of the possible outcomes
rather than as a simple average in order to reflect the fact that the
most likely possible outcomes are considered the most important
outcomes.
The measurement unit of E(X) is the same as that of X.
3.1. Expectation of a discrete r.v.
EXAMPLE 2 (Cont.): The probability mass function of the r.v.
X = “number of heads when tossing a coin two times” is:

x      0    1    2
f(x)   0.25 0.50 0.25

The expectation of X is the weighted average of the possible outcomes
of X, where the weights are the probabilities, that is:
E(X) = 0·0.25 + 1·0.50 + 2·0.25 = 1 head

EXAMPLE 3 (Cont.): The probability mass function of the r.v.
X = “number of cars sold in a day at a car dealership” is:

x      0    1    2    3
f(x)   0.70 0.25 0.03 0.02

The expectation of X is:
E(X) = 0·0.70 + 1·0.25 + 2·0.03 + 3·0.02 = 0.37 cars
3.1. Expectation of a discrete r.v.

In Descriptive Statistics, we define the concept of sample mean as a

measure of central tendency of a sample. We have now studied the
concept of expectation or mean as a measure of central tendency
of a discrete r.v. What is the relationship between the two
concepts?
Suppose it were possible to observe the outcome of the random
experiment denoted by X a very large number of times n. These n
results comprise a sample and can be used to calculate the sample
mean. It is easy to see that the sample mean obtained with these n
outcomes will be approximately equal to E(X) (in the example
below we will show why this occurs). This property holds because we
have defined E(X) as the probability-weighted average, and it is the
intuitive interpretation of the concept of expectation for a discrete
r.v. X.


3.1. Expectation of a discrete r.v.

EXAMPLE 3 (Cont.): We have found that the expectation of X is

0.37 cars. How should we interpret this result?
Suppose we could observe the number of cars sold in a day at several
car dealerships (e.g., 1000). Taking into account the probability mass
function of X, we get that about 70% of the dealers (i.e., 700) would
not have sold any cars, 25% of the dealers (i.e., 250) would have sold 1
car, 3% of the dealers (i.e., 30) would have sold 2 cars, and 2% of the
dealers (i.e., 20) would have sold 3 cars. Therefore, the sample mean
of the 1000 observations of X would be approximately equal to:

(0·700 + 1·250 + 2·30 + 3·20) / 1000 = 0.37

As we can see, the sample mean we would get with all the observations
would be approximately equal to the expectation of X.
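This interpretation can be illustrated by simulation: draw many outcomes of X with the given probabilities and compare the sample mean with E(X). A sketch (the seed and sample size are arbitrary choices of ours):

```python
import random

pmf = {0: 0.70, 1: 0.25, 2: 0.03, 3: 0.02}
expectation = sum(x * p for x, p in pmf.items())  # E(X) = 0.37 cars

random.seed(0)  # fixed seed so the run is reproducible
outcomes, weights = zip(*pmf.items())
sample = random.choices(outcomes, weights=weights, k=100_000)
sample_mean = sum(sample) / len(sample)
# sample_mean lands close to the expectation 0.37
```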


3.1. Expectation of a discrete r.v.

EXAMPLE 4: In a lottery, a natural number between 1 and 100 is

drawn. A ticket costs 1 euro. If the number on the ticket matches the
number drawn, the player wins 10 euros; if only the last digit
matches, the player gets back the cost of the ticket. In any other
case, the player gets nothing. Let X be the r.v. indicating the net
winnings earned by a player. Calculate the possible outcomes of the
r.v. X, its probability mass function and its expectation.
There are three possible situations: the player wins, the player gets
back the cost of the ticket, or the player loses. Let us calculate the net
winnings in each of these situations as well as the probability that each
situation will occur.


3.1. Expectation of a discrete r.v.

EXAMPLE 4 (Cont.):
If the player wins, the net winnings will be 9 euros (the 10 euros that
the player wins minus the euro that the ticket cost). This situation will
occur if the number drawn matches the number that the player bought.
Since there are 100 numbers, the probability that this will occur is 1/100.

If the player gets back the cost of the ticket, the net winnings will be 0
euros. This situation will occur if the number drawn is one of the 9
numbers with the same last digit as the one the player bought, but
which do not match the number bought. The probability that this will
occur is 9/100.

If the player loses, the net winnings will be −1 euro. This situation will
occur if the number drawn is one of the 90 numbers whose last digit
does not coincide with the last digit of the number bought. The
probability that this will occur is 90/100.


3.1. Expectation of a discrete r.v.

EXAMPLE 4 (Cont.): We have deduced, therefore, that the

possible outcomes of X are −1, 0 and 9, and the probability mass
function is:

x      −1   0    9
f(x)   0.90 0.09 0.01

From this information it is easy to deduce the expectation of the r.v.
X.
The expectation of X is the weighted average of the possible outcomes:

E(X) = (−1)·0.90 + 0·0.09 + 9·0.01 = −0.81 euros

This result means that if we play this lottery, for example, 1000 times,
and we call X1, ..., X1000 the net winnings that we get in each one of
these lotteries, then the sample mean (X1 + X2 + ... + X1000)/1000
will be approximately equal to −0.81; therefore, the total net winnings
X1 + X2 + ... + X1000 will be approximately equal to −810 euros.
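The lottery expectation can be checked with exact rational arithmetic, as a sketch (variable names are our own):

```python
from fractions import Fraction

# pmf of the net winnings X: lose 1 euro, break even, or net 9 euros
pmf = {-1: Fraction(90, 100), 0: Fraction(9, 100), 9: Fraction(1, 100)}
e_x = sum(x * p for x, p in pmf.items())
# e_x is -81/100: an expected loss of 0.81 euros per ticket
```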
3.1. Expectation of a discrete r.v.

EXAMPLE 5: A player takes part in the following game: the player

tosses a coin three times; if he gets three heads then he wins 12
euros, if he gets two heads and one tail then he wins 1 euro, and if he
gets less than two heads then he wins nothing. Let us find the
expectation of the amount of money that a player wins when taking
part in this game.
Let X be the amount of money that the player wins. In this game
there are several stages; in these situations, it is advisable to use a tree
diagram to see how the game works and the possible outcomes of X.


3.1. Expectation of a discrete r.v.
EXAMPLE 5 (Cont.):
As the three stages of the game are independent, the tree diagram
gives the following eight equally likely paths (the outcome of X in
each case is shown at the right):

Heads, Heads, Heads:  X = 12, with prob. 0.5³
Heads, Heads, Tails:  X = 1,  with prob. 0.5³
Heads, Tails, Heads:  X = 1,  with prob. 0.5³
Heads, Tails, Tails:  X = 0,  with prob. 0.5³
Tails, Heads, Heads:  X = 1,  with prob. 0.5³
Tails, Heads, Tails:  X = 0,  with prob. 0.5³
Tails, Tails, Heads:  X = 0,  with prob. 0.5³
Tails, Tails, Tails:  X = 0,  with prob. 0.5³
3.1. Expectation of a discrete r.v.
EXAMPLE 5 (Cont.):
In the diagram we observe that the possible outcomes of X are 12, 1
and 0, and their probabilities are:
f(12) = 0.5³ = 0.125
f(1) = 0.5³ + 0.5³ + 0.5³ = 0.375
f(0) = 0.5³ + 0.5³ + 0.5³ + 0.5³ = 0.5
Therefore, the expectation of the amount of money that the player will
win is:
E(X) = 12·0.125 + 1·0.375 + 0·0.5 = 1.875 euros
Let us assume that the player could also take part in the following
alternative game: the player rolls a die twice; if he gets two sixes then
he wins 12 euros, if he gets a six one of the times but not the other
then he wins 6 euros, and if he gets no six then he wins nothing. With
this alternative game, would the player obtain a higher expected
amount of money?
3.1. Expectation of a discrete r.v.
EXAMPLE 5 (Cont.):
Let Y be the amount of money that the player wins with this
alternative game. The tree diagram that shows the possible outcomes
of Y and their probabilities is:

Six, Six:        Y = 12, with prob. 1/36
Six, No six:     Y = 6,  with prob. 5/36
No six, Six:     Y = 6,  with prob. 5/36
No six, No six:  Y = 0,  with prob. 25/36

The possible outcomes of Y are 12, 6 and 0, and their probabilities are
f(12) = 1/36, f(6) = 5/36 + 5/36 = 10/36 and f(0) = 25/36. Thus:

E(Y) = 12·(1/36) + 6·(10/36) + 0·(25/36) = 2 euros

Therefore the player wins a higher expected amount of money with this
alternative game.
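Both games can be checked by enumerating every equally likely path, as a sketch (variable names are our own):

```python
from fractions import Fraction
from itertools import product

# Game 1: three coin tosses; winnings depend on the number of heads.
e_x = Fraction(0)
for tosses in product("HT", repeat=3):
    heads = tosses.count("H")
    win = 12 if heads == 3 else (1 if heads == 2 else 0)
    e_x += win * Fraction(1, 8)

# Game 2: two die rolls; winnings depend on the number of sixes.
e_y = Fraction(0)
for rolls in product(range(1, 7), repeat=2):
    sixes = rolls.count(6)
    win = 12 if sixes == 2 else (6 if sixes == 1 else 0)
    e_y += win * Fraction(1, 36)

# e_x = 15/8 = 1.875 euros and e_y = 2 euros: the second game pays more on average
```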
3.1. Expectation of a discrete r.v.

When the r.v. X is discrete but the set of possible outcomes of X is

countably infinite {x1, x2, ..., xn, ...}, then E(X) can be defined in a
similar manner, considering a sum of infinitely many terms, that is:

E(X) = x1 f(x1) + ... + xn f(xn) + ...

However, with a sum of infinitely many terms (known as a "series" in

mathematics) it can happen that:
either the outcome is a real number (“convergent series”), in which
case this real number is what is considered as the expectation of X;
or the result is infinite (“divergent series”), in which case the
expectation of X does not exist.
In the rest of the course we will only see discrete random variables in
which the set of possible outcomes is finite.


3.2. Expectation of a function of a discrete r.v.

Suppose that we are not interested in the r.v. X but in a function of the
r.v., which we will call g(X), where g is a known function. For
example, X can be the number of cars that a car dealership will sell,
but we are interested in the earnings g(X) the dealership will make
for X cars sold. The definition of expectation given above can be
applied to these situations.
If X is a discrete r.v. with possible outcomes {x1, ..., xn} and g is
any known function, the expectation of g(X) is the weighted
average of the possible outcomes of g(X), where the weight of the
possible outcome g(xi) is equal to the probability of xi. This value is
denoted by E(g(X)), and following this definition, we have that:

E(g(X)) = g(x1)f(x1) + ... + g(xn)f(xn)


3.2. Expectation of a function of a discrete r.v.

EXAMPLE 6: Suppose that the r.v. X that indicates the prediction

error of a certain quantity has the following probability mass function:

x      −2  −1  0   1   2
f(x)   0.1 0.2 0.4 0.2 0.1

From this table we can deduce the expectation of X:

The expectation of X is the weighted average of the possible outcomes:

E(X) = (−2)·0.1 + (−1)·0.2 + 0·0.4 + 1·0.2 + 2·0.1 = 0

Therefore, the expectation of the prediction error is 0.


3.2. Expectation of a function of a discrete r.v.

EXAMPLE 6 (Cont.): Now suppose that when the prediction error

is X, the losses Y (in thousands of euros) due to the prediction error
are Y = 5X². With this information, what are the average losses due
to the prediction error?
In this case we want to calculate the expectation of the r.v.
Y = g(X), where g(X) = 5X². This expectation is the weighted
average of the possible outcomes of g(X), that is:

E(Y) = E(g(X))
= g(−2)·0.1 + g(−1)·0.2 + g(0)·0.4 + g(1)·0.2 + g(2)·0.1
= 5·(−2)²·0.1 + 5·(−1)²·0.2 + 5·0²·0.4 + 5·1²·0.2 + 5·2²·0.1
= 6 thousand euros
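The definition of E(g(X)) translates directly into code. A sketch (the helper name expectation_of_g is our own):

```python
def expectation_of_g(pmf, g):
    """E(g(X)) = g(x1) f(x1) + ... + g(xn) f(xn)."""
    return sum(g(x) * p for x, p in pmf.items())

# pmf of the prediction error from EXAMPLE 6
error_pmf = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
e_x = expectation_of_g(error_pmf, lambda x: x)            # E(X) = 0
e_loss = expectation_of_g(error_pmf, lambda x: 5 * x**2)  # E(5X^2) = 6
```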


3.2. Expectation of a function of a discrete r.v.

In the problem we are analyzing in this sub-section, that is, to


determine the expectation of a function of a discrete r.v. E (g (X )),
we can ask: Is it possible to calculate this value E (g (X )) only from
E (X ) and from the function g ? The answer is that, in general, it is
not possible to calculate E (g (X )) only from E (X ) and from the
function g , but we need to know the probability mass function of X .
However, there is a case in which it is possible to obtain
E (g (X )) only from E (X ) and from the function g : when g is a
linear function, that is, when g (X ) = aX + b, where a and b are
known real numbers. The property that tells us how to do so is called
the linearity property of expectation.



3.2. Expectation of a function of a discrete r.v.

Linearity property of expectation: If a and b are known real

numbers, then:
E(aX + b) = aE(X) + b

Proof: With the notation used in this section:

E(aX + b) = (ax1 + b)f(x1) + ... + (axn + b)f(xn)

Performing the operations and rearranging we have that:

E(aX + b) = ax1 f(x1) + ... + axn f(xn) + bf(x1) + ... + bf(xn)
= a[x1 f(x1) + ... + xn f(xn)] + b[f(x1) + ... + f(xn)]
= aE(X) + b

In the last equation we have taken into account the fact that the sum
of the probabilities is 1, that is, f(x1) + ... + f(xn) = 1.


3.2. Expectation of a function of a discrete r.v.
The linearity property tells us that, when g is a linear function, if we
know E(X) we can deduce E(g(X)) easily, and there is no need to apply
the definition of E(g(X)).
EXAMPLE 3 (Cont.): We have assumed that we know the
probability mass function of the r.v. X = “number of cars sold in a day
at a car dealership”, and we found E(X) above. Suppose the earnings
Y of a dealership are related to the number of cars sold X as follows:
Y = g(X), where g(X) = 50X − 10. How can we find E(Y)?
This can be calculated by using the definition of E(g(X)):
E(Y) = E(g(X))
= g(0)·0.7 + g(1)·0.25 + g(2)·0.03 + g(3)·0.02
= (−10)·0.7 + 40·0.25 + 90·0.03 + 140·0.02 = 8.5
However, since the relationship between X and Y is linear and we know
that E(X) = 0.37, the easiest thing to do is deduce E(Y) from E(X):
E(Y) = E(50X − 10) = 50E(X) − 10 = 50·0.37 − 10 = 8.5
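Both routes to E(Y) can be compared numerically, as a sketch (variable names are our own):

```python
pmf = {0: 0.70, 1: 0.25, 2: 0.03, 3: 0.02}

def g(x):
    return 50 * x - 10  # earnings as a linear function of cars sold

# Route 1: the definition of E(g(X)).
e_direct = sum(g(x) * p for x, p in pmf.items())

# Route 2: linearity, E(50X - 10) = 50 E(X) - 10.
e_x = sum(x * p for x, p in pmf.items())
e_linear = 50 * e_x - 10

# Both routes give 8.5
```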


3.2. Expectation of a function of a discrete r.v.

It is important to emphasize that, when g is not a linear function, it

is not enough to know E(X) to deduce E(g(X)): we would have to
deduce E(g(X)) by applying the definition we saw above.
A particular case of the above is the quadratic function g(X) = X²:
it is not possible to calculate E(g(X)) only from E(X) because,
generally, it is not true that E(g(X)) is the same as g(E(X)), that is
to say, we generally have that:

E(X²) ≠ E(X)²

For this reason, it is not possible to deduce the value of E(X²) simply
from E(X); we need to know the probability mass function of X,
and then calculate

E(X²) = x1² f(x1) + ... + xn² f(xn)
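The car-dealership r.v. from EXAMPLE 3 makes the inequality concrete, as a sketch:

```python
pmf = {0: 0.70, 1: 0.25, 2: 0.03, 3: 0.02}

e_x = sum(x * p for x, p in pmf.items())      # E(X)   = 0.37
e_x2 = sum(x**2 * p for x, p in pmf.items())  # E(X^2) = 0.55

# E(X^2) = 0.55 while E(X)^2 = 0.1369: knowing E(X) alone cannot give E(X^2)
```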


3.3. Variance and standard deviation of a discrete r.v.

The expectation of a r.v. provides us information about the center of

the variable, but does not tell us if there is a lot or a little variability
in the possible outcomes of the variable. To measure the degree of
variability of a random variable, we use a measure called the variance.
The variance of the r.v. X is defined as the mean value of the
square of the difference between the possible outcomes of X and its
expectation. The variance of X is denoted by Var(X). Following the
definition, and using µ to denote the expectation of X, the variance is
obtained as follows:

Var(X) = E((X − µ)²) = (x1 − µ)² f(x1) + ... + (xn − µ)² f(xn)

where {x1, ..., xn} are the possible outcomes of X, and f(x1), ...,
f(xn) are the corresponding probabilities. The variance of the r.v. X
is usually denoted by the Greek letter σ² (or σ²X to emphasize which
r.v. we are referring to).
3.3. Variance and standard deviation of a discrete r.v.

From the definition of variance Var(X) = ∑_{i=1}^n (xi − µ)² f(xi) it

follows that:
1 The unit of measure of the variance of X is the square of the unit of
measure of X.
2 The variance is always greater than or equal to 0 because all the terms
are non-negative.
3 If the variance is exactly 0, then all its terms must be equal to 0, so it
would always have to be xi = µ. In this case, there is only one possible
outcome for X, which will logically be its mean value µ and have
probability 1. Therefore, if Var(X) = 0, X is not really random since
we know with certainty that the outcome of X will be µ.
4 In general, the larger Var(X) is, the further the possible outcomes xi
are, on average, from the mean value µ, and the larger
the variability of X.


3.3. Variance and standard deviation of a discrete r.v.

EXAMPLE 7: Let X, Y and Z be three discrete random variables

whose probability mass functions are fX, fY and fZ. We will assume
that these probability mass functions are:

x      9.9  10.1      y      9    11        z      0    20
fX(x)  0.5  0.5       fY(y)  0.5  0.5       fZ(z)  0.5  0.5

The expectation of each of these random variables is:

E(X) = 9.9·0.5 + 10.1·0.5 = 10

E(Y) = 9·0.5 + 11·0.5 = 10

E(Z) = 0·0.5 + 20·0.5 = 10


3.3. Variance and standard deviation of a discrete r.v.

EXAMPLE 7 (Cont.): Although X, Y and Z have the same mean,

they behave differently: the possible outcomes of X are 9.9 and 10.1,
which are values very close to the mean value 10. The possible
outcomes of Z are 0 and 20, values that are very far from the mean
value 10. And the possible outcomes of Y are 9 and 11, which is an
intermediate case between X and Z. Thus, the variable with the least
variability is X and the variable with the most variability is Z. This
can also be seen by calculating the variance of each of the variables:
Since the expectation is 10, the variances are:

Var(X) = E[(X − 10)²] = (9.9 − 10)²·0.5 + (10.1 − 10)²·0.5 = 0.01

Var(Y) = E[(Y − 10)²] = (9 − 10)²·0.5 + (11 − 10)²·0.5 = 1

Var(Z) = E[(Z − 10)²] = (0 − 10)²·0.5 + (20 − 10)²·0.5 = 100
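The three variances can be computed with one helper that applies the definition directly. A sketch (the function name variance is our own):

```python
def variance(pmf):
    """Var(X) = (x1 - mu)^2 f(x1) + ... + (xn - mu)^2 f(xn), with mu = E(X)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

var_x = variance({9.9: 0.5, 10.1: 0.5})  # 0.01: least variability
var_y = variance({9: 0.5, 11: 0.5})      # 1
var_z = variance({0: 0.5, 20: 0.5})      # 100: most variability
```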


3.3. Variance and standard deviation of a discrete r.v.
We have defined the variance of X as the mean value of the squared
differences between its possible outcomes and its expectation
E(X) = µ, that is, Var(X) = E[(X − µ)²]. However, there is an
alternative expression that is easier to use:
Property of variance: It always holds that:
Var(X) = E(X²) − E(X)²
Proof: Expanding the square, we have that:

Var(X) = E[(X − µ)²] = ∑_{i=1}^n (xi − µ)² f(xi)

= ∑_{i=1}^n (xi² + µ² − 2xi µ) f(xi)

= ∑_{i=1}^n xi² f(xi) + µ² ∑_{i=1}^n f(xi) − 2µ ∑_{i=1}^n xi f(xi)

= E(X²) + µ²·1 − 2µ·µ = E(X²) − µ²

Hence we only have to know E(X²) and µ = E(X) to find Var(X).
3.3. Variance and standard deviation of a discrete r.v.

EXAMPLE 3 (Cont.): We have assumed that the r.v. X = “number

of cars sold in a day at a car dealership" has probability mass
function:

x      0    1    2    3
f(x)   0.70 0.25 0.03 0.02

Let us find the variance of X.

We found above that E(X) = 0.37 cars. In a similar manner we can
find E(X²):

E(X²) = 0²·0.70 + 1²·0.25 + 2²·0.03 + 3²·0.02 = 0.55 cars²

Thus, the variance is:

Var(X) = E(X²) − E(X)² = 0.55 − 0.37² = 0.4131 cars²
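The shortcut Var(X) = E(X²) − E(X)² can be reproduced numerically, as a sketch (taking the square root gives the standard deviation in the original units):

```python
pmf = {0: 0.70, 1: 0.25, 2: 0.03, 3: 0.02}

e_x = sum(x * p for x, p in pmf.items())      # E(X)   = 0.37 cars
e_x2 = sum(x**2 * p for x, p in pmf.items())  # E(X^2) = 0.55 cars^2
var = e_x2 - e_x**2                           # 0.4131 cars^2
sd = var ** 0.5                               # about 0.6427 cars
```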


The unit of measure of the variance is the square of the unit of
measure of X. Sometimes it is interesting to use a measure of
variability that has the same unit of measure as X. In these cases we
use the standard deviation as a measure of variability.

The standard deviation of a r.v. X is the square root of its
variance. It is usually denoted by SD(X):

SD(X) = √Var(X)

The Greek letter σ is usually used to denote the standard deviation of
the r.v. X (or σ_X to emphasize which r.v. we are referring to).

EXAMPLE 3 (Cont.): We have found that the variance of the r.v.
X = "number of cars sold in a day at a car dealership" is
Var(X) = 0.4131 cars^2. Therefore:

SD(X) = √0.4131 = 0.6427 cars


If we know the variance (or standard deviation) of the r.v. X, but we
are interested in a linear function of X, we can easily deduce its
variance (or standard deviation):

Variance and standard deviation of a linear function of X: If a
and b are non-random real numbers, then:

Var(aX + b) = a^2 Var(X)   and   SD(aX + b) = |a| SD(X)

Proof: If µ is the expectation of X, we know (because of the linearity
of the expectation) that the expectation of aX + b is aµ + b, therefore:

Var(aX + b) = E((aX + b - (aµ + b))^2)
            = E((aX + b - aµ - b)^2)
            = E((aX - aµ)^2) = E(a^2 (X - µ)^2)
            = ∑_{i=1}^{n} a^2 (x_i - µ)^2 f(x_i)
            = a^2 ∑_{i=1}^{n} (x_i - µ)^2 f(x_i) = a^2 Var(X)

The property of the standard deviation is obtained by taking square
roots (recall that the square root of a^2 is |a|).
Since Var(aX + b) = a^2 Var(X), adding a non-random number b to
X does not affect its variance (the sum shifts the outcomes to the left
or right, but does not change the degree of dispersion of the
outcomes). However, multiplying X by a non-random number a does
affect the variance (if |a| > 1 the degree of dispersion will be higher
and if |a| < 1 the degree of dispersion will be lower).

EXAMPLE 3 (Cont.): We assumed above that the earnings Y (in
thousands of euros) of a car dealership are related to the number of
cars sold X through Y = g(X), where g(X) = 50X - 10.
Since the relationship between X and Y is linear and we know the
variance of X, it is easy to deduce the variance of Y and its standard
deviation:

Var(Y) = Var(50X - 10) = 50^2 Var(X) = 2500 · 0.4131 = 1032.75 (thousands of euros)^2

SD(Y) = √1032.75 = 32.14 thousands of euros
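The rule Var(aX + b) = a^2 Var(X) can be verified for Y = 50X - 10 by also computing the variance of Y directly from its own probability mass function (an illustrative sketch; `var` is a hypothetical helper name):

```python
# pmf of X = number of cars sold (Example 3); Y = 50X - 10 = daily earnings
pmf = {0: 0.70, 1: 0.25, 2: 0.03, 3: 0.02}
a, b = 50, -10

def var(pmf):
    m = sum(x * p for x, p in pmf.items())
    return sum((x - m)**2 * p for x, p in pmf.items())

var_X = var(pmf)
var_Y_direct = var({a * x + b: p for x, p in pmf.items()})  # pmf of Y itself
var_Y_rule = a**2 * var_X                                   # a^2 * Var(X)

print(var_X, var_Y_direct, var_Y_rule)  # approximately 0.4131, 1032.75, 1032.75
```

Computing directly over the outcomes of Y and applying the a^2 rule give the same 1032.75, while the shift b plays no role.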


In Descriptive Statistics the concepts of sample variance and sample
standard deviation were defined as measures of the variability of a
sample. We have now studied the concepts of variance and
standard deviation as measures of the variability of a discrete r.v.
What is the relationship between the two concepts?

Suppose that it is possible to observe the outcome of the random
experiment X a large number of times n. These n outcomes comprise
a sample and we can use them to calculate the sample variance and
the sample standard deviation. It is easy to see that the sample
variance calculated with these n outcomes will be approximately
equal to the variance of the r.v. X, and that the sample
standard deviation calculated with these n outcomes will be
approximately equal to the standard deviation of the r.v. X.
This relationship is shown in the following example.


EXAMPLE 8: Let X be the r.v. indicating the number obtained
when rolling a fair die. The possible outcomes of X are 1, 2, 3, 4, 5
and 6, and the probability mass function of this discrete r.v. is
f(x) = 1/6 for x = 1, 2, 3, 4, 5, 6. It is easy to obtain the mean and
the variance of this discrete r.v.:

The mean is:

E(X) = 1 · (1/6) + 2 · (1/6) + ··· + 6 · (1/6) = 3.5

The variance is:

Var(X) = E((X - 3.5)^2)
       = (1 - 3.5)^2 · (1/6) + (2 - 3.5)^2 · (1/6) + ··· + (6 - 3.5)^2 · (1/6) = 2.92


EXAMPLE 8 (Cont.): Suppose we roll a die 600 times, and we get
95 ones, 108 twos, 94 threes, 99 fours, 107 fives and 97 sixes. These
outcomes comprise a sample of observations of X. With these
observations:

The sample mean is:

X̄ = (95 · 1 + 108 · 2 + ··· + 97 · 6) / 600 = 2106 / 600 = 3.51

The sample variance is:

S^2 = (95 · (1 - 3.51)^2 + ··· + 97 · (6 - 3.51)^2) / 599 = 1731.94 / 599 = 2.89

The sample mean and sample variance we obtained are very close to
the mean and variance of X. If we had many more observations, the
sample mean and sample variance would be practically the same as the
mean and variance of X.
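This convergence can be illustrated by simulating many die rolls (a sketch, not part of the slides; the seed and sample size are arbitrary choices):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

n = 200_000
rolls = [random.randint(1, 6) for _ in range(n)]  # n independent fair-die rolls

sample_mean = sum(rolls) / n
sample_var = sum((x - sample_mean) ** 2 for x in rolls) / (n - 1)

# With a large n, these are close to E(X) = 3.5 and Var(X) = 35/12 ≈ 2.9167
print(sample_mean, sample_var)
```

With 200,000 rolls the sample mean and sample variance agree with the theoretical mean and variance to about two decimal places, as the slide's 600-roll example already suggests.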
4. Binomial distribution

Many models for discrete random variables are often used. Each of
these models has a specific name. In this course we will see only one,
which is the most widely used model: the binomial distribution. To
define this model, we will first explain what a Bernoulli experiment is.

A Bernoulli experiment with parameter p, where p is a real
number between 0 and 1, is an experiment that has two possible
outcomes. We will call these two possible outcomes "success" and
"failure", where p is the probability of success and 1 - p is the
probability of failure.

In a Bernoulli experiment, this p value can sometimes be determined
by observing the experiment. Other times, however, we simply assume
a specific value for p (in the next course we will see how to analyze
whether this assumption is reasonable or not).


EXAMPLE 9: Tossing a coin is a random experiment with only two
possible outcomes. Therefore it can be interpreted as a Bernoulli
experiment where getting a head is considered a "success" and
getting a tail is considered a "failure" (or vice versa). If the coin is
well built, the probability of success is p = 1/2.

EXAMPLE 10: Suppose we buy a ticket for a lottery where one
number from among 100 numbers is drawn. In this case, winning or
not winning a prize is a Bernoulli experiment. If we consider winning
a prize as a "success", the probability of success is p = 1/100.

EXAMPLE 11: Suppose we know that when a customer enters a
particular store, the probability that the customer will buy something
is 0.3. In this case, the fact that the customer buys or does not buy
something is a Bernoulli experiment. If we consider the customer
buying something a "success", the probability of success is p = 0.3.


It is said that a r.v. X has a binomial distribution with
parameters n, p if X is the number of successes obtained when
independently repeating n times a Bernoulli experiment with a
probability of success p. When X has a binomial distribution with
parameters n, p, it is written in abbreviated form as:

X ~ Bi(n; p)

Note that the first parameter of a binomial distribution, n, is always a
natural number, and that the second parameter of a binomial
distribution, p, must be a number between 0 and 1.

What we will do in this section is to deduce how to calculate the
probabilities of any random variable that has a binomial distribution.
Therefore, whenever we have a r.v. that we know has a binomial
distribution, we can calculate any probability about the r.v. using the
formulas discussed in this section.
EXAMPLE 9 (Cont.): Suppose we toss a coin 10 times, and use X
to denote the r.v. "number of heads occurring in 10 tosses". If
getting a head is a "success", we can rewrite X as "number of
successes in 10 independent repetitions of a Bernoulli experiment with
a 1/2 probability of success". Therefore, X is a r.v. with a Bi(10; 1/2)
distribution.

EXAMPLE 11 (Cont.): Suppose we know that one morning 6
customers will enter this store and they will behave independently. If
we use X to denote the r.v. "number of customers who will buy
something out of the 6 customers that enter the store", then X is a
r.v. with a Bi(6; 0.3) distribution. This is because we can rewrite X
as the "number of successes in 6 independent repetitions of a
Bernoulli experiment, where 0.3 is the probability of success in each
experiment".


The simplest case of the binomial distribution is the n = 1 case, that
is, when X is a r.v. with a Bi(1; p) distribution; in this case it is also
said that X has a Bernoulli distribution. In this case, X denotes the
"number of successes in one repetition of a Bernoulli experiment".

X only has two possible outcomes: 0 (no success) and 1 (one success).
The probability mass function is: f(0) = 1 - p, f(1) = p.

The expectation of X is:

E(X) = 0 · (1 - p) + 1 · p = p

The variance of X is:

Var(X) = E[(X - µ)^2] = E(X^2) - E(X)^2
       = [0^2 · (1 - p) + 1^2 · p] - p^2 = p - p^2 = p(1 - p)

The variance will be small if p is close to 0 or close to 1. In these
cases, there is little variability because if we have many outcomes for
this r.v., the outcome will almost always be a failure (if p is close to 0),
or the outcome will almost always be a success (if p is close to 1).
The next simplest case is the n = 2 case, that is, when X is a r.v.
with a Bi(2; p) distribution. In this case, X denotes the "number of
successes in two independent repetitions of a Bernoulli experiment".
The possible outcomes of X (number of successes) are 0, 1 and 2.

To obtain the probability mass function, we will call the success "S"
and the failure "F". When we perform two Bernoulli experiments, we
can get either SS, or SF, or FS, or FF. Since both experiments are
independent, the probabilities of each of these 4 situations are p^2,
p(1 - p), (1 - p)p and (1 - p)^2, respectively. Thus:

f(0) = P(Get 0 successes in 2 tries) = P(FF) = (1 - p)^2

f(1) = P(Get 1 success in 2 tries) = P(SF) + P(FS)
     = p(1 - p) + (1 - p)p = 2p(1 - p)

f(2) = P(Get 2 successes in 2 tries) = P(SS) = p^2


The mean and variance of a r.v. X with a Bi(2; p) distribution can
be derived from its probability mass function.

The expectation of X is:

E(X) = 0 · (1 - p)^2 + 1 · 2p(1 - p) + 2 · p^2 = 2p - 2p^2 + 2p^2 = 2p

The variance of X is:

Var(X) = E[(X - µ)^2] = E(X^2) - E(X)^2
       = [0^2 · (1 - p)^2 + 1^2 · 2p(1 - p) + 2^2 · p^2] - (2p)^2
       = 2p - 2p^2 + 4p^2 - 4p^2 = 2p - 2p^2 = 2p(1 - p)

As in the previous case, the variance will be small if p is close to 0 or
close to 1.
Now we are going to study the general case of a r.v. X with a
Bi(n; p) distribution, that is, when X denotes the "number of
successes in n independent repetitions of a Bernoulli experiment".
The possible outcomes of X are 0, 1, 2, ..., n - 1 and n. Let us find
the probability mass function for each of these possible outcomes:

The outcome of X is 0 if and only if the outcome of each of the n
Bernoulli experiments is a failure. Therefore:

f(0) = (1 - p)^n

The outcome of X is 1 if and only if there is one success and n - 1
failures. This will be true if the sequence of n outcomes is
SFF···FF, or FSF···FF, ..., or FFF···FS. The probability of each
of these sequences is p(1 - p)^(n-1), and there are n sequences of this
type. Therefore:

f(1) = n p (1 - p)^(n-1)


Probability mass function of a Bi(n; p) distribution (cont.):

The outcome of X is 2 if and only if there are two successes and n - 2
failures. This will be true if the sequence of n outcomes is
SSF···FF, or SFS···FF, ..., or FFF···SS. The probability of each
of these sequences is p^2 (1 - p)^(n-2), and there are as many of these
sequences as places where the two successes can be positioned among
the n experiments, that is, C(n, 2) = n! / (2!(n - 2)!). Thus:

f(2) = C(n, 2) p^2 (1 - p)^(n-2) = [n! / (2!(n - 2)!)] p^2 (1 - p)^(n-2)

In general, for x = 0, 1, ..., n, we have that:

f(x) = P(Get x successes in n tries) = [n! / (x!(n - x)!)] p^x (1 - p)^(n-x)

Note that the event "get x successes in n tries" implies that there were
x successes and n - x failures in the n tries: the factors p^x and
(1 - p)^(n-x) are the probabilities of getting x successes and n - x
failures, respectively, and the factor n! / (x!(n - x)!) is the number of
possible ways the x successes can be positioned in the n repetitions.
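The general formula can be written directly as a small function (an illustrative sketch; `binom_pmf` is a hypothetical name, and `math.comb` computes the combinatorial factor n! / (x!(n - x)!)):

```python
from math import comb

def binom_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x), for x = 0, 1, ..., n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Sanity check: the probabilities over all possible outcomes sum to 1
# (here for the Bi(6; 0.3) distribution of Example 11).
total = sum(binom_pmf(x, 6, 0.3) for x in range(7))
print(binom_pmf(2, 6, 0.3), total)  # approximately 0.324135 and 1.0
```

The value f(2) ≈ 0.32414 matches the probability computed by hand in Example 11 below.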
To sum up, whenever we have a r.v. X with a Bi(n; p) distribution,
where n is the number of experiments and p is the probability of
success in each experiment, we have proved that its probability mass
function is:

f(x) = [n! / (x!(n - x)!)] p^x (1 - p)^(n-x), for x = 0, 1, ..., n

In practice, there are many situations in which the probability we
want to calculate is a probability about the number of successes
obtained in n independent repetitions of a Bernoulli experiment.
When that occurs, we can express this probability as a probability
about a r.v. X with a Bi(n; p) distribution. This probability can be
found in a simple way using this formula, without having to do
additional calculations or reasoning.


EXAMPLE 11 (Cont.): We have assumed that we know that 6
customers will enter a store in one morning, that they will behave
independently, and that the probability that a customer will buy
something is 0.3. Under these conditions, we said that if we use X to
denote the r.v. "number of customers who will buy something out of
6 customers", then X is a r.v. with a Bi(6; 0.3) distribution.
Therefore, any probability about the number of customers who will
buy something can be calculated using the general formula for the
probabilities of a binomial r.v. we have seen. In this case, the formula
of the probability mass function is:

f(x) = [6! / (x!(6 - x)!)] 0.3^x 0.7^(6-x), for x = 0, 1, ..., 6.


EXAMPLE 11 (Cont.): The probabilities we obtain are:

f(0) = [6! / (0! 6!)] 0.3^0 0.7^6 = 0.11765
f(1) = [6! / (1! 5!)] 0.3^1 0.7^5 = 0.30253
f(2) = [6! / (2! 4!)] 0.3^2 0.7^4 = 0.32414
f(3) = [6! / (3! 3!)] 0.3^3 0.7^3 = 0.18522
f(4) = [6! / (4! 2!)] 0.3^4 0.7^2 = 0.05953
f(5) = [6! / (5! 1!)] 0.3^5 0.7^1 = 0.01021
f(6) = [6! / (6! 0!)] 0.3^6 0.7^0 = 0.00073


EXAMPLE 11 (Cont.): The following graph shows the probability
mass function we obtained:

[Bar chart of the probability mass function f(x), with x = 0, 1, ..., 6
on the horizontal axis and f(x) ranging from 0.0 to 0.3 on the
vertical axis]


EXAMPLE 11 (Cont.): Using the probability mass function, we can
calculate any probability about the r.v. X = number of customers who
will buy something. For example:

What is the probability that more than 4 customers will buy something?

P(X > 4) = P(X = 5) + P(X = 6) = f(5) + f(6) = 0.01021 + 0.00073 = 0.01094

What is the probability that at least one customer will buy something?

P(X > 0) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)
         = ∑_{i=1}^{6} f(i) = 0.88235

We can also calculate this probability using the complementary event,
which is "no customer will buy anything":

P(X > 0) = 1 - P(X = 0) = 1 - f(0) = 1 - 0.11765 = 0.88235
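The same two probabilities can be sketched in code (illustrative only; `binom_pmf` is a hypothetical helper implementing the pmf formula from this section):

```python
from math import comb

def binom_pmf(x, n, p):
    # pmf of a Bi(n; p) random variable
    return comb(n, x) * p**x * (1 - p)**(n - x)

# X ~ Bi(6; 0.3): number of customers (out of 6) who buy something
p_more_than_4 = binom_pmf(5, 6, 0.3) + binom_pmf(6, 6, 0.3)
p_at_least_1 = 1 - binom_pmf(0, 6, 0.3)  # via the complementary event

print(p_more_than_4, p_at_least_1)  # approximately 0.0109 and 0.8824
```

These agree with the rounded values 0.01094 and 0.88235 on the slide; using the complement avoids summing six terms for P(X > 0).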


EXAMPLE 11 (Cont.): Assume that we want to find the probability
that exactly 4 out of the 6 customers that enter the store will not buy
anything. We can find this probability in two different ways:

The event "exactly 4 out of the 6 customers do not buy anything" is
the same as the event "exactly 2 out of the 6 customers buy
something". Hence, the probability that we want to find is:

P(X = 2) = [6! / (2! 4!)] 0.3^2 0.7^4 = 0.32414

Alternatively, we could consider the random variable Y = number of
customers that will not buy anything. If we consider as a "success"
that the customer does not buy anything, then Y is the number of
successes in 6 independent repetitions of this Bernoulli experiment, and
the probability of success is now 0.7. Therefore Y ~ Bi(6; 0.7), and:

P(Y = 4) = [6! / (4! 2!)] 0.7^4 0.3^2 = 0.32414

Obviously, we obtain the same result with this alternative procedure.
The mean and variance of a r.v. X with a Bi(n; p) distribution can
be derived from its probability mass function. However, the
calculations we have to do to arrive at the final expression are not
simple. We will not see them, but we will indicate the formulas that
are obtained. The expectation and the variance of X are:

E(X) = np

Var(X) = np(1 - p)

Note that:

The mean is directly proportional to p and to n. Additionally, the
variance of X is directly proportional to n, and it will be small if p is
close to 0 or close to 1.

These formulas generalize the formulas obtained for the n = 1 and
n = 2 cases.

These formulas are useful because they save us having to make
calculations on the mean and variance, provided that we know that the
r.v. X we are interested in is a r.v. with a Bi(n; p) distribution.
EXAMPLE 11 (Cont.): Since we know that X = "number of
customers who will buy something" has a Bi(6; 0.3) binomial
distribution, we can calculate any probability about X, and its most
important characteristics:

The expected number of customers that will buy something is:

E(X) = np = 6 · 0.3 = 1.8 customers

The variance of the number of customers that will buy something is:

Var(X) = np(1 - p) = 6 · 0.3 · 0.7 = 1.26 customers^2

And its standard deviation is:

SD(X) = √Var(X) = √1.26 = 1.1225 customers
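The shortcut formulas E(X) = np and Var(X) = np(1 - p) can be checked against a direct computation from the pmf (an illustrative sketch with hypothetical variable names):

```python
from math import comb, sqrt

n, p = 6, 0.3
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

mean = sum(x * f for x, f in enumerate(pmf))              # should equal n*p
var = sum((x - mean)**2 * f for x, f in enumerate(pmf))   # should equal n*p*(1-p)

print(mean, n * p)           # both approximately 1.8
print(var, n * p * (1 - p))  # both approximately 1.26
print(sqrt(var))             # standard deviation, approximately 1.1225
```

Summing over the pmf reproduces np = 1.8 and np(1 - p) = 1.26 exactly, so the closed-form formulas indeed spare the term-by-term calculation.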


EXAMPLE 12: We know that 95% of the computer memory chips
produced by a factory work properly, and that the remaining 5% are
defective. If you sell a box with 10 memory chips, what is the
probability that 7 of the chips in the box will work properly?

To do this calculation we will use X to denote the number of memory
chips in the box that work properly. To obtain the distribution of X, it
is sufficient to note that whether a memory chip works properly or not
is a Bernoulli experiment with a 0.95 probability of success.
Therefore, X = "number of memory chips in the box that work
properly" = "number of successes in 10 independent repetitions of a
Bernoulli experiment", with a 0.95 probability of success. Thus
X ~ Bi(10; 0.95).

Therefore, the probability we want to calculate is:

P(X = 7) = [10! / (7! 3!)] 0.95^7 0.05^3 = 0.0105


EXAMPLE 12 (Cont.): Suppose we want to sell boxes with n chips,
and that each box must satisfy the condition that the probability that
the box has no defective chips must be greater than 0.80. How large
can n be while still satisfying this condition?

Using Y to denote the number of chips that work properly in a box
with n chips, and reasoning as above, it follows that Y ~ Bi(n; 0.95).
The condition that n must satisfy is that the probability that all n chips
work properly must be greater than 0.80, that is, P(Y = n) > 0.80.
Using the probability mass function of Y, this inequality can be
rewritten as:

[n! / (n! 0!)] 0.95^n 0.05^0 > 0.80 ⇒ 0.95^n > 0.80
⇒ ln(0.95^n) > ln 0.80 ⇒ n ln 0.95 > ln 0.80
⇒ -0.0513 n > -0.2231 ⇒ n < 0.2231 / 0.0513 = 4.35

There can only be a maximum of 4 memory chips in a box.
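The algebra above can be double-checked numerically (a sketch; `n_max` is an illustrative name, and the brute-force range of 1 to 99 is an arbitrary search window):

```python
from math import log

# Condition: 0.95**n > 0.80  <=>  n * ln(0.95) > ln(0.80)
# Dividing by the negative number ln(0.95) flips the inequality:
# n < ln(0.80) / ln(0.95)
bound = log(0.80) / log(0.95)
print(bound)  # approximately 4.35

# Brute-force check of the largest admissible box size
n_max = max(n for n in range(1, 100) if 0.95 ** n > 0.80)
print(n_max)  # 4
```

Both routes agree: 0.95^4 ≈ 0.8145 > 0.80 but 0.95^5 ≈ 0.7738 < 0.80, so a box can hold at most 4 chips.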
The probability mass function of a random variable X with binomial
distribution Bi(n; p) can also be found using Excel. More specifically:

The Excel function BINOM.DIST(x, n, p, 0) yields the probability mass
function f(x) of a Bi(n; p) random variable, that is:

BINOM.DIST(x, n, p, 0) = [n! / (x!(n - x)!)] p^x (1 - p)^(n-x)

The Excel function BINOM.DIST(x, n, p, 1) yields the cumulative
distribution function F(x) of a Bi(n; p) random variable, that is:

BINOM.DIST(x, n, p, 1) = ∑_{j=0}^{x} [n! / (j!(n - j)!)] p^j (1 - p)^(n-j)
= [n! / (0! n!)] p^0 (1 - p)^n + [n! / (1!(n - 1)!)] p^1 (1 - p)^(n-1)
  + ··· + [n! / (x!(n - x)!)] p^x (1 - p)^(n-x)

The Excel function that yields the cumulative distribution function is
very useful if n is large: in this case, we often want to find the
probability of an event that includes lots of possible outcomes, and it
would take a long time to find the probability of that event by adding
up the probabilities of each one of the outcomes.
EXAMPLE 11 (Cont.): In this store we know that 30% of the
customers who come in buy something. Suppose we also know that
one day 100 customers come in the store. What is the probability
that the number of customers that will buy something that day is less
than or equal to 35?

Using X to denote the r.v. "number of customers out of 100 entering
the store who buy something", we have that X ~ Bi(100; 0.3). The
probability we want to find is:

P(X ≤ 35) = f(0) + f(1) + ··· + f(35)
          = ∑_{j=0}^{35} [100! / (j!(100 - j)!)] 0.3^j 0.7^(100-j)

Finding this last summation with a pocket calculator is difficult.
However, it is easy to find it by using the cumulative distribution
function and Excel:

P(X ≤ 35) = F(35) = BINOM.DIST(35, 100, 0.3, 1) = 0.884
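The slides use Excel; the same cumulative distribution function can be sketched in Python by summing the pmf terms directly (`binom_cdf` is a hypothetical helper mirroring BINOM.DIST(x, n, p, 1)):

```python
from math import comb

def binom_cdf(x, n, p):
    # F(x) = P(X <= x) for X ~ Bi(n; p): sum of the pmf from 0 to x
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(x + 1))

print(binom_cdf(35, 100, 0.3))  # approximately 0.884, matching Excel
```

Summing 36 terms by hand would be tedious, but the loop reproduces the Excel value F(35) ≈ 0.884 directly.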


When the cumulative distribution function F(x) is used to find
probabilities about a binomial Bi(n; p) distribution, it is important to
take into account that, for x = 0, 1, 2, ..., n, the following equalities
hold:

P(X ≤ x) = F(x)

P(X < x) = P(X ≤ x - 1) = F(x - 1)

P(X > x) = 1 - P(X ≤ x) = 1 - F(x)

P(X ≥ x) = 1 - P(X < x) = 1 - F(x - 1)


EXAMPLE 11 (Cont.): We know that the number of customers
that will buy something is a r.v. X with a Bi(100; 0.3) distribution.
Therefore:

The probability that the number of customers that will buy something
is lower than 30 is:

P(X < 30) = P(X ≤ 29) = F(29) = BINOM.DIST(29, 100, 0.3, 1) = 0.462

The probability that the number of customers that will buy something
is greater than 25 is:

P(X > 25) = 1 - P(X ≤ 25) = 1 - F(25)
          = 1 - BINOM.DIST(25, 100, 0.3, 1) = 1 - 0.1631 = 0.8369

The probability that the number of customers that will buy something
is equal to or greater than 40 is:

P(X ≥ 40) = 1 - P(X < 40) = 1 - P(X ≤ 39) = 1 - F(39)
          = 1 - BINOM.DIST(39, 100, 0.3, 1) = 1 - 0.979 = 0.021
EXAMPLE 13: A political party has to decide whether they will
support the government or not. The political party has 3030
members, and all of them will vote "yes" or "no" to the proposal of
giving support to the government. Let us assume that we have no
information about what the members of this political party think;
hence, it is equally likely that a member votes "yes" or "no" to this
proposal. And let us also assume that all members decide their vote
independently. What is the probability that the final result is a draw?

Let X be the r.v. that indicates the "number of members that will vote
yes among all 3030 members". Given our assumptions, this r.v. can be
rewritten as "number of successes in 3030 independent repetitions of a
Bernoulli experiment". The probability of success in each Bernoulli
experiment is 0.5; hence:

X ~ Bi(3030; 0.5)


EXAMPLE 13 (Cont.):
Therefore, the probability that the final result is a draw is:

P(X = 1515) = [3030! / (1515! 1515!)] 0.5^1515 0.5^1515

It is difficult to compute separately each one of the three factors on
the right-hand side, either with a pocket calculator or with computer
software. But it is possible to find this probability with computer
software that includes the binomial probability mass function (this
software computes the product of the three factors jointly). For
example, using Excel we find that:

P(X = 1515) = BINOM.DIST(1515, 3030, 0.5, 0) = 0.0145

Thus, it is not very likely that there is a draw, but it is not impossible:
the probability of a draw is higher than 1%.


EXAMPLE 13 (Cont.): Let us assume now that we know that 1500
members are in favor of the proposal and 1500 members are against
the proposal, but we do not know anything about the remaining 30
members. With this information, what is the probability that the final
result is a draw?

Let Y be the r.v. that indicates the "number of members that will vote
yes among the 30 members about whom we do not know anything". In
this case:

Y ~ Bi(30; 0.5)

Therefore, the probability that the final result is a draw is:

P(Y = 15) = [30! / (15! 15!)] 0.5^15 0.5^15 = 0.1445

As expected, now the probability of a draw is much higher: it is greater
than 14%.
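Why is it hard to compute the factors separately? For 3030 voters, 0.5^3030 is about 10^-912, far below what a double-precision float can represent, so the naive product underflows to 0. One way around this is exact integer arithmetic (a sketch; `prob_draw` is a hypothetical name, and the big-integer division below is evaluated exactly by Python before rounding to a float):

```python
from math import comb

# P(draw) = C(v, v/2) / 2**v for v voters, each voting "yes" with prob 1/2.
# comb() returns an exact big integer, and Python divides two big integers
# correctly even when each operand is too large for a float on its own.
def prob_draw(voters):
    return comb(voters, voters // 2) / 2**voters

print(prob_draw(3030))  # approximately 0.0145 (all 3030 members undecided)
print(prob_draw(30))    # approximately 0.1445 (only 30 members undecided)
```

Both values match the Excel results on the slides: a draw is far more likely when only the 30 undecided votes are random.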
