Chapter 2
FAROUK MSELMI
INFERENTIAL STATISTICS
Definition
A discrete random variable is a function X (s) from a finite or countably infinite sample space S to
a subset A of N :
X (·) : S → A.
Example
Toss a coin 3 times in sequence. The sample space is
S = {HHH, HHT , HTH, HTT , THH, THT , TTH, TTT }, and examples of random variables are
X(s) = the number of Heads in the sequence; e.g., X(HTH) = 2,
Y(s) = the index of the first H; e.g., Y(TTH) = 3,
Y(s) = 0 if the sequence has no H, i.e., Y(TTT) = 0.
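As an illustration, the short Python sketch below (the function names count_heads and first_head_index are mine, not from the lecture) enumerates this sample space and evaluates both random variables:

from itertools import product

# Enumerate S = all sequences of 3 coin tosses.
sample_space = ["".join(s) for s in product("HT", repeat=3)]

def count_heads(s):          # X(s): number of Heads in the sequence
    return s.count("H")

def first_head_index(s):     # Y(s): index of the first H, or 0 if there is none
    return s.index("H") + 1 if "H" in s else 0

for s in sample_space:
    print(s, count_heads(s), first_head_index(s))
# e.g. X("HTH") = 2 and Y("TTH") = 3, as in the example above.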
Example
For the sample space S = {HHH, HHT , HTH, HTT , THH, THT , TTH, TTT }, with X (s)
representing the number of Heads, the value X (s) = 2 corresponds to the event
{HHT , HTH, THH}, and the values 1 < X (s) ≤ 3 correspond to {HHH, HHT , HTH, THH}.
Notation
If it is clear what S is, then we often just write X instead of X (s).
Example
For the sample space S = {HHH, HHT , HTH, HTT , THH, THT , TTH, TTT }, with X (s)
representing the number of Heads, we have
P(0 < X ≤ 2) = 6/8.
Question
What are the values of P(X ≤ −1), P(X ≤ 0), P(X ≤ 1), P(X ≤ 2), P(X ≤ 3), P(X ≤ 4) ?
Example
For the sample space S = {HHH, HHT , HTH, HTT , THH, THT , TTH, TTT }, with X (s)
representing the number of Heads, we have
P(X = 0) ≡ P({TTT}) = 1/8,
P(X = 1) ≡ P({HTT, THT, TTH}) = 3/8,
P(X = 2) ≡ P({HHT, HTH, THH}) = 3/8,
P(X = 3) ≡ P({HHH}) = 1/8,
where
P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 1. (Why ?)
The events Ek ≡ {s ∈ S : X(s) = k}, k = 0, 1, 2, 3, are disjoint since X(s) is a function! (X : S → A must be defined for all s ∈ S and must be single-valued.)
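A minimal verification sketch (assuming a fair coin, so each of the 8 sequences has probability 1/8; the variable names are illustrative) that tabulates the probability mass function of X by counting outcomes:

from itertools import product
from collections import Counter
from fractions import Fraction

sample_space = ["".join(s) for s in product("HT", repeat=3)]
p_outcome = Fraction(1, len(sample_space))   # each sequence has probability 1/8

# Probability mass function of X = number of Heads.
pmf = Counter()
for s in sample_space:
    pmf[s.count("H")] += p_outcome

print(dict(pmf))            # X = 0, 1, 2, 3 have probabilities 1/8, 3/8, 3/8, 1/8
print(sum(pmf.values()))    # 1, since the events {X = k} are disjoint and cover S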
Definition
P(X = x) is called the probability mass function.
Definition
FX (x) ≡ P(X ≤ x) is called the (cumulative) probability distribution function.
Properties
FX (x) is a non-decreasing function of x.
FX (−∞) = 0 and FX (∞) = 1.
P(a < X ≤ b) = FX (b) − FX (a).
Example
With X (s) = the number of Heads, and S = {HHH, HHT , HTH, HTT , THH, THT , TTH, TTT },
P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8,
we have the probability distribution function
FX(0) = 1/8, FX(1) = 4/8, FX(2) = 7/8, FX(3) = 1,
so that, for example,
P(0 < X ≤ 2) = P(X = 1) + P(X = 2) = FX(2) − FX(0) = 7/8 − 1/8 = 6/8.
Example
Toss a coin until "Heads" occurs. Then the sample space is countably infinite, namely
S = {H, TH, TTH, TTTH, . . .},
and we let X(s) = the number of tosses, so that X(H) = 1, X(TH) = 2, X(TTH) = 3, etc. Then
P(X = 1) = 1/2, P(X = 2) = 1/4, P(X = 3) = 1/8, . . . (Why?)
and
FX(n) = P(X ≤ n) = Σ_{k=1}^{n} P(X = k) = Σ_{k=1}^{n} 1/2^k = 1 − 1/2^n.
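A small numerical check of this example (a sketch, not part of the lecture) that compares the partial sums Σ_{k=1}^{n} 1/2^k with the closed form 1 − 1/2^n:

from fractions import Fraction

def geometric_cdf(n):
    # F_X(n) = P(X <= n) for X = number of tosses until the first Heads.
    return sum(Fraction(1, 2**k) for k in range(1, n + 1))

for n in [1, 2, 3, 10]:
    direct = geometric_cdf(n)
    closed_form = 1 - Fraction(1, 2**n)
    print(n, direct, closed_form, direct == closed_form)   # always equal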
Joint distributions
The probability mass function and the cumulative distribution function can also be functions of
more than one variable.
Example
Toss a coin 3 times in sequence. For the sample space
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, we let
X(s) = the number of Heads and Y(s) = the index of the first H (with Y(s) = 0 if there is no H).
Then we have the joint probability mass function P(X = x, Y = y). For example,
P(X = 2, Y = 1) = P(2 Heads, 1st toss is Heads) = 2/8 = 1/4.
For S = {HHH, HHT , HTH, HTT , THH, THT , TTH, TTT }, X (s) = number of Heads, and Y (s) =
index of the first H, we can list the values of P(X = x, Y = y ) :
P(X = x, Y = y)   y = 0   y = 1   y = 2   y = 3   P(X = x)
x = 0             1/8     0       0       0       1/8
x = 1             0       1/8     1/8     1/8     3/8
x = 2             0       2/8     1/8     0       3/8
x = 3             0       1/8     0       0       1/8
P(Y = y)          1/8     4/8     2/8     1/8     1
NOTE :
The marginal probability P(X = x) is the probability mass function of X .
The marginal probability P(Y = y ) is the probability mass function of Y .
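The joint table above can be reproduced by enumeration; the following sketch (variable names are mine, not from the notes) also recomputes the marginals by summing over the other variable:

from itertools import product
from fractions import Fraction

sample_space = ["".join(s) for s in product("HT", repeat=3)]
p = Fraction(1, 8)

# Joint probability mass function P(X = x, Y = y).
joint = {}
for s in sample_space:
    x = s.count("H")                              # X = number of Heads
    y = s.index("H") + 1 if "H" in s else 0       # Y = index of first H (0 if none)
    joint[(x, y)] = joint.get((x, y), 0) + p

# Marginals: P(X = x) = sum over y, P(Y = y) = sum over x.
marginal_X = {x: sum(v for (xx, _), v in joint.items() if xx == x) for x in range(4)}
marginal_Y = {y: sum(v for (_, yy), v in joint.items() if yy == y) for y in range(4)}
print(marginal_X)   # X = 0, 1, 2, 3 -> 1/8, 3/8, 3/8, 1/8
print(marginal_Y)   # Y = 0, 1, 2, 3 -> 1/8, 1/2, 1/4, 1/8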
Definitions
P(X = x, Y = y ) is called the joint probability mass function.
FX ,Y (x, y ) ≡ P(X ≤ x, Y ≤ y ) is called the joint (cumulative) probability distribution function.
Question : Why is
Definition
Two discrete random variables X (s) and Y (s) are independent if
FX ,Y (x, y ) = FX (x) · FY (y )
for all x, y .
Conditional distributions
Definition
Let X and Y be discrete random variables with joint probability mass function P(X = x, Y = y ).
The conditional probability mass function P(X = x|Y = y ) is defined as
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y),
where P(X = x, Y = y ) is the joint probability mass function and P(Y = y ) is the marginal
probability mass function of Y .
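As an illustration of this definition, the sketch below (hard-coding the joint table from the coin-toss example; all names are mine) computes P(X = x | Y = 1) from the joint and marginal probabilities:

from fractions import Fraction

F = Fraction
# Joint pmf P(X = x, Y = y) from the table above (nonzero entries only).
joint = {(0, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
         (2, 1): F(2, 8), (2, 2): F(1, 8), (3, 1): F(1, 8)}

def conditional_X_given_Y(y):
    # P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
    p_y = sum(v for (_, yy), v in joint.items() if yy == y)    # marginal P(Y = y)
    return {x: v / p_y for (x, yy), v in joint.items() if yy == y}

print(conditional_X_given_Y(1))   # X = 1, 2, 3 -> 1/4, 1/2, 1/4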
Expectation
The expected value of a discrete random variable X is defined as
E[X] ≡ Σ_k x_k · P(X = x_k).
Thus, E[X] represents the weighted average value of X. (E[X] is also called the mean of X.)
Example
The expected value of rolling a die is
E[X] = (1/6) · Σ_{k=1}^{6} k = (1/6) · (6 · 7)/2 = 7/2.
Example
Toss a coin until "Heads" occurs. The sample space is S = {H, TH, TTH, TTTH, . . .}, with X(s) = the number of tosses, so that P(X = k) = 1/2^k. Then
E[X] = Σ_{k=1}^{∞} k · P(X = k) = Σ_{k=1}^{∞} k/2^k.
The first few values of Σ_{k=1}^{n} k/2^k are as follows:

n     Σ_{k=1}^{n} k/2^k
1     0.50000000
2     1.00000000
3     1.37500000
10    1.98828125
40    1.99999999

so E[X] = 2.
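The partial sums in this table can be reproduced with a few lines of Python (a verification sketch, not from the notes):

# Partial sums of E[X] = sum_{k>=1} k / 2**k, which converge to 2.
def partial_sum(n):
    return sum(k / 2**k for k in range(1, n + 1))

for n in [1, 2, 3, 10, 40]:
    print(n, partial_sum(n))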
The expected value of a function of a random variable is denoted as E[g(X )] and defined as
E[g(X)] ≡ Σ_k g(x_k) · P(X = x_k).
Example
The pay-off of rolling a die is k² Dollars, where k is the side facing up. What should the entry fee be for the betting to break even? Here, g(X) = X², and
E[g(X)] = (1/6) · Σ_{k=1}^{6} k² = (1/6) · (6(6 + 1)(2 · 6 + 1))/6 = 91/6 ≈ $15.17.
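Both die expectations above are easy to verify directly; a quick check (using Fraction to keep exact values):

from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                       # fair die: each face has probability 1/6

E_X  = sum(k * p for k in faces)         # E[X] = 7/2
E_gX = sum(k**2 * p for k in faces)      # E[g(X)] with g(X) = X^2, equals 91/6
print(E_X, E_gX, float(E_gX))            # 7/2  91/6  15.1666...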
Definition
The expected value of a function of two random variables is denoted as E[g(X , Y )] and defined as
E[g(X, Y)] ≡ Σ_k Σ_ℓ g(x_k, y_ℓ) · P(X = x_k, Y = y_ℓ).
Properties
1 If X and Y are independent, then E[XY ] = E[X ] · E[Y ].
2 The expected value of the sum of random variables is given by E[X + Y ] = E[X ] + E[Y ].
Definition
The variance of a random variable X with mean µ = E[X] is defined as
Var(X) ≡ E[(X − µ)²] = E[X²] − µ².
The standard deviation is σ(X) ≡ √Var(X),
which is the square root of the variance and represents the average weighted distance from the
mean.
Example
The variance of rolling a die is
Var(X) = Σ_{k=1}^{6} k² · (1/6) − µ²
       = (1/6) · (6(6 + 1)(2 · 6 + 1))/6 − (7/2)²
       = 35/12.
The standard deviation is
σ = √(35/12) ≈ 1.70.
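A short check of this computation (a sketch; it also verifies that E[(X − µ)²] and E[X²] − µ² give the same value):

from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)

mu   = sum(k * p for k in faces)                    # 7/2
var1 = sum((k - mu)**2 * p for k in faces)          # E[(X - mu)^2]
var2 = sum(k**2 * p for k in faces) - mu**2         # E[X^2] - mu^2
print(var1, var2, sqrt(var1))                       # 35/12  35/12  1.7078...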
Covariance
Let X and Y be random variables with means E[X ] = µX , E[Y ] = µY . Then, the covariance of X
and Y is defined as
Cov(X, Y) ≡ E[(X − µX)(Y − µY)] = Σ_{k,ℓ} (x_k − µX)(y_ℓ − µY) · P(X = x_k, Y = y_ℓ).
We have
• Var(aX + b) = a2 Var(X ),
• Cov(X , Y ) = Cov(Y , X ),
• Cov(cX , Y ) = cCov(X , Y ),
• Cov(X , cY ) = cCov(X , Y ),
• Cov(X + Y , Z ) = Cov(X , Z ) + Cov(Y , Z ),
• Var(X + Y ) = Var(X ) + Var(Y ) + 2Cov(X , Y ).
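These identities can be checked numerically on the coin-toss example above; the sketch below (names are mine) computes Cov(X, Y) from the joint table and verifies that Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y):

from fractions import Fraction

F = Fraction
# Joint pmf P(X = x, Y = y) for X = number of Heads, Y = index of first H.
joint = {(0, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
         (2, 1): F(2, 8), (2, 2): F(1, 8), (3, 1): F(1, 8)}

def E(g):
    # Expected value of g(X, Y) under the joint pmf.
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_X, mu_Y = E(lambda x, y: x), E(lambda x, y: y)
var_X = E(lambda x, y: (x - mu_X)**2)
var_Y = E(lambda x, y: (y - mu_Y)**2)
cov   = E(lambda x, y: (x - mu_X) * (y - mu_Y))
var_sum = E(lambda x, y: (x + y - mu_X - mu_Y)**2)   # Var(X + Y)

print(cov)                                       # Cov(X, Y)
print(var_sum == var_X + var_Y + 2 * cov)        # True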
Property
If X and Y are independent, then Cov(X , Y ) = 0.
PROOF: We know that Cov(X, Y) = E[(X − µX)(Y − µY)] = E[XY] − E[X]E[Y], and that if X and Y are independent, then E[XY] = E[X] · E[Y], from which the result follows.
Property
If X and Y are independent, then Var(X + Y ) = Var(X ) + Var(Y ).
PROOF: We have Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y), and that if X and Y are independent, then Cov(X, Y) = 0, from which the result follows.
Moment-generating function
Definition
The moment-generating function of the random variable X is given by E(e^{tX}) and is denoted by MX(t). Hence,
MX(t) = E(e^{tX}) = Σ_x e^{tx} P(X = x), if X is discrete.
The Bernoulli random variable
A Bernoulli trial results in "success" with probability p and "failure" with probability 1 − p. Let X = 1 for a success and X = 0 for a failure. Then
P(X = 1) = p, P(X = 0) = 1 − p,
E[X ] = 1 · p + 0 · (1 − p) = p,
E[X 2 ] = 12 · p + 02 · (1 − p) = p,
Var(X ) = E[X 2 ] − (E[X ])2 = p − p2 = p(1 − p).
NOTE : If p is small, then Var(X ) ≈ p.
Example
When p = 1/2 (e.g., for tossing a coin), we have
E[X] = p = 1/2,
Var(X) = p(1 − p) = 1/4.
When rolling a die with outcome k (1 ≤ k ≤ 6), let
X(k) = 1 if the roll resulted in a six, and X(k) = 0 if the roll did not result in a six.
Then
E[X] = p = 1/6,
Var(X) = p(1 − p) = 5/36.
When p = 0.01, then
E[X ] = 0.01,
Var(X ) = 0.0099 ≈ 0.01.
Perform a Bernoulli trial n times in sequence. Assume the individual trials are independent. An
outcome could be 100011001010 (with n = 12), with probability
P(100011001010) = p^5 · (1 − p)^7.
Let X be the number of "successes" (i.e., 1’s). For example, X (100011001010) = 5. We have
P(X = 5) = C(12, 5) · p^5 · (1 − p)^7,
where C(12, 5) denotes the binomial coefficient "12 choose 5".
In general, for k successes in a sequence of n trials, we have
P(X = k) = C(n, k) · p^k · (1 − p)^{n−k},
where 0 ≤ k ≤ n.
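A minimal sketch of this formula (using math.comb for the binomial coefficient); for instance, it reproduces P(X = 5) for n = 12 fair-coin tosses:

from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) for the Binomial(n, p) random variable.
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(5, 12, 0.5))                            # P(X = 5) for p = 1/2
print(sum(binomial_pmf(k, 12, 0.5) for k in range(13)))    # probabilities sum to 1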
Example
In 12 rolls of a die, write the outcome as, for example, 100011001010, where 1 denotes the roll
resulted in a six, and 0 denotes the roll did not result in a six. As before, let X be the number of 1’s
in the outcome. Then X represents the number of sixes in the 12 rolls, and
P(X = k) = C(12, k) · (1/6)^k · (5/6)^{12−k}.
The expected value of the Binomial random variable is
E[X] = Σ_{k=0}^{n} k · C(n, k) · p^k · (1 − p)^{n−k},
which can be shown to equal np. An easy way to see this is as follows: If in a sequence of n
independent Bernoulli trials we let
Xk = 1 if the kth trial is a "success" and Xk = 0 otherwise (Xk = 0 or 1),
then
X ≡ X1 + X2 + . . . + Xn
is the Binomial random variable that counts the "successes".
X ≡ X1 + X2 + . . . + Xn
We know that
E[Xk ] = p,
so
E[X ] = E[X1 ] + E[X2 ] + . . . + E[Xn ] = np.
We already know that
Var(Xk) = p(1 − p),
so, since the Xk are independent,
Var(X) = Var(X1) + Var(X2) + . . . + Var(Xn) = np(1 − p).
Example
For 12 tosses of a coin, with Heads as success, we have n = 12, p = 1/2, so
E[X] = np = 6,
Var(X) = np(1 − p) = 3.
For 12 rolls of a die, with six as success, we have n = 12, p = 1/6, so
E[X] = np = 2,
Var(X) = np(1 − p) = 5/3.
If n = 500 and p = 0.01, then
E[X ] = np = 5,
Var(X ) = np(1 − p) = 4.95 ≈ 5.
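The values in this example can be confirmed by summing over the Binomial pmf directly (a numerical sketch, assuming the pmf formula above):

from math import comb
from fractions import Fraction

def binomial_mean_var(n, p):
    pmf  = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var  = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2
    return mean, var

print(binomial_mean_var(12, Fraction(1, 2)))      # (6, 3)
print(binomial_mean_var(12, Fraction(1, 6)))      # (2, 5/3)
print(binomial_mean_var(500, Fraction(1, 100)))   # (5, 99/20 = 4.95)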
For the Binomial random variable we thus have
E[X] = np,
and
Var(X) = np(1 − p) ≈ np when p is small.
The Poisson random variable arises as a limit of the Binomial when n is large and p is small, with λ = np held fixed. Indeed, for the Poisson random variable, we will show that
E[X] = λ,
and
Var(X) = λ.
The Poisson random variable with parameter λ > 0 has probability mass function
P(X = k) = e^{−λ} · λ^k / k!,
for k = 0, 1, 2, . . .. An easy way to compute these probabilities is to use the recurrence relation
P(X = 0) = e^{−λ},
P(X = k + 1) = (λ / (k + 1)) · P(X = k),
for k = 0, 1, 2, . . ..
NOTE : Unlike the Binomial random variable, the Poisson random variable can have an arbitrarily
large integer value k .
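A short sketch implementing this recurrence and comparing it against the direct formula e^{−λ} λ^k / k! (a check under the definitions above; the names are illustrative):

from math import exp, factorial

def poisson_pmf_recurrence(lam, k_max):
    # Compute P(X = k) for k = 0..k_max via P(X = k+1) = lam/(k+1) * P(X = k).
    probs = [exp(-lam)]
    for k in range(k_max):
        probs.append(lam / (k + 1) * probs[-1])
    return probs

lam = 6.0
recur  = poisson_pmf_recurrence(lam, 10)
direct = [exp(-lam) * lam**k / factorial(k) for k in range(11)]
print(all(abs(a - b) < 1e-12 for a, b in zip(recur, direct)))   # True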
For the Poisson random variable, E[X] = λ and Var(X) = λ; this is verified below using the moment-generating function.
The Poisson distribution function is
FX(k) = P(X ≤ k) = Σ_{ℓ=0}^{k} e^{−λ} · λ^ℓ/ℓ! = e^{−λ} Σ_{ℓ=0}^{k} λ^ℓ/ℓ!.
Example
Suppose customers arrive at the rate of six per hour. The probability that k customers arrive in a
one-hour period is
P(k = 0) = e^{−6} · 6^0/0! ≈ 0.0024,
P(k = 1) = e^{−6} · 6^1/1! ≈ 0.0148,
P(k = 2) = e^{−6} · 6^2/2! ≈ 0.0446.
The probability that more than 2 customers arrive is
P(k > 2) = 1 − P(k = 0) − P(k = 1) − P(k = 2) ≈ 0.938.
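These arrival probabilities follow directly from the Poisson pmf with λ = 6; a quick verification sketch:

from math import exp, factorial

lam = 6
pmf = [exp(-lam) * lam**k / factorial(k) for k in range(3)]
print(pmf)                      # P(k = 0), P(k = 1), P(k = 2)
print(round(1 - sum(pmf), 3))   # P(more than 2 customers) ≈ 0.938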
Differentiating MX(t) = E[e^{tX}] with respect to t gives
MX′(t) = E[X e^{tX}], MX″(t) = E[X² e^{tX}].
It follows that
MX′(0) = E[X], MX″(0) = E[X²].
This sometimes facilitates computing the mean µ = E[X] and the variance Var(X) = E[X²] − µ².
For the Poisson random variable,
MX(t) ≡ E[e^{tX}] = Σ_{k=0}^{∞} e^{tk} P(X = k) = Σ_{k=0}^{∞} e^{tk} e^{−λ} λ^k/k!
      = e^{−λ} Σ_{k=0}^{∞} (λe^t)^k/k! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.
Here,
MX′(t) = λe^t e^{λ(e^t − 1)},
MX″(t) = λ(λe^{2t} + e^t) e^{λ(e^t − 1)},
so that
E[X] = MX′(0) = λ,
E[X²] = MX″(0) = λ(λ + 1) = λ² + λ,
Var(X) = E[X²] − E[X]² = λ.
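A numerical sanity check of this derivation (a sketch; it approximates MX′(0) and MX″(0) by central finite differences of MX(t) = e^{λ(e^t − 1)}, with λ = 3 chosen arbitrarily):

from math import exp

lam = 3.0

def M(t):
    # Poisson moment-generating function.
    return exp(lam * (exp(t) - 1))

h = 1e-4
M1 = (M(h) - M(-h)) / (2 * h)              # ≈ M'(0)  = lambda
M2 = (M(h) - 2 * M(0) + M(-h)) / h**2      # ≈ M''(0) = lambda^2 + lambda
print(M1, lam)                             # ≈ 3
print(M2, lam**2 + lam)                    # ≈ 12
print(M2 - M1**2)                          # ≈ Var(X) = lambda = 3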
THANK YOU