Chap2-Discrete Random Variables

1. A discrete random variable is a function that maps from a sample space of possible outcomes to the real numbers.
2. Examples of discrete random variables include the number of heads when tossing a coin 3 times, and the index of the first heads.
3. The probability mass function gives the probability that a discrete random variable equals each possible value, and the cumulative distribution function gives the probability that the variable is less than or equal to each value.


DISCRETE RANDOM VARIABLES

DEFINITION : A discrete random variable is a function X(s) from a
finite or countably infinite sample space S to the real numbers :

X(·) : S → R .

EXAMPLE : Toss a coin 3 times in sequence. The sample space is

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,

and examples of random variables are

• X(s) = the number of Heads in the sequence ; e.g., X(HTH) = 2 ,

• Y (s) = the index of the first H ; e.g., Y (TTH) = 3 ,
  with Y (s) = 0 if the sequence has no H , i.e., Y (TTT) = 0 .

NOTE : In this example X(s) and Y (s) are actually integers .

71
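
PYTHON SKETCH : a minimal encoding of this example, assuming each outcome
is written as a string of H's and T's.

    from itertools import product

    # Sample space : all 2^3 sequences of three tosses.
    S = ["".join(seq) for seq in product("HT", repeat=3)]

    def X(s):                     # number of Heads in the sequence
        return s.count("H")

    def Y(s):                     # index of the first H (0 if there is no H)
        return s.index("H") + 1 if "H" in s else 0

    print(X("HTH"), Y("TTH"), Y("TTT"))   # 2 3 0
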
Value-ranges of a random variable correspond to events in S .

EXAMPLE : For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,

with
X(s) = the number of Heads ,
the value
X(s) = 2 corresponds to the event {HHT , HTH , THH} ,
and the values
1 < X(s) ≤ 3 correspond to {HHH , HHT , HTH , THH} .

NOTATION : If it is clear what S is then we often just write
X instead of X(s) .

72
Value-ranges of a random variable correspond to events in S ,
and
events in S have a probability .
Thus
Value-ranges of a random variable have a probability .

EXAMPLE : For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,

with X(s) = the number of Heads ,
we have
P (0 < X ≤ 2) = 6/8 .

QUESTION : What are the values of


P (X ≤ −1) , P (X ≤ 0) , P (X ≤ 1) , P (X ≤ 2) , P (X ≤ 3) , P (X ≤ 4) ?

73
NOTATION : We will also write pX (x) to denote P (X = x) .

EXAMPLE : For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
with
X(s) = the number of Heads ,
we have

pX (0) ≡ P ( {TTT} ) = 1/8

pX (1) ≡ P ( {HTT , THT , TTH} ) = 3/8

pX (2) ≡ P ( {HHT , HTH , THH} ) = 3/8

pX (3) ≡ P ( {HHH} ) = 1/8

where
pX (0) + pX (1) + pX (2) + pX (3) = 1 .   ( Why ? )

74
Graphical representation of X : the events E0 = {TTT} , E1 = {HTT , THT , TTH} ,
E2 = {HHT , HTH , THH} , E3 = {HHH} are mapped to the values 0 , 1 , 2 , 3 .

The events E0 , E1 , E2 , E3 are disjoint since X(s) is a function !


(X : S → R must be defined for all s ∈ S and must be single-valued.)

75
The graph of pX .

76
DEFINITION :
pX (x) ≡ P (X = x) ,
is called the probability mass function .

DEFINITION :
FX (x) ≡ P (X ≤ x) ,
is called the (cumulative) probability distribution function .

PROPERTIES :
• FX (x) is a non-decreasing function of x . ( Why ? )
• FX (−∞) = 0 and FX (∞) = 1 . ( Why ? )
• P (a < X ≤ b) = FX (b) − FX (a) . ( Why ? )

NOTATION : When it is clear what X is then we also write

p(x) for pX (x) and F (x) for FX (x) .

77
EXAMPLE : With X(s) = the number of Heads , and
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
p(0) = 1/8 , p(1) = 3/8 , p(2) = 3/8 , p(3) = 1/8 ,
we have the probability distribution function

F (−1) ≡ P (X ≤ −1) = 0
F ( 0) ≡ P (X ≤ 0) = 1/8
F ( 1) ≡ P (X ≤ 1) = 4/8
F ( 2) ≡ P (X ≤ 2) = 7/8
F ( 3) ≡ P (X ≤ 3) = 1
F ( 4) ≡ P (X ≤ 4) = 1

We see, for example, that
P (0 < X ≤ 2) = P (X = 1) + P (X = 2) = F (2) − F (0) = 7/8 − 1/8 = 6/8 .

78
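
PYTHON SKETCH : computing pX and FX by counting outcomes, assuming the 8
outcomes are equally likely; this reproduces the values above and answers
the earlier QUESTION about P (X ≤ x) .

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")

    def pX(x):                    # P(X = x)
        return Fraction(sum(1 for s in S if X(s) == x), len(S))

    def FX(x):                    # P(X <= x)
        return Fraction(sum(1 for s in S if X(s) <= x), len(S))

    print([pX(x) for x in range(4)])             # [1/8, 3/8, 3/8, 1/8]
    print([FX(x) for x in (-1, 0, 1, 2, 3, 4)])  # [0, 1/8, 1/2, 7/8, 1, 1]
    print(FX(2) - FX(0))                         # 3/4  ( = 6/8 = P(0 < X <= 2) )
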
The graph of the probability distribution function FX .

79
EXAMPLE : Toss a coin until ”Heads” occurs.
Then the sample space is countably infinite , namely,
S = {H , TH , TTH , TTTH , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :

X(H) = 1 , X(TH) = 2 , X(TTH) = 3 , · · ·
Then
p(1) = 1/2 , p(2) = 1/4 , p(3) = 1/8 , · · ·   ( Why ? )
and
F (n) = P (X ≤ n) = ∑_{k=1}^{n} p(k) = ∑_{k=1}^{n} 1/2^k = 1 − 1/2^n ,
and, as should be the case,
∑_{k=1}^{∞} p(k) = lim_{n→∞} ∑_{k=1}^{n} p(k) = lim_{n→∞} ( 1 − 1/2^n ) = 1 .

NOTE : The outcomes in S do not have equal probability !


EXERCISE : Draw the probability mass and distribution functions.

80
X(s) is the number of tosses until ”Heads” occurs · · ·

REMARK : We can also take S ≡ Sn as all ordered outcomes of
length n . For example, for n = 4,

S4 = { H̃HHH , H̃HHT , H̃HT H , H̃HT T ,

H̃T HH , H̃T HT , H̃T T H , H̃T T T ,

T H̃HH , T H̃HT , T H̃T H , T H̃T T ,

T T H̃H , T T H̃T , T T T H̃ , TTTT }.

where for each outcome the first ”Heads” is marked as H̃ .


Each outcome in S4 has equal probability 2^−n ( here 2^−4 = 1/16 ) , and

pX (1) = 1/2 , pX (2) = 1/4 , pX (3) = 1/8 , pX (4) = 1/16 , · · · ,

independent of n .

81
Joint distributions
The probability mass function and the probability distribution function
can also be functions of more than one variable.

EXAMPLE : Toss a coin 3 times in sequence. For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
we let
X(s) = # Heads , Y (s) = index of the first H (0 for T T T ) .
Then we have the joint probability mass function
pX,Y (x, y) = P (X = x , Y = y) .
For example,
pX,Y (2, 1) = P (X = 2 , Y = 1)
            = P ( 2 Heads , 1st toss is Heads )
            = 2/8 = 1/4 .

82
EXAMPLE : ( continued · · · ) For
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
X(s) = number of Heads, and Y (s) = index of the first H ,

we can list the values of pX,Y (x, y) :

Joint probability mass function pX,Y (x, y)


         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

NOTE :
• The marginal probability pX is the probability mass function of X.
• The marginal probability pY is the probability mass function of Y .

83
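
PYTHON SKETCH : building the joint mass function pX,Y and its marginals by
counting, assuming the same 8 equally likely outcomes as before.

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")
    Y = lambda s: s.index("H") + 1 if "H" in s else 0

    def pXY(x, y):                # joint probability mass function
        return Fraction(sum(1 for s in S if X(s) == x and Y(s) == y), len(S))

    pX = lambda x: sum(pXY(x, y) for y in range(4))   # marginal of X
    pY = lambda y: sum(pXY(x, y) for x in range(4))   # marginal of Y

    print(pXY(2, 1))                   # 1/4  ( = 2/8 )
    print([pX(x) for x in range(4)])   # [1/8, 3/8, 3/8, 1/8]
    print([pY(y) for y in range(4)])   # [1/8, 1/2, 1/4, 1/8]
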
EXAMPLE : ( continued · · · )
X(s) = number of Heads, and Y (s) = index of the first H .

         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

For example,

• X = 2 corresponds to the event {HHT , HT H , T HH} .


• Y = 1 corresponds to the event {HHH , HHT , HT H , HT T } .
• (X = 2 and Y = 1) corresponds to the event {HHT , HT H} .

QUESTION : Are the events X = 2 and Y = 1 independent ?

84
Graphical representation of X and Y : the events E31 = {HHH} , E21 = {HHT , HTH} ,
E22 = {THH} , E11 = {HTT} , E12 = {THT} , E13 = {TTH} , E00 = {TTT} .

The events Ei,j ≡ { s ∈ S : X(s) = i , Y (s) = j } are disjoint .


QUESTION : Are the events X = 2 and Y = 1 independent ?

85
DEFINITION :
pX,Y (x, y) ≡ P (X = x , Y = y) ,
is called the joint probability mass function .

DEFINITION :
FX,Y (x, y) ≡ P (X ≤ x , Y ≤ y) ,
is called the joint (cumulative) probability distribution function .

NOTATION : When it is clear what X and Y are then we also write

p(x, y) for pX,Y (x, y) ,   and   F (x, y) for FX,Y (x, y) .

86
EXAMPLE : Three tosses : X(s) = # Heads, Y (s) = index 1st H .
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


         y=0    y=1    y=2    y=3    FX (·)
x=0      1/8    1/8    1/8    1/8     1/8
x=1      1/8    2/8    3/8    4/8     4/8
x=2      1/8    4/8    6/8    7/8     7/8
x=3      1/8    5/8    7/8     1       1
FY (·)   1/8    5/8    7/8     1       1

Note that the distribution function FX is a copy of the 4th column,
and the distribution function FY is a copy of the 4th row.   ( Why ? )

87
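
PYTHON SKETCH : the joint distribution function FX,Y obtained by counting
outcomes with X(s) ≤ x and Y (s) ≤ y , assuming the same three-toss model.

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")
    Y = lambda s: s.index("H") + 1 if "H" in s else 0

    def FXY(x, y):                # P(X <= x, Y <= y)
        return Fraction(sum(1 for s in S if X(s) <= x and Y(s) <= y), len(S))

    print(FXY(2, 2), FXY(3, 3))   # 3/4 1   ( 3/4 = 6/8 , as in the table )
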
In the preceding example :
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


         y=0    y=1    y=2    y=3    FX (·)
x=0      1/8    1/8    1/8    1/8     1/8
x=1      1/8    2/8    3/8    4/8     4/8
x=2      1/8    4/8    6/8    7/8     7/8
x=3      1/8    5/8    7/8     1       1
FY (·)   1/8    5/8    7/8     1       1

QUESTION : Why is
P (1 < X ≤ 3 , 1 < Y ≤ 3) = F (3, 3) − F (1, 3) − F (3, 1) + F (1, 1) ?

88
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).

Define the random variables X and Y as

X = result of the first roll , Y = sum of the two rolls.

• What is a good choice of the sample space S ?


• How many outcomes are there in S ?
• List the values of the joint probability mass function pX,Y (x, y) .
• List the values of the joint cumulative distribution function FX,Y (x, y) .

89
EXERCISE :

Three balls are selected at random from a bag containing

2 red , 3 green , 4 blue balls .

Define the random variables

R(s) = the number of red balls drawn,


and
G(s) = the number of green balls drawn .

List the values of


• the joint probability mass function pR,G (r, g) .
• the marginal probability mass functions pR (r) and pG (g) .
• the joint distribution function FR,G (r, g) .
• the marginal distribution functions FR (r) and FG (g) .

90
Independent random variables

Two discrete random variables X(s) and Y (s) are independent if


P (X = x , Y = y) = P (X = x) · P (Y = y) , for all x and y ,

or, equivalently, if their probability mass functions satisfy


pX,Y (x, y) = pX (x) · pY (y) , for all x and y ,

or, equivalently, if the events


Ex ≡ X −1 ({x}) and Ey ≡ Y −1 ({y}) ,
are independent in the sample space S , i.e.,
P (Ex Ey ) = P (Ex ) · P (Ey ) , for all x and y .
NOTE :
• In the current discrete case, x and y are typically integers .
• X −1 ({x}) ≡ { s ∈ S : X(s) = x } .

91
Graphical representation of X and Y : the events E31 = {HHH} , E21 = {HHT , HTH} ,
E22 = {THH} , E11 = {HTT} , E12 = {THT} , E13 = {TTH} , E00 = {TTT} .

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of pX (2) , pY (1) , pX,Y (2, 1) ?


• Are X and Y independent ?

92
RECALL :

X(s) and Y (s) are independent if for all x and y :

pX,Y (x, y) = pX (x) · pY (y) .

EXERCISE :

Roll a die two times in a row.

Let
X be the result of the 1st roll ,
and
Y the result of the 2nd roll .

Are X and Y independent , i.e., is

pX,Y (k, ℓ) = pX (k) · pY (ℓ), for all 1 ≤ k, ℓ ≤ 6 ?

93
EXERCISE :

Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

94
EXERCISE : Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


         y=1     y=2     y=3     FX (x)
x=1      1/3     5/12    1/2      1/2
x=2      5/9     25/36   5/6      5/6
x=3      2/3     5/6      1        1
FY (y)   2/3     5/6      1        1

QUESTION : Is FX,Y (x, y) = FX (x) · FY (y) ?

95
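
PYTHON SKETCH : a direct check of the independence question for the table
above, assuming the joint probabilities are stored as Fractions.

    from fractions import Fraction as F

    p = {(1, 1): F(1, 3),  (1, 2): F(1, 12), (1, 3): F(1, 12),
         (2, 1): F(2, 9),  (2, 2): F(1, 18), (2, 3): F(1, 18),
         (3, 1): F(1, 9),  (3, 2): F(1, 36), (3, 3): F(1, 36)}

    pX = {x: sum(p[x, y] for y in (1, 2, 3)) for x in (1, 2, 3)}
    pY = {y: sum(p[x, y] for x in (1, 2, 3)) for y in (1, 2, 3)}

    print(all(p[x, y] == pX[x] * pY[y]
              for x in (1, 2, 3) for y in (1, 2, 3)))   # True : independent
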
PROPERTY :
The joint distribution function of independent random variables
X and Y satisfies
FX,Y (x, y) = FX (x) · FY (y) , for all x, y .

PROOF :

FX,Y (xk , yℓ ) = P (X ≤ xk , Y ≤ yℓ )

   = ∑_{i≤k} ∑_{j≤ℓ} pX,Y (xi , yj )

   = ∑_{i≤k} ∑_{j≤ℓ} pX (xi ) · pY (yj )   (by independence)

   = ∑_{i≤k} { pX (xi ) · ∑_{j≤ℓ} pY (yj ) }

   = { ∑_{i≤k} pX (xi ) } · { ∑_{j≤ℓ} pY (yj ) }

   = FX (xk ) · FY (yℓ ) .

96
Conditional distributions

Let X and Y be discrete random variables with joint probability


mass function
pX,Y (x, y) .

For given x and y , let


Ex = X −1 ({x}) and Ey = Y −1 ({y}) ,
be their corresponding events in the sample space S.

Then
P (Ex |Ey ) ≡ P (Ex Ey ) / P (Ey ) = pX,Y (x, y) / pY (y) .

Thus it is natural to define the conditional probability mass function

pX|Y (x|y) ≡ P (X = x | Y = y) = pX,Y (x, y) / pY (y) .

97
Graphical representation of X and Y : the events E31 = {HHH} , E21 = {HHT , HTH} ,
E22 = {THH} , E11 = {HTT} , E12 = {THT} , E13 = {TTH} , E00 = {TTT} .

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of P (X = 2 | Y = 1) and P (Y = 1 | X = 2) ?

98
EXAMPLE : (3 tosses : X(s) = # Heads, Y (s) = index 1st H.)
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

         y=0    y=1    y=2    y=3
x=0       1      0      0      0
x=1       0     2/8    4/8     1
x=2       0     4/8    4/8     0
x=3       0     2/8     0      0
          1      1      1      1

EXERCISE : Also construct the Table for pY |X (y|x) = pX,Y (x, y) / pX (x) .

99
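
PYTHON SKETCH : the conditional mass function pX|Y computed from counts,
assuming the same three-toss model; it reproduces the last Table.

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")
    Y = lambda s: s.index("H") + 1 if "H" in s else 0

    def pX_given_Y(x, y):         # P(X = x | Y = y)
        Ey = [s for s in S if Y(s) == y]
        return Fraction(sum(1 for s in Ey if X(s) == x), len(Ey))

    print(pX_given_Y(2, 1))                        # 1/2  ( = 4/8 )
    print([pX_given_Y(x, 2) for x in range(4)])    # [0, 1/2, 1/2, 0]
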
EXAMPLE :
Joint probability mass function pX,Y (x, y)
         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

         y=1     y=2     y=3
x=1      1/2     1/2     1/2
x=2      1/3     1/3     1/3
x=3      1/6     1/6     1/6
          1       1       1

QUESTION : What does the last Table tell us?


EXERCISE : Also construct the Table for P (Y = y|X = x) .

100
Expectation
The expected value of a discrete random variable X is

E[X] ≡ ∑_k xk · P (X = xk ) = ∑_k xk · pX (xk ) .

Thus E[X] represents the weighted average value of X .

( E[X] is also called the mean of X .)

EXAMPLE : The expected value of rolling a die is

E[X] = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = (1/6) · ∑_{k=1}^{6} k = 7/2 .

EXERCISE : Prove the following :


• E[aX] = a E[X] ,
• E[aX + b] = a E[X] + b .

101
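
PYTHON SKETCH : the expected value of a fair die, together with a numerical
check of the two exercise identities for a = 2 , b = 3 .

    from fractions import Fraction as F

    p = {k: F(1, 6) for k in range(1, 7)}         # fair die
    E = lambda g: sum(g(k) * p[k] for k in p)     # E[g(X)]

    EX = E(lambda k: k)
    print(EX)                                     # 7/2
    print(E(lambda k: 2 * k) == 2 * EX)           # True : E[aX]   = a E[X]
    print(E(lambda k: 2 * k + 3) == 2 * EX + 3)   # True : E[aX+b] = a E[X] + b
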
EXAMPLE : Toss a coin until ”Heads” occurs. Then
S = {H , TH , TTH , TTTH , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :

X(H) = 1 , X(TH) = 2 , X(TTH) = 3 , · · ·

Then
E[X] = 1 · 1/2 + 2 · 1/4 + 3 · 1/8 + · · · = lim_{n→∞} ∑_{k=1}^{n} k/2^k = 2 .

   n     ∑_{k=1}^{n} k/2^k
   1     0.50000000
   2     1.00000000
   3     1.37500000
  10     1.98828125
  40     1.99999999
REMARK :
Perhaps using Sn = {all sequences of n tosses} is better · · ·

102
The expected value of a function of a random variable is

E[g(X)] ≡ ∑_k g(xk ) p(xk ) .

EXAMPLE :
The pay-off of rolling a die is $ k^2 , where k is the side facing up.

What should the entry fee be for the betting to break even?

SOLUTION : Here g(X) = X^2 , and

E[g(X)] = ∑_{k=1}^{6} k^2 · 1/6 = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 = 91/6 ≅ $15.17 .

103
The expected value of a function of two random variables is

E[g(X, Y )] ≡ ∑_k ∑_ℓ g(xk , yℓ ) p(xk , yℓ ) .

EXAMPLE :
         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

E[X] = 1 · 1/2 + 2 · 1/3 + 3 · 1/6 = 5/3 ,

E[Y ] = 1 · 2/3 + 2 · 1/6 + 3 · 1/6 = 3/2 ,

E[XY ] = 1 · 1/3 + 2 · 1/12 + 3 · 1/12
       + 2 · 2/9 + 4 · 1/18 + 6 · 1/18
       + 3 · 1/9 + 6 · 1/36 + 9 · 1/36 = 5/2 .   ( So ? )

104
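
PYTHON SKETCH : verifying E[X] , E[Y ] and E[XY ] for the table above,
assuming the joint probabilities are stored as Fractions indexed by (x, y).

    from fractions import Fraction as F

    p = {(1, 1): F(1, 3),  (1, 2): F(1, 12), (1, 3): F(1, 12),
         (2, 1): F(2, 9),  (2, 2): F(1, 18), (2, 3): F(1, 18),
         (3, 1): F(1, 9),  (3, 2): F(1, 36), (3, 3): F(1, 36)}

    EX  = sum(x * q for (x, y), q in p.items())
    EY  = sum(y * q for (x, y), q in p.items())
    EXY = sum(x * y * q for (x, y), q in p.items())

    print(EX, EY, EXY)            # 5/3 3/2 5/2
    print(EXY == EX * EY)         # True
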
PROPERTY :

• If X and Y are independent then E[XY ] = E[X] E[Y ] .

PROOF :

E[XY ] = ∑_k ∑_ℓ xk yℓ pX,Y (xk , yℓ )

       = ∑_k ∑_ℓ xk yℓ pX (xk ) pY (yℓ )   (by independence)

       = ∑_k { xk pX (xk ) ∑_ℓ yℓ pY (yℓ ) }

       = { ∑_k xk pX (xk ) } · { ∑_ℓ yℓ pY (yℓ ) }

       = E[X] · E[Y ] .

EXAMPLE : See the preceding example !

105
PROPERTY : E[X + Y ] = E[X] + E[Y ] . ( Always ! )

PROOF :

E[X + Y ] = ∑_k ∑_ℓ (xk + yℓ ) pX,Y (xk , yℓ )

   = ∑_k ∑_ℓ xk pX,Y (xk , yℓ ) + ∑_k ∑_ℓ yℓ pX,Y (xk , yℓ )

   = ∑_k ∑_ℓ xk pX,Y (xk , yℓ ) + ∑_ℓ ∑_k yℓ pX,Y (xk , yℓ )

   = ∑_k { xk ∑_ℓ pX,Y (xk , yℓ ) } + ∑_ℓ { yℓ ∑_k pX,Y (xk , yℓ ) }

   = ∑_k { xk pX (xk ) } + ∑_ℓ { yℓ pY (yℓ ) }

   = E[X] + E[Y ] .

NOTE : X and Y need not be independent !

106
EXERCISE :
Probability mass function pX,Y (x, y)
         y=6     y=8     y=10    pX (x)
x=1      1/5      0      1/5      2/5
x=2       0      1/5      0       1/5
x=3      1/5      0      1/5      2/5
pY (y)   2/5     1/5     2/5       1

Show that

• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16

• X and Y are not independent

Thus if
E[XY ] = E[X] E[Y ] ,

then it does not necessarily follow that X and Y are independent !

107
Variance and Standard Deviation
Let X have mean
µ = E[X] .

Then the variance of X is

V ar(X) ≡ E[ (X − µ)^2 ] ≡ ∑_k (xk − µ)^2 p(xk ) ,

which is the average weighted square distance from the mean.

We have
V ar(X) = E[X 2 − 2µX + µ2 ]

= E[X 2 ] − 2µE[X] + µ2

= E[X 2 ] − 2µ2 + µ2

= E[X 2 ] − µ2 .

108
The standard deviation of X is

σ(X) ≡ √V ar(X) = √E[ (X − µ)^2 ] = √( E[X^2 ] − µ^2 ) ,

which is the average weighted distance from the mean.

EXAMPLE : The variance of rolling a die is

V ar(X) = ∑_{k=1}^{6} [ k^2 · 1/6 ] − µ^2

        = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 − (7/2)^2 = 35/12 .

The standard deviation is

σ = √(35/12) ≅ 1.70 .

109
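
PYTHON SKETCH : the variance and standard deviation of a fair die, checked
against the shortcut formula E[X^2] − µ^2 .

    from fractions import Fraction as F
    from math import sqrt

    p   = {k: F(1, 6) for k in range(1, 7)}            # fair die
    mu  = sum(k * p[k] for k in p)                      # 7/2
    var = sum((k - mu) ** 2 * p[k] for k in p)          # E[(X - mu)^2]

    print(var)                                          # 35/12
    print(var == sum(k * k * p[k] for k in p) - mu ** 2)   # True
    print(sqrt(var))                                    # 1.7078...
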
Covariance
Let X and Y be random variables with mean

E[X] = µX , E[Y ] = µY .

Then the covariance of X and Y is defined as

Cov(X, Y ) ≡ E[ (X −µX ) (Y −µY ) ] = ∑_{k,ℓ} (xk −µX ) (yℓ −µY ) p(xk , yℓ ) .

We have
Cov(X, Y ) = E[ (X − µX ) (Y − µY ) ]

= E[XY − µX Y − µY X + µX µY ]

= E[XY ] − µX µY − µY µX + µX µY

= E[XY ] − E[X] E[Y ] .

110
We defined

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ]

           = ∑_{k,ℓ} (xk − µX ) (yℓ − µY ) p(xk , yℓ )

           = E[XY ] − E[X] E[Y ] .


NOTE :
Cov(X, Y ) measures ”concordance ” or ”coherence ” of X and Y :

• If X > µX when Y > µY and X < µX when Y < µY then

Cov(X, Y ) > 0 .

• If X > µX when Y < µY and X < µX when Y > µY then

Cov(X, Y ) < 0 .

111
EXERCISE : Prove the following :

• V ar(aX + b) = a2 V ar(X) ,

• Cov(X, Y ) = Cov(Y, X) ,

• Cov(cX, Y ) = c Cov(X, Y ) ,

• Cov(X, cY ) = c Cov(X, Y ) ,

• Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z) ,

• V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ) .

112
PROPERTY :

If X and Y are independent then Cov(X, Y ) = 0 .

PROOF :

We have already shown ( with µX ≡ E[X] and µY ≡ E[Y ] ) that

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] = E[XY ] − E[X] E[Y ] ,

and that if X and Y are independent then

E[XY ] = E[X] E[Y ] .

from which the result follows.

113
EXERCISE : ( already used earlier · · · )
Probability mass function pX,Y (x, y)
         y=6     y=8     y=10    pX (x)
x=1      1/5      0      1/5      2/5
x=2       0      1/5      0       1/5
x=3      1/5      0      1/5      2/5
pY (y)   2/5     1/5     2/5       1
Show that
• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16
• Cov(X, Y ) = E[XY ] − E[X] E[Y ] = 0
• X and Y are not independent

Thus if
Cov(X, Y ) = 0 ,

then it does not necessarily follow that X and Y are independent !

114
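
PYTHON SKETCH : a numerical check of this exercise, assuming the table above
is stored with Fractions (pairs with probability 0 are simply omitted).

    from fractions import Fraction as F

    p = {(1, 6): F(1, 5), (1, 10): F(1, 5), (2, 8): F(1, 5),
         (3, 6): F(1, 5), (3, 10): F(1, 5)}

    EX  = sum(x * q for (x, y), q in p.items())
    EY  = sum(y * q for (x, y), q in p.items())
    EXY = sum(x * y * q for (x, y), q in p.items())
    print(EX, EY, EXY, EXY - EX * EY)          # 2 8 16 0 , so Cov(X,Y) = 0

    pX = {x: sum(q for (a, y), q in p.items() if a == x) for x in (1, 2, 3)}
    pY = {y: sum(q for (x, b), q in p.items() if b == y) for y in (6, 8, 10)}
    print(p.get((2, 6), 0) == pX[2] * pY[6])   # False : not independent
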
PROPERTY :

If X and Y are independent then

V ar(X + Y ) = V ar(X) + V ar(Y ) .

PROOF :

We have already shown (in an exercise !) that

V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ) ,

and that if X and Y are independent then

Cov(X, Y ) = 0 ,

from which the result follows.

115
EXERCISE :

Compute
E[X] , E[Y ] , E[X 2 ] , E[Y 2 ]

E[XY ] , V ar(X) , V ar(Y )

Cov(X, Y )
for

Joint probability mass function pX,Y (x, y)


         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

116
EXERCISE :

Compute
E[X] , E[Y ] , E[X 2 ] , E[Y 2 ]

E[XY ] , V ar(X) , V ar(Y )

Cov(X, Y )
for

Joint probability mass function pX,Y (x, y)


         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

117
SPECIAL DISCRETE RANDOM VARIABLES

The Bernoulli Random Variable

A Bernoulli trial has only two outcomes , with probability


P (X = 1) = p,

P (X = 0) = 1−p ,
e.g., tossing a coin, winning or losing a game, · · · .
We have
E[X] = 1 · p + 0 · (1 − p) = p ,

E[X 2 ] = 12 · p + 02 · (1 − p) = p ,

V ar(X) = E[X 2 ] − E[X]2 = p − p2 = p(1 − p) .

NOTE : If p is small then V ar(X) ≅ p .

118
EXAMPLES :

• When p = 1/2 (e.g., for tossing a coin), we have

E[X] = p = 1/2 ,   V ar(X) = p(1 − p) = 1/4 .

• When rolling a die , with outcome k , (1 ≤ k ≤ 6) , let

X(k) = 1 if the roll resulted in a six ,
and
X(k) = 0 if the roll did not result in a six .
Then
E[X] = p = 1/6 ,   V ar(X) = p(1 − p) = 5/36 .

• When p = 0.01 , then

E[X] = 0.01 ,   V ar(X) = 0.0099 ≅ 0.01 .

119
The Binomial Random Variable

Perform a Bernoulli trial n times in sequence .

Assume the individual trials are independent .

An outcome could be
100011001010   (n = 12) ,
with probability
P (100011001010) = p^5 · (1 − p)^7 .   ( Why ? )

Let X be the number of ”successes ” (i.e. 1’s) .

For example,
X(100011001010) = 5 .
We have
P (X = 5) = (12 choose 5) · p^5 · (1 − p)^7 .   ( Why ? )

120
In general, for k successes in a sequence of n trials, we have

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ,   (0 ≤ k ≤ n) .

EXAMPLE : Tossing a coin 12 times:   n = 12 , p = 1/2
k pX (k) FX (k)
0 1 / 4096 1 / 4096
1 12 / 4096 13 / 4096
2 66 / 4096 79 / 4096
3 220 / 4096 299 / 4096
4 495 / 4096 794 / 4096
5 792 / 4096 1586 / 4096
6 924 / 4096 2510 / 4096
7 792 / 4096 3302 / 4096
8 495 / 4096 3797 / 4096
9 220 / 4096 4017 / 4096
10 66 / 4096 4083 / 4096
11 12 / 4096 4095 / 4096
12 1 / 4096 4096 / 4096

121
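
PYTHON SKETCH : reproducing the table above, assuming a fair coin
( n = 12 , p = 1/2 ).

    from math import comb
    from fractions import Fraction as F

    n, p = 12, F(1, 2)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

    print(pmf[6])                 # 231/1024  ( = 924/4096 )
    print(sum(pmf[:7]))           # 1255/2048 ( = 2510/4096 = FX(6) )
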
The Binomial mass and distribution functions for n = 12 , p = 1/2

122
For k successes in a sequence of n trials :

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ,   (0 ≤ k ≤ n) .

EXAMPLE : Rolling a die 12 times:   n = 12 , p = 1/6
k pX (k) FX (k)
0 0.1121566221 0.112156
1 0.2691758871 0.381332
2 0.2960935235 0.677426
3 0.1973956972 0.874821
4 0.0888280571 0.963649
5 0.0284249838 0.992074
6 0.0066324966 0.998707
7 0.0011369995 0.999844
8 0.0001421249 0.999986
9 0.0000126333 0.999998
10 0.0000007580 0.999999
11 0.0000000276 0.999999
12 0.0000000005 1.000000

123
The Binomial mass and distribution functions for n = 12 , p = 1/6

124
EXAMPLE :
In 12 rolls of a die write the outcome as, for example,

100011001010
where
1 denotes the roll resulted in a six ,
and
0 denotes the roll did not result in a six .

As before, let X be the number of 1’s in the outcome.

Then X represents the number of sixes in the 12 rolls.

Then, for example, using the preceding Table :

P (X = 5) ≅ 2.8 % ,   P (X ≤ 5) ≅ 99.2 % .

125
EXERCISE : Show that from

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ,
and
P (X = k + 1) = (n choose k+1) · p^{k+1} · (1 − p)^{n−k−1} ,
it follows that
P (X = k + 1) = ck · P (X = k) ,
where
ck = (n − k)/(k + 1) · p/(1 − p) .

NOTE : This recurrence formula is an efficient and stable algorithm
to compute the binomial probabilities :

P (X = 0) = (1 − p)^n ,

P (X = k + 1) = ck · P (X = k) ,   k = 0, 1, · · · , n − 1 .

126
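
PYTHON SKETCH : the recurrence above as an algorithm; the resulting values
can be compared with the n = 12 tables (float arithmetic is assumed to be
adequate here).

    def binomial_pmf(n, p):
        """Return [ P(X=0), ... , P(X=n) ] via the stable recurrence."""
        probs = [(1 - p) ** n]                       # P(X = 0)
        for k in range(n):
            c_k = (n - k) / (k + 1) * p / (1 - p)
            probs.append(c_k * probs[-1])            # P(X = k+1) = c_k P(X = k)
        return probs

    print(binomial_pmf(12, 1/6)[2])    # 0.29609... ( matches the die table )
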
Mean and variance of the Binomial random variable :

By definition, the mean of a Binomial random variable X is

E[X] = ∑_{k=0}^{n} k · P (X = k) = ∑_{k=0}^{n} k · (n choose k) p^k (1 − p)^{n−k} ,

which can be shown to equal np .

An easy way to see this is as follows :

If in a sequence of n independent Bernoulli trials we let

Xk = the outcome of the kth Bernoulli trial ,   (Xk = 0 or 1 ) ,

then
X ≡ X1 + X2 + · · · + Xn ,
is the Binomial random variable that counts the ”successes ” .

127
X ≡ X1 + X2 + · · · + Xn

We know that
E[Xk ] = p ,
so
E[X] = E[X1 ] + E[X2 ] + · · · + E[Xn ] = np .

We already know that


V ar(Xk ) = E[Xk2 ] − (E[Xk ])2 = p − p2 = p(1 − p) ,

so, since the Xk are independent , we have

V ar(X) = V ar(X1 ) + V ar(X2 ) + · · · + V ar(Xn ) = np(1 − p) .

NOTE : If p is small then V ar(X) ≅ np .

128
EXAMPLES :

• For 12 tosses of a coin , with Heads as success , we have
n = 12 , p = 1/2 , so
E[X] = np = 6 ,   V ar(X) = np(1 − p) = 3 .

• For 12 rolls of a die , with six as success , we have
n = 12 , p = 1/6 , so
E[X] = np = 2 ,   V ar(X) = np(1 − p) = 5/3 .

• If n = 500 and p = 0.01 , then

E[X] = np = 5 ,   V ar(X) = np(1 − p) = 4.95 ≅ 5 .

129
The Poisson Random Variable

The Poisson variable approximates the Binomial random variable :

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ≅ e^{−λ} · λ^k / k! ,

when we take

λ = n p   ( the average number of successes ) .

This approximation is accurate if n is large and p small .

Recall that for the Binomial random variable

E[X] = n p ,   and   V ar(X) = np(1 − p) ≅ np   when p is small.

Indeed, for the Poisson random variable we will show that

E[X] = λ   and   V ar(X) = λ .

130
A stable and efficient way to compute the Poisson probability

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · ,

P (X = k + 1) = e^{−λ} · λ^{k+1} / (k + 1)! ,

is to use the recurrence relation

P (X = 0) = e^{−λ} ,

P (X = k + 1) = λ/(k + 1) · P (X = k) ,   k = 0, 1, 2, · · · .

NOTE : Unlike the Binomial random variable, the Poisson random
variable can have an arbitrarily large integer value k.

131
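
PYTHON SKETCH : the Poisson recurrence, assuming the probabilities are only
needed up to some cutoff kmax .

    from math import exp

    def poisson_pmf(lam, kmax):
        """Return [ P(X=0), ... , P(X=kmax) ] via the stable recurrence."""
        probs = [exp(-lam)]                          # P(X = 0) = e^(-lambda)
        for k in range(kmax):
            probs.append(lam / (k + 1) * probs[-1])  # P(X = k+1)
        return probs

    print(poisson_pmf(6, 2))      # [0.00247..., 0.01487..., 0.04461...]
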
The Poisson random variable

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · ,

has (as shown later) :   E[X] = λ   and   V ar(X) = λ .

The Poisson distribution function is

F (k) = P (X ≤ k) = ∑_{ℓ=0}^{k} e^{−λ} λ^ℓ / ℓ! = e^{−λ} ∑_{ℓ=0}^{k} λ^ℓ / ℓ! ,

with, as should be the case,

lim_{k→∞} F (k) = e^{−λ} ∑_{ℓ=0}^{∞} λ^ℓ / ℓ! = e^{−λ} e^λ = 1 .

( using the Taylor series from Calculus for e^λ ) .

132
The Poisson random variable

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · ,

models the probability of k ”successes ” in a given ”time” interval,
when the average number of successes is λ .

EXAMPLE : Suppose customers arrive at the rate of six per hour.
The probability that k customers arrive in a one-hour period is

P (k = 0) = e^{−6} · 6^0 / 0! ≅ 0.0024 ,

P (k = 1) = e^{−6} · 6^1 / 1! ≅ 0.0148 ,

P (k = 2) = e^{−6} · 6^2 / 2! ≅ 0.0446 .

The probability that more than 2 customers arrive is

1 − (0.0024 + 0.0148 + 0.0446) ≅ 0.938 .

133
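
PYTHON SKETCH : the customer-arrival numbers above, assuming λ = 6 per hour.

    from math import exp, factorial

    lam = 6
    p = [exp(-lam) * lam**k / factorial(k) for k in range(3)]

    print([round(q, 4) for q in p])    # [0.0025, 0.0149, 0.0446]
    print(round(1 - sum(p), 4))        # 0.938  ( more than 2 customers arrive )
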
pBinomial (k) = (n choose k) p^k (1 − p)^{n−k} ≅ pPoisson (k) = e^{−λ} λ^k / k!

EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 12 , p = 0.5 (0.5 customers/5 minutes) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0002 0.0024 0.0002 0.0024
1 0.0029 0.0148 0.0031 0.0173
2 0.0161 0.0446 0.0192 0.0619
3 0.0537 0.0892 0.0729 0.1512
4 0.1208 0.1338 0.1938 0.2850
5 0.1933 0.1606 0.3872 0.4456
6 0.2255 0.1606 0.6127 0.6063
7 0.1933 0.1376 0.8061 0.7439
8 0.1208 0.1032 0.9270 0.8472
9 0.0537 0.0688 0.9807 0.9160
10 0.0161 0.0413 0.9968 0.9573
11 0.0029 0.0225 0.9997 0.9799
12 0.0002 0.0112 1.0000 0.9911⋆ Why not 1.0000 ?

Here the approximation is not so good · · ·

134
pBinomial (k) = (n choose k) p^k (1 − p)^{n−k} ≅ pPoisson (k) = e^{−λ} λ^k / k!

EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 60 , p = 0.1 (0.1 customers/minute) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0017 0.0024 0.0017 0.0024
1 0.0119 0.0148 0.0137 0.0173
2 0.0392 0.0446 0.0530 0.0619
3 0.0843 0.0892 0.1373 0.1512
4 0.1335 0.1338 0.2709 0.2850
5 0.1662 0.1606 0.4371 0.4456
6 0.1692 0.1606 0.6064 0.6063
7 0.1451 0.1376 0.7515 0.7439
8 0.1068 0.1032 0.8583 0.8472
9 0.0685 0.0688 0.9269 0.9160
10 0.0388 0.0413 0.9657 0.9573
11 0.0196 0.0225 0.9854 0.9799
12 0.0089 0.0112 0.9943 0.9911
13 ··· ··· ··· ···

Here the approximation is better · · ·

135
n = 12 , p = 1/2 , λ = 6                 n = 200 , p = 0.01 , λ = 2

The Binomial (blue) and Poisson (red) probability mass functions.

For the case n = 200 , p = 0.01 , the approximation is very good !

136
For the Binomial random variable we found

E[X] = np   and   V ar(X) = np(1 − p) ,

while for the Poisson random variable, with λ = np , we will show

E[X] = np   and   V ar(X) = np .

Note again that

np(1 − p) ≅ np ,   when p is small .

EXAMPLE : In the preceding two Tables we have

            n=12 , p=0.5                         n=60 , p=0.1
            Binomial   Poisson                   Binomial   Poisson
  E[X]      6.0000     6.0000          E[X]      6.0000     6.0000
  V ar[X]   3.0000     6.0000          V ar[X]   5.4000     6.0000
  σ[X]      1.7321     2.4495          σ[X]      2.3238     2.4495

137
FACT : (The Method of Moments)

By Taylor expansion of e^{tX} about t = 0 , we have

ψ(t) ≡ E[e^{tX} ] = E[ 1 + tX + t^2 X^2 / 2! + t^3 X^3 / 3! + · · · ]

     = 1 + t E[X] + t^2/2! · E[X^2 ] + t^3/3! · E[X^3 ] + · · · .

It follows that

ψ′(0) = E[X] ,   ψ′′(0) = E[X^2 ] .   ( Why ? )

This sometimes facilitates computing the mean

µ = E[X] ,

and the variance

V ar(X) = E[X^2 ] − µ^2 .

138
APPLICATION : The Poisson mean and variance :

ψ(t) ≡ E[e^{tX} ] = ∑_{k=0}^{∞} e^{tk} P (X = k) = ∑_{k=0}^{∞} e^{tk} e^{−λ} λ^k / k!

     = e^{−λ} ∑_{k=0}^{∞} (λ e^t )^k / k! = e^{−λ} e^{λ e^t} = e^{λ(e^t −1)} .

Here   ψ′(t) = λ e^t e^{λ(e^t −1)}

       ψ′′(t) = λ ( λ (e^t )^2 + e^t ) e^{λ(e^t −1)}   ( Check ! )

so that   E[X] = ψ′(0) = λ

          E[X^2 ] = ψ′′(0) = λ(λ + 1) = λ^2 + λ

          V ar(X) = E[X^2 ] − E[X]^2 = λ .

139
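
PYTHON SKETCH : the same moment computation done symbolically, assuming the
sympy package is available.

    import sympy as sp

    t, lam = sp.symbols("t lam", positive=True)
    psi = sp.exp(lam * (sp.exp(t) - 1))              # psi(t) = E[e^(tX)]

    EX  = sp.diff(psi, t).subs(t, 0)                 # psi'(0)
    EX2 = sp.diff(psi, t, 2).subs(t, 0)              # psi''(0)

    print(sp.simplify(EX), sp.simplify(EX2 - EX**2)) # lam lam
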
EXAMPLE : Defects in a wire occur at the rate of one per 10 meter,
with a Poisson distribution :

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · .

What is the probability that :

• A 12-meter roll has no defects?

ANSWER : Here λ = 1.2 , and P (X = 0) = e^{−λ} = 0.3012 .

• A 12-meter roll of wire has one defect?

ANSWER : With λ = 1.2 , P (X = 1) = e^{−λ} · λ = 0.3614 .

• Of five 12-meter rolls two have one defect and three have none?

ANSWER : (5 choose 3) · 0.3012^3 · 0.3614^2 = 0.0357 .   ( Why ? )

140
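
PYTHON SKETCH : the wire-defect numbers above, assuming λ = 1.2 for one
12-meter roll and independent rolls.

    from math import exp, comb

    lam = 1.2
    p0 = exp(-lam)                # no defects on one roll
    p1 = exp(-lam) * lam          # exactly one defect on one roll
    print(round(p0, 4), round(p1, 4))            # 0.3012 0.3614

    # five rolls : three with no defect and two with one defect
    print(round(comb(5, 3) * p0**3 * p1**2, 4))  # 0.0357
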
EXERCISE :
Defects in a certain wire occur at the rate of one per 10 meter.
Assume the defects have a Poisson distribution.
What is the probability that :
• a 20-meter wire has no defects?

• a 20-meter wire has at most 2 defects?

EXERCISE :
Customers arrive at a counter at the rate of 8 per hour.
Assume the arrivals have a Poisson distribution.
What is the probability that :
• no customer arrives in 15 minutes?

• two customers arrive in a period of 30 minutes?

141
