Chap3 - Conditional Probability and Discrete Random Variables

The document discusses conditional probability, illustrating how additional information can alter the likelihood of events, using examples such as coin tosses and card draws. It introduces Bayes' theorem and the concept of independent events, providing various exercises and examples to reinforce understanding. The document emphasizes the importance of conditional probability in calculating the likelihood of events based on given conditions.

CONDITIONAL PROBABILITY

Giving more information can change the probability of an event.

EXAMPLE :
If a coin is tossed two times then what is the probability of two
Heads?
ANSWER : 1/4 .

EXAMPLE :
If a coin is tossed two times then what is the probability of two Heads,
given that the first toss gave Heads ?
ANSWER : 1/2 .

45
NOTE :

Several examples will be about playing cards .

A standard deck of playing cards consists of 52 cards :

• Four suits :
Hearts and Diamonds (red) , Spades and Clubs (black) .

• Each suit has 13 cards, whose denominations are

2 , 3 , · · · , 10 , Jack , Queen , King , Ace .

• The Jack , Queen , and King are called face cards .

46
EXERCISE :

Suppose we draw a card from a shuffled set of 52 playing cards.

• What is the probability of drawing a Queen ?

• What is the probability of drawing a Queen, given that the card


drawn is of suit Hearts ?

• What is the probability of drawing a Queen, given that the card


drawn is a Face card ?

What do the answers tell us?


(We’ll soon learn the events ”Queen” and ”Hearts” are independent .)

47
The two preceding questions are examples of conditional probability .

Conditional probability is an important and useful concept.

If E and F are events, i.e., subsets of a sample space S , then

P (E|F ) is the conditional probability of E , given F ,

defined as
P(E|F) ≡ P(EF) / P(F) ,
or, equivalently,

P (EF ) = P (E|F ) P (F ) ,

(assuming that P (F ) is not zero).

48
P(E|F) ≡ P(EF) / P(F)

[Figure: two Venn diagrams of a sample space S of six outcomes, each showing events E and F.]

Suppose that the 6 outcomes in S are equally likely.

What is P (E|F ) in each of these two cases ?

49
P(E|F) ≡ P(EF) / P(F)

[Figure: two more Venn diagrams of S showing events E and F.]

Suppose that the 6 outcomes in S are equally likely.

What is P (E|F ) in each of these two cases ?

50
EXAMPLE : Suppose a coin is tossed two times.

The sample space is


S = {HH , HT , T H , T T } .

Let E be the event ”two Heads ” , i.e.,


E = {HH} .

Let F be the event ”the first toss gives Heads ” , i.e.,


F = {HH , HT } .
Then
EF = {HH} = E ( since E ⊂ F ) .

We have
P(E|F) = P(EF) / P(F) = P(E) / P(F) = (1/4) / (1/2) = 1/2 .

51
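
NOTE : This kind of conditional probability can be checked by brute-force
enumeration. A short Python sketch (an illustration; the names are ad hoc):

    from itertools import product

    # Sample space of two coin tosses, all outcomes equally likely.
    S = [''.join(t) for t in product('HT', repeat=2)]   # ['HH','HT','TH','TT']

    E = {s for s in S if s.count('H') == 2}   # "two Heads"
    F = {s for s in S if s[0] == 'H'}         # "first toss gives Heads"

    def P(A):
        return len(A) / len(S)                # equally likely outcomes

    print(P(E))                               # 0.25  = P(E)
    print(P(E & F) / P(F))                    # 0.5   = P(E|F)
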
EXAMPLE :
Suppose we draw a card from a shuffled set of 52 playing cards.
• What is the probability of drawing a Queen, given that the card
drawn is of suit Hearts ?
ANSWER :
P(Q|H) = P(QH) / P(H) = (1/52) / (13/52) = 1/13 .

• What is the probability of drawing a Queen, given that the card


drawn is a Face card ?
ANSWER :
P(Q|F) = P(QF) / P(F) = P(Q) / P(F) = (4/52) / (12/52) = 1/3 .

(Here Q ⊂ F , so that QF = Q .)

52
The probability of an event E is sometimes computed more easily

if we condition E on another event F ,

namely, from

P(E) = P( E(F ∪ F^c) )                         ( Why ? )

     = P( EF ∪ EF^c ) = P(EF) + P(EF^c)        ( Why ? )

and

P(EF) = P(E|F) P(F) ,   P(EF^c) = P(E|F^c) P(F^c) ,

we obtain this basic formula

P(E) = P(E|F) · P(F) + P(E|F^c) · P(F^c) .

53
EXAMPLE :

An insurance company has these data :

The probability of an insurance claim in a period of one year is

4 percent for persons under age 30

2 percent for persons over age 30

and it is known that

30 percent of the targeted population is under age 30.

What is the probability of an insurance claim in a period of one year


for a randomly chosen person from the targeted population?

54
SOLUTION :

Let the sample space S be all persons under consideration.

Let C be the event (subset of S) of persons filing a claim.

Let U be the event (subset of S) of persons under age 30.

Then U c is the event (subset of S) of persons over age 30.

Thus
P(C) = P(C|U) P(U) + P(C|U^c) P(U^c)
     = (4/100) · (3/10) + (2/100) · (7/10)
     = 26/1000 = 2.6% .

55
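
NOTE : A short Python sketch (an illustration; variable names are ad hoc)
that evaluates the formula P(C) = P(C|U) P(U) + P(C|U^c) P(U^c) numerically:

    # Insurance example: law of total probability.
    p_claim_given_under30 = 0.04
    p_claim_given_over30  = 0.02
    p_under30             = 0.30

    p_claim = (p_claim_given_under30 * p_under30
               + p_claim_given_over30 * (1 - p_under30))
    print(p_claim)   # 0.026, i.e., 2.6 %
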
EXAMPLE :

Two balls are drawn from a bag with 2 white and 3 black balls.

There are 20 outcomes (sequences) in S . ( Why ? )

What is the probability that the second ball is white ?

SOLUTION :

Let F be the event that the first ball is white.

Let S be the event that the second ball is white.

Then
P(S) = P(S|F) P(F) + P(S|F^c) P(F^c) = (1/4) · (2/5) + (2/4) · (3/5) = 2/5 .

QUESTION : Is it surprising that P (S) = P (F ) ?

56
EXAMPLE : ( continued · · · )
Is it surprising that P (S) = P (F ) ?

ANSWER : Not really, if one considers the sample space S :


S = { w1 w2 , w1 b1 , w1 b2 , w1 b3 ,
      w2 w1 , w2 b1 , w2 b2 , w2 b3 ,
      b1 w1 , b1 w2 , b1 b2 , b1 b3 ,
      b2 w1 , b2 w2 , b2 b1 , b2 b3 ,
      b3 w1 , b3 w2 , b3 b1 , b3 b2 } ,

where outcomes (sequences) are assumed equally likely.

57
EXAMPLE :

Suppose we draw two cards from a shuffled set of 52 playing cards.

What is the probability that the second card is a Queen ?

ANSWER :

P (2nd card Q) =

P (2nd card Q|1st card Q) · P (1st card Q)

+ P (2nd card Q|1st card not Q) · P (1st card not Q)

= (3/51) · (4/52) + (4/51) · (48/52) = 204 / (51 · 52) = 4/52 = 1/13 .

QUESTION : Is it surprising that P (2nd card Q) = P (1st card Q) ?

58
A useful formula that ”inverts conditioning ” is derived as follows :

Since we have both
P(EF) = P(E|F) P(F)
and
P(EF) = P(F|E) P(E) ,
it follows that, if P(E) ≠ 0,

P(F|E) = P(EF) / P(E) = P(E|F) · P(F) / P(E) ,

and, using the earlier useful formula, we get

P(F|E) = P(E|F) · P(F) / ( P(E|F) · P(F) + P(E|F^c) · P(F^c) ) ,

which is known as Bayes’ formula .

59
EXAMPLE : Suppose 1 in 1000 persons has a certain disease.
A test detects the disease in 99 % of diseased persons.
The test also ”detects” the disease in 5 % of healthy persons.
With what probability does a positive test diagnose the disease?

SOLUTION : Let
D ∼ ”diseased” , H ∼ ”healthy” , + ∼ ”positive”.
We are given that
P (D) = 0.001 , P (+|D) = 0.99 , P (+|H) = 0.05 .

By Bayes’ formula
P(D|+) = P(+|D) · P(D) / ( P(+|D) · P(D) + P(+|H) · P(H) )

       = 0.99 · 0.001 / ( 0.99 · 0.001 + 0.05 · 0.999 ) ≈ 0.0194   (!)

60
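
NOTE : The Bayes computation above can be reproduced with a few lines of
Python (an illustrative sketch; variable names are ad hoc):

    # Disease-test example.
    p_D     = 0.001                                   # P(D)
    p_pos_D = 0.99                                    # P(+|D)
    p_pos_H = 0.05                                    # P(+|H)

    p_pos   = p_pos_D * p_D + p_pos_H * (1 - p_D)     # P(+), total probability
    p_D_pos = p_pos_D * p_D / p_pos                   # P(D|+), Bayes' formula
    print(round(p_D_pos, 4))                          # 0.0194
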
EXERCISE :
Suppose 1 in 100 products has a certain defect.

A test detects the defect in 95 % of defective products.

The test also ”detects” the defect in 10 % of non-defective products.

• With what probability does a positive test diagnose a defect?

EXERCISE :
Suppose 1 in 2000 persons has a certain disease.

A test detects the disease in 90 % of diseased persons.

The test also ”detects” the disease in 5 % of healthy persons.

• With what probability does a positive test diagnose the disease?

61
More generally, if the sample space S is the union of disjoint events
S = F1 ∪ F2 ∪ · · · ∪ Fn ,
then for any event E
P(Fi|E) = P(E|Fi) · P(Fi) / ( P(E|F1) · P(F1) + P(E|F2) · P(F2) + · · · + P(E|Fn) · P(Fn) ) .

EXERCISE :
Machines M1 , M2 , M3 produce these proportions of an article

Production : M1 : 10 % , M2 : 30 % , M3 : 60 % .

The probability the machines produce defective articles is

Defects : M1 : 4 % , M2 : 3 % , M3 : 2 % .

What is the probability a random article was made by machine M1 ,


given that it is defective?

62
Independent Events

Two events E and F are independent if

P (EF ) = P (E) P (F ) .

In this case
P(E|F) = P(EF) / P(F) = P(E) P(F) / P(F) = P(E) ,

(assuming P (F ) is not zero).

Thus

knowing F occurred doesn’t change the probability of E .

63
EXAMPLE : Draw one card from a deck of 52 playing cards.
Counting outcomes we find
P(Face Card)            = 12/52 = 3/13 ,
P(Hearts)               = 13/52 = 1/4 ,
P(Face Card and Hearts) = 3/52 ,
P(Face Card|Hearts)     = 3/13 .
We see that
P(Face Card and Hearts) = P(Face Card) · P(Hearts)   (= 3/52) .
Thus the events ”Face Card ” and ”Hearts ” are independent.

Therefore we also have
P(Face Card|Hearts) = P(Face Card)   (= 3/13) .

64
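
NOTE : The independence of ”Face Card ” and ”Hearts ” can also be verified
by counting over the whole deck. A small Python sketch (illustrative only):

    from itertools import product

    ranks = ['2','3','4','5','6','7','8','9','10','Jack','Queen','King','Ace']
    suits = ['Hearts', 'Diamonds', 'Spades', 'Clubs']
    deck  = set(product(ranks, suits))                 # 52 cards

    face   = {c for c in deck if c[0] in ('Jack', 'Queen', 'King')}
    hearts = {c for c in deck if c[1] == 'Hearts'}

    def P(A):
        return len(A) / len(deck)

    print(P(face & hearts))                            # 3/52
    print(P(face) * P(hearts))                         # (12/52)(13/52) = 3/52
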
EXERCISE :

Which of the following pairs of events are independent?

(1) drawing ”Hearts” and drawing ”Black” ,

(2) drawing ”Black” and drawing ”Ace” ,

(3) the event {2, 3, · · · , 9} and drawing ”Red” .

65
EXERCISE : Two numbers are drawn at random from the set
{1, 2, 3, 4}.

If order is not important then what is the sample space S ?

Define the following functions on S :

X( {i, j} ) = i + j , Y ( {i, j} ) = |i − j| .

Which of the following pairs of events are independent?

(1) X = 5 and Y = 2 ,

(2) X = 5 and Y = 1 .

REMARK :
X and Y are examples of random variables . (More soon!)

66
EXAMPLE : If E and F are independent then so are E and F c .

PROOF : E = E(F ∪ F c ) = EF ∪ EF c , where


EF and EF c are disjoint .
Thus
P (E) = P (EF ) + P (EF c ) ,
from which

P (EF c ) = P (E) − P (EF )

= P (E) − P (E) · P (F ) (since E and F independent)

= P (E) · ( 1 − P (F ) )

= P (E) · P (F c ) .

EXERCISE :
Prove that if E and F are independent then so are E c and F c .

67
NOTE : Independence and disjointness are different things !

[Figure: two Venn diagrams of the sample space S with events E and F.]

Independent, but not disjoint. Disjoint, but not independent.


(The six outcomes in S are assumed to have equal probability.)

If E and F are independent then P (EF ) = P (E) P (F ) .


If E and F are disjoint then P (EF ) = P ( ∅ ) = 0 .

If E and F are independent and disjoint then one of them has zero probability !

68
Three events E , F , and G are independent if

P(EFG) = P(E) P(F) P(G)

and

P(EF) = P(E) P(F) ,
P(EG) = P(E) P(G) ,
P(FG) = P(F) P(G) .

EXERCISE : Are the three events of drawing

(1) a red card ,

(2) a face card ,

(3) a Heart or Spade ,


independent ?

69
EXERCISE :

A machine M consists of three independent parts, M1 , M2 , and M3 .

Suppose that
M1 functions properly with probability 9/10 ,
M2 functions properly with probability 9/10 ,
M3 functions properly with probability 8/10 ,
and that

the machine M functions if and only if its three parts function.

• What is the probability for the machine M to function ?


• What is the probability for the machine M to malfunction ?

70
DISCRETE RANDOM VARIABLES

DEFINITION : A discrete random variable is a function X(s) from


a finite or countably infinite sample space S to the real numbers :
X(·) : S → R.

EXAMPLE : Toss a coin 3 times in sequence. The sample space


is
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
and examples of random variables are
• X(s) = the number of Heads in the sequence ; e.g., X(HTH) = 2 ,

• Y (s) = the index of the first H ; e.g., Y (TTH) = 3 ,
  and 0 if the sequence has no H , i.e., Y (TTT) = 0 .

NOTE : In this example X(s) and Y (s) are actually integers .

71
Value-ranges of a random variable correspond to events in S .

EXAMPLE : For the sample space


S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
with
X(s) = the number of Heads ,
the value
X(s) = 2 , corresponds to the event {HHT , HT H , T HH} ,
and the values
1 < X(s) ≤ 3 , correspond to {HHH , HHT , HT H , T HH} .

NOTATION : If it is clear what S is then we often just write


X instead of X(s) .

72
Value-ranges of a random variable correspond to events in S ,
and
events in S have a probability .
Thus
Value-ranges of a random variable have a probability .

EXAMPLE : For the sample space


S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
with X(s) = the number of Heads ,
we have
P(0 < X ≤ 2) = 6/8 .

QUESTION : What are the values of


P (X ≤ −1) , P (X ≤ 0) , P (X ≤ 1) , P (X ≤ 2) , P (X ≤ 3) , P (X ≤ 4) ?

73
NOTATION : We will also write pX (x) to denote P (X = x) .

EXAMPLE : For the sample space

S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
with
X(s) = the number of Heads ,
we have
pX(0) ≡ P( {TTT} )             = 1/8
pX(1) ≡ P( {HTT , THT , TTH} ) = 3/8
pX(2) ≡ P( {HHT , HTH , THH} ) = 3/8
pX(3) ≡ P( {HHH} )             = 1/8
where
pX (0) + pX (1) + pX (2) + pX (3) = 1 . ( Why ? )

74
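
NOTE : The mass function of X can be generated by enumerating S. A short
Python sketch (illustrative; names are ad hoc):

    from itertools import product
    from collections import Counter

    # Three coin tosses; X(s) = number of Heads.
    S = [''.join(t) for t in product('HT', repeat=3)]
    counts = Counter(s.count('H') for s in S)

    for x in sorted(counts):
        print(x, counts[x] / len(S))   # 0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8
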
[Figure: the outcomes of S grouped into the events E0 = {TTT}, E1 = {HTT, THT, TTH}, E2 = {HHT, HTH, THH}, E3 = {HHH}, mapped by X(s) to the values 0, 1, 2, 3.]

Graphical representation of X .

The events E0 , E1 , E2 , E3 are disjoint since X(s) is a function !


(X : S → R must be defined for all s ∈ S and must be single-valued.)

75
The graph of pX .

76
DEFINITION :
pX (x) ≡ P (X = x) ,
is called the probability mass function .

DEFINITION :
FX (x) ≡ P (X ≤ x) ,
is called the (cumulative) probability distribution function .

PROPERTIES :
• FX (x) is a non-decreasing function of x . ( Why ? )
• FX (−∞) = 0 and FX (∞) = 1 . ( Why ? )
• P (a < X ≤ b) = FX (b) − FX (a) . ( Why ? )

NOTATION : When it is clear what X is then we also write

p(x) for pX (x) and F (x) for FX (x) .

77
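
NOTE : A short Python sketch (illustrative) that builds FX from pX and
checks the properties above on the three-toss example:

    from fractions import Fraction

    p = {0: Fraction(1, 8), 1: Fraction(3, 8),
         2: Fraction(3, 8), 3: Fraction(1, 8)}

    def F(x):
        # F(x) = P(X <= x), a non-decreasing step function.
        return sum(px for xk, px in p.items() if xk <= x)

    print(F(-1), F(0), F(1), F(2), F(3), F(4))   # 0, 1/8, 4/8, 7/8, 1, 1
    print(F(2) - F(0))                           # P(0 < X <= 2) = 6/8
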
EXAMPLE : With X(s) = the number of Heads , and
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
p(0) = 1/8 , p(1) = 3/8 , p(2) = 3/8 , p(3) = 1/8 ,
we have the probability distribution function

F(−1) ≡ P(X ≤ −1) = 0
F( 0) ≡ P(X ≤ 0)  = 1/8
F( 1) ≡ P(X ≤ 1)  = 4/8
F( 2) ≡ P(X ≤ 2)  = 7/8
F( 3) ≡ P(X ≤ 3)  = 1
F( 4) ≡ P(X ≤ 4)  = 1

We see, for example, that
P(0 < X ≤ 2) = P(X = 1) + P(X = 2)
             = F(2) − F(0) = 7/8 − 1/8 = 6/8 .

78
The graph of the probability distribution function FX .

79
EXAMPLE : Toss a coin until ”Heads” occurs.
Then the sample space is countably infinite , namely,
S = {H , T H , T T H , T T T H , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :


X(H) = 1 , X(T H) = 2 , X(T T H) = 3 , ···
Then
p(1) = 1/2 , p(2) = 1/4 , p(3) = 1/8 , · · ·   ( Why ? )
and
F(n) = P(X ≤ n) = Σ_{k=1}^{n} p(k) = Σ_{k=1}^{n} 1/2^k = 1 − 1/2^n ,
and, as should be the case,
Σ_{k=1}^{∞} p(k) = lim_{n→∞} Σ_{k=1}^{n} p(k) = lim_{n→∞} (1 − 1/2^n) = 1 .

NOTE : The outcomes in S do not have equal probability !


EXERCISE : Draw the probability mass and distribution functions.

80
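
NOTE : A short Python sketch (illustrative) that tabulates p(k) and F(n)
for the ”toss until Heads ” example:

    # p(k) = (1/2)**k and F(n) = 1 - (1/2)**n.
    def p(k):
        return 0.5 ** k

    def F(n):
        return sum(p(k) for k in range(1, n + 1))

    for n in (1, 2, 3, 10, 40):
        print(n, F(n), 1 - 0.5 ** n)   # the two expressions agree
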
X(s) is the number of tosses until ”Heads” occurs · · ·

REMARK : We can also take S ≡ Sn as all ordered outcomes of


length n. For example, for n = 4,

S4 = { H̃HHH , H̃HHT , H̃HT H , H̃HT T ,

H̃T HH , H̃T HT , H̃T T H , H̃T T T ,

T H̃HH , T H̃HT , T H̃T H , T H̃T T ,

T T H̃H , T T H̃T , T T T H̃ , TTTT }.

where for each outcome the first ”Heads” is marked as H̃ .


Each outcome in S4 has equal probability 2^{−n} (here 2^{−4} = 1/16) , and
pX(1) = 1/2 , pX(2) = 1/4 , pX(3) = 1/8 , pX(4) = 1/16 , · · · ,
independent of n .

81
Joint distributions
The probability mass function and the probability distribution function
can also be functions of more than one variable.

EXAMPLE : Toss a coin 3 times in sequence. For the sample space

S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
we let
X(s) = # Heads , Y (s) = index of the first H (0 for T T T ) .
Then we have the joint probability mass function
pX,Y (x, y) = P (X = x , Y = y) .
For example,
pX,Y (2, 1) = P (X = 2 , Y = 1)

= P ( 2 Heads , 1st toss is Heads)

= 2/8 = 1/4 .

82
EXAMPLE : ( continued · · · ) For
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
X(s) = number of Heads, and Y (s) = index of the first H ,

we can list the values of pX,Y (x, y) :

Joint probability mass function pX,Y (x, y)


          y=0     y=1     y=2     y=3     pX(x)
 x=0      1/8      0       0       0      1/8
 x=1       0      1/8     1/8     1/8     3/8
 x=2       0      2/8     1/8      0      3/8
 x=3       0      1/8      0       0      1/8
 pY(y)    1/8     4/8     2/8     1/8      1

NOTE :
• The marginal probability pX is the probability mass function of X.
• The marginal probability pY is the probability mass function of Y .

83
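
NOTE : The joint and marginal mass functions above can be generated
directly from S. A Python sketch (illustrative; names are ad hoc):

    from itertools import product
    from collections import Counter

    S = [''.join(t) for t in product('HT', repeat=3)]

    def X(s): return s.count('H')                          # number of Heads
    def Y(s): return s.index('H') + 1 if 'H' in s else 0   # index of first H

    n    = len(S)
    p_XY = Counter((X(s), Y(s)) for s in S)            # joint counts
    p_X  = Counter(X(s) for s in S)                    # marginal of X
    p_Y  = Counter(Y(s) for s in S)                    # marginal of Y

    print(p_XY[(2, 1)] / n)            # 2/8
    print(p_X[2] / n, p_Y[1] / n)      # 3/8 and 4/8
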
EXAMPLE : ( continued · · · )
X(s) = number of Heads, and Y (s) = index of the first H .

          y=0     y=1     y=2     y=3     pX(x)
 x=0      1/8      0       0       0      1/8
 x=1       0      1/8     1/8     1/8     3/8
 x=2       0      2/8     1/8      0      3/8
 x=3       0      1/8      0       0      1/8
 pY(y)    1/8     4/8     2/8     1/8      1

For example,

• X = 2 corresponds to the event {HHT , HT H , T HH} .


• Y = 1 corresponds to the event {HHH , HHT , HT H , HT T } .
• (X = 2 and Y = 1) corresponds to the event {HHT , HT H} .

QUESTION : Are the events X = 2 and Y = 1 independent ?

84
[Figure: the outcomes of S grouped into the disjoint events E_{x,y}, e.g., E_{3,1} = {HHH}, E_{2,1} = {HHT, HTH}, E_{2,2} = {THH}, E_{1,1} = {HTT}, E_{1,2} = {THT}, E_{1,3} = {TTH}, E_{0,0} = {TTT}.]
The events Ei,j ≡ { s ∈ S : X(s) = i , Y (s) = j } are disjoint .


QUESTION : Are the events X = 2 and Y = 1 independent ?

85
DEFINITION :
pX,Y (x, y) ≡ P (X = x , Y = y) ,
is called the joint probability mass function .

DEFINITION :
FX,Y (x, y) ≡ P (X ≤ x , Y ≤ y) ,
is called the joint (cumulative) probability distribution function .

NOTATION : When it is clear what X and Y are then we also


write

p(x, y) for pX,Y (x, y) ,


and
F (x, y) for FX,Y (x, y) .

86
EXAMPLE : Three tosses : X(s) = # Heads, Y (s) = index 1st H .
Joint probability mass function pX,Y (x, y)
          y=0     y=1     y=2     y=3     pX(x)
 x=0      1/8      0       0       0      1/8
 x=1       0      1/8     1/8     1/8     3/8
 x=2       0      2/8     1/8      0      3/8
 x=3       0      1/8      0       0      1/8
 pY(y)    1/8     4/8     2/8     1/8      1

Joint distribution function FX,Y(x, y) ≡ P(X ≤ x, Y ≤ y)

          y=0     y=1     y=2     y=3     FX(·)
 x=0      1/8     1/8     1/8     1/8     1/8
 x=1      1/8     2/8     3/8     4/8     4/8
 x=2      1/8     4/8     6/8     7/8     7/8
 x=3      1/8     5/8     7/8      1       1
 FY(·)    1/8     5/8     7/8      1       1

Note that the distribution function FX is a copy of the 4th column,


and the distribution function FY is a copy of the 4th row. ( Why ? )

87
In the preceding example :
Joint probability mass function pX,Y (x, y)
          y=0     y=1     y=2     y=3     pX(x)
 x=0      1/8      0       0       0      1/8
 x=1       0      1/8     1/8     1/8     3/8
 x=2       0      2/8     1/8      0      3/8
 x=3       0      1/8      0       0      1/8
 pY(y)    1/8     4/8     2/8     1/8      1

Joint distribution function FX,Y(x, y) ≡ P(X ≤ x, Y ≤ y)

          y=0     y=1     y=2     y=3     FX(·)
 x=0      1/8     1/8     1/8     1/8     1/8
 x=1      1/8     2/8     3/8     4/8     4/8
 x=2      1/8     4/8     6/8     7/8     7/8
 x=3      1/8     5/8     7/8      1       1
 FY(·)    1/8     5/8     7/8      1       1

QUESTION : Why is
P (1 < X ≤ 3 , 1 < Y ≤ 3) = F (3, 3) − F (1, 3) − F (3, 1) + F (1, 1) ?

88
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).

Define the random variables X and Y as

X = result of the first roll , Y = sum of the two rolls.

• What is a good choice of the sample space S ?


• How many outcomes are there in S ?
• List the values of the joint probability mass function pX,Y (x, y) .
• List the values of the joint cumulative distribution function FX,Y (x, y) .

89
EXERCISE :

Three balls are selected at random from a bag containing

2 red , 3 green , 4 blue balls .

Define the random variables

R(s) = the number of red balls drawn,


and
G(s) = the number of green balls drawn .

List the values of


• the joint probability mass function pR,G (r, g) .
• the marginal probability mass functions pR (r) and pG (g) .
• the joint distribution function FR,G (r, g) .
• the marginal distribution functions FR (r) and FG (g) .

90
Independent random variables

Two discrete random variables X(s) and Y (s) are independent if


P (X = x , Y = y) = P (X = x) · P (Y = y) , for all x and y ,

or, equivalently, if their probability mass functions satisfy


pX,Y (x, y) = pX (x) · pY (y) , for all x and y ,

or, equivalently, if the events


Ex ≡ X^{-1}({x}) and Ey ≡ Y^{-1}({y}) ,
are independent in the sample space S , i.e.,
P(Ex Ey) = P(Ex) · P(Ey) , for all x and y .
NOTE :
• In the current discrete case, x and y are typically integers .
• X^{-1}({x}) ≡ { s ∈ S : X(s) = x } .

91
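
NOTE : Independence can be tested mechanically by comparing pX,Y with
pX · pY for every (x, y). A Python sketch (illustrative) for the
three-toss example:

    from itertools import product
    from fractions import Fraction

    S = [''.join(t) for t in product('HT', repeat=3)]
    def X(s): return s.count('H')
    def Y(s): return s.index('H') + 1 if 'H' in s else 0

    n   = len(S)
    pXY = {(x, y): Fraction(sum(1 for s in S if X(s) == x and Y(s) == y), n)
           for x in range(4) for y in range(4)}
    pX  = {x: Fraction(sum(1 for s in S if X(s) == x), n) for x in range(4)}
    pY  = {y: Fraction(sum(1 for s in S if Y(s) == y), n) for y in range(4)}

    print(all(pXY[x, y] == pX[x] * pY[y]
              for x in range(4) for y in range(4)))   # False : not independent
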
[Figure: the outcomes of S grouped into the disjoint events E_{x,y}, e.g., E_{3,1} = {HHH}, E_{2,1} = {HHT, HTH}, E_{2,2} = {THH}, E_{1,1} = {HTT}, E_{1,2} = {THT}, E_{1,3} = {TTH}, E_{0,0} = {TTT}.]

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of pX (2) , pY (1) , pX,Y (2, 1) ?


• Are X and Y independent ?

92
RECALL :

X(s) and Y (s) are independent if for all x and y :

pX,Y (x, y) = pX (x) · pY (y) .

EXERCISE :

Roll a die two times in a row.

Let
X be the result of the 1st roll ,
and
Y the result of the 2nd roll .

Are X and Y independent , i.e., is

pX,Y (k, ℓ) = pX (k) · pY (ℓ), for all 1 ≤ k, ℓ ≤ 6 ?

93
EXERCISE :

Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


          y=0     y=1     y=2     y=3     pX(x)
 x=0      1/8      0       0       0      1/8
 x=1       0      1/8     1/8     1/8     3/8
 x=2       0      2/8     1/8      0      3/8
 x=3       0      1/8      0       0      1/8
 pY(y)    1/8     4/8     2/8     1/8      1

94
EXERCISE : Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


          y=1      y=2      y=3      pX(x)
 x=1      1/3      1/12     1/12     1/2
 x=2      2/9      1/18     1/18     1/3
 x=3      1/9      1/36     1/36     1/6
 pY(y)    2/3      1/6      1/6       1

Joint distribution function FX,Y(x, y) ≡ P(X ≤ x, Y ≤ y)

          y=1      y=2      y=3      FX(x)
 x=1      1/3      5/12     1/2      1/2
 x=2      5/9      25/36    5/6      5/6
 x=3      2/3      5/6       1        1
 FY(y)    2/3      5/6       1        1

QUESTION : Is FX,Y (x, y) = FX (x) · FY (y) ?

95
PROPERTY :
The joint distribution function of independent random variables
X and Y satisfies
FX,Y (x, y) = FX (x) · FY (y) , for all x, y .

PROOF :

FX,Y(xk, yℓ) = P(X ≤ xk , Y ≤ yℓ)
             = Σ_{i≤k} Σ_{j≤ℓ} pX,Y(xi, yj)
             = Σ_{i≤k} Σ_{j≤ℓ} pX(xi) · pY(yj)            (by independence)
             = Σ_{i≤k} { pX(xi) · Σ_{j≤ℓ} pY(yj) }
             = { Σ_{i≤k} pX(xi) } · { Σ_{j≤ℓ} pY(yj) }
             = FX(xk) · FY(yℓ) .

96
Conditional distributions

Let X and Y be discrete random variables with joint probability


mass function
pX,Y (x, y) .

For given x and y , let


Ex = X^{-1}({x}) and Ey = Y^{-1}({y}) ,
be their corresponding events in the sample space S.

Then
P(Ex|Ey) ≡ P(Ex Ey) / P(Ey) = pX,Y(x, y) / pY(y) .

Thus it is natural to define the conditional probability mass function

pX|Y(x|y) ≡ P(X = x | Y = y) = pX,Y(x, y) / pY(y) .

97
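
NOTE : A Python sketch (illustrative) that computes pX|Y from a joint
table, here the three-toss table used below:

    from fractions import Fraction

    pXY = {(0, 0): Fraction(1, 8),
           (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8), (1, 3): Fraction(1, 8),
           (2, 1): Fraction(2, 8), (2, 2): Fraction(1, 8),
           (3, 1): Fraction(1, 8)}

    def pY(y):                                    # marginal of Y
        return sum(p for (x, yy), p in pXY.items() if yy == y)

    def pX_given_Y(x, y):                         # p_{X|Y}(x|y)
        return pXY.get((x, y), Fraction(0)) / pY(y)

    print(pX_given_Y(2, 1))                       # P(X=2 | Y=1) = 1/2
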
[Figure: the outcomes of S grouped into the disjoint events E_{x,y}, e.g., E_{3,1} = {HHH}, E_{2,1} = {HHT, HTH}, E_{2,2} = {THH}, E_{1,1} = {HTT}, E_{1,2} = {THT}, E_{1,3} = {TTH}, E_{0,0} = {TTT}.]

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of P (X = 2 | Y = 1) and P (Y = 1 | X = 2) ?

98
EXAMPLE : (3 tosses : X(s) = # Heads, Y (s) = index 1st H.)
Joint probability mass function pX,Y(x, y)

          y=0     y=1     y=2     y=3     pX(x)
 x=0      1/8      0       0       0      1/8
 x=1       0      1/8     1/8     1/8     3/8
 x=2       0      2/8     1/8      0      3/8
 x=3       0      1/8      0       0      1/8
 pY(y)    1/8     4/8     2/8     1/8      1

Conditional probability mass function pX|Y(x|y) = pX,Y(x, y) / pY(y) .

          y=0     y=1     y=2     y=3
 x=0       1       0       0       0
 x=1       0      2/8     4/8      1
 x=2       0      4/8     4/8      0
 x=3       0      2/8      0       0
           1       1       1       1

EXERCISE : Also construct the Table for pY|X(y|x) = pX,Y(x, y) / pX(x) .

99
EXAMPLE :
Joint probability mass function pX,Y(x, y)

          y=1      y=2      y=3      pX(x)
 x=1      1/3      1/12     1/12     1/2
 x=2      2/9      1/18     1/18     1/3
 x=3      1/9      1/36     1/36     1/6
 pY(y)    2/3      1/6      1/6       1

Conditional probability mass function pX|Y(x|y) = pX,Y(x, y) / pY(y) .

          y=1      y=2      y=3
 x=1      1/2      1/2      1/2
 x=2      1/3      1/3      1/3
 x=3      1/6      1/6      1/6
           1        1        1

QUESTION : What does the last Table tell us?


EXERCISE : Also construct the Table for P (Y = y|X = x) .

100
Expectation
The expected value of a discrete random variable X is
E[X] ≡ Σ_k xk · P(X = xk) = Σ_k xk · pX(xk) .

Thus E[X] represents the weighted average value of X .

( E[X] is also called the mean of X .)

EXAMPLE : The expected value of rolling a die is


E[X] = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = (1/6) · Σ_{k=1}^{6} k = 7/2 .

EXERCISE : Prove the following :


• E[aX] = a E[X] ,
• E[aX + b] = a E[X] + b .

101
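
NOTE : A short Python sketch (illustrative) of the die expectation:

    from fractions import Fraction

    # E[X] = sum of k * P(X = k) for a fair die.
    p  = {k: Fraction(1, 6) for k in range(1, 7)}
    EX = sum(k * pk for k, pk in p.items())
    print(EX)   # 7/2
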
EXAMPLE : Toss a coin until ”Heads” occurs. Then
S = {H , T H , T T H , T T T H , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :


X(H) = 1 , X(T H) = 2 , X(T T H) = 3 .
Then
E[X] = 1 · 1/2 + 2 · 1/4 + 3 · 1/8 + · · · = lim_{n→∞} Σ_{k=1}^{n} k/2^k = 2 .

    n     Σ_{k=1}^{n} k/2^k
    1     0.50000000
    2     1.00000000
    3     1.37500000
   10     1.98828125
   40     1.99999999
REMARK :
Perhaps using Sn = {all sequences of n tosses} is better · · ·

102
The expected value of a function of a random variable is

E[g(X)] ≡ Σ_k g(xk) p(xk) .

EXAMPLE :
The pay-off of rolling a die is $ k^2 , where k is the side facing up.

What should the entry fee be for the betting to break even?

SOLUTION : Here g(X) = X^2 , and

E[g(X)] = Σ_{k=1}^{6} k^2 · (1/6) = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 = 91/6 ≈ $15.17 .

103
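
NOTE : A short Python sketch (illustrative) of E[g(X)] with g(k) = k^2 for
the die pay-off example:

    from fractions import Fraction

    p   = {k: Fraction(1, 6) for k in range(1, 7)}
    E_g = sum(k**2 * pk for k, pk in p.items())
    print(E_g, float(E_g))   # 91/6 , about 15.17 : the break-even entry fee
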
The expected value of a function of two random variables is
E[g(X, Y)] ≡ Σ_k Σ_ℓ g(xk, yℓ) p(xk, yℓ) .

EXAMPLE :
          y=1      y=2      y=3      pX(x)
 x=1      1/3      1/12     1/12     1/2
 x=2      2/9      1/18     1/18     1/3
 x=3      1/9      1/36     1/36     1/6
 pY(y)    2/3      1/6      1/6       1

E[X]  = 1 · 1/2 + 2 · 1/3 + 3 · 1/6 = 5/3 ,
E[Y]  = 1 · 2/3 + 2 · 1/6 + 3 · 1/6 = 3/2 ,
E[XY] = 1 · 1/3  + 2 · 1/12 + 3 · 1/12
      + 2 · 2/9  + 4 · 1/18 + 6 · 1/18
      + 3 · 1/9  + 6 · 1/36 + 9 · 1/36 = 5/2 .       ( So ? )

104
PROPERTY :

• If X and Y are independent then E[XY ] = E[X] E[Y ] .

PROOF :
E[XY] = Σ_k Σ_ℓ xk yℓ pX,Y(xk, yℓ)
      = Σ_k Σ_ℓ xk yℓ pX(xk) pY(yℓ)                (by independence)
      = Σ_k { xk pX(xk) Σ_ℓ yℓ pY(yℓ) }
      = { Σ_k xk pX(xk) } · { Σ_ℓ yℓ pY(yℓ) }
      = E[X] · E[Y] .

EXAMPLE : See the preceding example !

105
PROPERTY : E[X + Y ] = E[X] + E[Y ] . ( Always ! )

PROOF :
E[X + Y] = Σ_k Σ_ℓ (xk + yℓ) pX,Y(xk, yℓ)
         = Σ_k Σ_ℓ xk pX,Y(xk, yℓ) + Σ_k Σ_ℓ yℓ pX,Y(xk, yℓ)
         = Σ_k Σ_ℓ xk pX,Y(xk, yℓ) + Σ_ℓ Σ_k yℓ pX,Y(xk, yℓ)
         = Σ_k { xk Σ_ℓ pX,Y(xk, yℓ) } + Σ_ℓ { yℓ Σ_k pX,Y(xk, yℓ) }
         = Σ_k { xk pX(xk) } + Σ_ℓ { yℓ pY(yℓ) }
         = E[X] + E[Y] .

NOTE : X and Y need not be independent !

106
EXERCISE :
Probability mass function pX,Y(x, y)

          y=6      y=8      y=10     pX(x)
 x=1      1/5       0       1/5      2/5
 x=2       0       1/5       0       1/5
 x=3      1/5       0       1/5      2/5
 pY(y)    2/5      1/5      2/5       1

Show that

• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16

• X and Y are not independent

Thus if
E[XY ] = E[X] E[Y ] ,

then it does not necessarily follow that X and Y are independent !

107
Variance and Standard Deviation
Let X have mean
µ = E[X] .

Then the variance of X is

Var(X) ≡ E[ (X − µ)^2 ] ≡ Σ_k (xk − µ)^2 p(xk) ,

which is the average weighted square distance from the mean.

We have
Var(X) = E[X^2 − 2µX + µ^2]
       = E[X^2] − 2µE[X] + µ^2
       = E[X^2] − 2µ^2 + µ^2
       = E[X^2] − µ^2 .

108
The standard deviation of X is

σ(X) ≡ √Var(X) = √( E[ (X − µ)^2 ] ) = √( E[X^2] − µ^2 ) ,

which is the average weighted distance from the mean.

EXAMPLE : The variance of rolling a die is

Var(X) = Σ_{k=1}^{6} [ k^2 · (1/6) ] − µ^2
       = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 − (7/2)^2 = 35/12 .

The standard deviation is

σ = √(35/12) ≈ 1.70 .

109
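
NOTE : A short Python sketch (illustrative) of Var(X) = E[X^2] − µ^2 and
σ(X) for the die example:

    from fractions import Fraction
    from math import sqrt

    p   = {k: Fraction(1, 6) for k in range(1, 7)}
    mu  = sum(k * pk for k, pk in p.items())        # 7/2
    EX2 = sum(k**2 * pk for k, pk in p.items())     # 91/6
    var = EX2 - mu**2
    print(var, sqrt(var))                           # 35/12 , about 1.70
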
Covariance
Let X and Y be random variables with mean

E[X] = µX , E[Y ] = µY .

Then the covariance of X and Y is defined as

Cov(X, Y) ≡ E[ (X − µX)(Y − µY) ] = Σ_{k,ℓ} (xk − µX)(yℓ − µY) p(xk, yℓ) .

We have
Cov(X, Y ) = E[ (X − µX ) (Y − µY ) ]

= E[XY − µX Y − µY X + µX µY ]

= E[XY ] − µX µY − µY µX + µX µY

= E[XY ] − E[X] E[Y ] .

110
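
NOTE : A short Python sketch (illustrative) that evaluates
Cov(X, Y) = E[XY] − E[X] E[Y] from a joint table, here the table of the
E[XY ] example above:

    from fractions import Fraction

    pXY = {(1, 1): Fraction(1, 3),  (1, 2): Fraction(1, 12), (1, 3): Fraction(1, 12),
           (2, 1): Fraction(2, 9),  (2, 2): Fraction(1, 18), (2, 3): Fraction(1, 18),
           (3, 1): Fraction(1, 9),  (3, 2): Fraction(1, 36), (3, 3): Fraction(1, 36)}

    EX  = sum(x * p for (x, y), p in pXY.items())       # 5/3
    EY  = sum(y * p for (x, y), p in pXY.items())       # 3/2
    EXY = sum(x * y * p for (x, y), p in pXY.items())   # 5/2

    print(EXY - EX * EY)   # 0 : here Cov(X, Y) = 0
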
We defined

Cov(X, Y) ≡ E[ (X − µX)(Y − µY) ]
          = Σ_{k,ℓ} (xk − µX)(yℓ − µY) p(xk, yℓ)
          = E[XY] − E[X] E[Y] .


NOTE :
Cov(X, Y ) measures ”concordance ” or ”coherence ” of X and Y :

• If X > µX when Y > µY and X < µX when Y < µY then

Cov(X, Y ) > 0 .

• If X > µX when Y < µY and X < µX when Y > µY then

Cov(X, Y ) < 0 .

111
EXERCISE : Prove the following :

• Var(aX + b) = a^2 Var(X) ,

• Cov(X, Y) = Cov(Y, X) ,

• Cov(cX, Y) = c Cov(X, Y) ,

• Cov(X, cY) = c Cov(X, Y) ,

• Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z) ,

• Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) .

112
PROPERTY :

If X and Y are independent then Cov(X, Y ) = 0 .

PROOF :

We have already shown ( with µX ≡ E[X] and µY ≡ E[Y ] ) that

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] = E[XY ] − E[X] E[Y ] ,

and that if X and Y are independent then

E[XY] = E[X] E[Y] ,

from which the result follows.

113
EXERCISE : ( already used earlier · · · )
Probability mass function pX,Y(x, y)

          y=6      y=8      y=10     pX(x)
 x=1      1/5       0       1/5      2/5
 x=2       0       1/5       0       1/5
 x=3      1/5       0       1/5      2/5
 pY(y)    2/5      1/5      2/5       1
Show that
• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16
• Cov(X, Y ) = E[XY ] − E[X] E[Y ] = 0
• X and Y are not independent

Thus if
Cov(X, Y ) = 0 ,

then it does not necessarily follow that X and Y are independent !

114
PROPERTY :

If X and Y are independent then

Var(X + Y) = Var(X) + Var(Y) .

PROOF :

We have already shown (in an exercise !) that

Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) ,

and that if X and Y are independent then

Cov(X, Y ) = 0 ,

from which the result follows.

115
EXERCISE :

Compute
E[X] , E[Y] , E[X^2] , E[Y^2] ,
E[XY] , Var(X) , Var(Y) ,
Cov(X, Y)
for

Joint probability mass function pX,Y(x, y)

          y=0     y=1     y=2     y=3     pX(x)
 x=0      1/8      0       0       0      1/8
 x=1       0      1/8     1/8     1/8     3/8
 x=2       0      2/8     1/8      0      3/8
 x=3       0      1/8      0       0      1/8
 pY(y)    1/8     4/8     2/8     1/8      1

116
EXERCISE :

Compute
E[X] , E[Y] , E[X^2] , E[Y^2] ,
E[XY] , Var(X) , Var(Y) ,
Cov(X, Y)
for

Joint probability mass function pX,Y(x, y)

          y=1      y=2      y=3      pX(x)
 x=1      1/3      1/12     1/12     1/2
 x=2      2/9      1/18     1/18     1/3
 x=3      1/9      1/36     1/36     1/6
 pY(y)    2/3      1/6      1/6       1

117
