Chapter 3 : Conditional Probability and Discrete Random Variables
EXAMPLE :
If a coin is tossed two times, what is the probability of two Heads ?
ANSWER : 1/4 .
EXAMPLE :
If a coin is tossed two times, what is the probability of two Heads,
given that the first toss gave Heads ?
ANSWER : 1/2 .
NOTE :
• Four suits :
Hearts , Diamonds (red) , and Spades , Clubs (black) .
EXERCISE :
The two preceding questions are examples of conditional probability .
The conditional probability of E given F is defined as
P (E|F ) ≡ P (EF ) / P (F ) ,
or, equivalently,
P (EF ) = P (E|F ) P (F ) .
P (E|F ) ≡ P (EF ) / P (F )
( Venn diagrams : the events E and F , and their intersection EF , in the sample space S .)
EXAMPLE : Suppose a coin is tossed two times.
Let E ∼ "two Heads" and F ∼ "the first toss gives Heads" .
Since E ⊂ F , we have EF = E , so that
P (E|F ) = P (EF ) / P (F ) = P (E) / P (F ) = (1/4) / (2/4) = 1/2 .
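A minimal Python sketch that checks this by enumerating the four equally likely outcomes ( the variable names are illustrative ) :

    from itertools import product

    # Enumerate the equally likely outcomes of two coin tosses.
    outcomes = set(product("HT", repeat=2))

    E = {s for s in outcomes if s == ("H", "H")}   # two Heads
    F = {s for s in outcomes if s[0] == "H"}       # first toss gives Heads

    def P(A):
        return len(A) / len(outcomes)              # equally likely outcomes

    print(P(E & F) / P(F))                         # P(E|F) = 0.5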
EXAMPLE :
Suppose we draw a card from a shuffled set of 52 playing cards.
• What is the probability of drawing a Queen, given that the card
drawn is of suit Hearts ?
ANSWER :
P (Q|H) = P (QH) / P (H) = (1/52) / (13/52) = 1/13 .
( Here QH is the event "Queen of Hearts" , which has probability 1/52 .)
The probability of an event E is sometimes computed more easily
by conditioning on another event F , namely, from
P (E) = P (E|F ) P (F ) + P (E|F c ) P (F c ) .
EXAMPLE :
SOLUTION :
Thus
P (C) = P (C|U ) P (U ) + P (C|U c ) P (U c )
= (4/100) · (3/10) + (2/100) · (7/10)
= 26/1000 = 2.6% .
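A minimal Python check of this computation , with C and U as in the formula above :

    # Law of total probability: P(C) = P(C|U) P(U) + P(C|U^c) P(U^c).
    P_C_given_U  = 4 / 100
    P_C_given_Uc = 2 / 100
    P_U  = 3 / 10
    P_Uc = 1 - P_U

    P_C = P_C_given_U * P_U + P_C_given_Uc * P_Uc
    print(P_C)   # ≈ 0.026 , i.e. 2.6%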
EXAMPLE :
Two balls are drawn from a bag with 2 white and 3 black balls.
What is the probability that the second ball is white ?
SOLUTION :
Let S ∼ "second ball white" and F ∼ "first ball white" . Then
P (S) = P (S|F ) P (F ) + P (S|F c ) P (F c ) = (1/4) · (2/5) + (2/4) · (3/5) = 2/5 .
EXAMPLE : ( continued · · · )
Is it surprising that P (S) = P (F ) ?
Consider the 20 equally likely ordered outcomes :
w1 w2 , w1 b1 , w1 b2 , w1 b3 ,
w2 w1 , w2 b1 , w2 b2 , w2 b3 ,
b1 w1 , b1 w2 , b1 b2 , b1 b3 ,
b2 w1 , b2 w2 , b2 b1 , b2 b3 ,
b3 w1 , b3 w2 , b3 b1 , b3 b2 .
In 8 of these 20 outcomes the second ball is white , so P (S) = 8/20 = 2/5 = P (F ) .
EXAMPLE :
Two cards are drawn from a deck of 52 playing cards .
What is the probability that the second card is a Queen ?
ANSWER :
P (2nd card Q) = P (2nd Q | 1st Q) P (1st Q) + P (2nd Q | 1st not Q) P (1st not Q)
= (3/51) · (4/52) + (4/51) · (48/52) = 204 / (51 · 52) = 4/52 = 1/13 .
A useful formula that "inverts conditioning" is derived as follows :
P (EF ) = P (E|F ) P (F ) ,
and
P (EF ) = P (F |E) P (E) .
Equating these , if P (E) ≠ 0 , then
P (F |E) = P (EF ) / P (E) = P (E|F ) · P (F ) / P (E) ,
and, using the earlier useful formula, we get Bayes' formula :
P (F |E) = P (E|F ) · P (F ) / ( P (E|F ) · P (F ) + P (E|F c ) · P (F c ) ) .
EXAMPLE : Suppose 1 in 1000 persons has a certain disease.
A test detects the disease in 99 % of diseased persons.
The test also "detects" the disease in 5 % of healthy persons.
With what probability does a positive test diagnose the disease ?
SOLUTION : Let
D ∼ "diseased" , H ∼ "healthy" , + ∼ "positive" .
We are given that
P (D) = 0.001 , P (+|D) = 0.99 , P (+|H) = 0.05 .
By Bayes' formula
P (D|+) = P (+|D) · P (D) / ( P (+|D) · P (D) + P (+|H) · P (H) )
= 0.99 · 0.001 / ( 0.99 · 0.001 + 0.05 · 0.999 ) ≈ 0.0194 (!)
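A minimal Python check of Bayes' formula for this example :

    # P(D|+) for the disease-test example above.
    P_D = 0.001              # prevalence, P(D)
    P_H = 1 - P_D            # P(H)
    P_pos_given_D = 0.99     # P(+|D)
    P_pos_given_H = 0.05     # P(+|H)

    P_D_given_pos = (P_pos_given_D * P_D) / (P_pos_given_D * P_D + P_pos_given_H * P_H)
    print(round(P_D_given_pos, 4))   # 0.0194

Fewer than 2 % of the positive tests correspond to an actually diseased person , because the disease is so rare .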
EXERCISE :
Suppose 1 in 100 products has a certain defect.
EXERCISE :
Suppose 1 in 2000 persons has a certain disease.
More generally, if the sample space S is the union of disjoint events
S = F1 ∪ F2 ∪ · · · ∪ Fn ,
then for any event E
P (Fi |E) = P (E|Fi ) · P (Fi ) / ( P (E|F1 ) · P (F1 ) + P (E|F2 ) · P (F2 ) + · · · + P (E|Fn ) · P (Fn ) ) .
EXERCISE :
Machines M1 , M2 , M3 produce these proportions of an article :
Production : M1 : 10 % , M2 : 30 % , M3 : 60 % .
Defects : M1 : 4 % , M2 : 3 % , M3 : 2 % .
If an article is found to be defective , what is the probability that it was produced by M1 ? by M2 ? by M3 ?
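A minimal Python sketch for checking the answer , applying the general formula above to the partition {M1 , M2 , M3} with E ∼ "defective" :

    # P(Mi|E) by Bayes' formula over the partition {M1, M2, M3}.
    production  = {"M1": 0.10, "M2": 0.30, "M3": 0.60}   # P(Mi)
    defect_rate = {"M1": 0.04, "M2": 0.03, "M3": 0.02}   # P(E|Mi)

    P_E = sum(defect_rate[m] * production[m] for m in production)   # 0.025

    for m in production:
        print(m, defect_rate[m] * production[m] / P_E)   # ≈ 0.16, 0.36, 0.48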
Independent Events
Two events E and F are called independent if
P (EF ) = P (E) P (F ) .
In this case ( assuming P (F ) ≠ 0 )
P (E|F ) = P (EF ) / P (F ) = P (E) P (F ) / P (F ) = P (E) .
Thus knowing that F has occurred does not change the probability of E .
EXAMPLE : Draw one card from a deck of 52 playing cards.
Counting outcomes we find
P (Face Card) = 12/52 = 3/13 ,
P (Hearts) = 13/52 = 1/4 ,
P (Face Card and Hearts) = 3/52 ,
P (Face Card|Hearts) = 3/13 .
We see that
P (Face Card and Hearts) = P (Face Card) · P (Hearts) ( = 3/52 ) .
Thus the events "Face Card" and "Hearts" are independent.
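A minimal Python sketch that verifies this independence by exact counting ( the deck representation is illustrative ) :

    from itertools import product
    from fractions import Fraction

    # Build the 52-card deck as (rank, suit) pairs.
    suits = ["Hearts", "Diamonds", "Spades", "Clubs"]
    ranks = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
    deck = [(r, s) for r, s in product(ranks, suits)]

    face   = {c for c in deck if c[0] in ("J", "Q", "K")}
    hearts = {c for c in deck if c[1] == "Hearts"}

    def P(A):
        return Fraction(len(A), len(deck))

    print(P(face & hearts) == P(face) * P(hearts))   # True : independent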
EXERCISE :
EXERCISE : Two numbers are drawn at random from the set {1, 2, 3, 4} .
Define
X( {i, j} ) = i + j , Y ( {i, j} ) = |i − j| .
Are the following pairs of events independent ?
(1) X = 5 and Y = 2 ,
(2) X = 5 and Y = 1 .
REMARK :
X and Y are examples of random variables . ( More soon ! )
EXAMPLE : If E and F are independent then so are E and F c .
PROOF :
P (EF c ) = P (E) − P (EF )
= P (E) − P (E) · P (F ) (by independence)
= P (E) · ( 1 − P (F ) )
= P (E) · P (F c ) .
EXERCISE :
Prove that if E and F are independent then so are E c and F c .
NOTE : Independence and disjointness are different things !
( Venn diagrams : disjoint events E and F , and overlapping events E and F , in S .)
If E and F are independent and disjoint then one of them has zero probability !
( Indeed , disjointness gives P (EF ) = 0 , so independence forces P (E) P (F ) = 0 .)
Three events E , F , and G are independent if
P (EF ) = P (E) P (F ) , P (EG) = P (E) P (G) , P (F G) = P (F ) P (G) ,
and
P (EF G) = P (E) P (F ) P (G) .
EXERCISE :
Suppose that
M1 functions properly with probability 9/10 ,
M2 functions properly with probability 9/10 ,
M3 functions properly with probability 8/10 ,
and that the machines function independently of one another .
DISCRETE RANDOM VARIABLES
A random variable X assigns a number X(s) to each outcome s in the sample space S .
Value-ranges of a random variable correspond to events in S ,
and
events in S have a probability .
Thus
Value-ranges of a random variable have a probability .
NOTATION : We will also write pX (x) to denote P (X = x) .
EXAMPLE : For three coin tosses , with
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
and with
X(s) = the number of Heads ,
we have
pX (0) ≡ P ( {TTT} ) = 1/8 ,
pX (1) ≡ P ( {HTT , THT , TTH} ) = 3/8 ,
pX (2) ≡ P ( {HHT , HTH , THH} ) = 3/8 ,
pX (3) ≡ P ( {HHH} ) = 1/8 ,
where
pX (0) + pX (1) + pX (2) + pX (3) = 1 . ( Why ? )
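A minimal Python sketch that builds this pmf by enumerating S :

    from itertools import product
    from fractions import Fraction

    # pmf of X = number of Heads in three tosses.
    S = list(product("HT", repeat=3))        # 8 equally likely outcomes

    pX = {}
    for s in S:
        x = s.count("H")
        pX[x] = pX.get(x, 0) + Fraction(1, len(S))

    for x in sorted(pX):
        print(x, pX[x])       # 0 1/8 , 1 3/8 , 2 3/8 , 3 1/8
    print(sum(pX.values()))   # 1  (a pmf always sums to one)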
( Figure : graphical representation of X , mapping the events
E0 = {TTT} , E1 = {HTT , THT , TTH} , E2 = {HHT , HTH , THH} , E3 = {HHH}
in S to the values 0 , 1 , 2 , 3 .)
( Figure : the graph of pX .)
DEFINITION :
pX (x) ≡ P (X = x) ,
is called the probability mass function .
DEFINITION :
FX (x) ≡ P (X ≤ x) ,
is called the (cumulative) probability distribution function .
PROPERTIES :
• FX (x) is a non-decreasing function of x . ( Why ? )
• FX (−∞) = 0 and FX (∞) = 1 . ( Why ? )
• P (a < X ≤ b) = FX (b) − FX (a) . ( Why ? )
EXAMPLE : With X(s) = the number of Heads , and
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
p(0) = 1/8 , p(1) = 3/8 , p(2) = 3/8 , p(3) = 1/8 ,
we have the probability distribution function
F (−1) ≡ P (X ≤ −1) = 0
F ( 0) ≡ P (X ≤ 0) = 1/8
F ( 1) ≡ P (X ≤ 1) = 4/8
F ( 2) ≡ P (X ≤ 2) = 7/8
F ( 3) ≡ P (X ≤ 3) = 1
F ( 4) ≡ P (X ≤ 4) = 1
We see, for example, that
P (0 < X ≤ 2) = P (X = 1) + P (X = 2) = F (2) − F (0) = 7/8 − 1/8 = 6/8 .
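A minimal Python sketch computing FX from the pmf above :

    from fractions import Fraction

    # F(x) = P(X <= x) for X = number of Heads in three tosses.
    p = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

    def F(x):
        return sum(p[k] for k in p if k <= x)

    for x in (-1, 0, 1, 2, 3, 4):
        print(x, F(x))        # 0 , 1/8 , 1/2 , 7/8 , 1 , 1
    print(F(2) - F(0))        # 3/4 ( = 6/8 ) : P(0 < X <= 2)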
( Figure : the graph of the probability distribution function FX .)
EXAMPLE : Toss a coin until "Heads" occurs.
Then the sample space is countably infinite , namely,
S = {H , TH , TTH , TTTH , · · · } .
If X(s) is the number of tosses until "Heads" occurs , then for a fair coin
pX (k) = (1/2)^k , k = 1 , 2 , 3 , · · · .
Joint distributions
The probability mass function and the probability distribution function
can also be functions of more than one variable.
EXAMPLE : For three coin tosses , with
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
we let
X(s) = # Heads , Y (s) = index of the first H ( 0 for TTT ) .
Then we have the joint probability mass function
pX,Y (x, y) = P (X = x , Y = y) .
For example,
pX,Y (2, 1) = P (X = 2 , Y = 1) = P ( {HHT , HTH} ) = 2/8 = 1/4 .
EXAMPLE : ( continued · · · ) For
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
X(s) = number of Heads , and Y (s) = index of the first H ,
the joint probability mass function pX,Y is tabulated below .
NOTE :
• The marginal probability pX is the probability mass function of X .
• The marginal probability pY is the probability mass function of Y .
EXAMPLE : ( continued · · · )
X(s) = number of Heads , and Y (s) = index of the first H .
For example,
pY (1) = pX,Y (1, 1) + pX,Y (2, 1) + pX,Y (3, 1) = 1/8 + 2/8 + 1/8 = 4/8 .
( Figure : graphical representation of X and Y , showing the events
Exy ≡ {s ∈ S : X(s) = x , Y (s) = y} ;
e.g., E00 = {TTT} , E21 = {HHT , HTH} , E22 = {THH} , E31 = {HHH} .)
DEFINITION :
pX,Y (x, y) ≡ P (X = x , Y = y) ,
is called the joint probability mass function .
DEFINITION :
FX,Y (x, y) ≡ P (X ≤ x , Y ≤ y) ,
is called the joint (cumulative) probability distribution function .
EXAMPLE : Three tosses : X(s) = # Heads , Y (s) = index of 1st H .
Joint probability mass function pX,Y (x, y) :

            y=0    y=1    y=2    y=3  |  pX (x)
  x=0       1/8     0      0      0   |   1/8
  x=1        0     1/8    1/8    1/8  |   3/8
  x=2        0     2/8    1/8     0   |   3/8
  x=3        0     1/8     0      0   |   1/8
  pY (y)    1/8    4/8    2/8    1/8  |    1
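A minimal Python sketch that builds this table and its marginals by enumeration :

    from itertools import product
    from fractions import Fraction

    S = list(product("HT", repeat=3))

    def Y(s):                      # index of first H (1-based); 0 for TTT
        return s.index("H") + 1 if "H" in s else 0

    joint = {}
    for s in S:
        key = (s.count("H"), Y(s))
        joint[key] = joint.get(key, 0) + Fraction(1, len(S))

    pX = {x: sum(v for (a, b), v in joint.items() if a == x) for x in range(4)}
    pY = {y: sum(v for (a, b), v in joint.items() if b == y) for y in range(4)}
    print(joint[(2, 1)], pX[2], pY[1])   # 1/4 3/8 1/2 ( = 2/8 , 3/8 , 4/8 )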
In the preceding example ( see the joint pmf table above ) :
QUESTION : Why is
P (1 < X ≤ 3 , 1 < Y ≤ 3) = F (3, 3) − F (1, 3) − F (3, 1) + F (1, 1) ?
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).
EXERCISE :
Independent random variables
Discrete random variables X and Y are called independent if
pX,Y (x, y) = pX (x) · pY (y) , for all x and y .
( Figure : the events Exy in S , as in the earlier representation of X and Y .)
RECALL :
EXERCISE :
Let
X be the result of the 1st roll ,
and
Y the result of the 2nd roll .
EXERCISE :
EXERCISE : Are these random variables X and Y independent ?
PROPERTY :
The joint distribution function of independent random variables
X and Y satisfies
FX,Y (x, y) = FX (x) · FY (y) , for all x, y .
PROOF :
FX,Y (xk , yℓ ) = P (X ≤ xk , Y ≤ yℓ )
= Σi≤k Σj≤ℓ pX,Y (xi , yj )
= Σi≤k Σj≤ℓ pX (xi ) · pY (yj ) (by independence)
= Σi≤k { pX (xi ) · Σj≤ℓ pY (yj ) }
= { Σi≤k pX (xi ) } · { Σj≤ℓ pY (yj ) }
= FX (xk ) · FY (yℓ ) .
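A minimal Python sketch checking this property for two independent rolls of a four-sided die ( X = 1st roll , Y = 2nd roll ) :

    from itertools import product
    from fractions import Fraction

    S = list(product([1, 2, 3, 4], repeat=2))   # 16 equally likely outcomes

    def P(A):
        return Fraction(len(A), len(S))

    def FXY(x, y): return P({s for s in S if s[0] <= x and s[1] <= y})
    def FX(x):     return P({s for s in S if s[0] <= x})
    def FY(y):     return P({s for s in S if s[1] <= y})

    print(all(FXY(x, y) == FX(x) * FY(y)
              for x in range(1, 5) for y in range(1, 5)))   # True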
Conditional distributions
For discrete random variables X and Y , let
Ex ∼ the event "X = x" , Ey ∼ the event "Y = y" .
Then
P (Ex |Ey ) ≡ P (Ex Ey ) / P (Ey ) = pX,Y (x, y) / pY (y) ,
which defines the conditional probability mass function
pX|Y (x|y) ≡ pX,Y (x, y) / pY (y) .
( Figure : the events Exy in S , as before .)
EXAMPLE : ( 3 tosses : X(s) = # Heads , Y (s) = index of 1st H .)
Joint probability mass function pX,Y (x, y) :

            y=0    y=1    y=2    y=3  |  pX (x)
  x=0       1/8     0      0      0   |   1/8
  x=1        0     1/8    1/8    1/8  |   3/8
  x=2        0     2/8    1/8     0   |   3/8
  x=3        0     1/8     0      0   |   1/8
  pY (y)    1/8    4/8    2/8    1/8  |    1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

            y=0    y=1    y=2    y=3
  x=0        1      0      0      0
  x=1        0     2/8    4/8     1
  x=2        0     4/8    4/8     0
  x=3        0     2/8     0      0
             1      1      1      1

EXERCISE : Also construct the Table for pY |X (y|x) = pX,Y (x, y) / pX (x) .
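A minimal Python sketch that reproduces the conditional table above :

    from fractions import Fraction as F

    # Joint pmf of the three-toss example (zero entries omitted).
    joint = {(0, 0): F(1, 8),
             (1, 1): F(1, 8), (1, 2): F(1, 8), (1, 3): F(1, 8),
             (2, 1): F(2, 8), (2, 2): F(1, 8),
             (3, 1): F(1, 8)}

    pY = {y: sum(v for (x, w), v in joint.items() if w == y) for y in range(4)}

    for (x, y), v in sorted(joint.items()):
        print(f"p(X={x} | Y={y}) = {v / pY[y]}")   # each column sums to 1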
EXAMPLE :
Joint probability mass function pX,Y (x, y) :

            y=1    y=2    y=3   |  pX (x)
  x=1       1/3    1/12   1/12  |   1/2
  x=2       2/9    1/18   1/18  |   1/3
  x=3       1/9    1/36   1/36  |   1/6
  pY (y)    2/3    1/6    1/6   |    1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

            y=1    y=2    y=3
  x=1       1/2    1/2    1/2
  x=2       1/3    1/3    1/3
  x=3       1/6    1/6    1/6
             1      1      1
Expectation
The expected value of a discrete random variable X is
E[X] ≡ Σk xk · P (X = xk ) = Σk xk · pX (xk ) .
EXAMPLE : Toss a coin until "Heads" occurs. Then
S = {H , TH , TTH , TTTH , · · · } ,
and if X is the number of tosses needed , then for a fair coin
E[X] = Σk k · (1/2)^k = 2 .
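A quick Python check , truncating the infinite series :

    # E[X] = sum_{k>=1} k (1/2)^k for the number of tosses until Heads.
    print(sum(k * 0.5**k for k in range(1, 100)))   # ≈ 2.0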
The expected value of a function of a random variable is
E[g(X)] ≡ Σk g(xk ) p(xk ) .
EXAMPLE :
The pay-off of rolling a die is $ k² , where k is the side facing up.
What should the entry fee be for the betting to break even ?
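A minimal Python sketch for the break-even fee , E[X²] ( the name "fee" is illustrative ) :

    from fractions import Fraction

    # E[X^2] for one roll of a fair die: the fair entry fee.
    fee = sum(Fraction(k * k, 6) for k in range(1, 7))
    print(fee, float(fee))   # 91/6 , about $15.17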
The expected value of a function of two random variables is
E[g(X, Y )] ≡ Σk Σℓ g(xk , yℓ ) p(xk , yℓ ) .
EXAMPLE : For the earlier joint probability mass function table :
E[X] = 1 · 1/2 + 2 · 1/3 + 3 · 1/6 = 5/3 ,
E[Y ] = 1 · 2/3 + 2 · 1/6 + 3 · 1/6 = 3/2 ,
E[XY ] = 1 · 1/3 + 2 · 1/12 + 3 · 1/12
+ 2 · 2/9 + 4 · 1/18 + 6 · 1/18
+ 3 · 1/9 + 6 · 1/36 + 9 · 1/36 = 5/2 . ( So ? )
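A minimal Python sketch computing these expectations from the joint table :

    from fractions import Fraction as F

    joint = {(1, 1): F(1, 3), (1, 2): F(1, 12), (1, 3): F(1, 12),
             (2, 1): F(2, 9), (2, 2): F(1, 18), (2, 3): F(1, 18),
             (3, 1): F(1, 9), (3, 2): F(1, 36), (3, 3): F(1, 36)}

    EX  = sum(x * p for (x, y), p in joint.items())
    EY  = sum(y * p for (x, y), p in joint.items())
    EXY = sum(x * y * p for (x, y), p in joint.items())
    print(EX, EY, EXY)        # 5/3 3/2 5/2
    print(EXY == EX * EY)     # True ( the "So ?" above )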
PROPERTY : If X and Y are independent then E[XY ] = E[X] · E[Y ] .
PROOF :
E[XY ] = Σk Σℓ xk yℓ pX,Y (xk , yℓ )
= Σk Σℓ xk yℓ pX (xk ) pY (yℓ ) (by independence)
= Σk { xk pX (xk ) · Σℓ yℓ pY (yℓ ) }
= { Σk xk pX (xk ) } · { Σℓ yℓ pY (yℓ ) }
= E[X] · E[Y ] .
PROPERTY : E[X + Y ] = E[X] + E[Y ] . ( Always ! )
PROOF :
E[X + Y ] = Σk Σℓ (xk + yℓ ) pX,Y (xk , yℓ )
= Σk Σℓ xk pX,Y (xk , yℓ ) + Σk Σℓ yℓ pX,Y (xk , yℓ )
= Σk Σℓ xk pX,Y (xk , yℓ ) + Σℓ Σk yℓ pX,Y (xk , yℓ )
= Σk { xk Σℓ pX,Y (xk , yℓ ) } + Σℓ { yℓ Σk pX,Y (xk , yℓ ) }
= Σk { xk pX (xk ) } + Σℓ { yℓ pY (yℓ ) }
= E[X] + E[Y ] .
EXERCISE :
Probability mass function pX,Y (x, y) :

            y=6    y=8    y=10  |  pX (x)
  x=1       1/5     0     1/5   |   2/5
  x=2        0     1/5     0    |   1/5
  x=3       1/5     0     1/5   |   2/5
  pY (y)    2/5    1/5    2/5   |    1

Show that
• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16 ,
• E[XY ] = E[X] E[Y ] ,
• X and Y are not independent .
Thus if
E[XY ] = E[X] E[Y ] ,
then X and Y are not necessarily independent .
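A minimal Python check of these claims :

    from fractions import Fraction as F

    joint = {(1, 6): F(1, 5), (1, 10): F(1, 5),
             (2, 8): F(1, 5),
             (3, 6): F(1, 5), (3, 10): F(1, 5)}

    EX  = sum(x * p for (x, y), p in joint.items())
    EY  = sum(y * p for (x, y), p in joint.items())
    EXY = sum(x * y * p for (x, y), p in joint.items())
    print(EX, EY, EXY, EXY == EX * EY)      # 2 8 16 True

    # Not independent: p(1,8) = 0, but pX(1) pY(8) = (2/5)(1/5) > 0.
    print(joint.get((1, 8), 0))             # 0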
Variance and Standard Deviation
Let X have mean
µ = E[X] .
Then the variance of X is
Var(X) ≡ E[ (X − µ)² ] .
We have
Var(X) = E[X² − 2µX + µ²]
= E[X²] − 2µ E[X] + µ²
= E[X²] − 2µ² + µ²
= E[X²] − µ² .
The standard deviation of X is
σ(X) ≡ √Var(X) = √( E[ (X − µ)² ] ) = √( E[X²] − µ² ) .
EXAMPLE : For one roll of a fair die , µ = 7/2 , and
Var(X) = E[X²] − µ² = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 − (7/2)² = 91/6 − 49/4 = 35/12 .
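A minimal Python check of this variance :

    from fractions import Fraction

    # Var(X) = E[X^2] - mu^2 for one roll of a fair die.
    mu  = Fraction(sum(range(1, 7)), 6)                  # 7/2
    EX2 = Fraction(sum(k * k for k in range(1, 7)), 6)   # 91/6
    print(EX2 - mu**2)                                   # 35/12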
Covariance
Let X and Y be random variables with mean
E[X] = µX , E[Y ] = µY .
The covariance of X and Y is
Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] .
We have
Cov(X, Y ) = E[XY − µX Y − µY X + µX µY ]
= E[XY ] − µX µY − µY µX + µX µY
= E[XY ] − µX µY .
We defined
Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ]
= Σk,ℓ (xk − µX ) (yℓ − µY ) p(xk , yℓ ) .
If X and Y tend to be above ( or below ) their means simultaneously , then
Cov(X, Y ) > 0 .
If one tends to be above its mean when the other is below its mean , then
Cov(X, Y ) < 0 .
EXERCISE : Prove the following :
• Var(aX + b) = a² Var(X) ,
• Cov(X, Y ) = Cov(Y, X) ,
• Cov(cX, Y ) = c Cov(X, Y ) ,
• Cov(X, cY ) = c Cov(X, Y ) .
PROPERTY :
PROOF :
EXERCISE : ( already used earlier · · · )
Probability mass function pX,Y (x, y) :

            y=6    y=8    y=10  |  pX (x)
  x=1       1/5     0     1/5   |   2/5
  x=2        0     1/5     0    |   1/5
  x=3       1/5     0     1/5   |   2/5
  pY (y)    2/5    1/5    2/5   |    1

Show that
• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16 ,
• Cov(X, Y ) = E[XY ] − E[X] E[Y ] = 0 ,
• X and Y are not independent .
Thus if
Cov(X, Y ) = 0 ,
then X and Y are not necessarily independent .
PROPERTY : If X and Y are independent then Cov(X, Y ) = 0 .
PROOF : If X and Y are independent then E[XY ] = E[X] E[Y ] , so that
Cov(X, Y ) = E[XY ] − E[X] E[Y ] = 0 .
( By the preceding exercise , the converse is not true ! )
EXERCISE :
Compute
E[X] , E[Y ] , E[X²] , E[Y ²] , Cov(X, Y )
for
EXERCISE :
Compute
E[X] , E[Y ] , E[X²] , E[Y ²] , Cov(X, Y )
for