Solutions - Week 5
1. (1) Pr(X=2)=0.04+0.06+0.09+0.04+0.03=0.26
(2) Pr(Y≥3)=0.01+0.05+0.04+0.03+0.01+0.02+0.03+0.04=0.23
(3) Pr(X≤2 and Y≤2)=0.08+0.06+0.04+0.07+0.11+0.06+0.06+0.12+0.09=0.69
(4) We sum up the probabilities of (0,0), (1,1), (2,2), and (3,3):
Pr(X=Y)=0.08+0.11+0.09+0.03=0.31
(5) We sum up the probabilities of (1,0), (2,0), (3,0), (2,1), (3,1) and (3,2):
Pr(X>Y)=0.06+0.04+0.02+0.06+0.03+0.03=0.24
(2) # similarly,
sum(Q1$probabilities[Q1$y>=3])
0.23
(4) sum(Q1$probabilities[Q1$x==Q1$y])
0.31
(5) sum(Q1$probabilities[Q1$x>Q1$y])
0.24
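Parts (1) and (3) follow the same subsetting pattern. The sketch below uses a small hypothetical data frame named toy (not the Q1 data, which is not reproduced here) only to show the form of the commands; the column names x, y and probabilities are assumed to match those of Q1.

```r
# Toy joint distribution in the same long format as Q1 (hypothetical values).
toy <- expand.grid(x = 0:1, y = 0:1)          # rows: (0,0), (1,0), (0,1), (1,1)
toy$probabilities <- c(0.1, 0.2, 0.3, 0.4)    # probabilities sum to 1

# Event on one variable, as in part (1): Pr(X = 1)
p_x1 <- sum(toy$probabilities[toy$x == 1])

# Event with two conditions joined by &, as in part (3): Pr(X <= 1 and Y <= 0)
p_both <- sum(toy$probabilities[toy$x <= 1 & toy$y <= 0])
```

With Q1 loaded, the same expressions with Q1 in place of toy and the cut-offs from the question should reproduce the hand calculations.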
2.
(1) Pr(X=4,Y=0)=2/200=0.01, where 200 is the total number of observations
(2) Pr(X=4)=0.21
(3) Pr(Y=0|X=4)=Pr(X=4,Y=0)/Pr(X=4)=0.0476
Pr(Y=1|X=4)=Pr(X=4,Y=1)/Pr(X=4)=0.5952
Pr(Y=2|X=4)=Pr(X=4,Y=2)/Pr(X=4)=0.2381
R commands
# Load W5-Question 2.csv
Q2 <- read.csv(file.choose(), header=TRUE)
# to calculate the total number of observations
sum(Q2$frequencies)
200
# Thus,
Q2$probabilities <- Q2$frequencies / 200
# We didn't use sum() here (although you certainly may) because only one
# entry satisfies both conditions in a bivariate distribution.
Q2$probabilities[Q2$x==4 & Q2$y==0]
0.01
sum(Q2$probabilities[Q2$x==4 & Q2$y==0])
0.01
sum(Q2$probabilities[Q2$x==4])
0.21
#conditional probabilities
P04=(Q2$probabilities[Q2$y==0 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
P14=(Q2$probabilities[Q2$y==1 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
P24=(Q2$probabilities[Q2$y==2 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
P34=(Q2$probabilities[Q2$y==3 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
#conditional expectation
EY_X4=sum(c(0,1,2,3)*c(P04,P14,P24,P34))
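As a cross-check on the conditional distribution and the conditional expectation, the sketch below works directly from counts for the X=4 rows. These counts (2, 25, 10, 5) are not taken from the CSV; they are reconstructed from the probabilities reported above (0.01 × 200, and 0.21 × 200 = 42 in total), so treat them as an assumption.

```r
# Counts for the X = 4 rows, indexed by y = 0, 1, 2, 3.
# Assumption: reconstructed from the reported probabilities, not read from the file.
y_values <- c(0, 1, 2, 3)
freq_x4  <- c(2, 25, 10, 5)          # total 42, i.e. Pr(X = 4) = 42 / 200 = 0.21

# Conditional distribution Pr(Y = y | X = 4): each count over the X = 4 total
cond_x4 <- freq_x4 / sum(freq_x4)
round(cond_x4, 4)

# Conditional expectation E(Y | X = 4): values weighted by conditional probabilities
EY_X4_check <- sum(y_values * cond_x4)
round(EY_X4_check, 4)
```

Under these assumed counts, the first three conditional probabilities match the values in part (3), Pr(Y=3|X=4) comes out at about 0.119, and the conditional expectation is 60/42, about 1.43.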
5.
(1) The conditional probability distribution Pr(X|Y=20) gives the probabilities of the
three weather conditions when the temperature is known to be 20 degrees. It is
important to bear in mind that, in considering Pr(X|Y=20), we take the position
that Y is known to equal 20. Thus, Y should be thought of as a constant (since it
cannot vary) rather than as an RV.
On the other hand, Pr(X) gives the unconditional probabilities of the three weather
conditions. I illustrate the difference with a diagram of the sample space. Pr(X)
describes the relative sizes of the three rectangles, whereas Pr(X|Y=20) describes
the relative sizes of the three partitions of the ellipse.
Statistical independence requires that the unconditional distribution be the same as the
conditional distribution, Pr(X)=Pr(X|Y) (or Pr(Y)=Pr(Y|X)), for all values of Y and X.
We have already found that Pr(X) differs from Pr(X|Y), so X and Y are not independent.
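The independence condition can be checked mechanically on any joint probability table. The following is a minimal sketch with a hypothetical 2×2 joint distribution (the numbers are made up for illustration and are not from the tutorial data): it compares each conditional distribution Pr(X|Y=y) with the marginal Pr(X), which is equivalent to testing whether every joint probability equals the product of its marginals.

```r
# Hypothetical joint distribution; rows index x, columns index y.
joint <- matrix(c(0.10, 0.20,
                  0.30, 0.40),
                nrow = 2, byrow = TRUE,
                dimnames = list(x = c("0", "1"), y = c("0", "1")))

px <- rowSums(joint)   # marginal Pr(X)
py <- colSums(joint)   # marginal Pr(Y)

# Pr(X | Y = y): divide each column of the joint table by its column total
x_given_y <- sweep(joint, 2, py, "/")

# Independence holds only if Pr(X = x, Y = y) = Pr(X = x) * Pr(Y = y) in every cell
is_independent <- isTRUE(all.equal(joint, outer(px, py),
                                   check.attributes = FALSE))
is_independent
```

In this toy table the columns of x_given_y differ from px, so the check returns FALSE; a single cell that breaks the product rule is enough to rule out independence.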
(e) If the scientists considered only the (X=1) column, they would know only the
percentages of smokers, Pr(Y=1|X=1), and non-smokers, Pr(Y=0|X=1), among the cancer
patients. They would also need to check whether Pr(Y=1|X=1) = Pr(Y=1|X=0) and
Pr(Y=0|X=1) = Pr(Y=0|X=0) before they could conclude that the two variables are
independent.
For example, had the (X=0) column also contained 688 and 21, they would have concluded
that X and Y are independent (Pr(Y)=Pr(Y|X)).
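The hypothetical case just described can be verified numerically. In this sketch both columns hold the counts 688 and 21; the row labels "first" and "second" are placeholders, since which count belongs to Y=1 and which to Y=0 is not restated here.

```r
# Hypothetical table in which the (X=0) column repeats the (X=1) counts.
counts <- matrix(c(688, 688,
                    21,  21),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(y = c("first", "second"), x = c("0", "1")))

# Pr(Y | X = x): each column divided by its own total
y_given_x <- prop.table(counts, margin = 2)

# Unconditional Pr(Y): row totals divided by the grand total
py <- rowSums(counts) / sum(counts)

# Both conditional columns coincide with the unconditional distribution
same_as_marginal <- isTRUE(all.equal(unname(y_given_x[, "0"]), unname(py))) &&
                    isTRUE(all.equal(unname(y_given_x[, "1"]), unname(py)))
same_as_marginal
```

Because the two columns are identical, Pr(Y|X=0), Pr(Y|X=1) and Pr(Y) all equal 688/709 and 21/709, which is exactly the independence condition.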