
Tutorial Solutions: 5

Topics: Bivariate Distributions and Correlations

1. (1) Pr(X=2)=0.04+0.06+0.09+0.04+0.03=0.26
(2) Pr(Y≥3)=0.01+0.05+0.04+0.03+0.01+0.02+0.03+0.04
=0.23
(3) Pr(X≤2 and Y≤2)=0.08+0.06+0.04+0.07+0.11+0.06+0.06+0.12+0.09=0.69
(4) We sum up the probabilities of (0,0), (1,1), (2,2), and (3,3):
Pr(X=Y)=0.08+0.11+0.09+0.03=0.31
(5) We sum up the probabilities of (1,0), (2,0), (3,0), (2,1), (3,1) and (3,2):
Pr(X>Y)=0.06+0.04+0.02+0.06+0.03+0.03=0.24

R commands (must learn):


We can use the following R commands to automate the above manual calculations.
(1) Q1<-read.csv(file.choose(),header=T) # load W5-Question 1.csv. Note that the
structure of the table differs from that of the table in the question.
# To get Pr(X=2), we use a conditional sum, where the square brackets [] are used
for "indexing":
sum(Q1$probabilities[Q1$x==2])
0.26

(2) # similarly,
sum(Q1$probabilities[Q1$y>=3])

0.23

(3) sum(Q1$probabilities[Q1$y<=2 & Q1$x<=2])


0.69
# In the above command, we use the “&” to impose joint conditions, like the ∩
we learned in week 1. The indexing method is very flexible and intuitive. See
the following two examples:

(4) sum(Q1$probabilities[Q1$x==Q1$y])
0.31
(5) sum(Q1$probabilities[Q1$x>Q1$y])
0.24
2.
(1) Pr(X=4, Y=0) = 2/200 = 0.01, where 200 is the total number of observations
(2) Pr(X=4) = 0.21
(3) Pr(Y=0|X=4) = Pr(4,0)/Pr(X=4) = 0.0476
Pr(Y=1|X=4) = Pr(4,1)/Pr(X=4) = 0.5952
Pr(Y=2|X=4) = Pr(4,2)/Pr(X=4) = 0.2381



Pr(Y=3|X=4) = Pr(4,3)/Pr(X=4) = 0.1191

(4) E(Y|X=4) = 0×0.0476 + 1×0.5952 + 2×0.2381 + 3×0.1191 = 1.4287

R command
# Load W5-Question 2.csv
Q2<-read.csv(file.choose(),header=T)
# to calculate the total number of observations
sum(Q2$frequencies)
200
# Thus,
Q2$probabilities <- Q2$frequencies / 200
# We didn’t use “sum” here (although you certainly may) because only one entry
satisfies the two conditions in a bivariate distribution.
Q2$probabilities[Q2$x==4 & Q2$y==0]
0.01
sum(Q2$probabilities[Q2$x==4 & Q2$y==0])
0.01

sum(Q2$probabilities[Q2$x==4])
0.21
#conditional probabilities
P04=(Q2$probabilities[Q2$y==0 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
P14=(Q2$probabilities[Q2$y==1 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
P24=(Q2$probabilities[Q2$y==2 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
P34=(Q2$probabilities[Q2$y==3 & Q2$x==4])/sum(Q2$probabilities[Q2$x==4])
#conditional expectation
EY_X4=sum(c(0,1,2,3)*c(P04,P14,P24,P34))

3. Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y) and Var(X-Y)=Var(X)+Var(Y)-2Cov(X,Y).


Since Cov(X,Y)<0, it follows that Var(X+Y)<Var(X-Y).

4. Cov(X,Y) = ρ(X,Y)σxσy = −0.15 × 3 × 2 = −0.9


(1) Var(X+Y) = Var(X) + Var(Y) + 2Cov(X,Y) = 9 + 4 − 1.8 = 11.2
(2) Var(2X−3Y+4) = 4Var(X) + 9Var(Y) − 2×2×3×Cov(X,Y) = 36 + 36 + 10.8 = 82.8
(3) Var(−X−Y) = Var(−(X+Y)) = 11.2.
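The identities in Questions 3 and 4 are easy to verify numerically. Below is a short Python check (the tutorial itself uses R; the helper `var_linear` is my own name, not part of the course material), using the Question 4 values σx = 3, σy = 2, ρ = −0.15:

```python
# Numerical check of the variance identities in Questions 3 and 4.
# Given values from Question 4: sd(X) = 3, sd(Y) = 2, correlation rho(X,Y) = -0.15.
sd_x, sd_y, rho = 3.0, 2.0, -0.15
cov_xy = rho * sd_x * sd_y           # Cov(X, Y) = -0.9
var_x, var_y = sd_x ** 2, sd_y ** 2  # Var(X) = 9, Var(Y) = 4

def var_linear(a, b):
    """Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y); the constant c drops out."""
    return a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy

print(round(var_linear(1, 1), 1))    # Var(X + Y)       -> 11.2
print(round(var_linear(2, -3), 1))   # Var(2X - 3Y + 4) -> 82.8
print(round(var_linear(-1, -1), 1))  # Var(-X - Y)      -> 11.2
# Question 3: with Cov(X, Y) < 0, Var(X + Y) < Var(X - Y).
print(var_linear(1, 1) < var_linear(1, -1))  # -> True
```

Note how (3) falls out for free: the coefficients (−1, −1) give the same squared terms as (1, 1).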

5.
(1) The conditional probability distribution Pr(X|Y=20) gives the probabilities of the
three weather conditions when the temperature is known to be 20 degrees. It is
important to bear in mind that, when considering Pr(X|Y=20), we have taken the
position that Y is known to equal 20. Thus, Y should be thought of as a constant
(since it cannot vary), rather than a random variable.
On the other hand, Pr(X) gives the unconditional probabilities of the three weather
conditions. I illustrate the difference with a diagram of the sample space: Pr(X)
describes the relative sizes of the three rectangles, whereas Pr(X|Y=20) describes
the relative sizes of the three partitions within the ellipse.

(2) E(Y|X=”sunny”) is the average temperature of a sunny day.


(3) Assuming the ice-tea business is better in hot weather (a positive correlation),
we can expect E(Z|Y=35) to be greater than E(Z). At least, we would suspect Z
and Y are correlated and are not independent. Mathematically, a correlation
between them implies E(Z|Y=35) is unequal to E(Z). In other words, if Z and Y
were independent, we would see E(Z|Y=35)= E(Z), i.e., the temperature does not
play a role.
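The intuition in (3) can be illustrated with a small simulation. The numbers below are entirely hypothetical (the question gives no data for the ice-tea sales Z); the only point is that when Z rises with temperature Y, the conditional mean E(Z|Y=35) exceeds the unconditional mean E(Z):

```python
# Hypothetical illustration of E(Z | Y = 35) > E(Z) under positive dependence.
# Y = temperature, Z = ice-tea sales; the sales model below is made up for this sketch.
import random

random.seed(0)
days = []
for _ in range(100_000):
    y = random.choice([20, 25, 30, 35])   # temperature, uniform over four values
    z = 10 + 2 * y + random.gauss(0, 5)   # sales increase with temperature, plus noise
    days.append((y, z))

e_z = sum(z for _, z in days) / len(days)                 # unconditional mean E(Z)
hot = [z for y, z in days if y == 35]
e_z_given_35 = sum(hot) / len(hot)                        # conditional mean E(Z | Y = 35)

print(e_z_given_35 > e_z)  # -> True: conditioning on hot weather raises expected sales
```

If the `2 * y` term were removed (Z independent of Y), the two means would agree up to simulation noise, matching the remark that independence forces E(Z|Y=35) = E(Z).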

6. (a) Pr(X=1) = Pr(X=0) = 709/(709+709) = 0.5. That is, a randomly chosen patient
from the study is equally likely to be a lung cancer patient or not.
(b) Pr(Y=1)= (688+650)/1418 = 0.94, and Pr(Y=0)=(21+59)/1418= 0.06.
(c) Pr(X=1|Y=1) = Pr(X=1 & Y=1)/Pr(Y=1) = (688/1418)/(1338/1418) = 688/1338 ≈ 0.51
Pr(X=1|Y=0) = Pr(X=1 & Y=0)/Pr(Y=0) = (21/1418)/(80/1418) = 21/80 ≈ 0.26
(d) Since Pr(X=1|Y=1) is unequal to Pr(X=1|Y=0), the data suggest X and Y are
unlikely to be independent. Patients with a smoking history have a much higher risk
of getting lung cancer (51% vs 26%).

Statistical independence requires that the unconditional distribution is the same as the
conditional distribution, Pr(X)=Pr(X|Y) (or Pr(Y)=Pr(Y|X)), for all values of Y and X.
Evidently, we already found evidence that Pr(X) is unequal to Pr(X|Y).

(e) If the scientists were to consider only the (X=1) column, they would only know,
among the cancer patients, the percentage of smokers, Pr(Y=1|X=1), and of
non-smokers, Pr(Y=0|X=1). They would also need to check whether Pr(Y=1|X=1)
= Pr(Y=1|X=0) and Pr(Y=0|X=1) = Pr(Y=0|X=0) before they could conclude that the
two variables are independent.

For example, had the (X=0) column also contained 688 and 21, they would have
concluded that X and Y are independent (Pr(Y) = Pr(Y|X)).
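The whole of Question 6 can be reproduced from the four raw counts. A minimal Python sketch (the R indexing idiom from Questions 1–2 would work just as well); the dictionary layout is my own choice, not from the handout:

```python
# Question 6 from the raw counts: X = 1 for lung cancer patients, Y = 1 for a
# smoking history. Keys are (x, y); counts taken from the solution above.
counts = {(1, 1): 688, (0, 1): 650, (1, 0): 21, (0, 0): 59}
n = sum(counts.values())                        # 1418 patients in total

pr_x1 = (counts[(1, 1)] + counts[(1, 0)]) / n   # (a) Pr(X=1) = 709/1418 = 0.5
pr_y1 = (counts[(1, 1)] + counts[(0, 1)]) / n   # (b) Pr(Y=1) = 1338/1418, about 0.94

# (c) Pr(X=1 | Y=y) = Pr(X=1, Y=y) / Pr(Y=y), computed from unrounded counts.
# (Rounding the marginals to 0.94 and 0.06 first gives 0.52 and 0.25 instead.)
pr_x1_given_y1 = counts[(1, 1)] / (counts[(1, 1)] + counts[(0, 1)])  # 688/1338
pr_x1_given_y0 = counts[(1, 0)] / (counts[(1, 0)] + counts[(0, 0)])  # 21/80

# (d) Independence would force both conditionals to equal Pr(X=1); they do not.
print(round(pr_x1_given_y1, 2), round(pr_x1_given_y0, 2))
```

Working from counts avoids the intermediate rounding of the marginal probabilities, which is why the conditionals here differ slightly from a hand calculation that rounds Pr(Y) first.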
