
Probability Theory Lecture Notes

Phanuel Mariano
Contents
Chapter 1. Combinatorics 5
1.1. Counting Principle 5
1.2. Permutations 6
1.3. Combinations 7
1.4. Multinomial Coefficients 9

Chapter 2. Axioms of Probability 10


2.1. Sample Space and Events 10
2.2. Axioms of Probability 12
2.3. Equally Likely Outcomes 15

Chapter 3. Independence 18
3.1. Independent Events 18

Chapter 4. Conditional Probability and Independence 22


4.1. Conditional Probabilities 22
4.2. Bayes's Formula 26

Chapter 5. Random Variables 29


5.1. Random Variables 29
5.2. Discrete Random Variables 31
5.3. Expected Value 32
5.4. The C.D.F. 34
5.5. Expected Value of Sums of Random Variables 37
5.6. Expectation of a Function of a Random Variable 39
5.7. Variance 41

Chapter 6. Some Discrete Distributions 43


6.1. Bernoulli and Binomial Random Variables 43
6.2. The Poisson Distribution 46
6.3. Other Discrete Distributions 49

Chapter 7. Continuous Random Variables 53


7.1. Intro to Continuous R.V. 53
7.2. Expectation and Variance 55
7.3. The uniform Random Variable 58
7.4. More practice 59

Chapter 8. Normal Distributions 60


8.1. The normal distribution 60

Chapter 9. Normal approximations to the binomial 64


9.1. The normal approximates Binomial 64

Chapter 10. Some continuous distributions 66


10.1. Exponential Random Variables 66
10.2. Other Continuous Distributions 69
10.3. The distribution function of a Random variable 70

Chapter 11. Multivariate distributions 74


11.1. Joint distribution functions 74
11.2. Independent Random Variables 78
11.3. Sums of Independent Random Variables 80
11.4. Conditional Distributions – Discrete 82
11.5. Conditional Distributions – Continuous 83
11.6. Joint PDF of functions 84

Chapter 12. Expectations 87


12.1. Expectation of Sums of R.V. 87
12.2. Covariance and Correlations. 90

Chapter 13. Moment generating functions 93


13.1. Moment Generating Functions 93

Chapter 14. Limit Laws 97


14.1. The Central Limit Theorem 97
CHAPTER 1

Combinatorics
1.1. Counting Principle

• We need a way to help us count faster rather than counting by hand one by one.

Fact. (Basic Counting Principle) Suppose 2 experiments are to be performed. If the first experiment can result in m possible outcomes and the second experiment can result in n possible outcomes, then together there are mn possible outcomes.

• I like to use the box method: draw one box per experiment, where each box represents the number of possibilities for that experiment.
• Example 1: There are 20 teachers and 100 students in a school. How many ways can we pick a teacher of the year and a student of the year?
– Solution: Use the box method: 20 × 100 = 2000.
• The counting principle can be generalized to any number of experiments: n_1 · · · n_r possibilities.
• Example 2:
– A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors, and 2 seniors.
– A subcommittee of 4 consisting of 1 person from each class is to be chosen. How many such subcommittees are there?
– Solution: Box method: 3 × 4 × 5 × 2 = 120.
• Example 3: How many different 6-place license plates are possible if the first 3 places are to be occupied by letters and the final 3 by numbers?
– Solution: 26 · 26 · 26 · 10 · 10 · 10 = 17,576,000.
– Question: What if no repetition is allowed?
– Solution: 26 · 25 · 24 · 10 · 9 · 8.
• Example 4: How many functions defined on n points are possible if each functional value is either 0 or 1?
– Solution: The box method on the points 1, . . . , n gives us 2^n possible functions.


1.2. Permutations

• How many different ordered arrangements of the letters a, b, c are possible?


– abc, acb, bac, bca, cab, cba. Each arrangement is a permutation.
– Can also use the box method to figure this out: 3 · 2 · 1 = 6.

Fact. With n objects, there are
n (n − 1) · · · 3 · 2 · 1 = n!
different permutations of the n objects.
Note that ORDER matters when it comes to permutations.

• Example 1: What is the number of possible batting orders with 9 players?
– Answer: 9! (box method or permutations).
• Example 2: How many ways can one arrange 4 math books, 3 chemistry books, 2 physics books, and 1 biology book on a bookshelf so that all the math books are together, all the chemistry books are together, and all the physics books are together?
– Answer: We can arrange the math books in 4! ways, the chemistry books in 3! ways, the physics books in 2! ways, and the biology book in 1! = 1 way.
– But we also have to decide which set of books goes on the left, which next, and so on. That is the same as the number of ways of arranging the letters M, C, P, B, and there are 4! ways of doing that: MCPB, PBMC, etc.
– So there are 4! (4! 3! 2! 1!) ways.
• Example 3 (Repetitions): How many ways can one arrange the letters a, a, b, c?
– Let us label them A, a, b, c. There are 4!, or 24, ways to arrange these letters. But we have repeats: we could have Aa or aA. So we have a repeat for each possibility (so divide!), and the answer should be 4!/2! = 12.
– If there were 3 a's, 4 b's, and 2 c's, we would have 9!/(3! 4! 2!) arrangements.
• Example 4: How many different letter arrangements can be formed from the word PEPPER?
– Answer: There are 3 P's, 2 E's, and one R. So 6!/(3! 2! 1!) = 60.
Fact. There are
n! / (n_1! · · · n_r!)
different permutations of n objects of which n_1 are alike, n_2 are alike, . . . , n_r are alike.

• Example 5: Suppose there are 4 Czech tennis players, 4 U.S. players, and 3 Russian players. If we only keep track of nationality, in how many ways could they be arranged in a line?
– Answer: 11!/(4! 4! 3!).

1.3. Combinations

• We are often interested in selecting r objects from a total of n objects.


• How many ways can we choose 3 letters out of 5? (Does order matter here? NO.) If the letters are a, b, c, d, e, then there would be 5 choices for the first position, 4 for the second, and 3 for the third, for a total of 5 × 4 × 3. But order doesn't matter here, so we are overcounting.
– Suppose the letters selected were a, b, c. If order doesn't matter, we will have counted the letters a, b, c a total of 3! = 6 times, because there are 3! ways of arranging a group of 3. The same is true for any choice of three letters. So we should have
(5 · 4 · 3)/3! = 5!/(3! 2!) = 10.
In other words, we took 5 · 4 · 3, or in general n(n − 1) · · · (n − r + 1), and then divided by the 3! repeats.
– This is often written C(5, 3), read "5 choose 3". More generally:
Fact. If r ≤ n, then
C(n, r) = n! / ((n − r)! r!),
read "n choose r", represents the number of possible combinations of n objects taken r at a time.
Note that order DOES NOT matter here.

• Recall in Permutations order did matter.


• Example 1: How many ways can one choose a committee of 3 out of 10 people?
– Answer: C(10, 3) = 10!/(3! 7!) = (10 · 9 · 8)/(3 · 2) = 10 · 3 · 4 = 120.
• Example 2: Suppose there are 9 men and 8 women. How many ways can we choose a committee that has 2 men and 3 women?
– Answer: We can choose 2 men in C(9, 2) ways and 3 women in C(8, 3) ways. The number of committees is then the product C(9, 2) · C(8, 3) = 36 · 56 = 2016.
• Example 3: A person has 8 friends, of whom 5 will be invited to a party. (We've all been through this.)
– (a) How many choices are there if 2 of the friends are feuding and will not attend together?
∗ Box it: [neither feuding friend] + [one of them] · [the others]
∗ C(6, 5) + C(2, 1) · C(6, 4) (recall that when we have OR, we use +)
– (b) How many choices are there if 2 of the friends will only attend together?
∗ Box it: [neither] + [both]
∗ C(6, 5) + 1 · 1 · C(6, 3)
• The values C(n, r) are called binomial coefficients because of their prominence in the binomial theorem.

Theorem. (The Binomial Theorem)
(x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^{n−k}.

Proof. To see this, the left hand side is (x + y)(x + y) · · · (x + y). This will be the sum of 2^n terms, and each term will have n factors. How many terms have k x's and n − k y's? This is the same as asking: in a sequence of n positions, how many ways can one choose k of them in which to put x's? (Box it.) The answer is C(n, k), so the coefficient of x^k y^{n−k} should be C(n, k). □
• Example: Expand (x + y)^3.
– Solution: (x + y)^3 = y^3 + 3xy^2 + 3x^2y + x^3.
• Problem (using combinatorics): Let's prove
C(10, 4) = C(9, 3) + C(9, 4)
with no algebra:
– The LHS represents the number of committees having 4 people out of the 10.
– Let's say the President of the university may be on one of these committees, and he's special, so we split the count according to whether he is on the committee or not.
– When he is on the committee, there are 1 · C(9, 3) ways to complete it, while C(9, 4) is the number of committees that do not contain the President and so contain 4 out of the remaining 9 people.
• The more general identity is
C(n, r) = C(n − 1, r − 1) + C(n − 1, r).
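• Both facts are easy to check numerically; a short Python sketch (standard library only) verifies the identity above and spot-checks the binomial theorem at one point:

    from math import comb

    # Pascal-type identity: C(n, r) = C(n-1, r-1) + C(n-1, r)
    print(comb(10, 4), comb(9, 3) + comb(9, 4))    # 210 210

    # Binomial theorem spot check at n = 3, x = 2, y = 5
    n, x, y = 3, 2, 5
    lhs = (x + y) ** n
    rhs = sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))
    print(lhs, rhs)                                # 343 343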

1.4. Multinomial Coefficients

• Example: Suppose one has 9 people and one wants to divide them into one committee of 3, one of 4, and a last one of 2. How many different ways are there?
– Solution: (Box it.) There are C(9, 3) ways of choosing the first committee. Once that is done, there are 6 people left and there are C(6, 4) ways of choosing the second committee. Once that is done, the remainder must go in the third committee, so there is 1 way to choose that. So the answer is
(9!/(3! 6!)) · (6!/(4! 2!)) = 9!/(3! 4! 2!).
• In general: to divide n objects into one group of n_1, one group of n_2, . . . , and a k-th group of n_k, where n = n_1 + · · · + n_k, there are
n! / (n_1! n_2! · · · n_k!) ways.
• These are known as multinomial coefficients. We write them as
(n choose n_1, n_2, . . . , n_k) = n! / (n_1! n_2! · · · n_k!).
• Example: Suppose we are to assign police officers their duties. Out of 10 officers: 6 on patrol, 2 in the station, 2 in schools.
– Answer: 10!/(6! 2! 2!).
• Example: There are 10 flags: 5 indistinguishable blue flags, 3 indistinguishable red flags, and 2 indistinguishable yellow flags. How many different ways can we order them on a flag pole?
– Answer: 10!/(5! 3! 2!).
• Example: Suppose one has 8 indistinguishable balls. How many ways can one put them in 3 boxes?
– Solution 1: Let us make sequences of o's and |'s; any sequence that has a | at each end, 2 other |'s, and 8 o's represents a way of arranging the balls into boxes. For example,
| oo | ooo | ooo | .
– How many different ways can we arrange this if we must start with | and end with |? In between, we are only arranging 8 + 2 = 10 symbols, of which 8 are o's.
– So the question is: out of 10 positions, how many ways can one pick 8 of them into which to put an o?
– C(10, 8) = 45.
– Solution 2: Look at the possible slots for the 2 inner bars relative to the 8 o's: there are 9 slots (7 gaps between o's plus the two ends). Either the two bars go in different slots, or both in the same slot, giving C(9, 2) + 9 = 45.
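• A quick computational check of this count, comparing brute-force enumeration of the box contents against both solutions (Python, standard library):

    from math import comb

    # Brute force: nonnegative solutions (x1, x2, x3) of x1 + x2 + x3 = 8
    brute = sum(1 for x1 in range(9) for x2 in range(9 - x1))   # x3 is determined
    print(brute)                        # 45
    print(comb(10, 8), comb(9, 2) + 9)  # 45 45, matching Solutions 1 and 2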
CHAPTER 2

Axioms of Probability
2.1. Sample Space and Events

• We will have a sample space, denoted S (sometimes Ω, or U), that consists of all possible outcomes of an experiment.
– Example 1:
∗ Experiment: Roll two dice.
∗ Sample space: S is the set of all possible pairs made up of the numbers one through six, S = {(i, j) : i, j = 1, . . . , 6}, which has 36 points.
– Example 2:
∗ Experiment: Toss a coin twice.
∗ S = {HH, HT, TH, TT}.
– Example 3:
∗ Experiment: Measure the number of accidents a random person has had before they turn 18.
· S = {0, 1, 2, . . . }
– Others:
∗ Let S be the possible orders in which 5 horses finish in a horse race;
∗ Let S be the possible price of some stock at closing time today, so S = [0, ∞);
∗ The age at which someone dies: S = [0, ∞).
• Events: An event A is a subset of S. In this case we use the notation A ⊂ S to mean A is a subset of S.
– A ∪ B: points in S that are in A OR B OR BOTH.
– A ∩ B: points in A AND B. (You may also see AB.)
– A^c is the complement of A, the points NOT in A. (You may also see A′.)
– This extends to events A_1, . . . , A_n: ∪_{i=1}^n A_i and ∩_{i=1}^n A_i.



• Example 1: Roll two dice. Examples of events:
– E = the two dice come up even and equal = {(2, 2), (4, 4), (6, 6)}.
– F = the sum of the two dice is 8 = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}.
– E ∪ F = {(2, 2), (2, 6), (3, 5), (4, 4), (5, 3), (6, 2), (6, 6)}.
– E ∩ F = {(4, 4)}.
– F^c = all the 31 other outcomes, i.e. those not in {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}.
• Example 2: S = [0, ∞), the age at which someone dies.
– Event A = the person dies before reaching 30.
∗ A = [0, 30).
– Interpret A^c = [30, ∞):
∗ the person dies at or after age 30.
– B = (15, 45). Work out A ∪ B, A ∩ B, and so on.
• Properties: Events also obey commutative, associative, and distributive laws.
• What is A ∪ A^c? It is S.
• DeMorgan's Laws:
– (A ∪ B)^c = A^c ∩ B^c. (Try to draw a picture.)
– (A ∩ B)^c = A^c ∪ B^c.
– This works for general A_1, . . . , A_n: (∪_{i=1}^n A_i)^c = ∩_{i=1}^n A_i^c and (∩_{i=1}^n A_i)^c = ∪_{i=1}^n A_i^c.
• The empty set ∅ = {} is the set that has nothing in it.
• A and B are disjoint if A ∩ B = ∅.
– In probability we say that events A and B are mutually exclusive if they are disjoint;
– mutually exclusive means the same thing as disjoint.

2.2. Axioms of Probability

• Let E be an event. How do we define the probability of an event?
– We can attempt to define a probability by the relative frequency.
– Perform an experiment (e.g. flipping a coin).
– Perform that experiment n times and let n(E) = the number of times the event occurred in the n repetitions.
∗ (E.g. flip a coin n = 1000 times, and let's say that n({Tails}) = 551. Then it's reasonable to think P({Tails}) ≈ 551/1000.)
– So maybe we can define the probability of an event as P(E) = lim_{n→∞} n(E)/n. But we don't know if this limit exists, or if n(E) is even well defined!
– So we need a new approach.
• Probability will be a rule given by the following axioms (laws that we all agree on).
– A probability will be a function P(E) whose input is a set/event, such that:
– Axiom 1: 0 ≤ P(E) ≤ 1 for all events E.
– Axiom 2: P(S) = 1.
– Axiom 3: (disjointness property) If the events E_1, E_2, . . . are pairwise disjoint/mutually exclusive, then
P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i).
∗ Mutually exclusive means that E_i ∩ E_j = ∅ when i ≠ j.
• Remark: Note that you take the probability of a subset of S, not of points of S. However, it is common to write P(x) for P({x}).
– Say the experiment is tossing a coin. Then S = {H, T}. The probability of heads should be written as P({H}), but it is common to see P(H).
• Example 1:
– (a) Suppose we toss a coin and the outcomes are equally likely. Then S = {H, T} and
∗ P({H}) = P({T}) = 1/2. We may write P(H) = P(T) = 1/2.
– (b) If a biased coin is tossed, then one could have a different assignment of probability, say P(H) = 2/3, P(T) = 1/3.
• Example 2:
– Rolling a fair die, the probability space consists of S = {1, 2, 3, 4, 5, 6}, each point having probability 1/6.
– We can compute the probability of rolling an even number by
P({even}) = P({2, 4, 6}) = P(2) + P(4) + P(6) = 1/2,
where we used the rules of probability by breaking the event down into a sum.

Proposition 1.
(a) P(∅) = 0.
(b) If A_1, . . . , A_n are pairwise disjoint, P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).
(c) P(E^c) = 1 − P(E).
(d) If E ⊂ F, then P(E) ≤ P(F).
(e) P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
• It helps to draw diagrams to prove these.

• Try to prove at least some of these yourself.

Proof. (a) Let A_i = ∅ for each i; these are disjoint. So
P(∅) = P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) = Σ_{i=1}^∞ P(∅),
and since this infinite sum must be finite (it is at most 1), we must have P(∅) = 0, using 0 ≤ P(∅) ≤ 1.
(b) Let A_{n+1} = A_{n+2} = · · · = ∅, so that ∪_{i=1}^∞ A_i = ∪_{i=1}^n A_i. Hence
P(∪_{i=1}^n A_i) = P(∪_{i=1}^∞ A_i)
= Σ_{i=1}^n P(A_i) + Σ_{i=n+1}^∞ P(∅)
= Σ_{i=1}^n P(A_i) + 0
= Σ_{i=1}^n P(A_i).

(c) Use S = E ∪ E^c. By Axiom 2 (and Axiom 3) we have
1 = P(S) = P(E) + P(E^c),
hence P(E^c) = 1 − P(E).
(d) If E ⊂ F, write F = E ∪ (F ∩ E^c). Since this union is disjoint,
P(F) = P(E ∪ (F ∩ E^c)) = P(E) + P(F ∩ E^c) ≥ P(E) + 0 = P(E).
(e) Write E ∪ F = E ∪ (E^c ∩ F) (draw a Venn diagram of both); hence by disjointness again
P(E ∪ F) = P(E) + P(E^c ∩ F).
Now write F (with a picture) as F = (E ∩ F) ∪ (E^c ∩ F), and using disjointness
P(F) = P(E ∩ F) + P(E^c ∩ F), so P(E^c ∩ F) = P(F) − P(E ∩ F).
Substitute into the first equation to get
P(E ∪ F) = P(E) + P(E^c ∩ F) = P(E) + P(F) − P(E ∩ F),
as needed. □
• Example: UConn basketball is playing Kentucky twice this year.
– The home game has a .5 chance of a UConn win.
– The away game has a .4 chance of a UConn win.
– There is a .3 chance that UConn wins both games.
– What's the probability that UConn loses both games?
– Answer.
∗ Let P(A_1) = .5, P(A_2) = .4 and P(A_1 ∩ A_2) = .3.
∗ We want to find P(A_1^c ∩ A_2^c). Simplify as much as we can:
P(A_1^c ∩ A_2^c) = P((A_1 ∪ A_2)^c), by DeMorgan's Law
= 1 − P(A_1 ∪ A_2), by Proposition 1(c).
∗ Using Proposition 1(e), we have
P(A_1 ∪ A_2) = .5 + .4 − .3 = .6,
hence P(A_1^c ∩ A_2^c) = 1 − .6 = .4, as needed.

2.3. Equally Likely Outcomes

• In many experiments, the probability space consists of finitely many points, all equally likely.
– Basic example: tossing a fair coin, P(H) = P(T) = 1/2.
– Fair die: P(i) = 1/6 for i = 1, . . . , 6.
• In this case, from Axiom 3 we have that
P(E) = (number of outcomes in E) / (number of outcomes in S).
• Example 1: What is the probability that if we roll 2 dice, the sum is 7?
– Answer: There are 36 total outcomes, of which 6 have a sum of 7:
∗ E = "sum is 7" = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}. Since all outcomes are equally likely, the probability is P(E) = 6/(6 · 6) = 1/6.
• Example 2: If 3 balls are randomly drawn from a bowl containing 6 white and 5 black balls, what is the probability that one ball is white and the other two are black?
– Method 1: (regard as an ordered selection)
P(E) = (WBB + BWB + BBW)/(11 · 10 · 9)
= (6 · 5 · 4 + 5 · 6 · 4 + 5 · 4 · 6)/990 = (120 + 120 + 120)/990 = 4/11.
– Method 2: (regard as an unordered set of drawn balls)
P(E) = (1 white)(2 black)/C(11, 3) = C(6, 1) · C(5, 2) / C(11, 3) = 4/11.
• We can always choose which way to regard our experiments.
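• Both methods can be checked by brute force over all unordered draws (a short Python sketch, standard library only):

    from itertools import combinations
    from math import comb

    balls = ["W"] * 6 + ["B"] * 5
    favorable = sum(1 for draw in combinations(range(11), 3)
                    if sum(balls[i] == "W" for i in draw) == 1)
    print(favorable / comb(11, 3))                   # 0.3636... = 4/11
    print(comb(6, 1) * comb(5, 2) / comb(11, 3))     # same value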
• Example 3: A committee of 5 is to be selected from a group of 6 men and 9 women. What is the probability that it consists of 3 men and 2 women?
– Answer: Easy: (men · women)/all = C(6, 3) · C(9, 2) / C(15, 5) = 720/3003 = 240/1001.
• Example 4: Seven balls are randomly withdrawn from an urn that contains 12 red, 16 blue, and 18 green balls.
– Find the probability that at least 2 red balls are withdrawn.
– Answer: Let E be this event; then P(E) = 1 − P(E^c) = 1 − P(drawing 0 or 1 red balls). There are 16 + 18 = 34 non-red balls, so
P(drawing 0 or 1 red balls) = C(34, 7)/C(46, 7) + C(12, 1) · C(34, 6)/C(46, 7).
• Explanation of poker/playing cards: ranks and suits, etc.
– There are 52 cards in a standard deck of playing cards. A poker hand consists of five cards. There are 4 suits: hearts, spades, diamonds, and clubs (♥♠♦♣). The suits diamonds and hearts are red while clubs and spades are black. In each suit there are 13 ranks: the numbers 2, 3, . . . , 10, the face cards Jack, Queen, King, and the Ace (not a face card).
• Example 5: What is the probability that in a poker hand (5 cards out of 52) we get exactly 4 of a kind?
– Answer: Consider 4 aces and 1 king: AAAAK gives C(4, 4) · C(4, 1) hands. But JJJJ3 has the same probability.
∗ Thus there are 13 ways to pick the rank of the 4 of a kind and 12 ways to pick the rank of the remaining card:
P(4 of a kind) = [choice of ranks] · [given the ranks, how to choose a hand] / C(52, 5)
= 13 · 12 · C(4, 4) · C(4, 1) / C(52, 5) ≈ .00024.
• Example 6: What is the probability that in a poker hand (5 cards out of 52) we get a straight? (No straight flushes; the 5 cards can't all be of the same suit.)
– Answer: Consider the sequence A-2-3-4-5-6-7-8-9-10-J-Q-K-A: there are 10 possible straights.
∗ Given a straight, say A2345, there are 4 · 4 · 4 · 4 · 4 − (all of the same suit) = 4^5 − 4 ways to choose the suits.
P(straight) = [choice of straight] · [given the straight, how to choose a hand] / C(52, 5)
= 10 · (4^5 − 4) / C(52, 5) ≈ .0039.
• Example 7: What is the probability that in a poker hand (5 cards out of 52) we get a full house? (A 3 of a kind and a 2 of a kind.)
– Answer: It would be [3 of a kind][2 of a kind]. Note AAAKK and KKKAA are different! Choose the ranks: 13 · 12.
– Then, once the ranks are chosen, we choose the suits within each group:
P(full house) = [choice of ranks] · [3 of a kind] · [2 of a kind] / C(52, 5)
= 13 · 12 · C(4, 3) · C(4, 2) / C(52, 5) ≈ .0014.
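• The three poker probabilities above are quick to recompute (Python, standard library):

    from math import comb

    hands = comb(52, 5)
    four_kind  = 13 * 12 * comb(4, 4) * comb(4, 1) / hands
    straight   = 10 * (4**5 - 4) / hands
    full_house = 13 * 12 * comb(4, 3) * comb(4, 2) / hands
    print(round(four_kind, 5), round(straight, 4), round(full_house, 4))
    # 0.00024 0.0039 0.0014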
• Example 8 (Birthday Problem): In a class of 32 people, what is the probability that at least two people have the same birthday? (We assume each day is equally likely.)
– Answer: Let the first person have a birthday on some day. The probability that the second person has a different birthday will be 364/365. The probability that the third person has a different birthday from the first two people is 363/365. So the answer is
P(at least 2 people share) = 1 − P(everyone has a different birthday)
= 1 − (365/365) · (364/365) · (363/365) · · · ((365 − 31)/365)
= 1 − 1 · (364/365) · (363/365) · · · (334/365) ≈ 0.7533.
– Really high!
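• The same product is easy to evaluate for any class size; a small Python helper:

    # P(at least two of n people share a birthday), assuming 365 equally likely days
    def birthday_match(n):
        p_all_different = 1.0
        for k in range(n):
            p_all_different *= (365 - k) / 365
        return 1 - p_all_different

    print(birthday_match(32))   # about 0.753
    print(birthday_match(23))   # about 0.507: even 23 people give a better-than-even chance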
CHAPTER 3

Independence
3.1. Independent Events

Definition. We say E and F are independent events if

P (E ∩ F ) = P (E) P (F ) .
• Example 1: Suppose you flip two coins.
– The event that you get heads on the second coin is independent of the event that you get tails on the first.
– Here is why: let A_t be the event of getting tails on the first coin and B_h the event of getting heads on the second coin, and assume the coins are fair (although this is not necessary). Then
P(A_t ∩ B_h) = 1/4 (list out all outcomes), and
P(A_t) P(B_h) = (1/2)(1/2) = 1/4.
• Example 2: Experiment: Draw a card from an ordinary deck of cards.
– Let A = draw an ace, S = draw a spade.
∗ These are independent events. To see this using the definition, we compute
∗ P(A) P(S) = (1/13)(1/4) = 1/52,
∗ while P(A ∩ S) = 1/52, since there is only 1 ace of spades.

Proposition 2. If E and F are independent, then E and F^c are independent.
Proof. Draw a Venn diagram to help with the computation, and note that
P(E ∩ F^c) = P(E) − P(E ∩ F)
= P(E) − P(E) P(F)
= P(E) (1 − P(F))
= P(E) P(F^c). □

• Remark: Independence and mutual exclusivity are two different things!
Definition. We say E, F, G are independent if E, F are independent, E, G are independent, F, G are independent, and P(E ∩ F ∩ G) = P(E) P(F) P(G).
• Example: The experiment is rolling two dice.
– Define the following events:
– S_7 = {sum is 7}
– A_4 = {first die is a 4}
– B_3 = {second die is a 3}
– Are the events S_7, A_4, B_3 independent?
∗ Compute
P(S_7 ∩ A_4 ∩ B_3) = P({(4, 3)}) = 1/36,
but
P(S_7) P(A_4) P(B_3) = (6/36)(1/6)(1/6) = 1/(36 · 6).
∗ These are not equal, so S_7, A_4, B_3 are not independent (even though each pair of them is independent).
• Remark: This generalizes to events A_1, . . . , A_n. We say events A_1, . . . , A_n are independent if for all subcollections i_1, . . . , i_r ∈ {1, . . . , n} we have that P(∩_{j=1}^r A_{i_j}) = Π_{j=1}^r P(A_{i_j}).
• Example:
– An urn contains 10 balls: 4 red and 6 blue.
– A second urn contains 16 red balls and an unknown number of blue balls.
– A single ball is drawn from each urn. The probability that both balls are the same color is 0.44.
– Question: Calculate the number of blue balls in the second urn.
– Solution: Let R_i = event that a red ball is drawn from urn i and let B_i = event that a blue ball is drawn from urn i.
∗ Let x be the number of blue balls in urn 2.
∗ Note that drawing from urn 1 is independent of drawing from urn 2. They are completely different urns! One shouldn't affect the other.
∗ Then
.44 = P((R_1 ∩ R_2) ∪ (B_1 ∩ B_2)) = P(R_1 ∩ R_2) + P(B_1 ∩ B_2)
= P(R_1) P(R_2) + P(B_1) P(B_2), by independence
= (4/10)(16/(x + 16)) + (6/10)(x/(x + 16)).
∗ Solve for x! You will get x = 4.


• Example (Gambler's Ruin) (used in finance and actuarial science):
– Experiment: Suppose you toss a fair coin repeatedly and independently. If it comes up heads, you win a dollar, and if it comes up tails, you lose a dollar. Suppose you start with $50. What's the probability you will get to $200 before you go broke?
– Answer: It's actually easier if we generalize the problem.
∗ Let p(x) be the probability you get to 200 before 0 if you start with x dollars.
∗ We know p(0) = 0 and p(200) = 1. By the law of total probability, conditioning on the first toss,
p(x) = P(win 200 before 0)
= P(H) P(win 200 before 0 | H) + P(H^c) P(win 200 before 0 | H^c)
= (1/2) p(x + 1) + (1/2) p(x − 1).
∗ Rearrange to get
2p(x) = p(x − 1) + p(x + 1) ⟺ p(x) + p(x) = p(x − 1) + p(x + 1)
⟺ p(x) − p(x − 1) = p(x + 1) − p(x)
⟺ (p(x) − p(x − 1))/(x − (x − 1)) = (p(x + 1) − p(x))/((x + 1) − x).
∗ This tells you that the slopes are constant. What does that tell you about p(x)? It's a line!
· Thus we must have p(x) = x/200.
∗ Thus p(50) = 1/4.
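• A Monte Carlo sanity check of p(x) = x/target, scaled down to a $5 start and $20 target so it runs quickly; the same argument gives 5/20 = 1/4, the same answer as $50 out of $200 (Python, standard library):

    import random

    def reaches_target(start, target):
        # fair +/- $1 walk; True if it hits `target` before 0
        x = start
        while 0 < x < target:
            x += 1 if random.random() < 0.5 else -1
        return x == target

    trials = 10_000
    print(sum(reaches_target(5, 20) for _ in range(trials)) / trials)   # close to 0.25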
• Example (a variation of Gambler's Ruin):
– Problem: Suppose we are in the same situation, but you are allowed to go arbitrarily far into debt. Let p(x) be the probability you ever get to $200. What is a formula for p(x)?
∗ Answer: Just as before, p(x) = (1/2) p(x + 1) + (1/2) p(x − 1), so p(x) is linear.
∗ But now all we have is that p(200) = 1, p is linear, and the domain is (−∞, 200].
∗ Draw a graph: the slope p′(x) can't be negative, or else we would have p(x) > 1 for some x ∈ (−∞, 200).
· The slope can't be positive, or else we would get p(x) < 0 for some x ∈ (−∞, 200).
∗ Thus we must have p(x) ≡ constant. Hence p(x) = 1 for all x ∈ (−∞, 200].
∗ Conclusion: we are certain to get to $200 if we can go into debt.
– Method 2:
∗ There is nothing special about the figure 200. Another way of seeing this is to compute, as above, the probability of getting to 200 before −M and then letting M → ∞.
· We would get that p(x) is a line with p(−M) = 0 and p(200) = 1, so that
p(x) − 0 = ((1 − 0)/(200 − (−M))) (x − (−M)),
and letting M → ∞ we see that p(x) = (x + M)/(200 + M) → 1.
• Example: Experiment: Roll 10 dice.
– What is the probability that exactly 4 twos will show if you roll 10 dice?
– Answer: The dice are independent. The probability that the 1st, 2nd, 3rd, and 10th dice show a two and the other 6 do not is (1/6)^4 (5/6)^6.
– Independence is used here: that probability is (1/6)(1/6)(1/6)(5/6)(5/6)(5/6)(5/6)(5/6)(5/6)(1/6). Note that the probability that the 10th, 9th, 8th, and 7th dice show a two and the other 6 do not is the same.
– So to answer our original question, we take (1/6)^4 (5/6)^6 and multiply it by the number of ways of choosing 4 dice out of 10 to be the ones showing the twos. There are C(10, 4) ways to do this, giving
C(10, 4) (1/6)^4 (5/6)^6.
• This is an example of Bernoulli trials, or the Binomial distribution.

– If we have n independent trials, where the probability of success is p, then the probability that there are exactly k successes in the n trials is
C(n, k) p^k (1 − p)^{n−k}.
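– For instance, the dice question above is C(10, 4)(1/6)^4(5/6)^6; a short Python check:

    from math import comb

    def binom_pmf(n, k, p):
        # P(exactly k successes in n independent trials, success probability p)
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binom_pmf(10, 4, 1/6))   # ≈ 0.0543, the "exactly 4 twos in 10 dice" probability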
CHAPTER 4

Conditional Probability and Independence


4.1. Conditional Probabilities

• Suppose there are
– 200 men, of which 100 are smokers, and
– 100 women, of which 20 are smokers.
– Question 1: What is the probability that a person chosen at random is a smoker? Answer: 120/300.
– Question 2: Now, what is the probability that a person chosen at random is a smoker given that the person is a woman? Answer: 20/100.
∗ Note this is
#(women smokers)/#(women) = P(woman and a smoker)/P(woman).
• Thus we make the following definition:
Definition. If P(F) > 0, we define
P(E | F) = P(E ∩ F) / P(F).
P(E | F) is read "the probability of E given F".
• Note that P(E ∩ F) = P(E | F) P(F)!
• This is the conditional probability that E occurs given that F has already occurred.
• Remark: Suppose P(E | F) = P(E), i.e. knowing F doesn't help predict E. Then this implies that E and F are independent of each other: rearranging P(E | F) = P(E ∩ F)/P(F) = P(E), we see that P(E ∩ F) = P(E) P(F).
• Example 1: Experiment: Roll two dice.
– (a) What is the probability the sum is 8?
∗ Solution: Note that A = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}, so we know P(A) = 5/36.
– (b) What is the probability that the sum is 8 given that the first die shows a 3? (In other words, find P(A | B).)
∗ Solution: Let B = {first die shows a three}.
∗ P(A ∩ B) = P({(3, 5)}) = 1/36 is the probability that the first die shows a 3 and the sum is 8.
∗ Finally we can compute
P(A | B) = P(sum is 8 | 1st is a 3) = (1/36)/(1/6) = 1/6.
• Remark: When computing P(E | F), sometimes it is easier to work with the reduced sample space F ⊂ S.
– Note that in the previous example, when we computed
P(sum is 8 | 1st is a 3),
we could have worked in the smaller sample space {1st is a 3} = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}. Since only (3, 5) begins with a 3 and has a sum of 8, the probability is
(number of outcomes in the event)/(number of outcomes in the new sample space) = 1/6.
• Example 2: Experiment: Suppose a box has 3 red marbles and 2 black ones. We select 2 marbles.
– Question: What is the probability that the second marble is red given that the first one is red?
∗ Answer:
· R_1 = {first marble is red},
· R_2 = {second marble is red}, then
P(R_2 | R_1) = P(R_1 ∩ R_2) / P(R_1)
= [(2 red)(0 black)/C(5, 2)] / (3/5)
= [C(3, 2) C(2, 0)/C(5, 2)] / (3/5)
= (3/10)/(3/5) = 1/2.
∗ Solution 2:
· We could have done the same example more easily by looking at the new sample space S′ = {R, R, B, B} after the first red marble is removed; thus P(R_2 | R_1) = P′({drawing red}) = 2/4 = 1/2.
• Example 3: Landon is 80% sure he forgot his textbook at either the Union or the Monteith building: 40% sure that it is at the Union, and 40% sure that it is at Monteith. Given that Landon already went to Monteith and noticed his textbook is not there, what is the probability that it's at the Union?
– Solution:
P(Union | not Monteith) = P(U ∩ M^c) / P(M^c)
= P(U) / (1 − P(M)), since U ⊂ M^c
= (4/10)/(6/10) = 2/3.
• Example 4: Suppose that Annabelle and Bobby each draw 13 cards from a standard deck of 52. Given that Annabelle has exactly two aces, what is the probability that Bobby has exactly one ace?
– Solution: Let A be the event "Annabelle has exactly two aces," and let B be the event "Bobby has exactly one ace." Again, we want P(B | A), so we calculate P(A) and P(A ∩ B). Annabelle could have any of C(52, 13) possible hands. Of these hands, C(4, 2) · C(48, 11) will have exactly two aces, so
P(A) = C(4, 2) · C(48, 11) / C(52, 13).
Now the number of ways in which Annabelle can have a certain hand and Bobby can have a certain hand is C(52, 13) · C(39, 13), and the number of ways in which A and B can both occur is C(4, 2) · C(48, 11) · C(2, 1) · C(37, 12). So
P(A ∩ B) = [C(4, 2) · C(48, 11) · C(2, 1) · C(37, 12)] / [C(52, 13) · C(39, 13)].
Therefore,
P(B | A) = P(A ∩ B)/P(A) = C(2, 1) · C(37, 12) / C(39, 13).

• Note that since P(B | A) = P(A ∩ B)/P(A), we have P(A ∩ B) = P(A) P(B | A).
– In general: if E_1, . . . , E_n are events, then
P(E_1 ∩ E_2 ∩ · · · ∩ E_n) = P(E_1) P(E_2 | E_1) P(E_3 | E_1 ∩ E_2) · · · P(E_n | E_1 ∩ E_2 ∩ · · · ∩ E_{n−1}).

• Example 5:
– Experiment: Suppose an urn has 5 white balls and 7 black balls. Each ball that is selected is returned to the urn along with an additional ball of the same color. Suppose we draw 3 balls.
– Part (a): What is the probability that we get 3 white balls?
∗ We have
P(3 white balls) = P(1st W) P(2nd W | 1st W) P(3rd W | 1st & 2nd W)
= (5/12)(6/13)(7/14).
– Part (b): What is the probability of getting exactly 1 white ball?
P(1 white ball) = P(WBB) + P(BWB) + P(BBW)
= 3 · (5 · 7 · 8)/(12 · 13 · 14).
• Note that
P(E ∩ F) = P(E | F) P(F).
• Example 6: Phan wants to take a biology course or a chemistry course. Given that a student takes biology, the probability of getting an A is 4/5, while the probability of getting an A given that the student took chemistry is 1/7. If Phan chooses the course to take at random, what is the probability of getting an A in chemistry?
– Solution: Let B = {takes biology}, C = {takes chemistry}, and A = {gets an A}. Then
P(A ∩ C) = P(C) P(A | C) = (1/2) · (1/7) = 1/14.
• Example 7: A total of 500 married couples are polled about salaries:
Wife                  Husband makes less than $25,000   Husband makes more than $25,000
Less than $25,000                212                               198
More than $25,000                 36                                54
– Part (a): Find the probability that a husband earns less than $25,000.
∗ Answer: (212 + 36)/500 = .496.
– Part (b): Find P(wife makes > 25,000 | husband makes > 25,000).
∗ Answer: (54/500)/((198 + 54)/500) = 54/252 ≈ .214.
– Part (c): Find P(wife makes > 25,000 | husband makes < 25,000).
∗ Answer: (36/500)/(248/500) = 36/248 ≈ .145.

4.2. Bayes's Formula

• Sometimes it's easier to compute a probability once we know whether something has or has not happened.
• Note that we can compute
P(E) = P(E ∩ F) + P(E ∩ F^c)
= P(E | F) P(F) + P(E | F^c) P(F^c)
= P(E | F) P(F) + P(E | F^c) (1 − P(F)).
• This formula is called the Law of Total Probability:
P(E) = P(E | F) P(F) + P(E | F^c) (1 − P(F)).
• The following problem illustrates the types of problems in this section.
• Example 1: An insurance company believes:
– The probability that an accident-prone person has an accident within a year is .4.
– The probability that a non-accident-prone person has an accident within a year is .2.
– 30% of the population is accident prone.
– Part (a): Find P(A_1), where A_1 = {new policyholder will have an accident within a year}.
∗ Let A = {policyholder IS accident prone}. Then
P(A_1) = P(A_1 | A) P(A) + P(A_1 | A^c) (1 − P(A))
= .4(.3) + .2(1 − .3)
= .26.
– Part (b): Suppose a new policyholder has an accident within one year. What's the probability that he or she is accident prone?
P(A | A_1) = P(A ∩ A_1) / P(A_1)
= P(A) P(A_1 | A) / .26
= (.3)(.4)/.26 = 6/13.
• In general:
– In Part (a) we had to break a probability into cases: if F_1, . . . , F_n are mutually exclusive events that make up everything, S = ∪_{i=1}^n F_i, then
P(E) = Σ_{i=1}^n P(E | F_i) P(F_i).
∗ This is called the Law of Total Probability.
– In Part (b), we wanted to find the reversed conditional probability:
P(F_j | E) = P(E | F_j) P(F_j) / Σ_{i=1}^n P(E | F_i) P(F_i).
∗ This is known as Bayes's Formula.
∗ Note that the denominator of Bayes's formula is the Law of Total Probability.
• Example 2: Suppose the test for HIV is
– 98% accurate in both directions, and
– 0.5% of the population is HIV positive.
– Question: If someone tests positive, what is the probability they actually are HIV positive?
– Solution: Let T_+ = {tests positive}, T_− = {tests negative}, while + = {actually HIV positive} and − = {actually negative}.
∗ We want
P(+ | T_+) = P(+ ∩ T_+) / P(T_+)
= P(T_+ | +) P(+) / [P(T_+ | +) P(+) + P(T_+ | −) P(−)]
= (.98)(.005) / [(.98)(.005) + (.02)(.995)]
≈ 19.8%.
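• The arithmetic of Bayes's formula for this example, spelled out in a few lines of Python:

    # HIV-test example: 98% accuracy in both directions, 0.5% prevalence
    sensitivity = 0.98    # P(test + | actually +)
    false_pos   = 0.02    # P(test + | actually -)
    prevalence  = 0.005   # P(actually +)

    p_test_pos = sensitivity * prevalence + false_pos * (1 - prevalence)  # total probability
    posterior  = sensitivity * prevalence / p_test_pos                    # Bayes's formula
    print(posterior)      # about 0.198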
• Example 3: Suppose
– 30% of the women in a class received an A on the test,
– 25% of the men received an A, and
– 60% of the class are women.
– Question: Given that a person chosen at random received an A, what is the probability this person is a woman?
∗ Solution: Let A be the event that a student receives an A. Let W = being a woman, M = not being a woman. We want
P(W | A) = P(A | W) P(W) / [P(A | W) P(W) + P(A | M) P(M)], by Bayes's formula,
= .3(.6) / (.3(.6) + .25(.4)) = .18/.28 ≈ .64.
• (General Bayes's Theorem) Here's one with more than two possibilities:
• Example 4: Suppose a factory has machines I, II, III producing iPhones.
– Machines I, II, III produce 2%, 1%, and 3% defective iPhones, respectively.
– Out of the total production, machine I makes 35% of all iPhones, II makes 25%, and III makes 40%.
– One iPhone is selected at random from the factory.
– Part (a): What is the probability that the selected iPhone is defective?
P(D) = P(I) P(D | I) + P(II) P(D | II) + P(III) P(D | III)
= (.35)(.02) + (.25)(.01) + (.4)(.03)
= 215/10,000.
– Part (b): What is the conditional probability that, if an iPhone is defective, it was produced by machine III?
P(III | D) = P(III) P(D | III) / P(D)
= (.4)(.03) / (215/10,000) = 120/215.
• Example 5: In a multiple choice test, a student either knows the answer or randomly guesses the answer to a question.
– Let m = the number of choices in a question.
– Let p = the probability that the student knows the answer to a question.
– Question: What is the probability that the student actually knew the answer, given that the student answers correctly?
– Solution: Let K = {knows the answer} and C = {answers correctly}. Then
P(K | C) = P(C | K) P(K) / [P(C | K) P(K) + P(C | K^c) P(K^c)]
= 1 · p / (1 · p + (1/m)(1 − p)) = mp / (1 + (m − 1)p).
CHAPTER 5

Random Variables
5.1. Random Variables

• When we perform an experiment, we are often interested in some function of the outcome, instead of the actual outcome.
– We want to attach a numerical value to each outcome.
• Definition: A random variable is a function X : S → R (or write X : Ω → R). (We use capital letters to denote random variables.)
– We can think of X as a numerical value that is random, as if X is a random number.
• Example: Toss a coin.
– Let X be 1 if heads and X = 0 if tails.
– Then X(H) = 1 and X(T) = 0.
– We can do calculus on real numbers but not on Ω = S = {H, T}.
• Example: Roll a die.
– Let X denote the outcome, so X = 1, 2, 3, 4, 5, 6 (it's random).
– That is, X(1) = 1, X(2) = 2, . . . .
• Example: Roll a die and define
Y = 1 if the outcome is odd, and Y = 0 if the outcome is even.
– This can be thought of as
Y(s) = 1 if s is odd, and Y(s) = 0 if s is even.

• A common question we'll ask is: what values can X attain?
– In other words, what is the range of X, i.e. X : S → ?
• Example: Toss a coin 10 times.
– Let X be the number of heads showing.
– What values can X take? 0, 1, 2, . . . , 10.
• Example: In general, in n trials, let X be the number of successes.
• Example 1: Let X be the amount of liability (damages) a driver incurs in a year.
– X : S → [0, ∞).
• Example 2: Toss a coin 3 times.
– Let X be the number of heads that appear, so X = 0, 1, 2, 3.
– In other words, X : S → {0, 1, 2, 3}.

– We may assign probabilities to the different values of the random variable:
P(X = 0) = P((T, T, T)) = 1/2^3 = 1/8
P(X = 1) = P((T, T, H), (T, H, T), (H, T, T)) = 3/8
P(X = 2) = P((T, H, H), (H, H, T), (H, T, H)) = 3/8
P(X = 3) = P((H, H, H)) = 1/8.
– Note that since X must take one of the values 0 through 3, then
1 = P(∪_{i=0}^3 {X = i}) = Σ_{i=0}^3 P(X = i),
which matches our previous calculation.

5.2. Discrete Random Variables

Definition. A random variable that can take on at most a countable number of possible values is said to be a discrete r.v.
Definition. For a discrete random variable, we define the probability mass function (pmf), or the density function, of X by p(x) = P(X = x). Note that p : R → [0, 1].
• Note that (X = x) = {ω ∈ Ω | X(ω) = x} is an abbreviation.
• Let X assume only the values x_1, x_2, x_3, . . .
– In other words, X : S → {x_1, x_2, . . . }.
– Properties of a pmf p(x):
∗ We must have 0 < p(x_i) ≤ 1 for i = 1, 2, . . . , and p(x) = 0 for all other values of x, which X can't attain.
∗ We also must have
Σ_{i=1}^∞ p(x_i) = 1.
• We often draw bar graphs for discrete r.v.'s.
• Example: If we toss a coin,
– X = 1 if we get H and X = 0 if we get T.
– Then draw a bar graph of
p_X(x) = 1/2 if x = 0, 1/2 if x = 1, and 0 otherwise.

• Oftentimes someone has already found the pmf for you, and you can use it to compute probabilities.
• Example: The pmf of X is given by p(i) = e^{−λ} λ^i / i! for i = 0, 1, 2, . . . , where λ is a parameter (what is this?) that can be any positive number.
– Part (a): What values can the random variable X attain? In other words, what is the range of X?
∗ Sol: X can attain the values 0, 1, 2, 3, . . . .
– Part (b): Find P(X = 0).
∗ Sol: By definition we have P(X = 0) = p(0) = e^{−λ} λ^0/0! = e^{−λ}.
– Part (c): Find P(X > 2).
∗ Sol: Note that
P(X > 2) = 1 − P(X ≤ 2)
= 1 − P(X = 0) − P(X = 1) − P(X = 2)
= 1 − p(0) − p(1) − p(2)
= 1 − e^{−λ} − λe^{−λ} − λ^2 e^{−λ}/2.

5.3. Expected Value

• One of the most important concepts in probability is that of expectation. If X is a random variable, what is the average value of X, that is, what is the expected value of X?
Definition. Let X have pmf p(x). We define the expectation, or expected value, of X to be
E[X] = Σ_{x : p(x) > 0} x p(x).
• Notation: E[X], or EX.
• Example 1: Let X(H) = 0 and X(T) = 1. What is EX?
EX = 0 · p(0) + 1 · p(1) = 0 · (1/2) + 1 · (1/2) = 1/2.
• Example 2: Let X be the outcome when we roll a fair die. What is EX?
EX = 1 · (1/6) + 2 · (1/6) + · · · + 6 · (1/6)
= (1/6)(1 + 2 + 3 + 4 + 5 + 6) = 21/6 = 7/2 = 3.5.
– Note that X can never be 3.5, so the expectation gives you an idea of the average value, not a value X actually takes.
• Recall infinite series: if 0 ≤ x < 1, then the geometric series is
Σ_{n=0}^∞ x^n = 1 + x + x^2 + x^3 + · · · = 1/(1 − x).
– One thing you can do with series is differentiate and integrate them. So if
1 + x + x^2 + x^3 + · · · = 1/(1 − x),
then
0 + 1 + 2x + 3x^2 + · · · = 1/(1 − x)^2.
• Example 3: Let X be the number of tornados in Connecticut per year, meaning that the random variable X can be any number X = 0, 1, 2, 3, . . . . Suppose the state of Connecticut did some analysis and found that
P(X = i) = 1/2^{i+1}.
– Question: What is EX? That is, what is the expected number of tornados per year in Connecticut?
– Solution: Note that the range of X is infinite, but still countable, hence X is still discrete.
– The pmf is p(0) = 1/2, p(1) = 1/4, p(2) = 1/8, . . . , p(n) = 1/2^{n+1}.
– We have that
EX = 0 · p(0) + 1 · p(1) + 2 · p(2) + · · ·
= 0 · (1/2) + 1 · (1/2^2) + 2 · (1/2^3) + 3 · (1/2^4) + · · ·
= (1/2^2)(1 + 2 · (1/2) + 3 · (1/2)^2 + · · · )
= (1/4)(1 + 2x + 3x^2 + · · · ), with x = 1/2,
= (1/4) · 1/(1 − x)^2 = (1/4) · 1/(1 − 1/2)^2 = 1.
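– Since the series converges quickly, the value EX = 1 is easy to confirm numerically by truncating the sum (Python):

    # Check of the tornado example, where P(X = i) = 1/2**(i + 1)
    total_prob = sum(1 / 2**(i + 1) for i in range(200))
    expected   = sum(i / 2**(i + 1) for i in range(200))   # tail beyond i = 200 is negligible
    print(total_prob, expected)                            # both ≈ 1.0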

5.4. The C.D.F.

Definition. Define F : R → [0, 1] by
F(x) = P(X ≤ x), for any −∞ < x < ∞;
F is called the cumulative distribution function, the distribution function of X, the CDF of X, or the c.d.f.
• Note that when X is discrete,
F(x_0) = P(X ≤ x_0) = Σ_{x ≤ x_0} p(x).
• We sometimes use the notation F_X(x) to highlight that F_X is the CDF of the random variable X.
• Example: Suppose X equals the number of heads in 3 coin flips. From Section 5.1, we calculated the p.m.f. to be:
p(0) = P(X = 0) = 1/8
p(1) = P(X = 1) = 3/8
p(2) = P(X = 2) = 3/8
p(3) = P(X = 3) = 1/8.
Question: Find the c.d.f. of X and plot its graph.
– Solution: Summing up the probabilities up to each value of x, we get:
F(x) = 0 for −∞ < x < 0,
F(x) = 1/8 for 0 ≤ x < 1,
F(x) = 4/8 for 1 ≤ x < 2,
F(x) = 7/8 for 2 ≤ x < 3,
F(x) = 1 for 3 ≤ x < ∞.
– The graph is a step function (figure omitted).
– This function has jumps, and is not continuous everywhere.
– But it never decreases.
• Properties of the CDF:
– 1. F is nondecreasing, that is,
∗ if x < y then F(x) ≤ F(y).
– 2. lim_{x→∞} F(x) = 1.
– 3. lim_{x→−∞} F(x) = 0.
– 4. F is right continuous. There are two ways you can think of right continuity:
∗ lim_{y→x+} F(y) = F(x), meaning the limit from the right equals the value of the function there;
∗ if x_n ↓ x is a decreasing sequence, then lim_{n→∞} F(x_n) = F(x).
• We take these properties as facts, though one would normally have to prove these.
• The following proposition does not have to be proved in class, and can be highlighted with the
following example. But we include it here for completeness.

Proposition 3. Let F_X(x) be the CDF of some random variable X. Then the following hold:
(a) For any a ∈ R, we have P(X < a) = lim_{x→a−} F_X(x).
(b) For any a ∈ R, we have P(X = a) = F_X(a) − lim_{x→a−} F_X(x).
Proof. For part (a), we first write
(X < a) = ∪_{n=1}^∞ (X ≤ a − 1/n)
= (X ≤ a − 1) ∪ [∪_{n=1}^∞ (a − 1/n < X ≤ a − 1/(n+1))],
and since the events E_n = (a − 1/n < X ≤ a − 1/(n+1)) are disjoint, we can use Axiom 3 to prove that
P(X < a) = P(X ≤ a − 1) + Σ_{n=1}^∞ P(a − 1/n < X ≤ a − 1/(n+1))
= P(X ≤ a − 1) + lim_{k→∞} Σ_{n=1}^k [P(X ≤ a − 1/(n+1)) − P(X ≤ a − 1/n)]
= P(X ≤ a − 1) + lim_{k→∞} [P(X ≤ a − 1/(k+1)) − P(X ≤ a − 1)], by telescoping,
= lim_{k→∞} P(X ≤ a − 1/(k+1)) + P(X ≤ a − 1) − P(X ≤ a − 1)
= lim_{n→∞} F_X(a − 1/n).
Now you can replace the sequence a_n = a − 1/n with any sequence a_n increasing towards a, and we get the same result,
lim_{n→∞} F_X(a_n) = P(X < a);
since this holds for all sequences a_n increasing towards a, we've shown that
lim_{x→a−} F_X(x) = P(X < a).
For part (b), we use part (a) and get
P(X = a) = P(X ≤ a) − P(X < a)
= F_X(a) − lim_{x→a−} F_X(x). □

• Example: Let X have distribution function
F(x) = 0 for x < 0,
F(x) = x/2 for 0 ≤ x < 1,
F(x) = 2/3 for 1 ≤ x < 2,
F(x) = 11/12 for 2 ≤ x < 3,
F(x) = 1 for 3 ≤ x.
Graph this and answer the following:
– Part (a): Compute P(2 < X ≤ 4). We have that
P(2 < X ≤ 4) = P(X ≤ 4) − P(X ≤ 2)
= F(4) − F(2)
= 1 − 11/12 = 1/12.
– Part (b): Compute P(X < 3).
∗ We have that
P(X < 3) = lim_{n→∞} P(X ≤ 3 − 1/n)
= lim_{x→3−} F_X(x)
= 11/12.
– Part (c): Compute P(X = 1).
∗ We have that
P(X = 1) = P(X ≤ 1) − P(X < 1)
= F_X(1) − lim_{x→1−} F_X(x)
= 2/3 − lim_{x→1−} x/2
= 2/3 − 1/2 = 1/6.

5.5. Expected Value of Sums of Random Variables

• Recall our current definition of EX:
– List out the values X = x_1, x_2, . . . and let p(x_i) be the pmf of X.
– Then EX = Σ_{i=1}^∞ x_i p(x_i).
• We need a new (equivalent) definition that will help with the linearity of expectation.
– Goal: If Z = X + Y, then E[X + Y] = EX + EY.
• Definition 2: Let S (or Ω) be the sample space; then define
EX = Σ_{ω∈S} X(ω) P({ω}).
– Example: Let S = {1, 2, 3, 4, 5, 6} with X(1) = X(2) = 1, X(3) = X(4) = 3, and X(5) = X(6) = 5.
∗ Def. 1: We know X = 1, 3, 5 with p(1) = p(3) = p(5) = 1/3.
∗ Then EX = 1 · (1/3) + 3 · (1/3) + 5 · (1/3) = 9/3 = 3.
∗ Def. 2: We list all of S = {1, 2, 3, 4, 5, 6}.
∗ Then
EX = X(1) P({1}) + · · · + X(6) P({6})
= 1 · (1/6) + 1 · (1/6) + 3 · (1/6) + 3 · (1/6) + 5 · (1/6) + 5 · (1/6) = 3.
• Difference:
– Def. 1: We list all the values that X can attain and only care about those. (Range.)
– Def. 2: We list all possible outcomes. (Domain.)
Proposition 4. If X is a discrete random variable and S is countable, then the two definitions are equivalent.
• NOTE: No need to prove this in lecture, but it is here for completeness.

Proof. We start with the first definition. Let X = x_1, x_2, . . . . Then
EX = Σ_{x_i} x_i p(x_i)
= Σ_{x_i} x_i P(X = x_i)
= Σ_{x_i} x_i Σ_{ω ∈ {ω : X(ω) = x_i}} P(ω)
= Σ_{x_i} Σ_{ω ∈ {ω : X(ω) = x_i}} x_i P(ω)
= Σ_{x_i} Σ_{ω ∈ {ω : X(ω) = x_i}} X(ω) P(ω)
= Σ_{ω∈S} X(ω) P(ω),
where I used that the sets S_i = {ω : X(ω) = x_i} are mutually exclusive events whose union is S. □

• Using this definition, we can prove linearity of expectation.
Theorem 5 (Linearity). If X and Y are discrete random variables and a ∈ R, then
(a) E[X + Y] = EX + EY;
(b) E[aX] = a EX.
Proof. We have that
E[X + Y] = Σ_{ω∈S} (X(ω) + Y(ω)) P(ω)
= Σ_{ω∈S} (X(ω) P(ω) + Y(ω) P(ω))
= Σ_{ω∈S} X(ω) P(ω) + Σ_{ω∈S} Y(ω) P(ω)
= EX + EY.
If a ∈ R, then
E[aX] = Σ_{ω∈S} (a X(ω)) P(ω)
= a Σ_{ω∈S} X(ω) P(ω)
= a EX. □

• Generality: Linearity holds for general random variables X_1, X_2, . . . , X_n.

5.6. Expectation of a Function of a Random Variable

• Let X be a random variable.
– Can we find the expected value of things like X^2, e^X, sin X, etc.?
• Example 1: Let X denote a random variable such that
P(X = −1) = .2, P(X = 0) = .5, P(X = 1) = .3.
Let Y = X^2. Find EY.
– Solution: Note that Y takes the values {0^2, (−1)^2, 1^2} = {0, 1}.
– Note that p_Y(1) = .2 + .3 = .5 and p_Y(0) = .5.
– Thus EY = 0 · .5 + 1 · .5 = .5.
• IMPORTANT:
– Note that EX^2 = .5,
– while (EX)^2 = .01. Not equal!
∗ Indeed, EX = .3 − .2 = .1. Thus
EX^2 ≠ (EX)^2.
• In general, there is a formula for E[g(X)] where g is a function. It uses the fact that g(X) will be g(x) whenever X = x.
Theorem 6. If X is a discrete random variable that takes values X ∈ {x_1, x_2, x_3, . . . } with respective probability mass function p(x_i), then for any real-valued function g : R → R we have that
E[g(X)] = Σ_{i=1}^∞ g(x_i) p(x_i).
• NOTE: No need to prove this in lecture, but it is here for completeness.

Proof. The random variable Y = g(X) can take on values, say Y = y_1, y_2, . . . . We know that each y_j = g(x_i) for some x_i, and there could be more than one value x_i such that y_j = g(x_i). Thus we group the sum in this fashion: using the definition of expectation, we have that
E[Y] = Σ_j y_j P(Y = y_j)
= Σ_j y_j P(g(X) = y_j)
= (?).
Now
P(g(X) = y_j) = P(∪_{i : g(x_i) = y_j} (X = x_i))
= Σ_{i : g(x_i) = y_j} p(x_i).
Thus, plugging this back into (?), we have that
E[Y] = Σ_j y_j Σ_{i : g(x_i) = y_j} p(x_i)
= Σ_j Σ_{i : g(x_i) = y_j} y_j p(x_i)
= Σ_j Σ_{i : g(x_i) = y_j} g(x_i) p(x_i)
= Σ_{i=1}^∞ g(x_i) p(x_i),
as needed. □
• Remark: EX^2 = Σ x_i^2 p(x_i).
• Example 1 (revisited): Let X denote a random variable such that
P(X = −1) = .2, P(X = 0) = .5, P(X = 1) = .3.
Let Y = X^2. Find EY.
– Sol: We have that EX^2 = Σ x_i^2 p(x_i) = (−1)^2(.2) + 0^2(.5) + 1^2(.3) = .5.
Definition. We call µ = EX the mean, or the first moment, of X. The quantity EX^n for n ≥ 1 is called the n-th moment of X.
• From our theorem we know that the n-th moment can be calculated as
EX^n = Σ_{x : p(x) > 0} x^n p(x).

5.7. Variance

• The variance of a r.v. is a measure of how spread out the values of X are.
• The expectation of a r.v. is a quantity that helps us differentiate different r.v.'s, but it doesn't tell us how spread out the values are.
– For example, take
X = 0 with probability 1,
Y = −1 with probability 1/2 and 1 with probability 1/2,
Z = −100 with probability 1/2 and 100 with probability 1/2.
– What are the expected values? 0, 0, 0.
– But there is much greater spread in Z than in Y, and in Y than in X. Thus expectation is not enough to detect spread, or variation.

Definition. If X is a r.v. with mean µ = EX, then the variance of X, denoted Var(X), is defined by
Var(X) = E[(X − µ)^2].
• Remark: Ec = c for a constant c.
• We prove an alternate formula for the variance. (The technique of using linearity is important here! Hint, hint.)
Var(X) = E[(X − µ)^2]
= E[X^2 − 2µX + µ^2]
= E[X^2] − 2µ E[X] + E[µ^2]
= E[X^2] − 2µ^2 + µ^2
= E[X^2] − µ^2.
Theorem. We have that
Var(X) = E[X^2] − (E[X])^2.

• Example 1: Calculate Var(X) if X represents the outcome when a fair die is rolled.
– Solution: Previously we calculated that EX = 7/2.
– Thus we only need to calculate the second moment:
EX^2 = 1^2 · (1/6) + · · · + 6^2 · (1/6) = 91/6.
– Using our formula we have that
Var(X) = E[X^2] − (E[X])^2
= 91/6 − (7/2)^2
= 35/12.
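– A quick Python check of EX, EX^2, and Var(X) for the fair die:

    faces = range(1, 7)
    ex  = sum(x / 6 for x in faces)        # 3.5
    ex2 = sum(x**2 / 6 for x in faces)     # 91/6
    print(ex, ex2, ex2 - ex**2)            # 3.5 15.1666... 2.9166... (= 35/12)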
• Here is a useful formula:
Proposition 7. For constants a, b we have that Var(aX + b) = a^2 Var(X).
Proof. We compute
Var(aX + b) = E[(aX + b − E[aX + b])^2]
= E[(aX + b − aµ − b)^2]
= E[a^2 (X − µ)^2]
= a^2 E[(X − µ)^2]
= a^2 Var(X). □
Definition. We define
SD(X) = √Var(X)
to be the standard deviation of X.


CHAPTER 6

Some Discrete Distributions


6.1. Bernoulli and Binomial Random Variables

• Bernoulli Distribution:
– Suppose that a trial or experiment takes place whose outcome is either a success or a failure.
– Let X = 1 when the outcome is a success and X = 0 if it is a failure.
– The pmf of X is given by
p(0) = P(X = 0) = 1 − p
p(1) = P(X = 1) = p,
where 0 ≤ p ≤ 1.
– Such an X is said to be a Bernoulli random variable with parameter p.
∗ We write this as X ∼ Bernoulli(p).
∗ Properties:
· EX = p · 1 + (1 − p) · 0 = p.
· EX^2 = 1^2 · p + 0^2 · (1 − p) = p.
· So Var X = p − p^2 = p(1 − p).
• Binomial Distribution:
• We say X has a binomial distribution with parameters n and p if
p_X(k) = P(X = k) = C(n, k) p^k (1 − p)^{n−k}.
– Interpretation: X = the number of successes in n independent trials.
∗ Let's take this as given.
– We say X ∼ Binomial(n, p) or X ∼ bin(n, p).
• Properties of the Binomial:
– Check that the probabilities sum to 1. (Not really a property, but a check that X is indeed a random variable.)
∗ We need to check two things:
(1) that p_X(k) ≥ 0, which is obvious from the formula;
(2) that Σ_{k=0}^n p_X(k) = 1.
∗ First recall the Binomial Theorem: Σ_{k=0}^n C(n, k) x^k y^{n−k} = (x + y)^n.
– Then
Σ_{k=0}^n p_X(k) = Σ_{k=0}^n C(n, k) p^k (1 − p)^{n−k} = (p + (1 − p))^n = 1^n = 1.
– Mean: The easiest way to compute EX is by recognizing that X = Y_1 + · · · + Y_n, where the Y_i are independent Bernoulli's.
∗ Thus EX = EY_1 + · · · + EY_n = p + · · · + p = np.
∗ We can do this directly too, but this would involve proving that
EX = Σ_{k=0}^n k p(k) = np,
meaning we would have to prove
Σ_{k=0}^n k C(n, k) p^k (1 − p)^{n−k} = np.
– Variance: We first compute the second moment. As before, write X = Y_1 + · · · + Y_n, where the Y_i are Bernoulli's.
EX^2 = E[(Y_1 + · · · + Y_n)^2]
= Σ_{k=1}^n E[Y_k^2] + Σ_{i≠j} E[Y_i Y_j]
= Σ_{k=1}^n p + Σ_{i≠j} E[Y_i Y_j]
= np + Σ_{i≠j} E[Y_i Y_j]
= (?)
∗ Now each term E[Y_i Y_j] for fixed i ≠ j can be computed as
E[Y_i Y_j] = 1 · P(Y_i Y_j = 1) + 0 · P(Y_i Y_j = 0)
= P((Y_i = 1) ∩ (Y_j = 1))
= P(Y_i = 1) P(Y_j = 1), by independence,
= p^2.
∗ Now there are a total of n^2 terms in (Y_1 + · · · + Y_n)^2, n of which are of the form Y_k^2. Thus there are n^2 − n terms of the form Y_i Y_j with i ≠ j.
∗ Hence, using (?), we have EX^2 = np + (n^2 − n) p^2.
∗ Thus
Var X = EX^2 − (EX)^2 = np + (n^2 − n) p^2 − (np)^2 = np(1 − p).
– Summary: EX = np and Var X = np(1 − p).
– Moments: One can also prove EX^k = np E[(Y + 1)^{k−1}], where Y ∼ bin(n − 1, p).

• Calculator (TI-84):
– 2nd → DISTR → binompdf(n, p, x) = P(X = x).
– Similarly, binomcdf gives the c.d.f.
• Example 1: A company prices its hurricane insurance using the following assumptions:
– (i) In any calendar year, there can be at most one hurricane.
– (ii) In any calendar year, the probability of a hurricane is 0.05.
– (iii) The numbers of hurricanes in different calendar years are mutually independent.
Using the company's assumptions, calculate the probability that there are fewer than 3 hurricanes in a 20-year period.
– Solution: We have X ∼ bin(20, .05), so
P(X < 3) = P(X ≤ 2)
= C(20, 0)(.05)^0(.95)^20 + C(20, 1)(.05)^1(.95)^19 + C(20, 2)(.05)^2(.95)^18
= .9245.
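– The same sum is easy to compute without a calculator; the short Python sketch below also checks the free-throw example that follows:

    from math import comb

    def binom_cdf(x, n, p):
        # P(X <= x) for X ~ bin(n, p)
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

    print(binom_cdf(2, 20, 0.05))      # ≈ 0.9245  (hurricane example)
    print(1 - binom_cdf(1, 10, 0.6))   # ≈ 0.998   (free-throw example below)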
• Example 2: Phan has a .6 probability of making a free throw. Suppose each free throw is independent of the others. If he attempts 10 free throws, what is the probability that he makes at least 2 of them?
– Solution: Let X ∼ bin(10, .6); then
P(X ≥ 2) = 1 − P(X = 0) − P(X = 1)
= 1 − C(10, 0)(.6)^0(.4)^10 − C(10, 1)(.6)^1(.4)^9
= .998.

6.2. The Poisson Distribution

• We say that X = 0, 1, 2, . . . is Poisson with parameter λ > 0 if
p_X(i) = P(X = i) = e^{−λ} λ^i / i!  for i = 0, 1, 2, 3, . . . .
– We write X ∼ Poisson(λ).
• In general, Poisson random variables arise in the following form:
– Suppose a success happens λ times on average in a given period (per year, per month, etc.). Then X = the number of times success happens in that given period.
– Poisson is like the binomial, except that the range of X is countably infinite!
• Examples that (approximately) obey a Poisson R.V.:
– 1. The number of misprints on a page of a book.
– 2. The number of people in a community who survive to age 100.
– 3. The number of telephone numbers that are dialed in a day.
– 4. The number of customers entering a post office in a day.
• Calc 2: Recall that Σ_{n=0}^∞ x^n/n! = e^x.
• Properties of the Poisson: Let X ∼ Poisson(λ).
– First we check that p_X(i) is indeed a pmf. It is obvious that p_X(i) ≥ 0 since λ > 0. We need to check that all the probabilities add up to one:
Σ_{i=0}^∞ p_X(i) = Σ_{i=0}^∞ e^{−λ} λ^i/i! = e^{−λ} Σ_{i=0}^∞ λ^i/i! = e^{−λ} e^λ = 1.
– Mean: We have
EX = Σ_{i=0}^∞ i e^{−λ} λ^i/i! = e^{−λ} λ Σ_{i=1}^∞ λ^{i−1}/(i − 1)!
= e^{−λ} λ e^λ = λ.
– Variance: We first have
EX^2 = Σ_{i=0}^∞ i^2 e^{−λ} λ^i/i!
= λ Σ_{i=1}^∞ i e^{−λ} λ^{i−1}/(i − 1)!
= λ Σ_{j=0}^∞ (j + 1) e^{−λ} λ^j/j!, letting j = i − 1,
= λ [Σ_{j=0}^∞ j e^{−λ} λ^j/j! + Σ_{j=0}^∞ e^{−λ} λ^j/j!]
= λ [λ + e^{−λ} e^λ]
= λ(λ + 1).
Thus
Var X = λ(λ + 1) − λ^2 = λ.

• Example1: Suppose on average there are 5 homicides per month in Hartford, CT. What is the
probability there will be at most 1 in a certain month?
 Answer: If X is the number of homicides, we are given that EX = 5. Since the expectation
for a Poisson is λ = 5. Therefore P (X = 0) + P (X = 1) = e−5 + 5e−5 .
• Example2: Suppose on average there is one large earthquake per year in Mexico. What's the
probability that next year there will be exactly 2 large earthquakes?
−1
 Answer: λ = EX = 1, so P (X = 2) = e 2 .
• Example3: Phan receives texts on the average of two every 3 minutes. Assume Poisson.
 Question: What is the probability of ve or more texts arriving in a 9−minute period.
 Answer: Let X be the number of texts in a 9-minute period. Each 3-minute period has mean λ1 = 2, and there are n = 3 such periods, so λ = 3 · 2 = 6. Thus

P (X ≥ 5) = 1 − P (X ≤ 4) = 1 − Σ_{n=0}^4 e^{−6} 6^n / n! = 1 − .285 = .715.

• Important: Poisson is similar to Binomial in the following way


 FACT: Poisson approximates Bin(n, p) when n is large and p is small enough so that np is
of moderate size.

Theorem 8. If Xn is binomial with parameters n and pn and npn → λ, then

P (Xn = i) → P (Y = i)

where Y ∼ P oisson(λ).

Proof. See class textbook. 

• Summary of Theorem: This theorem says that if n is large and p is small, then:
 If X ∼ Bin(n, p), we approximate X with a Poisson by letting λ = np, so that

P (X = i) ≈ e^{−np} (np)^i / i! .
• When can we assume X is Poisson: Another consequence of this theorem is that if Y = the number of successes in a given period, the number of possible trials n is large, and the probability p of success is small, then Y can be treated as a Poisson random variable.
• NOTE:
 (1) Why the number of misprints on a page will be approximately Poisson with λ = np:
∗ Let X = number of misprints on a page of a book.
∗ The probability of an error, say p = .01, is usually small, and the number of letters on a page is usually large, say n = 1000. Then the average is λ = np.
∗ Then because p is small and n is large, X can be approximated by a Poisson.
 (2) Let X be the number of accidents in a year.
∗ X is Poisson because the probability p of an accident in a given period is usually small, while the number n of times someone drives in a given period is high.
• Example: Here is an example showing this.

 If X is the number of times you get heads on a biased coin where P (H) = 1/100, and you toss it 1000 times, then np = 10 and

P (X = 5) ≈ e^{−10} 10^5 / 5! = .0378,

while the actual value is

P (X = 5) = C(1000, 5) (.01)^5 (.99)^995 = (1000!/(995! 5!)) (.01)^5 (.99)^995 = .0375.
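• Optional check (not part of the original notes): a short Python sketch, assuming scipy is available, comparing the exact binomial pmf with its Poisson approximation for this biased-coin example (n = 1000, p = .01, λ = np = 10).

from scipy.stats import binom, poisson

n, p = 1000, 0.01
lam = n * p
for i in [0, 5, 10, 20]:
    print(i, round(binom.pmf(i, n, p), 4), round(poisson.pmf(i, lam), 4))
# At i = 5 this reproduces .0375 (exact) versus .0378 (Poisson), as above.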

6.3. Other Discrete Distributions

• Uniform Distribution:
 We say X is uniform, and write this as X ∼ uniform(n), if X ∈ {1, 2, . . . , n} and

pX (i) = P (X = i) = 1/n   for i = 1, 2, . . . , n.

 Exercise: EX = Σ_{i=1}^n i · (1/n) = (1/n) Σ_{i=1}^n i = (1/n) · n(n + 1)/2 = (n + 1)/2; also find VarX .

• Geometric Distribution:
 Experiment: Suppose that independent trials are held until success occurs. Trials are stopped
once success happens. Let p be the probability of having a success in each trial.
 Let X = number of trials required until the first success occurs. Thus X ∈ {1, 2, 3, 4, . . . }. Here we have

pX (i) = P (X = i) = (1 − p)^{i−1} p   for i = 1, 2, 3, 4, . . . .

 We say X ∼ geometric(p).
 Properties:
∗ We first double check that this is indeed a pmf. This follows from what we know about geometric series:

Σ_{i=1}^∞ P (X = i) = Σ_{i=1}^∞ (1 − p)^{i−1} p = p / (1 − (1 − p)) = 1.

∗ Mean: Recall that by differentiating the geometric series we came up with the formula Σ_{n=1}^∞ n x^{n−1} = 1/(1 − x)², so that

EX = Σ_{i=1}^∞ i P (X = i) = Σ_{i=1}^∞ i (1 − p)^{i−1} p = p / (1 − (1 − p))² = 1/p.

∗ Variance: (Left as an exercise for the student.) Note that

EX² = Σ_{i=1}^∞ i² (1 − p)^{i−1} p.   (?)

Thus we can differentiate Σ_{n=1}^∞ n x^{n−1} = 1/(1 − x)² again to get Σ_{n=2}^∞ n (n − 1) x^{n−2} = 2/(1 − x)³.

∗ From this we will attempt to get EX² in (?) by splitting the sum up:

Σ_{n=2}^∞ n (n − 1) (1 − p)^{n−2} = 2 / (1 − (1 − p))³ = 2/p³,
Σ_{n=2}^∞ n (n − 1) (1 − p)^{n−2} p = 2/p² ,   now split,
Σ_{n=2}^∞ n² (1 − p)^{n−2} p = 2/p² + Σ_{n=2}^∞ n (1 − p)^{n−2} p
(1 − p)^{−1} Σ_{n=1}^∞ n² (1 − p)^{n−1} p = 2/p² + Σ_{n=2}^∞ n (1 − p)^{n−2} p + (1 − p)^{−1} p
(1 − p)^{−1} EX² = 2/p² + (1 − p)^{−1} Σ_{n=1}^∞ n (1 − p)^{n−1} p
                 = 2/p² + (1 − p)^{−1} (1/p)

Thus

EX² = 2(1 − p)/p² + 1/p = (2 − 2p + p)/p² = (2 − p)/p² .

∗ Thus

VarX = EX² − (EX)² = (2 − p)/p² − 1/p² = (1 − p)/p² .
• Example1: An urn contains 10 white balls and 15 black balls. Balls are randomly selected, one
at a time, until a black one is obtained. If we assume that each ball selected is replaced before the
next one is drawn, what is the probability that:
 Part (a): Exactly 6 draws are needed?
∗ X = number of draws needed to select a black ball; the probability of success is

p = 15/(10 + 15) = 15/25 = .6.

∗ Thus

P (X = 6) = (.4)^{6−1} (.6) = .006144.

 Part (b): What is the expected number of draws in this game?
∗ Since X ∼ geometric(.6), then

EX = 1/p = 10/6 = 1.6̄ .
 Part (c) (Extra problem to be done at home): Find exactly the probability that at least k draws are needed.

∗ We have that

P (X ≥ k) = Σ_{n=k}^∞ P (X = n)
          = Σ_{n=k}^∞ (.4)^{n−1} (.6)
          = (.6)(.4)^{−1} Σ_{n=k}^∞ (.4)^n
          = (.6)(.4)^{−1} (.4)^k Σ_{n=0}^∞ (.4)^n
          = (.6)(.4)^{k−1} · 1/(1 − .4)
          = (.4)^{k−1} .
• Note: This could have been done for a general p. Thus

P (X ≥ k) = (1 − p)^{k−1} .
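• Optional simulation (not part of the original notes): a standard-library Python sketch of the urn example's geometric(.6) variable, checking EX = 1/p and the tail formula P (X ≥ k) = (1 − p)^{k−1}.

import random

def draws_until_black(p=0.6):
    n = 1
    while random.random() > p:   # ball was white; draw again
        n += 1
    return n

samples = [draws_until_black() for _ in range(100_000)]
print(sum(samples) / len(samples))                  # ~ 1/0.6 = 1.67
print(sum(s >= 6 for s in samples) / len(samples))  # ~ (0.4)**5 = 0.01024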
• Negative Binomial(Need to know for Actuarial Exam):
 Experiment: Suppose that independent trials are held with probability p of having a success. The trials are performed until a total of r successes are accumulated.
∗ Let X equal the number of trials required to obtain r successes. Here we have

P (X = n) = C(n − 1, r − 1) p^r (1 − p)^{n−r}   for n = r, r + 1, . . . .

 We say X ∼ NegativeBinomial(r, p).
 Properties:
∗ This is a probability mass function: one can check that Σ_{n=r}^∞ P (X = n) = 1.
∗ Mean: EX = r/p.
∗ Variance: Var(X) = r(1 − p)/p².
 Note that Geometric(p) = N egativeBinomial (1, p).
• Example: Find the expected value of the number of times one must throw a die until the outcome 1 has occurred 4 times.
 Solution: X ∼ NegativeBinomial(4, 1/6). So

EX = 4/(1/6) = 24.

• Hypergeometric Distribution(Need to know for Actuarial Exam):


 Experiment: Suppose that a sample of size n is to be chosen randomly (without replacement)
from an urn containing N balls, of which m are white and N − m are black.

∗ Let X equal the number of white balls selected. Then

P (X = i) = C(m, i) C(N − m, n − i) / C(N, n)   for i = 0, 1, . . . , n.
 We say X ∼ Hypergeometric(n, N, m).
 Properties:
∗ Mean: EX = nm/N.
∗ Variance: Var(X) = n (m/N)(1 − m/N)(1 − (n − 1)/(N − 1)).
CHAPTER 7

Continuous Random Variables


7.1. Intro to continuous R.V

Definition. A random variable X is said to have a continuous distribution if there exists a non-
negative function f such that
P (a ≤ X ≤ b) = ∫_a^b f (x) dx

for every a and b. [Sometimes we write that for nice sets B ⊂ R we have P (X ∈ B) = ∫_B f (x) dx.]
We call f the pdf (probability density function) for X . Sometimes we use the notation fX to signify that fX corresponds to the pdf of X . We sometimes call fX the density of X .

• In fact, any function f satisfying the following two properties is called a density, and could be
considered a pdf of some random variable X:
(1) f (x) ≥ 0 for all x
(2) ∫_{−∞}^∞ f (x) dx = 1.
• Important Note!
 (1) In this case X : S → R and X could attain uncountably many values (it doesn't have to be discrete).
 (2) ∫_{−∞}^∞ f (x) dx = P (−∞ < X < ∞) = 1.
 (3) P (X = a) = ∫_a^a f (x) dx = 0.
 (4) P (X < a) = P (X ≤ a) = F (a) = ∫_{−∞}^a f (x) dx.
∗ Recall that F is the cdf of X .
 (5) Draw a pdf of X
∗ Note that P (a < X < b) is just the area under the curve.
• Remark: What are some random variables that are considered continuous?
 Let X be the time it takes for a student to finish a probability exam. X ∈ (0, ∞).
 Let X be the value of Apple's stock price at the end of the day. Again X ∈ [0, ∞).
 Let X be the height of a college student.
 Any sort of continuous measurement can be considered a continuous random variable.
• Example1: Suppose we are given
f (x) = { c/x³   x ≥ 1
        { 0      x < 1

is the pdf of X. What must the value of c be?
 Solution: We would need

1 = ∫_{−∞}^∞ f (x) dx = c ∫_1^∞ (1/x³) dx = c/2,

thus c = 2.
• Example2: Suppose we are given
fX (x) = { 2/x³   x ≥ 1
         { 0      x < 1

is the pdf of X from Example1.
 Part (a): Find the c.d.f., FX (x).
∗ Solution: First we check that if x < 1 then

FX (x) = P (X ≤ x) = ∫_{−∞}^x fX (y) dy = ∫_{−∞}^x 0 dy = 0.

Now when x ≥ 1 we have

FX (x) = P (X ≤ x) = ∫_{−∞}^x fX (y) dy = ∫_{−∞}^1 0 dy + ∫_1^x (2/y³) dy = 1 − 1/x².

Thus

FX (x) = { 1 − 1/x²   x ≥ 1
         { 0          x < 1

 Part (b): Use the cdf in Part (a) to help you find P (3 ≤ X ≤ 4).
∗ Solution: We have

P (3 ≤ X ≤ 4) = P (X ≤ 4) − P (X < 3)
             = FX (4) − FX (3)
             = (1 − 1/4²) − (1 − 1/3²) = 7/144.
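• Optional numerical check (not part of the original notes): a quick Python sketch, assuming scipy is available, verifying that c = 2 normalizes the density and that P (3 ≤ X ≤ 4) = 7/144.

from scipy.integrate import quad

f = lambda x: 2 / x**3
total, _ = quad(f, 1, float('inf'))
prob, _ = quad(f, 3, 4)
print(total)          # ~1.0, so c = 2 is the right normalizing constant
print(prob, 7 / 144)  # both ~0.0486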
• Fact: For continuous R.V we have the following useful relationship
 Since F (x) = ∫_{−∞}^x f (y) dy, then by the fundamental theorem of calculus (do you remember this from Calculus 1 or 2?)

F′(x) = f (x).
 This means that for continuous random variables, the derivative of the CDF is
the PDF!
• Example3: Let
f (x) = { ce^{−2x}   x ≥ 0
        { 0          x < 0
Find c.
 Solution: c = 2.

7.2. Expectation and Variance

• Recall that if p(x) is the pmf (density) of a discrete random variable, we had

EX = Σ_i xi p(xi ).

Definition. If X is continuous with density f (x) then


EX = ∫_{−∞}^∞ x f (x) dx.

• Example1: Suppose X has density


f (x) = { 2x   0 ≤ x ≤ 1
        { 0    otherwise

Find EX .
 Solution: We have that

E [X] = ∫_{−∞}^∞ x f (x) dx = ∫_0^1 x · 2x dx = 2/3.
Theorem 9. If X and Y are continuous random variable then
(a) E [X + Y ] = EX + EY .
(b) E [aX] = aEX where a ∈ R.
Proof. See textbook. It will be shown later. 
Proposition. If X is a continuous R.V. with pdf f (x), then for any real valued function g,
E [g(X)] = ∫_{−∞}^∞ g(x) f (x) dx.

• Example2: The density of X is given by


f (x) = { 1/2   if 0 ≤ x ≤ 2
        { 0     otherwise

Find E[e^X].
 Solution: From the previous proposition we have that g(x) = e^x in this case, thus

E[e^X] = ∫_0^2 e^x · (1/2) dx = (1/2)(e² − 1).
Lemma 10. For a nonnegative random variable Y ≥ 0 we have

EY = ∫_0^∞ P (Y > y) dy.
• Bonus:
 This proof is a good practice with interchanging order of integrals in Multivariable Calculus.

Proof. Recall that dxdy means Right-Left and dydx means Top-Bottom.

∫_0^∞ P (Y > y) dy = ∫_0^∞ ∫_y^∞ fY (x) dx dy
                  = ∫∫_D fY (x) dy dx ,   interchanging the order as in Calc III
                  = ∫_0^∞ ∫_0^x fY (x) dy dx   (draw the region to do this)
                  = ∫_0^∞ x fY (x) dx
                  = EY. 
• Variance:
 Variance will be defined in the same way as we did with discrete random variables:

Var(X) = E[(X − µ)²] ,   equivalently   Var(X) = EX² − (EX)².

 As before,

Var (aX + b) = a² Var(X).
• Example3: (Example 1 continued) Suppose X has density
f (x) = { 2x   0 ≤ x ≤ 1
        { 0    otherwise

Find Var(X).
 Solution: From Example 1 we found E [X] = 2/3. Now

E[X²] = ∫_0^1 x² · 2x dx = 2 ∫_0^1 x³ dx = 1/2.

Thus

Var(X) = 1/2 − (2/3)² = 1/18.
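• Optional simulation (not part of the original notes): a standard-library Python sketch for the density f (x) = 2x on [0, 1]. Since F (x) = x² here, the inverse-cdf method says X = √U with U ∼ Uniform(0, 1) has this density, so we can check EX = 2/3 and Var(X) = 1/18 empirically.

import random, math

xs = [math.sqrt(random.random()) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, 2 / 3)   # ~0.667
print(var, 1 / 18)   # ~0.056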
• Example4: Suppose X has density
f (x) = { ax + b   0 ≤ x ≤ 1
        { 0        otherwise

and that E[X²] = 1/6. Find the values of a and b.
 Solution: We need to use the facts that ∫_{−∞}^∞ f (x) dx = 1 and E[X²] = 1/6. The first one gives us

1 = ∫_0^1 (ax + b) dx = a/2 + b ,

and the second one gives us

1/6 = ∫_0^1 x² (ax + b) dx = a/4 + b/3.

Solving these equations gives us

a = −2   and   b = 2.

7.3. The uniform Random Variable

• A continuous random variable is said to be uniformly distributed on the interval [a, b] if


fX (x) = { 1/(b − a)   a ≤ x ≤ b
         { 0           otherwise

 So X can only attain values x ∈ [a, b].
 We say X ∼ Uniform(a, b).
 The cdf is

FX (x) = { 0                 x < a
         { (x − a)/(b − a)   a ≤ x ≤ b
         { 1                 x > b

• Example1: Suppose X ∼ Uniform(a, b). Part (a) Find the mean of X . Part (b) Find the variance of X .
 Part (a): We compute

EX = ∫_{−∞}^∞ x fX (x) dx = ∫_a^b x · 1/(b − a) dx = (1/(b − a)) (b²/2 − a²/2) = (a + b)/2.

∗ Which makes sense, right? It should be the midpoint of the interval [a, b].
 Part (b): We compute first the second moment

EX² = ∫_a^b x² · 1/(b − a) dx = (1/(b − a)) (b³/3 − a³/3)
    = (1/3) (1/(b − a)) (b − a)(a² + ab + b²)
    = (a² + ab + b²)/3.

Thus after some algebra

VarX = (a² + ab + b²)/3 − ((a + b)/2)² = (b − a)²/12.

7.4. More practice

• Suppose we are given the p.d.f.

f (x) = { 9e^{−9x}   x ≥ 0
        { 0          x < 0

 Part (a): Set up an integral to find FX (x):
∗ We have, for x > 0, that

FX (x) = ∫_0^x 9e^{−9y} dy = 1 − e^{−9x} ,

so that

FX (x) = { 1 − e^{−9x}   x ≥ 0
         { 0             x < 0

 Part (b): Set up an integral to find P (1 < X < 5):
∗ ∫_1^5 9e^{−9x} dx
 Part (c): Set up an integral to find P (X > 3):
∗ ∫_3^∞ 9e^{−9x} dx.
 Part (d): Set up an integral to find P (X < 2):
∗ ∫_0^2 9e^{−9x} dx.
CHAPTER 8

Normal Distributions
8.1. The normal distribution

• We say that X is a normal (Gaussian) random variable, or X is normally distributed with param-
eters µ and σ 2 if the density of X is given by
f (x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} .


• We'll usually write X ∼ N(µ, σ²).
 It turns out that in practice, many random variables obey the normal distribution:
∗ Grades
∗ Height of a man or a woman
• Note the following:
 If X ∼ N (0, 1) then

∫_{−∞}^∞ (1/√(2π)) e^{−x²/2} dx = 1.

 To show this we use polar coordinates. Let I = ∫_{−∞}^∞ e^{−x²/2} dx = 2 ∫_0^∞ e^{−x²/2} dx. The trick is to write

I² = 4 ∫_0^∞ ∫_0^∞ e^{−x²/2} e^{−y²/2} dx dy = 4 ∫_0^{π/2} ∫_0^∞ r e^{−r²/2} dr dθ = 4 · (π/2) = 2π.

Thus I = √(2π) as needed.



Theorem 11. To help us compute the mean and variance of X, it's not too hard to show that X ∼ N(µ, σ²) if and only if

(X − µ)/σ = Z   where Z ∼ N (0, 1).

Proof. We only show the (⇐=) direction. Note that

FX (x) = P (X ≤ x) = P (σZ + µ ≤ x) = P (Z ≤ (x − µ)/σ) = FZ ((x − µ)/σ)

for σ > 0 (similar for σ < 0). By the chain rule

fX (x) = F′X (x) = F′Z ((x − µ)/σ) · (1/σ) = (1/σ) fZ ((x − µ)/σ) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} . 

• Summary of the normal distribution:
 If X ∼ N(µ, σ²) then X is normally distributed with

EX = µ ,   Var(X) = σ² .

 If X ∼ N (µ, σ²) then X = σZ + µ where Z ∼ N (0, 1). We call Z a standard normal random variable.
∗ A Table of probabilities for Z will be given!!!
∗ This will be called a z-score table.
• Z scores:
 Because Z ∼ N (0, 1) is so important, we give its cumulative distribution function (cdf) a name. The distribution FZ (x) of Z is

Φ(x) = P(Z ≤ x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy.

 NOTE: A table of Φ(x) will be given but only for values of x > 0
 Note this is symmetric[DRAW this ] thus here is an important fact: Φ(−x) = 1 − Φ(x)

Theorem 12. If X ∼ N (µ, σ²) then

P (a < X < b) = P ((a − µ)/σ < Z < (b − µ)/σ) .
• Example1: Find P (1 ≤ X ≤ 4) if X ∼ N (2, 25).
 Answer: Then µ = 2 and σ² = 25, thus (X − 2)/5 = Z, so that

P (1 ≤ X ≤ 4) = P ((1 − 2)/5 ≤ (X − 2)/5 ≤ (4 − 2)/5)
             = P (−.2 ≤ Z ≤ .4)
             = P (Z ≤ .4) − P (Z ≤ −.2)
             = Φ(.4) − Φ(−.2)
             = .6554 − (1 − Φ (.2))
             = .6554 − (1 − .5793) = .2347.
• Example2: Suppose X is normal with mean 6. If P (X > 16) = .0228, then what is the standard
deviation of X?
 Answer: We apply our Theorem that says (X − µ)/σ = Z is N (0, 1) and get

P (X > 16) = .0228 ⟺ P ((X − 6)/σ > (16 − 6)/σ) = .0228
           ⟺ P (Z > 10/σ) = .0228
           ⟺ 1 − P (Z ≤ 10/σ) = .0228
           ⟺ 1 − Φ (10/σ) = .0228
           ⟺ Φ (10/σ) = .9772.

Using the standard normal table we see that Φ (2) = .9772, thus we must have that

2 = 10/σ

and hence σ = 5.
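• Optional check (not part of the original notes): the same two normal calculations done in Python with scipy's standard normal cdf standing in for the z-score table (scipy is an assumed dependency).

from scipy.stats import norm

# Example 1: X ~ N(2, 25), P(1 <= X <= 4) = Phi(.4) - Phi(-.2)
print(norm.cdf(0.4) - norm.cdf(-0.2))   # ~0.2347

# Example 2: solve Phi(10/sigma) = 0.9772 for sigma
z = norm.ppf(0.9772)                    # ~2.0
print(10 / z)                           # sigma ~ 5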
• Example (Extra): Suppose X ∼ N (3, 9); find P (|X − 3| > 6).

 Answer: Get
P (|X − 3| > 6) = P (X − 3 > 6) + P (− (X − 3) > 6)
= P (X > 9) + P (X < −3)
= P (Z > 2) + P (Z < −2)
= 1 − Φ(2) + Φ(−2)
= 2 (1 − Φ(2))
≈ .0456.
• FACT: The 68 − 95 − 99.7 Rule
 About 68% of all area is contained within 1 standard deviation of the mean
 About 95% of all area is contained within 2 standard deviation of the mean
 About 99.7% of all area is contained within 3 standard deviation of the mean
 This can be explained by the following graph:
CHAPTER 9

Normal approximations to the binomial


9.1. The normal approximates Binomial

Theorem 13. If Sn is a binomial with parameter n and p, then


P ( a ≤ (Sn − np)/√(np (1 − p)) ≤ b ) → P (a ≤ Z ≤ b)

as n → ∞, where Z is a N (0, 1).
• Recall that if Sn ∼ Bin (n, p) then its mean is µ = np and standard deviation is σ = √(np(1 − p)).
 So what this theorem says is that if you want to compute P (c ≤ Sn ≤ d), then using the fact that

(Sn − np)/√(np (1 − p)) ≈ Z ,   i.e.   (Sn − µ)/σ ≈ Z ,

we get

P (c ≤ Sn ≤ d) = P ((c − µ)/σ ≤ (Sn − µ)/σ ≤ (d − µ)/σ) ≈ P ((c − µ)/σ ≤ Z ≤ (d − µ)/σ).
• Note that Sn is really discrete. In fact Sn ∈ {0, 1, 2, . . . , n}, while the normal distribution is continuous!
 Note that if I tried to estimate an equality, the wrong way to do it would be:

P (Sn = i) = P ((Sn − µ)/σ = (i − µ)/σ) ≈ P (Z = (i − µ)/σ) = 0,

as we know that for continuous random variables X we always have P (X = a) = 0!
 Hence we need inequalities if we want to estimate a discrete random variable using a continuous random variable.
∗ So we use the following convention: P(Sn = i) = P (i − 1/2 < Sn < i + 1/2).
∗ We have no problem here, because Sn can only be an integer, so we're not hurting anything by saying "i − 1/2 < Sn < i + 1/2", as we know that Sn can only be i in that interval anyway.
• Example: Suppose a fair coin is tossed 100 times.


 (a) What is the probability there will be more than 60 heads?


 Answer: Let S100 ∼ Bin(100, 1/2) so that S100 represents the number of heads in 100 coin tosses.
∗ The actual answer would be

P (S100 > 60) = Σ_{i=61}^{100} P (S100 = i)
             = Σ_{i=61}^{100} C(100, i) (1/2)^i (1/2)^{100−i}
             = Σ_{i=61}^{100} C(100, i) (1/2)^{100} .

∗ But it would be almost impossible to do this the long way by hand.
∗ So we will give an approximate answer using the normal distribution:
· So here take µ = np = 50 and σ = √(np(1 − p)) = √(50 · (1/2)) = 5. We want more than 60, so approximate using 60 + 1/2:

P (S100 > 60) = P (S100 ≥ 60.5) = P ((S100 − 50)/5 ≥ (60.5 − 50)/5)
            ≈ P (Z ≥ 2.1)
            ≈ 1 − Φ(2.1)
            = .0179

 (b) Estimate the probability of getting exactly 60 heads.

P (Sn = 60) = P (59.5 ≤ Sn ≤ 60.5)
           ≈ P (1.9 ≤ Z ≤ 2.1)
           ≈ Φ(2.1) − Φ(1.9).
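• Optional check (not part of the original notes): a Python sketch, assuming scipy is available, comparing the exact binomial answers with the continuity-corrected normal approximation for 100 fair coin tosses.

from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5   # 50 and 5

# (a) P(S > 60)
print(1 - binom.cdf(60, n, p))                # exact, ~0.0176
print(1 - norm.cdf((60.5 - mu) / sigma))      # approximation, ~0.0179

# (b) P(S = 60)
print(binom.pmf(60, n, p))                    # exact, ~0.0108
print(norm.cdf(2.1) - norm.cdf(1.9))          # approximation, ~0.0109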
CHAPTER 10

Some continuous distributions


10.1. Exponential Random Variables

• A continuous R.V. is said to be exponential with parameter λ if its pdf is


f (x) = { λe^{−λx}   if x ≥ 0
        { 0          if x < 0

 We write X ∼ exponential (λ).


• Summary:
 CDF: Let a > 0. Note that the cdf is

FX (a) = P (X ≤ a) = ∫_0^a λe^{−λy} dy = −e^{−λy} |_0^a = 1 − e^{−λa} .

∗ Thus

P (X > a) = 1 − P (X ≤ a) = e^{−λa} .

 Mean: EX = 1/λ. Thus λ = 1/µ.
 Variance: We have Var(X) = 1/λ².
• How to interpret X
 X = The amount of time until some specic event occurs.
 Example:
∗ Time until earthquake occurs
∗ Length of a phone call
∗ Time until an accident happens
• Example1: Suppose that the length of a phone call in minutes is an exponential r.v with average
length 10 minutes.
 Part (a) What's the probability of your phone call being more than 10 minutes?
∗ Answer: Here λ = 1/10, thus

P(X > 10) = e^{−(1/10)·10} = e^{−1} ≈ .368.

 Part (b) Between 10 and 20 minutes?
∗ Answer: We have that

P(10 < X < 20) = F (20) − F (10) = e^{−1} − e^{−2} ≈ .233.
• Exponential distribution is Memoryless (Markov)
• Example2: Suppose the life of an iphone has exponential distribution with mean life of 4 years.
 Part(a): What is the probability the phone lasts more than 5 years?


 Answer: Let X denote the life of an iphone (or time until it dies). Note that X ∼ exponential(1/4) since λ = 1/µ = 1/4. Then

P (X > 5) = e^{−(1/4)·5} .

 Part(b): Given that the iphone has already lasted 3 years, what is the probability that it will last another 5 more years?
 Answer: We compute

P (X > 5 + 3 | X > 3) = P ((X > 8) ∩ (X > 3)) / P (X > 3)
                      = P (X > 8) / P (X > 3)
                      = e^{−(1/4)·8} / e^{−(1/4)·3}
                      = e^{−(1/4)·5} .
 Memoryless: Note that the probability of lasting 5 more years, is the same as if it started 5
years from anew!!
• In general the memoryless property says that if t, s > 0 then

P (X > t + s | X > t) = P (X > s) .


Theorem 14. If X is an exponential random variable, then X is memoryless.

Proof. To show this we have

P (X > t + s | X > t) = P ((X > t + s) ∩ (X > t)) / P (X > t)
                      = P (X > t + s) / P (X > t)
                      = e^{−λ(t+s)} / e^{−λt}
                      = e^{−λs}
                      = P (X > s) ,

as needed. 
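• Optional simulation (not part of the original notes): a standard-library Python sketch of the memoryless property for the iphone example, X ∼ exponential(1/4).

import random

lam = 1 / 4
xs = [random.expovariate(lam) for _ in range(500_000)]

p_cond = sum(x > 8 for x in xs) / sum(x > 3 for x in xs)   # P(X > 8 | X > 3)
p_tail = sum(x > 5 for x in xs) / len(xs)                  # P(X > 5)
print(p_cond, p_tail)   # both ~ e^(-5/4) ~ 0.287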
• Example3:(Exam P Q29)
 The # of days from beginning of a calendar year until accident for a BAD driver is exponentially
distributed
 An insurance company expects 30% of BAD drivers will have an accident during the first 50 days.
 Q: What's the probability that a BAD driver will have an accident during the first 80 days?
 Answer:
 Step1: Let X ∼ exp(λ) be the number of days until an accident. We know

.3 = P (X ≤ 50) = ∫_0^50 λe^{−λx} dx = −e^{−λx} |_0^50 = 1 − e^{−50λ} .

 Solve for λ and get λ = −(1/50) ln .7.

 Step2: Then compute

P (X ≤ 80) = ∫_0^80 λe^{−λx} dx = 1 − e^{−80λ} = 1 − e^{(80/50) ln .7} = .435.

10.2. Other Continuous Distributions

• Gamma Distribution:
 We say X ∼ Gamma(α, λ) if it has density

f (x) = { λe^{−λx} (λx)^{α−1} / Γ(α)   x ≥ 0
        { 0                            x < 0

where Γ(α) is the Gamma function

Γ(α) = ∫_0^∞ e^{−y} y^{α−1} dy.

 If Y ∼ Gamma(n/2, 1/2) = χ²_n , this is called the Chi-Squared distribution.
 The chi-squared distribution is used a lot in statistics.
∗ Its mean is EX = α/λ and VarX = α/λ².
• Weibull Distribution:
 Useful in engineering: Look in the book for its pdf.
 If there is an object consisting of many parts, and the object experiences death once any of its parts fails, then X = lifetime of this object.
• Cauchy Distribution:
 We say X is Cauchy with parameter −∞ < θ < ∞ if

f (x) = (1/π) · 1/(1 + (x − θ)²) .

 Importance: It does not have a finite mean.
 To see this, we compute for θ = 0

EX = (1/π) ∫_{−∞}^∞ x/(1 + x²) dx ∼ (1/π) ∫_{−∞}^∞ (1/x) dx ∼ lim_{x→∞} ln |x| − lim_{x→−∞} ln |x| ,

which is not defined.

10.3. The distribution function of a Random variable

• Fact: For continuous R.V. we have the following useful relationship:
 Since F (x) = P (X ≤ x) = ∫_{−∞}^x f (y) dy, then by the fundamental theorem of calculus we have

F′(x) = f (x).

• Example1: If X is continuous with distribution function FX and density function fX , find a formula for the density function of the random variable Y = 2X .
 Solution: First you start with the distribution of Y :
 Step1: First start by writing the cdf of Y in terms of FX :

FY (x) = P (Y ≤ x) = P (2X ≤ x) = P (X ≤ x/2) = FX (x/2) .

 Step2: Then use the relation fY (y) = F′Y (y) and take a derivative of both sides to get

F′Y (x) = d/dx [FX (x/2)] = F′X (x/2) · (x/2)′ ,   by the chain rule on the RHS,

so that

fY (x) = fX (x/2) · (1/2).

• Goal: To be able to compute the cdf and pdf of Y = g(X) where g:R→R is a function given
that we know the cdf and pdf of X.
 Why is this useful?
∗ For example, suppose X represents the income of a random US worker, and let Y = g (X) be the amount of taxes that worker pays per year. Note that the taxes Y depend on the random variable X. So if we only care about the random variable Y , then finding its PDF and CDF tells us everything we need to know about Y . Recall that any probability and expected value can be found using the pdf.
• Example2: Let X ∼ Uniform((0, 10)) and Y = e^{3X} . Find the pdf fY of Y .
 Solution: Recall that since X ∼ Uniform((0, 10)) then

fX (x) = { 1/10   0 < x < 10
         { 0      otherwise

 Step1: First start by writing the cdf of Y in terms of FX :

FY (y) = P (Y ≤ y) = P (e^{3X} ≤ y) ,   then solve for X:
       = P (3X ≤ ln y)
       = P (X ≤ (1/3) ln y)
       = FX ((1/3) ln y) .

 Step2: Then use the relation fY (y) = F′Y (y) and take a derivative

fY (y) = F′Y (y) = d/dy [FX ((1/3) ln y)] ,   use the chain rule
       = F′X ((1/3) ln y) · 1/(3y)
       = fX ((1/3) ln y) · 1/(3y) ,   since F′X = fX
       = { (1/10) · 1/(3y)   0 < (1/3) ln y < 10
         { 0                 otherwise

 but since

0 < (1/3) ln y < 10 ⟺ 0 < ln y < 30 ⟺ e^0 < y < e^{30} ⟺ 1 < y < e^{30} ,

then

fY (y) = { 1/(30y)   1 < y < e^{30}
         { 0         otherwise
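• Optional check (not part of the original notes): a Monte Carlo sketch for Example 2 in Python (scipy's quad is an assumed dependency): sample X ∼ Uniform(0, 10), set Y = e^{3X}, and compare the empirical probability of an interval with the probability computed from the derived pdf fY (y) = 1/(30y).

import random, math
from scipy.integrate import quad

ys = [math.exp(3 * random.uniform(0, 10)) for _ in range(400_000)]
emp = sum(math.e**3 < y < math.e**6 for y in ys) / len(ys)
exact, _ = quad(lambda y: 1 / (30 * y), math.e**3, math.e**6)
print(emp, exact)   # both ~ 0.1, since P(e^3 < Y < e^6) = P(1 < X < 2) = 1/10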

• Example3: Let X ∼ Uniform((0, 1]) and Y = −ln X . Find the pdf of Y . What distribution is it?
 Solution: Recall that

fX (x) = { 1   0 < x < 1
         { 0   otherwise

 Step1: First start with the cdf and write it in terms of FX :

FY (x) = P (Y ≤ x) = P (−ln X ≤ x) = P (ln X > −x) = P (X > e^{−x}) = 1 − P (X ≤ e^{−x}) = 1 − FX (e^{−x}) .

 Step2: Then take a derivative

fY (x) = F′Y (x) = d/dx [1 − FX (e^{−x})]
       = −F′X (e^{−x}) · (−e^{−x})
       = fX (e^{−x}) · e^{−x}
       = { 1 · e^{−x}   0 < e^{−x} < 1
         { 0            otherwise
       = { e^{−x}   0 < x < ∞
         { 0        otherwise

 Thus Y ∼ exponential(1).


• Example4: Suppose X is uniform on (−π/2, π/2) and Y = tan X . Find the density of Y ; what known distribution is it?
 Solution:
 Step1: Find the cdf and write it in terms of FX :

FY (x) = P (tan X ≤ x) = P (X ≤ tan^{−1} x) = FX (tan^{−1} x) .

 Step2: Take a derivative, and recall that since 1/(π/2 + π/2) = 1/π then

fX (x) = { 1/π   −π/2 < x < π/2
         { 0     otherwise.

Thus

fY (x) = F′Y (x) = d/dx FX (tan^{−1} x)
       = F′X (tan^{−1} x) · (tan^{−1} x)′
       = fX (tan^{−1} x) · 1/(1 + x²)
       = { (1/π) · 1/(1 + x²)   −π/2 < tan^{−1} x < π/2
         { 0                    otherwise
       = { (1/π) · 1/(1 + x²)   −∞ < x < ∞
         { 0                    otherwise

 Thus Y is Cauchy(0).

• Exercise: Show that if Z ∼ N (0, 1) then Y = Z² is a Gamma with parameters 1/2 and 1/2.
• Example5: (Actuarial Exam type question) The time, T , that a manufacturing system is out of
• Example5:(Actuarial Exam type question) The time, T , that a manufacturing system is out of
operation has cumulative distribution function
F (t) = { 1 − (2/t)²   t > 2
        { 0            otherwise.

The resulting cost to the company is Y = T². Let fY be the density function for Y . Determine fY (y), for y > 4.
 Answer:
 Step1: Find the cdf of Y :

FY (y) = P (T² ≤ y) = P (T ≤ √y) = F (√y) = 1 − 4/y

for y > 4.
 Step2: Take a derivative:

fY (y) = F′Y (y) = 4/y².
• One thing to note, is that we've been using the following useful property:

Proposition 15. Suppose g:R→R is a strictly increasing function, then the inverse g −1 exists and
−1
g(x) ≤ y implies x≤g (y) .
CHAPTER 11

Multivariate distributions
11.1. Joint distribution functions

• We discuss the collection of random variables (X1 , . . . , Xn ).


• Discrete:
 For random variables X, Y we let p (x, y) be the joint probability mass(discrete density)
function

p(x, y) = P (X = x, Y = y) .

∗ Properties of joint pmf:


· 1) 0 ≤ p ≤ 1
· 2) Σ_i Σ_j p(xi , yj ) = 1
 We also have the multivariate cdf, defined by

FX,Y (x, y) = P (X ≤ x, Y ≤ y) .

• Example1: Experiment: Suppose you roll two 3-sided dice.
 Let X be the largest value obtained on either of the two dice. Let Y be the sum of the two dice. Find the joint pmf of X and Y .
 Solution: First we need to find the possible values: X = 1, 2, 3 and Y = 2, 3, 4, 5, 6.
 The table of possible outcomes and their associated values (X, Y ):

die 1 \ die 2    1                        2        3
1                (X = 1, Y = 2) = (1, 2)  (2, 3)   (3, 4)
2                (2, 3)                   (2, 4)   (3, 5)
3                (3, 4)                   (3, 5)   (3, 6)

 Using this table we have that the p.m.f. is given by:

X \ Y   2     3     4     5     6
1       1/9   0     0     0     0
2       0     2/9   1/9   0     0
3       0     0     2/9   2/9   1/9

(so, e.g., P (X = 1, Y = 2) = 1/9).
 Question: Find P (X = 2 | Y = 4)?
∗ Answer: P (X = 2 | Y = 4) = (1/9)/(3/9) = 1/3.
• Continuous
 For random variables X, Y we let f (x, y) be the joint probability density function, if
P (a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f (x, y) dy dx.

which is equivalent to saying that for any set D ⊂ R² we have

P ((X, Y ) ∈ D) = ∫∫_D f (x, y) dA.

∗ Properties:
· 1) f (x, y) ≥ 0
· 2) ∫_{−∞}^∞ ∫_{−∞}^∞ f (x, y) dx dy = 1.
 We also have the multivariate cdf, defined by

FX,Y (x, y) = P (X ≤ x, Y ≤ y) .

∗ Note that FX,Y (a, b) = ∫_{−∞}^a ∫_{−∞}^b f (x, y) dy dx.
 Thus note that

f (x, y) = ∂²F (x, y)/∂x∂y.

 Marginal Density: If fX,Y is the joint density of X, Y , we recover the marginal densities of X, Y respectively by the following:

fX (x) = ∫_{−∞}^∞ fX,Y (x, y) dy ,
fY (y) = ∫_{−∞}^∞ fX,Y (x, y) dx .

• Example2: Let X, Y have joint pdf


f (x, y) = { ce^{−x} e^{−2y} ,   0 < x < ∞, 0 < y < ∞
           { 0                   otherwise

 Part(a): Find c that makes this a joint pdf:
∗ Sol: Step1: Draw the region of the domain first!!!
∗ Thus

1 = c ∫_0^∞ ∫_0^∞ e^{−x} e^{−2y} dx dy = c ∫_0^∞ e^{−2y} [−e^{−x}]_{x=0}^{x=∞} dy
  = c ∫_0^∞ e^{−2y} dy = c [−(1/2) e^{−2y}]_0^∞ = c · (1/2).

Then c = 2.
 Part(b): Find P (X < Y ).
∗ Sol: Need to draw the region (Recall Calc III!!) Let D = {(x, y) | 0 < x < y, 0 < y < ∞}

· There are two ways to set up this integral:
· Method1: To set up dA = dy dx we use the Top-Bottom method, where the region is bounded by

Top function: y = ∞;   Bottom function: y = x;   Range of values: 0 ≤ x ≤ ∞.

· Hence we use this information to set up

P (X < Y ) = ∫∫_D f (x, y) dA = ∫_0^∞ ∫_x^∞ 2e^{−x} e^{−2y} dy dx
           = ∫_0^∞ 2e^{−x} (1/2) [−e^{−2y}]_{y=x}^{y=∞} dx
           = ∫_0^∞ e^{−x} e^{−2x} dx = ∫_0^∞ e^{−3x} dx
           = 1/3.

· Method2: To set up dA = dx dy we use the Right-Left method, where the region is bounded by

Right function: x = y;   Left function: x = 0;   Range of values: 0 ≤ y ≤ ∞.

· Hence we use this information to set up

P (X < Y ) = ∫∫_D f (x, y) dA = ∫_0^∞ ∫_0^y 2e^{−x} e^{−2y} dx dy = (do some work) = 1/3,

which matches the answer from before.
 Part(c): Set up P (X > 1, Y < 1)

∗ The region is given by {(x, y) | x > 1, 0 < y < 1}.
∗ Setting this up we have

P (X > 1, Y < 1) = ∫_0^1 ∫_1^∞ 2e^{−x} e^{−2y} dx dy.
 Part(d): Find the marginal fX (x):
∗ Sol:
∗ Then
Z ∞ Z ∞
fX (x) = f (x, y)dy = 2e−x e−2y dy
0 0
 ∞  
−x −1 −2y −x 1
= 2e e = 2e 0+
2 0 2
= e−x .
 Part(e): Find EX . We have

EX = ∫_0^∞ x e^{−x} dx = 1.
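• Optional simulation (not part of the original notes): a standard-library Python sketch for Example 2. Since f (x, y) = 2e^{−x}e^{−2y} factors as an exponential(1) density in x times an exponential(2) density in y, we can sample the coordinates independently and check parts (b), (c) and (e).

import random

N = 500_000
xs = [random.expovariate(1) for _ in range(N)]
ys = [random.expovariate(2) for _ in range(N)]

print(sum(x < y for x, y in zip(xs, ys)) / N)             # part (b): ~1/3
print(sum(x > 1 and y < 1 for x, y in zip(xs, ys)) / N)   # part (c): ~0.318
print(sum(xs) / N)                                        # part (e): EX ~ 1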

11.2. Independent Random Variables

• Discrete: We say discrete r.v. X, Y are independent if

P (X = x, Y = y) = P (X = x) P (Y = y) ,
for every x in the range of X and every y in the range of Y .
 This is the same as saying that X, Y are independent if the joint pmf splits into the marginal pmfs: pX,Y (x, y) = pX (x) · pY (y).
• Continuous: We say continuous r.v. X, Y are independent if

P (X ∈ A, Y ∈ B) = P (X ∈ A) P (Y ∈ B)

for any sets A, B .
 This is equivalent to: P (X ≤ a, Y ≤ b) = P (X ≤ a) P (Y ≤ b).
 Equivalent to FX,Y (x, y) = FX (x)FY (y).
• Random variables that are not independent, are said to be dependent.
• How can we check independence?

Theorem 16. Continuous (discrete) r.v. X, Y are independent if and only if their joint pdf (pmf ) can
be expressed as

fX,Y (x, y) = fX (x)fY (y). (Continuous Case),

pX,Y (x, y) = pX (x)pY (y) (Discrete Case).

Proof. See textbook. 


• Example1: Let X, Y be r.v. with joint pdf

f (x, y) = 6e−2x e−3y 0 < x < ∞, 0 < y < ∞.


Are X, Y independent?
 Solution: Find the marginals fX and fY and see if f = fX fY . First

fX (x) = ∫_0^∞ 6e^{−2x} e^{−3y} dy = 2e^{−2x} ,
fY (y) = ∫_0^∞ 6e^{−2x} e^{−3y} dx = 3e^{−3y} ,

which are both exponential. Since f = fX fY , then yes, they are independent!
• Example2: Let X, Y have

fX,Y (x, y) = x + y, 0 < x < 1, 0 < y < 1


Are X, Y independent?
 Solution: Note that there is no way to factor x + y = fX (x)fY (y), hence they can't be
independent.
• Example3: Let X, Y have

fX,Y (x, y) = 2, 0 < x < y < 1


• Are X, Y independent?
 Solution:
∗ We cannot use the previous argument to claim fX,Y can't split, because for example,
maybe hypothetically speaking 2=1·1 , so hypothetically it could split.
∗ So we must nd the marginal pdfs and then check if fX,Y = fX · fY .

∗ Important! Whenever the domain of f is not a rectangle, you MUST draw the region of the domain of fX,Y . Here the region is D = {(x, y) | 0 < x < y < 1}. (Please try drawing this region on your own. If you struggle with it, go to https://www.wolframalpha.com/ and type in 0 < x < y < 1.)
∗ Note that fX (x) = ∫_x^1 2 dy = 2 (1 − x) for 0 < x < 1.
 Then fY (y) = ∫_0^y 2 dx = 2y for 0 < y < 1.
 But fX,Y (x, y) = 2 ≠ fX (x)fY (y) = 2(1 − x) · 2y !! Therefore X, Y are NOT independent.
• Example4: Suppose X, Y are independent uniformly distributed over (0, 1). Find P (Y < X).
 Solution: Since X, Y are independent, then using the Theorem from this section we have

fX,Y (x, y) = fX (x)fY (y) = 1 · 1,

for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Draw the region (what do you think the probability will be by looking at the region?)
∗ and get

P (Y < X) = ∫_0^1 ∫_0^x f (x, y) dy dx = ∫_0^1 ∫_0^x 1 dy dx = ∫_0^1 x dx = 1/2.

11.3. Sums of independent Random Variables(?)

• Fact: If X, Y are independent, it's not too hard to show that the cdf of Z = X + Y is

FX+Y (a) = P (X + Y ≤ a) = ∫∫_{x+y≤a} fX (x)fY (y) dx dy
         = ∫_{−∞}^∞ ∫_{−∞}^{a−y} fX (x)fY (y) dx dy
         = ∫_{−∞}^∞ ( ∫_{−∞}^{a−y} fX (x) dx ) fY (y) dy
         = ∫_{−∞}^∞ FX (a − y) fY (y) dy.

 By differentiating we have that

fX+Y (a) = ∫_{−∞}^∞ fX (a − y) fY (y) dy.

• (?) Here are some interesting cases:
• Fact 1 (Only thing I'll test you on): If Xi ∼ N(µi , σi²) for 1 ≤ i ≤ n and they are all independent, then Y = X1 + · · · + Xn ∼ N(µ1 + · · · + µn , σ1² + · · · + σn²).
 In particular, if X ∼ N(µx , σx²) and Y ∼ N(µy , σy²) are independent then X + Y ∼ N(µx + µy , σx² + σy²) and X − Y ∼ N(µx − µy , σx² + σy²).
 In general aX ± bY ∼ N(aµx ± bµy , a²σx² + b²σy²).
• Example1: Suppose T ∼ N (95, 25) and H ∼ N (65, 36) represent the grades of Tyler and Habib. Assume their grades are independent.
 Part(a): What is the probability that their average grade will be less than 90?
 Solution: T + H ∼ N (160, 61). Thus

P ((T + H)/2 ≤ 90) = P (T + H ≤ 180) = P (Z ≤ (180 − 160)/√61) = Φ ((180 − 160)/√61) = Φ (2.56) ≈ .9948.

 Part (b): What is the probability that Habib will have scored higher than Tyler?
 Solution: Using H − T ∼ N (−30, 61) we compute

P (H > T ) = P (H − T > 0) = 1 − P (H − T ≤ 0) = 1 − P (Z ≤ (0 − (−30))/√61) = 1 − Φ(3.84) ≈ 1 − 1 = 0.
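• Optional check (not part of the original notes): the same two answers computed in Python with scipy (assumed available), using T + H ∼ N(160, 61) and H − T ∼ N(−30, 61).

from scipy.stats import norm

print(norm.cdf(180, loc=160, scale=61 ** 0.5))    # P(T + H <= 180) ~ 0.9948
print(1 - norm.cdf(0, loc=-30, scale=61 ** 0.5))  # P(H > T) ~ 6e-5, essentially 0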
• Other facts.
• Fact 2: Let Z ∼ N (0, 1) then Z 2 ∼ χ21 .
 If Z1 , . . . , Zn are independent N (0, 1) then Y = Z1² + · · · + Zn² ∼ χ²_n .

• Fact 3: If X ∼ P oisson(λ) and Y ∼ P oisson(µ) , and they are independent, then X+Y ∼
P oisson(λ + µ).
• List out stuff and then stop.

11.4. Conditional Distributions- Discrete(?)

• The conditional pmf for a discrete R.V. is


pX|Y (x | y) = P (X = x | Y = y) = p(x, y)/pY (y) .

• We also have the conditional cdf: FX|Y (x | y) = P (X ≤ x | Y = y).
• Fact:
 If X, Y are independent then

pX|Y (x | y) = pX (x).

• Example1: Suppose the joint pmf of (X, Y ) is

x \ y   0    1
0       .4   .2
1       .1   .3

 Compute some conditional pmfs: the second column gives

pX|Y (0 | 1) = .2/.5 = 2/5   and   pX|Y (1 | 1) = .3/.5 = 3/5.

 Are they independent? Note that pX (0) = .4 + .2 = .6 ≠ pX|Y (0 | 1), so no!

11.5. Conditional Distributions- Continuous(?)

• Def: If X, Y are continuous with joint pdf f (x, y), then the conditional pdf of X given Y = y is defined as

fX|Y (x | y) = f (x, y)/fY (y) ,

defined only when fY (y) > 0.
• Def: The conditional cdf of X given Y = y is

FX|Y (a | y) = P (X ≤ a | Y = y) = ∫_{−∞}^a fX|Y (x | y) dx.

• Fact: If X, Y are independent then

fX|Y (x | y) = fX (x).

• Example1: The joint pdf of X, Y is given by

fX,Y (x, y) = { (12/5) x (2 − x − y)   0 < x < 1, 0 < y < 1
              { 0                      otherwise

Compute the conditional pdf of X given that Y = y, where 0 < y < 1.
 Solution: We have

fX|Y (x | y) = f (x, y)/fY (y) = x (2 − x − y) / ∫_0^1 x (2 − x − y) dx = x (2 − x − y) / (2/3 − y/2) .

11.6. Joint PDF of functions

• Goal:
 Recall that from section 5.7 we can find the pdf of a new random variable Y = g (X).
 Suppose we know the distribution of (X1 , X2 ); then what is the joint distribution of g1 (X1 , X2 ) and g2 (X1 , X2 )?
∗ For example, if we know X1 , X2 , what is the joint distribution of Y1 = X1 + X2 and Y2 = X1² − e^{X1} X2 ?
• Steps to finding the joint pdf of new R.V. made from old ones:
 Suppose X1 , X2 are jointly distributed with pdf fX1,X2 . Let g1 (x1 , x2 ), g2 (x1 , x2 ) be multivariable functions.
 Goal: Find the joint pdf of Y1 = g1 (X1 , X2 ) and Y2 = g2 (X1 , X2 ).
 Step1: Find the Jacobian

J (x1 , x2 ) = det [ ∂g1/∂x1   ∂g1/∂x2 ; ∂g2/∂x1   ∂g2/∂x2 ] = (∂g1/∂x1)(∂g2/∂x2) − (∂g1/∂x2)(∂g2/∂x1) ≠ 0

at all points (x1 , x2 ).
 Step2: Find the unique solutions of the equations y1 = g1 (x1 , x2 ) and y2 = g2 (x1 , x2 ) in terms of

x1 = h1 (y1 , y2 ) ,
x2 = h2 (y1 , y2 ) .

 Step3: The joint pdf of Y1 , Y2 is

fY1,Y2 (y1 , y2 ) = fX1,X2 (x1 , x2 ) |J (x1 , x2 )|^{−1} = fX1,X2 (h1 (y1 , y2 ) , h2 (y1 , y2 )) |J (x1 , x2 )|^{−1} .
• Example1: Suppose X1 , X2 have joint distribution
fX1,X2 (x1 , x2 ) = { 2x1 x2   0 ≤ x1 , x2 ≤ 1
                    { 0        otherwise

Question: Find the joint pdf of Y1 = X1 + X2 and Y2 = X1 − X2 .
 Step1: Find the Jacobian: Note that

y1 = g1 (x1 , x2 ) = x1 + x2 ,
y2 = g2 (x1 , x2 ) = x1 − x2 .

So

J (x1 , x2 ) = det [ 1  1 ; 1  −1 ] = −2.

 Step2: Solve for x1 , x2 and get

x1 = (y1 + y2 )/2 ,
x2 = (y1 − y2 )/2 .

 Step3: The joint pdf of Y1 , Y2 is given by the formula:

fY1,Y2 (y1 , y2 ) = fX1,X2 (x1 , x2 ) |J (x1 , x2 )|^{−1}
                 = fX1,X2 ((y1 + y2 )/2 , (y1 − y2 )/2) · 1/|−2|
                 = { (1/4)(y1 + y2 )(y1 − y2 )   0 ≤ (y1 + y2 )/2 ≤ 1 and 0 ≤ (y1 − y2 )/2 ≤ 1
                   { 0                           otherwise

• Example2: Suppose X1 ∼ N (0, 1) and X2 ∼ N (0, 4) and independent.


 Let Y1 = 2X1 + X2 and Y2 = X1 − 3X2 .
 Question: Find the joint pdf fY1 ,Y2 (y1 , y2 ) of Y1 and Y2 .
 Step1: Find the Jacobian: Note that

y1 = g1 (x1 , x2 ) = 2x1 + x2 ,
y2 = g2 (x1 , x2 ) = x1 − 3x2 .

So

J (x1 , x2 ) = det [ 2  1 ; 1  −3 ] = −7.

 Step2: Solve for x1 , x2 and get

x1 = (3/7) y1 + (1/7) y2 ,
x2 = (1/7) y1 − (2/7) y2 .

 Step3: The joint pdf of Y1 , Y2 is given by the formula:

fY1,Y2 (y1 , y2 ) = fX1,X2 (x1 , x2 ) |J (x1 , x2 )|^{−1} = fX1,X2 ((3/7) y1 + (1/7) y2 , (1/7) y1 − (2/7) y2 ) · (1/7).

So we need to find the joint pdf of X1 and X2 .
∗ But since X1 ∼ N (0, 1) and X2 ∼ N (0, 4) are independent, then

fX1 (x1 ) = (1/√(2π)) e^{−x1²/2}   and   fX2 (x2 ) = (1/√(2 · 4 · π)) e^{−x2²/(2·4)} .

Thus by independence

fX1,X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ) = (1/√(2π)) e^{−x1²/2} · (1/√(8π)) e^{−x2²/8} .

∗ Thus we have

fY1,Y2 (y1 , y2 ) = (1/√(2π)) e^{−((3/7) y1 + (1/7) y2 )²/2} · (1/√(8π)) e^{−((1/7) y1 − (2/7) y2 )²/8} · (1/7).
• Example3 (if time): Suppose X1 , X2 have joint distribution

fX1,X2 (x1 , x2 ) = { x1 + (3/2) x2²   0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1
                    { 0               otherwise

Question: Find the joint pdf of Y1 = X1 + X2 and Y2 = X1² .
 Step1: Find the Jacobian: Note that

y1 = g1 (x1 , x2 ) = x1 + x2 ,
y2 = g2 (x1 , x2 ) = x1² .

So

J (x1 , x2 ) = det [ 1  1 ; 2x1  0 ] = −2x1 .

 Step2: Solve for x1 , x2 and get

x1 = √y2 ,
x2 = y1 − √y2 .

 Step3: The joint pdf of Y1 , Y2 is given by the formula:

fY1,Y2 (y1 , y2 ) = fX1,X2 (x1 , x2 ) |J (x1 , x2 )|^{−1}
                 = fX1,X2 (√y2 , y1 − √y2 ) · 1/|2√y2|
                 = { [√y2 + (3/2)(y1 − √y2 )²] / (2√y2 )   0 ≤ √y2 ≤ 1 and 0 ≤ y1 − √y2 ≤ 1
                   { 0                                      otherwise
CHAPTER 12

Expectations
12.1. Expectation of Sums of R.V.

Theorem 17. Let g : R² → R. If X, Y have joint pmf p(x, y) then

E [g (X, Y )] = Σ_y Σ_x g(x, y) p(x, y).

If X, Y have joint pdf f (x, y) then

E [g(X, Y )] = ∫_{−∞}^∞ ∫_{−∞}^∞ g(x, y) f (x, y) dx dy.

• Example1: Suppose the joint p.m.f. of X and Y is given by

X \ Y   0    2
0       .2   .7
1       0    .1

Find E [XY ].
 Solution: Using the formula with the function g(x, y) = xy :

E [XY ] = Σ_{i,j} xi yj p(xi , yj )
        = 0 · 0 · p(0, 0) + 1 · 0 · p(1, 0) + 0 · 2 · p(0, 2) + 1 · 2 · p(1, 2)
        = 0 · 0 · .2 + 1 · 0 · 0 + 0 · 2 · .7 + 1 · 2 · .1
        = .2
• Example2: Suppose X, Y are independent exponential r.v. with parameter λ = 1. Set up a double integral that represents E[X²Y ].
 Solution: Since X, Y are independent then

fX,Y (x, y) = e^{−x} e^{−y} = e^{−(x+y)} ,   0 < x, y < ∞.

 Then (DRAW FIRST)

E[X²Y ] = ∫_0^∞ ∫_0^∞ x² y e^{−(x+y)} dy dx.
• Example3: Suppose the joint pdf of X, Y is

f (x, y) = { 10xy²   0 < x < y, 0 < y < 1
           { 0       otherwise

Find EXY and Var (Y ).
 Solution:
 We first DRAW and then set up

EXY = ∫_0^1 ∫_0^y xy · 10xy² dx dy = 10 ∫_0^1 ∫_0^y x² y³ dx dy
    = (10/3) ∫_0^1 y³ y³ dy = (10/3)(1/7) = 10/21.

 First note that Var (Y ) = EY² − (EY )².
 Then

EY² = ∫_0^1 ∫_0^y y² · 10xy² dx dy = 10 ∫_0^1 ∫_0^y y⁴ x dx dy = 5 ∫_0^1 y⁴ y² dy = 5/7,

and

EY = ∫_0^1 ∫_0^y y · 10xy² dx dy = 10 ∫_0^1 ∫_0^y y³ x dx dy = 5 ∫_0^1 y³ y² dy = 5/6.

 So that Var (Y ) = 5/7 − (5/6)² = 5/252.

Theorem 18. (Properties of Expectation)


(a) E [X + Y ] = EX + EY
(b) If X ≤ Y then EX ≤ EY .

Proof. Part (a) was proved for the discrete case. So we only need to show the continuous case:
Z Z
E [X + Y ] = (x + y) f (x, y)dydx
Z Z Z Z
= xf (x, y)dydx + yf (x, y)dydx
Z Z
= xfX (x)dx + yfY (y)dy
= EX + EY.

• Example4: Let X1 , . . . , Xn be independent and identically distributed (i.i.d.) random variables. Suppose EXi = µ. We call the quantity

X̄ = Σ_{i=1}^n Xi / n

the sample mean. Compute E[X̄].

 Solution: We use the properties of expectation:

E[X̄] = E[ Σ_{i=1}^n Xi / n ]
      = (1/n) E [X1 + · · · + Xn ]
      = (1/n) (E [X1 ] + · · · + E [Xn ])
      = (1/n)(µ + · · · + µ) = nµ/n
      = µ.
 In statistics, the sample mean is used to estimate the actual mean of a distribution.

Theorem 19. If X, Y are independent then

E [XY ] = (EX) (EY ) .

Proof. In the continuous case we have

E [XY ] = ∫∫ xy fX,Y (x, y) dy dx = ∫∫ xy fX (x)fY (y) dy dx = ( ∫ x fX (x) dx )( ∫ y fY (y) dy ) = (EX) (EY ) .

The discrete case is the same, except replace integrals with summations. 
• In general, the following is true:

Theorem 20. If X, Y are independent and g, h : R → R then

E [g (X) h (Y )] = E [g (X)] E [h (Y )] .

12.2. Covariance and Correlations.

• Note that EX and VarX give information about a single random variable.
• What statistic can give us information about how X eects Y, or vice versa?

Definition. The covariance between X and Y is defined by

Cov (X, Y ) = E [(X − µX ) (Y − µY )] .


• After some algebra one can show that

Cov (X, Y ) = E [XY ] − EXEY.


• The covariance between two random variables give us information about relationship between the
random variables.
 Covariance is a measure of how much two random variables change together.
 If the greater values of one variable mainly correspond with the greater values of the other
variable, and the same holds for the lesser values, i.e., the variables tend to show similar
behavior, the covariance is positive.
∗ Thus covariance measures whether there is a linear relationship between X and Y.
 The sign of the covariance therefore shows the tendency in the linear relationship between the variables.
 For example, a scatter plot with a positive linear relationship between X and Y corresponds to Cov (X, Y ) > 0.
• Note: If X, Y are independent then Cov (X, Y ) = 0. (This is not true in the other direction.
Meaning Cov (X, Y ) = 0 does not imply that X, Y are independent!)
 So Cov (X, Y ) = 0 means they are uncorrelated.
• Properties:
 (i) Cov (X, Y ) = Cov (Y, X)
 (ii) Cov (X, X) = Var (X)
 (iii) Cov (aX, Y ) = a Cov (X, Y )
 (iv) Cov ( Σ_i Xi , Σ_j Yj ) = Σ_i Σ_j Cov (Xi , Yj ).

Theorem 21. (?) Formula for the variance of a sum:

Var (X + Y ) = Var (X) + Var (Y ) + 2 Cov (X, Y ) .

This gives us a formula for the variance of a sum of X1 , . . . , Xn :

Var ( Σ_{i=1}^n Xi ) = Σ_{i=1}^n Var (Xi ) + 2 Σ Σ_{i<j} Cov (Xi , Xj ) .

• Fact: Note that if X, Y are independent then

Var (X + Y ) = Var(X) + Var (Y ) .

• Finally we have the following; it's a standardized way to measure how correlated two random variables are:

Definition. The correlation coefficient of two random variables X and Y, denoted by ρ(X, Y ), is defined by

ρ (X, Y ) = Cov (X, Y ) / √(Var (X) Var (Y )) .

• Fact:
 (1) −1 ≤ ρ(X, Y ) ≤ 1
 (2) If ρ(X, Y ) = 1 then Y = a + bX where b = σy /σx > 0 (a straight positively sloped line)
 (3) If ρ(X, Y ) = −1 then Y = a + bX where b = −σy /σx < 0 (a straight negatively sloped line)
 (4) This ρ is a measure of linearity between Y and X .
∗ ρ > 0 positive linearity: Meaning that if you were to draw a line of best t, then it
must be a positive sloped line
· The closer ρ gets to 1, the more (X, Y ) seems to be in a positive sloped straight
line
∗ ρ < 0 negative linearity: Meaning that if you were to draw a line of best t, then it must
be a negative sloped line
· The closer ρ gets to −1, the more (X, Y ) seems to be in a negative sloped straight
line
 (5) If ρ (X, Y ) = 0, then X and Y are uncorrelated.
• Warning:
 ρ (X, Y ) does not pick up any other relationship, such as quadratic, or cubic
 ρ(X, Y ) is not the slope of the line of best fit. It simply tells us whether the relationship is positive or negative, and how strong it is.
• Example1:Suppose X, Y are random variables whose joint pdf is given by

f (x, y) = { 1/y   0 < y < 1, 0 < x < y
           { 0     otherwise

 Part (a): Find the covariance of X and Y.


 Part (b) Compute Var(X) and Var(Y ).
 Part (c) Calculate ρ(X, Y ).
 Solution:
 Part (a): Find the covariance of X and Y.

 Recall that Cov (X, Y ) = EXY − EX·EY . So

EXY = ∫_0^1 ∫_0^y xy (1/y) dx dy = ∫_0^1 y²/2 dy = 1/6,
EX  = ∫_0^1 ∫_0^y x (1/y) dx dy = ∫_0^1 y/2 dy = 1/4,
EY  = ∫_0^1 ∫_0^y y (1/y) dx dy = ∫_0^1 y dy = 1/2.

Thus

Cov (X, Y ) = EXY − EX·EY = 1/6 − (1/4)(1/2) = 1/24.
 Part (b): Compute Var(X) and Var(Y ).
 We have that

EX² = ∫_0^1 ∫_0^y x² (1/y) dx dy = ∫_0^1 y²/3 dy = 1/9,
EY² = ∫_0^1 ∫_0^y y² (1/y) dx dy = ∫_0^1 y² dy = 1/3.

 Thus recall that

Var (X) = EX² − (EX)² = 1/9 − (1/4)² = 7/144,
Var (Y ) = EY² − (EY )² = 1/3 − (1/2)² = 1/12.
 Part (c): Calculate ρ(X, Y ).
 We now use

ρ (X, Y ) = Cov (X, Y ) / √(Var (X) Var (Y )) = (1/24) / √((7/144)(1/12)) ≈ .6547.
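• Optional simulation (not part of the original notes): a Monte Carlo sketch of Example 1 in Python with numpy (an assumed dependency). The joint density f (x, y) = 1/y on 0 < x < y < 1 means Y ∼ Uniform(0, 1) and, given Y = y, X ∼ Uniform(0, y), so sampling is easy.

import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0, 1, 1_000_000)
x = rng.uniform(0, y)            # X | Y = y  ~  Uniform(0, y)

print(np.cov(x, y)[0, 1])        # ~ 1/24 ~ 0.0417
print(np.corrcoef(x, y)[0, 1])   # ~ 0.6547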
CHAPTER 13

Moment generating functions


13.1. Moment Generating Functions

• For each random variable X, we can define its moment generating function mX (t) by

mX (t) = E[e^{tX}]
       = { Σ_{xi} e^{t xi} p(xi ) ,      if X is discrete
         { ∫_{−∞}^∞ e^{tx} f (x) dx ,    if X is continuous.

• mX (t) is called the moment generating function (m.g.f.) because we can find all the moments of X by differentiating m(t) and then evaluating at t = 0.
• Note that

m′(t) = d/dt E[e^{tX}] = E[ d/dt e^{tX} ] = E[X e^{tX}] .

Now evaluate at t = 0 and get

m′(0) = E[X e^{0·X}] = E [X] .


 

• Similarly,

m″(t) = d/dt E[X e^{tX}] = E[X² e^{tX}]

so that

m″(0) = E[X² e^0] = E[X²].

Theorem 22. For all n≥0 we have

E [X n ] = m(n) (0) .

• Examples of Moment generating Functions


• Bernoulli: Recall that p(1) = p and p(0) = 1 − p. Thus

mX (t) = EetX = et·0 p(0) + et·1 p(1)


= pet + (1 − p).
93

• Binomial: Recall that X ∼ Bin(n, p) if X = Σ_{i=1}^n Yi where the Yi ∼ Bern(p) are independent, thus

mX (t) = E[e^{tX}] = E[e^{t Σ Yi}]
       = E[e^{tY1} · · · e^{tYn}]
       = E[e^{tY1}] · · · E[e^{tYn}] ,   by independence
       = (pe^t + (1 − p))^n .
• Poisson: If X ∼ Poisson(λ) then

mX (t) = E[e^{tX}] = Σ_{n=0}^∞ e^{tn} e^{−λ} λ^n/n! = e^{−λ} Σ_{n=0}^∞ (e^t λ)^n / n! ;

now recall from Calculus 2 that e^x = Σ_{n=0}^∞ x^n/n! , so that (with x = e^t λ)

mX (t) = e^{−λ} e^{e^t λ} = e^{e^t λ − λ} = exp(λ (e^t − 1)).


• Exponential: If X ∼ exp(λ) then

mX (t) = E[e^{tX}] = ∫_0^∞ e^{tx} λe^{−λx} dx = λ/(λ − t) ,

which is valid whenever t < λ.
• Standard Normal: If X ∼ N (0, 1) then

mX (t) = E[e^{tX}] = (1/√(2π)) ∫_{−∞}^∞ e^{tx} e^{−x²/2} dx = e^{t²/2} .

• Normal: If X ∼ N (µ, σ²) then X = µ + σZ, so that

mX (t) = E[e^{tX}] = E[e^{tµ} e^{tσZ}] = e^{tµ} E[e^{(tσ)Z}] = e^{tµ} mZ (tσ) = e^{tµ} e^{(tσ)²/2} = exp( tµ + t²σ²/2 ).

• Property: Suppose X, Y are independent; what is the m.g.f. of X + Y ?
 Let's try to figure it out:

mX+Y (t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] ,   by independence,
         = mX (t) mY (t).

 Thus we know that

mX+Y (t) = mX (t) mY (t).
• Note: Also note that if fX (x) is the pdf of a r.v., then its m.g.f. is

E[e^{tX}] = ∫ e^{tx} fX (x) dx.

This is similar to the Laplace transform of fX (x). [L[f ](s) = ∫ e^{−sx} fX (x) dx.]
 Recall that there is a one-to-one correspondence for Laplace transforms: the transform completely determines the function.

Theorem 23. If mX (t) = mY (t) < ∞ for all t in an interval, then X and Y have the same distribution.
That is, m.g.f 's completely determines the distribution.
• Example1: Suppose that the m.g.f. of X is given by m(t) = e^{3(e^t − 1)} . Find P (X = 0).
 Solution: (We want to work backwards.) Match this m.g.f. to a known m.g.f. in our table. Looks like

m(t) = e^{3(e^t − 1)} = e^{λ(e^t − 1)}   where λ = 3.

Thus X ∼ Poisson(3). Thus

P (X = 0) = e^{−λ} λ^0/0! = e^{−3} .
• Summary:
tX
(1) m(t) = Ee . We have a table of mgf of distributions:
(2) The m.g.f. helps us find moments: E [X^n] = m^{(n)} (0)
(3) If X, Y are independent then mX+Y (t) = mX (t)mY (t).
(4) The m.g.f. helps us determine the distribution of random variables. If mX (t) = mY (t) then
X and Y have the same distribution.
• Recall we had a section on sums of independent random variables.
• Example2: Recall X ∼ N(µx , σx²) and Y ∼ N (µy , σy²), independent. Then what is X + Y ∼ N (?, ?)
 Sol: Note that

mX+Y (t) = mX (t) mY (t) = exp( tµx + t²σx²/2 ) exp( tµy + t²σy²/2 ) = exp( t (µx + µy ) + t² (σx² + σy²)/2 ).

So then you look at our table and check which distribution has this m.g.f.: the one with µ = µx + µy and σ² = σx² + σy², so that X + Y ∼ N(µx + µy , σx² + σy²).

• Example3: Suppose X ∼ bin(n, p) and Y ∼ bin(m, p), independent. Then what is the distribution of X + Y ?
 Solution: We use

mX+Y (t) = mX (t) mY (t) = (pe^t + (1 − p))^n (pe^t + (1 − p))^m = (pe^t + (1 − p))^{n+m} .

Look at the table and see what distribution has this m.g.f. Thus

X + Y ∼ bin(n + m, p).
• Example4: Suppose X is a discrete random variable and has the m.g.f.

mX (t) = (1/7) e^{2t} + (3/7) e^{3t} + (2/7) e^{5t} + (1/7) e^{8t} .

Question: What is the p.m.f. of X ? Find EX .
 Solution(a): This doesn't match any of the known m.g.f.s. But we can read off from the mgf that since

(1/7) e^{2t} + (3/7) e^{3t} + (2/7) e^{5t} + (1/7) e^{8t} = Σ_{i=1}^4 e^{t xi} p(xi ) ,

then p(2) = 1/7, p(3) = 3/7, p(5) = 2/7 and p(8) = 1/7.
 Solution(b): First

m′(t) = (2/7) e^{2t} + (9/7) e^{3t} + (10/7) e^{5t} + (8/7) e^{8t} ,

so that

E [X] = m′(0) = 2/7 + 9/7 + 10/7 + 8/7 = 29/7.
• Example5: Suppose X has m.g.f.

mX (t) = (1 − 2t)^{−1/2}   for t < 1/2.

Find the first and second moments of X.
 Solution: We have

m′X (t) = −(1/2)(1 − 2t)^{−3/2} (−2) = (1 − 2t)^{−3/2} ,
m″X (t) = −(3/2)(1 − 2t)^{−5/2} (−2) = 3 (1 − 2t)^{−5/2} .

So that

EX = m′X (0) = (1 − 2 · 0)^{−3/2} = 1,
EX² = m″X (0) = 3 (1 − 2 · 0)^{−5/2} = 3.
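• Optional numerical check (not part of the original notes): a standard-library Python sketch estimating m′(0) and m″(0) of m(t) = (1 − 2t)^{−1/2} by central finite differences.

def m(t):
    return (1 - 2 * t) ** -0.5

h = 1e-4
first = (m(h) - m(-h)) / (2 * h)               # ~ EX   = 1
second = (m(h) - 2 * m(0) + m(-h)) / h ** 2    # ~ EX^2 = 3
print(first, second)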
CHAPTER 14

Limit Laws
14.1. The Central Limit Theorem

• The CLT is one of the most remarkable theorems in Probability.


 It helps us understand why the empirical frequencies of so many natural populations exhibit bell-shaped (normal) curves.
• Recall that i.i.d. means independent and identically distributed random variables.

Theorem 24. (CLT) Let X1 , X2 , X3 , . . . be i.i.d., each with mean µ and variance σ². Then the distribution of

(X1 + · · · + Xn − nµ) / (σ√n)

tends to the standard normal Z as n → ∞. That is,

P ( (X1 + · · · + Xn − nµ)/(σ√n) ≤ b ) ≈ P (Z ≤ b) = Φ(b)

when n is large.

• The CLT helps us approximate the probability of anything involving X1 + · · · + Xn where Xi are
independent and identically distributed.
• When approximating discrete distributions: USE the ±.5 continuity correction:
• Example1: If 10 fair dice are rolled, nd the approximate probability that the sum obtained is
between 30 and 40, inclusive.
 Solution: Let Xi denote the value of the i-th die. Recall that

E (Xi ) = 7/2 ,   Var(Xi ) = 35/12 .

Take

X = X1 + · · · + X10

to be their sum.
 Using the CLT we need

nµ = 10 · (7/2) = 35 ,
σ√n = √(350/12) ,

thus using the continuity correction,

P (29.5 ≤ X ≤ 40.5) = P ( (29.5 − 35)/√(350/12) ≤ (X − 35)/√(350/12) ≤ (40.5 − 35)/√(350/12) )
                    ≈ P (−1.0184 ≤ Z ≤ 1.0184)
                    = Φ (1.0184) − Φ (−1.0184)
                    = 2Φ (1.0184) − 1 = .692.
• Example2: An instructor has 1000 exams that will be graded in sequence.
 The times required to grade each exam are i.i.d. with µ = 20 minutes and SD σ = 4 minutes.
 Approximate the probability that the instructor will grade at least 25 exams in the first 450 minutes of work.
 Solution:
 Let Xi be the time it takes to grade exam i. Then

X = X1 + · · · + X25

is the time it takes to grade the first 25 exams. We want P (X ≤ 450).
 Use the CLT:

nµ = 25 · 20 = 500 ,
σ√n = 4√25 = 20.

 Thus

P (X ≤ 450) = P ( (X − 500)/20 ≤ (450 − 500)/20 )
            ≈ P (Z ≤ −2.5)
            = 1 − Φ(2.5)
            = .006.
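• Optional simulation (not part of the original notes): a standard-library Python sketch of Example 1, rolling 10 fair dice many times and comparing the empirical probability with the CLT estimate .692.

import random

N = 200_000
hits = sum(30 <= sum(random.randint(1, 6) for _ in range(10)) <= 40 for _ in range(N))
print(hits / N)   # ~0.69, close to the normal approximation .692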
