Probability Theory Lecture Notes Phanuel Mariano
Probability Theory Lecture Notes Phanuel Mariano
Phanuel Mariano
Contents
Chapter 1. Combinatorics 5
1.1. Counting Principle 5
1.2. Permutations 6
1.3. Combinations 7
1.4. Multinomial Coecients 9
Chapter 3. Independence 18
3.1. Independent Events 18
3
CONTENTS 4
Combinatorics
1.1. Counting Principle
• We need a way to help us count faster rather than counting by hand one by one.
• I like to use the box method. For example. Each box represent the number of possibilities in that
experiement.
• Example1: There are 20 teachers and 100 students in a school. How many ways can we pick a
teacher and student of the year?
Solution: Use the box Method: 20 × 100 = 2000.
• The counting principle can be generalized to any amount of experiments: n1 · · · nr possibilities
• Example2:
A college planning committee consists of 3 freshmen, 4 sophomores, 5 juniors, and 2 seniors.
A subcomittee of 4 consists 1 person from each class. How many?
Solution: Box method 3 × 4 × 5 × 2 = 120.
• Example3: How many dieren 6−place license plates are possible if the rst 3 places are to be
occupied by letters and the nals 3 by numbers?
Solution: 26 · 26 · 26 · 10 · 10 · 10 =?
Question: What if no repetition is allowed?
Solution:26 · 25 · 24 · 10 · 9 · 8
• Example4: How many functions dened on n points are possible if each functional value is either
0 or 1.
Solution: Box method on the 1, . . . , n points gives us 2n possible functions.
5
1.2. PERMUTATIONS 6
1.2. Permutations
n (n − 1) · · · 3 · 2 · 1 = n!
dierent permutations of the n objects.
(?) Note that ORDER matters when it comes to Permutations
9!
3!4!2!
• Example4: How many dierent letter arrangements can be formed from the word PEPPER?
6!
Answer: There 3 P 's 2 E 's and one R. So
3!2!1! = 30.
Fact. There are
n!
n1 ! · · · nr !
dierent permutations of n objects of which n1 are alike, n2 are alike, nr are alike.
• Example4: Suppose there are 4 Czech tennis players, 4 U.S. players, and 3 Russian players, in
how many ways could they be arranged?
11!
Answer: 4!4!3! .
1.3. COMBINATIONS 7
1.3. Combinations
5·4·3 5!
= = 10.
3! 3!2!
Or what we did was 5 · 4, or n(n − 1) · · · (n − r + 1) then divided by the repeats 3!.
5
This is often written , read 5 choose 3. More generally..
3
Fact. If r ≤ n, then
n n!
=
r (n − r)!r!
and say n choose r, represents the number of possible combinations of objects taken r at a time.
(?) Order DOES NOT Matter here
Proof. To see this, the left hand side is (x + y)(x + y) · · · (x + y). This will be the sum of 2n terms,
and each term will have n factors. How many terms have k x's and n − k y 's? This is the same as asking
in a sequence of n positions, how many ways can one choose k of them in which to put x's? (Box it) The
n k n−k n
answer is , so the coecient of x y should be .
k k
3
• Example: Expand(x + y) .
3
Solution: (x + y) = y 3 + 3xy 2 + 3x2 y + x3 .
• Problem: Using Combinatorics: Let's prove
10 9 9
= +
4 3 4
with no algebra:
The LHS represents the number of committees having 4 people out of the 10.
Let's say the President of the university will be in one of these committees and he's special,
so we want to know when he'll be there
or not.
9
When he's there, then there are 1· is the number of ways that contain the President
3
9
while is the number of comittees that do not contain the President and contain 4 out
4
of the remaining people.
• The more general equation is
n n−1 n−1
= +
r r−1 r
1.4. MULTINOMIAL COEFFICIENTS 9
• Example: Suppose one has 9 people and one wants to divide them into one committee of 3, one
of 4, and a last of 2. How many dierent ways are there?
9
Solution: (Box it) There are ways of choosing the rst committee. Once that is done,
3
6
there are 6 people left and there are ways of choosing the second committee. Once
4
that is done, the remainder must go in the third committee. So there is 1 one to choose that.
So the answer is
9! 6! 9!
= .
3!6! 4!2! 3!4!2!
• In general: Divide n objects into one group of n1 , one group of n2 , . . . and a k th group of nk ,
where n = n1 + · · · + nk , the answer is there are
n!
ways.
n1 !n2 ! · · · nk !
• These are known as multinomial coecients. We write them as
n n!
= .
n1 , n2 , . . . , nk n1 !n2 ! · · · nk !
• Example: Suppose we are to assign Police ocers their duties . Out of 10 ocers: 6 patrols, 2 in
station, 2 in schools.
10!
Answer: 6!2!2! .
• Example: There are 10 ags:5 indistinguishable Blue ags, 3 indistinguishable Red ags, and 2
indistinguishable Yellow ags. How may dierent ways can we order them on a ag pole?
10!
Answer: 5!3!2! .
• Example: Suppose one has 8 indistinguishable balls. How many ways can one put them in 3
boxes?
Solution1: Let us make sequences of o's and |'s; any such sequence that has | at each side, 2
other |'s, and 8 o's represents a way of arranging balls into boxes. For example, if one has
| oo | ooo | ooo | .
How many dierent ways can we arrange this where we have start with | and end with |. In
between, we are only arranging 8 + 2 = 10 symbols, of which only 8 are o's
So the question is: How many ways out of 10 spaces can one pick 8 of them into which to
put an
o?
10
.
8
9
Solution2: Look at spaces between. There are 9 spaces. So + 9.
2
CHAPTER 2
Axioms of Probability
2.1. Sample Space and Events
• We will have a sample space, denoted S (sometimes Ω, or U ) that consists of all possible outcomes
from an experiment.
Example1:
∗ Experiment: Roll two dice,
∗ Sample Space: S = would be all possible pairs made up of the numbers one through six.
List it here.{(i, j) : i, j = 1, . . . 6}. 36 points.
Example 2:
∗ Experiment: Toss a coin twice
∗ S = {HH, HT, T H, T T }}
Example3:
∗ Experiment: Measuring the number of accidents of a random person before they had
turn 18.
· S = {0, 1, 2, . . . }
Others:
∗ Let S be the possible orders in which 5 horses nish in a horse race;
∗ Let S be the possible price of some stock at closing time today; or S = [0, ∞) ;
∗ The age at which someone dies, S = [0, ∞) .
• Events: An event A is a subset of S . In this case we use the notation A ⊂ S , to mean A is a
subset of S.
A ∪ B : points in S such that is in A OR B OR BOTH.
A ∩ B , points in A AND B . (you may also see AB )
0
Ac is the compliment of A, the points Sn NOT in A T.n(you may also see A )
Can extend to A1 , . . . , An events. i=1 Ai and i=1 Ai .
10
2.1. SAMPLE SPACE AND EVENTS 11
• Example1: Roll two dice.
Example of an Events
E =the two dies come up even and equal {(2, 2) , (4, 4) , (6, 6)}
F = the sum of the two dice is 8. {(2, 6) , (3, 5) , (4, 4) , (5, 3) , (6, 2)}.
E ∪ F = {(2, 2) , (2, 6) , (3, 5) , (4, 4) , (5, 3) , (6, 2) , (6, 6)}
E ∩ F = {(4, 4)}.
F c all the 31 other ways that does not include {(2, 6) , (3, 5) , (4, 4) , (5, 3) , (6, 2)}.
• Example2: S = [0, ∞) age someone dies.
Event A = person dies before they reached 30.
∗ A = [0, 30).
Interpret Ac = [30, ∞)
∗ The person dies after they turned 30.
B = (15, 45). Do A ∪ B, A ∩ B and so on.
• Properties: Events also have commutative and associate and Distributive laws.
• What is A ∪ Ac ? = S .
• DeMorgan's Law:
c
(A ∪ B) = Ac ∩ B c .Try to draw a picture
c
(A ∩ B) = Ac ∪ B c .
c c
This works for general A1 , . . . , An : (∪n n c n n c
i=1 Ai ) = ∩i=1 Ai and (∩i=1 Ai ) = ∪i=1 Ai .
• The empty set ∅ = {} is the set that has nothing in it.
• A and B are disjoint if A ∩ B = ∅.
In Probability we may say that events A and B are mututally exclusive if they are disjoint.
mutually exclusive means the same thing as disjoint
2.2. AXIOMS OF PROBABILITY 12
P (∪ni=1 Ai ) = P (∪∞
i=1 Ai )
n
X ∞
X
= P (Ai ) + P (∅)
i=1 n=1
Xn ∞
X
== P (Ai ) + 0
i=1 n=1
n
X
= P (Ai )
i=1
1 = P (S) = P (E) + P (E c ) ,
hence P(E c ) = 1 − P(E).
c
(d) If E ⊂ F, then write F = E ∪ (F ∩ E ) thus since this is disjoint
P (E ∪ F ) = P (E) + P (E c ∩ F ) .
Now write F (with picture) as F = (E ∩ F ) ∪ (E c ∩ F ) and using disjointness
P (F ) = P (E ∩ F ) + P (E ∩ F ) =⇒ P (E c ∩ F ) = P (F ) − P (E ∩ F ) ,
c
P (E ∪ F ) = P (E) + P (E c ∩ F )
= P (E) + P (F ) − P (E ∩ F ) ,
as needed.
• Example: Uconn Basketball is playing Kentucky this year.
Home game has .5 chance of winning
Away game has .4 chance of winning.
.3 that uconn wins both games.
What's the probability that Uconn loses both games?
Answer.
∗ Let P (A1 ) = .5 , P (A2 ) = .4 and P (A1 ∩ A2 ) = .3.
∗ We want to nd P (Ac1 ∩ Ac2 ). Simplify as much as we can:
c
P (Ac1 ∩ Ac2 ) = P ((A1 ∪ A2 ) ) by DeMorgan's Law
= 1 − P (A1 ∪ A2 ) , by Proposition 1c
2.2. AXIOMS OF PROBABILITY 14
P (A1 ∪ A2 ) = .5 + .4 − .3 = .6,
c
Hence P (A1 ∩ Ac2 ) = 1 − .6 = .4 as needed.
2.3. EQUALLY LIKELY OUTCOMES 15
• In many experiments, a probability space consists of nitely many points, all with equally likely
probabilities.
1
Basic example was a tossing a coin P (H) = P (T ) = 2
Fair die: P (i) = 61 for i = 1, . . . , 6.
• In this case from Axiom 3 we have that
number of outcomes in E
P (E) = .
number of outcomes in S
• Example1: What is the probability that if we roll 2 dice, the sum is 7?
Answer: There are 36 total outcomes , of which 6 have a sum of 7:
∗ E = ”sum is 7” = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}. Since they are all equally
6
likely, the probability is P (E) = 6·6 = 16 .
• Example 2: If 3 balls are randomly drawn from a bowl containing 6 white and 5 black balls,
what is the probability that one ball is white and the other two are black?
Method 1: (regard as a ordered selection)
W BB + BW B + BBW
P (E) =
11 · 10 · 9
6·5·4+5·6·4+5·4·6 120 + 120 + 120 4
= = = .
990 990 11
Method2: (Regard as unordered set of drawn balls)
6 5
(1 white) (2 black) 1 2 4
P (E) = = = .
11 11 11
3 3
• We can always choose which way to regard our experiements.
• Example 3 A committee of 5 is to selected from a group of 6 men and 9 women. What is probability
consistsd of 3 men and 2 women
6 9
3 2
Answer: Easy men·women
all = 240
= 1001 .
15
5
• Example 4: Seven balls are randomly withdrawn from an urn that contains 12 red, 16 blue, and
18 green.
(b) Find probability that at least 2 red balls are withdrawn;
Ans: Let E be this event then P (E) = 1−P (E c ), P (at least 2 red) = 1−P (drawing 0 or 1 balls).
Now
16 + 18 = 34 12 34
7 1 6
P (drawing 0 or 1 red balls) = + .
46 46
7 7
• Explanation of Poker/Playing cards : Ranks and suits,etc!
There are 52 cards in a standard deck of playing cards. The poker hand is consists of ve
cards. There are 4 suits : heats, spades, diamonds, and clubs (♥♠♦♣). The suits diamonds
2.3. EQUALLY LIKELY OUTCOMES 16
and hearts are red while clubs and spades are black. In each suit there are 13 ranks : the
numbers 2, 3 . . . , 10, the face cards, Jack, Queen, King, and the Ace(not a face card).
• Example 5: What is the probability that in a poker hand (5 cards out of 52) we get exactly 4 of
a kind?
4 4
Answer: Consider 4 aces and 1 king: AAAK = . But JJJJ3 is the same
4 1
probability.
∗ Thus there are 13 ways to pick the rst rank, and 12 ways to pick the second rank
363
dierent birthday from the rst two people is
365 . So the answer is
P (at least 2 people) = 1 − P (Everyone dierent birthday)
365 364 363 (365 − 31)
= 1− · · ···
365 365 365 365
364 363 334
= 1−1· · ··· ≈ 0.752374.
365 365 365
Really High!!!
CHAPTER 3
Independence
3.1. Independent Events
P (E ∩ F ) = P (E) P (F ) .
• Example1: Suppose you ip two coins.
The event that you get heads on the second coin is independent of the event that you get tails
on the rst.
This is why: Let At be the event of getting is tails for the rst coin and Bh is the event
of getting heads for the second coin, and we assume we have fair coins (although this is not
necessary), then
1
P (At ∩ Bh ) = , list out all outcomes
4
11 1
P (At ) P (Bh ) = = .
22 4
• Example2: Experiment: Draw a card from an ordinary deck of cards
Let A = draw ace, S = draw a spade.
∗ These are independent events since you're taking one at a time, so one doesn't eect the
other. To see this using the denition we have compute
1 1
∗ P (A) P (S) = 13 4.
1
∗ White P (A ∩ S) = 52 since there is only 1 Ace of spades.
Proof. Draw a Venn Diagram to help with the computation, but note that
P (E ∩ F c ) = P (E) − P (E ∩ F )
= P (E) − P (E) P (F )
= P (E) (1 − P (F ))
= P (E) P (F c ) .
• Remark: Independence and mutually exclusive, are two dierent things!
18
3.1. INDEPENDENT EVENTS 19
S7 = {sum is 7}
A4 = {rst die is a 4}
B3 = {second die is a 3}
Are the events S7 , A4 , B3 independent?
∗ Compute
1
P (S7 ∩ A4 ∩ B3 ) = P ({(4, 3)}) =
36
but
6 11 1
P (S7 ) P (A4 ) P (B3 ) = = .
36 6 6 36 · 6
• Remark: This generalizes to events A1 , . . . , A n . We say events
T A1 , . . . , An are independent if for
r Qr
all subcollections i1 , . . . , ir ∈ {1, . . . , n} we have that P j=1 Aij = j=1 P Aij .
• Example:
An urn contains 10 balls: 4 red and 6 blue.
A second urn contains 16 red balls and an unknown number of blue balls.
A single ball is drawn from each urn. The probability that both balls are the same color is
0.44.
Question: Calculate the number of blue balls in the second urn.
Solution: Let Ri = even that a red ball is drawn from urn i and let Bi =event that a blue
ball is drawn from urn i.
∗ Let x be the number of blue balls in urn 2,
∗ Note that drawing from urn 1 and independent from drawing from urn 2. They are
completely dierent urns! They shouldn't eect the other.
∗ Then
[
.44 = P (R1 ∩ R2 ) (B1 ∩ B2 ) = P (R1 ∩ R2 ) + P (B1 ∩ B2 )
= P (R1 ) P (R2 ) + P (B1 ) P (B2 ) , by independence
4 16 6 x
= + .
10 x + 16 10 x + 16
∗ This tellls you that the slows are constant. What does that tell you about p(x)? It's a
line!
x
· Thus we must have p(x) = 200 .
1
∗ Thus p(50) = 4.
• Example (A variation of Gambler's ruin)
Problem: Suppose we are in the same situation, but you are allowed to go arbitrarily far in
debt. Let p(x)be the probability you ever get to $200. What is a formula for p(x)?
∗ Answer: Just as before p(x) = 12 p(x + 1) + 12 p(x − 1). So that p(x) is linear.
∗ But now all we have is that p(200) = 1 and linear and domain is (−∞, 200).
∗ Draw a graph: Now the slope, or p0 (x) can't be negative, or else we would have it that
p(x) > 1 for x ∈ (−∞, 200).
· The slope can't be positive or else we would get p(x) < 0 for x ∈ (−∞, 200).
∗ Thus we must have that p(x) ≡ constant. Hence p(x) = 1 for all x ∈ (−∞.200).
∗ Sol: So we are certain to get $200 if we cna get into debt.
Method2:
∗ Just compute There is nothing special about the gure 200. Another way of seeing this
is to compute as above the probability of getting to 200 before −M and then letting
M → ∞.
· We would get p(x) is a line with p(−M ) = 0 and p(200) = 1 so that
1−0
p(x) − 0 = (x − (−M ))
200 − (−M )
x+M
and letting M →∞ wee see that p(x) = 200+M → 1.
• Example: Experiment: Roll 10 dice.
What is the probability that exactly 4 twos will show if you roll 10 dice?
Answer: These are independent. The probability that the 1st, 2nd, 3rd, and 10th dice will
1 3 5 7
show a three and the other 6 will not is
6 6 .
Independence is used here: the probability is 16 16 16 65 65 56 65 65 56 61 . Note that the probability
that the 10th, 9th, 8th, and 7th dice will show a two and the other 6 will not has the same
probability.
1 4 5 6
So to answer our original question, we take 6 6 and multiply it by the number of ways
10
of choosing 4 dice out of 10 to be the ones showing the twos. There are ways to do
3
10 1 4
5 6
this
4 6 6 .
• This is an example of Bernoulli trials, or the Binomial distribution.
3.1. INDEPENDENT EVENTS 21
If we have n independent trials, where the probability of success if p. The probability that
there are k successes in n trials is
n n−k
pk (1 − p) .
k
CHAPTER 4
P (E ∩ F )
P (E | F ) = .
P (F )
Now P (E | F ) is read the probability of E given F .
• Note that P (E ∩ F ) = P (E | F ) P (F )!
• This is the conditional probability that E occurs given that F has already occured!
• Remark: Suppose P (E | F ) = P(E) , i.e. knowing F doesn't help predict E . Then this implies
P(E∩F )
that E and F are independent of each other. Rearranging P (E | F ) =
P(F ) = P (E) we see that
P (E ∩ F ) = P(E)P(F ).
• Example1: Experiment: Roll two dice.
(a) What is the probability the sum is 8?
5
∗ Solution: Note that A = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)} 36 .
so we know P (A) =
(b) What is the probability that the sum is 8 given that the rst die shows a 3? (In other
words, nd P (A | B))
∗ Solution: Let B = {rst die shows three}.
1
∗ P (A ∩ B) = P ({(3, 5)}) = 36 is probability that the rst die shows a 3 and the sum is
8
∗ Finally we can compute
1/36 1
P (A | B) = P (sum is 8 | 1st is a 3) = = .
1/6 6
• Remark: When computing P (E | F ), Sometime its easier to work with the reduced sample space
F ⊂ S.
22
4.1. CONDITIONAL PROBABILITIES 23
P (sum is 8 | 1st is a 3)
we could have worked in the smaller sample space of {1st is a 3} = {(3, 1) , (3, 2) , (3, 3) , (3, 4) , (3, 5) , (3, 6)}.
Since only (3, 5) begins with a 3 and has the sum of 8, then the probability is
P (U ∩ M c )
P (U nion | N ot Monteith) =
P (M c )
P (U )
= , since U ⊂ Mc
1 − P (M )
4/10 2
= = .
6/10 3
• Example4: Suppose that Annabelle and Bobby each draw 13 cards from a standard deck of 52.
Given that Sarah has exactly two aces, what is the probability that Bobby has exactly one ace?
Solution: Let A be the event Annabelle has two aces," and let B be the event Bobby has
exactly one ace." Again, we want
P (B | A), so we calculate
and
P(A) P(A ∩ B). Annabelle
52 4 48
could have any of possible hands. Of these hands, · will have exactly
13 2 11
4.1. CONDITIONAL PROBABILITIES 24
two aces, so
4 48
·
2 11
P (A) = .
52
13
Now the number of ways in which Annabelle can have a certain hand and Bobby can have a
52 39
certain hand is · , and the number of ways in which A and B can both occur
13 13
4 48 2 37
is · · · . so
2 11 1 12
4 48 2 37
· · ·
2 11 1 12
P(A ∩ B) = .
52 39
·
13 13
Therefore,
4 ·
48 ·
2 37
·
2 11 1 12
52 ·
39
P (A ∩ B) 13 13
P (B | A) = =
P(A)
4 ·
48
2 11
52
13
2 37
·
1 12
= .
39
13
• P (B | A) = P(A∩B)
Note that since
P(A) then P (A ∩ B) = P(A)P (B | A).
In general: If E1 , . . . , En are events then
• Example5:
Experiment: Suppose an urn has 5 White balls and 7 Black balls. Each ball that is selected is
returned to the urn along with an additional ball of the same color. Suppose draw 3 balls.
Part (a): What is the probability that you get 3 white balls.
∗ Then
P (A ∩ C) = P (C) P (A | C)
1 1 1
= · = .
2 7 14
• Example 7: A total of 500 married couples are poled about salaries:
Wife Husband makes less than 25,000 Husband makes more than 25,000
• Sometimes it's easier to compute a probability once we know something has or has not happened.
• Note that we can compute,
P (E) = P (E ∩ F ) + P (E ∩ F c )
= P (E | F ) P (F ) + P (E | F c ) P (F c )
= P (E | F ) P (F ) + P (E | F c ) (1 − P (F )) .
• This formula is called: The Law of Total Probability:
P (E) = P (E | F ) P (F ) + P (E | F c ) (1 − P (F ))
• The following problem will describe the types of problems of this section.
• Example1: Insurance company believes
The probability that an accident prone person has an accident within a year is .4.
The probability that Non-accident prone person has an accident with year is .2.
30% of the population is accident prone.
Part (a): Find P (A1 ) where A1 =new policy holder will have an accident within a year?
∗ Let A = {Policy holder IS accident prone.}
P (A ∩ A1 )
P (A | A1 ) =
P (A1 )
P (A) P (A1 | A)
=
.26
(.3) (.4) 6
= = .
.26 13
• In general:
So in Part (a) we had to break a probability into two cases: If F1 , . . . , Fn are mutually exclusive
Sn
events such that they make up everythinn S= i=1 Fi then
n
X
P (E) = P (E | Fi ) P (Fi ) .
i=1
∗ This is called Law of Total Probability.
In Part (b), we wanted to nd a probability of a separate conditional event: then
P (E | Fj ) P (Fj )
P (Fj | E) = Pn .
i=1 P (E | Fi ) P (Fi )
∗ This is known as Baye's Formula
∗ Note that the denominator of the Bayes's formula is the Law of total probability.
• Example2: Suppose the test for HIV is
98% accurate in both directions
0.5% of the population is HIV positive.
4.2. BAYES'S FORMULA 27
Question: If someone tests positive, what is the probability they actually are HIV positive?
Solution: Let T+ = {tests positive} , T− = {tests negative}, while + = {actually HIV positive,}
− = {actually negative}.
∗ Want
P (+ ∩ T+ )
P (+ | T+ ) =
P (T+ )
P (T+ | +) P (+)
=
P (T+ | +) P (+) + P (T+ | −) P (−)
(.98) (.005)
=
(.98) (.005) + .02 (.995)
= 19.8%.
• Example3: Suppose
30% of the women in a class received an A on the test
25% of the men/or else received an A.
60% of the class are women.
Question: Given that a person chosen at random received an A, what is the probability this
person is a women?
∗ Solution: Let A the event that a students receives an A. Let W =being a women,
M =not a women. Want
P (A | W ) P (W )
P (W | A) = , by Bayes's
P (A | W ) P (W ) + P (A | M ) P (M )
.3 (.6) .18
= = ≈ .64.
.3 (.6) + .25 (.4) .28
• (General Baye's Theorem) Here's one with more than 3 possibilities:
• Example4: Suppose in Factory with Machines I,II,III producing Iphones
Machines I,II,III produce 2%,1%, and 3% defective iphones, respectively.
Out of total production, Machines I makes 35% of all Iphones, II -25%, III - 40%.
If one Iphone is selected at random from the factory,
Part (a): what is probability that one Iphone selected is defective?
P (III) P (D | III)
P (III | D) =
P (D)
(.4) (.03) 120
= = .
215/10, 000 215
• Example5: In a Multiple Choice Test, students either knows the answer or randomly guesses the
answer to a question.
Let m =number of choices in a question.
4.2. BAYES'S FORMULA 28
Let p = the probability that the students knows the answer to a question.
Question: What is the probability that the student actually knew the answer, given that the
student answers correctly.
Solution:
Let K = {Knows the answer} and C = {Answer's correctly}. Then
P (C | K) P (K)
P (K | C) =
P (C | K) P (K) + P (C | K c ) P (K c )
1·p mp
= 1 = .
1 · p + m (1 − p) 1 + (m − 1)p
CHAPTER 5
Random Variables
5.1. Random Variables
• When we perform an experiment, we are interested in some function of the outcomes, instead of
the actual outcome.
We want to attach for each outcome, a numerial value.
• Denition: A random variable is a function X:S→R or write X : Ω → R. (Use capital letters
to denote r.v)
We can think of X as a numerical value that is random, like as if X is a random number.
• Example: Toss a coin
Let X be 1 if heads and X = 0 if tails
Then X (H) = 1 and X (T ) = 0.
We can do calculus on real numbers but not on Ω = S = {H, T }.
• Example: Roll a die
Let X denote the outcome, so X = 1, 2, 3, 4, 5, 6 (its random)
That is X(1) = 1, X(2) = 2, . . . .
• Example: Roll a die, dene
(
1 outomce= odd
Y =
0 outomce= even
Can be thought of as
(
1 s = odd
Y (s) = .
0 s = even
29
5.1. RANDOM VARIABLES 30
Definition. A random variable that can take on at most countable number of possible values is said
to be a discrete r.v.
Definition. For a discrete random variable, we can dene the probability mass function (pmf ), or
the density function of X by p(x) = P (X = x). Note that p : R → [0, 1].
• Note that (X = x) = (ω ∈ Ω | X (ω) = x) is an abbreviation.
• Let X x1 , x2 , x3 . . .
assume only the values
In other words, X : S → {x1 , x2 , . . . }
Properties of a pmf p(x):
∗ Note that we must have 0 < p(xi ) ≤ 1 for ,i = 1, 2, . . . . and p (x) = 0 for all other values
of x can't attain.
∗ Also must have
X∞
p(xi ) = 1.
i=1
• We often draw bar graphs for discrete r.v.
• Example: If we toss a coin
X = 1 if we have H and X = 0 if we have T .
Then draw a BAR graph
1
2
x=0
1
pX (x) = x = 1,
2
0 otherwise
• Oftentimes someone has already found the pmf for you, and you can use to compute probabilities.
i
• Example: The pmf of X is given by p(i) = e−λ λi! for i = 0, 1, 2, . . . where λ is a parameter(what
is this?) that is any positive number
Part (a) What values can the random variable X attain? In other words, what is the range
of X?
0
∗ Sol: By denition we have P (X = 0) = p(0) = e−λ λ0! = e−λ
Part (b) Find P (X = 0)
0
∗ Sol: By denition we have P (X = 0) = p(0) = e−λ λ0! = e−λ
Part (c) Find P (X > 2)
∗ Sol: Note that
P (X > 2) = 1 − P (X ≤ 2)
= 1 − P (X = 0) − P (X = 1) − P (X = 2)
= 1 − p(0) − p(1) − p(2)
λ2 e−λ
= 1 − e−λ − λe−λ − .
2
5.3. EXPECTED VALUE 32
• One of the most important concepts in probability is that of expectation. If X is a random variable
that what is the average value of X, that is what is the expected value of X.
Definition. Let X have a pmf p(x). We dene the expectation, or expected value of X to be
X
E [X] = xp(x).
x:p(x)>0
• Notation EX , or EX .
• Example1: Let X(H) = 0 and X (T ) = 1. What is EX ?
EX = 0 · p(0) + 1 · p(1)
1 1 1
= 0 +1· = .
2 2 2
• Example2: Let X be the outcome when we roll a fair die. What is EX ?
1 1 1
EX = 1 +2 + ··· + 6
6 6 6
1 21 7
= (1 + 2 + 3 + 4 + 5 + 6) = = = 3.5
6 6 2
Note that X can never be 3.5 , so expectation is to give you an idea, what an exact.
• Recall innite series: If 0 ≤ x < 1 then a geometric series is
∞
X
xn = 1 + x + x2 + x3 + · · ·
n=0
1
= .
1−x
One thing you can do with series is dierentiate them and integrate them: So if
1
1 + x + x2 + x3 + · · · + =
1−x
then
1
0 + 1 + 2x + 3x2 + · · · + = 2
(1 − x)
• Example3: Let X be the number or tornados in Connecticut per year. Meaning that the random
variable X can be any number X = 0, 1, 2, 3, . . . . Suppose the state of Connecticut did some
analysis and found out that
1
P (X = i) = .
2i+1
Question: What is EX ? That is, what is the expected number of tornados per year in
Connecticut.
Solution: Note that X is innite, but still countable, hence still discrete.
5.3. EXPECTED VALUE 33
Note that 1
2
i = 0,
1
4
i = 1,
1
p(i) = 8 i = 2,
. .
. .
. .
1
2n+1 i = n.
We have that
EX = 0 · p(0) + 1 · p(1) + 2 · p(2) + · · ·
1 1 1 1
= 0 · + 1 2 + 2 3 + 3 4 + ···
2 2 2 2
1 1 1
= 2
1 + 2 + 3 2 + ···
2 2 2
1 1
1 + 2x + 3x2 + · · · , withx =
=
4 2
1 1 1
= = 2 = 1.
4 (1 − x)2 4 1− 1 2
5.4. THE C.D.F. 34
• We sometimes use the notation FX (x) to highlight that FX is the CDF of the random variable X .
• Example: Suppose X is equals to the number of heads in 3 coin ips. From Section 5.1, we
calculated the p.m.f to be.:
1
p(0) = P (X = 0) =
8
3
p(1) = P (X = 1) =
8
3
p(2) = P (X = 2) =
8
1
p(3) = P (X = 3) = .
8
Question: Find the c.d.f of X . Plot the graph of the c.d.f.
Solution: Summing up the probabilities up to that value of x we get the following:
0 −∞ < x < 0
1
0≤x<1
8
4
F (x) = 8 1≤x<2 .
7
2≤x<3
8
1 3≤x<∞
The graph is given by
∗
5.4. THE C.D.F. 35
Proposition 3. Let FX (x) be the CDF for some random variable X. Then the following holds:
(a) For any a ∈ R, we have P (X < a) = limx→a− FX (x)
(b) For any a ∈ R, we have P (X = a) = FX (a) − limx→a− FX (x)
Proof. For part (a).
We rst write
∞
[ 1
(X < a) = X ≤a−
n=1
n
"∞ #
[ [ 1 1
= (X ≤ a − 1) a− <X ≤a−
n=1
n n+1
and since the events En = a − n1 ≤ X ≤ a − n+1
1
are disjoint then we can use Axiom 3 so prove that
∞
X 1 1
P (X < a) = P (X ≤ a − 1) + P a− <X ≤a−
n=1
n n+1
k
X 1 1
= P (X ≤ a − 1) + lim P X ≤a− −P X ≤a−
k→∞
n=1
n+1 n
1
= P (X ≤ a − 1) + lim P X ≤ a − − P (X ≤ a − 1) , by telescoping
k→∞ k+1
1
= lim P X ≤ a − + P (X ≤ a − 1) − P (X ≤ a − 1)
k→∞ k+1
1
= lim FX a − .
n→∞ n
Now you can replace the sequence an = a − n1 with any sequence an that is increasing towards a, and we get
the similar result,
since this holds for all increasing sequences an towards a, then we've shown that
P (X = a) = P (X ≤ a) − P (X < a)
= FX (a) − lim− FX (x).
x→a
• Example: Let X have distribution
0 x<0
x
0≤x<1
2
2
F (x) = 3 1≤x<2
11
2≤x<3
12
1 3 ≤ x.
Graph this and answer the following:
Part (a): Compute P(2 < X ≤ 4). We have that
Example: Let S = {1, 2, 3, 4, 5, 6} and X(1) = X(2) = 1 and X(3) = X(4) = 3 and
X(5) = X(6) = 5
∗ Def1: We know X = 1, 3, 5 with p(1) = p(3) = p(5) = 13
∗ Then EX = 1 · 31 + 3 13 + 5 13 = 39 = 3.
∗ Def2: We list all of S = {1, 2, 3, 4, 5, 6} and
∗ Then
EX = X(1)P ({1}) + · · · + X(6) · P ({6})
1 1 1 1 1 1
= 1 + 1 + 3 + 3 + 5 + 5 = 3.
6 6 6 6 6 6
• Dierence
Def1: We list all the values that X can attain and only care about those. (Range)
Def2: List all possible outcomes. (Domain)
Proposition 4. If X is a discrete random variable and S is countable, then the two denitions are
equivalent
where I used that each Si = {ω : X(ω) = xi } are mutually exclusinve events that union up to S.
5.5. EXPECTATED VALUE OF SUMS OF RANDOM VARIABLES 38
Theorem 5. (Linearity) If X and Y are discrete random variables and a∈R then
(a) E [X + Y ] = EX + EY .
(b) E [aX] = aEX .
Proof. We have that
X
E [X + Y ] = (X(ω) + Y (ω)) P (ω)
ω∈S
X
= (X(ω)P (ω) + Y (ω)P (ω))
ω∈S
X X
= X(ω)P (ω) + Y (ω)P (ω)
ω∈S ω∈S
= EX + EY.
If a∈R then
X
E [aX] = (aX(ω)) P (ω)
ω∈S
X
= a X(ω)P (ω)
ω∈S
= aEX.
• Generality: Linearity is true for general random variable X1 , X2 , . . . , Xn .
5.6. EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE 39
P (X = −1) = .2,
P (X = 0) = .5
P (X = 1) = .3
2
Let Y =X . Find EY . n o
2 2
Solution: Note that Y = 02 , (−1) , (1) = {0, 1}.
Note that pY (1) = .2 + .3 = .5 and pY (0) = .5.
Thus EY = 0 · .5 + 1 · .5 = .5.
• IMPORTANT:
Note that EX 2 = .5 .
2
While (EX) = .01. Not equal!
∗ Since EX = .3 − .2 = .1. Thus
2
EX 2 6= (EX) .
• In general, there is a formula for g(X) where g is function. That use the fact that g(X) will be
g(x) for some x such that X = x.
Theorem 6. If X is a discrete random varianle that takes values X ∈ {x1 , x2 , x3 , . . . } with respective
probability mass function p(xi ), then for any real valued function g : R → R we have that
∞
X
E [g (X)] = g (xi ) p(xi ).
i=1
Proof. The random variable Y = g(X) can take on values, say Y = y1 , y2 , . . . . But we know that
yj = g(xi )
and as we see there could be more than one value xi such that yj = g(xi ). Thus we will group this sum into
this fashion: Using the denition of expectation we have that
X
E [Y ] = yj P (Y = yj )
j
X
= yj P (g(X) = yj )
j
= (?).
Now
[
P (g(X) = yj ) = P (g(xi ) = yj )
i:g(xi )=yj
X
= p(xi ).
i:g(xi )=yj
5.6. EXPECTATION OF A FUNCTION OF A RANDOM VARIABLE 40
P (X = −1) = .2,
P (X = 0) = .5
P (X = 1) = .3
2
Let Y =X . Find EY .
2
EX 2 = x2i p(xi ) = (−1) (.2) + 02 (.5) + 12 (.3) = .5.
P
Sol: We have that
5.7. Variance
• The variance of a r.v. is a measure of how spread out the values of X are.
• The expectation of a r.v. is quantity that help us dierentiate dierent r.v.'s, but it doesn't tell us
how spread out values are.
For example, take
X = 0 with probability 1
(
−1 p = 12
Y =
1 p = 12
(
−100 p = 12
Z = .
100 p = 12
Definition. If X is a r.v with mean µ = EX , then the variance of X, denoted by Var(X), is dened
by
h i
2
Var (X) = E (X − µ) .
• Remark: Ec = c.
• We prove an alternate formula for the variance. (The technique of using linearity is important
here!!! Hint Hint)
h i
2
Var (X) = E (X − µ)
= E X 2 − 2µX + µ2
= E X 2 − 2µE [X] + E µ2
= E X 2 − 2µ2 + µ2
= E X 2 − µ2 .
2
= E X 2 − (E [X]) .
Var (X)
• Example1: Calculate Var(X) if X represents the outcome when a fair die is rolled.
Solution: Previously we calculated that EX = 72 .
Thus we only need to calculate the second moment:
1 1
EX 2 = 12 + · · · + 62
6 6
91
= .
6
5.7. VARIANCE 42
• Bernoulli Distribution
Suppose that a trial or experiment takes place, whose outcome is either success or failure.
Let X = 1 when the outcome is a success and X = 0 if it is a failure.
The pmf of X is given by
p(0) = P (X = 0) = 1 − p
p(1) = P (X = 1) = p
where 0 ≤ p ≤ 1.
For this X , X is said to be a Bernoulli random variable with parameter p,
∗ We wrtie this as X ∼ Bernoulli(p),
∗ Properties:
· EX = p · 1 + (1 − p) · 0 = p
· EX 2 = 12 · p + 02 (1 − p) = p.
· So VarX = p − p2 = p(1 − p).
• Binomial Distribution:
• We say X has a binomial distribution with parameters n and p if
n n−k
pX (k) = P (X = k) = pk (1 − p) .
k
Interpret: X =the number of successes in n indepedent trials.
∗ Let's take this as given.
We say X ∼ Binomial(n, p) or X ∼ bin(n, p).
• Properties of the Binomial
Check that probabilities sums to 1: Not really a property but more of a check that X is
indeed a random variable:
∗ We need to check two things:
(1) That pX (k) ≥ 0, and this is obvious from the fomula
Pn
(2) Need to check that k=0 pX (k) = 1.
Pn n n
∗ First recall the Binomial Theorem: k=0 xk y n−k = (x + y) .
k
Then
n n
X X n n−k n
pX (k) = pk (1 − p) = (p + (1 − p)) = 1n = 1.
k
k=0 k=0
Mean: Easiest way to compute EX is by recognizing that X = Y1 + · · · + Yn where Yi are
independent Bernoulli's.
43
6.1. BERNOULI AND BINOMIAL RANDOM VARIABLES 44
• Calculator(TI-84):
2ndDistri>binomialpdf(n, p, x)=P (X = x).
same with cdf.
• Example1: A company prices its hurricane insurance using the following assumptions:
(i) In any calendar year, there can be at most one hurricane.
(ii) In any calendar year, the probability of a hurricane is 0.05.
(iii) The numbers of hurricanes in dierent calendar years are mutually independent. Using
the company's assumptions, calculate the probability that there are fewer than 3 hurricanes
in a 20-year period
6.1. BERNOULI AND BINOMIAL RANDOM VARIABLES 45
P (X < 3) = P (X ≤ 2)
20 0 20 20 1 19 20 2 12
= (.05) (.95) + (.05) (.95) + (.05) (.95)
0 1 2
= .9245.
• Example2: Phan has a .6 probability of making a free throw. Suppose each free throw is inde-
pendent of the other. If he attempts 10 free throws, what is the probability that he makes at least
2 of them?
Solution: Let X ∼ bin(10, .6) then
P (X ≥ 2) = 1 − P (X = 0) − P (X = 1)
10 0 10 10 1 9
= 1− (.6) (.4) − (.6) (.4)
0 1
= .998.
6.2. THE POISSON DISTRIBUTION 46
λi
pX (i) = P (X = i) = e−λ for i = 0, 1, 2, 3, . . . .
i!
Or X ∼Poisson(λ).
• In general Poisson random variables are of the following form
Suppose success happens λ times on average in a given period (per year, per month etc). Then
X= number of times sucess happens in that given period.
Possion is like binomial, excpect, X is innitely countable!
• Examples that obey Poisson R.V
1. The number of misprints on a page ogf a book
2. # of people in community that survive to age 100
3. # of telephone numbers that are dialed in a day.
4. # of customers entering post oce on a day.
P∞ xn x
• Calc2: Recall that n=0 n! = e .
• Properties of Poisson: Let X ∼ P oisson(λ)
First we check that pX (i) is indeed a pmf: First it is obvious that pX (i) ≥ 0 since λ > 0. We
to need to check that all the probabilities add up to one:
∞ ∞ i ∞
X X
−λ λ −λ
X λi
pX (i) = e =e = e−λ eλ = 1.
i=0 i=0
i! i=0
i!
Mean: We have
∞ ∞
X λi X λi−1
EX = ie−λ = e−λ λ
i=0
i! i=1
(i − 1)!
= e−λ λeλ = λ.
Variance: We rst have
∞
X e−λ λi
EX 2 = i2
i=0
i!
∞
X e−λ λi−1
= λ i
i=0
(i − 1)!
∞
X e−λ λj
= λ (j + 1), let j = i − 1
j=0
j!
∞ −λ j ∞ −λ j
X e λ X e λ
= λ j +
j=0
j! j=0
j!
λ λ + e−λ eλ
=
= λ (λ + 1) .
Thus
VarX = λ (λ + 1) − λ2 = λ.
6.2. THE POISSON DISTRIBUTION 47
• Example1: Suppose on average there are 5 homicides per month in Hartford, CT. What is the
probability there will be at most 1 in a certain month?
Answer: If X is the number of homicides, we are given that EX = 5. Since the expectation
for a Poisson is λ = 5. Therefore P (X = 0) + P (X = 1) = e−5 + 5e−5 .
• Example2: Suppose on average there is one large earthquake per year in Mexico. What's the
probability that next year there will be exactly 2 large earthquakes?
−1
Answer: λ = EX = 1, so P (X = 2) = e 2 .
• Example3: Phan receives texts on the average of two every 3 minutes. Assume Poisson.
Question: What is the probability of ve or more texts arriving in a 9−minute period.
Answer: Let X number of calls in a 9−minute period. Let n = number of periods, λ1 =2
Thus λ = 3 · 2 = 6. Thus
P (X ≥ 5) = 1 − P (X ≤ 4)
4
X e−6 6n
= 1−
n=0
n!
= 1 − .285 = .715.
P (Xn = i) → P (Y = i)
where Y ∼ P oisson(λ).
• Summary of Theorem: This theorem says that suppose n is large and p is small, Thus
If X ∼ Bin(n, p) then we approximate X with a possion by letting let λ = np so that
i
(np)
P (X = i) ≈ e−np .
i!
• When can we assume X is Poisson: Another consequence of this theorem says that when
Y =the number of successes in a given period. And if the number possible of trials n is large, and
if the probability p of success is small, then Y can be treated as a Poisson random variable.
• NOTE:
(1)Why is number of misprints on a page will be approximately Poisson with λ = np
∗ Let X = number of misprints on a page of a book.
∗ Since prob of error, say p = .01 is usually small, and number of letters on a page is
usually large, say n = 1000. Then the average is λ = np.
∗ Then because p is small and n is large, then X can be approximated by a Poisson.
(2) Let X number of accidents in a year
∗ X is Poisson because the probability of an accident p in a given periord is usually small
and while the number n of times someone drives in a given period is high.
• Example: Here is an example showing this.
6.2. THE POISSON DISTRIBUTION 48
1
If X is number of times you get heads on a biased coin where P (H) = 100 . Suppose you you
toss 1000 times. Then np = 10
105
P (X = 5) ≈ e−10 = .0378
5!
while the actual value is
1000 5 995
P (X = 5) = (.01) (.99)
5
1000! 5 995
= (.01) (.99)
995!5!
= .0375.
6.3. OTHER DISCRETE DISTRIBUTIONS 49
• Uniform Distribution:
We say X is uniform, and write this as X ∼ unif orm(n), if X ∈ {1, 2, . . . , n} and
1
pX (i) = P (X = i) = for i = 1, 2, . . . , n.
n
Pn 1 1
Pn 1 n(n−1) n−1
Exercise: EX = i=1 i n = n i=1 i= n 2 = 2 and nd VarX .
• Geometric Distribution:
Experiment: Suppose that independent trials are held until success occurs. Trials are stopped
once success happens. Let p be the probabiliy of having a success in each trial.
Let X = number of trials required until rst success occurs. Thus X ∈ {1, 2, 3, 4, . . . }Here
we have
i−1
pX (i) = P (X = i) = (1 − p) p fori = 1, 2, 3, 4 . . . .
We say X ∼ geometric(p).
Properties:
∗ We rst double check is indeed a discrete random variable: This follows from what we
know about geometric series:
∞ ∞
X X i−1 p
P (X = i) = (1 − p) p= = 1.
i=1 i=1
1 − (1 − p)
∗ Mean: P Recall that by dierentiation of the geometric series, we came up with the
∞ n−1 1
formula n=0 nx = (1−x) 2 , so that
∞
X
EX = iP (X = i)
i=1
X∞
i−1
= i (1 − p) p
i=1
p 1
= 2
= .
(1 − (1 − p)) p
∞
X
2 i−1
EX = i2 (1 − p) p. (?)
i=1
P∞ 1
P∞
Thus we can dierentiate n=1 nxn−1 = (1−x)2 again to get n=2 n (n − 1) xn−2 =
2
.
(1−x)3
6.3. OTHER DISCRETE DISTRIBUTIONS 50
∗ From this we will attempt to get EX 2 in (?) by splitting the sum up:
∞
X n−2 2 2
n (n − 1) (1 − p) = 3 = ,
n=2 (1 − (1 − p)) p3
∞
X n−2 2
n (n − 1) (1 − p) p = , now split,
n=2
p2
∞ ∞
X n−2 2 X n−2
n2 (1 − p) p = + n (1 − p) p
n=2
p2 n=2
∞ ∞
−1
X n−1 2 X n−2 −1
(1 − p) n2 (1 − p) p = 2
+ n (1 − p) p + (1 − p) p
n=1
p n=2
∞
−1 2 −1
X n−1
(1 − p) EX 2 = + (1 − p) n (1 − p) p
p2 n=1
2 −1 1
= 2
+ (1 − p)
p p
Thus
2 (1 − p) 1
EX 2 = +
p2 p
2 − 2p + p 2−p
= =
p2 p2
∗ So Thus
2 2−p 1
VarX = EX 2 − (EX) = − 2
p2 p
(1 − p)
=
p2
• Example1: An urn contains 10 white balls and 15 black balls. Balls are randomly selected, one
at a time, until a black one is obtained. If we assume that each ball selected is replaced before the
next one is drawn, what is the probability.
Part (a): Exactly 6 draws are needed?
∗ X =number of draws needed to select a black ball, the probability of sucess is
15 15
p= = = .6.
10 + 15 25
∗ Thus
6−1
P (X = 6) = (.4) (.6) = .006144
Part (a): What is the expected number of draws in this game?
∗ Since X ∼ geometric(.6) then
1 10
EX = = = 1.6̄
p 6
Part (c)(Extra Problem to be done at home) Find exactly that probability at least k
draws are needed?
6.3. OTHER DISCRETE DISTRIBUTIONS 51
∗ We have that
∞
X
P (X ≥ k) = P (X = k)
n=k
X∞
n−1
= (.4) (.6)
n=k
∞
−1
X n
= (.6) (.4) (.4)
n=k
∞
−1 k
X n
= (.6) (.4) (.4) (.4)
n=0
k−1 1
= (.6) (.4)
1 − .4
k−1
= (.4) .
• Note: This could have been done for a general p. Thus
k−1
P (X ≥ k) = (1 − p) .
• Negative Binomial(Need to know for Actuarial Exam):
Experiment: Suppose that independent trials are held with probability p of having a success.
The trials are perfomed until a total of r sucesses are accumulated.
∗ Let X equal the number of trials required to obtain r succeses. Here we have
n−1 n−r
P (X = n) = pr (1 − p) forn = r, r + 1, . . . .
r−1
We say X ∼ N egativeBinomial(r, p).
Properties:
P∞
∗ This is a probability mass function. Can check that n=r P (X = n) = 1.
∗ Mean:
r
EX = .
p
∗ Variance:
r(1 − p)
Var(X) = .
p2
Note that Geometric(p) = N egativeBinomial (1, p).
• Example: Find the expected value of the number of times one must throw a die until the outcome
1 has occured 4 times.
Solution: X ∼ N egativeBinomial 4, 16 . So
4
EX = 1 = 24.
6
Definition. A random variable X is said to have a continuous distribution if there exists a non-
negative function f such that
Z b
P (a ≤ X ≤ b) = f (x)dx
a
R
for every a and b. B ⊂ R we have P (X ∈ B) = B f (x)dx.]
[Sometimes we write that for nice sets
We call f the pdf (probability density function) for X . Sometime we we the notation fX to signify
fX correponds to the pdf of X . We sometimes call fX the density of X .
• In fact, any function f satisfying the following two properties is called a density, and could be
considered a pdf of some random variable X:
(1)
Rf (x) ≥ 0 for all x
∞
(2)
−∞
f (x)dx = 1.
• Important Note!
(1) In this case X :S →R and the could attain uncountably many values (doesn't have to
discrete)
R∞
(2) −∞ f (x)dx = P (−∞ < X < ∞) = 1.
Ra
(3) P (X = a) = a f (x)dx = 0.
Ra
(4) P (X < a) = P (X ≤ a) = F (a) = −∞
f (x)dx.
∗ Recall that F is the cdf of X .
(5) Draw a pdf of X
∗ Note that P (a < X < b) is just the area under the curve.
• Remark: What are some random variables that are considered continuous?
Let X be the time it takes it take for a student to nish a probability exam. X ∈ (0, ∞).
Let X be the value of a Apple's stock price at the end of the day. Again X ∈ [0, ∞).
Let X be the height of a college student.
Any sort of continuous measurement can be considered a continuous random variable.
• Example1: Suppose we are given
(
c
x3 x≥1
f (x) =
0 x<1
is the pdf of X. What must the value of c be?
Solution: We would need
Z ∞ Z ∞
1 c
1= f (x)dx = c dx = ,
−∞ 1 x3 2
53
7.1. INTRO TO CONTINUOUS R.V 54
thus c = 2.
• Example2: Suppose we are given
(
2
x3 x≥1
fX (x) =
0 x<1
is the pdf of X from Example1.
Part (a): Find the c.d.f, FX (x).
∗ Solution: First we check thast if x<1 then
Z x Z x
Fx (x) = P (X ≤ x) = fX (y)dy = 0dy = 0.
−∞ −∞
Now when x≥1 we hav e
Z x
FX (x) = P (X ≤ x) = fX (y)dy
−∞
Z 1 Z x
2
= 0dy + dy
−∞ 1 y3
Z x
2
= 3
dy
1 y
1
= 1 − 2.
x
thus (
1 − x12 x≥1
FX (x) =
x x<1
Part (b): Use the cdf in Part (a) to help you nd P (3 ≤ X ≤ 4).
∗ Solution: We have
P (3 ≤ X ≤ 4) = P (X ≤ 4) − P (X < 3)
= FX (4) − FX (3)
1 1 7
= 1− 2 − 1− 2 = .
4 3 144
• Fact: For continuous R.V we have the following useful relationship
Rx
Since F (x) = −∞
f (y)dy then by the fundamentat theorem of calculus(Do you remenber this
form Calculus 1 or 2?)
F 0 (x) = f (x).
This means that for continuous random variables, the derivative of the CDF is
the PDF!
• Example3: Let
(
ce−2x x≥0
f (x) =
0 x<0
Find c.
Solution: c = 2.
7.2. EXPECTATION AND VARIANCE 55
• Recall that if p(x) is the pmf (density) of a discrete random variable, we had
∞
X
EX = xi p(xi ).
i=1
Find EX .
Solution: We have that
Z ∞
E [X] = xf (x)dx
−∞
Z 1
= x · 2xdx
0
2
= .
3
Theorem 9. If X and Y are continuous random variable then
(a) E [X + Y ] = EX + EY .
(b) E [aX] = aEX where a ∈ R.
Proof. See textbook. It will be shown later.
Proposition. If X is a continuous R.V. with pdf f (x), then for any real valued function g,
Z ∞
E [g(X)] = g(x)f (x)dx.
−∞
Proof. Recall that dxdy means Right-Left and dydx means Top-Bottom.
Z ∞ Z ∞ Z ∞
P (Y > y) dy = fY (x)dxdy
0 0 y
Z Z
= fY (x)dydx, interchange order in Calc III
D
Z ∞Z x
= fY (x)dydx draw the region to do this
0 0
Z ∞
= xfY (x)dx
0
= EX.
• Variance:
Will be dene in the same way as we did with discrete random variable:
h i
2
Var(X) = E (X − µ)
2
Var(X) = EX 2 − (EX) .
As before
Var (aX + b) = a2 Var(X).
• Example3: (Example 1 continued) Suppose X has density
(
2x 0 ≤ x ≤ 1
f (x) = .
0 otherwise
Find Var(X).
2
Solution: From Example 1 we found E [X] = 3 . Now
Z 1 Z 1
E X2 = x2 · 2xdx = 2 x3 dx
0 0
1
= .
2
Thus
2
1 2 1
Var(X) = − = .
2 3 18
• Example4: Suppose X has density
(
ax + b 0 ≤ x ≤ 1
f (x) = .
0 otherwise
1
and that E X2 =6 . Find the values of a and b.
R∞ 1
Solution: We need to use the fact that −∞ f (x)dx = 1 and E X2 = 6 . The rst one gives
us,
Z 1
a
1= (ax + b) dx = +b
0 2
7.2. EXPECTATION AND VARIANCE 57
a = −2, and b = 2.
7.3. THE UNIFORM RANDOM VARIABLE 58
• Example1: Suppose X ∼ U nif orm(a, b) Part (a) Find the mean of X . Part (b) Find the variance
of X .
Part (a): We compute
Z ∞ Z b
1
EX = xfX (x)dx = x dx
−∞ a b − a
2
a2
1 b a+b
= − = .
b−a 2 2 2
∗ Which makes sense right? It should be the midpoint of the interval [a, b].
Part(b): We compute rst the second moment
b
b3 a3
Z
1 1
EX 2 = x2 dx = −
a b−a b−a 3 3
1 1
(b − a) a2 + ab + b2
=
3b−a
a2 + ab + b2
= .
3
Thus after some algebra
2 2
a2 + ab + b2
a+b (b − a)
VarX = − = .
3 2 12
7.4. MORE PRACTICE 59
Normal Distributions
8.1. The normal distribution
• We say that X is a normal (Gaussian) random variable, or X is normally distributed with param-
eters µ and σ 2 if the density of X is given by
1 2 2
f (x) = √ e−(x−µ) /(2σ ) .
2πσ
• We'll usually write X ∼ N µ, σ 2 .
Turns out that in practice, many random variable overy the normal distribution
∗ Grades
∗ Height of a man or a women
• Note the following:
If X ∼ N (0, 1) then
Z ∞
1 2
√ e−x /2 dx = 1.
−∞ 2π
R ∞ −x2 /2 R∞ 2
To show this we use polar coordinates. Let I =
−∞
e dx = 2 0 e−x /2 dx The trick is
to write
Z ∞ Z ∞
2
/2 −y 2 /2
I 2
= 4 e−x e dxy
0 0
Z π/2 Z ∞
2 π
= 4 re−r /2
dr = 4 · = 2π,
0 0 2
√
Thus I= 2π as needed.
60
8.1. THE NORMAL DISTRIBUTION 61
Theorem 11. To help us compute the mean and variance of X its not too hard to show X ∼ N µ, σ 2
if and only if
X −µ
=Z where Z ∼ N (0, 1).
σ
Proof. We only show the (⇐=) direction. Note that
FX (x) = P (X ≤ x) = P (σZ + µ ≤ x)
x−µ
= P Z≤
σ
x−µ
= FY
σ
0
fX (x) = FX (x)
x−µ 1
= FY0
σ σ
x−µ
fZ σ
=
σ
1 1 −(x−µ)2 /(2σ2 )
= √ e .
σ 2π
EX = µ,
Var(X) = σ2 .
Z x
1 2
Φ(x) = P(Z ≤ x) = √ e−y /2
dy.
2π −∞
NOTE: A table of Φ(x) will be given but only for values of x > 0
Note this is symmetric[DRAW this ] thus here is an important fact: Φ(−x) = 1 − Φ(x)
8.1. THE NORMAL DISTRIBUTION 62
10
2=
σ
and hence σ = 5.
• Example (Extra): Suppose X ∼ N (3, 9) nd P (|X − 3| > 6).
8.1. THE NORMAL DISTRIBUTION 63
Answer: Get
P (|X − 3| > 6) = P (X − 3 > 6) + P (− (X − 3) > 6)
= P (X > 9) + P (X < −3)
= P (Z > 2) + P (Z < −2)
= 1 − Φ(2) + Φ(−2)
= 2 (1 − Φ(2))
≈ .0456.
• FACT: The 68 − 95 − 99.7 Rule
About 68% of all area is contained within 1 standard deviation of the mean
About 95% of all area is contained within 2 standard deviation of the mean
About 99.7% of all area is contained within 3 standard deviation of the mean
This can be explained by the following graph:
CHAPTER 9
64
9.1. THE NORMAL APPROXIMATES BINOMIAL 65
∗ Thus
P (X > a) = 1 − P (X ≤ a) = e−λa .
1
Mean: EX = λ Thus λ = µ1 .
1
Variance: We have Var(X) =
λ2 .
• How to interpret X
X = The amount of time until some specic event occurs.
Example:
∗ Time until earthquake occurs
∗ Length of a phone call
∗ Time until an accident happens
• Example1: Suppose that the length of a phone call in minutes is an exponential r.v with average
length 10 minutes.
Part (a) What's probability of your phone call being more than 10 minutes?
1
∗ Answer: Here λ= 10 thus
66
10.1. EXPONENTIAL RANDOM VARIABLES 67
Answer: Let X denote the life of an iphone (or time until it dies). Note that X ∼
exponential( 14 ) since λ= 1
µ = 1
4 . Then
1
P (X > 5) = e− 4 ·5 .
Part(b): Given that the iphone has already lasted 3 years, what is the probability that it will
last another 5 more years?
Answer: We compute
1 − e( 50 ) ln .7 = .435.
80
=
10.2. OTHER CONTINUOUS DISTRIBUTIONS 69
• Gamma Distribution:
We say X ∼ Gamma (α, λ) has density
λe−λx (λx)α−1
(
Γ(α) x≥0
f (x) =
0 x<0
where Γ(α) is the Gamma function
Z ∞
Γ(α) = e−y y α−1 dy.
0
If Y ∼ Gamma n 1
2
2 , 2 = χn , this is called the Chi-Squared distribution.
The chi-sqaure distribution is used a lot in statistics.
α α
∗ Its mean is EX = λ and VarX = λ2 .
• Weibull Distribution:
Usefull in engineering: Look in the book for its pdf.
X =. If there is an object consisting many parts, and suppose that the object experiences
death once any of tis parts fails. X= lifetime of this object.
• Cauchy Distribution:
We say X is cauchy with parameter −∞ < θ < ∞ if
1 1
f (x) = .
π 1 + (x − θ)2
Importance: It does not have nite mean: That is EX = ∞.
To see this, We compute for θ = 0
1 ∞
Z
x
EX = dx
π −∞ 1 + x2
1 ∞ 1
Z
∼ dx
π −∞ x
∼ lim ln |x| − ln lim |x|
x→∞ x→−∞
which is not dened.
10.3. THE DISTRIBUTION FUNCTION OF A RANDOM VARIABLE 70
F 0 (x) = f (x).
FY (x) = P (Y ≤ x)
= P (2X ≤ x)
x
= P X≤
x 2
= FX .
2
Step2: Then use the relation fY (y) = FY0 (y) and take a derivative of both sides to get
d h x i
FY0 (x) = FX ,
dx 2
x x 0
FY0 (x) = FX0
· , by chain rule on RHS
2 2
x 1
fY (x) = fX .
2 2
• Goal: To be able to compute the cdf and pdf of Y = g(X) where g:R→R is a function given
that we know the cdf and pdf of X.
Why is this useful?
∗ For example suppose X represent the income for a random US worker. And let Y = g (X)
be the amount of taxes a US worker pays per year. Note that taxes Y is dependent on
the random variable X. So if we only care about the random varibale Y then nding
its PDF and CDF can help us nd out everything we need to know about Y given we
can nd the PDF. Recall that any probability and expected value can be found using
the pdf.
• Example2: X ∼ U nif orm ((0, 10)) and Y = e3X . Find the
Let pdf fY of Y.
Solution: Recall that since X ∼ U nif orm ((0, 1)) then
(
1
10 0 < x < 10
fX (x) = .
0 otherwise
10.3. THE DISTRIBUTION FUNCTION OF A RANDOM VARIABLE 71
but since
1
0< ln y < 10 ⇐⇒ 0 < ln y < 30
3
⇐⇒ e0 < y < e30
⇐⇒ 1 < y < e30 .
then (
1
30y 1 < y < e30
fY (y) = .
0 otherwise
• Example3: Let X ∼ U nif orm ((0, 1]) and Y = −lnX . Find the pdf of Y? What distribution is
it?
Solution: Recall that
(
1 0<x<1
fX (x) = .
0 otherwise
Step1: First start with the cdf and write it terms of FX
FY (x) = P (Y ≤ x)
= P (−lnX ≤ x)
= P (lnX > −x)
= P X > e−x
1 − P X ≤ e−x
=
1 − FX e−x .
=
10.3. THE DISTRIBUTION FUNCTION OF A RANDOM VARIABLE 72
= fX (e−x ) · e−x
(
1 · e−x 0 < e−x < 1
=
0 otherwise
(
e−x −∞ < −x < 0
=
0 otherwise
(
e−x 0 < x < ∞
=
0 otherwise
= FX (tan−1 x)
1 1
Step2: Take a derivative and recall that since π π = π then
2+2
(
1
π − π2 < x < π
2
fX (x) =
0 otherwise.
Thus
Thus Y is Cauchy(0).
10.3. THE DISTRIBUTION FUNCTION OF A RANDOM VARIABLE 73
The resulting cost to the company is Y = T 2. Let fY be the density function for Y. Determine
fY (y), for y > 4.
Answer:
Step1: Find the cdf of Y is and
P T2 ≤ y
FY (y) =
√
= P (T ≤ y)
√
= F ( y)
4
= 1−
y
for y > 4.
Step2: Take a derivative
fY (y) = FY0 (y)
4
= .
y2
• One thing to note, is that we've been using the following useful property:
Proposition 15. Suppose g:R→R is a strictly increasing function, then the inverse g −1 exists and
−1
g(x) ≤ y implies x≤g (y) .
CHAPTER 11
Multivariate distributions
11.1. Joint distribution functions
p(x, y) = P (X = x, Y = y) .
FX,Y (x, y) = P (X ≤ x, Y ≤ y) .
1 P (X = 1, Y = 2) = 19 0 0 0 0
2 1
2 0 9 9 0 0
3 0 0 29 29 19
Question: Find P (X = 2 | Y = 4)?
∗ Answer: P (X = 2 | Y = 4) = 1/9 1
3/9 = 3 .
• Continuous
For random variables X, Y we let f (x, y) be the joint probability density function, if
Z b Z d
P (a ≤ X ≤ b, c ≤ Y ≤ d) = f (x, y)dydx.
a c
74
11.1. JOINT DISTRIBUTION FUNCTIONS 75
∗ Properties:
· 1)
Rf (x, y) ≥ 0
∞ R∞
· 2)
−∞ −∞
f (x, y)dxdy = 1.
We also have the multivariate cdf:(??) dened by
FX,Y (x, y) = P (X ≤ x, Y ≤ y) .
Ra Rb
∗ Note that FX,Y (a, b) = −∞ −∞ f (x, y)dydx.
Thus note that
∂ 2 F (x, y)
f (x, y) = .
∂x∂y
Marginal Density: If fX,Y is the joint density of X, Y . We recover the marginal densities
of X, Y respectively by the following
Z ∞
fX (x) = fX,Y (x, y)dy,
−∞
Z ∞
fY (y) = fX,Y (x, y)dx.
−∞
·
∗ Thus
Z ∞ Z ∞ ∞ Z
x=∞
−x −2y
e−2y −e−x x=0 dy
1 = ce e
dxdy = c
0 0
Z ∞ ∞0
1 1
= c e−2y dy = c − e−2y =c .
0 2 0 2
Then c = 2.
Part(b): Find P (X < Y ).
∗ Sol: Need to draw the region (Recall Calc III!!) Let D = {(x, y) | 0 < x < y, 0 < y < ∞}
11.1. JOINT DISTRIBUTION FUNCTIONS 76
·
· There are two ways to set up this integral:
· Method1: To set up dA = dydx. We use the Top-Bottom Method:
· Where the region is bounded by
Top Function:y =∞
Bottom Functiony =x
Range of Values0 ≤x≤∞
· Hence we use this information to set up
Z Z
P (X < Y ) = f (x, y)dA
D
Z ∞Z ∞
= 2e−x e−2y dydx
0 x
Z ∞
1 −2y y=∞
= 2e−x
−e y=x
dx
2
Z0 ∞ Z ∞
= e−x e−2x dx = e−3x x
0 0
1
= .
3
· Method2: To set up dA = dxdy . We use the Right-Left Method:
· Where the region is bounded by
Right Function:x =y
Left Functionx =0
Range of Values0 ≤y≤∞
· Hence we use this information to set up
Z Z
P (X < Y ) = f (x, y)dA
D
Z ∞Z y
= 2e−x e−2y dxdy
0 0
= do some work
1
= ,
3
which matches the answer from before.
Part(c): Set up P (X > 1, Y < 1)
11.1. JOINT DISTRIBUTION FUNCTIONS 77
P (X = x, Y = y) = P (X = x) P (Y = y) ,
for every x, y in the range of Y. X and
This is the same as saying that X, Y ar independent if the joint pmf splits into the marginal
pmfs: pX,Y (x, y) = pX (x) · pY (y)
• Continuous: We say continuous r.v. X, Y are independent if
P (X ∈ A, Y ∈ B) = P (X ∈ A) P (Y ∈ B)
for any set A, B
This equivalent: P (X ≤ a, Y ≤ b) = P (X ≤ a) P (Y ≤ b).
Equivalent to FX,Y (x, y) = FX (x)FY (y).
• Random variables that are not independent, are said to be dependent.
• How can we check independence?
Theorem 16. Continuous (discrete) r.v. X, Y are independent if and only if their joint pdf (pmf ) can
be expressed as
∗ Important! But whenever the domain of f is not a rectangle, you MUST draw
the region of domain for fX,Y . And here the region is D = {(x, y) | 0 < x < y < 1}.
(Please try drawing this region on your own. If you struggle with this region, go to
https://www.wolframalpha.com/ and type in0 < x < y < 1)
R1
∗ fX (x) = x 2dy = 2 (1 − x) for 0 < x < 1
Note that
Ry
∗
Then fY (y) =
0
2dx = 2y for 0 < y < 1 .
∗
But fX,Y (x, y) = 2 6= fX (x)fY (y) = 2(1 − x)2y !! Therefore X, Y are NOT independent.
• Example4: Suppose X, Y are independent uniformly distributed over (0, 1). Find P (Y < X).
Solution: Since X, Y are independent then using the Theorem form this section we have
fX,Y (x, y) = fX (x)fY (y) = 1 · 1,
for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Draw region ('What do you think probability will be by
looking at the region?)
∗
∗ and get
Z 1 Z x
P (Y < X) = f (x, y)dydx
0 0
Z 1 Z x Z 1
= 1dydx = xdx
0 0 0
1
= .
2
11.3. SUMS OF INDEPENDENT RANDOM VARIABLES(?) 80
• Fact: If X, Y are independent, its not too hard to show that the cdf of Z =X +Y is
FX+Y (a) = P (X + Y ≤ a)
Z Z
= fX (x)fY (y)dxdy
{x+y≤a}
Z ∞ Z a−y
= fX (x)fY (y)dxdy
−∞ −∞
Z ∞ Z a−y
= fX (x)dxfY (y)dy
−∞ −∞
Z ∞
= FX (a − y) fY (y)dy.
−∞
P (H > T ) = P (H − T > 0)
= 1 − P (H − T < 0)
0 − (−30)
= 1−P Z ≤ √
61
= 1 − Φ(3.84) = 1 − 1 = 0.
• Other facts.
• Fact 2: Let Z ∼ N (0, 1) then Z 2 ∼ χ21 .
If Z1 , . . . , Zn are indepedent N (0, 1) then Y = Z12 + · · · Zn2 ∼ χ2n .
11.3. SUMS OF INDEPENDENT RANDOM VARIABLES(?) 81
• Fact 3: If X ∼ P oisson(λ) and Y ∼ P oisson(µ) , and they are independent, then X+Y ∼
P oisson(λ + µ).
• List out stu and then stop.
11.4. CONDITIONAL DISTRIBUTIONS- DISCRETE(?) 82
pX|Y (x | y) = pX (x)
• Example1: Suppose the joint pmf of (X, Y ) is
x\y 0 1
0 .4 .2
1 .1 .3
Compute some conditional pmf: Then the second column is
.2 2 .3 3
pX|Y (0 | 1) = = and pX|Y (1 | 1) = = .
.5 5 .5 5
Are they independent? Note that pX (0) = .4 + .2 = .6 6= pX|Y (0 | 1), so no!
11.5. CONDITIONAL DISTRIBUTIONS- CONTINUOUS(?) 83
• Def: If X, Y are continuous with joint pdf f (x, y) then the conditional pdf of X given Y = y
is dened as
f (x, y)
fX|Y (x | y) = .
fY (y)
dened only when fY (y) > 0.
• Def: The conditional cdf of X given Y =y is
FX|Y (a | y) = P (X ≤ a | Y = y)
Z a
= fX|Y (x | y) dx.
−∞
• Fact: If X, Y are indepedent then
fX|Y (x | y) = fX (x).
• Example1: The joint pdf ofX, Y is given by
(
12
x (2 − x − y) 0 < x < 1, 0 < y < 1
f x(x, y) = 5 .
0 otherwise
• Goal:
Recall that from section 5.7 we can nd the pdf of a new random variableY = g (X).
Suppose we know the distributions of X1 , X2 then what is the distribution of g1 (X1 , X2 ) and
g2 (X1 , Y1 )
∗ For example if we know X1 , X2 what is the distribution of Y1 = X1 + X2 and Y2 =
X12 − eX1 X2 .
• Steps to nding the joint cdf of new R.V. made from old ones.:
Suppose X1 , X2 are jointly distributed with pdf fX1 ,X2 . Let g2 (x1 , x2 ) , g2 (x2 , x2 ) be multi-
variable functions.
Goal: Find the joint pdf of Y1 = g1 (X1 , X2 ) and Y2 = g1 (X2 , X2 )
Step1: Find the Jacobian:
∂g1 ∂g1
∇g1
∂x1 ∂x2
∂g1 ∂g2 ∂g1 ∂g2
J (x1 , x2 ) = = = − 6= 0.
∂g2 ∂g2
∇g2 ∂x1 ∂x2
∂x1 ∂x2 ∂x2 ∂x1
x1 = h1 (y1 , y2 ) ,
x2 = h2 (y1 , y2 ) .
Step3: The joint pdf of Y1 , Y2 is
−1
fY1 ,Y2 (y1 , y2 ) = fX1 ,X2 (x1 , x2 ) |J (x1 , x2 )|
−1
= fX1 ,X2 (h1 (y1 , y2 ) , h2 (y1 , y2 )) |J (x1 , x2 )| .
• Example1: Suppose X1, X2 have joint distribution
fX1,X2(x1, x2) = 2x1 x2   for 0 ≤ x1, x2 ≤ 1,
and fX1,X2(x1, x2) = 0 otherwise. Consider
y1 = g1(x1, x2) = x1 + x2,
y2 = g2(x1, x2) = x1 − x2.
So
J(x1, x2) = det | 1    1 |
                | 1   −1 | = −2.
Step2: Solve for x1, x2 and get
x1 = (y1 + y2)/2,
x2 = (y1 − y2)/2.
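Steps 1-2 of Example1 can be automated symbolically (a sketch, not from the notes; sympy assumed):

# Jacobian and inverse map for (y1, y2) = (x1 + x2, x1 - x2).
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')
g1 = x1 + x2
g2 = x1 - x2

J = sp.Matrix([g1, g2]).jacobian([x1, x2])
print(J.det())                                          # -2, as in Step1

sol = sp.solve([sp.Eq(y1, g1), sp.Eq(y2, g2)], [x1, x2], dict=True)[0]
print(sol[x1], sol[x2])                                 # (y1 + y2)/2 and (y1 - y2)/2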
• Example2: Suppose X1 ∼ N(0, 1) and X2 ∼ N(0, 4) are independent. Find the joint pdf of (Y1, Y2), where
y1 = g1(x1, x2) = 2x1 + x2,
y2 = g2(x1, x2) = x1 − 3x2.
So
J(x1, x2) = det | 2    1 |
                | 1   −3 | = −7.
Step2: Solve for x1, x2 and get
x1 = (3/7) y1 + (1/7) y2,
x2 = (1/7) y1 − (2/7) y2.
Step3: The joint pdf of Y1, Y2 is given by the formula:
fY1,Y2(y1, y2) = fX1,X2(x1, x2) |J(x1, x2)|^{−1}
              = (1/7) fX1,X2( (3/7)y1 + (1/7)y2 , (1/7)y1 − (2/7)y2 ).
So we need to find the joint pdf of X1 and X2.
∗ But since X1 ∼ N(0, 1) and X2 ∼ N(0, 4) are independent, then
fX1(x1) = (1/√(2π)) e^{−x1²/2}   and   fX2(x2) = (1/√(2·4·π)) e^{−x2²/(2·4)}.
Thus by independence, fX1,X2(x1, x2) = fX1(x1) fX2(x2), so
fY1,Y2(y1, y2) = (1/7) · (1/√(2π)) · (1/√(8π)) · exp( −((3/7)y1 + (1/7)y2)²/2 − ((1/7)y1 − (2/7)y2)²/8 ).
• Example3 (if time): Suppose X1, X2 have joint distribution
fX1,X2(x1, x2) = x1 + (3/2) x2²   for 0 ≤ x1 ≤ 1, 0 ≤ x2 ≤ 1,
and fX1,X2(x1, x2) = 0 otherwise.
CHAPTER 12
Expectations
12.1. Expectation of Sums of R.V.
• Example1: Suppose the joint pmf of X and Y is given by
x\y   0    2
 0   .2   .7
 1    0   .1
Find E[XY].
Solution: Using the formula for E[g(X, Y)] with the function g(x, y) = xy:
E[XY] = Σ_{i,j} x_i y_j p(x_i, y_j)
      = 0·0·p(0, 0) + 1·0·p(1, 0) + 0·2·p(0, 2) + 1·2·p(1, 2)
      = 0·0·(.2) + 1·0·(0) + 0·2·(.7) + 1·2·(.1)
      = .2
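The same computation in code (a sketch, not from the notes), with rows indexed by x ∈ {0, 1} and columns by y ∈ {0, 2}:

# E[XY] from the joint pmf table in Example1.
import numpy as np

joint = np.array([[0.2, 0.7],
                  [0.0, 0.1]])
xs = [0, 1]
ys = [0, 2]

# E[g(X, Y)] = sum over all cells of g(x, y) * p(x, y), here with g(x, y) = x*y
E_xy = sum(x * y * joint[i, j] for i, x in enumerate(xs) for j, y in enumerate(ys))
print(E_xy)   # 0.2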
• Example2: Suppose X, Y are independent exponential r.v.s with parameter λ = 1. Set up a double integral that represents E[X²Y].
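The example only asks to set up the integral; by independence it is E[X²Y] = ∫_0^∞ ∫_0^∞ x² y e^{−x} e^{−y} dy dx. As an unofficial numerical check (a sketch, not from the notes; scipy assumed):

# Numerically evaluate E[X^2 Y] for independent Exponential(1) variables.
# By independence E[X^2 Y] = E[X^2] * E[Y] = 2 * 1 = 2.
import numpy as np
from scipy import integrate

val, err = integrate.dblquad(
    lambda y, x: x**2 * y * np.exp(-x) * np.exp(-y),  # integrand, inner variable first
    0, np.inf,                                        # outer variable x from 0 to ∞
    lambda x: 0, lambda x: np.inf)                    # inner variable y from 0 to ∞
print(val)                                            # ≈ 2.0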
• Theorem: (a) E[X + Y] = EX + EY. (b) If X, Y are independent, then E[XY] = (EX)(EY).
Proof. Part (a) was proved for the discrete case, so we only need to show the continuous case:
E[X + Y] = ∫∫ (x + y) f(x, y) dy dx
         = ∫∫ x f(x, y) dy dx + ∫∫ y f(x, y) dy dx
         = ∫ x fX(x) dx + ∫ y fY(y) dy
         = EX + EY.
For part (b), a similar computation using independence, f(x, y) = fX(x) fY(y), factors the double integral into a product and gives
E[XY] = (EX)(EY).
The discrete case is the same, except replace integrals with summations.
• In general, if X and Y are independent, then the following is true:
E[g(X) h(Y)] = E[g(X)] E[h(Y)].
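A quick simulation sketch of both statements (not from the notes; the choice X ∼ Exp(1), Y ∼ Uniform(0, 1) is arbitrary):

# Check E[X + Y] = EX + EY and, for independent X, Y, E[XY] = (EX)(EY).
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.exponential(1.0, n)    # EX = 1
y = rng.uniform(0, 1, n)       # EY = 1/2

print(np.mean(x + y))          # ≈ 1.5 = EX + EY
print(np.mean(x * y))          # ≈ 0.5 = (EX)(EY), using independence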
12.2. Covariance and Correlations
• Note that EX and Var X give information about a single random variable.
• What statistic can give us information about how X affects Y, or vice versa? That is what the covariance Cov(X, Y) = E[(X − EX)(Y − EY)] = E[XY] − (EX)(EY) measures.
• Finally we have the following. It is a standardized way to measure how correlated two random variables are:
Definition. The correlation coefficient of two random variables X and Y, denoted by ρ(X, Y), is defined by
ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
• Fact:
(1) −1 ≤ ρ(X, Y) ≤ 1.
(2) If ρ(X, Y) = 1 then Y = a + bX where b = σy/σx > 0 (straight positively sloped line).
(3) If ρ(X, Y) = −1 then Y = a + bX where b = −σy/σx < 0 (straight negatively sloped line).
(4) ρ is a measure of linearity between Y and X.
∗ ρ > 0, positive linearity: meaning that if you were to draw a line of best fit, then it must be a positively sloped line.
· The closer ρ gets to 1, the more (X, Y) seems to lie on a positively sloped straight line.
∗ ρ < 0, negative linearity: meaning that if you were to draw a line of best fit, then it must be a negatively sloped line.
· The closer ρ gets to −1, the more (X, Y) seems to lie on a negatively sloped straight line.
(5) If ρ(X, Y) = 0, then X and Y are uncorrelated.
• Warning:
ρ(X, Y) does not pick up any other relationship, such as quadratic or cubic.
ρ(X, Y) is not the slope of the line of best fit. It simply tells us whether the relationship is positive or negative, and how strong it is.
• Example1: Suppose X, Y are random variables whose joint pdf is given by
f(x, y) = 1/y   for 0 < y < 1, 0 < x < y,
and f(x, y) = 0 otherwise.
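A simulation sketch for this density (my own illustration, not part of the original example): the marginal of Y works out to Uniform(0, 1), and given Y = y, X is Uniform(0, y), which makes sampling and estimating ρ(X, Y) easy.

# Estimate the correlation ρ(X, Y) for f(x, y) = 1/y on 0 < x < y < 1 by sampling.
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
y = rng.uniform(0, 1, n)
x = rng.uniform(0, 1, n) * y       # X | Y = y is Uniform(0, y)

print(np.corrcoef(x, y)[0, 1])     # ≈ 0.65 (positive: X tends to be large when Y is)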
13.1. Moment Generating Functions
• For each random variable X, we can define its moment generating function mX(t) by
mX(t) = E[e^{tX}]
      = Σ_{x_i} e^{t x_i} p(x_i),   if X is discrete,
      = ∫_{−∞}^{∞} e^{tx} f(x) dx,   if X is continuous.
• mX(t) is called the moment generating function (m.g.f.) because we can find all the moments of X by differentiating m(t) and then evaluating at t = 0.
• Note that
m'(t) = d/dt E[e^{tX}] = E[ d/dt e^{tX} ] = E[X e^{tX}].
Now evaluate at t = 0 and get
m'(0) = E[X e^{0}] = E[X].
• Similarly,
m''(t) = d/dt E[X e^{tX}] = E[X² e^{tX}],
so that
m''(0) = E[X² e^{0}] = E[X²].
• In general, E[X^n] = m^{(n)}(0).
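A small symbolic sketch of this recipe (not from the notes; the m.g.f. m(t) = λ/(λ − t) with λ = 2, i.e. the Exponential(2) m.g.f. derived later in this section, is just a convenient test case):

# Recover E[X] and E[X^2] from an m.g.f. by differentiating and evaluating at t = 0.
import sympy as sp

t = sp.symbols('t')
lam = 2
m = lam / (lam - t)                  # m.g.f. of Exponential(2)

print(sp.diff(m, t).subs(t, 0))      # m'(0)  = 1/2 = E[X]
print(sp.diff(m, t, 2).subs(t, 0))   # m''(0) = 1/2 = E[X^2]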
• Binomial: Recall that X ∼ Bin(n, p) if X = Σ_{i=1}^{n} Yi where the Yi are independent Bern(p); thus
mX(t) = E[e^{tX}] = E[e^{t(Y1 + · · · + Yn)}]
      = E[e^{tY1} · · · e^{tYn}]
      = E[e^{tY1}] · · · E[e^{tYn}],   by independence,
      = (p e^t + (1 − p))^n,
since each E[e^{tYi}] = p e^t + (1 − p).
• Poisson: If X ∼ Poisson(λ) then
mX(t) = E[e^{tX}] = Σ_{n=0}^{∞} e^{tn} e^{−λ} λ^n/n!
      = e^{−λ} Σ_{n=0}^{∞} (e^t λ)^n/n!.
Now recall from Calculus 2 that e^x = Σ_{n=0}^{∞} x^n/n!, so that, with x = e^t λ,
mX(t) = e^{−λ} e^{e^t λ}
      = e^{e^t λ − λ}
      = exp( λ(e^t − 1) ).
• Exponential: If X ∼ Exponential(λ) then
mX(t) = E[e^{tX}] = ∫_0^{∞} e^{tx} λ e^{−λx} dx = λ/(λ − t),
which is valid whenever t < λ.
• Standard Normal: If Z ∼ N(0, 1) then
mZ(t) = E[e^{tZ}] = (1/√(2π)) ∫_{−∞}^{∞} e^{tz} e^{−z²/2} dz = e^{t²/2}.
• Normal: If X ∼ N(µ, σ²) then X = µ + σZ, so that
mX(t) = E[e^{tX}] = E[e^{tµ} e^{tσZ}] = e^{tµ} E[e^{(tσ)Z}]
      = e^{tµ} mZ(tσ) = e^{tµ} e^{(tσ)²/2}
      = exp( tµ + t²σ²/2 ).
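A numerical sanity check of the normal m.g.f. formula (a sketch, not from the notes; µ = 1 and σ = 2 are arbitrary):

# Compare a simulated E[e^{tX}] for X ~ N(1, 4) with exp(tµ + t²σ²/2).
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, 2_000_000)

for t in (-0.5, 0.2, 0.5):
    empirical = np.mean(np.exp(t * x))
    exact = np.exp(t * mu + t**2 * sigma**2 / 2)
    print(t, round(empirical, 3), round(exact, 3))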
• This is similar to the Laplace transform of fX(x): L[f](s) = ∫ e^{−sx} fX(x) dx.
Recall that Laplace transforms are in one-to-one correspondence with functions: the transform completely determines the function.
Theorem 23. If mX(t) = mY(t) < ∞ for all t in an interval, then X and Y have the same distribution. That is, the m.g.f. completely determines the distribution.
• Example1: Suppose the m.g.f. of X is given by m(t) = e^{3(e^t − 1)}. Find P(X = 0).
Solution: (We want to work backwards.) Match this m.g.f. to a known m.g.f. in our table. It looks like
m(t) = e^{3(e^t − 1)} = e^{λ(e^t − 1)}   where λ = 3.
Thus X ∼ Poisson(3). Thus
P(X = 0) = e^{−λ} λ^0/0! = e^{−3}.
• Summary:
(1) m(t) = E[e^{tX}]. We have a table of m.g.f.s of distributions.
(2) The m.g.f. helps us find moments: E[X^n] = m^{(n)}(0).
(3) If X, Y are independent then mX+Y(t) = mX(t) mY(t).
(4) The m.g.f. helps us determine the distribution of random variables: if mX(t) = mY(t) then X and Y have the same distribution.
• Recall we had a section on sums of independent random variables.
• Example2: Recall X ∼ N(µx, σx²) and Y ∼ N(µy, σy²), independent. Then what is X + Y ∼ N(?, ?)?
Sol: Note that
mX+Y(t) = mX(t) mY(t) = exp(tµx + t²σx²/2) · exp(tµy + t²σy²/2) = exp( t(µx + µy) + t²(σx² + σy²)/2 ),
which is the m.g.f. of a N(µx + µy, σx² + σy²) random variable; thus X + Y ∼ N(µx + µy, σx² + σy²).
• Example3: Suppose X ∼ bin(n, p) and Y ∼ bin(m, p), independent; then what is the distribution of X + Y?
Solution: We use
mX+Y(t) = mX(t) mY(t)
        = (p e^t + (1 − p))^n (p e^t + (1 − p))^m
        = (p e^t + (1 − p))^{n+m}.
Look at the table and see what distribution has this m.g.f. Thus
X + Y ∼ bin(n + m, p).
• Example4: Suppose X is a discrete random variable and has the m.g.f.
mX(t) = (1/7) e^{2t} + (3/7) e^{3t} + (2/7) e^{5t} + (1/7) e^{8t}.
Question: What is the p.m.f. of X? Find EX.
Solution(a): This doesn't match any of the known m.g.f.s, but we can read the p.m.f. off from the m.g.f.: since
(1/7) e^{2t} + (3/7) e^{3t} + (2/7) e^{5t} + (1/7) e^{8t} = Σ_{i=1}^{4} e^{t x_i} p(x_i),
then p(2) = 1/7, p(3) = 3/7, p(5) = 2/7 and p(8) = 1/7.
Solution(b): First
m'(t) = (2/7) e^{2t} + (9/7) e^{3t} + (10/7) e^{5t} + (8/7) e^{8t},
so that
E[X] = m'(0) = 2/7 + 9/7 + 10/7 + 8/7 = 29/7.
• Example5: Suppose X has m.g.f.
mX(t) = (1 − 2t)^{−1/2}   for t < 1/2.
Find the first and second moments of X.
Solution: We have
mX'(t) = −(1/2)(1 − 2t)^{−3/2}(−2) = (1 − 2t)^{−3/2},
mX''(t) = −(3/2)(1 − 2t)^{−5/2}(−2) = 3(1 − 2t)^{−5/2}.
So that
EX = mX'(0) = (1 − 2·0)^{−3/2} = 1,
EX² = mX''(0) = 3(1 − 2·0)^{−5/2} = 3.
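A sympy check of Example5 (not from the notes):

# Differentiate m(t) = (1 - 2t)^(-1/2) and evaluate at t = 0.
import sympy as sp

t = sp.symbols('t')
m = (1 - 2*t)**sp.Rational(-1, 2)

print(sp.diff(m, t).subs(t, 0))      # 1 = EX
print(sp.diff(m, t, 2).subs(t, 0))   # 3 = E[X^2]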
CHAPTER 14
Limit Laws
14.1. The Central Limit Theorem
Theorem 24. (CLT) Let X1, X2, X3, . . . be i.i.d., each with mean µ and variance σ². Then the distribution of
(X1 + · · · + Xn − nµ)/(σ√n)
tends to the standard normal Z as n → ∞. That is,
P( (X1 + · · · + Xn − nµ)/(σ√n) ≤ b ) ≈ P(Z ≤ b) = Φ(b)
when n is large.
• The CLT helps us approximate the probability of anything involving X1 + · · · + Xn where Xi are
independent and identically distributed.
• When approximating discrete distributions: USE the ±.5 continuity correction:
• Example1: If 10 fair dice are rolled, find the approximate probability that the sum obtained is between 30 and 40, inclusive.
Solution: Let Xi denote the value of the i-th die. Recall that
E(Xi) = 7/2   and   Var(Xi) = 35/12.
Take
X = X1 + · · · + X10
to be their sum. Using the CLT we need
nµ = 10 · 7/2 = 35,
σ√n = √(350/12).
With the ±.5 continuity correction,
P(30 ≤ X ≤ 40) = P(29.5 ≤ X ≤ 40.5)
               = P( (29.5 − 35)/√(350/12) ≤ (X − 35)/√(350/12) ≤ (40.5 − 35)/√(350/12) )
               ≈ P(−1.0184 ≤ Z ≤ 1.0184)
               = Φ(1.0184) − Φ(−1.0184)
               = 2Φ(1.0184) − 1 = .692.
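A simulation sketch of Example1 (not from the notes), comparing the simulated probability with the CLT value .692:

# Simulate the sum of 10 fair dice and estimate P(30 <= sum <= 40).
import numpy as np

rng = np.random.default_rng(5)
trials = 1_000_000
sums = rng.integers(1, 7, size=(trials, 10)).sum(axis=1)   # 10 dice per trial

print(np.mean((sums >= 30) & (sums <= 40)))   # should be close to the CLT value .692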
• Example2: An instructor has 1000 exams that will be graded in sequence.
The times required to grade each exam are i.i.d. with mean µ = 20 minutes and SD σ = 4 minutes.
Approximate the probability that the instructor will grade at least 25 exams in the first 450 minutes of work.
Solution:
Let Xi be the time it takes to grade exam i. Then
X = X1 + · · · + X25
is the time it takes to grade the first 25 exams. We want P(X ≤ 450).
Using the CLT,
nµ = 25 · 20 = 500,
σ√n = 4√25 = 20.
Thus
P(X ≤ 450) = P( (X − 500)/20 ≤ (450 − 500)/20 )
           ≈ P(Z ≤ −2.5)
           = 1 − Φ(2.5)
           = .006.