
Introduction to Probability

Text book: Ross S (2009) A First Course in Probability 8th ed.


Instructor: Pi-Wen Tsai Office: M205, E-mail:[email protected]
Course web-site: http://math.ntnu.edu.tw/~pwtsai/prob9802/index.html

Syllabus

1. Combinatorial Analysis: tool for computing probability

2. Probability theory: set theory and axioms

3. Conditional probability and Independence

4. Random variables Mid-term I


5. Continuous RVs

6. Jointly Distributed RVs Mid-term II


7. Expectation

8. Limit theorems Final exam

Discrete RVs

1. Probability mass functions & Cumulative Distribution function

2. Expected value, variance

3. Expectation of a function of a RV

4. Bernoulli and Binomial RVs

5. Poisson, Geometric, Negative Binomial, Hypergeometric RVs

6. Moment generating functions (Sec 7.7)

7. Expected Value of sums of RVs

Continuous RVs

1. Probability density functions & Cumulative Distribution function

2. Expected value and variance

3. Uniform RV

4. Normal RVs

5. Exponential RVs, Gamma, Chi-square distributed, Cauchy, Beta RVs.

6. Moment generating functions (7.7)

7. The Distribution of a function of a RV.

8. The inverse transformation method (10.2.1)

Jointly distributed RVs

1. Joint probability density functions

2. Independent RVs

3. Sums of independent RVs:

4. Conditional distribution

5. Joint probability distribution of functions of RVs,

Expectation

1. Expectation of sums of RVs, (the use of indicator RVs)

2. Covariance, Variance of sums of RVs, Covariance and Correlation

3. Conditional Expectation

4. More on Normal RVs

5. Order statistics (6.6)

Limit Theorem

1. Markov and Chebyshev’s inequality

2. Weak Law of Large numbers

3. The Central Limit theorem

Thanks
These notes are based on the lecture notes Undergraduate Probability by
Professor Richard F. Bass at Department of Mathematics, University of Connecticut,
USA. Some additional changes are made. I thank Professor Bass for his original Tex
file.

Introduction
Why do we need to talk about probability?
We move from "Is it so?" to "What is the probability that it is so?"
Probability is a mathematical tool for understanding uncertainty and randomness in our lives.
We often want to assess how likely the outcomes of some events are.
Probability is the measure defined for that assessment.
Many problems in probability theory can be solved by counting the number of
different ways that a certain event can occur. The method of counting is formally
known as combinatorial analysis.

Example 1. Suppose there are 20 people taking Prob 1. There are 7 women and
13 men. What is the chance that a person selected at random from the class is a
woman?
What is the chance that two persons selected at random from the class are both
women?
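A quick numerical check of both questions (a sketch in R, using the counting tools developed below; the class sizes come from the example):

7 / 20                          # one randomly chosen person is a woman
choose(7, 2) / choose(20, 2)    # both of two randomly chosen people are women, 21/190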

1 Combinatorial Analysis
1.1 Multiplication principle
Multiplication principle
The first basic principle is to multiply.

Example 2. Suppose we have 4 shirts of 4 different colors and 3 pants of different
colors. How many possibilities are there?
For each shirt there are 3 possibilities, so altogether there are 4 × 3 = 12 possibilities.

Example 3 (license plates). How many different license plates of 3 letters followed
by 3 numbers are possible?
There are 26^3 × 10^3 possibilities, because there are 26 possibilities for the first place, 26 for the second,
26 for the third, 10 for the fourth, 10 for the fifth, and 10 for the sixth. We multiply.
Example 4 (license plates). How many license plates of 3 letters followed by 3 numbers
are possible when no letter or number may be repeated?
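A hedged sketch of the count for Example 4 in R, using the multiplication principle with no repeats:

26 * 25 * 24 * 10 * 9 * 8    # 11,232,000 plates with no repeated letter or digit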

1.2 Permutations
Permutations
Example 5. How many ways can one arrange a, b, c?
One can have
abc, acb, bac, bca, cab, cba.
There are 3 possibilities for the first position. Once we have chosen the first position,
there are 2 possibilities for the second position, and once we have chosen the first
two possibilities, there is only 1 choice left for the third. So there are 3 × 2 × 1 = 3!
arrangements.

Factorials:
In general, if there are n letters, there are n! possibilities.
Example 6. What is the number of possible batting orders with 9 players?
9!

Example 7. How many ways can one arrange 4 math books, 3 chemistry books,
2 physics books, and 1 biology book on a bookshelf so that all the math books
are together, all the chemistry books are together, and all the physics books are
together.

We can arrange the math books in 4! ways, the chemistry books in 3! ways, the
physics books in 2! ways, and the biology book in 1! = 1 way.
But we also have to decide which set of books go on the left, which next, and
so on. That is the same as the number of ways of arranging the letters M, C, P, B,
and there are 4! ways of doing that. Hence the answer is 4!(4!3!2!1!).

Example 8. How many ways can one arrange the letters a, a, b, c?


Let us label them A, a, b, c. There are 4!, or 24, ways to arrange these letters. But
we have repeats: we could have Aa or aA. So we have a repeat for each possibility,
and so the answer should be 4!/2! = 12.
If there were 3 a's, 4 b's, and 2 c's, we would have 9!/(3!4!2!) different permutations.

Example 9. Suppose that there are 4 Czech tennis players, 4 U.S. players, and 3
Russian players. If the tournament results lists just the nationalities of the players
in the order in which they placed, how many outcomes are possible?
11!/(4!4!3!).
What we just did was called the number of permutations.

In general, if there are n objects, of which n_1 are alike, n_2 are alike, ..., n_r are alike
and n_1 + · · · + n_r = n, there are

\frac{n!}{n_1! n_2! \cdots n_r!}

different permutations.
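A quick check of Example 9 in R:

factorial(11) / (factorial(4) * factorial(4) * factorial(3))   # 11!/(4!4!3!) = 11550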

1.3 Combinations
Combinations
Example 10. How many ways can we choose 3 letters out of 5?

If the letters are a, b, c, d, e and order matters, then there would be 5 for the first
position, 4 for the second, and 3 for the third, for a total of 5 × 4 × 3. But suppose the
letters selected were a, b, c. If order doesn’t matter, we will have the letters a, b, c 6 times,
because there are 3! ways of arranging 3 letters. The same is true for any choice of three
letters. So we should have 5 × 4 × 3/3!. We can rewrite this as
\frac{5 \cdot 4 \cdot 3}{3!} = \frac{5!}{3!\,2!}.

This is often written \binom{5}{3}, read "5 choose 3."

Combinations
More generally,

\binom{n}{k} = \frac{n!}{k!(n-k)!}.

Example 11. Suppose there are 8 men and 8 women. How many ways can we choose
a committee that has 2 men and 2 women?
We can choose 2 men in \binom{8}{2} ways and 2 women in \binom{8}{2} ways. The number of
committees is then the product \binom{8}{2}\binom{8}{2}.
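In R, for this committee example:

choose(8, 2) * choose(8, 2)   # 28 * 28 = 784 committees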


 

Example 12 (4c). Consider a set of 8 antennas of which 3 are defective and assume
that all of the defectives and all of the functionals are indistinguishable. How many
linear orderings are there in which no two defectives are consecutive?

o 1 o 1 o 1 o 1 o 1 o   (1 = functional antenna, o = a slot that can hold at most one defective)
If no two defectives are to be consecutive, then each slot between and around the functional
antennas can contain at most one defective antenna. Choosing 3 of the 6 slots, there are \binom{6}{3} = 20
possible orderings in which no two defectives are consecutive.

A useful combinatorial identity

Pascal's identity (the rule behind Pascal's triangle)

\binom{n}{r} = \binom{n-1}{r-1} + \binom{n-1}{r}, \quad 1 \le r \le n, \ n = 1, 2, \ldots

Give a combinatorial explanation of the identity. For example, let us argue that
\binom{10}{4} = \binom{9}{3} + \binom{9}{4} without doing any algebra.
Suppose we have 10 people, one of whom we decide is special, denoted A. \binom{10}{4}
represents the number of committees having 4 people out of the 10. Any such
committee will either contain A or will not. \binom{9}{3} is the number of committees that
contain A and 3 out of the remaining 9 people, while \binom{9}{4} is the number of committees
that do not contain A and contain 4 out of the remaining 9 people.

The Binomial Theorem

(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.

Proof by mathematical induction.


Combinatorial argument
The left hand side is (x + y)(x + y) · · · (x + y). Expanding gives a sum of 2^n terms, and
each term will have n factors. How many terms have k x's and n − k y's? This is
the same as asking: in a sequence of n positions, how many ways can one choose k
of them in which to put x's? The answer is \binom{n}{k}, so the coefficient of x^k y^{n-k} should
be \binom{n}{k}.


Example 13. Expand (x + y)^3.


How many subsets are there of a set consisting of n elements?

\sum_{k=0}^{n} \binom{n}{k} = (1 + 1)^n = 2^n.

1.4 Multinomial Coefficients


Multinomial Coefficients
Example 14. Suppose one has 9 people and one wants to divide them into one
committee of 3, one of 4, and a last one of 2. How many ways are there to form these
committees?

There are \binom{9}{3} ways of choosing the first committee. Once that is done, there are
6 people left and there are \binom{6}{4} ways of choosing the second committee. Once that
is done, the remainder must go in the third committee. So the answer is

\frac{9!}{3!6!} \cdot \frac{6!}{4!2!} = \frac{9!}{3!4!2!}.

Multinomial Coefficients
In general, to divide n objects into one group of n_1, one group of n_2, ..., and an rth
group of n_r, where n = n_1 + · · · + n_r, the answer is

\frac{n!}{n_1! n_2! \cdots n_r!}.

These are known as multinomial coefficients and can be written as \binom{n}{n_1, n_2, \ldots, n_r}.

Example 15. 10 kids are to be divided into an A team and a B team of 5 each. The
A team will play in one league and the B team in another. How many different
divisions are possible? (distinct balls and distinct boxes)

\frac{10!}{5!5!} = 252

10 kids at a playground divide themselves into two teams of 5 each. How many
different divisions are possible? (distinct balls and indistinguishable boxes)

\frac{10!/(5!5!)}{2!} = 126
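A quick R check of both counts:

choose(10, 5)        # 252 divisions into a labeled A team and B team
choose(10, 5) / 2    # 126 divisions into two unlabeled teams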

The Multinomial Theorem


(x_1 + x_2 + \cdots + x_r)^n = \sum \binom{n}{n_1, n_2, \ldots, n_r} x_1^{n_1} x_2^{n_2} \cdots x_r^{n_r},

where the sum is over all nonnegative integer-valued vectors (n_1, \ldots, n_r) such that
n_1 + \cdots + n_r = n.

Example 16 (5d). In the first round of a knockout tournament with 8 players, the
players are divided into 4 pairs, with each of these pairs then playing a game. How
many possible outcomes are there for the first round? How many outcomes of the
tournament are possible?
Example 17. Expand (x_1 + x_2 + x_3)^2.

1.5 The number of integer solutions of equations


Example
Suppose one has 8 distinguishable balls. How many ways can one put them in
3 distinguishable urns? 3^8. Now suppose the balls are indistinguishable; how many
different outcomes are possible? Let us
make sequences of o's and |'s; any such sequence that has | at each side, 2 other |'s,
and 8 o's represents a way of arranging balls into boxes. For example, if one has

| o o | o o o | o o o |,

this would represent 2 balls in the first box, 3 in the second, and 3 in the third.
Altogether there are 8 + 4 symbols; the first is a | as is the last, so there are 10
symbols that can be either | or o. Also, 8 of them must be o. There are \binom{10}{8} ways
to pick which 8 of the 10 positions hold an o.

Example 18 (nonnegative integers). How many nonnegative integer solutions are
there to the equation x_1 + x_2 + x_3 = 8?
View this as putting 8 balls in 3 boxes, with x_1 denoting the balls in the first box,
x_2 in the second, and so on. So the answer is \binom{10}{8} = \binom{10}{2}.
How many nonnegative integer solutions are there to the equation x_1 + x_2 + x_3 +
x_4 + x_5 = 20? View this as putting 20 balls in 5 boxes, with x_1 in the first box,
x_2 in the second, and so on. So there are 20 o's, a | for the first and last spot in a
sequence, and 4 other |'s. We can choose 20 spaces for the o's out of the 24 total in
\binom{24}{20} ways.

Example 19 (positive integers). Consider the same question as the example above,
but where each x_i is at least 1.
First put one ball in each box. This leaves 15 balls to put in 5 boxes, and as
above this can be done in \binom{19}{15} ways.

There are \binom{n+r-1}{r-1} nonnegative integer-valued solutions satisfying the equation
x_1 + x_2 + \cdots + x_r = n.

There are \binom{n-1}{r-1} positive integer-valued solutions satisfying the equation
x_1 + x_2 + \cdots + x_r = n, x_i > 0, i = 1, \ldots, r.
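A quick R check of these counts for the two examples above:

choose(10, 2)    # 45 nonnegative solutions of x1 + x2 + x3 = 8
choose(24, 20)   # 10626 nonnegative solutions of x1 + ... + x5 = 20
choose(19, 15)   # 3876 positive solutions of x1 + ... + x5 = 20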

2 Axioms of Probability
2.1 Sample space and Events
Probability Theory: set theory

• Random Experiment: the outcome of an experiment is not predictable with
certainty in advance; however, the set of all possible outcomes is known.

• The set of all possible outcomes of an experiment is known as the sample


space of the experiment and is denoted by S (or sometimes Ω).

• outcome: an outcome is a particular element of S.

• event: any subset E of the sample space S is known as an event.

Examples on sample space

1. Sex of a newborn baby: S = {g, b} for girl and boy.

2. flip two coins:

S = {(H, H), (H, T ), (T, H), (T, T )}.

3. Let S be the possible orders in which 5 horses finish in a horse race.

S={ all 5! permutations of (1,2,3,4,5)}.

4. Toss two dice, then the sample space consists of

the 36 points: S = {(i, j) : i, j = 1, 2, 3, 4, 5, 6} where the outcome (i, j) is


said to occur if i appears on the first die and j on the second die.

More examples on sample space

1. the possible prices of some stock at closing time today; S = [0, ∞);

2. the age at which someone dies; or

3. S the points in a circle,

4. the possible places a dart can hit.

Examples on event

1. flip two coins: if E = {(H, H)},

then E is the event that a head appears twice.

2. if E = {(H, H), (H, T )},

then E is the event that a head appears on the first toss.

3. Toss two dice: if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},

then E is the event that the sum of the dice equals to seven.

4. horses racing: if E ={ all outcomes in S starting with a 5 },

then E is the event that horse 5 wins the race.

Set operations of events


For any two events E and F of a sample space S, we define the new event

1. E ∪ F is the union of E and F and denotes the points of S that are in E or


F or both.

2. E ∩ F is the intersection of E and F and is the set of points that are in both
E and F . Sometimes written as EF .

3. ∅, the empty set, denotes the null event.

4. If E ∩ F = ∅, then E and F are said to be mutually exclusive.

Toss two dice: if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} is the event the
sum of the dice is 7 and F = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} is the event the
sum of the dice is 6, then the event EF does not contain any outcome and
hence could not occur.

Set operations of events


1. We extend the definition of E ∪ F to \bigcup_{i=1}^{n} E_i, the union of E_1, \ldots, E_n,
and similarly \bigcap_{i=1}^{n} E_i, the intersection.

2. E^c (or E') is the complement of E, that is, the points in S that are not in E.

3. E ⊂ F means that E is contained in F, or E is a subset of F. The occurrence
of E implies the occurrence of F. If E ⊂ F and F ⊂ E, we say E = F.

Venn diagrams

Some laws of set theory

1. Commutative laws
E ∪ F = F ∪ E, EF = F E

2. Associative laws
(E ∪ F ) ∪ G = E ∪ (F ∪ G), (EF )G = E(F G)

3. Distributive laws
(E ∪ F )G = EG ∪ F G, EF ∪ G = (E ∪ G)(F ∪ G)

Venn diagrams

DeMorgan's laws

\left( \bigcup_{i=1}^{n} A_i \right)^c = \bigcap_{i=1}^{n} A_i^c,
\qquad
\left( \bigcap_{i=1}^{n} A_i \right)^c = \bigcup_{i=1}^{n} A_i^c.

proof (exercise)

2.2 What is probability?


Probabilities Defined on Events
How likely are the occurrences of certain events?
Consider an experiment whose sample space is S. For each event E of the sample
space S, we define a measure P (E) as the probability of the event E.
Probabilities are functions defined on the events of a sample space. They
are probabilities of subsets of S, not of points of S. However, it is common to write
P(x) for P({x}).
Toss a fair coin: P({H}) = P({T}) = P(H) = P(T) = 1/2.

Relative frequency approach
Suppose an experiment, whose sample space is S, is repeatedly performed under
exactly the same conditions.
Intuitively, if fE denotes the number of times that the event E occurs in n identical repeated
experiments, then the probability of E should be the limit of fE/n as n tends to infinity:

P(E) = \lim_{n \to \infty} \frac{f_E}{n}

P(E) is the (limiting) proportion of time that E occurs.


This is hard to use. How can we know it will converge to some constant limiting
value? It is better to start with a certain set of axioms.

Model axiomatic approach


Consider an experiment whose sample space is S. For each event E of the sample
space S, we assume that a number P(E) is defined and satisfies the following three
axioms.
Three axioms of probability:

(A1) Non-negative. 0 ≤ P(E) ≤ 1 for any event E.

(A2) Total one. P (S) = 1.

(A3) Additivity. For any sequence of mutually exclusive events E_1, E_2, \ldots,

P\left( \bigcup_{i=1}^{\infty} E_i \right) = \sum_{i=1}^{\infty} P(E_i).

Recall: mutually exclusive means that E_i ∩ E_j = ∅ when i ≠ j.

Some consequences of the three axioms

Some propositions

1. P (∅) = 0.

2. If A_1, \ldots, A_n are mutually exclusive, then P(\bigcup_{i=1}^{n} A_i) = \sum_{i=1}^{n} P(A_i).

3. P (E c ) = 1 − P (E).

4. If E ⊂ F , then P (E) ≤ P (F ).

5. P(E ∪ F ) = P(E) + P(F ) − P(E ∩ F ).

Proof
For (1), let A_1 = S and A_i = ∅ for each i = 2, 3, .... These are clearly pairwise
disjoint, so P(S) = P(\bigcup_{i=1}^{\infty} A_i) = \sum_{i=1}^{\infty} P(A_i) = P(S) + \sum_{i=2}^{\infty} P(∅). Since P(S) =
1 by (A2), the remaining sum must be 0, and hence P(∅) = 0.
The second (the finite version of (A3)) follows if we let A_{n+1} = A_{n+2} = · · · = ∅. We
still have pairwise disjointness, \bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{n} A_i, and \sum_{i=1}^{\infty} P(A_i) = \sum_{i=1}^{n} P(A_i),
using (1).

To prove (3), use S = E ∪ E^c. By (2), P(S) = P(E) + P(E^c). By axiom (A2),
P(S) = 1, so (3) follows.
To prove (4), write F = E ∪ (F ∩ E^c), so P(F) = P(E) + P(F ∩ E^c) ≥ P(E) by
(2) and (A1).
Similarly, to prove (5), we have P(E ∪ F) = P(E) + P(E^c ∩ F) and P(F) =
P(E ∩ F) + P(E^c ∩ F). Solving the second equation for P(E^c ∩ F) and substituting
into the first gives the desired result.

inclusion-exclusion identity

P (E ∪ F ∪ G) = P (E) + P (F ) + P (G)
− P (EF ) − P (EG) − P (F G) + P (EF G)

P(E_1 ∪ E_2 ∪ \cdots ∪ E_n) = \sum_{i=1}^{n} P(E_i) − \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \cdots
+ (−1)^{r+1} \sum_{i_1 < i_2 < \cdots < i_r} P(E_{i_1} E_{i_2} \cdots E_{i_r})
+ \cdots + (−1)^{n+1} P(E_1 E_2 \cdots E_n)

either from a picture or an induction proof

2.3 Classical Approach


sample spaces having equally likely outcomes
It is very common for a probability space to consist of finitely many points, all
with equally likely probabilities.

• Sample space is a finite set.

• Each outcome in the sample space is equally likely to occur.

Then we interpret the probability of an event as the proportion of the event in the
S. The probability of an event E is

P(E) = \frac{\#(E)}{\#(S)}

This explains why combinatorial analysis plays an important role in probability.

Example 20. In tossing a fair coin, we have S = {H, T}, with P(H) = P(T) = 1/2.

If S = {1, 2, \ldots, N} with P({1}) = · · · = P({N}), then P({i}) = 1/N, i = 1, \ldots, N.
[Roll a fair die] The probability space consists of {1, 2, 3, 4, 5, 6}, each point having
probability 1/6.
What is the probability of rolling an even number? P({2, 4, 6}) = P({2}) +
P({4}) + P({6}) = 1/2.

Example
Roll two dice What is the probability that the sum is (a) equal to 7? (b) equal
to 2? (c) even?

Answer. First we need a sample space. Is S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}? Are the
events in S equally likely? No. The correct sample space has 36 equally likely outcomes.
(a) 6 of them have a sum of 7: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1). Since they are all
equally likely, the probability is 6/36 = 1/6. (b) 1/36. (c) 18/36 = 1/2.
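A brute-force check in R by enumerating the 36 equally likely outcomes:

S <- expand.grid(d1 = 1:6, d2 = 1:6)   # all 36 outcomes
s <- S$d1 + S$d2
mean(s == 7)        # 1/6
mean(s == 2)        # 1/36
mean(s %% 2 == 0)   # 1/2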

Example 21 (cards). What is the probability that in a poker hand we get exactly 3
of a kind and the other two cards are of different ranks?
Answer. The probability of 3 aces, 1 king and 1 queen is \binom{4}{3}\binom{4}{1}\binom{4}{1} / \binom{52}{5}. We have
13 choices for the rank we have 3 of and \binom{12}{2} choices for the other two ranks, so the
answer is

13 \binom{12}{2} \binom{4}{3}\binom{4}{1}\binom{4}{1} / \binom{52}{5}.
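Evaluating this in R:

13 * choose(12, 2) * choose(4, 3) * choose(4, 1)^2 / choose(52, 5)   # about 0.0211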

Example 22 (urn with two types of balls). An urn contains n balls, one of which
is special. If k of these balls are drawn, with each selection being equally likely to be any
of the balls that remain at the time, what is the probability that the special ball is
chosen?

P(special ball is selected) = \frac{\binom{1}{1}\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{k}{n}

Alternatively, let A_i denote the event that the special ball is the ith ball to be chosen, i =
1, \ldots, k. Then

P(special ball is selected) = P\left( \bigcup_{i=1}^{k} A_i \right) = \sum_{i=1}^{k} P(A_i) = \frac{k}{n}

Example 23 (card). A deck of 52 cards is dealt out to 4 players. What is the
probability that (a) one of the players receives all 13 spades? (b) each player receives
1 ace?
(a) Let E_i be the event that hand i has all 13 spades; then P(E_i) = 1/\binom{52}{13}, i =
1, 2, 3, 4.
The E_i are mutually exclusive, so the probability that one player has all 13 spades is

P(\bigcup_{i=1}^{4} E_i) = \sum_{i=1}^{4} P(E_i) = 4 / \binom{52}{13}

Example 24 (card). A deck of 52 cards is dealt out to 4 players. What is the
probability that (a) one of the players receives all 13 spades? (b) each player receives
1 ace?
(b) Put aside the aces; there are \binom{48}{12,12,12,12} possible divisions of the other 48
cards. Because there are 4! ways of dividing the 4 aces among the 4 players, the number
of possible outcomes in which each player receives exactly 1 ace is 4!\binom{48}{12,12,12,12};
dividing by the total number of divisions \binom{52}{13,13,13,13} gives the probability.

Example 25 (5i, Birthday problem). In a class of n people, what is the probability


that no two of them celebrate their birthday on the same day of the year? (We
assume each day of the 365 days (no leap) is equally likely.) How large need n be
so that this probability is less than 1/2?
Answer. Let the first person have a birthday on some day. The probability that the
second person has a different birthday will be 364/365. The probability that the third
person has a different birthday from the first two people is 363/365. So the answer is

\frac{364}{365} \cdot \frac{363}{365} \cdots \frac{365 - n + 1}{365}.

When n ≥ 23, this probability is less than 1/2.

Example 26 (Matching problem, 5m). Suppose 10 people put a key into a hat and
then withdraw one randomly. What is the probability that at least one person gets
his/her own key?
Answer. Let E_i be the event that the ith person gets his/her own key; we want to
compute P(\bigcup_{i=1}^{10} E_i).
One can show that

P\left( \bigcup_{i=1}^{10} E_i \right) = \sum_{i_1} P(E_{i_1}) − \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \sum_{i_1 < i_2 < i_3} P(E_{i_1} E_{i_2} E_{i_3}) − \cdots
+ (−1)^{n+1} \sum_{i_1 < i_2 < \cdots < i_n} P(E_{i_1} E_{i_2} \cdots E_{i_n}) + \cdots − P(E_1 \cdots E_{10})

Now the probability that at least the 1st, 3rd, 5th, and 7th person gets his or
her own key is the number of ways the 2nd, 4th, 6th, 8th, 9th, and 10th person can
choose a key out of 6, namely 6!, divided by the number of ways 10 people can each
choose a key, namely 10!. So

P(E_1 E_3 E_5 E_7) = 6!/10!.

There are \binom{10}{4} ways of selecting 4 people to have their own key out of 10, so

\sum_{i_1 < i_2 < i_3 < i_4} P(E_{i_1} ∩ E_{i_2} ∩ E_{i_3} ∩ E_{i_4}) = \binom{10}{4} \frac{6!}{10!} = \frac{1}{4!}

The other terms are similar, and the answer is

P\left( \bigcup_{i=1}^{10} E_i \right) = \frac{1}{1!} − \frac{1}{2!} + \frac{1}{3!} − \cdots − \frac{1}{10!} ≈ 1 − e^{−1}.

Note that e^x = \sum_{j=0}^{\infty} x^j / j! = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.

2.4 Subjective probability


subjective probability
Probability statement expresses the opinion of some individual regarding how
certain an event is to occur.

3 Conditional probability and independence
3.1 Conditional probability
Conditional probability
Example 27. Suppose there are 200 men, of which 100 are smokers, and 100 women,
of which 20 are smokers. What is the probability that a person chosen at random
will be a smoker?
The answer is 120/300.
Example 28. What is the probability that a person chosen at random is a smoker
given that the person is a women? One would expect the answer to be 20/100 and
it is.

What we have computed is


\frac{\text{number of women smokers}}{\text{number of women}} = \frac{\text{number of women smokers}/300}{\text{number of women}/300},

which is the same as the probability that a person chosen at random is a woman and
which is the same as the probability that a person chosen at random is a woman and
a smoker divided by the probability that a person chosen at random is a woman.
With this in mind, we make the following definition.

Definition 29 (Conditional probability). If P (F ) > 0, we define

P(E | F) = \frac{P(E ∩ F)}{P(F)}.

P (E | F ) is read the conditional probability of E given F .

For an event, "new information" (some other event has occurred) could change its
probability.

Example 30 (Dice). Suppose you roll two dice. What is the probability the sum is
8?
What is the probability that the sum is 8 given that the first die shows a 3?

There are five ways this can happen: (2, 6), (3, 5), (4, 4), (5, 3), (6, 2), so the
probability is 5/36. Let us call this event A.
Let B be the event that the first die shows a 3. Then P(A ∩ B) is the probability
that the first die shows a 3 and the sum is 8, or 1/36. P(B) = 1/6, so P(A | B) =
(1/36)/(1/6) = 1/6.

Example 31 (Rich and Famous). In a town 10% of the inhabitants are rich, 5% are
famous and 3% are rich and famous. If a town’s person is chosen at random and
she is rich what is the probability she is also famous?
Example 32. Suppose a box has 3 red marbles and 2 black ones. We select 2 marbles.
What is the probability that second marble is red given that the first one is red?

Answer. Let A be the event the second marble is red, and B the event that the
first one is red. P (B) = 3/5, while P (A ∩ B) is the probability both are red, or is
the probability that we chose 2 red out of 3 and 0 black out of 2. Thus P(A ∩ B) =
\binom{3}{2}\binom{2}{0} / \binom{5}{2} = 3/10. Then P(A | B) = \frac{3/10}{3/5} = 1/2.

Example 33. A family has 2 children. what is the probability that both children are
boys, given that at least one child is a boy?
Answer. The sample space is S = {bb, bg, gb, gg}, each outcome with probability 1/4.
Let B be the event that at least one child is a boy, and A the event that both
children are boys. P(A) = P(bb) = 1/4 and P(B) = P({bb, bg, gb}) = 3/4.
So the answer is \frac{1/4}{3/4} = 1/3.

Definition 34 (Multiplication rule).

The probability that two events, E and F , both occur is given by

P(EF) = P(E | F) P(F)

or by
P(EF) = P(F | E) P(E).

Example 35 (2d). P(A | F) = 1/2, P(A | C) = 2/3 and P(F) = P(C) = 1/2. What
is the probability that she gets an A in chemistry?
P(AC) = P(A | C) P(C) = (2/3)(1/2) = 1/3
A bowl contains 8 red balls and 4 white balls, two balls are to be drawn succes-
sively at random without replacement. (a) What is the probability that both balls
are red? (b) Suppose red ball weights r and white ball weights w and the probability
of a ball to be picked is its weight divided by the sum of the weights of all balls
currently in the bowl. What is the probability that both balls are red?

The multiplication rules for sequence of events


The multiplication rules can be extended to three or more events.

P(E_1 E_2 \cdots E_n) = P(E_1) P(E_2 | E_1) P(E_3 | E_1 E_2) \cdots P(E_n | E_1 \cdots E_{n-1})

proof for three events and graph.


Example 36. Four cards are to be drawn from an ordinary deck of playing cards
at random and without replacement. What is the probability of receiving, in order, a
spade, a heart, a diamond, and a club?

Example 37 (Birthday problem). In a class of 30 people, what is the probability


everyone has a different birthday? (We assume each day of the 365 days (no leap)
is equally likely.)

Answer. Let the first person have a birthday on some day. The probability that the
second person has a different birthday will be 364/365. The probability that the third
person has a different birthday from the first two people is 363/365. So the answer is
\frac{364}{365} \cdot \frac{363}{365} \cdots \frac{336}{365}.

Example 38 (Birthday problem). There are k persons in a room. What is the chance
that some 2 people have the same birthday? (Assume each person's birthday is
equally likely to be any of the 365 days, independently.)
Answer.

P(some 2 people have the same birthday)
= 1 − P(all k people have different birthdays)
= 1 − \left(1 − \frac{1}{365}\right)\left(1 − \frac{2}{365}\right) \cdots \left(1 − \frac{k−1}{365}\right)
= 1 − \frac{365!}{(365 − k)!\, 365^k}

For this chance to be ≈ 50%, we need only k = 23.
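A short R sketch of this calculation:

p_all_different <- function(k) prod((365 - (0:(k - 1))) / 365)
1 - p_all_different(22)   # about 0.476
1 - p_all_different(23)   # about 0.507, so k = 23 is enough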

3.2 Bayes’ rule


Bayes’ rule
Example 39 (false positive & false negative). Suppose the test for HIV is 98% accu-
rate in both directions and 0.5% of the population is HIV positive. If someone tests
positive, what is the probability they actually are HIV positive?
Let D mean HIV positive, and T mean tests positive.

P(D | T) = \frac{P(D ∩ T)}{P(T)} = \frac{(.98)(.005)}{(.98)(.005) + (.02)(.995)} = 19.8\%.
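The same computation in R:

prior <- 0.005; sens <- 0.98; spec <- 0.98
p_pos <- sens * prior + (1 - spec) * (1 - prior)   # total probability of testing positive
sens * prior / p_pos                               # about 0.198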

Suppose you know P(E | F ) and you want P(F | E).


Example 40. Suppose 36% of families own a dog, 30% of families own a cat, and
22% of the families that have a dog also have a cat. A family is chosen at random.
Given that they have a cat, what is the probability they also own a dog?

Answer. Let D be the families that own a dog, and C the families that own a cat.
To find the numerator, we use P(D ∩ C) = P(C | D) P(D) = (.22)(.36) = .0792. So
P(D | C) = .0792/.30 = .264 = 26.4%.

Example 41. Suppose 30% of the women in a class received an A on the test and
25% of the men received an A. The class is 60% women. Given that a person chosen
at random received an A, what is the probability this person is a women?

Answer. Let A be the event of receiving an A, W be the event of being a woman, and M the event of being a man.
We are given P (A | W ) = .30, P (A | M ) = .25, P (W ) = .60 and we want P (W | A). From the definition

P(W | A) = \frac{P(W ∩ A)}{P(A)}.

As in the previous example,


P(W ∩ A) = P(A | W) P(W) = (.30)(.60) = .18.

To find P (A), we write


P (A) = P (W ∩ A) + P (M ∩ A).

Since the class is 40% men,


P(M ∩ A) = P(A | M) P(M) = (.25)(.40) = .10.

So
P (A) = P (W ∩ A) + P (M ∩ A) = .18 + .10 = .28.

Finally,
P(W | A) = \frac{P(W ∩ A)}{P(A)} = \frac{.18}{.28}.

Bayes’ rule
To get a general formula, we can write

P(F | E) = \frac{P(E ∩ F)}{P(E)} = \frac{P(E | F) P(F)}{P(E ∩ F) + P(E ∩ F^c)}
         = \frac{P(E | F) P(F)}{P(E | F) P(F) + P(E | F^c) P(F^c)}.

Partition

A set of events F_1, \ldots, F_n defined on S is a partition if they are

• non-zero: P(F_i) > 0 for all i;

• exhaustive: \bigcup_{i=1}^{n} F_i = S;

• mutually exclusive: F_i ∩ F_j = ∅ for i ≠ j.

In other words, exactly one of the events F_1, \ldots, F_n must occur.

Law of Total Probability


Suppose that F1 , · · · , Fn are mutually exclusive events such that ni=1 Fi = S
S

with P(Fi ) > 0 for all i. Then for any event E in S, we have
Law of total Probability
n
[ n
X
P (E) = P (ES) = P ( EFi ) = P (EFi )
i=1 i=1
n
X
= P (E | Fi )P
P(Fi )
i=1

Bayes’ rule
Proposition 3.1
Let F1 , · · · , Fn be a set of mutually exclusive and exhaustive events. Sup-
pose now that E has occurred and we are interested in determining which
one of the Fj also occurred. Then we have

P(F_j | E) = \frac{P(E F_j)}{P(E)} = \frac{P(E | F_j) P(F_j)}{\sum_{i=1}^{n} P(E | F_i) P(F_i)}

This formula is known as Bayes’s formula.

Example 42 (3l, two sided card). 3 cards, both sides of the first card are colored
red, both sides of the second card are colored black and one side of the third card
is colored red and the other side black. Choose one card and the upper side of the
card is red, what is the probability that the other side is colored black?

P(3 | r) = \frac{P(3) P(r | 3)}{P(1) P(r | 1) + P(2) P(r | 2) + P(3) P(r | 3)} = 1/3

The Monty Hall problem


Example 43. Suppose you’re on a game show, and you’re given the choice of three
doors: Behind one door is a car; behind the others, goats. You pick a door, say No.
1, and the host, who knows what’s behind the doors, opens another door, say No. 3,
which has a goat. He then says to you, "Do you want to pick door No. 2?" Should
you accept?
Answer. If you refuse, P(prize) = 1/3. If you switch, P(prize) = 2/3, assuming Monty always
opens a door with a goat and makes the offer.
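A simulation sketch in R (illustrative variable names; switching wins exactly when the first pick misses the car, since the host always opens a goat door):

set.seed(1)
n <- 100000
car  <- sample(1:3, n, replace = TRUE)
pick <- sample(1:3, n, replace = TRUE)
mean(pick == car)   # staying wins, about 1/3
mean(pick != car)   # switching wins, about 2/3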

3.3 Independent Events


Independent Events
Suppose P (E | F ) = P (E), i.e., knowing F doesn’t help in predicting E. Then
E and F are independent. What we have said is that in this case

P(E | F) = \frac{P(E ∩ F)}{P(F)} = P(E),

or P(E ∩ F) = P(E) P(F). We use the latter equation as a definition:
We say E and F are independent if

P(E ∩ F) = P(E) P(F).

Example 44. Suppose you flip two coins. The outcome of heads on the second is
independent of the outcome of tails on the first. To be more precise, if A is tails
for the first coin and B is heads for the second, and we assume we have fair coins
(although this is not necessary), we have P(A ∩ B) = 1/4 = (1/2)(1/2) = P(A) P(B).
Example 45. Suppose you draw a card from an ordinary deck. Let E be that you drew
an ace and F that you drew a spade. Here 1/52 = P(E ∩ F) = (1/13)(1/4) = P(E) P(F).

If E and F are independent, then E and F c are independent.

Proof. P(E ∩ F^c) = P(E) − P(E ∩ F) = P(E) − P(E) P(F) = P(E)[1 − P(F)] = P(E) P(F^c).

What do they mean? Which is independent?


mutually exclusive and dependent
If E and F are mutually exclusive and P (E) > 0, P (F ) > 0, then they are depen-
dent.

Definition 46 (independent for three events). We say the three events E, F , and
G are independent iff

(i) E and F are independent,

(ii) E and G are independent,

(iii) F and G are independent, and

(iv) P(E ∩ F ∩ G) = P(E) P(F) P(G).

If only (i), (ii) and (iii) are satisfied, we say the three events are pairwise independent.

Example 47. Suppose you roll two dice, E is that the sum is 7, F that the first is a
4, and G that the second is a 3. E and F are independent, as are E and G and F
and G, but E, F and G are not.
Example 48 (Independent trials). What is the probability that exactly 3 threes will
show if you roll 10 dice?
Answer. The probability that the 1st, 2nd, and 4th dice will show a three and the other 7
will not is (1/6)^3 (5/6)^7. Independence is used here: the probability is
(1/6)(1/6)(5/6)(1/6)(5/6) · · · (5/6). The probability
that the 4th, 5th, and 6th dice will show a three and the other 7 will not is the same thing.
So to answer our original question, we take (1/6)^3 (5/6)^7 and multiply it by the number of ways
of choosing 3 dice out of 10 to be the ones showing threes. There are \binom{10}{3} ways of doing
that.
This is a particular example of what are known as Bernoulli trials or the binomial
distribution.

n independent identical Bernoulli trials
Suppose you have n independent trials, where the probability of a success is p.
Then the probability that there are k successes is the number of ways of putting k objects
in n slots (which is \binom{n}{k}) times the probability that there will be k successes and
n − k failures in exactly a given order. So the probability is \binom{n}{k} p^k (1 − p)^{n−k}.

Example 49 (Sampling with replacement). An urn contains R red and N − R


white balls; a sample of n balls is drawn with replacement from it. Let A_k =
{red on the kth draw}; then P(A_k) = R/N, and the A_k are mutually independent.
Example 50 (4g parallel system). system working if at least one is working.

Example 51 (4h E before F ). Independent trials, consisting of rolling a pair of fair


dice, are performed. What is the probability that an outcome of 5 appears before
an outcome of 7 when the outcome of a roll is the sum of the dice?
Two methods.
We show that if E and F are mutually exclusive events of an experiment, then,
when independent trials of this experiment are performed, the event E will occur
before the event F with probability
\frac{P(E)}{P(E) + P(F)}

Example 52 (The problem of the points). The division problem (Fermat and Pascal).
Example 53 (gambler’s ruin, random walk in a line ). Suppose you toss a fair coin
repeatedly and independently. If it comes up heads, you win a dollar, and if it comes
up tails, you lose a dollar. Suppose you start with $50. What’s the probability you
will get to $200 before you go broke?

3.4 P (· | F ) is a probability
P (· | F ) is a probability
It satisfies the axioms for a probability function, namely, with P(F) > 0,

1. P (E | F ) ≥ 0

2. P (F | F ) = 1

3. If E_1, E_2, E_3, \ldots are mutually exclusive events, then P(\bigcup_{i=1}^{\infty} E_i | F) = \sum_{i=1}^{\infty} P(E_i | F).

proof

Example 54. Suppose there are three cards, the first of which has red on both sides,
the second has black on both sides, and the third has red on one side and black on
the other.
A card is picked at random and a side chosen at random. If it is red, what is the
probability the other side will be red also?
Answer. Let A denote the card with two red sides, B the one with two black sides,
and C the third card. Let R denote the event that the upturned face is red. We want

P(A | R) = \frac{P(A ∩ R)}{P(R)} = \frac{P(A)}{P(R)} = \frac{1/3}{1/2} = 2/3.
example 5a, 5c

P(E_1 ∪ E_2 | F) = P(E_1 | F) + P(E_2 | F) − P(E_1 E_2 | F)

P(E_1 | F) = P(E_1 | E_2 F) P(E_2 | F) + P(E_1 | E_2^c F) P(E_2^c | F)

P(A) = 0.3, P(A^c) = 0.7, P(E | A) = 0.4, P(E | A^c) = 0.2. Find (a) P(A | E), (b)
P(E_2 | E_1)?
(a) 6/13. (b) 3.8/13
In tossing a fair coin, what is the probability that a run of 2 heads will precede
a run of 3 tails (HH occurs before TTT)? 7/10

Conditionally independent given F

We say that events E_1 and E_2 are conditionally independent given F if, given that
F occurs, the conditional probability that E_1 occurs is unchanged by information
as to whether or not E_2 occurs:

P(E_1 | E_2 F) = P(E_1 | F)
or equivalently
P(E_1 E_2 | F) = P(E_1 | F) P(E_2 | F)

Example 55 (Updating information sequentially).

4 Random variables
4.1 Random variables
Random variables
Example 56. One rolls a die, and the outcome is observed.

Ω = {1, 2, 3, 4, 5, 6}

an outcome ω ∈ Ω is either 1,2,3,4,5,6.


For an outcome ω, we might be more interested in some quantitative attribute of
ω. For example: whether the value is even or odd.

Random variables
Example 57 (Coin tossing). Toss 3 fair coins and observe the sequence of heads and tails.
Ω = {hhh, hht, htt, hth, ttt, tth, thh, tht}
For an outcome ω ∈ Ω, we might be interested in (1) the total number of heads,
(2) the total number of tails, (3) the number of heads minus the number of tails.

random variable
A random variable (RV) is a function X that maps the sample space Ω to the
real numbers R.
Random variables are usually denoted by X, Y, Z, . . .. (Capital letter)
The random variable X assigns to each element ω ∈ Ω a real number, i.e.

X : Ω → R.

or X(ω) = x, x ∈ R.
The space (range) of X is the set of real numbers.

1. Toss 3 fair coins. Let X be the total number of heads, Y be the total number
of tails, and Z the number of heads minus the number of tails.

2. If one rolls a die, let Y be 1 if an odd number is showing and 0 if an even
number is showing.

3. If one tosses 10 coins, let X be the number of heads showing.

4. In n trials, let X be the number of successes.

Each is a real-valued function defined on Ω, i.e., each is a rule that assigns a


real number to every point ω ∈ Ω.

Since the outcome in Ω is random, the corresponding number is random as


well.

4.2 Discrete random variables


Discrete random variables
Let X = {x : X(ω) = x, ω ∈ Ω} be the space (range) of X, a set of real numbers.
Then X is called discrete if X is a finite or countably infinite set, i.e.,

X = {x_1, x_2, \ldots, x_n} or X = {x_1, x_2, \ldots}.

The most important way to characterize a RV is through the probabilities of the values
that it can take.

Probability mass function for discrete RV


For a discrete random variable, these are captured by the probability mass
function (p.m.f for short) of X, denoted by pX . In particular, if x is any possible
value of X, the probability mass of x, denote by pX (x), is the probability of the
event {X = x}:

fX (x) = pX (x) = P ({X = x}) = P ({ω ∈ Ω : X(ω) = x}),

for x ∈ R. Here P ({X = x}) or P (X = x) is an abbreviation for P ({ω ∈ Ω : X(ω) =


x}). This type of abbreviation is standard.

Example 58 (Coin tossing). Toss a 3 fair coins. Let X be the number of heads show-
ing. Graph for the relations between sample space, random variable and probability.

• X = {0, 1, 2, 3}

• pX (0) = 1/8, pX (1) = 3/8, pX (2) = 3/8, pX (3) = 1/8 and pX (x) = 0 for x ∈
/ X.

• graphical display

Example 59. Draw n balls at random and without replacement from an urn containing N_1 red
and N_2 white balls. Let X be the number of red balls among the balls we draw.
What should its pmf look like?
Example 60 (fair dice: uniform).

Definition 61 (the pmf). If fX (x) is the pmf of RV X with range X , then

• f_X(x) ≥ 0, for all x ∈ R

• f_X(x) = 0, for x ∉ X

• \sum_{x \in X} f_X(x) = 1.

Moreover, P(X ∈ A) = \sum_{x \in A} f_X(x), where A ⊂ X.
Note \sum_{x \in X} f_X(x) = 1 since X must equal something.

Example 62. The p.m.f. of a rv X is given by p(i) = cλ^i / i!, i = 0, 1, 2, \ldots, where λ
is some positive value. Find (a) P(X = 0), (b) P(X > 2).

Cumulative distribution function, F (x)


Given a random variable X, one defines the distribution function (the cumulative
distribution function, c.d.f.) by

FX (b) = P (X ≤ b).

Example 63 (coin tossing). FX (x) = graphics


If F is the cdf of a r.v. X then it must satisfy

1. F is a nondecreasing function; i.e., if a < b, then F (a) ≤ F (b).

2. limb→∞ F (b) = 1,

3. limb→−∞ F (b) = 0.

4. F is right continuous. That is, for any b and any decreasing sequence bn ,
n ≥ 1, that converges to b, limn→∞ F (bn ) = F (b)

Given a distribution function, one can answer questions about probabilities. for
example,

(i) P(a < X ≤ b) = F(b) − F(a), and

(ii) P(X < b) = \lim_{n \to \infty} P(X ≤ b − 1/n) = \lim_{n \to \infty} F(b − 1/n) = F(b−).

The distribution function of the random variable X is given by

F(x) = 0       for x < 0
     = x/2     for 0 ≤ x < 1
     = 2/3     for 1 ≤ x < 2
     = 11/12   for 2 ≤ x < 3
     = 1       for 3 ≤ x

Draw a graph of F(x). Compute (a) P(X < 3), (b) P(X = 1), (c) P(X > 1/2), and (d)
P(2 < X ≤ 4).

4.3 Expected value


Expected value
Example 64. Let X be the number showing if we roll a die. The expected number to
show up on a roll of a die should be 1·P(X = 1) + 2·P(X = 2) + · · · + 6·P(X = 6) = 3.5.
Definition 65 (Expected value of RV X (I)). More generally, we define

E X = \sum_{x: p(x) > 0} x\, p(x)

to be the expected value or expectation or mean of X.

Example 66 (coin toss). If we toss a coin and X is 1 if we have heads and 0 if we


have tails, what is the expectation of X?
Answer.

p_X(x) = 1/2 if x = 1, 1/2 if x = 0, and 0 for all other values of x.

Hence E X = (1)(1/2) + (0)(1/2) = 1/2.

Example 67. Suppose we roll a fair die. If 1 or 2 is showing, let X = 3; if a 3 or 4


is showing, let X = 4, and if a 5 or 6 is showing, let X = 10. What is E X?
Answer. We have P(X = 3) = P(X = 4) = P(X = 10) = 1/3, so
E X = \sum_x x\, P(X = x) = (3)(1/3) + (4)(1/3) + (10)(1/3) = 17/3.

Example 3b, 3d
Expected value as the center of gravity of a distribution of mass.

Expected value

Definition 68 (Expected value of RV X (II)).

E X = \sum_{\omega \in \Omega} X(\omega) P(\omega).

Remember we are only working with discrete random variables here.

In the example we just gave, we have S = {1, 2, 3, 4, 5, 6} and X(1) = 3, X(2) =
3, X(3) = 4, X(4) = 4, X(5) = 10, X(6) = 10, and each ω has probability 1/6. So,
using the second definition,

E X = 3(1/6) + 3(1/6) + 4(1/6) + 4(1/6) + 10(1/6) + 10(1/6) = 34/6 = 17/3.

We see that the difference between the two definitions is that in the first definition we
write 3 P(X = 3) as one of the summands, while in the
second we write this as 3 P({1}) + 3 P({2}).

Expectation
E(X + Y) = \sum_{\omega \in S} (X(\omega) + Y(\omega)) P(\omega)
         = \sum_{\omega} [X(\omega) P(\omega) + Y(\omega) P(\omega)]
         = \sum_{\omega} X(\omega) P(\omega) + \sum_{\omega} Y(\omega) P(\omega)
         = E X + E Y.

Similarly we have

• if c is a constant, E(c) = c

• E(cX) = c E X if c is a constant.

• if c_1 and c_2 are constants, E(c_1 X + c_2 Y) = c_1 E X + c_2 E Y

Example 69. Suppose X = 0 with probability 1/2, 1 with probability 1/4, 2 with
probability 1/8, and more generally n with probability 1/2^{n+1}. This is an example where X
can take infinitely many values (although still countably many values). What is the
expectation of X?
Answer. Here p_X(n) = 1/2^{n+1} if n is a nonnegative integer and 0 otherwise. So

E X = (0)(1/2) + (1)(1/4) + (2)(1/8) + (3)(1/16) + \cdots.

This turns out to sum to 1.

To see this, recall the formula for a geometric series:

1 + x + x^2 + x^3 + \cdots = \frac{1}{1 - x}.

If we differentiate this, we get

1 + 2x + 3x^2 + \cdots = \frac{1}{(1 - x)^2}.

We have

E X = 1(1/4) + 2(1/8) + 3(1/16) + \cdots
    = \frac{1}{4}\left[ 1 + 2(1/2) + 3(1/4) + \cdots \right]
    = \frac{1}{4} \cdot \frac{1}{(1 - 1/2)^2} = 1.

Example 70. Roll a die 10 times and let X be the minimum of the numbers rolled
in these 10 rolls. Repeat this trial several times. What is the expected value
of X?
P(X ≥ 1) = 1, P(X ≥ 2) = (5/6)^{10}, P(X ≥ 3) = (4/6)^{10}, P(X ≥ 4) = (3/6)^{10},
P(X ≥ 5) = (2/6)^{10}, P(X ≥ 6) = (1/6)^{10}
E(X) = \sum_i P(X ≥ i) = 1.17984
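A quick check of this value in R, both exactly and by simulation:

sum(((7 - (1:6)) / 6)^10)                                      # sum of P(X >= i), about 1.17984
mean(replicate(10000, min(sample(1:6, 10, replace = TRUE))))   # simulated average minimum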

4.4 Expectation of a function of a r.v.


Expectation of transformation
Example 71. Suppose we roll a die and let X be the value that is showing. We want
the expectation E X 2 .
Let Y = X^2, so that P(Y = 1) = 1/6, P(Y = 4) = 1/6, etc., and

E X^2 = E Y = (1)(1/6) + (4)(1/6) + \cdots + (36)(1/6).

We can also write this as

E X^2 = (1^2)(1/6) + (2^2)(1/6) + \cdots + (6^2)(1/6),

which suggests that a formula for E X^2 is \sum_x x^2 P(X = x).

What if more than one value of X leads to the same value of X^2?
Suppose that P(X = −2) = 1/8, P(X = −1) = 1/4, P(X = 1) = 3/8, P(X = 2) = 1/4.
Then if Y = X^2, P(Y = 1) = 5/8 and P(Y = 4) = 3/8.
Then
E X^2 = (1)(5/8) + (4)(3/8) = (−1)^2(1/4) + (1)^2(3/8) + (−2)^2(1/8) + (2)^2(1/4).
So E X^2 = \sum_x x^2 P(X = x).

Definition 72 (Proposition 4.1). If X is a discrete RV that takes on one of the values
x_i, i ≥ 1, with respective probabilities p(x_i), then for any real-valued function g,

E[g(X)] = \sum_i g(x_i) p(x_i).

Let Y = g(X). Then

E Y = \sum_y y\, P(Y = y) = \sum_y y \sum_{x: g(x) = y} P(X = x) = \sum_x g(x) P(X = x).

E X^2 = \sum_{x: p(x) > 0} x^2 p(x).  E X^n = \sum_{x: p(x) > 0} x^n p(x).

4.5 Variance and standard deviation


Mean and Variance

Definition 73 (Variance and Standard deviation). If µ = E [X], then

Var(X) = E [(X − µ)2 ]

is called the variance of X. The square root of Var(X) is the standard deviation of
X.
SD(X) = \sqrt{Var(X)}

The variance measures how much spread there is about the expected value.

We toss a fair coin and let X = 1 if we get heads, X = −1 if we get tails. Then
E X = 0, so X − E X = X, and then Var X = E X^2 = (1)^2(1/2) + (−1)^2(1/2) = 1.
Example 74. We roll a die and let X be the value that shows.
We have previously calculated E X = 7/2. So X − E X equals −5/2, −3/2, −1/2, 1/2, 3/2, 5/2,
each with probability 1/6. So

Var X = (−5/2)^2(1/6) + (−3/2)^2(1/6) + (−1/2)^2(1/6) + (1/2)^2(1/6) + (3/2)^2(1/6) + (5/2)^2(1/6) = 35/12.

An alternate expression for the variance is

Var X = E X^2 − 2 E(Xµ) + E(µ^2) = E X^2 − 2µ^2 + µ^2 = E X^2 − (E X)^2.

Linear transformation of a RV
For a, b ∈ R, E(aX + b) = a E X + b and Var(aX + b) = a^2 Var(X).
Note E X^n is called the nth moment of X. E(X − b)^n is called the nth moment
of X about b. E(X(X − 1)) is the second factorial moment.
Population mean µ and population variance σ^2; sample mean X̄, sample variance s^2, and sample
standard deviation s.

Some discrete RVS

4.6 Some discrete distributions


Some discrete RVS
Bernoulli
A r.v. X such that
P (X = 1) = p and
P (X = 0) = 1 − p,
where 0 ≤ p ≤ 1, is said to be a Bernoulli r.v. with parameter p.
The pmf for X is p(x) = p^x (1 − p)^{1−x}, x = 0, 1, where 0 < p < 1. Note E X = p and
E X^2 = p, so Var X = p − p^2 = p(1 − p).

Examples of Bernoulli experiments
Nature of trial          success        failure           probabilities p and q
Tossing a fair coin      head           tail              1/2 and 1/2
Rolling a die            six            not six           1/6 and 5/6
Rolling a pair of dice   double six     not double six    1/36 and 35/36
Birth of a child         girl           boy               0.487 and 0.513

Binomial
A r.v. X has a binomial distribution with parameters n and p if
 
P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}.

The probability of getting k successes in n trials is binomial, called a binomial(n, p)
distribution.
tree diagram for derivation of the binomial distribution
An identical Bernoulli trial repeated n times, independently.
This is the distribution of the number of successes in n independent trials, with
probability p of success in each trial. The binomial(n, p) probabilities are the terms
in the binomial expansion: (p + q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n−k}

mean of a binomial distribution


expected number of successes
Let IA is an indicator variable of an event A with probability p. Suppose there
are n events, A1 , · · · , An , let X = IA1 + · · · + IAn where IAi is the indicator of Ai .
Then X counts the number of events that occur. E X = P (A1 ) + · · · + P (An )
Note we haven’t defined what it means for r.v.’s to be independent, but here we
mean that the events (Yk = 1) are independent.
An easier way is to realize that if X is binomial, then X = Y1 + · · · + Yn , where
the Yi are independent Bernoulli’s, so E X = E Y1 + · · · + E Yn = np.

The cumbersome way is as follows.

E X = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 − p)^{n−k} = \sum_{k=1}^{n} k \binom{n}{k} p^k (1 − p)^{n−k}
    = \sum_{k=1}^{n} k \frac{n!}{k!(n − k)!} p^k (1 − p)^{n−k}
    = np \sum_{k=1}^{n} \frac{(n − 1)!}{(k − 1)!((n − 1) − (k − 1))!} p^{k−1} (1 − p)^{(n−1)−(k−1)}
    = np \sum_{k=0}^{n−1} \frac{(n − 1)!}{k!((n − 1) − k)!} p^k (1 − p)^{(n−1)−k}
    = np \sum_{k=0}^{n−1} \binom{n − 1}{k} p^k (1 − p)^{(n−1)−k} = np.

Variance for a binomial RV


Later we will see that the variance of the sum of independent r.v.’s is the sum
of the variances, so we could quickly get VarX = np(1 − p).
Alternatively, one can compute E (X 2 ) − E X = E (X(X − 1)) using binomial
coefficients and derive the variance of X from that.

Or to get the variance of X, we have

E X^2 = \sum_{k=1}^{n} E Y_k^2 + \sum_{i \ne j} E Y_i Y_j.

Now E Y_i Y_j = 1 · P(Y_i Y_j = 1) + 0 · P(Y_i Y_j = 0) = P(Y_i = 1, Y_j = 1) = P(Y_i = 1) P(Y_j = 1) = p^2,
using independence. The square of Y_1 + · · · + Y_n yields n^2 terms, of which
n are of the form Y_k^2. So we have n^2 − n terms of the form Y_i Y_j with i ≠ j. Hence
Var X = E X^2 − (E X)^2 = np + (n^2 − n)p^2 − (np)^2 = np(1 − p).

Behavior of the binomial distribution for large n.


If n is large, the proportion of successes in n independent trials will be very close
to p, the probability of success on each trial. More formally, for independent trials
with probability p of success on each trial, for every ε > 0, as n → ∞,
P(the proportion of successes in n trials differs from p by less than ε) → 1.

Some R functions for calculating the pmf and cdf
dbinom(i, n, p): calculates the probability P(X = i) when X follows a binomial
distribution with parameters n and p.
pbinom(i, n, p): calculates the cumulative probability P(X ≤ i) when X follows
a binomial distribution with parameters n and p.
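For instance (the first line refers back to Example 48, the second looks ahead to Example 6b; the numbers are only illustrations):

dbinom(3, 10, 1/6)        # exactly 3 threes in 10 rolls of a die
1 - pbinom(1, 10, 0.01)   # more than 1 defective screw in a package of 10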

Poisson X is Poisson with parameter λ if

P(X = i) = e^{−λ} \frac{λ^i}{i!}.

Note \sum_{i=0}^{\infty} λ^i / i! = e^{λ}, so the probabilities add up to one.
To compute expectations,

E X = \sum_{i=0}^{\infty} i e^{−λ} \frac{λ^i}{i!} = e^{−λ} λ \sum_{i=1}^{\infty} \frac{λ^{i−1}}{(i − 1)!} = λ.

Similarly one can show that

E(X^2) − E X = E X(X − 1) = \sum_{i=0}^{\infty} i(i − 1) e^{−λ} \frac{λ^i}{i!} = λ^2 e^{−λ} \sum_{i=2}^{\infty} \frac{λ^{i−2}}{(i − 2)!} = λ^2,

so E X^2 = E(X^2 − X) + E X = λ^2 + λ, and hence Var X = λ.

Poisson approximation to the binomial distribution


Proposition.
If Xn is binomial with parameters n and p and np → λ, then
P (Xn = i) → P (Y = i), where Y is Poisson with parameter λ.

The above proposition shows that the Poisson distribution models binomials
when the probability of a success is small. The number of misprints on a page, the
number of automobile accidents, the number of people entering a store, etc. can all
be modeled by Poisson.

For simplicity, let us suppose λ = np (in the general case one uses λ_n = np_n). We
write

P(X_n = i) = \frac{n!}{i!(n − i)!} p^i (1 − p)^{n−i}
           = \frac{n(n − 1) \cdots (n − i + 1)}{i!} \left( \frac{λ}{n} \right)^i \left( 1 − \frac{λ}{n} \right)^{n−i}
           = \frac{n(n − 1) \cdots (n − i + 1)}{n^i} \frac{λ^i}{i!} \frac{(1 − λ/n)^n}{(1 − λ/n)^i}.

The first factor tends to 1 as n → ∞, (1 − λ/n)^i → 1 as n → ∞, and (1 − λ/n)^n → e^{−λ}
as n → ∞.

Bernoulli process
A Bernoulli process is a sequence of Bernoulli trials, where each trial produces a 1 (a
success) with probability p and a 0 (a failure) with probability 1 − p, independently
of what happens in other trials.
example: a sequence of independent coin tosses, where the probability of head
in each toss is a fixed number p, where 0 < p < 1.

Some Random Variables associated with the Bernoulli Process

• The Binomial with parameters n and p.

• The Geometric with parameter p.

• The Negative Binomial with parameters r and p.

Binomial
A r.v. X has a binomial distribution with parameters n and p if
 
P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k}.

If X is the number of successes in the n independent Bernoulli trials, then X has a
binomial distribution with parameters n and p, called a binomial(n, p) distribution.

Geometric distribution
Let p be a parameter. X is said to follow a geometric distribution if its pmf is
given by
P(X = k) = (1 − p)^{k−1} p for k = 1, 2, \ldots.

In Bernoulli trials with parameter p, if X is the number of trials required to
obtain the first success, then X has a geometric distribution with parameter p.

Example 75. If we toss a coin over and over and X is the first time we get a head,
then X will have a geometric distribution.
To see this, note that to have the first success occur on the kth trial, we have to have k − 1
failures in the first k − 1 trials and then a success. The probability of that is

(1 − p)^{k−1} p.

Since \sum_{n=0}^{\infty} n r^{n−1} = 1/(1 − r)^2 (differentiate the formula \sum_{n=0}^{\infty} r^n = 1/(1 − r)), we
see that
E X = 1/p and Var X = (1 − p)/p^2.

Coupon-collecting problems
There is a set of N different coupons. Suppose each coupon is equally likely to
be collected. What is the expected number of coupons one must collect in order to obtain the
complete set? Let X be the number of coupons collected before a complete
set is attained. Define X_i, i = 1, 2, \ldots, N, to be the number of additional coupons
that need to be obtained after i − 1 distinct types have been collected in order to obtain
i distinct coupons. The expected number of coupons required to get the complete
set of N coupons is

E[X] = 1 + \frac{N}{N − 1} + \frac{N}{N − 2} + \cdots + \frac{N}{1} = N\left(1 + \frac{1}{2} + \cdots + \frac{1}{N}\right)
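A small R sketch of this expectation (N = 50 is an arbitrary illustrative value):

N <- 50
N * sum(1 / (1:N))   # expected number of coupons needed, about 224.96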

Negative Binomial

Let r and p be parameters and set

P(X = n) = \binom{n − 1}{r − 1} p^r (1 − p)^{n−r}, \quad n = r, r + 1, \ldots.

A negative binomial represents the number of trials until r successes.

To get the above formula, note that to have the rth success on the nth trial, we must have exactly
r − 1 successes in the first n − 1 trials and then a success on the nth trial.

E X = \frac{r}{p}, \qquad Var X = \frac{r(1 − p)}{p^2}

Negative Binomial vs Geometric


An easier way is to realize that if X is negative binomial, then X = Y1 + · · · + Yr ,
where the Yi are additional trials needed to obtain the ith success after the i − 1
successes and are independent Geometric’s, so

E X = E Y1 + · · · + E Yr = r/p.

VarX = VarY1 + · · · + VarYr = r(1 − p)/p2 .

If we toss a coin over and over and K is the number of tails before we get the
rth head, then
 
P(K = k) = \binom{k + r − 1}{k} p^r (1 − p)^k, \quad k = 0, 1, 2, \ldots

Uniform
Let
P(X = k) = \frac{1}{n} for k = 1, 2, \ldots, n.
This is the distribution of the number showing on a die (with n = 6), for example.
Note E X = \frac{n+1}{2} and Var X = \frac{n^2 − 1}{12}.

Hypergeometric
Set

P(X = i) = \frac{\binom{m}{i}\binom{N − m}{n − i}}{\binom{N}{n}}

where n − (N − m) ≤ i ≤ min(n, m).

This comes up in sampling without replacement: if there are N balls, of which
m are one color and the other N − m are another, and we choose n balls at random
without replacement, then X, the number of balls of the first color in the sample, is
hypergeometric, and the formula above gives the probability of having i balls of the first color.
If n balls are randomly chosen without replacement from a set of N balls, of
which the fraction p = m/N is white, then the number of white balls selected is
hypergeometric.

When m and N are large in relation to n, it shouldn’t make much difference


whether the selection is being done with or without replacement.

E X = np, \quad p = \frac{m}{N}

Var X = np(1 − p)\left(1 − \frac{n − 1}{N − 1}\right)
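In R the hypergeometric pmf is dhyper; note that its arguments m, n, k stand for the number of first-color balls, second-color balls, and the sample size, which differs from the notation above. For example, with N = 20 balls of which m = 8 are of the first color and a sample of 5 (illustrative numbers):

dhyper(0:5, m = 8, n = 12, k = 5)   # P(X = i) for i = 0, ..., 5
5 * 8 / 20                          # mean np with p = m/N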

example
[6a] Flip five fair coins when the outcomes are independent. The probability
mass function of the number of heads obtained.
Let X be the number of heads (successes) that appear.
[6b defective screws] P{defective} = 0.01, independently of each other. The
company sells the screws in packages of 10 and offers a money-back guarantee that
at most 1 of the 10 screws is defective. What proportion of packages sold must the
company replace?
Let X be the number of defective screws in a package.

example
[6c wheel of fortune: three dice] A player bets on a number. If the number bet on by
the player appears i times, i = 1, 2, 3, then the player wins i units; if the number
bet on by the player does not appear, then the player loses 1 unit. Is this a fair game?

[6d dominant and recessive genes]. A trait (eye color) of a person is classified on
the basis of one pair of genes. d a dominant gene and r a recessive gene. dd, dr,
rd are alike in appearance, and rr is purely recessive. Two hybrid parents have a
total of 4 kids, what is the probability that 3 of the 4 kids have the appearance of
the dominant gene?

Example 76 (8a urn). An urn contains N white and M black balls. Balls are ran-
domly selected, one at a time, until a black ball is obtained. If we assume that each
selected ball is replaced before the next one is drawn, what is the probability that
(a) exactly n draws are needed? (b)at least k draws are needed?
Answer. (a) P(X = n) = \frac{M N^{n−1}}{(M + N)^n}.
(b) P(X ≥ k) = \frac{M}{M + N} \sum_{n=k}^{\infty} \left( \frac{N}{M + N} \right)^{n−1} = \left( \frac{N}{M + N} \right)^{k−1}

Example 77 (8g dice). Find the expected value and the variance of the number of
times one must throw a die until the outcome 1 has occurred 4 times.
Answer. E(X) = r/p = 24, Var(X) = \frac{r(1 − p)}{p^2} = 120

Example 8h
N animals in a certain region. How to estimate N 1. catch a number, say m,
of animals, mark them and release them. 2. make a new catch, say n after some
period. Let X denote the number of marked animals in this second capture.
The probability of the observed event when there are actually N animals in the
region.

Example 78 (8i). components packed in lot of size 10. Inspect policy: inspect 3
components randomly and to accept only if all 3 are good. It is known that 30
percent of the lots have 4 defective components and 70 percent have only 1. What
proportion of lots does the purchaser reject?

Example 79. Suppose on average there are 5 homicides per month in a given city.
What is the probability there will be at most 1 homicide in a certain month?

Answer. If X is the number of homicides, we are given that E X = 5. Since the
expectation for a Poisson is λ, then λ = 5. Therefore P (X = 0) + P (X = 1) =
e−5 + 5e−5 .
dpois(0, 5)+dpois(1, 5) or ppois(1, 5)

Example 80. Suppose on average there is one large earthquake per year in California.
(a) What’s the probability that next year there will be exactly 2 large earthquakes?
Answer. λ = E X = 1, so P(X = 2) = e^{−1}/2! = e^{−1}/2.

Poisson process
Let X be the number of events occur in a given time interval.
Then the process is called an approximate Poisson Process with parameter λ > 0
if it follows the following assumptions.

1. The events that occur in one time-interval are independent of those occurring
in any other non-overlapping time-interval.

2. For a small time-interval, the probability of exactly one event occurs in it is


proportional to the length of the interval.
The probability that exactly one event occurs in a small time-interval h is λh +
o(h), where o(h) is any function of h such that \lim_{h \to 0} o(h)/h = 0.

3. The probability of two or more events occur in a very small time-interval is


essentially zero.
The probability of two or more events occur in a very small time-interval is
o(h).

Poisson process
Let λ be the mean per unit interval under assumption (1), (2) and (3), the
number of events occurred in an interval of length t is a Poisson random variable
with mean equal to λt.
Counting
A continuous-time analog of the Bernoulli process where there is no natural way
of dividing time into discrete periods.

Poisson process
If events in a Poisson process occur at mean rate λ per unit time,
let N(t) denote the number of events that occur in an interval of length t; then
N(t) ∼ Poisson(λt):

P(N(t) = k) = e^{−λt} \frac{(λt)^k}{k!}, \quad k = 0, 1, 2, \ldots.

Poisson process: examples


Example 81 (earthquakes). Suppose on average there are two earthquakes per week
in some place.
(a) What's the probability that there will be at least three earthquakes during
the next 2 weeks?
Answer. (a) Here λt = 4, so P(N(2) ≥ 3) = 1 − P(N(2) ≤ 2) = 1 − ppois(2, 4)

(b) Let X be the amount of time (in weeks), starting from now, until the next
earthquake. What is the probability distribution of X?
We note that X will be greater than t if and only if no events occur within the
next t weeks.
P(X > t) = P(N(t) = 0) = e^{−λt}.
So the (cumulative) probability function of X is

F (t) = P (X ≤ t) = 1 − P (X > t) = 1 − e−λt .

4.7 Moment generating functions
Moment generating functions, mgf
We define the moment generating function m_X by

m_X(t) = E e^{tX},

provided this is finite. In the discrete case this is equal to \sum_x e^{tx} p(x).

We call m_X(t) the moment generating function because all of the moments of X
can be obtained by successively differentiating m_X(t) and then evaluating the result
at t = 0. For example,

m'_X(t) = \frac{d}{dt} E e^{tX} = E \frac{d}{dt}(e^{tX}) = E X e^{tX}

Let us compute the moment generating function for some of the distributions we
have been working with.
Bernoulli: p e^t + (1 − p).
Binomial: using independence,

E e^{t \sum X_i} = E \prod e^{t X_i} = \prod E e^{t X_i} = (p e^t + (1 − p))^n,

where the X_i are independent Bernoullis.

Poisson:

E e^{tX} = \sum_k \frac{e^{tk} e^{−λ} λ^k}{k!} = e^{−λ} \sum_k \frac{(λ e^t)^k}{k!} = e^{−λ} e^{λ e^t} = e^{λ(e^t − 1)}.
mean and variance of X by m.g.f.

Proposition :
If mX (t) = mY (t) < ∞ for all t in an interval, then X and Y have the same
distribution.

Moment generating functions
We define the moment generating function M_X by

M_X(t) = E e^{tX},

provided this is finite. In the discrete case this is equal to \sum_x e^{tx} p(x), and in the
continuous case to \int e^{tx} f(x)\, dx.
If two random variables have the same m.g.f., then they must have the same
probability distribution: the moment generating function uniquely determines
the distribution of a random variable.
We say these two random variables X and Y have identical distributions.
Note that it would be wrong to say they were equal.

HT
If M_X(t) = e^t(3/6) + e^{2t}(2/6) + e^{3t}(1/6), find E(X).
If M_X(t) = \frac{e^t/2}{1 − e^t/2}, t < ln 2, find the pmf of X.

We call M_X(t) the moment generating function because all of the moments of X
can be obtained by successively differentiating M_X(t) and then evaluating the result
at t = 0. For example,

M'_X(t) = \frac{d}{dt} E e^{tX} = E \frac{d}{dt}(e^{tX}) = E X e^{tX}

mean and variance of X by m.g.f.

Setting t = 0, we have
M'_X(0) = E X
M''_X(0) = E X^2
In general, M^{(r)}_X(0) = E X^r. In particular, µ = M'_X(0) and σ^2 = E X^2 − [E X]^2 =
M''_X(0) − [M'_X(0)]^2.
ex: The m.g.f. of X is M_X(t) = e^t(3/6) + e^{2t}(2/6) + e^{3t}(1/6). Find the p.m.f. of X.

The moment generating function for some distributions
Bernoulli:
p e^t + (1 − p).
Binomial: using independence,

E e^{t \sum X_i} = E \prod e^{t X_i} = \prod E e^{t X_i} = (p e^t + (1 − p))^n,

where the X_i are independent Bernoullis.

Find the mean and variance by differentiating the m.g.f. at t = 0.

Poisson:

E e^{tX} = \sum_k \frac{e^{tk} e^{−λ} λ^k}{k!} = e^{−λ} \sum_k \frac{(λ e^t)^k}{k!} = e^{−λ} e^{λ e^t} = e^{λ(e^t − 1)}.

ex: The m.g.f. of X is M_X(t) = \frac{e^t/2}{1 − e^t/2}, for t < ln 2. Find the p.m.f. of X. (Note
that the m.g.f. exists only when the sum of the series is finite, so sometimes the range
of t needs to be specified.)
of t need to be specified.)

Geometric(p)

E e^{tX} = \frac{p e^t}{1 − q e^t} \quad if t < − ln q

Negative binomial (r, p)

E e^{tX} = \left[ \frac{p e^t}{1 − q e^t} \right]^r \quad if t < − ln q
