Introduction To Probability: Mid-Term I
Syllabus
Discrete RVs
  3. Expectation of a function of a RV
Continuous RVs
  3. Uniform RV
  4. Normal RVs
  2. Independent RVs
  4. Conditional distribution
Expectation
  3. Conditional Expectation
Limit Theorem
  1. Markov and Chebyshev’s inequality
Thanks
These notes are based on the lecture notes Undergraduate Probability by
Professor Richard F. Bass at the Department of Mathematics, University of
Connecticut, USA. Some additional changes have been made. I thank Professor
Bass for his original TeX file.
Introduction
Why do we need to talk about probability?
We move from asking ”Is it so?” to ”What is the probability that it is so?”
Probability is a mathematical tool for understanding uncertainty and randomness
in our lives. We often want to assess how likely the outcomes of some events are,
and probability is defined to measure exactly that.
Many problems in probability theory can be solved by counting the number of
different ways that a certain event can occur. The method of counting is formally
known as combinatorial analysis.
Example 1. Suppose there are 20 people taking Prob 1. There are 7 women and
13 men. What is the chance that a person selected at random from the class is a
woman?
What is the chance that two persons selected at random from the class are both
women?
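These chances can be checked numerically in R (a sketch, anticipating the
combinations introduced in Section 1.3; choose(n, k) is R's built-in binomial
coefficient):

7 / 20                        # P(one person selected at random is a woman) = 0.35
choose(7, 2) / choose(20, 2)  # P(two selected at random are both women) = 21/190 ≈ 0.11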
1 Combinatorial Analysis
1.1 Multiplication principle
The first basic principle is to multiply.
Example 2. Suppose we have 4 shirts of 4 different colors and 3 pants of 3 different
colors. How many shirt and pants combinations are there?
For each shirt there are 3 possibilities, so altogether there are 4 × 3 = 12
possibilities.
Example 3 (license plates). How many different license plates of 3 letters followed
by 3 numbers are possible?
26^3 × 10^3, because there are 26 possibilities for the first place, 26 for the second,
26 for the third, 10 for the fourth, 10 for the fifth, and 10 for the sixth. We multiply.
Example 4 (license plates). How many license plates of 3 letters followed by 3
numbers are possible when no letter or number may be repeated? By the
multiplication principle, 26 · 25 · 24 · 10 · 9 · 8.
1.2 Permutations
Example 5. How many ways can one arrange a, b, c?
One can have
abc, acb, bac, bca, cab, cba.
There are 3 possibilities for the first position. Once we have chosen the first position,
there are 2 possibilities for the second position, and once we have filled the first
two positions, there is only 1 choice left for the third. So there are 3 × 2 × 1 = 3!
arrangements.
Factorials: n! = n × (n − 1) × · · · × 2 × 1, with the convention 0! = 1.
In general, if there are n letters, there are n! possible arrangements.
Example 6. What is the number of possible batting orders with 9 players?
9!
Example 7. How many ways can one arrange 4 math books, 3 chemistry books,
2 physics books, and 1 biology book on a bookshelf so that all the math books
are together, all the chemistry books are together, and all the physics books are
together?
We can arrange the math books in 4! ways, the chemistry books in 3! ways, the
physics books in 2! ways, and the biology book in 1! = 1 way.
But we also have to decide which set of books go on the left, which next, and
so on. That is the same as the number of ways of arranging the letters M, C, P, B,
and there are 4! ways of doing that. Hence the answer is 4!(4!3!2!1!).
Example 9. Suppose that there are 4 Czech tennis players, 4 U.S. players, and 3
Russian players. If the tournament result lists just the nationalities of the players
in the order in which they placed, how many outcomes are possible?
11!/(4!4!3!).
What we just computed is the number of permutations with some objects alike.
In general, if there are n objects, of which n_1 are alike, n_2 are alike, . . ., n_r are alike
and n_1 + · · · + n_r = n, there are

n!/(n_1! n_2! · · · n_r!)

different permutations.
1.3 Combinations
Example 10. How many ways can we choose 3 letters out of 5?
If the letters are a, b, c, d, e and order matters, then there would be 5 for the first
position, 4 for the second, and 3 for the third, for a total of 5 × 4 × 3. But suppose the
letters selected were a, b, c. If order doesn't matter, we will have counted the letters
a, b, c 6 times, because there are 3! ways of arranging 3 letters. The same is true for
any choice of three letters. So the answer should be 5 × 4 × 3/3!. We can rewrite this as

(5 · 4 · 3)/3! = 5!/(3! 2!).

This is often written \binom{5}{3}, read “5 choose 3.”
Combinations
More generally,

\binom{n}{k} = n!/(k!(n − k)!).
Example 11. Suppose there are 8 men and 8 women. How many ways can we choose
a committee that has 2 men and 2 women?
We can choose 2 men in \binom{8}{2} ways and 2 women in \binom{8}{2} ways. The number of
possible committees is then \binom{8}{2} · \binom{8}{2} = 28 · 28 = 784.
Example 12 (4c). Consider a set of 8 antennas of which 3 are defective, and assume
that all of the defective and all of the functional antennas are indistinguishable. How
many linear orderings are there in which no two defectives are consecutive?

o 1 o 1 o 1 o 1 o 1 o

If no two defectives are to be consecutive, then the spaces between the functional
antennas (marked o above, with 1 marking a functional antenna) must each contain
at most one defective antenna. Hence there are \binom{6}{3} = 20 such orderings.
Give a combinatorial explanation of the identity. For example, let us argue that

\binom{10}{4} = \binom{9}{3} + \binom{9}{4}

without doing any algebra.
Suppose we have 10 people, one of whom we decide is special, denoted A. \binom{10}{4}
represents the number of committees having 4 people out of the 10. Any such
committee will either contain A or will not. \binom{9}{3} is the number of committees that
contain A and 3 out of the remaining 9 people, while \binom{9}{4} is the number of committees
that do not contain A at all. Since every committee is of exactly one of these two
types, the identity follows.
Since the number of subsets of size k of an n-element set is \binom{n}{k}, the total number
of subsets is, by the binomial theorem,

Σ_{k=0}^{n} \binom{n}{k} = (1 + 1)^n = 2^n.
There are \binom{9}{3} ways of choosing the first committee. Once that is done, there are
6 people left, and there are \binom{6}{4} ways of choosing the second committee. Once that
is done, the remaining 2 people form the last group, so there are
\binom{9}{3}\binom{6}{4} = 9!/(3! 4! 2!) divisions in all.
Multinomial Coefficients
In general, to divide n objects into one group of n_1, one group of n_2, . . ., and an rth
group of n_r, where n = n_1 + · · · + n_r, the answer is

n!/(n_1! n_2! · · · n_r!).

These are known as multinomial coefficients and can be written as \binom{n}{n_1, n_2, . . . , n_r}.
Example 15. 10 kids are to be divided into an A team and a B team of 5 each. The
A team will play in one league and the B team in another. How many different
divisions are possible? (distinct balls and distinct boxes)
10!/(5! 5!) = 252
10 kids at a playground divide themselves into two teams of 5 each. How many
different divisions are possible? (distinct balls and indistinguishable boxes)
(10!/(5! 5!))/2! = 126
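These counts are easy to verify in R:

choose(10, 5)                     # 252 divisions into a labeled A team and B team
choose(10, 5) / 2                 # 126 divisions into indistinguishable teams
factorial(10) / factorial(5)^2    # same as choose(10, 5)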
The multinomial theorem states that

(x_1 + x_2 + · · · + x_r)^n = Σ \binom{n}{n_1, n_2, . . . , n_r} x_1^{n_1} x_2^{n_2} · · · x_r^{n_r},

where the sum is over all nonnegative integer-valued vectors (n_1, · · · , n_r) such that
n_1 + · · · + n_r = n.
Example 16 (5d). In the first round of a knockout tournament with 8 players, the
players are divided into 4 pairs, with each of these pairs then playing a game. How
many possible outcomes are there for the first round? How many outcomes of the
tournament are possible?
Example 17. Expand (x_1 + x_2 + x_3)^2. By the multinomial theorem,
(x_1 + x_2 + x_3)^2 = x_1^2 + x_2^2 + x_3^2 + 2x_1x_2 + 2x_1x_3 + 2x_2x_3.
| o o | o o o | o o o |,

this would represent 2 balls in the first box, 3 in the second, and 3 in the third.
Altogether there are 8 + 4 symbols, of which the first is a | as is the last. So there
are 10 symbols in between that can be either | or o, and 8 of them must be o. There
are \binom{10}{8} ways to pick which 8 of the 10 spaces hold an o.
Example 19 (positive integers). Consider the same question as the example above,
but where each x_i is at least 1.

First put one ball in each box. This leaves 15 balls to put in 5 boxes, and as
above this can be done in \binom{19}{15} ways.
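In R:

choose(10, 8)    # 45 ways, for the 8-balls-in-3-boxes illustration above
choose(19, 15)   # 3876 ways, for Example 19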
2 Axioms of Probability
2.1 Sample space and Events
Probability Theory: set theory
The sample space S is the set of all possible outcomes of an experiment. For example:
1. the possible prices of some stock at closing time today: S = [0, ∞);
Examples on event
3. Toss two dice: if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},
then E is the event that the sum of the dice equals seven.
2. E ∩ F is the intersection of E and F and is the set of points that are in both
E and F . Sometimes written as EF .
Toss two dice: if E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} is the event the
sum of the dice is 7 and F = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} is the event the
sum of the dice is 6, then the event EF does not contain any outcome and
hence cannot occur.
3. E ⊂ F means that E is contained in F, or E is a subset of F. The occurrence
of E implies the occurrence of F. If E ⊂ F and F ⊂ E, we say E = F.
Venn diagrams
1. Commutative laws
E ∪ F = F ∪ E, EF = F E
2. Associative laws
(E ∪ F ) ∪ G = E ∪ (F ∪ G), (EF )G = E(F G)
3. Distributive laws
(E ∪ F)G = EG ∪ FG, EF ∪ G = (E ∪ G)(F ∪ G)
Venn diagrams
DeMorgan’s laws
(⋃_{i=1}^{n} A_i)^c = ⋂_{i=1}^{n} A_i^c,

(⋂_{i=1}^{n} A_i)^c = ⋃_{i=1}^{n} A_i^c.

Proof: exercise.
Relative frequency approach
Suppose an experiment, whose sample space is S, is repeatedly performed under
exactly the same conditions.
Intuitively, the probability of E should be the fraction of times that the event E
occurs in n identical repetitions of the experiment, in the limit as n tends to
infinity. Writing f_E for the number of times E occurs in the n repetitions,

P(E) = lim_{n→∞} f_E/n.
Some propositions
1. P (∅) = 0.
3. P (E c ) = 1 − P (E).
4. If E ⊂ F , then P (E) ≤ P (F ).
Proof
For (1), let A_1 = S and A_i = ∅ for each i = 2, 3, . . .. These are clearly pairwise
disjoint, so by (A3), P(S) = P(⋃_{i=1}^{∞} A_i) = P(S) + Σ_{i=2}^{∞} P(∅). Since
P(S) = 1 by (A2), the sum Σ_{i=2}^{∞} P(∅) must be 0, hence P(∅) = 0.

The second (the finite version of (A3)) follows if we let A_{n+1} = A_{n+2} = · · · = ∅.
We still have pairwise disjointness, ⋃_{i=1}^{∞} A_i = ⋃_{i=1}^{n} A_i, and
Σ_{i=1}^{∞} P(A_i) = Σ_{i=1}^{n} P(A_i), using (1).
inclusion-exclusion identity
P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)

More generally,

P(E_1 ∪ E_2 ∪ · · · ∪ E_n) = Σ_{i=1}^{n} P(E_i) − Σ_{i_1 < i_2} P(E_{i_1}E_{i_2}) + · · ·
    + (−1)^{r+1} Σ_{i_1 < i_2 < · · · < i_r} P(E_{i_1}E_{i_2} · · · E_{i_r})
    + · · · + (−1)^{n+1} P(E_1E_2 · · · E_n).
When the sample space S is finite and all outcomes are equally likely, we interpret
the probability of an event as the proportion of S that the event occupies. The
probability of an event E is

P(E) = #(E)/#(S).
Example
Roll two dice. What is the probability that the sum is (a) equal to 7? (b) equal
to 2? (c) even?
Answer. First we need a sample space. Is S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}? Are
the outcomes in this S equally likely? No: the correct sample space consists of the
36 ordered pairs of faces, which are equally likely. (a) 6 of the 36 outcomes have a
sum of 7: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1). Since they are all equally likely, the
probability is 6/36 = 1/6. (b) 1/36. (c) 18/36 = 1/2.
Example 21 (cards). What is the probability that in a poker hand we get exactly 3
of a kind and the other two cards are of different ranks?

Answer. The probability of 3 aces, 1 king and 1 queen is
\binom{4}{3}\binom{4}{1}\binom{4}{1} / \binom{52}{5}. We have 13 choices for the rank
we have 3 of and \binom{12}{2} choices for the other two ranks, so the answer is

13 \binom{12}{2} \binom{4}{3}\binom{4}{1}\binom{4}{1} / \binom{52}{5}.
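Numerically, in R:

13 * choose(12, 2) * choose(4, 3) * choose(4, 1)^2 / choose(52, 5)   # ≈ 0.0211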
Example 22 (urn with two types of balls). An urn contains n balls, one of which
is special. If k of these balls are drawn, with each selection being equally likely to
be any of the balls that remain at the time, what is the probability that the special
ball is chosen?
P(special ball is selected) = \binom{1}{1}\binom{n−1}{k−1} / \binom{n}{k} = k/n.
Alternatively, let A_i denote the event that the special ball is the ith ball to be
chosen, i = 1, . . . , k. Since each ball is equally likely to be the ith one chosen,
P(A_i) = 1/n, and the A_i are mutually exclusive. So

P(special ball is selected) = P(⋃_{i=1}^{k} A_i) = Σ_{i=1}^{k} P(A_i) = k/n.
Deal a deck of 52 cards into four hands of 13 cards each.

(a) Let E_i be the event that hand i has all 13 spades; then P(E_i) = 1/\binom{52}{13},
i = 1, 2, 3, 4.

The E_i are mutually exclusive, so the probability that some player has all 13 spades is

P(⋃_{i=1}^{4} E_i) = Σ_{i=1}^{4} P(E_i) = 4/\binom{52}{13}.
Example 26 (Matching problem, 5m). Suppose 10 people put their keys into a hat
and then each withdraws one at random. What is the probability that at least one
person gets his/her own key?

Answer. Let E_i be the event that the ith person gets his/her own key. We want to
compute P(⋃_{i=1}^{10} E_i).
One can show that

P(⋃_{i=1}^{10} E_i) = Σ_{i_1} P(E_{i_1}) − Σ_{i_1 < i_2} P(E_{i_1}E_{i_2})
    + Σ_{i_1 < i_2 < i_3} P(E_{i_1}E_{i_2}E_{i_3}) − · · ·
    + (−1)^{n+1} Σ_{i_1 < · · · < i_n} P(E_{i_1}E_{i_2} · · · E_{i_n}) + · · · − P(E_1 · · · E_{10}).
Now the probability that (at least) the 1st, 3rd, 5th, and 7th person gets his or
her own key is the number of ways the 2nd, 4th, 6th, 8th, 9th, and 10th person can
choose among the remaining 6 keys, namely 6!, divided by the number of ways 10
people can each choose a key, namely 10!. So

P(E_1E_3E_5E_7) = 6!/10!.

There are \binom{10}{4} ways of selecting 4 people out of 10 to have their own key, so

Σ_{i_1 < i_2 < i_3 < i_4} P(E_{i_1} ∩ E_{i_2} ∩ E_{i_3} ∩ E_{i_4}) = \binom{10}{4} 6!/10! = 1/4!.
Note that e^x = Σ_{j=0}^{∞} x^j/j! = 1 + x + x^2/2! + x^3/3! + · · ·. The same
computation as above gives Σ P(E_{i_1} · · · E_{i_k}) = 1/k! for each k, so

P(⋃_{i=1}^{10} E_i) = 1 − 1/2! + 1/3! − · · · − 1/10! ≈ 1 − e^{−1} ≈ 0.632.
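A quick R check of the inclusion-exclusion sum against the limit 1 − e^{−1}:

n <- 10
sum((-1)^((1:n) + 1) / factorial(1:n))   # 1 − 1/2! + 1/3! − ... ≈ 0.6321
1 - exp(-1)                              # ≈ 0.6321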
3 Conditional probability and independence
3.1 Conditional probability
Example 27. Suppose there are 200 men, of which 100 are smokers, and 100 women,
of which 20 are smokers. What is the probability that a person chosen at random
will be a smoker?
The answer is 120/300.
Example 28. What is the probability that a person chosen at random is a smoker,
given that the person is a woman? One would expect the answer to be 20/100, and
it is.
For events E and F with P(F) > 0, the conditional probability of E given F is

P(E | F) = P(E ∩ F)/P(F).
For an event, ”new information” (that some other event has occurred) can change
its probability.
Example 30 (Dice). Suppose you roll two dice. What is the probability the sum is
8?
What is the probability that the sum is 8 given that the first die shows a 3?
There are five ways this can happen: (2, 6), (3, 5), (4, 4), (5, 3), (6, 2), so the
probability is 5/36. Let us call this event A.

Let B be the event that the first die shows a 3. Then P(A ∩ B) is the probability
that the first die shows a 3 and the sum is 8, or 1/36. P(B) = 1/6, so
P(A | B) = (1/36)/(1/6) = 1/6.
Example 31 (Rich and Famous). In a town, 10% of the inhabitants are rich, 5% are
famous, and 3% are rich and famous. If a person from the town is chosen at random
and she is rich, what is the probability she is also famous? (Answer: .03/.10 = 30%.)
Example 32. Suppose a box has 3 red marbles and 2 black ones. We select 2 marbles.
What is the probability that second marble is red given that the first one is red?
Answer. Let A be the event that the second marble is red, and B the event that the
first one is red. P(B) = 3/5, while P(A ∩ B) is the probability both are red, i.e. the
probability that we chose 2 red out of 3 and 0 black out of 2:

P(A ∩ B) = \binom{3}{2}\binom{2}{0} / \binom{5}{2} = 3/10.

Then P(A | B) = (3/10)/(3/5) = 1/2.
Example 33. A family has 2 children. What is the probability that both children are
boys, given that at least one child is a boy?
Answer. The sample space is S = {bb, bg, gb, gg}, each outcome with probability 1/4.

Let B be the event that at least one child is a boy, and A the event that both
children are boys. P(A) = P({bb}) = 1/4 and P(B) = P({bb, bg, gb}) = 3/4.

So the answer is (1/4)/(3/4) = 1/3.
Definition 34 (Multiplication rule).

P(EF) = P(E | F)P(F)

or, equivalently,

P(EF) = P(F | E)P(E).

More generally,

P(E_1E_2 · · · E_n) = P(E_1)P(E_2 | E_1)P(E_3 | E_1E_2) · · · P(E_n | E_1 · · · E_{n−1}).
Answer. Let the first person have a birthday on some day. The probability that the
second person has a different birthday will be 364/365. The probability that the third
person has a different birthday from the first two people is 363/365. Continuing, the
answer is

(364/365)(363/365) · · · (336/365).
Example 38 (Birthday problem). There are k people in a room. What is the chance
that some 2 people have the same birthday? (Assume each person's birthday is
equally likely to be any of the 365 days, independently.)
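A short R sketch of this computation (with k = 23, a commonly quoted value):

k <- 23
1 - prod((365 - 0:(k - 1)) / 365)   # ≈ 0.507; with 23 people a match is more likely than not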
Answer.

P(D | T) = P(D ∩ T)/P(T) = (.98)(.005) / ((.98)(.005) + (.02)(.995)) ≈ 19.8%.
Answer. Let D be the event that a family owns a dog, and C the event that it owns
a cat. To find the numerator, we use P(D ∩ C) = P(C | D)P(D) = (.22)(.36) = .0792.
So P(D | C) = .0792/.3 = .264 = 26.4%.
Example 41. Suppose 30% of the women in a class received an A on the test and
25% of the men received an A. The class is 60% women. Given that a person chosen
at random received an A, what is the probability this person is a woman?
Answer. Let A be the event of receiving an A, W be the event of being a woman, and M the event of being a man.
We are given P (A | W ) = .30, P (A | M ) = .25, P (W ) = .60 and we want P (W | A). From the definition
P(W | A) = P(W ∩ A)/P(A).

Now P(W ∩ A) = P(A | W)P(W) = (.30)(.60) = .18 and P(M ∩ A) = P(A | M)P(M) =
(.25)(.40) = .10, so

P(A) = P(W ∩ A) + P(M ∩ A) = .18 + .10 = .28.

Finally,

P(W | A) = P(W ∩ A)/P(A) = .18/.28 ≈ .643.
Bayes’ rule
To get a general formula, we can write

P(F | E) = P(E ∩ F)/P(E) = P(E | F)P(F) / (P(E ∩ F) + P(E ∩ F^c))
         = P(E | F)P(F) / (P(E | F)P(F) + P(E | F^c)P(F^c)).
Partition
A set of events F_1, · · · , F_n defined on S is a partition if they are

• Mutually exclusive: F_iF_j = ∅ for all i ≠ j;
• Exhaustive: ⋃_{i=1}^{n} F_i = S,

with P(F_i) > 0 for all i. Then for any event E in S, we have the following.
Law of total probability

P(E) = P(ES) = P(⋃_{i=1}^{n} EF_i) = Σ_{i=1}^{n} P(EF_i) = Σ_{i=1}^{n} P(E | F_i)P(F_i).
Bayes’ rule
Proposition 3.1
Let F1 , · · · , Fn be a set of mutually exclusive and exhaustive events. Sup-
pose now that E has occurred and we are interested in determining which
one of the Fj also occurred. Then we have
P(F_j | E) = P(EF_j)/P(E) = P(E | F_j)P(F_j) / Σ_{i=1}^{n} P(E | F_i)P(F_i).
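A minimal R sketch of Bayes' rule over a partition (the function name and argument
names are mine, not from the notes; prior holds the P(F_i) and lik the P(E | F_i)):

bayes_posterior <- function(prior, lik) {
  joint <- prior * lik    # P(E | F_i) P(F_i)
  joint / sum(joint)      # divide by P(E), via the law of total probability
}
bayes_posterior(c(.60, .40), c(.30, .25))   # Example 41: returns .18/.28 ≈ 0.643 for W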
Example 42 (3l, two sided card). There are 3 cards: both sides of the first card are
colored red, both sides of the second card are colored black, and one side of the third
card is colored red and the other side black. A card is chosen and its upper side turns
out to be red; what is the probability that the other side is colored black?
Answer. Let r be the event that the upper side is red, and number the cards 1
(red/red), 2 (black/black), 3 (red/black). The other side is black exactly when card 3
was chosen, and

P(3 | r) = P(3)P(r | 3) / (P(1)P(r | 1) + P(2)P(r | 2) + P(3)P(r | 3))
         = (1/3)(1/2) / ((1/3)(1) + (1/3)(0) + (1/3)(1/2)) = 1/3.
Suppose knowing F does not change the probability of E, i.e.

P(E | F) = P(E ∩ F)/P(F) = P(E),

or P(E ∩ F) = P(E)P(F). We use the latter equation as a definition:

We say E and F are independent if

P(E ∩ F) = P(E)P(F).
Example 44. Suppose you flip two coins. The outcome of heads on the second is
independent of the outcome of tails on the first. To be more precise, if A is tails
for the first coin and B is heads for the second, and we assume we have fair coins
(although this is not necessary), we have P(A ∩ B) = 1/4 = (1/2) · (1/2) = P(A)P(B).

Example 45. Suppose you draw a card from an ordinary deck. Let E be that you drew
an ace, F that you drew a spade. Here 1/52 = P(E ∩ F) = (1/13) · (1/4) = P(E)P(F).
Proposition: if E and F are independent, then E and F^c are independent.

Proof. P(E ∩ F^c) = P(E) − P(E ∩ F) = P(E) − P(E)P(F) = P(E)[1 − P(F)] =
P(E)P(F^c).
Definition 46 (independence for three events). We say the three events E, F, and
G are independent iff

(i) P(EF) = P(E)P(F), (ii) P(EG) = P(E)P(G), (iii) P(FG) = P(F)P(G), and
(iv) P(EFG) = P(E)P(F)P(G).

If only (i), (ii) and (iii) are satisfied, we say the three events are pairwise independent.
Example 47. Suppose you roll two dice, E is that the sum is 7, F that the first is a
4, and G that the second is a 3. E and F are independent, as are E and G, and F
and G, but E, F and G are not: P(EFG) = 1/36, while P(E)P(F)P(G) = 1/216.
Example 48 (Independent trials). What is the probability that exactly 3 threes will
show if you roll 10 dice?
Answer. The probability that the 1st, 2nd, and 4th dice will show a three and the
other 7 will not is (1/6)^3 (5/6)^7. Independence is used here: the probability is
(1/6)(1/6)(5/6)(1/6)(5/6) · · · (5/6). The probability that the 4th, 5th, and 6th dice
will show a three and the other 7 will not is the same thing. So to answer our original
question, we take (1/6)^3 (5/6)^7 and multiply it by the number of ways of choosing
3 dice out of 10 to be the ones showing threes. There are \binom{10}{3} ways of doing
that, so the answer is \binom{10}{3} (1/6)^3 (5/6)^7.
This is a particular example of what are known as Bernoulli trials or the binomial
distribution.
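In R, this binomial probability can be computed directly:

dbinom(3, 10, 1/6)                  # ≈ 0.155
choose(10, 3) * (1/6)^3 * (5/6)^7   # the same value, by the formula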
n independent identical Bernoulli trials
Suppose you have n independent trials, where the probability of a success is p.
Then the probability that there are k successes is the number of ways of putting k
objects in n slots (which is \binom{n}{k}) times the probability of k successes in those
slots and n − k failures in the others, namely

P(k successes) = \binom{n}{k} p^k (1 − p)^{n−k}.
Example 52 (The problem of the points). The division problem (Fermat and Pascal).
Example 53 (gambler’s ruin, random walk in a line ). Suppose you toss a fair coin
repeatedly and independently. If it comes up heads, you win a dollar, and if it comes
up tails, you lose a dollar. Suppose you start with $50. What’s the probability you
will get to $200 before you go broke?
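For a fair coin the classical answer is (starting fortune)/(goal), here 50/200 = 1/4.
A scaled-down simulation sketch in R (the function and its defaults are mine; only
the ratio start/goal matters for a fair coin, so 5 and 20 give the same 1/4 much faster):

set.seed(1)
one_run <- function(start = 5, goal = 20) {
  x <- start
  while (x > 0 && x < goal) x <- x + sample(c(-1L, 1L), 1L)   # one fair ±1 step
  x == goal
}
mean(replicate(5000, one_run()))   # ≈ 0.25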
3.4 P (· | F ) is a probability
It satisfies the axioms for a probability function, namely, with P(F) > 0,
1. P (E | F ) ≥ 0
2. P (F | F ) = 1
3. If E_1, E_2, E_3, . . . are mutually exclusive events, then
P(⋃_{i=1}^{∞} E_i | F) = Σ_{i=1}^{∞} P(E_i | F).
proof (exercise)
Example 54. Suppose there are three cards, the first of which has red on both sides,
the second has black on both sides, and the third has red on one side and black on
the other.
A card is picked at random and a side chosen at random. If it is red, what is the
probability the other side will be red also?
Answer. Let A denote the card with two red sides, B the one with two black sides,
and C the third card. Let R denote the event that the upturned face is red. We want

P(A | R) = P(A ∩ R)/P(R) = P(A)/P(R) = (1/3)/(1/2) = 2/3.
example 5a, 5c
We say E_1 and E_2 are conditionally independent given F if

P(E_1 | E_2F) = P(E_1 | F),

or equivalently

P(E_1E_2 | F) = P(E_1 | F)P(E_2 | F).
Example 55 (Updating information sequentially).
4 Random variables
4.1 Random variables
Example 56. One rolls a die, and the outcome is observed.
Ω = {1, 2, 3, 4, 5, 6}
Random variables
Example 57 (Coin tossing). Toss 3 fair coins and observe the sequence of heads and
tails.
Ω = {hhh, hht, htt, hth, ttt, tth, thh, tht}
For an outcome ω ∈ Ω, we might be interested in (1) the total number of heads,
(2) the total number of tails, (3) the number of heads minus the number of tails.
random variable
A random variable (RV) is a function X that maps the sample space Ω to the
real numbers R.
Random variables are usually denoted by X, Y, Z, . . .. (Capital letter)
The random variable X assigns to each element ω ∈ Ω a real number, i.e.
X : Ω → R, written X(ω) = x with x ∈ R.
The space (range) of X is a set of real numbers.
1. Toss 3 fair coins. Let X be the total number of heads, Y the total number
of tails, and Z the number of heads minus the number of tails.
2. If one rolls a die, let Y be 1 if an odd number is showing and 0 if an even
number is showing.
Example 58 (Coin tossing). Toss 3 fair coins and let X be the number of heads
showing. (Figure: relation between sample space, random variable, and probability.)
• The range of X is {0, 1, 2, 3}
• p_X(0) = 1/8, p_X(1) = 3/8, p_X(2) = 3/8, p_X(3) = 1/8, and p_X(x) = 0 for x
outside the range.
• graphical display
Example 59. Draw n balls at random and without replacement from an urn containing
N_1 red and N_2 white balls. Let X be the number of red balls among the balls we
draw. What should its pmf look like?
Example 60 (fair dice: uniform).
The pmf f_X of a discrete RV X satisfies:

• f_X(x) = 0 for x outside the range of X;
• Σ_{x} f_X(x) = 1.

Moreover, P(X ∈ A) = Σ_{x∈A} f_X(x) for any set A of real numbers.

Note Σ_{x} f_X(x) = 1 since X must equal something.
The cumulative distribution function (cdf) of X is defined by

F_X(b) = P(X ≤ b).
1. F is a nondecreasing function; i.e., if a < b, then F (a) ≤ F (b).
2. limb→∞ F (b) = 1,
3. limb→−∞ F (b) = 0.
4. F is right continuous; that is, for any b and any decreasing sequence b_n,
n ≥ 1, that converges to b, lim_{n→∞} F(b_n) = F(b).
Given a distribution function, one can answer questions about probabilities; for
example, P(a < X ≤ b) = F_X(b) − F_X(a).
Definition 65 (Expected value of RV X (I)). More generally, we define

E X = Σ_{x : p(x) > 0} x p(x).
Example 3b, 3d
Expected value as the center of gravity of a distribution of mass.
Expected value
In the example we just gave, we have S = {1, 2, 3, 4, 5, 6} and X(1) = 3, X(2) =
3, X(3) = 4, X(4) = 4, X(5) = 10, X(6) = 10, and each ω has probability 1/6. So
using the second definition,

E X = 3(1/6) + 3(1/6) + 4(1/6) + 4(1/6) + 10(1/6) + 10(1/6) = 34/6 = 17/3.
We see that the difference between the two definitions is that we write, for
example, 3P(X = 3) as one of the summands in the first definition, while in the
second we write this as 3P(X = 1) + 3P(X = 2).
Expectation

E(X + Y) = Σ_{ω∈S} (X(ω) + Y(ω))P(ω)
         = Σ_{ω} [X(ω)P(ω) + Y(ω)P(ω)]
         = Σ_{ω} X(ω)P(ω) + Σ_{ω} Y(ω)P(ω)
         = E X + E Y.

Similarly we have

• if c is a constant, E(c) = c;
• E(cX) = cE X if c is a constant.
To see this, recall the formula for a geometric series:

1 + x + x^2 + x^3 + · · · = 1/(1 − x).

If we differentiate this, we get

1 + 2x + 3x^2 + · · · = 1/(1 − x)^2.

We have

E X = 1(1/4) + 2(1/8) + 3(1/16) + · · ·
    = (1/4)[1 + 2(1/2) + 3(1/4) + · · ·]
    = (1/4) · 1/(1 − 1/2)^2 = 1.
Example 70. Roll a die 10 times and let X be the minimum of the numbers rolled.
What is the expected value of X? (Repeating this trial many times and averaging
would estimate E X.)

Using the tail-sum formula E X = Σ_{i≥1} P(X ≥ i):

P(X ≥ 1) = 1, P(X ≥ 2) = (5/6)^{10}, P(X ≥ 3) = (4/6)^{10}, P(X ≥ 4) = (3/6)^{10},
P(X ≥ 5) = (2/6)^{10}, P(X ≥ 6) = (1/6)^{10},

so E X = Σ_i P(X ≥ i) ≈ 1.17984.
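The tail-sum formula makes this a one-liner in R:

sum(((6:1) / 6)^10)   # P(X >= 1) + ... + P(X >= 6) ≈ 1.17984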
One can compute E X^2 directly from the pmf of X, combining terms if more than
one value of X leads to the same value of X^2.

Suppose that P(X = −2) = 1/8, P(X = −1) = 1/4, P(X = 1) = 3/8, P(X = 2) = 1/4.
Then if Y = X^2, P(Y = 1) = 5/8 and P(Y = 4) = 3/8.
Then

E X^2 = (1)(5/8) + (4)(3/8) = (−1)^2(1/4) + (1)^2(3/8) + (−2)^2(1/8) + (2)^2(1/4).

So E X^2 = Σ_x x^2 P(X = x). In general,

E X^2 = Σ_{x : p(x) > 0} x^2 p(x),   E X^n = Σ_{x : p(x) > 0} x^n p(x).
The quantity

Var(X) = E[(X − E X)^2]

is called the variance of X. The square root of Var(X) is the standard deviation of X:

SD(X) = √Var(X).

The variance measures how much spread there is about the expected value.
We toss a fair coin and let X = 1 if we get heads, X = −1 if we get tails. Then
E X = 0, so X − E X = X, and then Var X = E X^2 = (1)^2(1/2) + (−1)^2(1/2) = 1.
Example 74. We roll a die and let X be the value that shows.
We have previously calculated E X = 7/2. So X − E X equals −5/2, −3/2, −1/2,
1/2, 3/2, 5/2, each with probability 1/6. So

Var X = (−5/2)^2(1/6) + (−3/2)^2(1/6) + (−1/2)^2(1/6) + (1/2)^2(1/6)
      + (3/2)^2(1/6) + (5/2)^2(1/6) = 35/12.
An alternate expression for the variance: with μ = E X,

Var X = E X^2 − 2E(Xμ) + E(μ^2) = E X^2 − 2μ^2 + μ^2 = E X^2 − (E X)^2.
linear transformation of a RV

For a, b ∈ R, E(aX + b) = aE X + b and Var(aX + b) = a^2 Var(X).
Note E X^n is called the nth moment of X, and E(X − b)^n is called the nth moment
of X about b. E(X(X − 1)) is called the second factorial moment.

Terminology: population mean μ and population variance σ^2; sample mean X̄,
sample variance s^2, and sample standard deviation s.
Examples of Bernoulli experiments

Nature of trial          success      failure          p and q
Tossing a fair coin      head         tail             1/2 and 1/2
Rolling a die            six          not six          1/6 and 5/6
Rolling a pair of dice   double six   not double six   1/36 and 35/36
Birth of a child         girl         boy              0.487 and 0.513
Binomial
A r.v. X has a binomial distribution with parameters n and p if
P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k},   k = 0, 1, . . . , n.
The cumbersome way is as follows:

E X = Σ_{k=0}^{n} k \binom{n}{k} p^k (1 − p)^{n−k} = Σ_{k=1}^{n} k \binom{n}{k} p^k (1 − p)^{n−k}
    = Σ_{k=1}^{n} k (n!/(k!(n − k)!)) p^k (1 − p)^{n−k}
    = np Σ_{k=1}^{n} ((n − 1)!/((k − 1)!((n − 1) − (k − 1))!)) p^{k−1} (1 − p)^{(n−1)−(k−1)}
    = np Σ_{k=0}^{n−1} ((n − 1)!/(k!((n − 1) − k)!)) p^k (1 − p)^{(n−1)−k}
    = np Σ_{k=0}^{n−1} \binom{n−1}{k} p^k (1 − p)^{(n−1)−k} = np.
For the variance, write X = Y_1 + · · · + Y_n, where Y_i is 1 if the ith trial is a success
and 0 otherwise. Now E Y_iY_j = 1 · P(Y_iY_j = 1) + 0 · P(Y_iY_j = 0) = P(Y_i = 1, Y_j = 1) =
P(Y_i = 1)P(Y_j = 1) = p^2, using independence. The square of Y_1 + · · · + Y_n yields
n^2 terms, of which n are of the form Y_k^2. So we have n^2 − n terms of the form
Y_iY_j with i ≠ j. Hence

Var X = E X^2 − (E X)^2 = np + (n^2 − n)p^2 − (np)^2 = np(1 − p).
Some R for calculating the pmf and cdf

dbinom(i, n, p): calculates the probability P(X = i) when X follows a binomial
distribution with parameters n and p.

pbinom(i, n, p): calculates the cumulative probability P(X ≤ i) when X follows
a binomial distribution with parameters n and p.
A r.v. X has a Poisson distribution with parameter λ > 0 if

P(X = i) = e^{−λ} λ^i/i!,   i = 0, 1, 2, . . . .

Note Σ_{i=0}^{∞} λ^i/i! = e^{λ}, so the probabilities add up to one.
To compute expectations,

E X = Σ_{i=0}^{∞} i e^{−λ} λ^i/i! = e^{−λ} λ Σ_{i=1}^{∞} λ^{i−1}/(i − 1)! = λ.
The computation below shows that the Poisson distribution models binomials
when the probability of a success is small. The number of misprints on a page, the
number of automobile accidents, the number of people entering a store, etc. can all
be modeled by the Poisson distribution.
For simplicity, let us suppose λ = np (in the general case we use λ_n = np_n). We
write

P(X_n = i) = (n!/(i!(n − i)!)) p^i (1 − p)^{n−i}
           = (n(n − 1) · · · (n − i + 1)/i!) (λ/n)^i (1 − λ/n)^{n−i}
           = (n(n − 1) · · · (n − i + 1)/n^i) (λ^i/i!) (1 − λ/n)^n/(1 − λ/n)^i.

As n → ∞, the first factor tends to 1, (1 − λ/n)^n → e^{−λ}, and (1 − λ/n)^i → 1,
so P(X_n = i) → e^{−λ} λ^i/i!.
Bernoulli process
A Bernoulli process is a sequence of Bernoulli trials, where each trial produces a 1 (a
success) with probability p and a 0 (a failure) with probability 1 − p, independently
of what happens in other trials.
example: a sequence of independent coin tosses, where the probability of head
in each toss is a fixed number p, where 0 < p < 1.
Geometric distribution

Let 0 < p < 1 be a parameter. X is said to follow a geometric distribution if its
pmf is given by

P(X = k) = (1 − p)^{k−1} p   for k = 1, 2, . . . .
Example 75. If we toss a coin over and over and X is the first time we get a head,
then X will have a geometric distribution.
To see this: to have the first success occur on the kth trial, we must have k − 1
failures in the first k − 1 trials and then a success. The probability of that is
(1 − p)^{k−1} p.
Since Σ_{n=1}^{∞} n r^{n−1} = 1/(1 − r)^2 (differentiate the formula
Σ_{n=0}^{∞} r^n = 1/(1 − r)), we see that

E X = 1/p and Var X = (1 − p)/p^2.
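A numerical check in R (truncating the infinite sums; note that R's own dgeom
counts failures before the first success, so dgeom(k - 1, p) equals the pmf
P(X = k) above):

p <- 0.3
k <- 1:1000
sum(k * (1 - p)^(k - 1) * p)                 # ≈ 1/p = 3.333
sum(k^2 * (1 - p)^(k - 1) * p) - (1/p)^2     # ≈ (1 - p)/p^2 = 7.778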
Coupon-collecting problems

There is a set of N different coupons, and each coupon obtained is equally likely to
be any of the N types. What is the expected number of coupons one must obtain to
have a complete set? Let X be the number of coupons collected before a complete
set is attained. Define X_i, i = 1, 2, · · · , N, to be the number of additional coupons
that need to be obtained after i − 1 distinct types have been collected in order to
obtain the ith distinct type. Each X_i is geometric with success probability
(N − i + 1)/N, so the expected number of coupons required to get the complete
set of N coupons is

E[X] = 1 + N/(N − 1) + N/(N − 2) + · · · + N/1 = N(1 + 1/2 + · · · + 1/N).
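For example, in R:

N <- 50
N * sum(1 / (1:N))   # ≈ 224.96 coupons expected to complete a set of 50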
Negative Binomial
Let r and p be parameters and set

P(X = n) = \binom{n−1}{r−1} p^r (1 − p)^{n−r},   n = r, r + 1, . . . .

Then

Var X = r(1 − p)/p^2.
E X = E Y_1 + · · · + E Y_r = r/p, writing X as a sum of r geometric waiting times Y_i.
If we toss a coin over and over and K is the number of tails before we get the
rth head, then

P(K = k) = \binom{k+r−1}{k} p^r (1 − p)^k,   k = 0, 1, 2, . . . .
Uniform
Let

P(X = k) = 1/n   for k = 1, 2, . . . , n.

This is the distribution of the number showing on a die (with n = 6), for example.

Note E X = (n + 1)/2 and Var X = (n^2 − 1)/12.
Hypergeometric
Set

P(X = i) = \binom{m}{i}\binom{N−m}{n−i} / \binom{N}{n}.

Then

E X = np, where p = m/N,
Var X = np(1 − p)(1 − (n − 1)/(N − 1)).
example
[6a] Flip five fair coins, where the outcomes are independent. Find the probability
mass function of the number of heads obtained.
Let X be the number of heads (successes) that appear.
[6b defective screws] P{defective} = 0.01, independently of each other. The
company sells the screws in packages of 10 and offers a money-back guarantee that
at most 1 of the 10 screws is defective. What proportion of packages sold must the
company replace?
Let X be the number of defective screws in a package.
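In R, the package is replaced when X > 1, so the proportion is:

1 - pbinom(1, 10, 0.01)   # ≈ 0.0043, i.e. about 0.4% of packages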
example
[6c wheel of fortune: three dice] A player bets on a number. If the number bet on
by the player appears i times, i = 1, 2, 3, then the player wins i units; if the number
does not appear, the player loses 1 unit. Is this a fair game?
[6d dominant and recessive genes] A trait (such as eye color) of a person is classified
on the basis of one pair of genes: d denotes a dominant gene and r a recessive gene.
The types dd, dr, and rd are alike in appearance, while rr is purely recessive. Two
hybrid (dr) parents have a total of 4 kids; what is the probability that 3 of the 4 kids
have the appearance of the dominant gene? (Each kid independently shows the
dominant appearance with probability 3/4, so the answer is \binom{4}{3}(3/4)^3(1/4).)
Example 76 (8a urn). An urn contains N white and M black balls. Balls are ran-
domly selected, one at a time, until a black ball is obtained. If we assume that each
selected ball is replaced before the next one is drawn, what is the probability that
(a) exactly n draws are needed? (b) at least k draws are needed?

Answer. (a) P(X = n) = M N^{n−1}/(M + N)^n.
(b) P(X ≥ k) = Σ_{n=k}^{∞} M N^{n−1}/(M + N)^n = (N/(M + N))^{k−1}.
Example 77 (8g dice). Find the expected value and the variance of the number of
times one must throw a die until the outcome 1 has occurred 4 times.
Answer. E(X) = r/p = 24, Var(X) = r(1 − p)/p^2 = 120.
Example 8h
There are N animals in a certain region. How can one estimate N? 1. Catch a
number, say m, of the animals, mark them, and release them. 2. After some period,
make a new catch of size n. Let X denote the number of marked animals in this
second capture. Then X is hypergeometric: the probability of observing X = i when
there are actually N animals in the region is

P(X = i) = \binom{m}{i}\binom{N−m}{n−i} / \binom{N}{n}.
Example 78 (8i). Components are packed in lots of size 10. Inspection policy:
inspect 3 components at random and accept the lot only if all 3 are good. It is
known that 30 percent of the lots have 4 defective components and 70 percent have
only 1. What proportion of lots does the purchaser reject?
Example 79. Suppose on average there are 5 homicides per month in a given city.
What is the probability there will be at most 1 homicide in a certain month?
Answer. If X is the number of homicides, we are given that E X = 5. Since the
expectation for a Poisson is λ, then λ = 5. Therefore P(X = 0) + P(X = 1) =
e^{−5} + 5e^{−5}.
dpois(0, 5)+dpois(1, 5) or ppois(1, 5)
Example 80. Suppose on average there is one large earthquake per year in California.
(a) What’s the probability that next year there will be exactly 2 large earthquakes?
Answer. λ = E X = 1, so P(X = 2) = e^{−1}/2!.
Poisson process
Let X be the number of events that occur in a given time interval.
The process is called an approximate Poisson process with parameter λ > 0
if it satisfies the following assumptions:
1. The events that occur in one time interval are independent of those occurring
in any other non-overlapping time interval;
2. The probability of exactly one event in a sufficiently short interval of length h
is approximately λh;
3. The probability of two or more events in a sufficiently short interval is
essentially zero.
Poisson process
Let λ be the mean number of events per unit interval. Under assumptions (1), (2)
and (3), the number of events occurring in an interval of length t is a Poisson
random variable with mean λt.
Counting
The Poisson process is a continuous-time analog of the Bernoulli process, used when
there is no natural way of dividing time into discrete periods.
Poisson process
If events in a Poisson process occur at mean rate λ per unit time, let N(t) denote
the number of events that occur in an interval of length t. Then N(t) ∼ Poisson(λt):

P(N(t) = k) = e^{−λt} (λt)^k/k!,   k = 0, 1, 2, · · · .
(b) Let X be the amount of time (in weeks), starting from now, until the next
earthquake. What is the probability distribution of X?

We note that X will be greater than t if and only if no events occur within the
next t weeks, so

P(X > t) = P(N(t) = 0) = e^{−λt}.

So the (cumulative) distribution function of X is F_X(t) = P(X ≤ t) = 1 − e^{−λt}
for t ≥ 0 (an exponential distribution).
4.7 Moment generating functions
Moment generating functions, mgf
We define the moment generating function m_X by

m_X(t) = E e^{tX},

provided this is finite. In the discrete case this is equal to Σ_x e^{tx} p(x).
We call m_X(t) the moment generating function because all of the moments of X
can be obtained by successively differentiating m_X(t) and then evaluating the result
at t = 0. For example,

m_X'(t) = (d/dt) E e^{tX} = E((d/dt) e^{tX}) = E(X e^{tX}),

so m_X'(0) = E X.
Let us compute the moment generating function for some of the distributions we
have been working with.

Bernoulli: p e^t + (1 − p).

Binomial: using independence,

E e^{t Σ X_i} = E Π e^{tX_i} = Π E e^{tX_i} = (p e^t + (1 − p))^n,

where the X_i are independent Bernoulli random variables.

Poisson:

E e^{tX} = Σ_k e^{tk} e^{−λ} λ^k/k! = e^{−λ} Σ_k (λe^t)^k/k! = e^{−λ} e^{λe^t} = e^{λ(e^t − 1)}.
mean and variance of X by m.g.f.
Proposition :
If mX (t) = mY (t) < ∞ for all t in an interval, then X and Y have the same
distribution.
Moment generating functions
We define the moment generating function M_X by

M_X(t) = E e^{tX},

provided this is finite. In the discrete case this is equal to Σ_x e^{tx} p(x), in the
continuous case to ∫ e^{tx} f(x) dx.
If two random variables have the same m.g.f., then they must have the same
probability distribution. The moment generating function uniquely determines
the distribution of a random variable.
We say these two random variables X and Y have identical distributions.
Note that it would be wrong to say they were equal.
HT

If M_X(t) = e^t(3/6) + e^{2t}(2/6) + e^{3t}(1/6), find E(X).

If M_X(t) = (e^t/2)/(1 − e^t/2) for t < ln 2, find the pmf of X. (Note that the m.g.f.
exists only when the sum of the defining series is finite, so sometimes the range of
t needs to be specified.)
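A sketch for the first exercise: E(X) = M_X'(0) = 1 · (3/6) + 2 · (2/6) + 3 · (1/6) = 10/6.
This can be checked numerically in R with a central finite difference (the step size h
is my choice):

M <- function(t) exp(t) * (3/6) + exp(2 * t) * (2/6) + exp(3 * t) * (1/6)
h <- 1e-6
(M(h) - M(-h)) / (2 * h)   # ≈ 10/6 ≈ 1.6667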
Geometric(p):

E e^{tX} = p e^t/(1 − q e^t)   if t < −ln q,

where q = 1 − p.

Negative binomial (r, p):

E e^{tX} = (p e^t/(1 − q e^t))^r   if t < −ln q.