
ST2131
AY21/22 SEM 2
github/jovyntls

01. COMBINATORIAL ANALYSIS
tricky - E18, E20-22, E23, E26

The Basic Principle of Counting
• combinatorial analysis → the mathematical theory of counting
• basic principle of counting → Suppose that two experiments are performed. If experiment 1 can result in any one of m possible outcomes and if, for each outcome of experiment 1, there are n possible outcomes of experiment 2, then together there are mn possible outcomes of the two experiments.
• generalized basic principle of counting → If r experiments are performed such that the first may result in any of n1 possible outcomes, and if for each of these n1 possible outcomes there are n2 possible outcomes of the 2nd experiment, and if ..., then there is a total of n1 · n2 · ... · nr possible outcomes of the r experiments.

Permutations
factorials - 1! = 0! = 1
N1 - if we know how to count the number of different ways that an event can occur, we will know the probability of the event.
N2 - there are n! different arrangements of n objects.
N3 - there are n! / (n1! n2! ... nr!) different arrangements of n objects, of which n1 are alike, n2 are alike, ..., nr are alike.

Combinations
(write C(n, r) for the binomial coefficient "n choose r")
N4 - C(n, r) = n! / ((n−r)! r!) represents the number of different groups of size r that could be selected from a set of n objects when the order of selection is not considered relevant.
N4b - C(n, r) = C(n−1, r−1) + C(n−1, r), 1 ≤ r ≤ n
Proof. If object 1 is chosen ⇒ C(n−1, r−1) ways of choosing the remaining objects; if object 1 is not chosen ⇒ C(n−1, r) ways of choosing the remaining objects.
N5 - The Binomial Theorem - (x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^{n−k}
Proof. By mathematical induction: n = 1 is true; expand; substitute a dummy variable; combine using N4b; combine back into the final term.

Multinomial Coefficients
N6 - C(n; n1, n2, ..., nr) = n! / (n1! n2! ... nr!) represents the number of possible divisions of n distinct objects into r distinct groups of respective sizes n1, n2, ..., nr, where n1 + n2 + ... + nr = n.
Proof. Using the basic counting principle,
C(n; n1, ..., nr) = C(n, n1) · C(n−n1, n2) · C(n−n1−n2, n3) · ... · C(n−n1−...−n_{r−1}, nr)
= n! / ((n−n1)! n1!) × (n−n1)! / ((n−n1−n2)! n2!) × ... × (n−n1−...−n_{r−1})! / (0! nr!)
= n! / (n1! n2! ... nr!)
N7 - The Multinomial Theorem: (x1 + x2 + ... + xr)^n = Σ_{(n1,...,nr): n1+...+nr=n} n!/(n1! n2! ... nr!) x1^{n1} x2^{n2} ... xr^{nr}

Number of Integer Solutions of Equations
N8 - there are C(n−1, r−1) distinct positive integer-valued vectors (x1, x2, ..., xr) satisfying x1 + x2 + ... + xr = n, xi > 0, i = 1, 2, ..., r.
N9 - there are C(n+r−1, r−1) distinct non-negative integer-valued vectors (x1, x2, ..., xr) satisfying x1 + x2 + ... + xr = n.
! N8 cannot be applied directly here, since N8 does not allow 0 values.
Proof. Let yk = xk + 1 ⇒ y1 + y2 + ... + yr = n + r with yk > 0; now apply N8.
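Not part of the original cheatsheet: a small Python check of N4b, N5 and N9 using only the standard library. The helper name count_solutions and the parameter values are my own illustrative choices.

from math import comb
from itertools import product

# N4b: Pascal's rule C(n, r) = C(n-1, r-1) + C(n-1, r)
assert comb(10, 4) == comb(9, 3) + comb(9, 4)

# N5: binomial theorem, checked exactly for x = 2, y = 3, n = 5
x, y, n = 2, 3, 5
assert (x + y) ** n == sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))

# N9: number of non-negative integer solutions of x1 + ... + xr = n is C(n+r-1, r-1)
def count_solutions(n, r):
    # brute force: enumerate all r-tuples with entries in 0..n and count those summing to n
    return sum(1 for v in product(range(n + 1), repeat=r) if sum(v) == n)

n, r = 6, 3
assert count_solutions(n, r) == comb(n + r - 1, r - 1)   # both equal 28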
02. AXIOMS OF PROBABILITY

Sample Space and Events
• sample space → the set of all outcomes of an experiment (where outcomes are not predictable with certainty)
• event → any subset of the sample space
• union of events E and F → E ∪ F is the event that contains all outcomes that are either in E or in F (or both).
• intersection of events E and F → E ∩ F (or EF) is the event that contains all outcomes that are both in E and in F.
• complement of E → E^c is the event that contains all outcomes that are not in E.
• subset → E ⊂ F means every outcome in E is also in F.
• E ⊂ F ∧ F ⊂ E ⇒ E = F

DeMorgan's Laws
(∪_{i=1}^n Ei)^c = ∩_{i=1}^n Ei^c        (∩_{i=1}^n Ei)^c = ∪_{i=1}^n Ei^c
Proof (first law). To show LHS ⊂ RHS: let x ∈ (∪_{i=1}^n Ei)^c ⇒ x ∉ ∪_{i=1}^n Ei ⇒ x ∉ E1 and x ∉ E2 ... and x ∉ En ⇒ x ∈ E1^c and x ∈ E2^c ... and x ∈ En^c ⇒ x ∈ ∩_{i=1}^n Ei^c. To show RHS ⊂ LHS: let x ∈ ∩_{i=1}^n Ei^c and reverse the argument.
Proof (second law). Apply the first law to the complements (negate the LHS to get the RHS).

Axioms of Probability
definition 1: relative frequency
P(E) = lim_{n→∞} n(E)/n
problems with this definition:
1. n(E)/n may not converge as n → ∞
2. n(E)/n may not converge to the same value if the experiment is repeated

definition 2: axioms
Consider an experiment with sample space S. For each event E of the sample space S, we assume that a number P(E) is defined and satisfies the following 3 axioms:
1. 0 ≤ P(E) ≤ 1
2. P(S) = 1
3. For any sequence of mutually exclusive events E1, E2, ... (i.e., events for which Ei Ej = ∅ when i ≠ j), P(∪_{i=1}^∞ Ei) = Σ_{i=1}^∞ P(Ei)
P(E) is the probability of event E.
A set function is a probability function ⇐⇒ it satisfies the 3 axioms.

Simple Propositions
N1 - P(∅) = 0
N2 - P(∪_{i=1}^n Ei) = Σ_{i=1}^n P(Ei) for mutually exclusive events (axiom 3 for a finite n)
N3 - strong law of large numbers - if an experiment is repeated over and over again, then with probability 1, the proportion of time during which any specific event E occurs will equal P(E).
N6 - the definitions of probability are mathematical definitions. They tell us which set functions can be called probability functions; they do not tell us what value a probability function P(·) assigns to a given event E.
N7 - P(E^c) = 1 − P(E)
N8 - if E ⊂ F, then P(E) ≤ P(F)
N9 - P(E ∪ F) = P(E) + P(F) − P(E ∩ F)
N10 - Inclusion-Exclusion identity for n = 3:
P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG)
N11 - Inclusion-Exclusion identity:
P(E1 ∪ E2 ∪ ... ∪ En) = Σ_{i=1}^n P(Ei) − Σ_{i1<i2} P(Ei1 Ei2) + ... + (−1)^{r+1} Σ_{i1<...<ir} P(Ei1 Ei2 ... Eir) + ... + (−1)^{n+1} P(E1 E2 ... En)
Proof. Suppose an outcome with probability ω is in exactly m of the events Ei, where m > 0.
LHS: the outcome is in E1 ∪ E2 ∪ ... ∪ En, so ω is counted once in P(E1 ∪ E2 ∪ ... ∪ En).
RHS:
• the outcome is in exactly m of the events Ei, so ω is counted exactly C(m, 1) times in Σ P(Ei)
• the outcome is contained in C(m, 2) subsets of the type Ei1 Ei2, so ω is counted C(m, 2) times in Σ_{i1<i2} P(Ei1 Ei2)
• ... and so on
Hence the RHS counts ω a total of C(m,1)ω − C(m,2)ω + C(m,3)ω − ... ± C(m,m)ω = ω, since Σ_{i=0}^{m} C(m, i)(−1)^i = 0 (the binomial theorem with x = −1, y = 1), so RHS = LHS.

e.g. For an outcome with probability ω and n = 3:
• Case 1. ω = P(E1 E2): LHS = ω; RHS = (ω + ω + 0) − (ω + 0 + 0) + 0 = ω
• Case 2. ω = P(E1 ∩ E2 ∩ E3): LHS = ω; RHS = (ω + ω + ω) − (ω + ω + ω) + ω = ω

N12 - (Bonferroni-type inequalities)
(i) P(∪_{i=1}^n Ei) ≤ Σ_{i=1}^n P(Ei)
(ii) P(∪_{i=1}^n Ei) ≥ Σ_i P(Ei) − Σ_{j<i} P(Ei Ej)
(iii) P(∪_{i=1}^n Ei) ≤ Σ_i P(Ei) − Σ_{j<i} P(Ei Ej) + Σ_{k<j<i} P(Ei Ej Ek)
(iv) and so on.
Proof. ∪_{i=1}^n Ei = E1 ∪ E1^c E2 ∪ E1^c E2^c E3 ∪ ... ∪ E1^c E2^c ... E_{n−1}^c En, so
P(∪_{i=1}^n Ei) = P(E1) + P(E1^c E2) + P(E1^c E2^c E3) + ... + P(E1^c E2^c ... E_{n−1}^c En)

Sample Space having Equally Likely Outcomes
tricky - 14, 15, 16, 18, 19, 20
Consider an experiment with sample space S = {e1, e2, ..., en} where all outcomes are equally likely. Then P({e1}) = P({e2}) = ... = P({en}) = 1/n, i.e. P({ei}) = 1/n.
N1 - for any event E, P(E) = (# of outcomes in E) / (# of outcomes in S) = (# of outcomes in E) / n

increasing sequence of events {En, n ≥ 1} → E1 ⊂ E2 ⊂ ... ⊂ En ⊂ E_{n+1} ⊂ ..., with lim_{n→∞} En = ∪_{i=1}^∞ Ei
decreasing sequence of events {En, n ≥ 1} → E1 ⊃ E2 ⊃ ... ⊃ En ⊃ E_{n+1} ⊃ ..., with lim_{n→∞} En = ∩_{i=1}^∞ Ei
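Not part of the original notes: a brute-force check of the n = 3 inclusion-exclusion identity (N10) on a small equally-likely sample space, to make the counting argument concrete. The sample space (two fair dice) and the three events are illustrative choices of mine.

from itertools import product
from fractions import Fraction

# Sample space: two fair dice, all 36 outcomes equally likely (see N1 above)
S = list(product(range(1, 7), repeat=2))
P = lambda A: Fraction(len(A), len(S))

E = {s for s in S if s[0] == 6}          # first die shows 6
F = {s for s in S if s[1] == 6}          # second die shows 6
G = {s for s in S if sum(s) >= 10}       # total is at least 10

lhs = P(E | F | G)
rhs = (P(E) + P(F) + P(G)
       - P(E & F) - P(E & G) - P(F & G)
       + P(E & F & G))
assert lhs == rhs    # inclusion-exclusion identity for n = 3
print(lhs)           # 1/3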

03. CONDITIONAL PROBABILITY AND INDEPENDENCE
tricky - E6, urns (p.37)

Conditional Probability
N1 - if P(F) > 0, then P(E|F) = P(E ∩ F) / P(F)
N2 - multiplication rule - P(E1 E2 ... En) = P(E1) P(E2|E1) P(E3|E1 E2) ... P(En|E1 E2 ... E_{n−1})
N3 - the axioms of probability apply to conditional probability:
1. 0 ≤ P(E|F) ≤ 1
2. P(S|F) = 1 where S is the sample space
3. If Ei (i ∈ Z≥1) are mutually exclusive events, then P(∪_{i=1}^∞ Ei | F) = Σ_{i=1}^∞ P(Ei|F)
N4 - If we define Q(E) = P(E|F), then Q(E) can be regarded as a probability function on the events of S, hence all results previously proved for probabilities apply, e.g.
• Q(E1 ∪ E2) = Q(E1) + Q(E2) − Q(E1 E2)
• P(E1 ∪ E2 | F) = P(E1|F) + P(E2|F) − P(E1 E2 | F)

Total Probability & Bayes' Theorem
conditioning formula - P(E) = P(E|F) P(F) + P(E|F^c) P(F^c)
tree diagram - branch on F vs F^c, then on E vs E^c along each branch; multiplying along a branch gives, e.g., P(F) · P(E|F) = P(EF), so
P(F|E) = P(EF)/P(E) = P(F) P(E|F) / P(E)
P(F^c|E) = P(E F^c)/P(E) = P(F^c) P(E|F^c) / P(E)

theorem of total probability - Suppose F1, F2, ..., Fn are mutually exclusive events such that ∪_{i=1}^n Fi = S. Then P(E) = Σ_{i=1}^n P(E Fi) = Σ_{i=1}^n P(Fi) P(E|Fi)

Bayes' Theorem
P(Fj|E) = P(E Fj) / P(E) = P(Fj) P(E|Fj) / Σ_{i=1}^n P(Fi) P(E|Fi)

application of Bayes' theorem
P(B1|A) = P(A|B1) P(B1) / (P(A|B1) P(B1) + P(A|B2) P(B2))
Let A be the event that a person tests positive for a disease,
B1: the person has the disease, B2: the person does not have the disease.
true positives: P(B1|A)     false negatives: P(Ā|B1)
false positives: P(A|B2)    true negatives: P(Ā|B2)
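A small numerical illustration of the Bayes application above (not from the notes): the prevalence and test accuracies below are made-up numbers, chosen only to show how the formula is used.

# Hypothetical disease-testing numbers (illustrative only)
p_B1 = 0.01                      # P(B1): prevalence of the disease
p_B2 = 1 - p_B1                  # P(B2)
p_A_given_B1 = 0.95              # P(A|B1): test sensitivity
p_A_given_B2 = 0.05              # P(A|B2): false positive rate

# theorem of total probability: P(A) = P(A|B1)P(B1) + P(A|B2)P(B2)
p_A = p_A_given_B1 * p_B1 + p_A_given_B2 * p_B2

# Bayes' theorem: P(B1|A) = P(A|B1)P(B1) / P(A)
p_B1_given_A = p_A_given_B1 * p_B1 / p_A
print(round(p_B1_given_A, 4))    # ~0.161: a positive test still leaves P(disease) fairly low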

Independent Events
N1 - E and F are independent ⇐⇒ P(EF) = P(E) · P(F)
N2 - E and F are independent ⇐⇒ P(E|F) = P(E)
N3 - if E and F are independent, then E and F^c are independent.
N4 - if E, F, G are independent, then E will be independent of any event formed from F and G (e.g. F ∪ G).
N5 - if E, F, G are independent, then P(EFG) = P(E) P(F) P(G)
N6 - if E and F are independent and E and G are independent, this does NOT imply that E and FG are independent.
N7 - For independent trials with probability p of success, let P_{n,m} be the probability that n successes occur before m failures (m, n ≥ 1).
• method 1 (recursive, conditioning on the first trial): P_{n,m} = p · P_{n−1,m} + (1 − p) · P_{n,m−1}
• method 2 (direct): P_{n,m} = Σ_{k=n}^{m+n−1} C(m+n−1, k) p^k (1−p)^{m+n−1−k}, i.e. sum P(exactly k successes in m + n − 1 trials) over k ≥ n.
recursive approach to solving probabilities: see page 85

Coupon Collector Problem
Q. Suppose there are N distinct types of coupons. If T denotes the number of coupons that need to be collected for a complete set, what is P(T = n)?
A. P(T > n − 1) = P(T ≥ n) = P(T = n) + P(T > n)
⇒ P(T = n) = P(T > n − 1) − P(T > n)
Let Aj = {no type j coupon is contained among the first n}, so P(T > n) = P(∪_{j=1}^N Aj).
Using the inclusion-exclusion identity,
P(T > n) = Σ_j P(Aj) − Σ_{j1<j2} P(Aj1 Aj2) + ... + (−1)^{k+1} Σ P(Aj1 Aj2 ... Ajk) + ... + (−1)^{N+1} P(A1 A2 ... AN)
where P(Aj) is the probability that coupon j is not among the first n collected, P(Aj1 Aj2) that coupons j1 and j2 are not among the first n, and so on, with
P(Aj1 Aj2 ... Ajk) = ((N − k)/N)^n
Hence P(T > n) = Σ_{i=1}^{N−1} C(N, i) ((N − i)/N)^n (−1)^{i+1}
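Not in the notes: a check of the P(T > n) formula above against a direct Monte Carlo simulation, for a small N. The parameter values and trial count are arbitrary.

import random
from math import comb

def p_T_greater_than(n, N):
    # inclusion-exclusion formula from the coupon collector derivation above
    return sum((-1) ** (i + 1) * comb(N, i) * ((N - i) / N) ** n
               for i in range(1, N))

def simulate(n, N, trials=200_000):
    # fraction of runs in which n random coupons do NOT contain all N types
    count = 0
    for _ in range(trials):
        seen = {random.randrange(N) for _ in range(n)}
        count += len(seen) < N
    return count / trials

N, n = 6, 15
print(p_T_greater_than(n, N))   # exact value, roughly 0.36 for these parameters
print(simulate(n, N))           # should be close to the exact value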
04. RANDOM VARIABLES
• random variable → a real-valued function defined on the sample space

Types of Random Variables
• discrete random variable → a random variable that can take on at most a countable number of possible values
• X is a Bernoulli r.v. with parameter p if → p(x) = p for x = 1 ('success') and p(x) = 1 − p for x = 0 ('failure')
• Y is a Binomial r.v. with parameters n and p → Y = X1 + X2 + ... + Xn where X1, X2, ..., Xn are independent Bernoulli r.v.'s with parameter p.
  • P(Y = k) = C(n, k) p^k (1 − p)^{n−k} = P(k successes from n independent trials, each with probability p of success)
  • e.g. number of red balls out of n balls drawn with replacement
  • E(Y) = np, Var(Y) = np(1 − p)
• Negative Binomial → X = number of trials until k successes are obtained
  • e.g. number of balls drawn (with replacement) until k red balls are obtained
• Geometric → X = number of trials until a success is obtained
  • P(X = k) = (1 − p)^{k−1} · p where k is the number of trials needed
  • e.g. number of balls drawn (with replacement) until 1 red ball is obtained
• Hypergeometric → X = number of successes in n trials, without replacement
  • e.g. number of red balls out of n balls drawn without replacement

Summary
distribution         X =                                             E(X)
binomial             # of successes in n trials, w/ replacement      np
negative binomial    # of trials until k successes                   k/p
geometric            # of trials until a success                     1/p
hypergeometric       # of successes in n trials, no replacement      rn/N

Probability Mass Function
• for a discrete r.v. X, we define the probability mass function (pmf) of X by p(a) = P(X = a)
• cdf, F(a) = Σ_{x ≤ a} p(x)
• if X assumes one of the values x1, x2, ..., then Σ_{i=1}^∞ p(xi) = 1
• the pmf p(a) is positive for at most a countable number of values of a
• e.g. a: 1, 2, 4 with p(a): 1/2, 1/4, 1/4

Cumulative Distribution Function
• for a r.v. X, the function F defined by F(x) = P(X ≤ x), −∞ < x < ∞, is called the cumulative distribution function (cdf) of X.
• aka distribution function
• F(x) is defined on the entire real line
• e.g. for the pmf above, F(a) = 0 for a < 1; 1/2 for 1 ≤ a < 2; 3/4 for 2 ≤ a < 4; 1 for a ≥ 4.

Expected Value
• aka the population mean, µ
• if X is a discrete random variable having pmf p(x), the expectation or expected value of X is defined as E(X) = Σ_x x · p(x)
N1 - if a and b are constants, then E(aX + b) = aE(X) + b
Proof of N1. E(aX + b) = Σ_x (ax + b) p(x) = a · Σ_x x p(x) + b · Σ_x p(x) = a · E(X) + b
N2 - the nth moment of X is given by E(X^n) = Σ_x x^n · p(x)
• I is an indicator variable for event A if I = 1 when A occurs and I = 0 when A^c occurs; then E(I) = P(A).

Properties
N1 - if X ∼ Binomial(n, p) and Y ∼ Binomial(n − 1, p), then E(X^k) = np · E[(Y + 1)^{k−1}]
N2 - if X ∼ Binomial(n, p), then for k ∈ Z+, P(X = k) = ((n − k + 1) p) / (k (1 − p)) · P(X = k − 1)

finding the expectation of f(X)
• method 1, using the pmf of Y: let Y = f(X); find the corresponding X values for each value of Y.
• method 2, using the pmf of X: E[g(X)] = Σ_i g(xi) p(xi), where X is a discrete r.v. that takes on one of the values xi with respective probabilities p(xi), and g is any real-valued function.

Variance
If X is a r.v. with mean µ = E[X], then the variance of X is defined by
Var(X) = E[(X − µ)^2] = Σ_i (xi − µ)^2 · p(xi)   (deviation · weight)
       = E(X^2) − [E(X)]^2
• Var(aX + b) = a^2 Var(X)
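Not from the notes: a tiny helper that computes E(X), the nth moment and Var(X) from a pmf given as a dict, applied to the example pmf p(1) = 1/2, p(2) = 1/4, p(4) = 1/4 above. The function names are my own.

from fractions import Fraction as F

def mean(pmf):
    # E(X) = sum over x of x * p(x)
    return sum(x * p for x, p in pmf.items())

def moment(pmf, n):
    # E(X^n), the nth moment (N2 above)
    return sum(x**n * p for x, p in pmf.items())

def variance(pmf):
    # Var(X) = E(X^2) - [E(X)]^2
    return moment(pmf, 2) - mean(pmf) ** 2

pmf = {1: F(1, 2), 2: F(1, 4), 4: F(1, 4)}
assert sum(pmf.values()) == 1
print(mean(pmf))       # 2
print(variance(pmf))   # 3/2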
Poisson Random Variable
a r.v. X is said to be a Poisson r.v. with parameter λ if, for some λ > 0,
P(X = i) = e^{−λ} · λ^i / i!,  i = 0, 1, 2, ...
• notation: X ∼ Poisson(λ)
• Σ_{i=0}^∞ P(X = i) = 1
• Poisson approximation of the Binomial - if X ∼ Binomial(n, p), n is large and p is small, then X is approximately Poisson(λ) where λ = np.
• For n independent trials with probability p of success, the number of successes is approximately a Poisson r.v. with parameter λ = np if n is large and p is small.
• the Poisson approximation remains good even when the trials are not independent, provided that their dependence is weak.
• 2 ways to look at the Poisson distribution:
1. an approximation to the binomial distribution with large n and small p
2. counting the number of events that occur at random at certain points in time

Mean and Variance
if X ∼ Poisson(λ), then E(X) = λ, Var(X) = λ

Poisson distribution as random events
Let N(t) be the number of events that occur in the time interval [0, t].
N1 - If the 3 assumptions hold, then N(t) ∼ Poisson(λt).
N2 - If λ is the rate of occurrence of events per unit time, then the number of occurrences in an interval of length t has a Poisson distribution with mean λt:
P(N(t) = k) = e^{−λt} (λt)^k / k!, for k ∈ Z≥0

o(h) notation
o(h) stands for any function f(h) such that lim_{h→0} f(h)/h = 0
• a function of h that is small compared to h when h is small
• o(h) + o(h) = o(h)
• λt/n + o(t/n) ≈ λt/n for large n

distribution of time to next event
Q. Suppose accidents happen at a rate of 5 per day. Find the distribution of the time, starting from now, until the next accident.
A. Let X = time (in days) until the next accident, and let V be the number of accidents during the time period [0, t].
V ∼ Poisson(5t) ⇒ P(V = k) = e^{−5t} (5t)^k / k!
P(X > t) = P(no accidents happen during [0, t]) = P(V = 0) = e^{−5t}
P(X ≤ t) = 1 − e^{−5t}

Expected Value of a sum of r.v.
For a r.v. X, let X(s) denote the value of X when s ∈ S.
N1 - E(X) = Σ_i xi P(X = xi) = Σ_{s∈S} X(s) p(s), where Si = {s : X(s) = xi}
N2 - E(Σ_{i=1}^n Xi) = Σ_{i=1}^n E(Xi) for r.v. X1, X2, ..., Xn

examples
Selecting hats problem
n men throw their hats in and each selects a hat at random. Let Ei be the event that the i-th man selects his own hat, IEi the indicator r.v. for Ei, and X the number of men who select their own hats.
• X = IE1 + IE2 + ... + IEn
• P(Ei) = 1/n
• P(Ei|Ej) = 1/(n−1) ≠ P(Ei) for j < i (hence Ei and Ej are not independent)
• but the dependence is weak for large n
• X satisfies the other conditions for a binomial r.v., besides independence (n trials with equal probability of success)
• Poisson approximation of X: X ≈ Poisson(λ)
  • λ = n · P(Ei) = n · 1/n = 1
  • P(X = i) = e^{−1} 1^i / i! = e^{−1} / i!
  • P(X = 0) = e^{−1} ≈ 0.37

No 2 people have the same birthday
For each pair of individuals i and j, i ≠ j, let Eij be the event that they have the same birthday, and let X be the number of pairs with the same birthday.
• X is the sum of the C(n, 2) indicator r.v.'s of the events Eij
• Each Eij is only pairwise independent. P(Eij) = 1/365
  • i.e. Eij and Emn are independent for distinct pairs
  • but E12 and (E13 ∩ E23) are not independent ⇒ P(E12 | E13 ∩ E23) = 1
• X ≈ Poisson(λ), λ = C(n, 2)/365 = n(n−1)/730 ⇒ P(X = 0) = e^{−n(n−1)/730}
• for P(X = 0) ≤ 1/2, need n ≥ 23
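Not from the notes: a numerical look at the Poisson approximation of the Binomial, comparing P(X = k) for X ∼ Binomial(n, p) against Poisson(np) in a large-n, small-p case. The values n = 1000, p = 0.002 are arbitrary.

from math import comb, exp, factorial

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.002          # large n, small p; lambda = np = 2
lam = n * p
for k in range(6):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
# the two columns agree to roughly 3 decimal places, as the approximation predicts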
05. CONTINUOUS RANDOM VARIABLES
X is a continuous r.v. → if there exists a nonnegative function f defined for all real x ∈ (−∞, ∞), such that P(X ∈ B) = ∫_B f(x) dx
N1 - P(X ∈ (−∞, ∞)) = ∫_{−∞}^{∞} f(x) dx = 1
N2 - P(a ≤ X ≤ b) = ∫_a^b f(x) dx
N3 - P(X = a) = ∫_a^a f(x) dx = 0
N4 - P(X < a) = P(X ≤ a) = ∫_{−∞}^{a} f(x) dx
N5 - interpretation of the probability density function:
P(x < X < x + dx) = ∫_x^{x+dx} f(y) dy ≈ f(x) · dx
pdf at x, f(x) ≈ P(x < X < x + dx) / dx
N6 - if X is a continuous r.v. with pdf f(x) and cdf F(x), then f(x) = d/dx F(x). (Fundamental Theorem of Calculus)
N7 - the median of X occurs at the x where F(x) = 1/2.

Expectation & Variance
N1 - expectation of X, E(X) = ∫_{−∞}^{∞} x · f(x) dx
N2 - if X is a continuous r.v. with pdf f(x), then for any real-valued function g, E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx
N2a - E[aX + b] = ∫_{−∞}^{∞} (ax + b) · f(x) dx = a · E(X) + b
N3 - for a non-negative r.v. Y, E(Y) = ∫_0^∞ P(Y > y) dy
Proof. ∫_0^∞ P(Y > y) dy = ∫_0^∞ ∫_y^∞ fY(x) dx dy
= ∫_0^∞ ∫_0^x fY(x) dy dx   (draw a diagram to swap the order of integration)
= ∫_0^∞ fY(x) [∫_0^x dy] dx
= ∫_0^∞ x fY(x) dx   (since ∫_0^x dy = x)
= E(Y)

Generating a Uniform r.v.
if X is a continuous r.v. with cdf F(x), then
• N8 - F(X) = U ∼ Uniform(0, 1).
Proof. Let Y = F(X). Then the cdf of Y is FY(y) = P(Y ≤ y) = P(F(X) ≤ y) = P(X ≤ F^{−1}(y)) = F(F^{−1}(y)) = y, hence Y is a uniform r.v.
• N9 - X = F^{−1}(U) has cdf F(x).
• i.e. we can generate a r.v. with cdf F(x) from a Uniform(0, 1) r.v.

example
Q. Find the pdf of (b − a)X + a where a, b are constants, b > a, and the pdf of X is f(x) = 1 for 0 ≤ x ≤ 1, 0 otherwise.
A. Let Y = (b − a)X + a.
cdf, FY(y) = P(Y ≤ y) = P((b − a)X + a ≤ y) = P(X ≤ (y − a)/(b − a))
FY(y) = ∫_0^{(y−a)/(b−a)} 1 dx = (y − a)/(b − a),  a < y < b
fY(y) = d/dy FY(y) = 1/(b − a) for a < y < b, and 0 otherwise

Uniform Random Variable
X is a uniform r.v. on the interval (α, β), X ∼ Uniform(α, β), if its pdf is given by
f(x) = 1/(β − α) for α < x < β, and 0 otherwise
E(X) = (α + β)/2,  Var(X) = (β − α)^2 / 12
if X ∼ Uniform(α, β), then (X − α)/(β − α) ∼ Uniform(0, 1)
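Not part of the notes: a sketch of N8/N9 (inverse-transform sampling) used to generate Exponential(λ) samples from Uniform(0, 1), with a quick check of the sample mean against 1/λ. The helper name and λ value are my own.

import random
from math import log
from statistics import mean

def exponential_sample(lam):
    # N9: X = F^{-1}(U) where F(x) = 1 - e^{-lam*x}, so F^{-1}(u) = -ln(1 - u)/lam
    u = random.random()
    return -log(1 - u) / lam

lam = 2.0
samples = [exponential_sample(lam) for _ in range(100_000)]
print(mean(samples))   # should be close to E(X) = 1/lam = 0.5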


Normal Random Variable
X is a normal r.v. with parameters µ and σ^2, X ∼ N(µ, σ^2), if the pdf of X is given by
f(x) = (1/(√(2π) σ)) e^{−(x−µ)^2 / (2σ^2)},  −∞ < x < ∞
E(X) = µ, Var(X) = σ^2
if X ∼ N(µ, σ^2), then (X − µ)/σ ∼ N(0, 1)
if Y ∼ N(µ, σ^2) and a is a constant, FY(a) = Φ((a − µ)/σ)
standard normal distribution → X ∼ N(0, 1)
• Φ(x) = F(x) = P(X ≤ x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y^2/2} dy

Normal Approximation to the Binomial Distribution
if Sn ∼ Binomial(n, p), then (Sn − np)/√(np(1−p)) ∼ N(0, 1) approximately, for large n.
µ = np, σ^2 = np(1 − p)

Exponential Random Variable
a continuous r.v. X is an exponential r.v., X ∼ Exponential(λ) or Exp(λ), if for some λ > 0 its pdf is given by
f(x) = λ e^{−λx} for x ≥ 0, and 0 otherwise
E(X) = 1/λ, Var(X) = 1/λ^2
P(X < a) = ∫_0^a λ e^{−λx} dx = 1 − e^{−λa}
• an exponential r.v. is memoryless.
• a non-negative r.v. X is memoryless → if P(X > s + t | X > t) = P(X > s) for all s, t > 0.
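Not in the notes: a numeric check of the normal approximation to the Binomial above. The parameters are arbitrary, and the half-unit continuity correction is my own addition (it is not stated in the notes, but it sharpens the approximation).

from math import comb, erf, sqrt

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def phi(z):
    # standard normal cdf via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.3
mu, sigma = n * p, sqrt(n * p * (1 - p))
k = 35
print(binom_cdf(k, n, p))             # exact P(Sn <= 35), about 0.88
print(phi((k + 0.5 - mu) / sigma))    # normal approximation with continuity correction, also about 0.88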
Gamma Distribution
a r.v. X has a gamma distribution, X ∼ Gamma(α, λ), with parameters (α, λ), λ > 0 and α > 0, if its pdf is given by
f(x) = λ e^{−λx} (λx)^{α−1} / Γ(α) for x ≥ 0, and 0 for x < 0
E(X) = α/λ, Var(X) = α/λ^2
where the gamma function Γ(α) is defined as Γ(α) = ∫_0^∞ e^{−y} y^{α−1} dy.
N1 - Γ(α) = (α − 1) Γ(α − 1)
Proof. integrate the LHS by parts to obtain the RHS.
N2 - if α is an integer n, then Γ(n) = (n − 1)!
N3 - if X ∼ Gamma(α, λ) and α = 1, then X ∼ Exp(λ).
N4 - for events occurring randomly in time following the 3 assumptions of the Poisson distribution, the amount of time elapsed until a total of n events has occurred is a gamma r.v. with parameters (n, λ).
• time at which event n occurs, Tn ∼ Gamma(n, λ)
• number of events in time period [0, t], N(t) ∼ Poisson(λt)
N5 - Gamma(α = n/2, λ = 1/2) = χ²_n (chi-square distribution with n degrees of freedom)

Beta Distribution
a r.v. X is said to have a beta distribution, X ∼ Beta(a, b), if its density is given by
f(x) = (1/β(a, b)) x^{a−1} (1 − x)^{b−1} for 0 < x < 1, and 0 otherwise
E(X) = a/(a + b), Var(X) = ab / ((a + b)^2 (a + b + 1))
N1 - β(a, b) = ∫_0^1 x^{a−1} (1 − x)^{b−1} dx
N2 - Beta(a = 1, b = 1) = Uniform(0, 1)
N3 - β(a, b) = Γ(a) Γ(b) / Γ(a + b)

Cauchy Distribution
a r.v. X has a Cauchy distribution, X ∼ Cauchy(θ), with parameter θ, −∞ < θ < ∞, if its density is given by
f(x) = (1/π) · 1/(1 + (x − θ)^2),  −∞ < x < ∞
E(X) = ∫_{−∞}^{∞} x · f(x) dx = ∞ − ∞ (undefined)
more generally, E(X^n) does not exist for n ∈ Z+.
06. JOINTLY DISTRIBUTED RANDOM VARIABLES

Joint Distribution Function
the joint cumulative distribution function of the pair of r.v. X and Y is →
F(x, y) = P(X ≤ x, Y ≤ y), −∞ < x < ∞, −∞ < y < ∞
N1 - marginal cdf of X, FX(x) = lim_{y→∞} F(x, y).
N2 - marginal cdf of Y, FY(y) = lim_{x→∞} F(x, y).
N3 - P(X > a, Y > b) = 1 − FX(a) − FY(b) + F(a, b)
N4 - P(a1 < X ≤ a2, b1 < Y ≤ b2) = F(a2, b2) + F(a1, b1) − F(a1, b2) − F(a2, b1)

Joint Probability Mass Function
if X and Y are both discrete r.v., then their joint pmf is defined by p(i, j) = P(X = i, Y = j)
N1 - marginal pmf of X, P(X = i) = Σ_j P(X = i, Y = j)
N2 - marginal pmf of Y, P(Y = j) = Σ_i P(X = i, Y = j)

Joint Probability Density Function
the r.v. X and Y are said to be jointly continuous if there is a function f(x, y), called the joint pdf, such that for any two-dimensional set C,
P[(X, Y) ∈ C] = ∫∫_C f(x, y) dx dy
= the volume under the surface over the region C.
N1 - if C = {(x, y) : x ∈ A, y ∈ B}, then P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y) dx dy
N2 - F(a, b) = P(X ∈ (−∞, a], Y ∈ (−∞, b]) = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy
for a double integral: when integrating dx, take y as a constant
N3 - f(a, b) = ∂²/∂a∂b F(a, b)
interpretation of the joint pdf:
P(a < X < a + da, b < Y < b + db) = ∫_b^{b+db} ∫_a^{a+da} f(x, y) dx dy ≈ f(a, b) da db   (density of probability)
N4 - marginal pdf of X, fX(x) = ∫_{−∞}^{∞} f(x, y) dy
N5 - marginal pdf of Y, fY(y) = ∫_{−∞}^{∞} f(x, y) dx

how to do a double integral
e.g. find P(X < Y) where the joint pdf of X and Y is given by f(x, y) = 2 e^{−x} e^{−2y} for 0 < x < ∞, 0 < y < ∞, and 0 otherwise.
1. to get the bounds for dx and dy, plot the region X < Y
  1.1. draw horizontal lines to determine the bounds for x, from x = a to x = b
  1.2. draw vertical lines to determine the bounds for y, from y = c to y = d
2. integrate ∫_c^d ∫_a^b f(x, y) dx dy
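Not in the notes: a Monte Carlo check of the worked double-integral example. With f(x, y) = 2 e^{-x} e^{-2y}, the density factorises, so X and Y can be treated as independent Exp(1) and Exp(2) r.v.'s, and the double integral gives P(X < Y) = 1/3 (my evaluation of the integral, not stated in the notes).

import random

# f(x, y) = 2 e^{-x} e^{-2y} factorises into Exp(1) and Exp(2) marginals;
# under that assumption the double integral for P(X < Y) evaluates to 1/3.
trials = 500_000
count = sum(random.expovariate(1.0) < random.expovariate(2.0) for _ in range(trials))
print(count / trials)   # should be close to 1/3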
Independent Random Variables
N1 - X and Y are independent → P(X ∈ A, Y ∈ B) = P(X ∈ A) · P(Y ∈ B)
N2 - X and Y are independent → ∀a, b, P(X ≤ a, Y ≤ b) = P(X ≤ a) · P(Y ≤ b), or F(a, b) = FX(a) · FY(b) ⇒ the joint cdf is the product of the marginal cdfs
N3 - discrete case: discrete r.v. X and Y are independent ⇐⇒ P(X = x, Y = y) = P(X = x) · P(Y = y) for all x, y.
N4 - continuous case: jointly continuous r.v. X and Y are independent ⇐⇒ f(x, y) = fX(x) · fY(y) for all x, y.
N5 - independence is a symmetric relation → X is independent of Y ⇐⇒ Y is independent of X

Sum of Independent Random Variables
N1 - for independent, continuous r.v. X and Y having pdf fX and fY,
FX+Y(a) = ∫_{−∞}^{∞} FX(a − y) fY(y) dy
fX+Y(a) = ∫_{−∞}^{∞} fX(a − y) fY(y) dy
impt example - E52 (pdf of X + Y)

Distribution of Sums of Independent r.v.
for i = 1, 2, ..., n (see the numeric check after this list),
1. Xi ∼ Gamma(ti, λ) ⇒ Σ_{i=1}^n Xi ∼ Gamma(Σ_{i=1}^n ti, λ)
2. Xi ∼ Exp(λ) ⇒ Σ_{i=1}^n Xi ∼ Gamma(n, λ)
3. Zi ∼ N(0, 1) ⇒ Σ_{i=1}^n Zi² ∼ χ²_n = Gamma(n/2, 1/2)
4. Xi ∼ N(µi, σi²) ⇒ Σ_{i=1}^n Xi ∼ N(Σ_{i=1}^n µi, Σ_{i=1}^n σi²)
5. X ∼ Poisson(λ1), Y ∼ Poisson(λ2) ⇒ X + Y ∼ Poisson(λ1 + λ2)
6. X ∼ Binom(n, p), Y ∼ Binom(m, p) ⇒ X + Y ∼ Binom(n + m, p)
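Not from the notes: a numeric check of the convolution formula N1 for two independent Exp(λ) r.v.'s, whose sum should be Gamma(2, λ) (item 2 of the list above), i.e. have density λ² a e^{−λa}. The λ value and step count are arbitrary.

from math import exp

lam = 1.5
def f_exp(x):
    return lam * exp(-lam * x) if x >= 0 else 0.0

def f_sum(a, steps=10_000):
    # convolution formula N1: f_{X+Y}(a) = integral over y of f_X(a - y) f_Y(y) dy
    h = a / steps
    return sum(f_exp(a - (i + 0.5) * h) * f_exp((i + 0.5) * h) for i in range(steps)) * h

a = 2.0
print(f_sum(a))                        # numerical convolution
print(lam**2 * a * exp(-lam * a))      # Gamma(2, lam) density at a; the two values should match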
Conditional Distribution (discrete)
for discrete r.v. X and Y, the conditional pmf of X given that Y = y is
pX|Y(x|y) = P(X = x|Y = y) = P(X = x, Y = y)/P(Y = y) = p(x, y)/pY(y)
for discrete r.v. X and Y, the conditional cdf of X given that Y = y is
FX|Y(x|y) = P(X ≤ x|Y = y) = Σ_{a ≤ x} P(X = a, Y = y)/P(Y = y) = Σ_{a ≤ x} pX|Y(a|y)
N0 - equivalent notation:
• pX|Y(x|y) = P(X = x|Y = y)
• pX(x) = P(X = x)
N1 - if X is independent of Y, then pX|Y(x|y) = pX(x)

Conditional Distribution (continuous)
for X and Y with joint pdf f(x, y), the conditional pdf of X given that Y = y is
fX|Y(x|y) = f(x, y)/fY(y), for all y such that fY(y) > 0
FX|Y(a|y) = P(X ≤ a|Y = y) = ∫_{−∞}^{a} fX|Y(x|y) dx
N1 - for any set A, P(X ∈ A|Y = y) = ∫_A fX|Y(x|y) dx
N2 - if X is independent of Y, then fX|Y(x|y) = fX(x).
! "find the marginal/conditional pdf of Y" ⇒ must include the range too!! (see Ex. 69(b, c))

example - given the joint pdf of X and Y, find the pdf of the r.v. X/Y.
ans. set a dummy variable W = X/Y, then FW(w) = P(W ≤ w) = P(X/Y ≤ w)
e.g. for f(x, y) = e^{−x−y} (x, y > 0), P(X/Y ≤ w) = ∫_0^∞ ∫_0^{wy} e^{−x−y} dx dy

Joint Probability Distribution of Functions of r.v.
Let X1 and X2 be jointly continuous r.v. with joint pdf fX1,X2(x1, x2). Suppose Y1 = g1(X1, X2) and Y2 = g2(X1, X2) satisfy
1. the equations y1 = g1(x1, x2) and y2 = g2(x1, x2) can be uniquely solved for x1, x2 in terms of y1 and y2
2. g1(x1, x2) and g2(x1, x2) have continuous partial derivatives at all points (x1, x2), with Jacobian
J(x1, x2) = ∂g1/∂x1 · ∂g2/∂x2 − ∂g1/∂x2 · ∂g2/∂x1 ≠ 0
then fY1,Y2(y1, y2) = fX1,X2(x1, x2) · 1/|J(x1, x2)|, where x1 = h1(y1, y2), x2 = h2(y1, y2)
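Not in the notes: finishing the X/Y example numerically. Assuming f(x, y) = e^{−x−y} (so X and Y behave as independent Exp(1) r.v.'s), carrying out the double integral gives FW(w) = w/(1 + w); the simulation below checks that against the empirical cdf.

import random

def empirical_cdf(w, trials=200_000):
    # W = X / Y with X, Y independent Exp(1)
    hits = sum(random.expovariate(1.0) / random.expovariate(1.0) <= w for _ in range(trials))
    return hits / trials

for w in (0.5, 1.0, 3.0):
    print(w, round(empirical_cdf(w), 3), round(w / (1 + w), 3))  # simulated vs w/(1+w)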
07. PROPERTIES OF EXPECTATION
recap:
• for a discrete r.v. X, E(X) = Σ_x x · p(x) = Σ_x x · P(X = x)
• for a continuous r.v. X, E(X) = ∫_{−∞}^{∞} x · f(x) dx
• for a non-negative integer-valued r.v. Y, E(Y) = Σ_{i=1}^∞ P(Y ≥ i)
• for a non-negative r.v. Y, E(Y) = ∫_0^∞ P(Y > y) dy

Expectations of Sums of Random Variables
N1 - for X and Y with joint pmf p(x, y) or joint pdf f(x, y),
E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y)
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy
N2 - if P(a ≤ X ≤ b) = 1, then a ≤ E(X) ≤ b
N3 - if E(X) and E(Y) are finite, E(X + Y) = E(X) + E(Y)
Proof. using N1, integrate ∫∫ (x + y) f(x, y) dx dy = ∫ x fX(x) dx + ∫ y fY(y) dy = E(X) + E(Y)
N4 - for r.v.s X and Y, if X ≥ Y, then E(X) ≥ E(Y)
N5 - let X1, ..., Xn be independent and identically distributed r.v.s having distribution P(Xi ≤ x) = F(x) and expected value E(Xi) = µ. If X̄ = Σ_{i=1}^n Xi / n, then E(X̄) = µ and Var(X̄) = σ²/n.
Proof. E(X̄) = E(Σ_{i=1}^n Xi / n) = (1/n) Σ_{i=1}^n E(Xi) = (1/n) · nµ = µ ⇒ the expectation of the sample mean equals the population mean.
N6 - X̄ is the sample mean.
N7 - if X ∼ Binom(n, p), then E(X) = np.
Proof. express X as a sum of Bernoulli (indicator) r.v.s; the sum of their expectations is np.

examples
! trick: express a r.v. as a sum of r.v.s whose expectations are easier to find
• negative binomial = sum of geometric r.v.s ⇒ E(X) = k/p
• hypergeometric with r red balls out of N balls and n trials:
  • indicator r.v. Yi = 1 if the ith ball selected is red
  • P(Yi = 1) = r/N ⇒ E(Yi) = r/N ⇒ E(X) = E(Σ_{i=1}^n Yi) = n · r/N
• hat throwing problem: expected number of people that select their own hat
  • P(select your own hat back) = 1/N ⇒ E(X) = N · 1/N = 1
• coupon collector problem:
  • let X = number of coupons collected for a complete set
  • let Xi = number of additional coupons that need to be collected to obtain another distinct type after i distinct types have been collected
  • Xi ∼ Geometric(p = (N − i)/N)
  • E(X) = Σ_{i=0}^{N−1} E(Xi) = 1 + N/(N−1) + N/(N−2) + ... + N/1 = N(1/N + 1/(N−1) + ... + 1)

Covariance, Variance of Sums and Correlations
if X and Y are independent, then for any functions h and g, E[g(X)h(Y)] = E[g(X)] · E[h(Y)]
covariance → a measure of linear relationship
Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E(XY) − E(X)E(Y)
N1 - X and Y are independent ⇒ Cov(X, Y) = 0
N2 - Cov(X, Y) = 0 does NOT imply that X and Y are independent
Proof. a non-linear relationship can give E(X) = 0 and E(XY) = 0, so Cov(X, Y) = 0, yet X and Y are not independent.

Covariance properties
1. Cov(X, Y) = Cov(Y, X)
2. Cov(X, X) = Var(X)
3. Cov(aX, Y) = a Cov(X, Y)
4. Cov(Σ_{i=1}^n Xi, Σ_{j=1}^m Yj) = Σ_{i=1}^n Σ_{j=1}^m Cov(Xi, Yj)
for variance:
N1 - Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj)
N2 - if X1, ..., Xn are pairwise independent (Xi, Xj are independent ∀i ≠ j), then Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi)
N3 - for n independent and identically distributed r.v.s with expected value µ and variance σ²,
X̄ = (1/n) Σ_{i=1}^n Xi,   S² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)²,   E(S²) = σ²
⇒ S² is an unbiased estimator for σ².

Correlation
correlation of two r.v. X and Y, ρ(X, Y) = Cov(X, Y) / √(Var(X) · Var(Y))
N1 - −1 ≤ ρ(X, Y) ≤ 1, where −1 and 1 denote a perfect negative and positive linear relationship respectively.
N2 - ρ(X, Y) = 0 ⇒ no linear relationship - uncorrelated
N3 - ρ(X, Y) = 1 ⇒ Y = aX + b with a = σY/σX > 0
N4 - for events A and B with indicator r.v. IA and IB, Cov(IA, IB) = 0 when they are independent events.
N5 - the deviation is not correlated with the sample mean. For independent & identically distributed r.v. X1, X2, ..., Xn with variance σ², Cov(Xi − X̄, X̄) = 0.
Proof. Cov(Xi − X̄, X̄) = Cov(Xi, X̄) − Cov(X̄, X̄)
= Cov(Xi, (1/n) Σ_{j=1}^n Xj) − Var(X̄)
= (1/n) Σ_{j=1}^n Cov(Xi, Xj) − Var(X̄)
= (1/n) Cov(Xi, Xi) − σ²/n   (since Cov(Xi, Xj) = 0 for all i ≠ j)
= (1/n) Var(Xi) − σ²/n = 0

Conditional Expectation
the conditional expectation of X, given that Y = y, for all values of y such that pY(y) > 0, is defined by
E[X|Y = y] = Σ_x x · P(X = x|Y = y) = Σ_x x · pX|Y(x|y)
continuous case: E(X|Y = y) = ∫_{−∞}^{∞} x fX|Y(x|y) dx = ∫_{−∞}^{∞} x f(x, y)/fY(y) dx
! note the range for fX|Y(x|y)
N1 - If X, Y ∼ Geometric(p) independently, then P(X = i|X + Y = n) = 1/(n − 1), a uniform distribution.
N2 - E(X|X + Y = n) = Σ_{i=1}^{n−1} i · P(X = i|X + Y = n) = n/2
Conditional expectations also satisfy the properties of ordinary expectations ⇒ an ordinary expectation on a reduced sample space consisting only of outcomes for which Y = y.
discrete case: E[g(X)|Y = y] = Σ_x g(x) pX|Y(x|y)
continuous case: E[g(X)|Y = y] = ∫_{−∞}^{∞} g(x) fX|Y(x|y) dx

Deriving Expectation
E(X) = E_{w.r.t. Y}(E_{w.r.t. X|Y=y}(X|Y)), i.e. E(X) = EY(EX(X|Y))
discrete case: E(X) = Σ_y E(X|Y = y) P(Y = y)
continuous case: E(X) = ∫_{−∞}^{∞} E(X|Y = y) fY(y) dy
N3 - 3 methods for finding E(X) given f(x, y):
1. using E(g(X, Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy ⇒ let g(x, y) = x
2. using E(X) = ∫_{−∞}^{∞} x fX(x) dx
3. using E(X) = ∫_{−∞}^{∞} E(X|Y = y) fY(y) dy
N4 - E(Σ_{i=1}^N Xi) = EN(E(Σ_{i=1}^N Xi | N)) = Σ_{n=0}^∞ E(Σ_{i=1}^N Xi | N = n) · P(N = n)

Computing Probabilities by Conditioning
P(E) = Σ_y P(E|Y = y) P(Y = y) if Y is discrete
P(E) = ∫_{−∞}^{∞} P(E|Y = y) fY(y) dy if Y is continuous
Proof. let X be an indicator r.v. for E ⇒ E(X) = P(E), and E(X|Y = y) = P(X = 1|Y = y) = P(E|Y = y).
N5 - to find P((X, Y) ∈ C) given f(x, y): see p.57
also: P(X < Y) = ∫ P(X < Y|Y = y) · fY(y) dy

Conditional Variance
Var(X|Y) = E[(X − E(X|Y))² | Y]
Var(X|Y) = E(X²|Y) − [E(X|Y)]²
N6 - Var(X) = E[Var(X|Y)] + Var[E(X|Y)]
N7 - E(f(Y)) = E(f(Y)|Y = t) = E(f(t)) if N(t) and Y are independent
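Not in the notes: a simulation sketch of the "Deriving Expectation" identity E(X) = EY(EX(X|Y)) for a made-up two-stage experiment: Y ∼ Poisson(3) and, given Y = y, X ∼ Binomial(y, 0.5). The sampler names and parameter values are my own.

import random

def poisson_sample(lam):
    # Poisson sample via exponential interarrival times within one unit of time
    t, k = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > 1.0:
            return k
        k += 1

def sample_x():
    # two-stage experiment: Y ~ Poisson(3), then X | Y = y ~ Binomial(y, 0.5)
    y = poisson_sample(3.0)
    return sum(random.random() < 0.5 for _ in range(y))

print(sum(sample_x() for _ in range(200_000)) / 200_000)
# E(X) = E[E(X|Y)] = E(0.5 * Y) = 0.5 * 3 = 1.5; the printed value should be close to 1.5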
Moment Generating Functions
moment generating function M(t) of the r.v. X → M(t) = E(e^{tX}) for all real values of t
• if X is discrete with pmf p(x), M(t) = Σ_x e^{tx} · p(x)
• if X is continuous with pdf f(x), M(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx
M(t) is called the mgf because all moments of X can be obtained by successively differentiating M(t) and then evaluating the result at t = 0.
(M'(0) = E(X), M''(0) = E(X²), etc.)
in general,
• M^{(n)}(t) = E(X^n e^{tX}), n ≥ 1
• M^{(n)}(0) = E(X^n), n ≥ 1
N8 - binomial expansion: (a + b)^n = Σ_{i=0}^n C(n, i) a^i b^{n−i}
(see other series for useful expansions for other distributions)
N9 - integrating a pdf from −∞ to ∞ always gives 1
N10 - if X and Y are independent and have mgf's MX(t) and MY(t) respectively, the mgf of X + Y is MX+Y(t) = MX(t) · MY(t)
Proof. MX+Y(t) = E[e^{t(X+Y)}] = E[e^{tX} · e^{tY}] = E(e^{tX}) E(e^{tY}) = MX(t) · MY(t)
N11 - if MX(t) exists and is finite in some region about t = 0, then the distribution of X is uniquely determined. MX(t) = MY(t) ⇐⇒ X and Y have the same distribution.

Common mgf's
• X ∼ Normal(0, 1), M(t) = e^{t²/2}
• X ∼ Binomial(n, p), M(t) = (pe^t + (1 − p))^n
• X ∼ Poisson(λ), M(t) = exp[λ(e^t − 1)]
• X ∼ Exp(λ), M(t) = λ/(λ − t)
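Not in the notes: a symbolic sketch checking that differentiating the Poisson mgf M(t) = exp[λ(e^t − 1)] at t = 0 recovers E(X) = λ and Var(X) = λ. It assumes the third-party library sympy is available.

import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))        # mgf of Poisson(lambda)

EX  = sp.diff(M, t, 1).subs(t, 0)        # M'(0)  = E(X)
EX2 = sp.diff(M, t, 2).subs(t, 0)        # M''(0) = E(X^2)

print(sp.simplify(EX))                   # lambda
print(sp.simplify(EX2 - EX**2))          # Var(X) = lambda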

08. LIMIT THEOREMS

Markov's Inequality → if X is a non-negative r.v., then for any a > 0, P(X ≥ a) ≤ E(X)/a.
Proof. Let I be an indicator r.v. with I = 1 when X ≥ a. Then I ≤ X/a, so E(I) ≤ E(X)/a, and P(X ≥ a) = E(I) ≤ E(X)/a.

Chebyshev's inequality → if X is a r.v. with finite mean µ and variance σ², then for any value of k > 0, P(|X − µ| ≥ k) ≤ σ²/k².
Proof. P[(X − µ)² ≥ k²] ≤ E[(X − µ)²]/k² by Markov's inequality.
Since (X − µ)² ≥ k² ⇐⇒ |X − µ| ≥ k, it follows that P(|X − µ| ≥ k) ≤ σ²/k².

N1 - if Var(X) = 0, then P(X = E[X]) = 1
Proof. Let µ = E[X]. By Chebyshev's inequality, for any n ≥ 1,
P(|X − µ| > 1/n) ≤ Var(X)/(1/n)² = 0
then P(X ≠ µ) = 0 ⇒ P(X = µ) = 1

weak law of large numbers → let X1, X2, ... be a sequence of independent and identically distributed r.v.s, each with finite mean E[Xi] = µ. Then, for any ε > 0,
P{|(X1 + ... + Xn)/n − µ| ≥ ε} → 0 as n → ∞

central limit theorem → let X1, X2, ... be a sequence of independent and identically distributed r.v.s each having mean µ and variance σ². Then the distribution of (X1 + ... + Xn − nµ)/(σ√n) tends to the standard normal as n → ∞.
• aka: (X̄ − µ)/(σ/√n) → Z ∼ N(0, 1)
• for −∞ < a < ∞, P((X1 + ... + Xn − nµ)/(σ√n) ≤ a) → (1/√(2π)) ∫_{−∞}^{a} e^{−x²/2} dx = Φ(a) (the cdf of the standard normal) as n → ∞

N2 - Let Z1, Z2, ... be a sequence of r.v.s with distribution functions FZn and moment generating functions MZn, n ≥ 1. Let Z be a r.v. with distribution function FZ and mgf MZ. If MZn(t) → MZ(t) for all t, then FZn(t) → FZ(t) for all t at which FZ(t) is continuous.

strong law of large numbers → let X1, X2, ... be a sequence of independent and identically distributed r.v.s, each having finite mean µ = E[Xi]. Then, with probability 1, (X1 + ... + Xn)/n → µ as n → ∞.
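Not in the notes: a simulation sketch of the central limit theorem, standardising sums of n Uniform(0, 1) samples and checking how often the standardised value falls below a = 1 against Φ(1) ≈ 0.841. All constants are arbitrary choices.

import random
from math import sqrt, erf

def phi(a):
    # standard normal cdf
    return 0.5 * (1 + erf(a / sqrt(2)))

mu, sigma = 0.5, sqrt(1 / 12)   # mean and standard deviation of Uniform(0, 1)
n, runs, a = 30, 100_000, 1.0

hits = 0
for _ in range(runs):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * sqrt(n))   # standardised sum, approx. N(0, 1) by the CLT
    hits += z <= a
print(hits / runs, phi(a))   # both should be close to 0.841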
commutative    E ∪ F = F ∪ E                          E ∩ F = F ∩ E
associative    (E ∪ F) ∪ G = E ∪ (F ∪ G)              (E ∩ F) ∩ G = E ∩ (F ∩ G)
distributive   (E ∪ F) ∩ G = (E ∩ G) ∪ (F ∩ G)        (E ∩ F) ∪ G = (E ∪ G) ∩ (F ∪ G)
DeMorgan's     (∪_{i=1}^n Ei)^c = ∩_{i=1}^n Ei^c      (∩_{i=1}^n Ei)^c = ∪_{i=1}^n Ei^c
