Stat 350 Study Guide
Part I
Probability
1 Probability Review
1. Probability: The numerical description of how likely an event is to occur.
9. Event: A subset of the sample space S. If A is an event, then A has
occurred if it contains the outcome that has occurred.
10. Elementary Event: An event that contains exactly one outcome of
the experiment.
11. Set Theory Terminology:
(a) A but not B: A ∩ B′ = A − B
(b) Exactly one of A or B: (A ∩ B′) ∪ (A′ ∩ B)
(c) Neither A nor B: A′ ∩ B′ = (A ∪ B)′
12. Mutually Exclusive: Two events A and B where A ∩ B = ∅
13. Relative Frequency: The number of times an event occurs divided
by the number of trials in the experiment.
fA = m(A) / M
14. Statistical Regularity: If, for an event A, the limit of fA as M approaches infinity exists, then one can assign the probability of A by:
P(A) = lim_{M→∞} fA
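A quick simulation illustrates this idea. The sketch below (Python, with an arbitrary seed and an arbitrary event; not part of the original notes) estimates P(A) for the event A = "a fair die shows an even number" by computing the relative frequency fA for increasingly many trials M:

import random

# Minimal simulation sketch: estimate P(A) by its relative frequency f_A.
random.seed(1)

def relative_frequency(M):
    m_A = sum(1 for _ in range(M) if random.randint(1, 6) % 2 == 0)
    return m_A / M  # f_A = m(A) / M

for M in (100, 10_000, 1_000_000):
    print(M, relative_frequency(M))  # approaches P(A) = 0.5 as M grows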
17. Classical Probability: Experiments whose N possible outcomes are all equally likely, such as flipping fair coins or rolling fair dice. If n(A) is the number of outcomes in A, then:
P(A) = n(A) / N
Conditional Probability:
P(A|B) = P(A ∩ B) / P(B), if P(B) ≠ 0
If two events A1 and A2 are mutually exclusive, then:
P(A1|B) + P(A2|B) = [P(A1 ∩ B) + P(A2 ∩ B)] / P(B)
21. Multiplication Theorem: For any events A and B,
P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
This theorem is quite useful for dealing with problems involving sam-
pling without replacement.
Two events A and B are independent if and only if the following pairs
of events are also independent:
(a) A and B′
(b) A′ and B
(c) A′ and B′
24. The notion of independence can be extended to more than two events
as follows:
The k events A1, A2, . . . , Ak are said to be independent or mutually independent if, for every j = 2, 3, . . . , k and every subset of distinct indices i1, i2, . . . , ij:
P(A_{i1} ∩ A_{i2} ∩ · · · ∩ A_{ij}) = P(A_{i1}) P(A_{i2}) · · · P(A_{ij})
2 Counting Techniques
1. Multiplication Principle: If one operation can be performed in n1 ways and a second operation can be performed in n2 ways, then there are:
n1 · n2
ways to perform the two operations in succession.
5. The number of permutations of n objects of which r1 are of one kind, r2 of a second kind, . . . , and rk of a k-th kind is:
n! / (r1! r2! . . . rk!)
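For example, the letters of STATISTICS (3 S's, 3 T's, 2 I's, 1 A, 1 C) can be arranged in
10! / (3! 3! 2! 1! 1!) = 3,628,800 / 72 = 50,400
distinguishable ways.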
Bayes' Theorem:
P(A|B) = P(B|A)P(A) / P(B), if P(B) ≠ 0
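As an illustration (the numbers below are made up, not from the course), Bayes' theorem applied to a diagnostic-test problem looks like this in Python:

# Illustrative sketch with made-up numbers: compute P(D|+) for a test.
p_d = 0.01             # P(D): prevalence of the condition
p_pos_given_d = 0.95   # P(+|D): sensitivity
p_pos_given_nd = 0.05  # P(+|D'): false-positive rate

# Total probability: P(+) = P(+|D)P(D) + P(+|D')P(D')
p_pos = p_pos_given_d * p_d + p_pos_given_nd * (1 - p_d)

# Bayes' theorem: P(D|+) = P(+|D)P(D) / P(+)
print(p_pos_given_d * p_d / p_pos)  # about 0.161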
Part II
Random Variables
1. Random Variable: Usually denoted X, Y, Z; a function defined on a sample space S that associates a real number x ∈ R with each outcome e ∈ S:
X(e) = x
X : S → R
(∀e ∈ S, X(e) ∈ R)
The discrete pdf (pmf) of X is:
f(x) = P[X = x], x = x1, x2, . . .
with the properties:
(a) f (xi ) ≥ 0
(Probability must be non-negative)
(b) Σ_{all xi} f(xi) = Σ_{all xi} P[X = xi] = 1
(An exhaustive partition of the sample space must sum to 1)
4. Cumulative Distribution Function: (CDF): The CDF of the ran-
dom variable X is defined for any real x by:
F (x) = P [X ≤ x]
For a continuous random variable with pdf f, F(x) = ∫_{−∞}^{x} f(t) dt; in practice, the lower limit of this integral is the lower bound of the support given by the pdf.
(a) f (x) ≥ 0
(b) ∫_{−∞}^{∞} f(x) dx = 1
3. Continuous Expected Value: For a continuous random variable
with pdf f (x), then the expected value of X is defined by:
E(X) = ∫_{−∞}^{∞} x f(x) dx
Remember: E(X) = µ
4. Percentile: For 0 < p < 1, the 100 × pth percentile of the distribution
of a continuous random variable is a solution xp to the equation:
F (xp ) = p
A distribution is symmetric about c if:
f(c − x) = f(c + x) for all x
9. Finding a constant: For a pdf of the form c f(x) on [a, b], we can find the value of c by solving the equation:
∫_{a}^{b} c f(x) dx = 1
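For example, if f(x) = c x² on 0 ≤ x ≤ 2, then ∫_{0}^{2} c x² dx = 8c/3 = 1, so c = 3/8.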
Variance for Random Variables:
1. Discrete Variance:
Var(X) = E(X²) − [E(X)]²
Example (a fair six-sided die):
E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5
[E(X)]² = (7/2)² = 49/4 = 12.25
E(X²) = (1² + 2² + 3² + 4² + 5² + 6²)/6 = 91/6 = 15.166
E(X²) − [E(X)]² = 91/6 − 49/4 = 105/36 = 2.9166 = Var(X)
Alternatively:
Var(X) = Σ_{i=1}^{n} (xi − µ)² f(xi)
Σ_{i=1}^{6} (i − 3.5)² (1/6) = 35/12 = 2.9166 = Var(X)
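A short numeric check of the die example (an illustrative sketch, not part of the original guide):

outcomes = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mean = sum(x * p for x in outcomes)               # E(X) = 3.5
ex2 = sum(x**2 * p for x in outcomes)             # E(X^2) = 91/6
var_shortcut = ex2 - mean**2                      # E(X^2) - [E(X)]^2
var_definition = sum((x - mean)**2 * p for x in outcomes)
print(var_shortcut, var_definition)               # both 2.9166...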
2. Continuous Variance:
σ² = Var(X) = ∫_{−∞}^{∞} (x − E(X))² f(x) dx
Mixed Distributions
1. A random variable whose distribution is neither purely discrete nor purely continuous. The probability distribution of a random variable X is of mixed type if the CDF has the form:
F(x) = a F_d(x) + (1 − a) F_c(x), 0 < a < 1
where F_d(x) is a discrete (step-function) CDF and F_c(x) is a continuous CDF.
Moments
1. Moments: Special expected values. The moments of a function are quantitative measures related to the shape of the function's graph. The k-th moment of a random variable X is E(X^k). The moments of a probability distribution are generated by the moment generating function (MGF):
MX(t) = E(e^{tX}), −h < t < h (for some h > 0)
*The subscript X is not always used
2. Series Review:
(a) e^x = 1 + x + x²/2! + x³/3! + . . .
(b) e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + . . .
(c) E(e^{tX}) = 1 + tE(X) + t²E(X²)/2! + t³E(X³)/3! + . . .
The moments are recovered by differentiating the MGF and evaluating at t = 0:
E(X^j) = (d^j/dt^j) MX(t) |_{t=0}
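As an illustration (a sketch using the sympy library, not part of the original notes), the moments of an exponential random variable can be recovered from its MGF MX(t) = 1/(1 − θt), which appears later in this guide:

import sympy as sp

# Sketch: differentiate the MGF of X ~ EXP(theta) and evaluate at t = 0.
t, theta = sp.symbols('t theta', positive=True)
M = 1 / (1 - theta * t)

EX = sp.diff(M, t, 1).subs(t, 0)    # first moment: theta
EX2 = sp.diff(M, t, 2).subs(t, 0)   # second moment: 2*theta**2
print(EX, EX2, sp.simplify(EX2 - EX**2))  # variance: theta**2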
Inequalities
1. Markov Inequality: For any c > 0 and any non-negative function h(X) of a random variable X:
P(h(X) ≥ c) ≤ E(h(X)) / c
The probability that h(X) is at least a constant c > 0 is bounded above by the expectation of h(X) divided by that constant c.
This implies that as we choose larger values of c, the upper bound approaches 0, and therefore so does P(h(X) ≥ c).
2. Indicator Random Variable: A random variable with the following properties:
Ic(x) = 1 if h(x) ≥ c
Ic(x) = 0 if h(x) < c
If h(x) ≥ c, then Ic = 1 and h(x)/c ≥ 1, so:
h(x)/c ≥ Ic
If h(x) < c, then Ic = 0 and h(x)/c ≥ 0 (since h(x) is non-negative), so again:
h(x)/c ≥ Ic
∴ h(x)/c ≥ Ic in all cases. Since
0 ≤ Ic ≤ h(x)/c
for any X, it follows that:
0 ≤ E(Ic) ≤ E(h(X)/c) = E(h(X))/c
Because E(Ic) = P(h(X) ≥ c), we conclude:
P(h(X) ≥ c) ≤ E(h(X))/c
4. Chebyshev Inequality: If X is a random variable with a finite mean µ and finite variance σ², then for any k > 0:
P(|X − µ| ≥ kσ) ≤ 1/k²
Equivalently, the probability that X lies within ±k standard deviations of its mean is at least:
1 − 1/k²
This implies that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Specifically, no more than 1/k² of the distribution's values can be more than k standard deviations away from the mean; equivalently, at least 1 − 1/k² of the distribution's values are within k standard deviations of the mean.
5. Chebyshev Inequality Proof:
We want to show:
P(|X − µ| ≥ kσ) ≤ 1/k²
Recall the Markov inequality:
P(h(X) ≥ c) ≤ E(h(X))/c
Let:
h(X) = (X − µ)²
c = k²σ²
Then:
P(|X − µ| ≥ kσ) = P((X − µ)² ≥ k²σ²)
≤ E[(X − µ)²] / (k²σ²)
= σ² / (k²σ²)
= 1/k²
Part III
Special Probability
Distributions
Discrete Probability Distributions
1. Discrete Uniform Distribution X ∼ DU (N )
(a) pdf:
f(x) = 1/N, x = 1, 2, . . . , N
(b) Description: Many problems, especially those involving classical
assignment of probability, can be modelled by a discrete random
variable that assumes all of its values with the same probability.
Useful for modeling: games of chance such as lotteries, tossing an
unbiased die, flipping a fair coin.
(c) Expectation:
E(X) = (N + 1)/2
(d) Variance:
Var(X) = (N² − 1)/12
2. Bernoulli Distribution
(a) pdf:
f(x) = p^x (1 − p)^{1−x}, x = 0, 1
(b) Description: A single trial of an experiment where there are only
two possible outcomes: Success (1) or failure (0)
(c) Expectation:
E(X) = p
(d) Variance:
Var(X) = p(1 − p)
3. Binomial Distribution X ∼ BIN (n, p)
(a) pdf:
b(x; n, p) = C(n, x) p^x (1 − p)^{n−x}
where C(n, x) = n!/(x!(n − x)!) is the binomial coefficient.
(b) CDF:
B(x; n, p) = Σ_{k=0}^{x} b(k; n, p), x = 0, 1, . . . , n
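For computation, scipy provides the binomial pmf and CDF directly; the sketch below (illustrative values, not from the guide) evaluates b(3; 10, 0.3) and B(3; 10, 0.3):

from scipy.stats import binom

n, p = 10, 0.3
print(binom.pmf(3, n, p))                 # b(3; 10, 0.3), about 0.2668
print(binom.cdf(3, n, p))                 # B(3; 10, 0.3), about 0.6496
print(binom.mean(n, p), binom.var(n, p))  # np = 3.0 and np(1 - p) = 2.1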
4. Hypergeometric Distribution X ∼ HY P (n, M, N )
(a) pdf:
h(x; n, M, N) = C(M, x) C(N − M, n − x) / C(N, n)
where:
i. N : A finite number of items
ii. M : The subset of interest of the finite N items
iii. n: The number of items drawn
iv. x: The number of items belonging to M
(b) CDF:
H(x; n, M, N) = Σ_{i=0}^{x} h(i; n, M, N)
5. Geometric Distribution X ∼ GEO(p)
(a) pdf:
g(x; p) = p(1 − p)^{x−1}, x = 1, 2, 3, . . .
(b) CDF:
G(x; p) = Σ_{i=1}^{x} p(1 − p)^{i−1} = 1 − (1 − p)^x, x = 1, 2, 3, . . .
6. Negative Binomial Distribution X ∼ N B(r, p)
(a) pdf:
f(x; r, p) = C(x − 1, r − 1) p^r (1 − p)^{x−r}, x = r, r + 1, . . .
If W ∼ BIN(x, p) counts the successes in the first x trials, then [X ≤ x] (the r-th success occurs by trial x) is the same event as [W ≥ r], so the CDF is:
F(x; r, p) = P[X ≤ x] = P[W ≥ r] = 1 − B(r − 1; x, p) = B(x − r; x, q)
where q = 1 − p.
8. Poisson Distribution X ∼ P OI(λ)
(a) pdf:
f(x; λ) = e^{−λ} λ^x / x!
(b) CDF:
F(x; λ) = Σ_{k=0}^{x} f(k; λ)
The Poisson process is often defined on the real line and is used
to model random events such as the arrival of customers at a
store, phone calls at an exchange, or the occurrence of earthquakes
distributed in time.
Continuous Probability Distributions
1. Continuous Uniform Distribution X ∼ U N IF (a, b)
(a) CDF:
F(x) = 0 if x ≤ a
F(x) = (x − a)/(b − a) if a < x < b
F(x) = 1 if x ≥ b
2. Gamma Distribution X ∼ GAM (θ, κ)
3. Exponential Distribution X ∼ EXP (θ)
(a) pdf:
f(x; θ) = (1/θ) e^{−x/θ}, x > 0
(b) CDF:
F(x; θ) = 1 − e^{−x/θ}, x ≥ 0
(c) Description: A special case of the gamma distribution, commonly used to model the time we must wait before a given event occurs. It is the continuous counterpart of the geometric distribution.
(d) Expectation
E(X) = θ
(e) Variance
V ar(X) = θ2
(f) MGF
MX(t) = 1 / (1 − θt), t < 1/θ
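As a quick worked check of the CDF above: if θ = 2, the probability of waiting at most 3 time units is P(X ≤ 3) = F(3; 2) = 1 − e^{−3/2} ≈ 0.777.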
4. Beta Distribution X ∼ BET A(a, b)
(a) pdf:
f(x; a, b) = [Γ(a + b) / (Γ(a)Γ(b))] x^{a−1} (1 − x)^{b−1}, 0 < x < 1
(b) Description: A useful distribution for modelling proportions, parameterized by two positive shape parameters. In Bayesian inference, the beta distribution is related to the Bernoulli, binomial, negative binomial and geometric distributions.
(c) Expectation:
E(X) = a / (a + b)
(d) Variance:
Var(X) = ab / [(a + b)²(a + b + 1)]
5. Normal Distribution X ∼ N (µ, σ 2 )
(a) pdf:
f(x; µ, σ) = (1 / (σ√(2π))) e^{−(x−µ)² / (2σ²)}, −∞ < x < ∞
(b) CDF: The CDF of Z ∼ N (0, 1) is given by:
Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt, −∞ < z < ∞
where z = (x − µ)/σ
(c) Description: The single most important distribution in statis-
tics.
(d) Expectation:
E(X) = µ
(e) Variance:
V ar(X) = σ 2
(f) MGF:
MX(t) = e^{µt + σ²t²/2}
Z = (X − µ)/σ ∼ N(0, 1)
The standard score (Z-score) can be used to look up percentiles
(either with tables, or pnorm(Z) in R)
Alternatively, we can do a reverse lookup to find the number of standard deviations associated with a given percentile (either with tables, or with qnorm(percentage) in R). This is useful for calculating 100(1 − α)% confidence intervals.
i. pnorm: enter a z-value (number of standard deviations); returns the cumulative probability.
ii. qnorm: enter a cumulative probability; returns the corresponding z-value.
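In Python, scipy's norm.cdf and norm.ppf play the same roles as pnorm and qnorm (a small sketch, not part of the original guide):

from scipy.stats import norm

print(norm.cdf(1.96))   # about 0.975, analogous to pnorm(1.96)
print(norm.ppf(0.975))  # about 1.96, analogous to qnorm(0.975)

# z-value for a 100(1 - alpha)% two-sided confidence interval:
alpha = 0.05
print(norm.ppf(1 - alpha / 2))  # about 1.96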
Location and Scale Parameters
Mean is an example of location.
Standard deviation is an example of scale.
Let f0 (z) represent a fully specified pdf
Let F0 (z) represent a fully specified CDF
1. Location Parameters: A quantity η is a location parameter for the distribution of X if the CDF has the form:
F (x; η) = F0 (x − η)
or when the pdf has the form:
f (x; η) = f0 (x − η)
It is common for a location parameter to be a measure of central tendency of X, such as the mean or median.
For example, if X has a pdf of the form (double exponential):
f(x; η) = (1/2) e^{−|x−η|}
then the location parameter η is the mean of the distribution, because f(x; η) is symmetric about η.
2. Scale Parameters: A positive quantity θ is a scale parameter for the
distribution of X if the CDF has the form:
F(x; θ) = F0(x/θ)
or when the pdf has the form:
f(x; θ) = (1/θ) f0(x/θ)
For the exponential distribution X ∼ EXP(θ), θ is a scale parameter.
The standard deviation σ often turns out to be a scale parameter.
3. Location–Scale Parameters: Quantities η and θ > 0 are called
location–scale parameters for the distribution of X if the CDF has the
form:
F(x; θ, η) = F0((x − η)/θ)
or when the pdf has the form:
f(x; θ, η) = (1/θ) f0((x − η)/θ)
Part IV
Joint Distributions
When there is more than one random variable of interest, it is convenient to
regard these variables as components of a k-dimensional vector:
X = (X1 , X2 , . . . , Xk )
x = (x1 , x2 , . . . , xk )
The joint pmf of a discrete random vector X is:
f(x1, x2, . . . , xk) = P[X1 = x1, X2 = x2, . . . , Xk = xk]
3. Marginal pmf's: If the pair (X1, X2) of discrete random variables has the joint pmf f(x1, x2), then the marginal pmf's of X1 and X2 are:
(a) f1(x1) = Σ_{x2} f(x1, x2)
(b) f2(x2) = Σ_{x1} f(x1, x2)
The joint CDF is:
F(x1, . . . , xk) = P[X1 ≤ x1, . . . , Xk ≤ xk]
3. Marginal pdf's: If the pair (X1, X2) of continuous random variables has the joint pdf f(x1, x2), then the marginal pdf's of X1 and X2 are:
(a) f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2
(b) f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1
Marginal Distributions
1. In the discrete case, where a joint pmf is represented by a 2D table, we can find a marginal pmf by simply adding up the rows (or columns) while holding the other random variable constant. Notice that each marginal pmf obtained this way sums to 1.
2. If the random variables (X, Y) are discrete with joint pmf f(x, y), then:
(a) The marginal pmf of X:
fX(x) = P(X = x) = Σ_{all y} f(x, y)
where x is fixed
(b) The marginal pmf of Y:
fY(y) = P(Y = y) = Σ_{all x} f(x, y)
where y is fixed
3. If the random variables (X, Y) are continuous with joint pdf f(x, y), then:
(a) The marginal pdf of X:
fX(x) = ∫_{−∞}^{∞} f(x, y) dy
We integrate out y.
(b) The marginal pdf of Y:
fY(y) = ∫_{−∞}^{∞} f(x, y) dx
We integrate out x.
In general, the marginal pmf/pdf of Xj is found by summing or integrating the joint pmf/pdf over all of the other variables:
(a) pmf:
fj(xj) = Σ_{x1} · · · Σ_{x_{j−1}} Σ_{x_{j+1}} · · · Σ_{xk} f(x1, x2, . . . , xj, . . . , xk)
where xj is fixed.
(b) pdf:
fj(xj) = ∫ · · · ∫ f(x1, x2, . . . , xj, . . . , xk) dx1 . . . dx_{j−1} dx_{j+1} . . . dxk
Independent Random Variables
1. It is natural to extend the concept of independence of events to the
independence of random variables.
Remember that two events are independent iff:
P(A ∩ B) = P(A)P(B)
Similarly, the random variables X1, X2, . . . , Xk are independent if, for all ai ≤ bi:
P[a1 ≤ X1 ≤ b1, . . . , ak ≤ Xk ≤ bk] = P[a1 ≤ X1 ≤ b1] · · · P[ak ≤ Xk ≤ bk]
If the above formula does not hold for all ai ≤ bi, the random variables are called dependent.
Equivalently, two random variables are independent if and only if the product of their marginal pdf's equals the joint pdf.
Conditional Distributions
P[T = t | X = x] = P[X = x, T = t] / P[X = x] = fX,T(x, t) / fX(x)
Just like with independence, conditional probability can also be extended to
random variables.
f(x1 | x2) = f(x1, x2) / f2(x2)
that is:
conditional = joint / marginal
2. For discrete (X, Y), we have:
f(y | x) = f(x, y) / fX(x) = f(x, y) / Σ_y f(x, y)
where x is fixed.
f(x | y) = f(x, y) / fY(y) = f(x, y) / Σ_x f(x, y)
where y is fixed.
3. For continuous (X, Y) with joint pdf f(x, y), the conditional distributions are:
f(y | x) = f(x, y) / fX(x) = f(x, y) / ∫_{−∞}^{∞} f(x, y) dy
Part V
Properties of Random
Variables
1. Theorem 1: Suppose X = (X1, X2, . . . , Xk) has joint pdf/pmf f(x1, x2, . . . , xk), and u(x) = u(x1, x2, . . . , xk) is a real-valued function (Rk → R). Then:
Continuous:
E[u(X)] = ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} u(x1, . . . , xk) f(x1, . . . , xk) dx1 . . . dxk
Discrete:
E[u(X)] = Σ_{x1} · · · Σ_{xk} u(x1, . . . , xk) f(x1, . . . , xk)
Corollary (linearity of expectation):
E(Σ_{i=1}^{k} ai Xi) = Σ_{i=1}^{k} ai E(Xi)
4. Theorem 4:
6. Correlation Coefficient: A measure that expresses the extent to which two random variables are linearly related. The correlation coefficient of X and Y is:
ρXY = Cov(X, Y) / (σX σY)
Important: If X and Y are independent, then ρXY = 0. But ρXY = 0 does not imply independence.
Conditional expectation (discrete):
E(X | y) = Σ_x x f(x | y)
9. Theorem:
Var(Y) = Var[E(Y | X)] + E[Var(Y | X)], so Var(Y) ≥ E[Var(Y | X)]
If X1, X2, . . . , Xk are independent, then:
Part VI
Functions of Random Variables
Let X be a random variable with pdf f(x) and let Y = u(X).
What is the pdf of Y ?
1. CDF Method:
FY(y) = P(Y ≤ y) = P(u(X) ≤ y) = ∫_{u(x) ≤ y} f(x) dx
We can then find the pdf by taking the derivative of the CDF:
fY(y) = (d/dy) FY(y)
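For example (an illustration, not from the guide), let X ∼ UNIF(0, 1) and Y = −ln(X). For y > 0:
FY(y) = P(−ln X ≤ y) = P(X ≥ e^{−y}) = 1 − e^{−y}
so fY(y) = e^{−y}, i.e. Y ∼ EXP(1).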
3. Order Statistics: Suppose x1 , . . . , xn are iid with pdf f (x) and cdf
F (x). The joint pdf is f (x1 , x2 , . . . , xn ) = f (x1 )f (x2 ). . . f (xn ). If we
consider the order statistics of sample x1 , . . . , xn :
(a) Yn = the largest of the xi ’s = max(x1 , . . . , xn ) (xn:n )
(b) Yn−1 = the second largest of the xi ’s (xn−1:n )
(c) Y1 = the smallest of the xi ’s (x1:n )
The transformation (x1 , . . . , xn ) → (Y1 , . . . , Yn ) is not one-to-one. It
is n! to 1 (since there are n! permutations that all correspond to one
correct ordering).
Therefore, the joint pdf of Y1 , Y2 , . . . , Yn is:
f(y1, y2, . . . , yn) = n! fX(y1, y2, . . . , yn) = n! f(y1) f(y2) . . . f(yn), for y1 ≤ y2 ≤ . . . ≤ yn
The marginal pdf of Yk is:
fk(yk) = [n! / ((k − 1)! 1! (n − k)!)] [F(yk)]^{k−1} f(yk) [1 − F(yk)]^{n−k}
Special Cases:
(a) Y1 (the minimum): CDF 1 − [1 − F(y1)]^n, pdf f1(y1) = n [1 − F(y1)]^{n−1} f(y1)
(b) Yn (the maximum): CDF [F(yn)]^n, pdf fn(yn) = n [F(yn)]^{n−1} f(yn)
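A quick simulation (an illustrative sketch, not from the guide) checks the CDF of the maximum for iid UNIF(0, 1) samples, where [F(y)]^n = y^n:

import random

random.seed(2)
n, y, reps = 5, 0.9, 100_000

# Estimate P(Yn <= y) for the maximum of n iid UNIF(0, 1) variables.
hits = sum(1 for _ in range(reps)
           if max(random.random() for _ in range(n)) <= y)
print(hits / reps, y ** n)  # both close to 0.9^5 = 0.59049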