
STAT 350 Study Guide

Nils Dosaj Mikkelsen


January 13, 2021

Part I
Probability
1 Probability Review
1. Probability: A numerical description of how likely an event is to occur.

2. Probability Distribution: A mathematical function that gives the probabilities of occurrence of different possible outcomes for an experiment.

3. Experiment: The process of obtaining an observed result of some phenomenon.

4. Trial: The performance of an experiment.

5. Outcome: An observed result.

6. Sample Space: The set of all possible outcomes of an experiment. Denoted S.

7. Random Variable: A variable whose values depend on the outcomes of a random phenomenon. In the context of probability theory, a random variable is understood as a measurable function that maps from the sample space to the real numbers.

8. Countably Infinite: A set that can be put into a one-to-one correspondence with the positive integers.

9. Event: A subset of the sample space S. If A is an event, then A has
occurred if it contains the outcome that has occurred.
10. Elementary Event: An event that contains exactly one outcome of
the experiment.
11. Set Theory Terminology:
(a) A but not B: A ∩ B′ = A − B
(b) Exactly one of A or B: (A ∩ B′) ∪ (A′ ∩ B)
(c) Neither A nor B: A′ ∩ B′ = (A ∪ B)′
12. Mutually Exclusive: Two events A and B where A ∩ B = ∅
13. Relative Frequency: The number of times an event occurs divided by the number of trials in the experiment:

fA = m(A)/M
14. Statistical Regularity: If, for an event A, the limit of fA as M approaches infinity exists, then one can assign the probability of A by:

P(A) = lim_{M→∞} fA

15. Set Function: A function whose domain is a collection of sets (events), and the range of which is a subset of the real numbers.
16. Properties of Probability: For a given experiment, S denotes the
sample space and A, A1 , A2 ... represent possible events. A set function
that associates a real value P (A) with each event A is called a probabil-
ity set function, and P (A) is called the probability of A, if the following
properties are satisfied:
(a) P (A) ≥ 0 for every A
Probabilities can’t be negative
(b) P (S) = 1
An event must occur in order to have an experiment

(c) If A1, A2, . . . are pairwise mutually exclusive events, then

P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)

The probability of the union of mutually exclusive events is equal to the sum of their individual probabilities; for two such events, P(A ∪ B) = P(A) + P(B).

17. Classical Probability: Experiments whose N possible outcomes are all equally likely, such as flipping coins, rolling dice, etc. If n(A) is the number of outcomes in A, then:

P(A) = n(A)/N

18. Chosen at Random: An object chosen from a finite collection of distinct objects in such a manner that each object has the same probability of being chosen.

19. Properties of Probability:

(a) P(A) = 1 − P(A′)


(b) For any event A, P (A) ≤ 1
(c) For any two events A and B:
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
(d) For any three events A, B and C:
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B) − P (A ∩ C) −
P (B ∩ C) + P (A ∩ B ∩ C)
(e) If A ⊂ B, then P (A) ≤ P (B)
(f) If A1, A2, . . . is a sequence of events, then:

P(⋃_{i=1}^∞ Ai) ≤ Σ_{i=1}^∞ P(Ai)

(g) If A1, A2, . . . , Ak are events, then:

P(⋂_{i=1}^k Ai) ≥ 1 − Σ_{i=1}^k P(Ai′)

20. Conditional Probability: The conditional probability of an event A, given the event B (read as "A given B"), is defined by:

P(A|B) = P(A ∩ B)/P(B)

if P(B) ≠ 0.
If two events A1 and A2 are mutually exclusive, then:

P(A1|B) + P(A2|B) = [P(A1 ∩ B) + P(A2 ∩ B)]/P(B)

21. Multiplication Theorem: For any events A and B,

P (A ∩ B) = P (B)P (A|B) = P (A)P (B|A)

This theorem is quite useful for dealing with problems involving sam-
pling without replacement.

22. Total Probability: If B1, B2, . . . , Bk is a collection of mutually exclusive and exhaustive events, then for any event A:

P(A) = Σ_{i=1}^k P(Bi)P(A|Bi)

23. Independent Events: Two events A and B are called independent


events if:
P (A ∩ B) = P (A)P (B)
Otherwise, A and B are dependent events
In terms of conditional probability, If A and B are events such that
P (A) > 0 and P (B) > 0, then A and B are independent if and only if
either of the following holds:

(a) P (A|B) = P (A)


(b) P (B|A) = P (B)

Two events A and B are independent if and only if the following pairs
of events are also independent:

(a) A and B′
(b) A′ and B
(c) A′ and B′

24. The notion of independence can be extended to more than two events as follows:
The k events A1, A2, . . . , Ak are said to be independent or mutually independent if for every j = 2, 3, . . . , k and every subset of distinct indices i1, i2, . . . , ij:

P(Ai1 ∩ Ai2 ∩ . . . ∩ Aij) = P(Ai1)P(Ai2) . . . P(Aij)

2 Counting Techniques
1. Multiplication Principle: If one operation can be performed in n1 ways and a second operation can be performed in n2 ways, then there are:

n1 · n2

ways in which both operations can be carried out.
We can extend the multiplication rule to more than two operations as follows:

∏_{i=1}^r ni = n1 n2 . . . nr

2. If there are N possible outcomes of each of t trials of an experiment, then there are:

N^t

possible outcomes in the sample space.

3. Permutations: An ordered arrangement of a set of objects. The number of permutations of n distinguishable objects is n!
The number of permutations of n distinct objects taken r at a time is:

nPr = n!/(n − r)!

If all n objects are included in the permutation, then the result is simply n!

4. Combinations: If the order of the objects is not important, the number of combinations that are possible when selecting r objects from n distinct objects is denoted:

nCr = (n choose r) = n!/(r!(n − r)!)

This is the same as the binomial coefficient.
This assumes sampling without replacement.
In general, the number of ways to draw k objects from n objects with replacement is:

(k + n − 1 choose k) = (k + n − 1 choose n − 1)

5. The number of permutations of n objects of which r1 are of one kind, r2 are of a second kind, . . . , and rk are of a k-th kind is:

n!/(r1! r2! . . . rk!)

This is the same as the multinomial coefficient.

6. Useful Combinatoric Identities (checked numerically in the R sketch below):

(n choose k) = (n choose n − k)

(n choose r) = (n − 1 choose r) + (n − 1 choose r − 1)

Σ_{r=0}^k (m choose r)(n choose k − r) = (m + n choose k)
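As a quick sanity check, these counting formulas map directly onto R's factorial() and choose() functions. A minimal sketch; the particular numbers (n = 10, r = 3, m = 4, etc.) are arbitrary illustrations, not values from the notes:

# permutations of 10 objects taken 3 at a time: n!/(n - r)!
factorial(10) / factorial(10 - 3)             # 720

# combinations: 10 choose 3
choose(10, 3)                                 # 120

# symmetry: C(n, k) = C(n, n - k)
choose(10, 3) == choose(10, 7)                # TRUE

# Pascal's rule: C(n, r) = C(n - 1, r) + C(n - 1, r - 1)
choose(10, 3) == choose(9, 3) + choose(9, 2)  # TRUE

# Vandermonde: sum_r C(m, r) C(n, k - r) = C(m + n, k), with m = 4, n = 6, k = 3
sum(choose(4, 0:3) * choose(6, 3 - (0:3))) == choose(10, 3)  # TRUE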

3 Law of Total Probability/Bayes Theorem


1. The Law of Total Probability states that if B1, B2, . . . , Bk is a collection of mutually exclusive and exhaustive events, then for any event A:

P(A) = Σ_{i=1}^k P(Bi)P(A|Bi)

Total probability questions are usually solved by constructing tree diagrams which represent the exhaustive list of Bi's with (A, A′) as the leaves.

2. Bayes Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event:

P(A|B) = P(B|A)P(A)/P(B)

if P(B) ≠ 0.
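For instance, a minimal R sketch of a total-probability/Bayes calculation; the prevalence and test accuracies used here are made-up illustrative numbers, not values from the notes:

# hypothetical numbers: P(D) = 0.01, P(+|D) = 0.95, P(+|D') = 0.05
p_d      <- 0.01
p_pos_d  <- 0.95
p_pos_nd <- 0.05

# law of total probability: P(+) = P(D)P(+|D) + P(D')P(+|D')
p_pos <- p_d * p_pos_d + (1 - p_d) * p_pos_nd

# Bayes theorem: P(D|+) = P(+|D)P(D) / P(+)
p_pos_d * p_d / p_pos    # about 0.16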

Part II
Random Variables
1. Random Variable: Usually denoted X, Y, Z; a function defined over a sample space S that associates a real number x ∈ R, as

X(e) = x,

with each possible outcome e in S.
In other words, a real valued function defined on the sample space S:

X : S → R
(∀e ∈ S, X(e) ∈ R)

Discrete Random Variables


1. Discrete Random Variable: If the set of all possible values of a
random variable X is a countable set x1 , x2 , . . . , xn or x1 , x2 , . . . then
X is called a discrete random variable

2. Discrete Probability Density Function (pmf): A function that assigns the probability to each possible value x, where the set of possible values is finite or countably infinite. Denoted:

f(x) = P[X = x],   x = x1, x2, . . .

3. PMF Properties: A function f(x) is the discrete pdf if and only if it satisfies both of the following for at most a countably infinite set of real values x1, x2, . . . :

(a) f(xi) ≥ 0
(Probability must be non-negative)
(b) Σ_{all xi} f(xi) = Σ_{all xi} P[X = xi] = 1
(An exhaustive partition of the sample space must sum to 1)

7
4. Cumulative Distribution Function (CDF): The CDF of the random variable X is defined for any real x by:

F(x) = P[X ≤ x]

It is the accumulated probability of all values up to and including x.

5. CDF Properties: A function F(x) is the CDF for some random variable X if and only if it is non-decreasing, right continuous, and satisfies the following limit properties:

(a) lim_{x→−∞} F(x) = 0
(b) lim_{x→∞} F(x) = 1

6. Expected Value (expectation, mean): If X is a discrete random variable with pdf f(x), then the expected value of X is defined by:

E(X) = Σ_x x f(x)

This results in a weighted sum.

Continuous Random Variables


1. Continuous Random Variable: A random variable X is a continuous random variable if there is a function f(x), called the probability density function (pdf) of X, such that the CDF can be represented as:

F(x) = ∫_{−∞}^{x} f(t) dt

In practice, the lower limit of this integral is the lower bound of the support given by the pdf.

2. Properties of a Continuous PDF: A function f(x) is a pdf for some continuous random variable X if and only if it satisfies the following properties:

(a) f(x) ≥ 0
(b) ∫_{−∞}^{∞} f(x) dx = 1

3. Continuous Expected Value: For a continuous random variable X with pdf f(x), the expected value of X is defined by:

E(X) = ∫_{−∞}^{∞} x f(x) dx

provided the integral is absolutely convergent; otherwise E(X) does not exist.
Remember: E(X) = µ

4. Percentile: For 0 < p < 1, the 100 × pth percentile of the distribution
of a continuous random variable is a solution xp to the equation:

F (xp ) = p

5. Quantile: Cut points dividing the range of a probability distribution


into continuous intervals with equal probabilities.

6. Median: The 50th percentile or middle value.

7. Mode: If the pdf has a unique maximum at x = m0, i.e. max_x f(x) = f(m0), then m0 is the mode of X; informally, the most likely value.

8. Symmetric Distribution: A distribution with pdf f (x) is said to be


symmetric about c if:

f (c − x) = f (c + x)

For all x

9. Finding a constant: For a pdf of the form c·f(x) with support (a, b), we can find the value of c by solving the equation:

∫_a^b c f(x) dx = 1
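For example (an illustrative pdf chosen for this guide, not one from the course notes): suppose f(x) = c x² on (0, 2) and zero elsewhere. Then

∫_0^2 c x² dx = c · (8/3) = 1,

so c = 3/8.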

Variance for Random Variables:
1. Discrete Variance:

V(X) = E(X²) − E(X)²

(a) Calculate E(X) = µ = Σ_{i=1}^n xi f(xi) (which reduces to (1/n) Σ_{i=1}^n xi when all outcomes are equally likely)
(b) Calculate E(X²) = Σ_{i=1}^n xi² f(xi)
Example (verified numerically in the R sketch after the next item).

x:    1    2    3    4    5    6
f(x): 1/6  1/6  1/6  1/6  1/6  1/6

E(X) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5
E(X)² = (7/2)² = 49/4 = 12.25
E(X²) = (1² + 2² + 3² + 4² + 5² + 6²)/6 = 91/6 = 15.166
E(X²) − E(X)² = 91/6 − 49/4 = 105/36 = 2.9166 = Var(X)

Alternatively:

Var(X) = Σ_{i=1}^n (xi − µ)² f(xi)
       = Σ_{i=1}^6 (i − 3.5)² (1/6) = 35/12 = 2.9166

2. Continuous Variance:

σ² = Var(X) = ∫_{−∞}^{∞} (x − E(X))² f(x) dx
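A quick numeric check of the dice example above in R; the values and probabilities are taken directly from that table:

x  <- 1:6
fx <- rep(1/6, 6)

EX  <- sum(x * fx)        # 3.5
EX2 <- sum(x^2 * fx)      # 15.1667
EX2 - EX^2                # 2.9167, matches Var(X) = 35/12

# alternative form: sum of (x - mu)^2 * f(x)
sum((x - EX)^2 * fx)      # 2.9167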

Mixed Distributions
1. A random variable whose distribution is neither purely discrete nor
continuous. The probability distribution for a random variable X is of
mixed type if the CDF has the form:

F (x) = aFd (x) + (1 − a)Fc (x)

Where:

(a) Fd (x) is a discrete CDF


(b) Fc (x) is a continuous CDF
(c) 0 < a < 1

Moments
1. Moments: Special expected values. The moments of a function are
quantitative measures related to the shape of the function’s graph. The
moments for a probability distribution are:

(a) Moment 0: The total probability (1)
(b) Moment 1: The expected value E(X) = µ
(c) Moment 2 (central): The variance Var(X)
(d) Moment 3 (standardized): The skewness (a measure of asymmetry)
(e) Moment 4 (standardized): The kurtosis (a measure of heavy/light tailedness)

Moment Generating Function (MGF): If X is a random variable,


then the expected value:

MX (t) = E(etX )
∗The subscript X is not always used

is called the moment generating function of X if this expected value


exists for all values of t in some interval of the form:

−h < t < h

for some h > 0

2. Series Review:

(a) e^x = 1 + x + x²/2! + x³/3! + . . .
(b) e^{tx} = 1 + tx + t²x²/2! + t³x³/3! + . . .
(c) E(e^{tX}) = 1 + tE(X) + (t²/2!)E(X²) + (t³/3!)E(X³) + . . .

3. When we take the j-th derivative of a moment generating function and evaluate it at t = 0:

(d^j/dt^j) E(e^{tX}) |_{t=0} = E(X^j)

all terms except the E(X^j) term disappear.

4. To calculate the MGF for a random variable X:

(a) Σ_{all x} e^{tx} fX(x)   (Discrete)
(b) ∫_{−∞}^{∞} e^{tx} fX(x) dx   (Continuous)
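As an illustration (a standard textbook example, not one worked in the original notes): for a Bernoulli(p) random variable,

MX(t) = E(e^{tX}) = e^{t·0}(1 − p) + e^{t·1}p = 1 − p + p e^t,

so MX′(t) = p e^t and MX′(0) = p = E(X), while MX″(0) = p = E(X²), giving Var(X) = E(X²) − E(X)² = p − p² = p(1 − p), which matches the Bernoulli variance listed in Part III.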

Inequalities
1. Markov Inequality: For any c > 0 and non-negative function h(x):

P(h(X) ≥ c) ≤ E(h(X))/c

The probability that h(X) is at least a constant c > 0 is upper bounded by the expectation of h(X) divided by that constant c.
This implies that as we choose larger values of c, the upper bound approaches 0, and therefore so does the probability P(h(X) ≥ c).

2. Indicator Random Variable: A random variable with the following properties:

Ic(x) = 1 if h(x) ≥ c
Ic(x) = 0 if h(x) < c

3. Markov Inequality Proof:
If we have an indicator variable of the type above, we can calculate its expectation as:

E(Ic) = 0 · P(h(X) < c) + 1 · P(h(X) ≥ c) = P(h(X) ≥ c)

If h(x) ≥ c, then Ic = 1 and h(x)/c ≥ 1 (divide both sides by c), so h(x)/c ≥ Ic.
If h(x) < c, then Ic = 0 and h(x)/c ≥ 0 (since h(x) is non-negative), so h(x)/c ≥ Ic.
Therefore h(x)/c ≥ Ic in all cases. Since

0 ≤ Ic ≤ h(x)/c for any x,

it follows that:

0 ≤ E(Ic) ≤ E(h(X)/c) = E(h(X))/c

Therefore:

P(h(X) ≥ c) ≤ E(h(X))/c

Markov's inequality relates probabilities to expectations, and provides (frequently loose but still useful) bounds for the CDF of a random variable.

4. Chebyshev Inequality: If X is a random variable with a finite mean µ and finite variance σ², then for any k > 0:

P(|X − µ| ≥ kσ) ≤ 1/k²

Equivalently, the probability that the random variable X lies within ±k standard deviations of its mean is at least:

1 − 1/k²

This implies that for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Specifically, no more than 1/k² of the distribution's values can be more than k standard deviations away from the mean; equivalently, at least 1 − 1/k² of the distribution's values are within k standard deviations of the mean.
5. Chebyshev Inequality Proof:
We want to show:

P(|X − µ| ≥ kσ) ≤ 1/k²

Remember the Markov Inequality:

P(h(X) ≥ c) ≤ E(h(X))/c

Let:

h(x) = (x − µ)²
c = k²σ²

Then:

P(|X − µ| ≥ kσ) = P((X − µ)² ≥ k²σ²) ≤ E((X − µ)²)/(k²σ²) = σ²/(k²σ²) = 1/k² ∎
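A small simulation sketch in R showing how loose the bound can be; the exponential sample and the choice k = 2 are arbitrary illustrations:

set.seed(1)
x <- rexp(1e5, rate = 1)   # mean 1, standard deviation 1
k <- 2

# empirical probability of being at least k standard deviations from the mean
mean(abs(x - 1) >= k * 1)  # roughly 0.05

# Chebyshev upper bound
1 / k^2                    # 0.25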

Part III
Special Probability
Distributions
Discrete Probability Distributions
1. Discrete Uniform Distribution X ∼ DU (N )

(a) pdf:

f(x) = 1/N,   x = 1, 2, . . . , N

(b) Description: Many problems, especially those involving the classical assignment of probability, can be modelled by a discrete random variable that assumes all of its values with the same probability. Useful for modeling games of chance such as lotteries, tossing an unbiased die, or flipping a fair coin.
(c) Expectation:

E(X) = (N + 1)/2

(d) Variance:

Var(X) = (N² − 1)/12
2. Bernoulli Distribution

(a) pdf:

f(x) = p^x (1 − p)^{1−x},   x = 0, 1

(b) Description: A single trial of an experiment where there are only two possible outcomes: success (1) or failure (0).
(c) Expectation: p
(d) Variance: p(1 − p)

3. Binomial Distribution X ∼ BIN (n, p)

(a) pdf:

b(x; n, p) = (n choose x) p^x (1 − p)^{n−x},   x = 0, 1, . . . , n

(b) CDF:

B(x; n, p) = Σ_{k=0}^x b(k; n, p),   x = 0, 1, . . . , n

(c) Description: Used to model the number of successes in a sequence of n independent trials with equal success probability. The binomial corresponds to sampling with replacement, and can be viewed as the sum of n independent Bernoulli trials. (See the R sketch at the end of this list.)
(d) Expectation:

E(X) = np

(e) Variance:

Var(X) = np(1 − p)

(f) MGF:

MX(t) = (pe^t + q)^n,   where q = 1 − p

(g) The binomial distribution approaches a Poisson pdf as n → ∞, p → 0 with np = λ constant.
As a general rule, the approximation gives reasonable results provided:
i. n ≥ 100
ii. p ≤ 0.01
iii. x is close to np

4. Hypergeometric Distribution X ∼ HY P (n, M, N )

(a) pdf:

h(x; n, M, N) = [(M choose x)(N − M choose n − x)] / (N choose n)

where:
i. N: A finite number of items
ii. M: The subset of interest of the finite N items
iii. n: The number of items drawn
iv. x: The number of items drawn that belong to M
(b) CDF:

H(x; n, M, N) = Σ_{i=0}^x h(i; n, M, N)

(c) Description: Similar to the binomial distribution. In fact, as N → ∞, the hypergeometric distribution approaches the binomial distribution.
Samples without replacement.
(d) Expectation:

E(X) = nM/N

(e) Variance:

Var(X) = n (M/N)(1 − M/N) · (N − n)/(N − 1)

5. Geometric Distribution X ∼ GEO(p)

(a) pdf:

g(x; p) = p(1 − p)^{x−1},   x = 1, 2, 3, . . .

(b) CDF:

G(x; p) = Σ_{i=1}^x p(1 − p)^{i−1} = 1 − (1 − p)^x,   x = 1, 2, 3, . . .

(c) Description: Built from repeated Bernoulli trials; we use the geometric distribution if we wish to denote the number of trials required to obtain the first success.
(d) Expectation:

E(X) = 1/p

(e) Variance:

Var(X) = (1 − p)/p²

(f) MGF:

MX(t) = pe^t / (1 − (1 − p)e^t)

6. Negative Binomial Distribution X ∼ N B(r, p)

(a) pdf:

f(x; r, p) = (x − 1 choose r − 1) p^r (1 − p)^{x−r},   x = r, r + 1, . . .

(b) Description: Similar to a binomial distribution, except we let X denote the number of trials required to obtain r successes.
(c) Expectation:

E(X) = r/p

(d) Variance:

Var(X) = r(1 − p)/p²

(e) MGF:

MX(t) = [pe^t / (1 − (1 − p)e^t)]^r

7. Binomial vs Negative Binomial
Suppose X ∼ NB(r, p) and W ∼ BIN(n, p); then:

P[X ≤ n] = P[W ≥ r]

W ≥ r corresponds to the event of having r or more successes in n trials, and that means that n or fewer trials will be needed to obtain the first r successes.
The CDFs are related as:

F(x; r, p) = P[X ≤ x] = 1 − B(r − 1; x, p) = B(x − r; x, q)

8. Poisson Distribution X ∼ P OI(λ)

(a) pdf:

f(x; λ) = e^{−λ} λ^x / x!,   x = 0, 1, 2, . . .

(b) CDF:

F(x; λ) = Σ_{k=0}^x f(k; λ)

(c) Description: Expresses the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate (λ) and independently of the time since the last event.
(d) Expectation:

E(X) = λ

(e) Variance:

Var(X) = λ

(f) MGF:

MX(t) = e^{λ(e^t − 1)}

(g) Poisson Processes: If a collection of random points in some space forms a Poisson process, then the number of points in a region of finite size is a random variable with a Poisson distribution.
The Poisson process is often defined on the real line and is used to model random events such as the arrival of customers at a store, phone calls at an exchange, or the occurrence of earthquakes distributed in time.

X(t) ∼ POI(λt), where E[X(t)] = λt
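A few of these discrete distributions in R, as a sketch; the parameters (n = 10, p = 0.2, the rate of 2 events per unit time, the 3-unit interval) are illustrative choices, and dbinom/pbinom/dpois/ppois are the standard R density and CDF functions:

# binomial: P(X = 3) and P(X <= 3) for X ~ BIN(10, 0.2)
dbinom(3, size = 10, prob = 0.2)      # 0.2013
pbinom(3, size = 10, prob = 0.2)      # 0.8791

# Poisson approximation to the binomial when n is large and p is small
dbinom(2, size = 1000, prob = 0.001)  # approx. 0.1840
dpois(2, lambda = 1000 * 0.001)       # 0.1839

# Poisson process: counts over an interval of length t have mean lambda * t
lambda <- 2; t <- 3
dpois(4, lambda = lambda * t)         # P(exactly 4 events) = 0.1339
ppois(4, lambda = lambda * t)         # P(at most 4 events) = 0.2851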

Continuous Probability Distributions
1. Continuous Uniform Distribution X ∼ U N IF (a, b)

(a) CDF:

F(x) = 0 if x ≤ a
F(x) = (x − a)/(b − a) if a < x < b
F(x) = 1 if x ≥ b

(b) Description: This is the continuous counterpart of the discrete uniform distribution. It provides the probability model for selecting a point "at random" from an interval (a, b).
(c) Expectation:

E(X) = (a + b)/2

(d) Variance:

Var(X) = (b − a)²/12

2. Gamma Distribution X ∼ GAM (θ, κ)

(a) Function: The gamma function, for all κ > 0:

Γ(κ) = ∫_0^∞ t^{κ−1} e^{−t} dt

When κ = 1, the gamma distribution reduces to the exponential distribution.
(b) The gamma function satisfies the following properties:
i. Γ(κ) = (κ − 1)Γ(κ − 1),   κ > 1
ii. Γ(n) = (n − 1)!,   n = 1, 2, . . .
iii. Γ(1/2) = √π
(c) CDF:

F(x; θ, κ) = 1/(θ^κ Γ(κ)) ∫_0^x t^{κ−1} e^{−t/θ} dt

(d) Description: Widely used in engineering and science, the gamma distribution is useful for modelling continuous random variables that are always positive and have skewed distributions. The exponential and chi-squared distributions are special cases of the gamma distribution.
(e) Expectation:

E(X) = κθ

(f) Variance:

Var(X) = κθ²

(g) MGF:

MX(t) = (1/(1 − θt))^κ

(h) If X ∼ GAM(θ, n) where n is a positive integer, then the CDF can be written as:

F(x; θ, n) = 1 − Σ_{i=0}^{n−1} (x/θ)^i e^{−x/θ} / i!

3. Exponential Distribution X ∼ EXP (θ)

(a) pdf:

f(x; θ) = (1/θ) e^{−x/θ}

(b) CDF:

F(x; θ) = 1 − e^{−x/θ}

(c) Description: A special case of the gamma distribution, commonly used to model the time we need to wait until a given event occurs. It is the continuous counterpart of the geometric distribution.
(d) Expectation:

E(X) = θ

(e) Variance:

Var(X) = θ²

(f) MGF:

MX(t) = 1/(1 − θt)
4. Beta Distribution X ∼ BET A(a, b)

(a) pdf:

f(x; a, b) = [Γ(a + b)/(Γ(a)Γ(b))] x^{a−1}(1 − x)^{b−1},   0 < x < 1

(b) Description: A useful distribution for modelling proportions, parameterized by two positive shape parameters. In Bayesian inference, the beta distribution is related to the Bernoulli, binomial, negative binomial and geometric distributions.
(c) Expectation:

E(X) = a/(a + b)

(d) Variance:

Var(X) = ab / ((a + b)²(a + b + 1))

5. Normal Distribution X ∼ N (µ, σ 2 )

(a) pdf:

f(x; µ, σ) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}

(b) CDF: The CDF of Z ∼ N(0, 1) is given by:

Φ(z) = ∫_{−∞}^z (1/√(2π)) e^{−t²/2} dt,   −∞ < z < ∞

where z = (x − µ)/σ
(c) Description: The single most important distribution in statistics.
(d) Expectation:

E(X) = µ

(e) Variance:

Var(X) = σ²

(f) MGF:

MX(t) = e^{µt + σ²t²/2}

(g) Standard Scores: The number of standard deviations the value X is above or below the mean value.
If X ∼ N(µ, σ²), then:

Z = (X − µ)/σ ∼ N(0, 1)

The standard score (Z-score) can be used to look up percentiles (either with tables, or with pnorm(Z) in R).
Alternatively, we can do a reverse lookup to find the number of standard deviations associated with a percentile (either with tables, or with qnorm(percentage) in R). This is useful for calculating 100(1 − α)% confidence intervals.
i. pnorm: Enter a z-value (number of standard deviations), returns the cumulative probability.
ii. qnorm: Enter a cumulative probability, returns the z-value.
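For example, in R (standard pnorm/qnorm calls; the 95% level and the N(100, 15²) values are illustrative choices):

# probability that Z is below 1.96
pnorm(1.96)              # 0.975

# z-value with 97.5% of the distribution below it
qnorm(0.975)             # 1.96

# two-sided critical value for a 100(1 - alpha)% confidence interval
alpha <- 0.05
qnorm(1 - alpha / 2)     # 1.96

# P(X <= 110) for X ~ N(100, 15^2), via the z-score
pnorm((110 - 100) / 15)          # 0.7475
pnorm(110, mean = 100, sd = 15)  # same value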

Location and Scale Parameters
The mean is an example of a location parameter.
The standard deviation is an example of a scale parameter.
Let f0(z) represent a fully specified pdf.
Let F0(z) represent a fully specified CDF.
1. Location Parameters: A quantity η is a location parameter for the distribution of X if the CDF has the form:

F(x; η) = F0(x − η)

or when the pdf has the form:

f(x; η) = f0(x − η)

It is common for a location parameter to be a measure of central tendency of X, such as the mean or median.
For example, if X has a pdf of the form (double exponential):

f(x; η) = (1/2) e^{−|x−η|}

then the location parameter η is the mean of the distribution because f(x; η) is symmetric about η.
2. Scale Parameters: A positive quantity θ is a scale parameter for the distribution of X if the CDF has the form:

F(x; θ) = F0(x/θ)

or when the pdf has the form:

f(x; θ) = (1/θ) f0(x/θ)

For the exponential distribution X ∼ EXP(θ), θ is a scale parameter.
The standard deviation σ often turns out to be a scale parameter.
3. Location–Scale Parameters: Quantities η and θ > 0 are called location–scale parameters for the distribution of X if the CDF has the form:

F(x; θ, η) = F0((x − η)/θ)

or when the pdf has the form:

f(x; θ, η) = (1/θ) f0((x − η)/θ)

Part IV
Joint Distributions
When there is more than one random variable of interest, it is convenient to
regard these variables as components of a k-dimensional vector:

X = (X1 , X2 , . . . , Xk )

which is capable of assuming values:

x = (x1 , x2 , . . . , xk )

in a k-dimensional Euclidean space.

An observed value x may be the result of measuring k characteristics once


each, or the result of measuring one characteristic k times.
(i.e. k repeated trials of an experiment concerning a single variable)

Joint Discrete Distributions


1. Joint Probability Mass Function: the joint pdf of the k-dimensional
discrete random variable X = (X1 , X2 , . . . , Xk ) is defined to be:

f (x1 , x2 , . . . , xk ) = P [X1 = x1 , X2 = x2 , . . . , Xk = xk ]

for all possible values x = (x1 , x2 , . . . , xk ) of X


Where:

[X1 = x1 , X2 = x2 , . . . , Xk = xk ] = [X1 = x1 ]∩[X2 = x2 ]∩. . . ∩[Xk = xk ]

2. Joint pmf Requirements: A function f(x1, x2, . . . , xk) is a joint pmf for some vector valued random variable X = (X1, X2, . . . , Xk) if and only if the following properties are satisfied:

(a) f(x1, x2, . . . , xk) ≥ 0 for all possible values (x1, x2, . . . , xk)
(b) Σ_{x1} . . . Σ_{xk} f(x1, x2, . . . , xk) = 1

3. Marginal pmf's: If the pair (X1, X2) of discrete random variables has the joint pdf f(x1, x2), then the marginal pdf's of X1 and X2 are:

(a) f1(x1) = Σ_{x2} f(x1, x2)
(b) f2(x2) = Σ_{x1} f(x1, x2)

4. Joint CDF: The joint cumulative distribution function (CDF) of the k random variables X1, X2, . . . , Xk is defined by:

F(x1, . . . , xk) = P[X1 ≤ x1, . . . , Xk ≤ xk]

5. Multinomial Distribution: Suppose there are k + 1 mutually exclusive and exhaustive events; then the multinomial distribution has a joint pdf of the form:

f(x1, x2, . . . , xk) = n!/(x1! x2! . . . xk+1!) · p1^{x1} p2^{x2} . . . p_{k+1}^{x_{k+1}}

Denoted as: X ∼ MULT(n, p1, p2, . . . , pk)

Joint Continuous Distributions


1. Joint Probability Density Function: A k-dimensional vector-valued random variable X = (X1, X2, . . . , Xk) is said to be continuous if there is a function f(x1, x2, . . . , xk), called the joint probability density function of X, such that the joint CDF can be written as:

F(x1, . . . , xk) = ∫_{−∞}^{xk} . . . ∫_{−∞}^{x1} f(t1, . . . , tk) dt1 . . . dtk

for all x = (x1, . . . , xk)

2. Joint pdf Requirements: Any function f(x1, x2, . . . , xk) is a joint pdf of a k-dimensional random variable if and only if:

(a) f(x1, . . . , xk) ≥ 0 for all x1, . . . , xk
(b) ∫_{−∞}^{∞} . . . ∫_{−∞}^{∞} f(x1, . . . , xk) dx1 . . . dxk = 1

3. Marginal pdf's: If the pair (X1, X2) of continuous random variables has the joint pdf f(x1, x2), then the marginal pdf's of X1 and X2 are:

(a) f1(x1) = ∫_{−∞}^{∞} f(x1, x2) dx2
(b) f2(x2) = ∫_{−∞}^{∞} f(x1, x2) dx1

4. Marginal CDF: If X = (X1, X2, . . . , Xk) is a k-dimensional random variable with joint CDF F(x1, x2, . . . , xk), then the marginal CDF of Xj is:

Fj(xj) = lim_{xi→∞, all i≠j} F(x1, . . . , xj, . . . , xk)

Marginal Distributions
1. In the discrete case where a joint pmf is represented by a 2D table, we can find the marginal pmf by simply adding up the rows (or columns) while holding the other random variable constant. Notice that the resulting row/column totals sum to 1.

2. For a joint pmf f(x, y) of a discrete random variable (X, Y):

(a) The marginal pmf of X:

fX(x) = P(X = x) = Σ_{all y} f(x, y)

where x is fixed
(b) The marginal pmf of Y:

fY(y) = P(Y = y) = Σ_{all x} f(x, y)

where y is fixed

3. If the random variables (X, Y) are continuous with joint pdf f(x, y), then:

(a) The marginal pdf of X:

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

We integrate out y
(b) The marginal pdf of Y:

fY(y) = ∫_{−∞}^{∞} f(x, y) dx

We integrate out x

4. In general, knowledge of marginal pdf/pmf’s are not sufficient to de-


termine the joint pdf/pmf. When the joint pdf/pmf is the product of
the marginal pdf/pmf’s, the two random variables X and Y are inde-
pendent.

5. In general, for higher dimensions:

(a) pmf:

fj(xj) = Σ_{x1} Σ_{x2} . . . Σ_{x_{j−1}} Σ_{x_{j+1}} . . . Σ_{xk} f(x1, x2, . . . , xj, . . . , xk)

where xj is fixed.
(b) pdf:

fj(xj) = ∫ . . . ∫ f(x1, x2, . . . , xj, . . . , xk) dx1 . . . dx_{j−1} dx_{j+1} . . . dxk

where xj is fixed, and there are k − 1 integrals.
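A minimal R sketch of computing marginal pmf's from a joint pmf table; the 2×3 table of probabilities below is a made-up illustration:

# joint pmf of (X, Y) as a matrix: rows = values of X, columns = values of Y
joint <- matrix(c(0.10, 0.20, 0.10,
                  0.25, 0.15, 0.20),
                nrow = 2, byrow = TRUE)

sum(joint)       # 1, so this is a valid joint pmf

# marginal pmf of X: sum across each row (sum out y)
rowSums(joint)   # 0.40 0.60

# marginal pmf of Y: sum down each column (sum out x)
colSums(joint)   # 0.35 0.35 0.30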

Independent Random Variables
1. It is natural to extend the concept of independence of events to the
independence of random variables.
Remember that two events are independent iff :

P (A ∩ B) = P (A)P (B)

For random variables, this is denoted as:

f(x1, x2) = f1(x1) f2(x2)

If we represent the discrete joint pmf as a table of columns and rows, we can check for independence by seeing whether multiplying the row total and the column total of interest equals their intersecting entry. E.g. for f(1, 2): multiply the total of the first row and the total of the second column to see if it equals the entry found at (1, 2) of the table. If this holds for every entry, the random variables are independent. If not, we have found dependence between the random variables.

2. The same rule of independence holds true for continuous random variables:

∫_c^d ∫_a^b f1(x1) f2(x2) dx1 dx2 = ∫_a^b f1(x1) dx1 · ∫_c^d f2(x2) dx2

3. In general, Independent Random Variables X1, . . . , Xk are said to be independent if for every ai < bi:

P[a1 ≤ X1 ≤ b1, . . . , ak ≤ Xk ≤ bk] = ∏_{i=1}^k P[ai ≤ Xi ≤ bi]

If the above formula does not hold for all ai ≤ bi, the random variables are called dependent.

4. Random variables X1 , . . . , Xk are independent if and only if the follow-


ing properties hold:

(a) F (x1 , . . . , xk ) = F1 (x1 ). . . Fk (xk )


(b) f (x1 , . . . , xk ) = f1 (x1 ). . . fk (xk )
Where Fi (xi ) and fi (xi ) are the marginal CDF and pdf of Xi ,
respectively.

If we multiply together two marginal pdf's, the random variables are independent if and only if the product equals the joint pdf:

marginal1 × marginal2 = joint ⇐⇒ independence

Conditional Distributions

P[T = t | X = x] = P[X = x, T = t] / P[X = x] = fX,T(x, t) / fX(x)

Just like independence, conditional probability can also be extended to random variables.

1. Conditional pmf/pdf: If X1 and X2 are discrete or continuous random variables with joint pdf f(x1, x2), then the conditional probability density function (conditional pdf) of X2 given X1 = x1 is defined to be:

f(x2 | x1) = f(x1, x2) / f1(x1)

for all values x1 such that f1(x1) > 0, and zero otherwise. Similarly, the conditional pdf of X1 given X2 = x2 is:

f(x1 | x2) = f(x1, x2) / f2(x2)

We can calculate the conditional pmf/pdf by dividing the joint pdf by the marginal pdf:

conditional = joint / marginal
2. For discrete (X, Y), we have:

f(y | x) = f(x, y) / fX(x) = f(x, y) / Σ_y f(x, y)

where x is fixed.

f(x | y) = f(x, y) / fY(y) = f(x, y) / Σ_x f(x, y)

where y is fixed.

3. For continuous (X, Y) with a joint pdf f(x, y), the conditional distributions are:

f(y | x) = f(x, y) / fX(x) = f(x, y) / ∫_{−∞}^{∞} f(x, y) dy

where we integrate out y.

f(x | y) = f(x, y) / fY(y) = f(x, y) / ∫_{−∞}^{∞} f(x, y) dx

where we integrate out x.


4. In general, for a random vector (X1, X2, . . . , Xk) with joint pdf f(x1, . . . , xk), let

fM1(x2, x3, . . . , xk) = ∫_{−∞}^{∞} f(x1, x2, . . . , xk) dx1

be the joint marginal pmf/pdf of (X2, X3, . . . , Xk); then the conditional pdf of X1 given X2 = x2, X3 = x3, . . . , Xk = xk is:

f(x1 | x2, x3, . . . , xk) = f(x1, x2, . . . , xk) / fM1(x2, x3, . . . , xk)
= f(x1, x2, . . . , xk) / ∫_{−∞}^{∞} f(x1, x2, . . . , xk) dx1

5. The joint conditional distribution of, for example, X1, X2 given X3 = x3, X4 = x4, . . . , Xk = xk is:

f(x1, x2 | x3, . . . , xk) = f(x1, x2, . . . , xk) / fM1,2(x3, . . . , xk)
= f(x1, x2, . . . , xk) / ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2, . . . , xk) dx1 dx2

Independence and Conditional Distributions

6. If X and Y are independent, then the conditional pdf/pmf of X given Y = y is:

f(x | y) = f(x, y) / fY(y) = fX(x) fY(y) / fY(y) = fX(x)

And the conditional pdf/pmf of Y given X = x is:

f(y | x) = f(x, y) / fX(x) = fX(x) fY(y) / fX(x) = fY(y)

Part V
Properties of Random
Variables
1. Theorem 1: Suppose X = (X1, X2, . . . , Xk) has joint pdf/pmf f(x1, x2, . . . , xk), and u(x) = u(x1, x2, . . . , xk) is a real valued function (R^k → R); then:
Continuous:

E(u(X)) = ∫_{−∞}^{∞} . . . ∫_{−∞}^{∞} u(x1, . . . , xk) f(x1, . . . , xk) dx1 . . . dxk

Discrete:

E(u(X)) = Σ_{x1} . . . Σ_{xk} u(x1, . . . , xk) f(x1, . . . , xk)

2. Theorem 2: If (X1, X2) has pdf/pmf f(x1, x2), then:

E(aX1 + bX2) = aE(X1) + bE(X2)

Corollary:

E(Σ_{i=1}^k ai Xi) = Σ_{i=1}^k ai E(Xi)

3. Theorem 3: If X and Y are independent and h(x), g(y) are functions of X and Y, then:

E(h(X)g(Y)) = E(h(X)) · E(g(Y))

4. Theorem 4:

V(X + Y) = V(X) + V(Y) + 2 Cov(X, Y)

Corollary: If X and Y are independent, then:

E(XY) = E(X) · E(Y), so Cov(X, Y) = 0 and V(X + Y) = V(X) + V(Y)

5. Covariance: A measure of the joint variability of two random variables. The covariance of X and Y is:

Cov(X, Y) = E((X − µX)(Y − µY)) = E(XY) − E(X)E(Y) = σXY

6. Correlation Coefficient: A measure that expresses the extent to which two random variables are linearly related. The correlation coefficient of X and Y is:

ρXY = Cov(X, Y) / (σX σY)

Important: If X and Y are independent, then ρXY = 0. But ρXY = 0 does not imply independence.

7. Conditional Expectation: The mean of X given Y = y is:
Continuous:

E(X | y) = ∫_{−∞}^{∞} x f(x | y) dx

Discrete:

E(X | y) = Σ_x x f(x | y)

We can also compute E(X) as:

E(X) = E(E(X | Y)) = E_Y[E_X(X | Y)]

8. Conditional Variance: The variance of X given Y = y is:

V(X | y) = E(X² | y) − [E(X | y)]²

We can also compute Var(X) via the law of total variance:

V(X) = E_Y[V_X(X | Y)] + V_Y(E_X(X | Y))

9. Theorem:

(a) E[g(X)h(Y) | X] = g(X) E[h(Y) | X]
(b) E[h(X, Y)] = E_X[E(h(X, Y) | X)]

We can potentially reduce variance by conditioning, since:

V(Y) ≥ V(E(Y | X))

10. Joint Moment Generating Functions: The joint MGF of X = (X1, . . . , Xk) is:

MX(t) = E[exp(Σ_{i=1}^k ti Xi)]

If X1, X2, . . . , Xk are independent, then:

MX(t) = MX1(t1) MX2(t2) . . . MXk(tk)

If Y = X1 + X2 + . . . + Xk and X1, . . . , Xk are iid copies of X, then:

MY(t) = E(e^{tY}) = E(e^{t(X1 + X2 + . . . + Xk)}) = [MX(t)]^k
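For example (a standard consequence, stated here for illustration and consistent with the MGFs listed in Part III): if X1, . . . , Xk are iid POI(λ), then

MY(t) = [e^{λ(e^t − 1)}]^k = e^{kλ(e^t − 1)},

which is the MGF of a POI(kλ) random variable, so a sum of independent Poisson random variables is again Poisson.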

Part VI
Functions of Random Variables
Let X be a random variable with pdf f(x) and Y = u(X).
What is the pdf of Y?

1. CDF Method:

FY(y) = P(Y ≤ y) = P(u(X) ≤ y) = ∫_{u(x)≤y} f(x) dx

We can then find the pdf by taking the derivative of the CDF:

fY(y) = (d/dy) FY(y)

2. Transformation Method: If Y = u(X) is a one-to-one transformation, then:
Continuous:

fY(y) = fX(u^{−1}(y)) |(d/dy) u^{−1}(y)| = fX(x) |dx/dy|

Discrete:

fY(y) = fX(u^{−1}(y))

3. Order Statistics: Suppose X1, . . . , Xn are iid with pdf f(x) and CDF F(x). The joint pdf is f(x1, x2, . . . , xn) = f(x1)f(x2) . . . f(xn). If we consider the order statistics of the sample X1, . . . , Xn:
(a) Yn = the largest of the Xi's = max(X1, . . . , Xn)   (Xn:n)
(b) Yn−1 = the second largest of the Xi's   (Xn−1:n)
(c) Y1 = the smallest of the Xi's   (X1:n)
The transformation (X1, . . . , Xn) → (Y1, . . . , Yn) is not one-to-one. It is n!-to-1 (since there are n! permutations that all correspond to one correct ordering).
Therefore, the joint pdf of Y1, Y2, . . . , Yn is:

f(y1, y2, . . . , yn) = n! fX(y1, y2, . . . , yn) = n! f(y1)f(y2) . . . f(yn)

The marginal pdf of Yk is:

fk(yk) = n!/((k − 1)! 1! (n − k)!) · F(yk)^{k−1} f(yk) [1 − F(yk)]^{n−k}

Special Cases:

(a) CDF of Y1 (the minimum):

FY1(y1) = 1 − (1 − F(y1))^n

(b) CDF of Yn (the maximum):

FYn(yn) = (F(yn))^n
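A quick R sketch checking the CDF of the maximum, FYn(y) = F(y)^n; the uniform sample, n = 5, and the cutoff 0.8 are illustrative choices:

set.seed(1)
n    <- 5
sims <- replicate(1e5, max(runif(n)))  # simulated values of Yn for UNIF(0, 1) samples

# empirical P(Yn <= 0.8) vs the theoretical CDF F(0.8)^n = 0.8^n
mean(sims <= 0.8)   # approx. 0.328
0.8^n               # 0.32768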
