
1 Math Fundamentals

1.1 Integrals, factors and techniques


∑_{n=0}^∞ r^n = 1/(1 − r)  (r² < 1)        ∑_{n=0}^∞ x^n/n! = e^x        (a + b + c)² = a² + b² + c² + 2ab + 2ac + 2bc

∫_a^b u dv = uv|_a^b − ∫_a^b v du;  a special case:  ∫_a^b x e^{−λx} dx = [ −(x/λ) e^{−λx} − (1/λ²) e^{−λx} ]_a^b

∫_0^∞ x^n e^{−ax} dx = n!/a^{n+1}  →  ∫_0^∞ x e^{−ax} dx = 1/a²  and  ∫_0^∞ x² e^{−ax} dx = 2/a³

Γ(n) = ∫_0^∞ x^{n−1} e^{−x} dx        Γ(1) = 1        Γ(n) = (n − 1)!        Γ(n + 1/2) = (√π / 2^n) [1 · 3 · 5 · · · (2n − 1)]

Γ(1/2) = √π        Γ(3/2) = √π / 2
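A quick numerical sanity check of the factorial integral (a sketch in Python, assuming SciPy is available; the values n = 3 and a = 2 are arbitrary):

    from math import factorial, exp
    from scipy.integrate import quad

    a, n = 2.0, 3
    # numerically integrate x^n e^(-ax) over [0, infinity)
    numeric, _ = quad(lambda x: x**n * exp(-a*x), 0, float('inf'))
    print(numeric, factorial(n) / a**(n + 1))   # both approximately 0.375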

1.2 Probability relations


If A ⊆ B, then Pr(A) ≤ Pr(B).
Pr(A) = Pr(A ∩ B) + Pr(A ∩ B′)
(∪ Ai)′ = ∩ Ai′        (∩ Ai)′ = ∪ Ai′

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)

Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) − Pr(A ∩ B) − Pr(A ∩ C) − Pr(B ∩ C) + Pr(A ∩ B ∩ C)

Pr(A ∩ B) = Pr(B|A) Pr(A)  ⇔  Pr(B|A) = Pr(A ∩ B) / Pr(A).

Pr(B) = Pr(B|A) Pr(A) + Pr(B|A′) Pr(A′)        (Law of Total Probability)

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)        (Bayes’; note the “flip-flop” of Pr(A|B) & Pr(B|A))

Pr(Aj|B) = Pr(Aj ∩ B) / Pr(B) = Pr(B|Aj) Pr(Aj) / ∑_{i=1}^n Pr(B|Ai) Pr(Ai)        (Generalized Bayes’; the Ai’s form a partition)

A, B are independent iff Pr(A ∩ B) = Pr(A) Pr(B) and Pr(A|B) = Pr(A).

Pr(A′|B) = 1 − Pr(A|B)
Pr((A ∪ B)|C) = Pr(A|C) + Pr(B|C) − Pr((A ∩ B)|C).
If A, B are independent, then (A′, B), (A, B′), and (A′, B′) are also independent (each pair).
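A small numerical illustration of the law of total probability and Bayes’ theorem (a sketch; all probabilities below are hypothetical numbers chosen for the example):

    # Hypothetical screening test: A = has condition, B = positive test
    p_A = 0.01
    p_B_given_A = 0.95
    p_B_given_notA = 0.10
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)   # law of total probability
    p_A_given_B = p_B_given_A * p_A / p_B                   # Bayes' theorem
    print(round(p_A_given_B, 4))                            # 0.0876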

1.3 Counting
 
There are C(n, r) = n!/(r!(n − r)!) ways to choose r objects from a collection of n items, denoted nCr. If order
is important, then there are nPr = r! · nCr permutations of those objects.
There are C(n; n1, n2, . . . , nm) = n!/(n1! n2! · · · nm!) ways to choose n1 objects of Type 1, n2 objects of Type 2,
etc. Note that ∑_{i=1}^m ni = n.

Binomial Theorem: When expanding (1 + t)^N, the coefficient of t^k is C(N, k), so that (1 + t)^N = ∑_{k=0}^∞ C(N, k) t^k.
Multinomial Theorem: When expanding (t1 + t2 + · · · + ts)^N where N is a positive integer, the
coefficient of t1^{k1} t2^{k2} · · · ts^{ks} (where ∑ ki = N) is C(N; k1, k2, . . . , ks) = N!/(k1! k2! · · · ks!).
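Both coefficients are easy to verify with Python’s standard library (a sketch; the exponents are arbitrary):

    from math import comb, factorial

    # coefficient of t^2 in (1 + t)^5
    print(comb(5, 2))                                                     # 10
    # coefficient of t1^2 t2^1 t3^1 in (t1 + t2 + t3)^4
    print(factorial(4) // (factorial(2) * factorial(1) * factorial(1)))   # 12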

2 Probability Distributions
2.1 Essential definitions
I only list examples for continuous variables. Note that for a discrete distribution, integration is replaced
by summation: ∫_a^b dx → ∑_{a<x<b}. If X is a random variable, then a function of it, e.g. g(X), is also a
random variable.

Cumulative Distribution Function:  FX(x) = Pr(X ≤ x) = ∫_{−∞}^x fX(t) dt

Probability (density) Function:  fX(x) = Pr(X = x) (discrete);  fX(x) = F′X(x) (continuous)

If X is continuous, then Pr(a < X < b) = ∫_a^b fX(x) dx = F(b) − F(a)

Survival Function:  sX(x) = Pr(X > x) = 1 − FX(x)

Hazard Rate (failure rate):  h(x) = λ(x) = fX(x)/sX(x) = −(d/dx) ln[1 − FX(x)]

2.1.1 Expectation values and other parameters


In general, E(g(X)) = ∫_{−∞}^∞ g(x) fX(x) dx. In particular, E(X) = ∫_{−∞}^∞ x fX(x) dx and E(X²) = ∫_{−∞}^∞ x² fX(x) dx.
There are a couple of special cases when X > 0:

• If X is continuous, E(X) = ∫_0^∞ sX(x) dx
• If X is discrete, E(X) = ∑_{n=1}^∞ Pr(X ≥ n) = ∑_{n=0}^∞ Pr(X > n) = ∑_n sX(n) = sX(0) + sX(1) + . . .
• If X ≥ a almost surely, then E(X) = a + ∫_a^∞ sX(x) dx
• If a ≤ X ≤ b, then E(X) = a + ∫_a^b sX(x) dx

If Pr(X ≥ 0) = 1, then we can write

E[min(X, a)] = ∫_0^a sX(x) dx        and        E[max(X, a)] = a + ∫_a^∞ sX(x) dx.
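For instance, with an exponential loss these survival-function formulas can be checked numerically (a sketch assuming SciPy; λ = 0.5 and a = 3 are arbitrary):

    from math import exp
    from scipy.integrate import quad

    lam, a = 0.5, 3.0
    s = lambda x: exp(-lam * x)                  # survival function of Exp(lambda)
    EX, _ = quad(s, 0, float('inf'))             # E(X) = 1/lambda = 2
    Emin, _ = quad(s, 0, a)                      # E(min(X, a)) = 2(1 - e^{-1.5})
    Emax = a + quad(s, a, float('inf'))[0]       # E(max(X, a)) = 3 + 2e^{-1.5}
    print(EX, Emin, Emax)                        # 2.0, about 1.554, about 3.446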

The variance of X is Var(X) = σX² = E(X²) − [E(X)]². Since the variance is not a linear operator, it
must often be calculated manually after obtaining the first two moments. This is particularly important
for mixed distributions! Note that

E(aX + bY + c) = aE(X) + bE(Y ) + c,

and

Var(aX + bY + c) = a² Var(X) + b² Var(Y)        (X, Y independent)


The standard deviation of X is σX = √Var(X) and the coefficient of variation is σX / µX.

2.1.2 Moment generating function and friends
If it exists, the moment generating function (MGF) is defined as
MX(t) = E(e^{Xt}),        (dⁿ/dtⁿ) MX(t)|_{t=0} = E(Xⁿ),        and for a discrete X:  MX(t) = ∑_{i=1}^n pi e^{xi t}

The last expression sums e^{xi t} weighted by the probability of each value xi. The MGF of the sum of
independent random variables is the product of their MGFs.
Several properties of the MGF are worth remembering:

MX(0) = 1        MaX(t) = MX(at)

MX+Y (t) = MX (t)MY (t) if X and Y are independent


The cumulant generating function is given by ΨX(t) = ln MX(t). Its derivatives at zero give the
cumulants; in particular Ψ′X(0) = E(X) and Ψ″X(0) = Var(X).
The probability generating function is given by PX(t) = E(t^X), provided this expectation exists.
If 0 < p < 1, then the 100p−th percentile of the distribution of X is the number xp which satisfies both
of the following inequalities:

Pr(X ≤ xp ) ≥ p and Pr(X ≥ xp ) ≥ 1 − p.

For a continuous distribution, Pr(X ≤ xp) = p. The 50th percentile (p = 0.50) is called the median.
For discrete distributions, the median and percentiles need not be unique!
The mode of a distribution is where fX (x) is maximized.
If a random variable X has mean µ and standard deviation σ, we can create a new, “standardized”
random variable Z = (X − µ)/σ. Then the skewness of X is defined as γ = E(Z³) and its kurtosis is
given by κ = E(Z⁴). A positive skewness indicates a long tail to the right. A large kurtosis indicates
that the variance is influenced more by a few extreme outliers rather than several small deviations.
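As a concrete example of these standardized-moment definitions, the skewness and kurtosis of an exponential distribution work out to 2 and 9. A symbolic sketch, assuming SymPy is available (λ is kept symbolic):

    import sympy as sp

    x, lam = sp.symbols('x lam', positive=True)
    f = lam * sp.exp(-lam * x)                          # Exp(lambda) density
    E = lambda g: sp.integrate(g * f, (x, 0, sp.oo))    # expectation operator
    mu = E(x)                                           # 1/lam
    sigma = sp.sqrt(E(x**2) - mu**2)                    # 1/lam
    Z = (x - mu) / sigma
    print(sp.simplify(E(Z**3)), sp.simplify(E(Z**4)))   # skewness 2, kurtosis 9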

2.1.3 Important inequalities


Jensen’s: Pick h such that h″(x) exists. Then

if h″(x) ≥ 0, then E[h(X)] ≥ h(E(X));
if h″(x) ≤ 0, then E[h(X)] ≤ h(E(X)).

Markov:  Pr(X ≥ a) ≤ E(X)/a        (a > 0 and X nonnegative)

Chebyshev:  Pr(|X − µ| ≥ κ) ≤ σ²/κ².  Equivalently, Pr(|Z| ≥ r) ≤ 1/r² where Z = (X − µ)/σ.
You must remember the Chebyshev Inequality!
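A quick comparison of the Markov and Chebyshev bounds against the exact tail probability of an Exp(1) variable (a sketch; the cutoff a = 3 is arbitrary):

    from math import exp

    mu, var, a = 1.0, 1.0, 3.0             # Exp(1): mean 1, variance 1
    exact = exp(-a)                        # Pr(X >= a) = e^(-a)
    markov = mu / a                        # Markov bound on Pr(X >= a)
    chebyshev = var / (a - mu) ** 2        # Chebyshev bound, since Pr(X >= a) <= Pr(|X - mu| >= a - mu)
    print(exact, markov, chebyshev)        # about 0.050, 0.333, 0.250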

2.1.4 Transformation of a random variable
Given a random variable X with known functions fX and FX, let Y = Φ(X) be a function of X. We want
to find fY and FY. Note that

Y = Φ(X) = Y(X)
X = Φ⁻¹(Y) = X(Y)

The following is useful. For X continuous (with Φ monotone):

FY(y) = FX(x(y)) if y′ > 0,  and  FY(y) = sX(x(y)) if y′ < 0;        fY(y) = fX(x(y)) |dx/dy|

For X discrete:  fY(y) = ∑_{x ∈ Φ⁻¹({y})} fX(x)
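For example, if X is uniform on (0, 1) and Y = −ln(X)/λ (a decreasing transformation), then x(y) = e^{−λy}, |dx/dy| = λe^{−λy}, and the formula gives fY(y) = fX(x(y)) |dx/dy| = λe^{−λy}, so Y is exponential. A Monte Carlo sketch of this fact (assuming NumPy; λ = 2 is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 2.0
    x = rng.uniform(size=1_000_000)
    y = -np.log(x) / lam                   # transform U(0,1) samples into Exp(lam) samples
    print(y.mean(), y.var())               # approximately 1/lam = 0.5 and 1/lam^2 = 0.25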

2.2 Commonly Used Distributions


The most commonly used distributions for the purposes of this exam are summarized in tables 1, 2,
and 3. The Binomial, Negative Binomial, and Poisson distributions all obey the following recursion
relations:

 
fX(n) = (a + b/n) fX(n − 1)

Distribution              a          b
Binomial                 −p/q       (n + 1)p/q
Negative Binomial         q         (r − 1)q
Poisson                   0          λ
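A quick check of the recursion for a Poisson distribution (a sketch; λ = 3 is arbitrary):

    from math import exp, factorial

    lam = 3.0
    pois = lambda n: exp(-lam) * lam**n / factorial(n)
    a, b = 0.0, lam
    for n in range(1, 6):
        # the recursion (a + b/n) * f(n-1) should reproduce f(n)
        print(n, pois(n), (a + b / n) * pois(n - 1))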

3 Multivariate Distributions
These are almost always best started by drawing a graph of the region where fX,Y > 0. This is very
useful for identifying the proper limits of integration or determining the ratio of areas.

3.1 Joint and marginal distributions


We now concern ourselves with the case where we have two random variables, call them X and Y, and wish
to know features of their joint probability. That is, we want to study the probability density function

fX,Y (x, y) = Pr(X = x ∩ Y = y) = Pr(X = x, Y = y)

with cumulative probability

FX,Y (x, y) = Pr(X ≤ x ∩ Y ≤ y) = Pr(X ≤ x, Y ≤ y)

and the two are related as before:


FX,Y(x, y) = ∫_{−∞}^x ∫_{−∞}^y fX,Y(s, t) dt ds

Expectation values are as before: E[h(X, Y)] = ∫_{−∞}^∞ ∫_{−∞}^∞ h(x, y) fX,Y(x, y) dx dy

The pdf can be found from fX,Y(x, y) = ∂²FX,Y(x, y)/∂x∂y.
As in the single variable case, for discrete variables, replace ∫ → ∑ and ∫∫ → ∑∑.
If one plots the probability distribution as a function of X and Y, then it may be interesting to note how
X behaves for a fixed value of Y, or vice-versa. Holding X fixed, we can sum (or integrate) fX,Y over all the
allowed y for that X, and record it next to the graph, in the margin. We define the marginal distribution of
X as

fX(x) = ∫_{−∞}^∞ fX,Y(x, y) dy        and the marginal dist. of Y:        fY(y) = ∫_{−∞}^∞ fX,Y(x, y) dx

The marginal CDFs are given by

FX(x) = lim_{y→+∞} FX,Y(x, y)        FY(y) = lim_{x→+∞} FX,Y(x, y)

If the random variables are independent, then the joint probability can be factored:

fX,Y (x, y) = fX (x)fY (y) FX,Y (x, y) = FX (x)FY (y)

The expectation values can be factored as well:

E(XY) = E(X)E(Y)        E(X²Y²) = E(X²)E(Y²)

If X and Y are independent, the region where fX,Y > 0 is a rectangle with sides parallel to the axes.
The conditional distribution of Y given X has a direct analog to basic conditional probability. Recall
that
Pr(B|A) = Pr(A ∩ B)/Pr(A)        so that        fY|X=x(y|X = x) = fY(y|x) = fX,Y(x, y)/fX(x)
The expectation value is found in the usual way: E(Y|X = x) = ∫_{−∞}^∞ y fY(y|x) dy. There are two
important results:

E(Y ) = E[E(Y |X)] Var(Y ) = E[Var(Y |X)] + Var[E(Y |X)]

3.2 Covariance
The covariance and correlation coefficient are given by
σ²XY = Cov(X, Y) = E(XY) − E(X)E(Y)

ρ(X, Y) = Cov(X, Y)/(√(Var X) √(Var Y)) = σ²XY/(σX σY)

Note that ρ(X, Y) = Cov(X, Y) = 0 if X and Y are independent.
Covariance is a linear operator:

Cov(aX1 + bX2 + c, Y ) = a Cov(X1 , Y ) + b Cov(X2 , Y )

and we can generalize the variance of a multivariate distribution to

Var(aX + bY ) = a2 Var(X) + b2 Var(Y ) + 2ab Cov(X, Y )

Let X1, X2, . . . , Xn be a random sample from a distribution with variance σ²; then the above rule
(with a = b = 1 and all covariances zero) extends to Var(X1 + X2 + · · · + Xn) = nσ².
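These covariance rules are easy to confirm by simulation (a sketch assuming NumPy; the coefficients and the dependence between X and Y are arbitrary choices for the example):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=1_000_000)
    y = 0.5 * x + rng.normal(size=1_000_000)          # correlated with x by construction
    a, b = 2.0, 3.0
    lhs = np.var(a * x + b * y)
    rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
    print(lhs, rhs)                                   # nearly equal (about 21.25)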

3.3 Moment generating functions and transformations of a joint distribution
Similar to the single-variable case,
MX,Y(s, t) = E(e^{sX+tY})
The relations are explicitly MX (s) = MX,Y (s, 0) and MY (t) = MX,Y (0, t). If X and Y are independent,
then MX,Y (s, t) = MX (s) · MY (t). Expectation values of the moments can be found from the relation
[∂^{m+n}/(∂s^m ∂t^n)] MX,Y(s, t) |_{s=t=0} = E(X^m Y^n)

3.4 Transformations and Convolution


Let (U, V ) = Φ(X, Y ) be a differentiable function of two random variables such that X and Y have a
known joint probability function. Then the joint pdf of U and V is given by
 
fU,V(u, v) = fX,Y(x(u, v), y(u, v)) · |∂(x, y)/∂(u, v)|        where        ∂(x, y)/∂(u, v) = det( ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v )
Convolution is especially pertinent for the sum of two random variables. It is the weighting of one
variable by the other when the two are constrained by a condition such as a sum. The following table
summarizes fX+Y (k) = Pr(X + Y = k) for discrete and continuous random variables for independent
and dependent X, Y pairs. Note that since X + Y = k, we substitute Y = k − X.

                         Discrete                               Continuous
General case             ∑_{x=0}^k fX,Y(x, k − x)               ∫_{−∞}^∞ fX,Y(x, k − x) dx
X, Y independent         ∑_{x=0}^k fX(x) fY(k − x)              ∫_{−∞}^∞ fX(x) fY(k − x) dx
If we have a collection of several random variables, Xi, with different weights, αi, such that ∑ αi = 1,
we can construct a new mixed distribution fX(x) = α1 fX1(x) + . . . + αn fXn(x). Then the various
moments are weighted averages of the individual moments. For example, to find the variance, you must
first find the first two moments for each Xi, then weight them to get the moments of the mixed distribution,
then finally compute E(X²) − [E(X)]².
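A worked example of this procedure (a sketch; the weights and component distributions are arbitrary). Mix an exponential with mean 2 and one with mean 10, with weights 0.7 and 0.3:

    weights = [0.7, 0.3]
    means   = [2.0, 10.0]
    second  = [2 * m**2 for m in means]                      # E(X^2) = 2*mean^2 for an exponential
    EX  = sum(w * m for w, m in zip(weights, means))         # 4.4
    EX2 = sum(w * s for w, s in zip(weights, second))        # 65.6
    print(EX2 - EX**2)                                       # mixture variance = 46.24

Note that 46.24 is larger than the weighted average of the component variances (0.7·4 + 0.3·100 = 32.8); weighting the variances directly is the classic mistake.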

3.5 Central Limit Theorem


For a sufficiently large sample size, e.g. n ≥ 30, the sum (or average) of independent, identically distributed
random variables can be approximated by a normal distribution with the same mean and variance as that
sum (or average). This means that several of our earlier distributions become approximately normal:
b(n, p)            →  N(np, npq)
NEGBIN(r)          →  N(rq/p, rq/p²)
Poisson            →  N(λ, λ)
Γ                  →  N(α/β, α/β²)
Sample Avg. (X̄)    →  N(µX, σ²X/n)

In principle, one must be careful about approximating a discrete distribution with a continuous one.
We therefore have the continuity correction: Let X be the original discrete distribution and Y the
continuous approximation. Then, Pr(X ≤ n) → Pr(X ≤ n + 1/2) ≈ Pr(Y ≤ n + 1/2). Also, Pr(X < n) ≈
Pr(Y < n − 1/2) and Pr(X ≥ n) ≈ Pr(Y > n − 1/2). In practice, though, the difference is small and not
likely to be a factor in choosing between multiple choice options on the exam.
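For example, approximating a Binomial(100, 0.5) probability with and without the continuity correction (a sketch assuming SciPy; the parameters are arbitrary):

    from scipy.stats import binom, norm

    n, p = 100, 0.5
    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5
    exact = binom.cdf(55, n, p)                      # Pr(X <= 55) exactly
    corrected = norm.cdf(55.5, mu, sigma)            # with continuity correction
    uncorrected = norm.cdf(55, mu, sigma)            # without
    print(exact, corrected, uncorrected)             # about 0.8644, 0.8643, 0.8413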

3.6 Order Statistics


Let X1 through Xn be a collection of independent, identically distributed random variables. Let Y1 be the smallest of the Xi,
and Yn be the largest. The collection of Yi has the same mean and variance as the collection of Xi ,
but the Y are now dependent. There are two interesting cases: when the smallest Y is larger than a
particular value, and when the largest Y is smaller than some limit.

Pr(Y1 > y) = sY1(y) = [sX(y)]^n

Pr(Yn < y) = FYn(y) = [FX(y)]^n

If the Xi come from a continuous distribution, then the ordered pdf is

fY1 ,Y2 ,...,Yn (y1 , . . . , yn ) = n!fX (y1 )fX (y2 ) · · · fX (yn )

and the pdf of the k-th order statistic is

fYk(y) = n!/[(k − 1)!(n − k)!] · [FX(y)]^{k−1} [1 − FX(y)]^{n−k} fX(y)
where the bracketed terms represent the probability of k − 1 samples being less than y, the probability
of n − k samples being greater than y, and the probability of one sample being in the interval [y, y + dy].
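For instance, for the minimum of n = 5 i.i.d. Exp(1) variables the first formula gives Pr(Y1 > y) = e^{−5y}, so Y1 is itself exponential with rate 5. A Monte Carlo sketch (assuming NumPy; y = 0.2 is arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5
    samples = rng.exponential(scale=1.0, size=(1_000_000, n))
    y1 = samples.min(axis=1)                     # smallest order statistic of each row
    print((y1 > 0.2).mean(), np.exp(-n * 0.2))   # both approximately 0.368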

4 Risk Management
Some general definitions that are common to describing losses:
X = loss, actual full loss, “ground up” loss        Y = claim to be paid
E(Y) = net premium, pure premium, expected claim
σY/µY = unitized risk, coefficient of variation

4.1 Risk models


The individual risk model considers n policies where the claim for policy i has a random variable Xi .
All the Xi are independent and identically distributed, with finite mean and variance.
S = ∑_{i=1}^n Xi is the aggregate claim random variable.

E(S) = ∑_i E(Xi) = nµ        Var(S) = ∑_i Var(Xi) = nσ²

The coefficient of variation is then √Var(S)/E(S) = (1/√n)(σ/µ) → 0 as n → ∞.
The collective risk model is an extension of the IRM by allowing n to also be a random variable, N.
Note that often S = ∑_{i=1}^N Xi can be approximated as N(nµ, nσ²). If S is the total loss paid to an individual
or group, then E(S) is the pure premium for the policy. The actual premium before expenses and
profits is given by Q = (1 + θ)E(S) where θ is the relative security loading.

4.2 Deductibles and policy limits
Let X represent the loss amount, d, the deductible on the policy, and Y the amount paid on the claim.
Ordinary deductible insurance is sometimes called excess loss insurance and has Y = max(X − d, 0) =
(X − d)+. The pure premium is E(Y) = ∫_d^∞ sX(x) dx = ∫_d^∞ (1 − FX(x)) dx = ∫_d^∞ (x − d) fX(x) dx.
There are two common variations on the deductible. The franchise deductible pays in full if the loss
exceeds the deductible. The disappearing deductible has both upper (dU) and lower (dL) deductibles,
and the payout increases linearly from zero to the full loss between the limits.

The franchise deductible:

Y = 0 if X ≤ d;        Y = X if X > d

The disappearing deductible:

Y = 0 if X ≤ dL;        Y = dU (X − dL)/(dU − dL) if dL < X ≤ dU;        Y = X if X > dU

A policy may have a limit of u on the maximum payout on a policy. Then Y = min(X, u) and
E(Y) = ∫_0^∞ sY(y) dy = ∫_0^u sX(x) dx. Note the similarity to the ordinary deductible.
An insurance cap specifies a maximum claim amount m on a policy. If there is no deductible, then
this is identical to a policy limit. If there is a deductible, then m = u − d is the maximum payout.
Proportional insurance pays a fraction, α, of the loss X (for example, 80-20 medical plans). When
the amount paid is not the same as the loss, the following random variables are sometimes used:

• Y ∗ is the amount paid, conditional on the event that payment was made, sometimes termed payment
per payment.

• Y is the amount paid per loss.

For example, a policy with deductible d has

Y ∗ = (X − d)|(X > d) and Y = max(X − d, 0).

The mean excess loss in this case is E(Y ∗ )


The loss elimination ratio is defined as (expected loss eliminated from a claim) / (expected total loss). Note that the loss
eliminated from payment is made up by the insured (customer).
Reinsurance is an insurance policy purchased by an insurance company, primarily to limit catastrophic
claims or for tax purposes. These policies can have caps, deductibles, and proportional payments.
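As an example of the pure premium formula, take an exponential loss with mean 1000 and an ordinary deductible d = 500; then E(Y) = ∫_d^∞ sX(x) dx = 1000 e^{−0.5} ≈ 606.5. A numerical sketch (assuming SciPy; the parameters are arbitrary):

    from math import exp
    from scipy.integrate import quad

    theta, d = 1000.0, 500.0
    s = lambda x: exp(-x / theta)                     # survival function of the exponential loss
    pure_premium, _ = quad(s, d, float('inf'))        # E[(X - d)+]
    print(pure_premium, theta * exp(-d / theta))      # both approximately 606.53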

Table 1: Common Discrete Distributions

Uniform (on 1, . . . , n):  fX(x) = 1/n;  E(X) = (n + 1)/2;  Var(X) = (n² − 1)/12;  M(t) = e^t(e^{nt} − 1)/[n(e^t − 1)].

Bernoulli:  fX(x) = p^x q^{1−x};  E(X) = p;  Var(X) = pq;  M(t) = q + pe^t.  Succeed OR Fail; p is the chance of success.

Binomial:  fX(x) = C(n, x) p^x q^{n−x};  E(X) = np;  Var(X) = npq;  M(t) = (q + pe^t)^n.  n Bernoulli trials with x successes.

Geometric:  fX(x) = p q^{x−1};  E(X) = 1/p;  Var(X) = q/p²;  M(t) = pe^t/(1 − qe^t).  Perform Bernoulli trials until success.

Negative Binomial:  fX(x) = C(x − 1, n − 1) p^n q^{x−n};  E(X) = n/p;  Var(X) = nq/p²;  M(t) = [pe^t/(1 − qe^t)]^n.  n-th success on the x-th Bernoulli trial.

Hypergeometric:  fX(x) = C(K, x) C(M − K, n − x)/C(M, n);  E(X) = nK/M;  Var(X) = nK(M − K)(M − n)/[M²(M − 1)].  Choose n objects from a group of M, partitioned into K and M − K.

Poisson:  fX(x) = e^{−λ} λ^x/x!;  E(X) = λ;  Var(X) = λ;  M(t) = e^{λ(e^t − 1)}.  Counting events or time between events.
Table 2: Common Continuous Distributions

Uniform (on [a, b]):  fX(x) = 1/(b − a);  E(X) = (b + a)/2;  Var(X) = (b − a)²/12;  M(t) = (e^{bt} − e^{at})/[(b − a)t].

Normal:  fX(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)};  E(X) = µ;  Var(X) = σ²;  M(t) = e^{µt + σ²t²/2}.  95th %ile is 1.645; Pr(|Z| ≤ 1.96) = 0.95.

Exponential:  fX(x) = λe^{−λx};  E(X) = 1/λ;  Var(X) = 1/λ²;  M(t) = λ/(λ − t).  Waiting time between failures; sX(t) = e^{−λt}; median = (ln 2)/λ.

Gamma:  fX(x) = λ^n e^{−λx} x^{n−1}/Γ(n);  E(X) = n/λ;  Var(X) = n/λ²;  M(t) = [λ/(λ − t)]^n.  Add n independent exponential distributions together. The general form has λ → β and n → α.

Chi-Squared:  fX(x) = x^{n/2−1} e^{−x/2}/[Γ(n/2) 2^{n/2}];  E(X) = n;  Var(X) = 2n;  M(t) = [1/(1 − 2t)]^{n/2}.  If Z1, . . . , Zn is a sample from the std. normal dist., then Z1² + . . . + Zn² is chi-sq. with n deg. of freedom; n = 2 is exponential with mean 2 (λ = 0.5).

Pareto:  fX(x) = αθ^α/(x + θ)^{α+1};  E(X) = θ/(α − 1);  Var(X) = αθ²/[(α − 1)²(α − 2)].  sX(x) = [θ/(x + θ)]^α.

Beta:  fX(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1};  E(X) = α/(α + β);  Var(X) = αβ/[(α + β)²(α + β + 1)].  α = β = 1 is U[0, 1]. If x out of n items are defective and the prior dist. is Beta(α, β), then the posterior dist. is Beta(α + x, β + n − x).

Weibull:  fX(x) = (β/α)(x/α)^{β−1} e^{−(x/α)^β};  E(X) = αΓ(1 + 1/β);  Var(X) = α²[Γ(1 + 2/β) − Γ²(1 + 1/β)].  sX(x) = e^{−(x/α)^β};  Y = (X/α)^β is an exponential with λ = 1.

Lognormal:  fX(x) = (1/(xσ√(2π))) e^{−(ln x − µ)²/(2σ²)};  E(X) = e^{µ+σ²/2};  Var(X) = (e^{σ²} − 1)e^{2µ+σ²}.  ln X ∼ N(µ, σ²).
Table 3: Common Multivariate Distributions

Multinomial:  fX1,...,Xk(x1, . . . , xk) = C(n; x1, . . . , xk) p1^{x1} · . . . · pk^{xk};  E(Xi) = npi;  Var(Xi) = npi qi;  Cov(Xi, Xj) = −npi pj.  An experiment with k possible outcomes performed n times.

Bivariate Normal:  f(x1, x2) = [1/(2πσ1σ2√(1 − ρ²))] e^γ,  where γ = −[1/(2(1 − ρ²))](z1² + z2² − 2ρz1z2) and zi = (xi − µi)/σi.  Xi ∼ N(µi, σi²);  (X1|X2 = x2) ∼ N(µ1 + ρσ1z2, (1 − ρ²)σ1²).

Bivariate Uniform:  Joint density must be 1/(Area of region where positive).
