1 Math Fundamentals
1.3 Counting
There are $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$ ways to choose $r$ objects from a collection of $n$ items, denoted ${}_nC_r$. If order is important, then there are ${}_nP_r = r!\;{}_nC_r$ permutations of those objects.
There are $\binom{n}{n_1\,n_2\,\cdots\,n_m} = \frac{n!}{n_1!\,n_2!\cdots n_m!}$ ways to choose $n_1$ objects of Type 1, $n_2$ objects of Type 2, etc. Note that $\sum_{i=1}^{m} n_i = n$.
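As a quick illustration: choosing $r = 3$ objects from $n = 5$ gives $\binom{5}{3} = \frac{5!}{3!\,2!} = 10$ combinations; if order matters, there are ${}_5P_3 = 3! \cdot 10 = 60$ permutations.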
Binomial Theorem: When expanding $(1+t)^N$, the coefficient of $t^k$ is $\binom{N}{k}$, so that $(1+t)^N = \sum_{k=0}^{\infty} \binom{N}{k} t^k$.
Multinomial Theorem: When expanding $(t_1 + t_2 + \cdots + t_s)^N$ where $N$ is a positive integer, the coefficient of $t_1^{k_1} t_2^{k_2} \cdots t_s^{k_s}$ (where $\sum k_i = N$) is $\binom{N}{k_1\,k_2\,\cdots\,k_s} = \frac{N!}{k_1!\,k_2!\cdots k_s!}$.
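For example, the coefficient of $t^2$ in $(1+t)^4$ is $\binom{4}{2} = 6$, and the coefficient of $t_1^2 t_2 t_3$ in $(t_1 + t_2 + t_3)^4$ is $\frac{4!}{2!\,1!\,1!} = 12$.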
2 Probability Distributions
2.1 Essential definitions
I only list expressions for continuous variables. Note that for a discrete distribution, integration is replaced by summation: $\int_a^b \,dx \to \sum_{a<x<b}$. If $X$ is a random variable, then a function of it, e.g. $g(X)$, is also a random variable.
Cumulative Distribution Function: $F_X(x) = \Pr(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$
Probability Function: $f_X(x) = \Pr(X = x)$ (discrete); $f_X(x) = F_X'(x)$ (continuous)
If $X$ is continuous, then $\Pr(a < X < b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)$.
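For example, if $f_X(x) = 2x$ on $(0, 1)$, then $F_X(x) = x^2$ and $\Pr(\tfrac14 < X < \tfrac34) = (\tfrac34)^2 - (\tfrac14)^2 = \tfrac12$.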
The variance of $X$ is $\mathrm{Var}(X) = \sigma_X^2 = E(X^2) - \big[E(X)\big]^2$. Since the variance is not a linear operator, it must often be calculated manually after obtaining the first two moments. This is particularly important for mixed distributions! Note that $E(aX + b) = a\,E(X) + b$ and $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.
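For example, if $X$ takes the values 0 and 2 with probability $\tfrac12$ each, then $E(X) = 1$, $E(X^2) = 2$, and $\mathrm{Var}(X) = 2 - 1^2 = 1$.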
2.1.2 Moment generating function and friends
If it exists, the moment generating function (MGF) is defined as
$$M_X(t) = E(e^{Xt}), \qquad \frac{d^n}{dt^n} M_X(t)\Big|_{t=0} = E(X^n), \qquad M_X(t) = \sum_{i=1}^{n} p_i e^{x_i t} \ \text{(discrete case)}$$
The last expression is the probability-weighted sum over the possible values of $x$. The MGF of the sum of independent random variables is the product of their MGFs.
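For example, an exponential random variable with parameter $\lambda$ has $M_X(t) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$, so $M_X'(0) = \frac{1}{\lambda} = E(X)$, $M_X''(0) = \frac{2}{\lambda^2} = E(X^2)$, and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$.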
Several properties of the MGF are worth remembering. Define the cumulant generating function
$$\Psi_X(t) = \ln M_X(t), \qquad \text{which leads to} \qquad \Psi_X'(0) = E(X) \quad \text{and} \quad \Psi_X^{(k)}(0) = \frac{d^k}{dt^k}\Psi_X(t)\Big|_{t=0} = E\Big[\big(X - E(X)\big)^k\Big] \ \text{for } k = 2, 3.$$
The probability generating function is given by $P_X(t) = E(t^X)$, provided $E(t^X)$ exists.
If $0 < p < 1$, then the 100$p$-th percentile of the distribution of $X$ is the number $x_p$ which satisfies both of the following inequalities:
$$\Pr(X \le x_p) \ge p \qquad \text{and} \qquad \Pr(X \ge x_p) \ge 1 - p$$
For a continuous distribution, $\Pr(X \le x_p) = p$. The 50th percentile ($p = 0.50$) is called the median. For discrete distributions, the median and percentiles need not be unique!
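For example, for an exponential distribution with $F_X(x) = 1 - e^{-\lambda x}$, setting $F_X(x_{0.5}) = 0.5$ gives $e^{-\lambda x_{0.5}} = 0.5$, so the median is $x_{0.5} = (\ln 2)/\lambda$.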
The mode of a distribution is where $f_X(x)$ is maximized.
If a random variable $X$ has mean $\mu$ and standard deviation $\sigma$, we can create a new, "standardized" random variable $Z = \frac{X - \mu}{\sigma}$. Then the skewness of $X$ is defined as $\gamma = E(Z^3)$ and its kurtosis is given by $\kappa = E(Z^4)$. A positive skewness indicates a long tail to the right. A large kurtosis indicates that the variance is influenced more by a few extreme outliers than by many small deviations.
Jensen's inequality: if $h''(x) \ge 0$ then $E\big[h(X)\big] \ge h\big(E(X)\big)$; if $h''(x) \le 0$ then $E\big[h(X)\big] \le h\big(E(X)\big)$.
Markov: $\Pr(X \ge a) \le \dfrac{E(X)}{a}$ ($a > 0$ and $X$ nonnegative)
Chebyshev: $\Pr(|X - \mu| \ge \kappa) \le \dfrac{\sigma^2}{\kappa^2}$. Equivalently, $\Pr(|Z| \ge r) \le \dfrac{1}{r^2}$ where $Z = (X - \mu)/\sigma$.
You must remember the Chebyshev Inequality!
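For example, if $\mu = 10$ and $\sigma = 2$, then $\Pr(|X - 10| \ge 4) \le \frac{2^2}{4^2} = 0.25$; equivalently, with $r = 2$, $\Pr(|Z| \ge 2) \le \frac{1}{4}$.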
2.1.4 Transformation of a random variable
Given a random variable $X$ with known functions $f_X$ and $F_X$, and a random variable $Y = \Phi(X)$ that is a function of $X$, we want to find $f_Y$ and $F_Y$. Note that
$$Y = \Phi(X) = Y(X) \qquad \text{and} \qquad X = \Phi^{-1}(Y) = X(Y)$$
If $X$ is continuous:
$$F_Y(y) = \begin{cases} F_X\big(x(y)\big) & \text{if } y' > 0 \\ s_X\big(x(y)\big) & \text{if } y' < 0 \end{cases} \qquad\qquad f_Y(y) = f_X\big(x(y)\big)\,\left|\frac{dx}{dy}\right|$$
If $X$ is discrete:
$$f_Y(y) = \sum_{x \in \Phi^{-1}(\{y\})} f_X(x)$$
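For example, if $X$ is uniform on $(0, 1)$ and $Y = X^2$, then $x(y) = \sqrt{y}$ with $y' > 0$, so $F_Y(y) = F_X(\sqrt{y}) = \sqrt{y}$ and $f_Y(y) = f_X(\sqrt{y})\,\frac{1}{2\sqrt{y}} = \frac{1}{2\sqrt{y}}$ for $0 < y < 1$.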
The following recursion is useful for several discrete distributions:
$$f_X(n) = \left(a + \frac{b}{n}\right) f_X(n-1)$$
Distribution          a         b
Binomial              $-p/q$    $(n+1)p/q$
Negative Binomial     $q$       $(r-1)q$
Poisson               $0$       $\lambda$
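As a check, for the Poisson distribution ($a = 0$, $b = \lambda$): $\left(\frac{\lambda}{n}\right)\frac{e^{-\lambda}\lambda^{n-1}}{(n-1)!} = \frac{e^{-\lambda}\lambda^n}{n!} = f_X(n)$.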
3 Multivariate Distributions
These are almost always best started by drawing a graph of the region where $f_{X,Y} > 0$. This is very useful for identifying the proper limits of integration or determining the ratio of areas.
Expectation values are as before: $E[h(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)\, f_{X,Y}(x, y)\,dx\,dy$
The pdf can be found from $f_{X,Y}(x, y) = \dfrac{\partial^2}{\partial x\,\partial y} F_{X,Y}(x, y)$.
As in the single-variable case, for discrete variables replace $\int \to \sum$ and $\iint \to \sum\sum$.
If one plots the probability distribution as a function of $X$ and $Y$, it may be interesting to note how $X$ behaves for a fixed value of $Y$, or vice versa. Holding $X$ fixed, we can sum $f_{X,Y}$ over all the allowed $y$ for that $X$, and record it next to the graph, in the margin. We define the marginal distribution of $X$ as
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy \qquad \text{and the marginal dist. of } Y \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$$
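For example, if $f_{X,Y}(x, y) = x + y$ on the unit square $0 < x < 1$, $0 < y < 1$, then $f_X(x) = \int_0^1 (x + y)\,dy = x + \tfrac12$.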
If the random variables are independent, then the joint probability can be factored: $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$. A plot of the region where $f_{X,Y} > 0$ will be a rectangle with sides parallel to the axes if $X$ and $Y$ are independent.
The conditional distribution of $Y$ given $X$ has a direct analog to basic conditional probability. Recall that
$$\Pr(B|A) = \frac{\Pr(A \cap B)}{\Pr(A)} \qquad \text{so that} \qquad f_{Y|X=x}(y|X = x) = f_Y(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
R∞
The expectation value is found in the usual way; E(Y |X = x) = −∞ yfY (y|x) dy. There are two
important results:
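For example, if $X$ is uniform on $(0, 1)$ and, given $X = x$, $Y$ is uniform on $(0, x)$, then $E(Y|X = x) = \frac{x}{2}$ and $E(Y) = E\big[E(Y|X)\big] = \frac{E(X)}{2} = \frac14$.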
3.2 Covariance
The covariance and correlation coefficient are given by
$$\sigma_{XY}^2 = \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$$
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}\,X}\,\sqrt{\mathrm{Var}\,Y}} = \frac{\sigma_{XY}^2}{\sigma_X \sigma_Y}$$
Note that $\rho = \mathrm{Cov} = 0$ if $X$ and $Y$ are independent.
Covariance is a linear operator: $\mathrm{Cov}(aX + b, cY + d) = ac\,\mathrm{Cov}(X, Y)$, and
$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$$
Let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with variance $\sigma^2$; then the above rule (with $a = b = 1$) can be extended to $\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = n\sigma^2$.
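For example, if $\mathrm{Var}(X) = \mathrm{Var}(Y) = 1$ and $\mathrm{Cov}(X, Y) = 0.5$, then $\mathrm{Var}(X + Y) = 1 + 1 + 2(0.5) = 3$; for a random sample of size $n = 3$ with $\sigma^2 = 4$, $\mathrm{Var}(X_1 + X_2 + X_3) = 12$.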
3.3 Moment generating functions and transformations of a joint distribution
Similar to the single-variable case,
$$M_{X,Y}(s, t) = E\big(e^{sX + tY}\big)$$
The marginal relations are $M_X(s) = M_{X,Y}(s, 0)$ and $M_Y(t) = M_{X,Y}(0, t)$. If $X$ and $Y$ are independent, then $M_{X,Y}(s, t) = M_X(s) \cdot M_Y(t)$. Expectation values of the moments can be found from the relation
$$\left.\frac{\partial^{m+n}}{\partial s^m\,\partial t^n} M_{X,Y}(s, t)\right|_{s=t=0} = E(X^m Y^n)$$
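As a check, if $X$ and $Y$ are independent, then $\frac{\partial^2}{\partial s\,\partial t} M_X(s) M_Y(t)\Big|_{s=t=0} = M_X'(0)\,M_Y'(0) = E(X)E(Y)$, which is indeed $E(XY)$ for independent variables.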
The distribution of the sum $X + Y$ evaluated at $k$:

                      Discrete                                    Continuous
General case          $\sum_{x=0}^{k} f_{X,Y}(x, k-x)$            $\int_{-\infty}^{\infty} f_{X,Y}(x, k-x)\,dx$
X, Y independent      $\sum_{x=0}^{k} f_X(x)\,f_Y(k-x)$           $\int_{-\infty}^{\infty} f_X(x)\,f_Y(k-x)\,dx$
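For example, if $X \sim$ Poisson($\lambda$) and $Y \sim$ Poisson($\mu$) are independent, then $\sum_{x=0}^{k} \frac{e^{-\lambda}\lambda^x}{x!}\,\frac{e^{-\mu}\mu^{k-x}}{(k-x)!} = \frac{e^{-(\lambda+\mu)}}{k!}\sum_{x=0}^{k}\binom{k}{x}\lambda^x\mu^{k-x} = \frac{e^{-(\lambda+\mu)}(\lambda+\mu)^k}{k!}$, so $X + Y \sim$ Poisson($\lambda + \mu$).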
If we have a collection of several random variables $X_i$ with different weights $\alpha_i$ such that $\sum \alpha_i = 1$, we can construct a new mixed distribution $f_X(x) = \alpha_1 f_{X_1}(x) + \cdots + \alpha_n f_{X_n}(x)$. Then the various moments are weighted averages of the individual moments. For example, to find the variance, you must first find the first two moments of each $X_i$, then weight them to get the moments of the mixed distribution, and finally compute $E(X^2) - [E(X)]^2$.
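For example, mix two exponentials with means 1 and 2 using weights $\alpha_1 = \alpha_2 = \tfrac12$: $E(X) = \tfrac12(1) + \tfrac12(2) = 1.5$ and $E(X^2) = \tfrac12(2) + \tfrac12(8) = 5$ (since $E(X_i^2) = 2\theta_i^2$ for an exponential with mean $\theta_i$), so $\mathrm{Var}(X) = 5 - 1.5^2 = 2.75$.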
In principle, one must be careful about approximating a discrete distribution with a continuous one. We therefore have the continuity correction: let $X$ be the original discrete random variable and $Y$ the continuous approximation. Then $\Pr(X \le n) = \Pr(X \le n + \tfrac12) \approx \Pr(Y \le n + \tfrac12)$. Also, $\Pr(X < n) \approx \Pr(Y < n - \tfrac12)$ and $\Pr(X \ge n) \approx \Pr(Y > n - \tfrac12)$. In practice, though, the difference is small and not likely to be a factor in choosing between multiple-choice options on the exam.
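For example, approximating $X \sim$ Binomial($100$, $0.5$) by $Y \sim N(50, 25)$: $\Pr(X \le 55) \approx \Pr(Y \le 55.5) = \Pr(Z \le 1.1) \approx 0.864$.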
4 Risk Management
Some general definitions that are common to describing losses:
$X$ = loss; actual full loss; "ground up" loss
$Y$ = claim to be paid
$E(Y)$ = net premium, pure premium, expected claim
$\dfrac{\sigma_Y}{\mu_Y}$ = unitized risk, coefficient of variation
4.2 Deductibles and policy limits
Let $X$ represent the loss amount, $d$ the deductible on the policy, and $Y$ the amount paid on the claim. Ordinary deductible insurance is sometimes called excess-loss insurance and has $Y = \max(X - d, 0) = (X - d)_+$. The pure premium is
$$E(Y) = \int_d^{\infty} s_X(x)\,dx = \int_d^{\infty} \big(1 - F_X(x)\big)\,dx = \int_d^{\infty} (x - d)\,f_X(x)\,dx$$
There are two common variations on the deductible. The franchise deductible pays in full if the loss exceeds the deductible. The disappearing deductible has both upper ($d_U$) and lower ($d_L$) deductibles, and the payout increases linearly from zero to the full loss between the two limits.
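For example, if the loss is exponential with $s_X(x) = e^{-\lambda x}$ and the policy has an ordinary deductible $d$, the pure premium is $E(Y) = \int_d^{\infty} e^{-\lambda x}\,dx = \frac{e^{-\lambda d}}{\lambda}$.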
• $Y^*$ is the amount paid, conditional on the event that a payment was made, sometimes termed the payment per payment.
Table 1: Common Discrete Distributions. For each distribution I list $f_X(x)$, $E(X)$, $\mathrm{Var}(X)$, $M(t)$, and notes.

Uniform (on $1, \dots, n$): $f_X(x) = \frac{1}{n}$; $E(X) = \frac{n+1}{2}$; $\mathrm{Var}(X) = \frac{n^2 - 1}{12}$; $M(t) = \frac{e^t\,(e^{nt} - 1)}{n\,(e^t - 1)}$.

Geometric: $f_X(x) = p\,q^{x-1}$; $E(X) = \frac{1}{p}$; $\mathrm{Var}(X) = \frac{q}{p^2}$; $M(t) = \frac{p e^t}{1 - q e^t}$. Perform Bernoulli trials until the first success.

Negative Binomial: $f_X(x) = \binom{x-1}{n-1} p^n q^{x-n}$; $E(X) = \frac{n}{p}$; $\mathrm{Var}(X) = \frac{nq}{p^2}$; $M(t) = \left(\frac{p e^t}{1 - q e^t}\right)^n$. $n$th success on the $x$th Bernoulli trial.

Hypergeometric: $f_X(x) = \frac{\binom{K}{x}\binom{M-K}{n-x}}{\binom{M}{n}}$; $E(X) = \frac{nK}{M}$; $\mathrm{Var}(X) = \frac{nK(M-K)(M-n)}{M^2(M-1)}$. Choose $n$ objects from a group of $M$, partitioned into $K$ and $M - K$.

Poisson: $f_X(x) = \frac{e^{-\lambda}\lambda^x}{x!}$; $E(X) = \lambda$; $\mathrm{Var}(X) = \lambda$; $M(t) = e^{\lambda(e^t - 1)}$. Counting events occurring at a constant average rate (the waiting time between events is exponential).
Table 2: Common Continuous Distributions. For each distribution I list $f_X(x)$, $E(X)$, $\mathrm{Var}(X)$, $M(t)$, and notes.

Normal: $f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$; $E(X) = \mu$; $\mathrm{Var}(X) = \sigma^2$; $M(t) = e^{\mu t + \sigma^2 t^2/2}$. The 95th percentile of $Z$ is 1.645; $\Pr(|Z| \le 1.96) = 0.95$.

Exponential: $f_X(x) = \lambda e^{-\lambda x}$; $E(X) = \frac{1}{\lambda}$; $\mathrm{Var}(X) = \frac{1}{\lambda^2}$; $M(t) = \frac{\lambda}{\lambda - t}$. Waiting time between failures; $s_X(t) = e^{-\lambda t}$; median $= (\ln 2)/\lambda$.

Gamma: $f_X(x) = \frac{\lambda^n x^{n-1} e^{-\lambda x}}{\Gamma(n)}$; $E(X) = \frac{n}{\lambda}$; $\mathrm{Var}(X) = \frac{n}{\lambda^2}$; $M(t) = \left(\frac{\lambda}{\lambda - t}\right)^n$. The sum of $n$ independent exponential distributions. The general form has $\lambda \to \beta$ and $n \to \alpha$.

Chi-Squared: $f_X(x) = \frac{x^{n/2 - 1} e^{-x/2}}{\Gamma(n/2)\, 2^{n/2}}$; $E(X) = n$; $\mathrm{Var}(X) = 2n$; $M(t) = \left(\frac{1}{1 - 2t}\right)^{n/2}$. If the $Z_i$ are $N(0, 1)$ distributed, then $Z_1^2 + \cdots + Z_n^2$ is chi-squared with $n$ degrees of freedom; $n = 2$ is exponential with mean 2 ($\lambda = 0.5$).

Pareto: $f_X(x) = \frac{\alpha\theta^\alpha}{(x + \theta)^{\alpha+1}}$; $E(X) = \frac{\theta}{\alpha - 1}$; $\mathrm{Var}(X) = \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}$. $s_X(x) = \left(\frac{\theta}{x + \theta}\right)^\alpha$.

Beta: $f_X(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1}(1 - x)^{\beta - 1}$; $E(X) = \frac{\alpha}{\alpha + \beta}$; $\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$. $\alpha = \beta = 1$ is U[0, 1]; if $x$ out of $n$ items are defective and the prior dist. is Beta($\alpha$, $\beta$), then the posterior dist. is Beta($\alpha + x$, $\beta + n - x$).

Weibull: $f_X(x) = \frac{\beta}{\alpha}\left(\frac{x}{\alpha}\right)^{\beta - 1} e^{-(x/\alpha)^\beta}$; $E(X) = \alpha\,\Gamma\!\left(1 + \frac{1}{\beta}\right)$; $\mathrm{Var}(X) = \alpha^2\left[\Gamma\!\left(1 + \frac{2}{\beta}\right) - \Gamma^2\!\left(1 + \frac{1}{\beta}\right)\right]$. $s_X(x) = e^{-(x/\alpha)^\beta}$; $Y = \left(\frac{X}{\alpha}\right)^\beta$ is exponential with $\lambda = 1$.

Lognormal: $f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x - \mu)^2/(2\sigma^2)}$; $E(X) = e^{\mu + \sigma^2/2}$; $\mathrm{Var}(X) = (e^{\sigma^2} - 1)\, e^{2\mu + \sigma^2}$. $\ln X \sim N(\mu, \sigma^2)$.
Table 3: Common Multivariate Distributions

Multinomial: $f_{X_1,\dots,X_k}(x_1, \dots, x_k) = \binom{n}{x_1 \cdots x_k}\, p_1^{x_1} \cdots p_k^{x_k}$; $E(X_i) = np_i$; $\mathrm{Var}(X_i) = np_i q_i$; $\mathrm{Cov}(X_i, X_j) = -np_i p_j$. An experiment with $k$ possible outcomes performed $n$ times.

Bivariate Normal: $f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{\gamma}$, where $\gamma = -\frac{1}{2(1 - \rho^2)}\left(z_1^2 + z_2^2 - 2\rho z_1 z_2\right)$ and $z_i = (x_i - \mu_i)/\sigma_i$. The marginals are $X_i \sim N(\mu_i, \sigma_i^2)$, and $(X_1 | X_2 = x_2) \sim N\!\left(\mu_1 + \rho\sigma_1 z_2,\; (1 - \rho^2)\sigma_1^2\right)$.