
ACTEX Learning

Exam CS1 Formula & Review Sheet


(updated 07/29/2023)

DATA AND BASICS OF MODELLING

Descriptive Analysis Summarize and describe the key characteristics of a dataset

Measures of central tendency The mean, median and mode

Measures of dispersion The standard deviation, range and interquartile range

Inferential Analysis Using a smaller sample to draw conclusions about a larger population

Predictive Analysis Making predictions or forecasts about future events based on past or historical data

The Data Analysis Process

1. Develop a well-defined set of objectives
2. Identify the data items required
3. Collection of the data
4. Processing of the data
5. Data cleaning
6. Exploratory data analysis
7. Modelling
8. Communicating the results
9. Monitoring the process

Primary Source Data collected directly from the source, or through the original data collection
process

Secondary Source Information that has already been collected, analyzed, and published by others

Cross-Sectional Data Recording values of the variable(s) of interest for each case in the sample at a single
moment in time. It can be thought of as a snapshot of the data at a single moment
in time

Longitudinal Data Recording values of the SAME subjects at intervals through time

Censored Data The value of a variable is only partially known

Truncated Data Measurements on some variables are not recorded so are completely unknown

Big Data Data characterized by its size (volume), speed (velocity), variety and reliability (veracity)

Reproducibility The ability to reproduce statistical analyses or models using the same data and
methodology as the original study

Pros and Cons of Reproducibility

Pros • Necessary for a complete review of the technical work

• Required by external regulators and auditors

• The analysis is more easily extended to investigate the effect of changes, or to incorporate new data

• Desirable when comparing the results of an investigation with a similar one carried out in the past

• Leads to fewer errors that need correcting in the original work, and hence greater efficiency

Cons • Reproducibility does not mean that the analysis is correct

• If the activities involved in reproducibility only occur at the end of an analysis, it may be too late to address any unforeseen problems


DISCRETE DISTRIBUTIONS

Discrete Uniform Distribution on sample space S, where S = {1, 2, . . . , k}

$P(X = x) = \frac{1}{k}$ for $x = 1, 2, \ldots, k$   $\mu = E[X] = \frac{k+1}{2}$   $\sigma^2 = \mathrm{Var}[X] = \frac{k^2 - 1}{12}$

Bernoulli Distribution on sample space S, where S = {s, f}

$P(\{s\}) = p$, $P(\{f\}) = 1 - p$   $\mu = p$   $\sigma^2 = p - p^2 = p(1 - p)$

Binomial Distribution for n trials and success probability p

$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, $x = 0, 1, \ldots, n$, $0 < p < 1$   $\mu = np$   $\sigma^2 = np(1 - p)$

Geometric Distribution on the integers 1, 2, 3, . . . (the trial of the first success) with parameter p

$P(X = x) = p(1 - p)^{x - 1}$, $x = 1, 2, \ldots$, $0 < p < 1$   $\mu = \frac{1}{p}$   $\sigma^2 = \frac{1 - p}{p^2}$

Negative Binomial Distribution with X as the number of the trial on which the k-th success occurs, or Y as the number of failures before the k-th success:

$P(X = x) = \binom{x - 1}{k - 1} p^k (1 - p)^{x - k}$, $x = k, k + 1, \ldots$, $0 < p < 1$   $\mu = \frac{k}{p}$   $\sigma^2 = \frac{k(1 - p)}{p^2}$

Recursion: $P(X = x) = \frac{x - 1}{x - k}(1 - p)\,P(X = x - 1)$

$P(Y = y) = \binom{k + y - 1}{y} p^k (1 - p)^y$, $y = 0, 1, 2, \ldots$   $\mu = \frac{k(1 - p)}{p}$   $\sigma^2 = \frac{k(1 - p)}{p^2}$

Hypergeometric Distribution

$P(X = x) = \frac{\binom{K}{x}\binom{N - K}{n - x}}{\binom{N}{n}}$, $\max(0, n - N + K) \le x \le \min(n, K)$   $\mu = \frac{nK}{N}$   $\sigma^2 = \frac{nK(N - K)(N - n)}{N^2(N - 1)}$

Poisson Distribution with parameter λ:

$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $x = 0, 1, 2, \ldots$, $\lambda > 0$   $\mu = \lambda$   $\sigma^2 = \lambda$

Recursion: $P(X = x) = \frac{\lambda}{x}\,P(X = x - 1)$
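The pmf, mean and variance formulas above can be verified numerically. A minimal sketch (illustrative only, not part of the sheet) using scipy.stats, whose geometric distribution uses the same 1, 2, . . . support as here:

```python
# Sanity check of the discrete-distribution moments and the Poisson recursion.
# Parameter values are arbitrary illustrations.
from scipy import stats

n, p, lam = 10, 0.3, 2.5

print(stats.binom.stats(n, p, moments="mv"))      # (np, np(1-p)) = (3.0, 2.1)
print(stats.geom.stats(p, moments="mv"))          # (1/p, (1-p)/p^2)
print(stats.poisson.stats(mu=lam, moments="mv"))  # (lambda, lambda)

# Poisson recursion P(X = x) = (lambda / x) * P(X = x - 1):
lhs = stats.poisson.pmf(4, lam)
rhs = (lam / 4) * stats.poisson.pmf(3, lam)
assert abs(lhs - rhs) < 1e-12
```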

CONTINUOUS DISTRIBUTIONS

Continuous Uniform Distribution on the interval [α, β]

$f_X(x) = \frac{1}{\beta - \alpha}$, $\alpha < x < \beta$   $\mu = \frac{\alpha + \beta}{2}$   $\sigma^2 = \frac{(\beta - \alpha)^2}{12}$

Gamma Distribution with parameters α and λ

$f_X(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}$ for $x > 0$   $\mu = \frac{\alpha}{\lambda}$   $\sigma^2 = \frac{\alpha}{\lambda^2}$

Gamma function: $\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\,dt$   $\Gamma(1) = 1$   $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ for $\alpha > 1$

Exponential Distribution (Gamma with α = 1)

$f_X(x) = \lambda e^{-\lambda x}$, $x > 0$   $\mu = \frac{1}{\lambda}$   $\sigma^2 = \frac{1}{\lambda^2}$

$F_X(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$

Chi-square ($\chi^2$) distribution with ‘degrees of freedom’ as its parameter

Gamma with $\alpha = v/2$, where v is a positive integer, and $\lambda = 1/2$   $\mu = v$   $\sigma^2 = 2v$


Beta Distribution on {x : 0 < x < 1} with parameters α > 0 and β > 0

$f_X(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$ for $0 < x < 1$   $\mu = \frac{\alpha}{\alpha + \beta}$   $\sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$

Beta function: $B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1}\,dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$

Normal Distribution $N(\mu, \sigma^2)$ with mean µ and variance σ²

$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$ for $-\infty < x < \infty$   $F_X(x) = P(X \le x) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)$ where $Z = \frac{X - \mu}{\sigma}$

Standard Normal Distribution N(0, 1) with mean 0 and variance 1

$f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ for $-\infty < x < \infty$   $F_X(x) = \Phi(x)$

Lognormal Distribution with parameters µ and σ. If $\ln X \sim N(\mu, \sigma^2)$ then

$f_X(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2}$ for $0 < x < \infty$   $E[X] = e^{\mu + \sigma^2/2}$   $\mathrm{Var}[X] = E[X]^2\left(e^{\sigma^2} - 1\right)$

t-distribution with ‘degrees of freedom’ parameter v

If $X \sim \chi^2_v$ and $Z \sim N(0, 1)$, with X and Z independent, then $\frac{Z}{\sqrt{X/v}} \sim t_v$, the t-distribution with v degrees of freedom.

F-distribution with ‘degrees of freedom’ parameters n₁ and n₂

If $X \sim \chi^2_{n_1}$ and $Y \sim \chi^2_{n_2}$, with X and Y independent, then $\frac{X/n_1}{Y/n_2} \sim F_{n_1, n_2}$, the F-distribution with n₁ and n₂ degrees of freedom.

POISSON PROCESS

Poisson Distribution $X \sim \mathrm{Poisson}(\lambda) \Rightarrow P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $x = 0, 1, 2, \ldots$, $\lambda > 0$, $\mu = \lambda$, $\sigma^2 = \lambda$

Sum If $X_i \sim \mathrm{Poisson}(\lambda_i)$ independently, then $X_1 + \cdots + X_n \sim \mathrm{Poisson}(\lambda_1 + \cdots + \lambda_n)$

Counting Process X(t) is the number of events that occur at or before time t

Poisson Process (PP) $X(t) \sim \mathrm{Poisson}(\lambda(t))$

$X(t) - X(s)$ is independent of $X(v) - X(u)$ for $t > s > v > u > 0$ (independent increments)

$X(t + s) - X(t)$ is a Poisson random variable for $s > 0$

Homogeneous PP The rate is a constant λ, so $X(t) \sim \mathrm{Poisson}(\lambda t)$
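As an illustration of the homogeneous case (not part of the sheet), the sketch below simulates X(T) by accumulating exponential inter-arrival times and checks that its mean and variance are both close to λT:

```python
# Simulate a homogeneous Poisson process of rate lam on [0, T] via exponential
# inter-arrival times; X(T) should be approximately Poisson(lam * T).
import numpy as np

rng = np.random.default_rng(1)
lam, T, n_paths = 2.0, 5.0, 10_000

counts = []
for _ in range(n_paths):
    t, n_events = 0.0, 0
    while True:
        t += rng.exponential(1 / lam)   # inter-arrival time ~ Exp(lam)
        if t > T:
            break
        n_events += 1
    counts.append(n_events)

print(np.mean(counts), np.var(counts))  # both should be close to lam * T = 10
```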

INVERSE TRANSFORMATION METHOD

Simulation Generator (linear congruential) $X_{n+1} = (aX_n + c) \bmod m$

To Generate Uniform Numbers

1. Specify an initial integer x₀ called the “seed”

2. Calculate $X_1 = ax_0 + c$

3. Divide X₁ by m and obtain the first remainder x₁

4. The first uniform number is $u_1 = \frac{x_1}{m}$

5. Repeat steps 2–4 using x₁ to obtain the second remainder x₂ and the second uniform number $u_2 = \frac{x_2}{m}$, and so on.
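A minimal Python sketch of steps 1–5; the constants a, c and m are illustrative choices, not values prescribed by the sheet:

```python
# A minimal linear congruential generator following steps 1-5 above.
def lcg(seed, n, a=1_664_525, c=1_013_904_223, m=2**32):
    """Yield n pseudo-uniform numbers u_i = x_i / m."""
    x = seed
    for _ in range(n):
        x = (a * x + c) % m   # x_{i+1} = (a x_i + c) mod m
        yield x / m           # u_i in [0, 1)

print(list(lcg(seed=12345, n=5)))
```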

Copyright © ArchiMedia Advantage Inc.


www.ACTEXLearning.com Need Help? Email [email protected] All Rights Reserved
ACTEX Learning Exam CS1 Formula & Review Sheet 4

Inverse transformation method

1. Generate uniform numbers $u_1, \ldots, u_n$.

2. Specify a distribution function $F_Y(y) = \Pr(Y \le y)$.

3. Calculate $y_i = F_Y^{-1}(u_i)$.
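For example, for the exponential distribution $F_Y(y) = 1 - e^{-\lambda y}$ the inverse is $F_Y^{-1}(u) = -\ln(1 - u)/\lambda$. A short sketch with illustrative parameters:

```python
# Inverse transformation method for the exponential distribution.
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5
u = rng.uniform(size=100_000)        # step 1: uniform numbers
y = -np.log(1 - u) / lam             # step 3: y_i = F^{-1}(u_i)

print(y.mean(), y.var())             # approx 1/lam = 2 and 1/lam^2 = 4
```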

GENERATING FUNCTIONS

Moment Generating Functions (MGF)

MGF and Moments $M_X(t) = E[e^{tX}] = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots$

$M_X'(0) = E[X]$   $M_X''(0) = E[X^2]$

Discrete Uniform (1, …, k) $M_X(t) = E[e^{tX}] = \frac{e^t\left(1 - e^{kt}\right)}{k\left(1 - e^t\right)}$

Uniform (a, b) $M_X(t) = \int_a^b \frac{e^{tx}}{b - a}\,dx = \frac{e^{bt} - e^{at}}{t(b - a)}$

Binomial (n, p) $M_X(t) = \sum_{x=0}^n \binom{n}{x}\left(pe^t\right)^x q^{n - x} = \left(q + pe^t\right)^n$ (including Bernoulli, for which n = 1)

Negative Binomial (k, p) $M_X(t) = \sum_{x=k}^\infty \binom{x - 1}{k - 1} e^{tx} p^k q^{x - k} = \left(\frac{pe^t}{1 - qe^t}\right)^k$ (including Geometric, for which k = 1)

Poisson (λ) $M_X(t) = e^{-\lambda}\sum_{x=0}^\infty \frac{\left(\lambda e^t\right)^x}{x!} = e^{\lambda\left(e^t - 1\right)}$

Gamma (α, λ) $M_X(t) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha - 1} e^{-(\lambda - t)x}\,dx = \left(\frac{\lambda}{\lambda - t}\right)^\alpha$ for $t < \lambda$

Normal (µ, σ²) $M_X(t) = \exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)$

Cumulant Generating Function (CGF) $C_X(t) = \ln M_X(t)$

CGF Moments

$C_X'(t) = \frac{M_X'(t)}{M_X(t)}$

$C_X''(t) = \frac{M_X''(t)M_X(t) - \left(M_X'(t)\right)^2}{\left(M_X(t)\right)^2}$

$C_X'''(t) = \frac{M_X'''(t)\left(M_X(t)\right)^2 - 3M_X'(t)M_X''(t)M_X(t) + 2\left(M_X'(t)\right)^3}{\left(M_X(t)\right)^3}$

At t = 0, using $M_X(0) = 1$:

$C_X'(0) = \frac{M_X'(0)}{M_X(0)} = E[X]$

$C_X''(0) = \frac{M_X''(0)M_X(0) - \left(M_X'(0)\right)^2}{\left(M_X(0)\right)^2} = E[X^2] - (E[X])^2 = \mathrm{Var}[X]$

$C_X'''(0) = M_X'''(0) - 3M_X'(0)M_X''(0) + 2\left(M_X'(0)\right)^3 = E[X^3] - 3E[X]E[X^2] + 2(E[X])^3 = \mathrm{skew}(X)$

Cumulants The coefficient of $\frac{t^r}{r!}$ in the Maclaurin series of $C_X(t) = \ln M_X(t)$ is called
the rth cumulant and is denoted by κᵣ
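A small symbolic check (illustrative, using sympy) that differentiating the CGF at t = 0 recovers the cumulants; for the Poisson, every cumulant equals λ:

```python
# Recover the first cumulants of Poisson(lam) from the CGF with sympy.
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))      # MGF of Poisson(lam)
C = sp.log(M)                          # CGF C_X(t) = ln M_X(t)

for r in (1, 2, 3):
    kappa_r = sp.diff(C, t, r).subs(t, 0)
    print(r, sp.simplify(kappa_r))     # kappa_1 = kappa_2 = kappa_3 = lambda
```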


     
Linear function Y = a + bX MY (t) = E etY = E et(a+bX) = eat E ebtX = eat MX (bt)

JOINT DISTRIBUTIONS

Joint Probability (Density) Functions

Discrete $p(x, y) = P(X = x, Y = y)$   $\sum_x \sum_y p(x, y) = 1$   $p(x, y) \ge 0$

Continuous $P(x_1 < X < x_2,\ y_1 < Y < y_2) = \int_{y_1}^{y_2}\int_{x_1}^{x_2} f(x, y)\,dx\,dy$   $F(x, y) = P(X \le x, Y \le y)$

$f(x, y) = \frac{\partial^2}{\partial x\,\partial y} F(x, y)$   $\int_x\int_y f(x, y)\,dy\,dx = 1$   $f(x, y) \ge 0$

Marginal Distribution of X from the joint distribution of X and Y:

$p_X(x) = \sum_y p(x, y)$ (discrete)   $f_X(x) = \int_y f(x, y)\,dy$ (continuous)

Conditional Probability (Density) Functions

Discrete $p_{X|Y=y}(x|y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}$

Continuous $f_{X|Y=y}(x|y) = \frac{f(x, y)}{f_Y(y)}$, so that $\int_{x_1}^{x_2} f_{X|Y=y}(x|y)\,dx = P(x_1 < X < x_2 \mid Y = y)$

Independence of Random Variables

X and Y are independent if $f(y|x) = f_Y(y)$, i.e. $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$

Discrete $P(X = x, Y = y) = P(X = x)\,P(Y = y)$

Continuous $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$

Expectations

Discrete $E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p_{X,Y}(x, y) = \sum_x \sum_y g(x, y)\,P(X = x, Y = y)$

Continuous $E[g(X, Y)] = \int_x \int_y g(x, y)\,f_{X,Y}(x, y)\,dy\,dx$

Expectations of Sums and Products

$E[a\,g(X) + b\,h(Y)] = aE[g(X)] + bE[h(Y)]$ where a and b are constants

$E[g(X)h(Y)] = E[g(X)]\,E[h(Y)]$ for independent random variables X and Y

Covariance and Correlation Coefficient

$\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$

$\mathrm{Cov}[aX + b, cY + d] = ac\,\mathrm{Cov}[X, Y]$   $\mathrm{Cov}[X, Y + Z] = \mathrm{Cov}[X, Y] + \mathrm{Cov}[X, Z]$

If X and Y are independent, $\mathrm{Cov}[X, Y] = 0$

Variance of a Sum $V[X + Y] = V[X] + V[Y] + 2\,\mathrm{Cov}[X, Y]$

$V[X + Y] = V[X] + V[Y]$ for independent random variables

Convolutions $p_Z(z) = \sum_x p(x, z - x)$ (discrete)

$p_Z(z) = \sum_x p_X(x)\,p_Y(z - x)$ for independent random variables X and Y

$f_Z(z) = \int_x f_X(x)\,f_Y(z - x)\,dx$ (continuous)
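A numerical illustration of the discrete convolution formula (not part of the sheet): convolving two Poisson pmfs reproduces the Poisson(λ₁ + λ₂) pmf, consistent with the sum result in the Poisson Process section:

```python
# pmf of Z = X + Y for independent Poissons via discrete convolution.
import numpy as np
from scipy import stats

lam1, lam2, support = 1.5, 2.0, np.arange(60)
px = stats.poisson.pmf(support, lam1)
py = stats.poisson.pmf(support, lam2)

pz = np.convolve(px, py)[: len(support)]   # p_Z(z) = sum_x p_X(x) p_Y(z - x)
print(np.max(np.abs(pz - stats.poisson.pmf(support, lam1 + lam2))))  # ~0
```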

Moments of Linear Combinations of Random Variables

Mean $E(c_1X_1 + c_2X_2 + \ldots + c_nX_n) = c_1E(X_1) + c_2E(X_2) + \ldots + c_nE(X_n)$

$E\left(\sum_{i=1}^n c_iX_i\right) = \sum_{i=1}^n c_iE(X_i)$


Variance $V(Y) = \mathrm{Cov}(Y, Y) = \sum_i c_i^2\,\mathrm{Cov}(X_i, X_i) + 2\sum_{i<j} c_ic_j\,\mathrm{Cov}(X_i, X_j)$ for $Y = \sum_i c_iX_i$

Independent r.v. $V(c_1X_1 + c_2X_2 + \ldots + c_nX_n) = c_1^2V(X_1) + c_2^2V(X_2) + \ldots + c_n^2V(X_n)$

$V\left(\sum_{i=1}^n c_iX_i\right) = \sum_{i=1}^n c_i^2V(X_i)$

Moment generating functions (MGFs):

If X₁ and X₂ are independent random variables with MGFs $M_{X_1}(t)$ and $M_{X_2}(t)$, and $S = c_1X_1 + c_2X_2$:

$M_S(t) = E\left[e^{(c_1X_1 + c_2X_2)t}\right] = E\left[e^{c_1X_1t}\right]E\left[e^{c_2X_2t}\right] = M_{X_1}(c_1t)\,M_{X_2}(c_2t)$

$Y = X_1 + X_2 + \ldots + X_n$ where the $X_i$ are independent and $X_i$ has MGF $M_i(t)$: $M_Y(t) = M_1(t)M_2(t)\cdots M_n(t)$

$Y = X_1 + X_2 + \ldots + X_n$ where the $X_i$ are independent and identically distributed, each with MGF M(t): $M_Y(t) = [M(t)]^n$

Bernoulli/Binomial $\left(q + pe^t\right)^n$ where $Y = X_1 + \ldots + X_n$, with $X_i$, $i = 1, \ldots, n$, independent Bernoulli(p) variables, each with MGF $M(t) = q + pe^t$

Geometric/Negative binomial $\left(\frac{pe^t}{1 - qe^t}\right)^k$ where $Y = X_1 + \ldots + X_k$, with $X_i$, $i = 1, \ldots, k$, independent geometric(p) variables, each with MGF $M(t) = \frac{pe^t}{1 - qe^t}$

Poisson $\exp\{(\lambda + \gamma)(e^t - 1)\}$ for the sum of independent $X \sim \mathrm{Poisson}(\lambda)$ and $Z \sim \mathrm{Poisson}(\gamma)$, with $M_X(t) = \exp\{\lambda(e^t - 1)\}$ and $M_Z(t) = \exp\{\gamma(e^t - 1)\}$

Exponential/Gamma $\left[\lambda(\lambda - t)^{-1}\right]^k$ where $Y = X_1 + \ldots + X_k$, with $X_i$, $i = 1, \ldots, k$, independent exponential(λ) variables, each with MGF $M(t) = \lambda(\lambda - t)^{-1}$

Normal $\exp\left\{(\mu_X + \mu_Y)t + \frac{1}{2}\left(\sigma_X^2 + \sigma_Y^2\right)t^2\right\}$ where Z = X + Y, with $M_X(t) = \exp\left(\mu_Xt + \frac{1}{2}\sigma_X^2t^2\right)$ and $M_Y(t) = \exp\left(\mu_Yt + \frac{1}{2}\sigma_Y^2t^2\right)$

Chi-square The sum of a chi-square(n) and an independent chi-square(m) is a chi-square(n + m) variable; more generally, the sum of independent chi-square variables is a chi-square variable

CONDITIONAL EXPECTATION

Conditional expectation of Y given X = x, $E[Y \mid X = x]$:

$E[Y \mid X = x] = \sum_y y\,p_{Y|X}(y \mid x)$ in the discrete case

$E[Y \mid X = x] = \int y\,f_{Y|X}(y \mid x)\,dy = \int y\,\frac{f(x, y)}{f_X(x)}\,dy$ in the continuous case

Conditional Variance of Y given X, $\mathrm{Var}[Y \mid X]$:

$\mathrm{Var}[Y \mid X] = E\left[Y^2 \mid X\right] - (E[Y \mid X])^2$

Double Expectation and Variance

$E[E[Y \mid X]] = E[Y]$   $\mathrm{Var}[Y] = E[\mathrm{Var}[Y \mid X]] + \mathrm{Var}[E[Y \mid X]]$

THE CENTRAL LIMIT THEOREM

Central Limit Theorem Suppose $X_1, X_2, \ldots, X_n$ are n independent, identically distributed random variables with mean µ and variance σ². Then the distribution of $\frac{\bar X - \mu}{\sigma/\sqrt{n}}$ approaches the standard normal distribution, N(0, 1), as $n \to \infty$:

$\bar X \approx N\left(\mu, \sigma^2/n\right)$   $\sum X_i \approx N\left(n\mu, n\sigma^2\right)$


Binomial Distribution Binomial(n, p) ≈ N(np, np(1 − p)) for large n

Poisson Distribution Poisson(nλ) ≈ N(nλ, nλ) for large n; Poisson(λ) ≈ N(λ, λ) for large λ

Gamma Distribution Gamma(n, λ) ≈ N(n/λ, n/λ²) for large n; $\chi^2_k \approx N(k, 2k)$ for large k

Continuity Correction If n and m are integers, the probability $P[n \le X \le m]$ is approximated by using a normal random variable Y with the same mean and variance as X, and then finding the probability $P\left[n - \frac{1}{2} \le Y \le m + \frac{1}{2}\right]$
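A quick check of the continuity correction with illustrative values:

```python
# Normal approximation to Binomial(40, 0.4) with continuity correction:
# exact P(12 <= X <= 20) vs P(11.5 <= Y <= 20.5) with Y ~ N(np, np(1-p)).
from scipy import stats

n, p = 40, 0.4
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

exact = stats.binom.cdf(20, n, p) - stats.binom.cdf(11, n, p)
approx = stats.norm.cdf(20.5, mu, sigma) - stats.norm.cdf(11.5, mu, sigma)
print(exact, approx)   # the two should agree to roughly 2-3 decimal places
```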

RANDOM SAMPLING AND SAMPLING DISTRIBUTIONS

A random sample $X = (X_1, X_2, \ldots, X_n)$ is a collection of independent and identically distributed random variables; the observed sample is $x = (x_1, x_2, \ldots, x_n)$

Probability (Density) Function $f(x; \theta)$, where θ denotes the parameter(s) of the distribution

Sample Mean $\bar X = \frac{\sum X_i}{n}$

Sample Variance $S^2 = \frac{\sum\left(X_i - \bar X\right)^2}{n - 1}$

Sampling Distributions for the Normal

Sample Mean $\frac{\bar X - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$, i.e. $\bar X \sim N\left(\mu, \sigma^2/n\right)$   Sample Variance $(n - 1)\frac{S^2}{\sigma^2} \sim \chi^2_{n-1}$

Student’s t-Distribution $t_k = \frac{N(0, 1)}{\sqrt{\chi^2_k/k}}$, so $\frac{\bar X - \mu}{\sqrt{S^2/n}} \sim t_{n-1}$

F-Distribution $F = \frac{U/v_1}{V/v_2}$, where U and V are independent χ² random variables with v₁ and v₂ degrees of freedom

$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$   $F \sim F_{n_1-1,\,n_2-1} \Leftrightarrow \frac{1}{F} \sim F_{n_2-1,\,n_1-1}$

ESTIMATION AND ESTIMATORS

The Method of Moments

To estimate one parameter, use the first moment.

To estimate two parameters, use the first and second moments.

Complete Data $E[X] = \frac{\sum x_i}{n}$   $E\left[X^2\right] = \frac{\sum x_i^2}{n}$

Maximum Likelihood Maximize the likelihood function L. The value(s) of the parameter(s) that maximize L are called the maximum likelihood estimate(s).

For distributions that belong to the exponential family:

1. Determine L(θ)

2. Apply the natural logarithm to obtain $\ell(\theta) = \log L(\theta)$

3. Take the first derivative with respect to the parameter to obtain $\ell'(\theta)$

4. Set $\ell'(\theta) = 0$ and solve for $\hat\theta$, which is the MLE

Complete Samples $L(\theta) = \prod_{i=1}^n f(x_i; \theta)$ for $x_1, x_2, \ldots, x_n$ from a population with density or probability function $f(x; \theta)$
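A sketch of numerical maximum likelihood for an exponential(λ) sample, mirroring steps 1–4 above; the closed-form MLE λ̂ = 1/x̄ is shown for comparison (simulated data, assumed parameters):

```python
# Numerical MLE by direct maximisation of l(lambda) for exponential data.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=500)        # true lambda = 0.5

def neg_loglik(lam):
    # l(lam) = n log(lam) - lam * sum(x); return its negative to minimise
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10), method="bounded")
print(res.x, 1 / x.mean())                      # numerical MLE vs closed form
```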


Properties of Maximum Likelihood Estimation (MLE)

1. Invariance property: if $\hat\theta$ is the MLE of θ, then the MLE of a function g(θ) is $g(\hat\theta)$

2. Consistency: the estimator approaches the true value as the sample size increases

3. Asymptotic normality: as the sample size increases, the distribution of the MLE converges to a normal distribution

4. Efficiency: the MLE achieves the Cramér–Rao lower bound as the sample size tends to infinity

Incomplete Samples

Censored Samples $L(\theta) = \left(\prod_{i=1}^n f(x_i; \theta)\right) \times (P(X > y))^m$ with n fully observed values $(x_1, \ldots, x_n)$ and m values known only to exceed the censoring point y

Truncated Samples $L(\theta) = \left(\prod_{i=1}^n f(x_i; \theta)\right) \big/ (P(X > z))^n$ with n observations $(x_1, \ldots, x_n)$ and no information about values below the truncation point z

Independent Samples For independent samples from two populations which share a common parameter, the overall likelihood is the product of the two separate likelihoods

Bias The bias of an estimator g(X) of θ is $\mathrm{Bias} = E[g(X)] - \theta$

An estimator is unbiased if Bias = 0

The mean square error of an estimator is $\mathrm{MSE}(g(X)) = E\left[(g(X) - \theta)^2\right] = \mathrm{Var}(g(X)) + \mathrm{Bias}^2$

Cramér–Rao Lower Bound $\mathrm{CRLB} = \dfrac{1}{-E\left[\frac{\partial^2}{\partial\theta^2}\log L(\theta, X)\right]}$

This is the lowest possible variance of an unbiased estimator $\hat\theta$

Alternative expressions $\mathrm{CRLB} = \dfrac{1}{E\left[\left(\frac{\partial}{\partial\theta}\log L(\theta, X)\right)^2\right]} = \dfrac{1}{n\,E\left[\left(\frac{\partial}{\partial\theta}\log f(X; \theta)\right)^2\right]}$
1
Non-parametric (full) Bootstrap Empirical distribution: $\hat F_n(y) = \frac{1}{n}\,\#\{y_i \le y\}$

1. Draw a sample of size n from $\hat F_n$. This is the bootstrap sample $(y_1^*, y_2^*, \ldots, y_n^*)$, with each $y^*$ selected with replacement from $(y_1, y_2, \ldots, y_n)$

2. Obtain an estimate $\hat\theta^*$ from the bootstrap sample

3. Repeat steps 1 and 2, say, B times

Empirical distribution of $\hat\theta^*$ An estimate of the sampling distribution of $\hat\theta$, referred to as the bootstrap empirical distribution of $\hat\theta$:

Sample 1: $(y_1^*, \ldots, y_n^*) \to \hat\theta_1^*$;  Sample 2: $(y_1^*, \ldots, y_n^*) \to \hat\theta_2^*$;  …;  Sample B: $(y_1^*, \ldots, y_n^*) \to \hat\theta_B^*$  →  bootstrap empirical distribution of $\hat\theta$

Mean: $\hat E(\hat\theta) = \frac{1}{B}\sum_{j=1}^B \hat\theta_j^*$   Variance: $\widehat{\mathrm{Var}}(\hat\theta) = \frac{1}{B - 1}\sum_{j=1}^B\left(\hat\theta_j^* - \frac{1}{B}\sum_{j=1}^B \hat\theta_j^*\right)^2$

100(1 − α)% confidence interval: $\left(k_{\alpha/2},\ k_{1-\alpha/2}\right)$, where $k_\alpha$ denotes the α-th empirical quantile of the bootstrap values $\hat\theta^*$

Parametric Bootstrap First estimate the parameter(s) of the data-generating process, then simulate new samples by drawing from this estimated (fitted) distribution
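A minimal non-parametric bootstrap sketch for the sample mean, following steps 1–3 and the quantile confidence interval above (simulated data for illustration):

```python
# Non-parametric bootstrap of the sample mean.
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=3.0, size=200)    # observed data (illustrative)
B = 5000

theta_star = np.array([
    rng.choice(y, size=len(y), replace=True).mean()   # resample, re-estimate
    for _ in range(B)
])

print(theta_star.mean(), theta_star.var(ddof=1))   # bootstrap mean and variance
print(np.quantile(theta_star, [0.025, 0.975]))     # 95% bootstrap CI (k_0.025, k_0.975)
```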


HYPOTHESIS TESTING

A 100(1 − α)% confidence interval for θ is a pair of statistics $\left(\hat\theta_1(X), \hat\theta_2(X)\right)$ depending on the sample $X = (X_1, \ldots, X_n)$ such that $P\left(\hat\theta_1(X) < \theta < \hat\theta_2(X)\right) = 1 - \alpha$

The Pivotal Method

A pivotal quantity of the form g(X, θ): 1. It is a function of the sample values and the unknown parameter θ. 2. Its distribution is completely known. 3. It is monotonic in θ.

$N(\mu, \sigma^2)$ with known σ² Pivot: $\frac{\bar x - \mu}{\sigma/\sqrt n}$   Confidence interval: $\bar x \pm z_{\alpha/2}\frac{\sigma}{\sqrt n}$

$N(\mu, \sigma^2)$ with unknown σ² Pivot: $\frac{\bar X - \mu}{S/\sqrt n} \sim t_{n-1}$   Confidence interval: $\bar X \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt n}$

Estimation of normal variance σ² Pivot: $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$   Confidence interval: $\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$

Prediction of a new observation $X_{n+1}$, with unknown σ² Pivot: $\frac{\bar X - X_{n+1}}{S\sqrt{1 + 1/n}} \sim t_{n-1}$   Prediction interval: $\bar X \pm t_{\alpha/2,\,n-1}\,S\sqrt{1 + 1/n}$

Binomial distribution with $\hat p = \frac{X}{n}$ Pivot: $\frac{X - np}{\sqrt{np(1 - p)}}$   Confidence interval: $\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}$

Poisson distribution Pivot: $\frac{\bar X - \lambda}{\sqrt{\lambda/n}}$   Confidence interval: $\bar X \pm z_{\alpha/2}\sqrt{\frac{\bar X}{n}}$

Two Normal means with known σ₁² and σ₂² Confidence interval: $\left(\bar X_1 - \bar X_2\right) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Two Normal means with unknown (but equal) σ₁² and σ₂²

Confidence interval: $\left(\bar X_1 - \bar X_2\right) \pm t_{\alpha/2,\,n_1+n_2-2}\cdot S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

Two population variances Pivot: $\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$

Confidence interval: $\frac{S_1^2}{S_2^2}\cdot\frac{1}{F_{\alpha/2;\,n_1-1,\,n_2-1}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\cdot F_{\alpha/2;\,n_2-1,\,n_1-1}$

Two population proportions Pivot: $\frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}}$

Two Poisson parameters $\bar X_1 - \bar X_2 \approx N\left(\lambda_1 - \lambda_2,\ \frac{\lambda_1}{n_1} + \frac{\lambda_2}{n_2}\right)$

Confidence interval: $\left(\bar X_1 - \bar X_2\right) \pm z_{\alpha/2}\sqrt{\frac{\bar X_1}{n_1} + \frac{\bar X_2}{n_2}}$

Paired data Pivot: $\frac{\bar D - \mu_D}{S_D/\sqrt n} \sim t_{n-1}$   Confidence interval: $\bar D \pm t_{\alpha/2,\,n-1}\frac{S_D}{\sqrt n}$
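As one worked instance of the pivotal method (illustrative data), the t-based interval for a normal mean with unknown σ²:

```python
# 95% confidence interval x_bar +/- t_{alpha/2, n-1} * s / sqrt(n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=10, scale=3, size=25)

n, alpha = len(x), 0.05
half = stats.t.ppf(1 - alpha / 2, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
print(x.mean() - half, x.mean() + half)
```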

Null hypothesis H0

Alternative hypothesis H1


The following are the same • Probability of Type I error • Size of critical region • Significance level • α

Terminology

                 Accept H0                     Reject H0
H0 true          1 − α                         α = Pr(Type I error)
H1 true          β = Pr(Type II error)         1 − β = power of the test

Type I error The error committed when a TRUE null hypothesis is rejected

Type II error The error committed when a FALSE null hypothesis is not rejected

Sensitivity The probability that an event that does occur is predicted to occur

Specificity The probability that an event that does not occur is predicted to not occur

Neyman–Pearson Lemma The test whose rejection region C satisfies $\frac{L(\theta_0)}{L(\theta_1)} \le k$ for all values $(x_1, \ldots, x_n) \in C$ and $\frac{L(\theta_0)}{L(\theta_1)} > k$ for all values $(x_1, \ldots, x_n) \notin C$ is the most powerful test of its size; where this holds for every alternative, C constitutes a uniformly most powerful test

Likelihood Ratio Tests Mean µ → H₀: µ = µ₀   Test statistic: $\frac{\bar X - \mu_0}{S/\sqrt n} \sim t_{n-1}$

Variance σ² → H₀: σ² = σ₀²   Test statistic: $\frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$

p-values The lowest significance level at which H0 can be rejected


Population Mean H₀: µ = µ₀   Test statistic: $\frac{\bar X - \mu_0}{\sigma/\sqrt n} \sim N(0, 1)$ under H₀ with σ known

Test statistic: $\frac{\bar X - \mu_0}{S/\sqrt n} \sim t_{n-1}$ under H₀ with σ unknown

Population Variance H₀: σ² = σ₀²   Test statistic: $\frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2_{n-1}$ under H₀

Population Proportion H₀: p = p₀   Test statistic: $X \sim \mathrm{binomial}(n, p_0)$ under H₀

Mean of a Poisson Distribution H₀: λ = λ₀   Test statistic: $\frac{\bar X - \lambda_0}{\sqrt{\lambda_0/n}} \sim N(0, 1)$ under H₀

Difference Between Two Population Means H₀: µ₁ − µ₂ = δ

Test statistic: $z = \frac{\bar x_1 - \bar x_2 - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ with σ₁², σ₂² known

Test statistic: $t = \frac{\bar x_1 - \bar x_2 - \delta}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ with σ₁², σ₂² unknown, where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$

Ratio of Two Population Variances H₀: σ₁² = σ₂² v H₁: σ₁² ≠ σ₂²   Test statistic: $S_1^2/S_2^2 \sim F_{n_1-1,\,n_2-1}$ under H₀

Difference Between Two Population Proportions H₀: p₁ = p₂   Test statistic: $\frac{\hat p_1 - \hat p_2}{\sqrt{\frac{\hat p(1 - \hat p)}{n_1} + \frac{\hat p(1 - \hat p)}{n_2}}} \sim N(0, 1)$ under H₀, where $\hat p$ is the pooled proportion

Difference Between Two Poisson Means H₀: λ₁ = λ₂   Test statistic: $\frac{\hat\lambda_1 - \hat\lambda_2}{\sqrt{\frac{\hat\lambda_1}{n_1} + \frac{\hat\lambda_2}{n_2}}} \sim N(0, 1)$ under H₀

Paired Data H₀: µ_D (= µ₁ − µ₂) = δ   Test statistic: $\frac{\bar D - \delta}{S_D/\sqrt n} \sim t_{n-1}$ under H₀

Permutation Approach All possible permutations of the data subject to some criterion

Chi-square Tests Test statistic: $\sum \frac{(f_i - e_i)^2}{e_i}$

Contingency Table A two-way table of counts obtained when sample items are classified according to two category variables

The proportion of data in row i is $\sum_j f_{ij} \big/ \sum_i\sum_j f_{ij}$

The number expected in cell (i, j) is $\left(\sum_j f_{ij}\right)\left(\sum_i f_{ij}\right) \big/ \sum_i\sum_j f_{ij}$ (row total × column total / grand total)

Copyright © ArchiMedia Advantage Inc.


www.ACTEXLearning.com Need Help? Email [email protected] All Rights Reserved
ACTEX Learning Exam CS1 Formula & Review Sheet 11

  
Fisher’s Exact Test $P(n_{X_1Y_1}) = \dfrac{\binom{n_{X_1}}{n_{X_1Y_1}}\binom{n_{X_2}}{n_{Y_1} - n_{X_1Y_1}}}{\binom{n}{n_{Y_1}}}$ for $n_{X_1Y_1} \le n_{X_1}, n_{Y_1}$
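Both tests are available in scipy; a sketch on an illustrative 2 × 2 table (correction=False so the statistic matches Σ(fᵢ − eᵢ)²/eᵢ exactly):

```python
# Chi-square and Fisher's exact tests on a 2x2 contingency table of counts.
import numpy as np
from scipy import stats

table = np.array([[12, 5],
                  [8, 15]])

chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)
odds_ratio, p_fisher = stats.fisher_exact(table)

print(chi2, p_chi2, dof)   # chi-square statistic, p-value, degrees of freedom
print(expected)            # expected counts: row total * col total / grand total
print(p_fisher)            # exact p-value, preferred for small samples
```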

EXPLORATORY DATA ANALYSIS

Exploratory Data Analysis The process of analysing data to gain further insight into the nature of the data

Pearson Correlation Coefficient $r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$

Sums of Squares $S_{xy} = \sum_{i=1}^n (x_i - \bar x)(y_i - \bar y) = \sum_{i=1}^n x_iy_i - \frac{1}{n}\left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)$

$S_{xx} = \sum_{i=1}^n (x_i - \bar x)^2 = \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2\big/n$

$S_{yy} = \sum_{i=1}^n (y_i - \bar y)^2 = \sum_{i=1}^n y_i^2 - \left(\sum_{i=1}^n y_i\right)^2\big/n$

Sample variances of x and y: $s_x^2 = \frac{S_{xx}}{n - 1}$, $s_y^2 = \frac{S_{yy}}{n - 1}$   Sample covariance: $cv_{xy} = \frac{S_{xy}}{n - 1}$

Spearman’s Rank Correlation Coefficient $r_s = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)}$

where $d_i = r(X_i) - r(Y_i)$ is the difference between the ranks $r(X_i)$ and $r(Y_i)$; equivalently, $r_s$ is the Pearson correlation coefficient computed on the ranks

Kendall Rank Correlation Coefficient $\tau = \frac{n_c - n_d}{n(n - 1)/2}$, where $n_c$ is the number of concordant pairs and $n_d$ is the number of discordant pairs

Any pair of observations $(X_i, Y_i)$, $(X_j, Y_j)$, where i ≠ j, is concordant if the orderings of both elements agree, i.e. $X_i > X_j$ and $Y_i > Y_j$, or $X_i < X_j$ and $Y_i < Y_j$; otherwise it is discordant
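The three coefficients on the same paired sample (illustrative data; scipy's kendalltau computes the tie-adjusted tau-b, which reduces to the formula above when there are no ties):

```python
# Pearson, Spearman and Kendall coefficients for one paired sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)

print(stats.pearsonr(x, y))    # r = S_xy / sqrt(S_xx S_yy), with p-value
print(stats.spearmanr(x, y))   # rank correlation r_s
print(stats.kendalltau(x, y))  # tau = (n_c - n_d) / (n(n-1)/2)
```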

Scatter Plot Matrix Each entry of this matrix is a scatter plot for a pair of variables identified by
corresponding row and column labels

Principal Component Analysis (PCA)

Eigenvalues of matrix A The values λ such that det(A − λI) = 0, where I is the identity matrix. The corresponding eigenvector v of an eigenvalue λ satisfies $(A - \lambda I)v = 0$

Covariance of the (centred) data: $X^TX$

The principal components decomposition P of X: $P = XW$, where the columns of W are the eigenvectors of $X^TX$

The explanatory power of each component: $S = P^TP$
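A minimal PCA sketch via the eigen-decomposition of XᵀX on centred data, matching the P = XW and S = PᵀP relations above (simulated data):

```python
# PCA via eigen-decomposition of X^T X on centred data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                 # centre each column

eigval, W = np.linalg.eigh(X.T @ X)    # eigenvectors of X^T X (ascending order)
W = W[:, ::-1]                         # largest-variance components first

P = X @ W                              # principal components decomposition
S = P.T @ P                            # diagonal = explanatory power of each PC
print(np.round(S, 6))                  # off-diagonals ~ 0; diagonal = eigenvalues
```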

LINEAR REGRESSION

Bivariate Model $Y_i = \alpha + \beta x_i + e_i$, $i = 1, 2, \ldots, n$   $E[Y|x] = \alpha + \beta x$

α (intercept) and β (slope parameter) are regression coefficients; $e_i$ is the random error term; the $e_i$ are independent with $E[e_i] = 0$ and $\mathrm{Var}(e_i) = \sigma^2$

Fitted Regression $\hat y = \hat\alpha + \hat\beta x$

$\hat\beta = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{S_{xy}}{S_{xx}}$ and $\hat\alpha = \bar y - \hat\beta\bar x$
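A direct numerical translation of the S_xy/S_xx formulas (simulated data with assumed true coefficients):

```python
# Least-squares fit of the bivariate model using S_xy / S_xx.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 50)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=50)   # true alpha=1.5, beta=0.8

Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)

beta_hat = Sxy / Sxx
alpha_hat = y.mean() - beta_hat * x.mean()
print(alpha_hat, beta_hat)     # estimates should be near (1.5, 0.8)
```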


Statistic $\hat B$: $E[\hat B] = \beta$   $\mathrm{Var}[\hat B] = \frac{\sigma^2}{S_{xx}}$

Error variance σ²: $\hat\sigma^2 = \frac{1}{n - 2}\sum (y_i - \hat y_i)^2 = \frac{\sum_{i=1}^n e_i^2}{n - 2}$

Partition of the Sum of Squares

$\underbrace{\sum_{i=1}^n (y_i - \bar y)^2}_{\text{Total sum of squares}} = \underbrace{\sum_{i=1}^n (y_i - \hat y_i)^2}_{\text{Residual sum of squares}} + \underbrace{\sum_{i=1}^n (\hat y_i - \bar y)^2}_{\text{Regression sum of squares}}$

since the cross term vanishes: $2\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)(\hat\beta_0 + \hat\beta_1 x_i - \bar y) = 0$

Total Sum of Squares $SS_{TOT}$: the amount of variability inherent in the response prior to performing regression

Residual Sum of Squares $SS_{RES}$: variation unexplained by the linear regression model

Regression Sum of Squares $SS_{REG}$: variation explained by the linear regression model

Coefficient of Determination The proportion of the total variability of the responses ‘explained’ by a model:

$R^2 = \frac{SS_{REG}}{SS_{TOT}} = 1 - \frac{SS_{RES}}{SS_{TOT}} = \frac{S_{xy}^2}{S_{xx}S_{yy}}$

Adding more variables to the model always increases R².

$R^2 = r^2$ (the square of the Pearson correlation coefficient) in simple linear regression

Adjusted R² $\text{Adjusted } R^2 = 1 - \frac{MSS_{RES}}{MSS_{TOT}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}$

ANOVA Table

Source of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Regression             k                     SS_REG            SS_REG / k
Residual               n − k − 1             SS_RES            SS_RES / (n − k − 1)
Total                  n − 1                 SS_TOT


Mean Response $\mu_o = E[Y|x_o] = \alpha + \beta x_o$   $\mathrm{Var}(\hat\mu_o) = \sigma^2\left(\frac{1}{n} + \frac{(x_o - \bar x)^2}{S_{xx}}\right)$

$SE(\hat\mu_o) = \sqrt{\hat\sigma^2\left(\frac{1}{n} + \frac{(x_o - \bar x)^2}{S_{xx}}\right)}$

Individual Response $SE(\hat y_0) = \sqrt{\hat\sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}\right)}$

Residual (Raw Residual) $e_i = y_i - \hat y_i$ = observed value − fitted value

Sum-to-zero Constraints on Residuals $\sum_{i=1}^n e_i = 0$, $\sum_{i=1}^n x_ie_i = 0$

Multivariate Model $E[Y|x_1, x_2, \ldots, x_k] = \alpha + \beta_1x_1 + \beta_2x_2 + \ldots + \beta_kx_k$

$\beta_j$ is the regression coefficient attached to the j-th predictor, for $j = 1, \ldots, k$

$Y_i = \alpha + \beta_1x_{i1} + \beta_2x_{i2} + \ldots + \beta_kx_{ik} + e_i$, $i = 1, \ldots, n$

Mean Response $\mu_0 = E[Y|x_0] = \alpha + \beta_1x_{01} + \beta_2x_{02} + \cdots + \beta_kx_{0k}$

Individual Response $\hat y_0 = \hat\alpha + \hat\beta_1x_{01} + \hat\beta_2x_{02} + \cdots + \hat\beta_kx_{0k}$


Forward Stepwise Selection There is a total of $1 + \frac{k(k + 1)}{2}$ fitted models

1. Start with the model having an intercept only

2. Create (p + 1)-predictor models by fitting a model with the current p predictors plus one of the k − p unused predictors

3. Select the best (p + 1)-predictor model based on SS_RES or R²

4. If p + 1 < k, repeat steps 2–3 with the (p + 1)-predictor model

5. Select the best model from the various models based on adjusted R² or AIC

Nested model the predictors in the p-predictor model are always a subset of the predictors in the (p+1)-
predictor model
Backward Stepwise Selection There is a total of $1 + \frac{k(k + 1)}{2}$ fitted models

1. Start with the full model

2. Create (p − 1)-predictor models by fitting a model removing one of the current p predictors

3. Select the best (p − 1)-predictor model based on SS_RES or R²

4. If p − 1 > 1, repeat steps 2–3 with the (p − 1)-predictor model

5. Select the best model from the various models based on adjusted R² or AIC

Backward selection cannot be implemented in the high-dimensional setting with n ≤ k

Polynomial Regression: $Y = \alpha + \beta_1x + \beta_2x^2 + \cdots + \beta_mx^m + \varepsilon$

m = 2 → quadratic regression; m = 3 → cubic regression

Regression with Interaction Term: $Y = \alpha + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 + \varepsilon$

$x_1x_2$ is called an interaction term

GENERALISED LINEAR MODELS


 
Exponential Family of Distributions $f_Y(y; \theta, \varphi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right\}$

• φ is the scale parameter. • θ is the ‘natural’ parameter

Mean and Variance $\mu = E[Y] = b'(\theta)$ and $\mathrm{Var}(Y) = a(\varphi)\,b''(\theta)$

Distribution            θ                    b(θ)              φ
Normal, N(µ, σ²)        µ                    θ²/2              σ²
Poisson(λ)              log λ                e^θ               1
Binomial, Bin(n, µ)     log(µ/(1 − µ))       log(1 + e^θ)      n
Gamma                   −1/µ                 −log(−θ)          α
Members of the Exponential Family in Canonical Form

Distribution    Canonical Link Function    Mathematical Form
Normal          Identity                   g(µ) = µ
Poisson         Log                        g(µ) = log µ
Binomial        Logit                      g(µ) = log(µ/(1 − µ))
Gamma           Inverse                    g(µ) = 1/µ
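A hedged sketch of fitting a GLM with a canonical link in statsmodels; the data and coefficients are simulated for illustration only:

```python
# Fit a Poisson GLM with the canonical log link.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.uniform(0, 2, size=300)
y = rng.poisson(np.exp(0.5 + 0.9 * x))          # true eta = 0.5 + 0.9x

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

print(fit.params)      # estimates of (alpha, beta)
print(fit.deviance)    # deviance of the current model
print(fit.aic)         # -2 log L + 2 x number of parameters
```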


Obtaining the estimates: maximise the log-likelihood l with respect to the parameters in the linear predictor

Significance of the Parameters If $|\hat\beta| > 2 \times \text{standard error}(\hat\beta)$, the parameter is significant and should be retained in the model

Deviance for Current Model $D_M$

Scaled Deviance A goodness-of-fit measure of how much the fitted GLM departs from the saturated model: scaled deviance $= 2(l_{SAT} - l) = \frac{D_M}{\varphi}$

Saturated Model The fitted values exactly equal the observed values, $\hat\mu_i = y_i$ for all $i = 1, \ldots, n$, under the saturated model

Scaled Deviance Comparison If $\frac{(S_1 - S_2)/q}{S_2/(n - (p + q))}$ exceeds the 5% value of the $F_{q,\,n-p-q}$ distribution, Model 2 is a significant improvement over Model 1, where Model 1 has p parameters and scaled deviance S₁, and Model 2 has p + q parameters and scaled deviance S₂

Akaike Information Criterion $AIC = -2\log L_M + 2 \times \text{number of parameters}$, where $\log L_M$ is the log-likelihood of the model under consideration; the smaller the AIC, the better the model

Inverse of the Link Function $\mu = g^{-1}(\eta) = g^{-1}(\beta_0 + \beta_1x_1 + \cdots + \beta_px_p)$ for $\eta = x'\beta$

Pearson residuals $\frac{y - \hat\mu}{\sqrt{\widehat{\mathrm{Var}}(\hat\mu)}}$:   $\frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$ (Poisson)   $\frac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i(1 - \hat\mu_i)}}$ (Bernoulli)

Deviance Residuals $\mathrm{sign}(y_i - \hat\mu_i)\,d_i$, where $\sum_{i=1}^n d_i^2 = D^*$

Residual Plots Plot the residuals (e) against the fitted values ($\hat y$) of the regression model

BAYESIAN STATISTICS
Bayes’ Theorem $P(B_r|A) = \frac{P(A|B_r)\,P(B_r)}{P(A)}$, where $P(A) = \sum_{i=1}^k P(A|B_i)\,P(B_i)$, for $r = 1, 2, \ldots, k$

Prior Density $f_\Theta(\theta)$ (continuous)   $p_\Theta(\theta)$ (discrete)

Posterior Density $f(\theta \mid X) = \frac{f(\theta, X)}{f(X)} = \frac{f(X \mid \theta)\,f(\theta)}{f(X)}$, where $f(X) = \int f(X \mid \theta)\,f(\theta)\,d\theta$

Conjugate Prior The prior distribution leads to a posterior distribution belonging to the same family
as the prior distribution
2
Quadratic Loss $L(g(x), \theta) = [g(x) - \theta]^2$ (minimised by the posterior mean)

Absolute Error Loss $L(g(x), \theta) = |g(x) - \theta|$ (minimised by the posterior median)

‘All-or-nothing’ Loss $L(g(x), \theta) = 0$ if $g(x) = \theta$, and $1$ if $g(x) \ne \theta$ (minimised by the posterior mode)

Bayesian Credible Interval $P(\theta \in A \mid x) = \int_A f(\theta \mid x)\,d\theta = 1 - \alpha$
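A short conjugate-prior illustration (assumed prior and data): a Poisson likelihood with a Gamma(α, β) prior gives a Gamma(α + Σxᵢ, β + n) posterior, as used in the Credibility section below:

```python
# Poisson-Gamma conjugate update, posterior mean, and credible interval.
import numpy as np
from scipy import stats

alpha, beta = 2.0, 1.0              # prior Gamma(alpha, beta), rate parametrisation
x = np.array([3, 1, 4, 2, 2])       # observed Poisson counts (illustrative)

alpha_post = alpha + x.sum()
beta_post = beta + len(x)

posterior = stats.gamma(a=alpha_post, scale=1 / beta_post)
print(posterior.mean())             # posterior mean (Bayes estimate, quadratic loss)
print(posterior.interval(0.95))     # 95% equal-tailed credible interval
```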

CREDIBILITY THEORY

For any random variables X and Y : E[X] = E[E(X | Y )]

Two random variables X1 and X2 are conditionally independent given a third random variable Y :

E [X1 X2 | Y ] = E [X1 | Y ] E [X2 | Y ]

Credibility Premium Z X̄ + (1 − Z) µ̂ where Z is credibility factor


Bayesian Credibility

Poisson(λ)/Gamma(α, β) Posterior: $\mathrm{Gamma}\left(\alpha + \sum_{i=1}^n x_i,\ \beta + n\right)$   Credibility factor: $Z = \frac{n}{\beta + n}$

Posterior mean: $Z\sum_{i=1}^n x_i\big/n + (1 - Z)\,\alpha/\beta$

Normal/Normal ($X \sim N(\theta, \sigma_1^2)$ with prior $\theta \sim N(\mu, \sigma_2^2)$) Posterior: $N\left(\dfrac{\frac{n\bar x}{\sigma_1^2} + \frac{\mu}{\sigma_2^2}}{\frac{n}{\sigma_1^2} + \frac{1}{\sigma_2^2}},\ \dfrac{1}{\frac{n}{\sigma_1^2} + \frac{1}{\sigma_2^2}}\right)$   Credibility factor: $Z = \frac{n/\sigma_1^2}{n/\sigma_1^2 + 1/\sigma_2^2} = \frac{n}{n + (\sigma_1^2/\sigma_2^2)}$

Posterior mean: $Z\bar x + (1 - Z)\mu$

Empirical Bayesian Credibility Theory

Model 1 $m(\theta) = E[X_j \mid \theta]$   $s^2(\theta) = \mathrm{Var}[X_j \mid \theta]$   $\bar X_i = \frac{\sum_{j=1}^n X_{ij}}{n}$

$E[m(\theta)] = \bar X$   $E[s^2(\theta)] = \frac{1}{N}\sum_{i=1}^N \frac{1}{n - 1}\sum_{j=1}^n \left(X_{ij} - \bar X_i\right)^2$

$\mathrm{Var}[m(\theta)] = \frac{1}{N - 1}\sum_{i=1}^N \left(\bar X_i - \bar X\right)^2 - \frac{1}{Nn}\cdot\frac{1}{n - 1}\sum_{i=1}^N\sum_{j=1}^n \left(X_{ij} - \bar X_i\right)^2$

Credibility factor $Z = \dfrac{n}{n + E[s^2(\theta)]/\mathrm{Var}[m(\theta)]}$

Credibility premium $Z\bar X_i + (1 - Z)\,E[m(\theta)]$
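A minimal numerical sketch of the Model 1 estimators for N risks over n years (simulated data; not part of the sheet):

```python
# EBCT Model 1: estimate E[m(theta)], E[s^2(theta)], Var[m(theta)], Z.
import numpy as np

rng = np.random.default_rng(2)
N, n = 5, 10
# Each risk i has its own (hypothetical) underlying mean, observed over n years
X = rng.normal(loc=rng.uniform(80, 120, size=(N, 1)), scale=10, size=(N, n))

Xbar_i = X.mean(axis=1)                    # per-risk means
Xbar = X.mean()                            # E[m(theta)] estimate
E_s2 = X.var(axis=1, ddof=1).mean()        # E[s^2(theta)] estimate
Var_m = Xbar_i.var(ddof=1) - E_s2 / n      # Var[m(theta)] estimate

Z = n / (n + E_s2 / Var_m)                 # credibility factor
premiums = Z * Xbar_i + (1 - Z) * Xbar     # credibility premiums
print(Z, premiums)
```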


P
n
Pj X j
j=1
Model 2 m(θ) = E [Xj | θ] s (θ) = Pj Var [Xj | θ]
2
X̄i = P
n
Pj
j=1
  P
N P
n 2
E[m(θ)] = X̄ E s2 (θ) = N −1 (n − 1)−1 Pij Xij − X̄i
i=1 j=1
!
∗−1 −1
N P
P n 2 −1
P
N
−1
P
n 2
Var[m(θ)] = P (N n − 1) Pij Xij − X̄ −N (n − 1) Pij Xij − X̄i
i=1 j=1 i=1 j=1

P
n P
n
Pj Pij
j=1 j=1
Credibility Factor Z= P
n Zi = P
n
Pj + E [s2 (θ)] /Var[m(θ)] Pij + E [s2 (θ)] / var[m(θ)]
j=1 j=1

Credibility Premium Z X̄i + (1 − Z) E[m(θ)]
