CS1 Formula Sheet
Inferential Analysis Using a smaller sample size to draw conclusions about a larger population
Predictive Analysis Making predictions or forecasts about future events based on past or historical data
Primary Source The data collected either from the source or through the original data collection
process
Secondary Source Information that has already been collected, analyzed, and published by others
Cross-Sectional Data Recording values of the variable(s) of interest for each case in the sample at a single
moment in time. It can be thought of as a snapshot of the data at a single moment
in time
Longitudinal Data Recording values of the SAME subjects at intervals through time
Truncated Data Measurements on some variables are not recorded and so are completely unknown
Reproducibility The ability to reproduce statistical analyses or models using the same data and
methodology as the original study
Leads to fewer errors that need correcting in the original work, and greater
efficiency
DISCRETE DISTRIBUTIONS
Geometric Distribution P(X = x) = p(1 − p)^{x−1},  x = 1, 2, . . . ;  0 < p < 1.   µ = 1/p   σ² = (1 − p)/p²
Negative Binomial Distribution with X as the number of trials on which the k-th success occurs, or Y
as the number of failures before the k-th success:
P(X = x) = \binom{x−1}{k−1} p^k (1 − p)^{x−k},  x = k, k + 1, . . . ;  0 < p < 1.   µ = k/p   σ² = k(1 − p)/p²
Recursive form: P(X = x) = ((x − 1)/(x − k)) (1 − p) P(X = x − 1)
P(Y = y) = \binom{k+y−1}{y} p^k (1 − p)^y,  y = 0, 1, 2, 3, . . .   µ = k(1 − p)/p
Hypergeometric Distribution
P(X = x) = \binom{K}{x} \binom{N−K}{n−x} / \binom{N}{n},  max(0, n − N + K) ≤ x ≤ min(n, K).
µ = nK/N   σ² = nK(N − K)(N − n) / (N²(N − 1))
CONTINUOUS DISTRIBUTIONS
Chi-square χ² distribution with 'degrees of freedom' as its parameter
If X ∼ χ² with degrees of freedom n1 and Y ∼ χ² with degrees of freedom n2, and X and
Y are independent, then (X/n1)/(Y/n2) ∼ F distribution with degrees of freedom n1 and n2.
POISSON PROCESS
Poisson Distribution X ∼ Poisson(λ) → P(X = x) = λ^x e^{−λ} / x!,  x = 0, 1, 2, . . . ;  λ > 0.   µ = λ,  σ² = λ
Sum If Xi ∼ Poisson(λi) independently, then X1 + · · · + Xn ∼ Poisson(λ1 + · · · + λn)
Counting Process X (t) is the number of events that occur at or before time t
X(t) − X(s) is independent of X(v) − X(u) for t > s ≥ v > u ≥ 0.
Linear Congruential Generator x_{n+1} = (a x_n + c) mod m
1. Choose a seed value x0
2. Calculate a x0 + c
3. Divide by m and take the remainder x1
4. Calculate the first uniform number u1 = x1/m
5. Repeat steps 2-4 using x1 to obtain the second remainder x2 and the second uniform
number u2 = x2/m. And so on ...
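A minimal Python sketch of this recursion; the multiplier a, increment c, modulus m and the seed below are illustrative choices, not values prescribed by the course.

    # Linear congruential generator: x_{n+1} = (a*x_n + c) mod m, u_n = x_n / m
    def lcg(seed, n, a=1103515245, c=12345, m=2**31):
        """Return n pseudo-uniform numbers u_i = x_i / m."""
        x = seed
        us = []
        for _ in range(n):
            x = (a * x + c) % m      # steps 2-3: multiply, add, take remainder mod m
            us.append(x / m)         # step 4: scale the remainder into (0, 1)
        return us

    print(lcg(seed=2024, n=5))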
GENERATING FUNCTIONS
Negative Binomial (k, p)   MX(t) = Σ_{x=k}^∞ \binom{x−1}{k−1} e^{tx} p^k q^{x−k} = (p e^t / (1 − q e^t))^k
(Including Geometric, for which k = 1)
Poisson (λ)   MX(t) = e^{−λ} Σ_{x=0}^∞ (λ e^t)^x / x! = e^{λ(e^t − 1)}
Gamma (α, λ)   MX(t) = (λ^α / Γ(α)) (1/(λ − t))^α ∫_0^∞ y^{α−1} e^{−y} dy = (λ/(λ − t))^α
Normal (µ, σ²)   MX(t) = exp{µt + σ²t²/2}
Cumulant Generating Function (CGF)   CX(t) = ln MX(t)
CGF Moments   CX′(t) = MX′(t) / MX(t)
CX′′(t) = [MX′′(t) MX(t) − (MX′(t))²] / (MX(t))²
CX′′′(t) = [MX′′′(t) (MX(t))³ − 3 (MX(t))² MX′(t) MX′′(t) + 2 MX(t) (MX′(t))³] / (MX(t))⁴
MX(0) = 1
CX′(0) = MX′(0)/MX(0) = E[X]
CX′′(0) = [MX′′(0) MX(0) − (MX′(0))²] / (MX(0))² = (E[X²](1) − (E[X])²)/1² = Var[X]
CX′′′(0) = [MX′′′(0) (MX(0))³ − 3 (MX(0))² MX′(0) MX′′(0) + 2 MX(0) (MX′(0))³] / (MX(0))⁴
= E[X³] − 3 E[X] E[X²] + 2 (E[X])³ = skew(X)
Cumulants The coefficient of t^r/r! in the Maclaurin series of CX(t) = ln MX(t) is called
the r-th cumulant and is denoted by κr
Linear function Y = a + bX   MY(t) = E[e^{tY}] = E[e^{t(a+bX)}] = e^{at} E[e^{btX}] = e^{at} MX(bt)
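The cumulant results above can be cross-checked by symbolic differentiation; a sketch using sympy (an aid assumed here, not part of the sheet) for the Poisson MGF, whose first three cumulants all equal λ:

    import sympy as sp

    t, lam = sp.symbols('t lambda', positive=True)
    M = sp.exp(lam * (sp.exp(t) - 1))   # Poisson MGF
    C = sp.log(M)                       # cumulant generating function

    # r-th cumulant = r-th derivative of C at t = 0
    for r, name in [(1, 'mean'), (2, 'variance'), (3, 'skewness')]:
        print(name, sp.simplify(sp.diff(C, t, r).subs(t, 0)))
    # all three print lambda, as the table predicts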
JOINT DISTRIBUTIONS
Conditional density f(y|x) = fX,Y(x, y)/fX(x)   Independence: fX,Y(x, y) = fX(x) fY(y), i.e. f(y|x) = fY(y)
Discrete P(X = x, Y = y) = P(X = x) P(Y = y)
Expectations
Discrete E[g(X, Y)] = Σ_x Σ_y g(x, y) pX,Y(x, y) = Σ_x Σ_y g(x, y) P(X = x, Y = y)
Continuous E[g(X, Y)] = ∫_x ∫_y g(x, y) fX,Y(x, y) dy dx
Variance of a linear combination Y = Σ_i ci Xi:
V(Y) = Cov(Y, Y) = Σ_i ci² Var(Xi) + 2 Σ_{i<j} ci cj Cov(Xi, Xj)
X1 and X2 are independent random variables with MGFs MX1(t) and MX2(t) and S = c1X1 + c2X2:
MS(t) = E[e^{(c1X1 + c2X2)t}] = E[e^{c1X1t}] E[e^{c2X2t}] = MX1(c1t) MX2(c2t)
Sum of i.i.d. variables Y = X1 + X2 + . . . + Xn:   MY(t) = [M(t)]^n
Bernoulli/Binomial [q + pe^t]^n where Y = X1 + X2 + . . . + Xn
Poisson exp{(λ + γ)(e^t − 1)} where Y = X + Z,
X has MGF MX(t) = exp{λ(e^t − 1)}, Z has MGF MZ(t) = exp{γ(e^t − 1)}
Exponential/ [λ(λ − t)^{−1}]^k where Y = X1 + X2 + . . . + Xk with Xi, i = 1, 2, . . . , k,
Gamma independent exponential(λ) variables, each with MGF M(t) = λ(λ − t)^{−1}
Normal exp{(µX + µY)t + ½(σX² + σY²)t²} where Z = X + Y,
with X having MGF MX(t) = exp{µXt + ½σX²t²} and Y having MGF MY(t) = exp{µYt + ½σY²t²}
Chi-square The sum of a chi-square (n) and an independent chi-square (m) is a chi-square (n + m) variable;
more generally, the sum of independent chi-square variables is a chi-square variable
CONDITIONAL EXPECTATION
E[X] = E[E[X | Y]]
Var[X] = E[Var(X | Y)] + Var(E[X | Y])
Central Limit Suppose X1, X2, · · · , Xn are n independent, identically distributed random variables
Theorem with mean µ and variance σ²;
then the distribution of (X̄ − µ)/(σ/√n) approaches the standard normal distribution, N(0, 1), as n → ∞
Approximately: X̄ ∼ N(µ, σ²/n) and Σ Xi ∼ N(nµ, nσ²)
Poisson Distribution Poisson(nλ) ≈ N(nλ, nλ) for large n   Poisson(λ) ≈ N(λ, λ) for large λ
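A short simulation illustrating both approximations; the rate, sample size and replication count below are arbitrary illustrative choices.

    import numpy as np

    rng = np.random.default_rng(1)
    lam, n, reps = 4.0, 50, 10000

    # CLT: standardised sample means of Poisson(lam) draws are approximately N(0, 1)
    xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
    z = (xbar - lam) / np.sqrt(lam / n)
    print(z.mean(), z.std())          # close to 0 and 1

    # Normal approximation: Poisson(n*lam) is approximately N(n*lam, n*lam)
    total = rng.poisson(n * lam, size=reps)
    print(total.mean(), total.var())  # both close to n*lam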
A random sample, X = (X1, X2, . . . , Xn), is a collection of independent and identically distributed random variables,
each with probability (density) function f(x; θ), where θ denotes the parameter(s) of the distribution
Sample Mean X̄ = Σ_{i=1}^n Xi / n
Sample Variance S² = Σ_{i=1}^n (Xi − X̄)² / (n − 1)
Sampling Distributions for the Normal
Sample Mean (X̄ − µ)/√(σ²/n) ∼ N(0, 1), i.e. X̄ ∼ N(µ, σ²/n)   Sample Variance (n − 1)S²/σ² ∼ χ²_{n−1}
Student's t-Distribution t_k = N(0, 1)/√(χ²_k/k) → (X̄ − µ)/√(S²/n) ∼ t_{n−1}
F-Distribution F = (U/v1)/(V/v2), where U and V are independent χ² random variables with v1 and v2 degrees of freedom
(S2²/σ2²)/(S1²/σ1²) ∼ F_{n2−1,n1−1}   F ∼ F_{n1−1,n2−1} ⇔ 1/F ∼ F_{n2−1,n1−1}
The value(s) of parameter(s) that maximizes L is called the maximum likelihood estimate(s)
1. Determine L(θ)
2. Take logarithms to obtain l(θ) = ln L(θ)
3. Take the first derivative with respect to the parameter, obtain l′(θ)
4. Set l′(θ) = 0 and solve for the estimate θ̂ (checking that it is a maximum)
1. Invariance Property: if θ̂ is the MLE of θ, then the MLE of a function g(θ) is g(θ̂)
2. Consistency: estimators approach the true value as the sample size increases
3. Asymptotic normality: the MLE is asymptotically normally distributed
4. Efficiency: the MLE achieves the Cramér-Rao lower bound as the sample size tends to infinity
Incomplete Samples
Censored Samples L(θ) = (∏_{i=1}^n f(xi, θ)) × (P(X > y))^m with n observed values (x1, . . . , xn) and m
observations known only to exceed y
Truncated Samples L(θ) = (∏_{i=1}^n f(xi, θ)) / (P(X > z))^n with n observations (x1, . . . , xn) and no information
about samples under z (each observation's density is conditioned on X > z)
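A sketch of maximising a censored-sample log-likelihood numerically, assuming exponential(λ) observations right-censored at y; the data, and the use of scipy, are illustrative assumptions. The closed-form MLE here is n/(Σxi + my), which the numerical answer should match.

    import numpy as np
    from scipy.optimize import minimize_scalar

    x = np.array([0.8, 1.1, 0.3, 2.4, 0.9])   # fully observed values
    m, y = 3, 3.0                              # m observations known only to exceed y

    def neg_loglik(lam):
        # L = prod f(x_i) * P(X > y)^m, with f(x) = lam*exp(-lam*x) and P(X > y) = exp(-lam*y)
        return -(len(x) * np.log(lam) - lam * x.sum() - lam * m * y)

    res = minimize_scalar(neg_loglik, bounds=(1e-6, 10), method='bounded')
    print(res.x, len(x) / (x.sum() + m * y))   # numerical and closed-form answers agree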
Independent For independent samples from two populations which share a common parameter, the overall
Samples likelihood is the product of the two separate likelihoods
Parametric Bootstrap
First estimate the parameters of the data-generating process, then simulate new values by
drawing from this estimated distribution
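A minimal parametric-bootstrap sketch, assuming normally distributed data and taking the sample variance as the statistic of interest; all values are illustrative.

    import numpy as np

    rng = np.random.default_rng(7)
    data = rng.normal(10, 2, size=30)

    # Step 1: estimate the parameters of the assumed data-generating process
    mu_hat, sigma_hat = data.mean(), data.std(ddof=1)

    # Step 2: simulate new samples from the fitted distribution and recompute the statistic
    boot = [rng.normal(mu_hat, sigma_hat, size=data.size).var(ddof=1)
            for _ in range(5000)]
    print(np.percentile(boot, [2.5, 97.5]))   # bootstrap interval for the variance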
HYPOTHESIS TESTING
100(1 − α)% confidence interval for θ: (θ̂1(X), θ̂2(X)) depending on the sample X = (X1, . . . , Xn) such that
P(θ̂1(X) < θ < θ̂2(X)) = 1 − α
Pivotal quantity of the form g(X, θ) 1. It is a function of the sample values and the unknown parameter θ
2. Its distribution is completely known (it does not depend on θ)
3. It is monotonic in θ
N(µ, σ²) with known σ²   (x̄ − µ)/(σ/√n) ∼ N(0, 1)   Confidence interval: x̄ ± z_{α/2} σ/√n
N(µ, σ²) with unknown σ²   (X̄ − µ)/(S/√n) ∼ t_{n−1}   Confidence interval: X̄ ± t_{α/2,n−1} S/√n
Estimation of normal variance σ²   (n − 1)S²/σ² ∼ χ²_{n−1}
Confidence interval: ((n − 1)S²/χ²_{α/2,n−1}, (n − 1)S²/χ²_{1−α/2,n−1})
N(Xn+1, σ²) with unknown σ²   (X̄ − Xn+1)/(S√(1 + 1/n)) ∼ t_{n−1}   Prediction interval: X̄ ± t_{α/2,n−1} S√(1 + 1/n)
Binomial distribution with p̂ = X/n   (X − np)/√(np(1 − p)) ≈ N(0, 1)   Confidence interval: p̂ ± z_{α/2} √(p̂(1 − p̂)/n)
Poisson distribution   (X̄ − λ)/√(λ/n) ≈ N(0, 1)   Confidence interval: X̄ ± z_{α/2} √(X̄/n)
Two Normal means with known σ1² and σ2²   Confidence interval: X̄1 − X̄2 ± z_{α/2} √(σ1²/n1 + σ2²/n2)
Two Poisson parameters   X̄1 − X̄2 ≈ N(λ1 − λ2, λ1/n1 + λ2/n2)
Confidence interval: X̄1 − X̄2 ± z_{α/2} √(X̄1/n1 + X̄2/n2)
Paired data   (D̄ − µD)/(SD/√n) ∼ t_{n−1}
Confidence interval: D̄ ± t_{α/2,n−1} SD/√n
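A sketch of the unknown-variance interval from the table above, using scipy's t quantiles on illustrative data.

    import numpy as np
    from scipy import stats

    x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3])
    n, xbar, s = x.size, x.mean(), x.std(ddof=1)

    alpha = 0.05
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_{alpha/2, n-1} upper quantile
    print(xbar - tcrit * s / np.sqrt(n), xbar + tcrit * s / np.sqrt(n))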
Null hypothesis H0
Alternative hypothesis H1
The following are the same • Probability of Type I error • Size of critical region • Significance level • α
Terminology
                 Accept H0            Reject H0
H0 true          Correct decision     Type I error
H0 false         Type II error        Correct decision
Type I error The error committed when a TRUE null hypothesis is rejected
Type II error The error committed when a FALSE null hypothesis is not rejected.
Specificity The probability that an event that does not occur is predicted to not occur
Neyman-Pearson Lemma The test whose rejection region C satisfies L(θ0)/L(θ1) ≤ k for all values (x1, . . . , xn) ∈ C
and L(θ0)/L(θ1) > k for all values (x1, . . . , xn) ∉ C is a most powerful test; if the same
rejection region C applies for every alternative θ1, the test is uniformly most powerful
Likelihood Ratio Tests Mean µ → H0: µ = µ0   Test statistic: (X̄ − µ0)/(S/√n) ∼ t_{n−1}
Variance σ² → H0: σ² = σ0²   Test statistic: (n − 1)S²/σ0² ∼ χ²_{n−1}
Permutation Approach All possible permutations of the data subject to some criterion
Chi-square Tests Test statistic: Σ (fi − ei)²/ei, where fi and ei are the observed and expected frequencies
Contingency Table A two-way table of counts obtained when sample items are classified
according to two category variables
The proportion of data in row i is Σ_j fij / Σ_i Σ_j fij
The number expected in cell (i, j) is (Σ_j fij / Σ_i Σ_j fij) × (Σ_i fij), i.e. row total × column total / grand total
Fisher's Exact Test P(nX1Y1) = \binom{nX1}{nX1Y1} \binom{nX2}{nY1 − nX1Y1} / \binom{n}{nY1}  for nX1Y1 ≤ nX1, nY1
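Both tests are implemented in scipy.stats; a sketch on a hypothetical 2×2 table of counts (note that scipy applies a continuity correction to 2×2 chi-square tests by default).

    import numpy as np
    from scipy import stats

    table = np.array([[12, 5],
                      [ 8, 15]])   # hypothetical 2x2 contingency table of counts

    chi2, p, dof, expected = stats.chi2_contingency(table)   # sum of (f_i - e_i)^2 / e_i
    print(chi2, p, expected)

    odds_ratio, p_exact = stats.fisher_exact(table)          # hypergeometric probabilities
    print(p_exact)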
Exploratory Data Analysis The process of analysing data to gain further insight into the nature of the data
Pearson Correlation Coefficient r = Sxy / √(Sxx × Syy)
Sum of Squares Sxy = Σ_{i=1}^n (xi − x̄)(yi − ȳ) = Σ_{i=1}^n xi yi − (1/n)(Σ_{i=1}^n xi)(Σ_{i=1}^n yi)
Sxx = Σ_{i=1}^n (xi − x̄)² = Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)²/n
Syy = Σ_{i=1}^n (yi − ȳ)² = Σ_{i=1}^n yi² − (Σ_{i=1}^n yi)²/n
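The shortcut forms above can be verified directly; a small numpy sketch on illustrative data.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])
    n = x.size

    Sxy = (x * y).sum() - x.sum() * y.sum() / n
    Sxx = (x**2).sum() - x.sum()**2 / n
    Syy = (y**2).sum() - y.sum()**2 / n

    r = Sxy / np.sqrt(Sxx * Syy)
    print(r, np.corrcoef(x, y)[0, 1])   # the two values agree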
Scatter Plot Matrix Each entry of this matrix is a scatter plot for a pair of variables identified by
corresponding row and column labels
Eigenvalues of matrix A The values λ such that det(A − λI) = 0 where I is the identity matrix
The corresponding eigenvector, v, of an eigenvalue λ satisfies the equation
(A − λI)v = 0
LINEAR REGRESSION
Fitted Regression ŷ = α̂ + β̂x
β̂ = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)² = Sxy/Sxx  and  α̂ = ȳ − β̂x̄
Statistic β̂   E[β̂] = β   Var[β̂] = σ²/Sxx
Error variance σ²   σ̂² = (1/(n − 2)) Σ (yi − ŷi)² = Σ_{i=1}^n ei² / (n − 2)
Partition of the Sum of Squares
Σ_{i=1}^n (yi − ȳ)² = Σ_{i=1}^n (yi − ŷi)² + Σ_{i=1}^n (ŷi − ȳ)² + 2 Σ_{i=1}^n (yi − ŷi)(ŷi − ȳ)
(Total sum of squares)   (Residual sum of squares)   (Regression sum of squares)
where the cross term vanishes: Σ_{i=1}^n (yi − β̂0 − β̂1xi)(β̂0 + β̂1xi − ȳ) = 0
Total Sum of Squares SSTOT: Amount of variability inherent in the response prior to performing regression
Residual Sum of Squares SSRES: Variation unexplained by the linear regression model
Regression Sum of Squares SSREG: Variation explained by the linear regression model
Coefficient of Determination The proportion of the total variability of the responses 'explained' by a model
R² = SSREG/SSTOT = 1 − SSRES/SSTOT = Sxy²/(Sxx Syy);
note that adding more variables to the model always increases R².
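Continuing with the same illustrative data style, the least-squares estimates and R² follow directly from Sxy, Sxx and Syy.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

    Sxy = ((x - x.mean()) * (y - y.mean())).sum()
    Sxx = ((x - x.mean())**2).sum()
    Syy = ((y - y.mean())**2).sum()

    beta = Sxy / Sxx                      # slope estimate
    alpha = y.mean() - beta * x.mean()    # intercept estimate
    r2 = Sxy**2 / (Sxx * Syy)             # coefficient of determination
    print(alpha, beta, r2)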
ANOVA Table
Source of variation    Degrees of freedom    Sums of squares    Mean sums of squares
Regression             1                     SSREG              SSREG/1
Residual               n − 2                 SSRES              SSRES/(n − 2)
Total                  n − 1                 SSTOT
Forward Stepwise Selection There is a total of 1 + k(k + 1)/2 fitted models
1. Start with the model having intercept only
2. Create (p + 1)-predictor models by fitting a model with the current p predictors plus one
of the k − p unused predictors.
3. Choose the best of these (p + 1)-predictor models
4. Repeat steps 2-3 until all k predictors have been added
5. Select the best model from the various models based on adjusted R² or AIC (a code
sketch follows the backward-selection steps below)
Nested model the predictors in the p-predictor model are always a subset of the predictors in the (p+1)-
predictor model
Backward Stepwise Selection There is a total of 1 + k(k + 1)/2 fitted models
1. Start with the full model
2. Create (p − 1)-predictor models by fitting a model removing one of the parameters from
the current p predictors
3. Choose the best of these (p − 1)-predictor models
4. Repeat steps 2-3 until only the intercept remains
5. Select the best model from the various models based on adjusted R² or AIC. Backward
selection cannot be implemented in the high-dimensional setting with n ≤ k
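A compact sketch of forward stepwise selection by AIC, assuming the statsmodels OLS interface; the simulated predictors and response are illustrative, not prescribed.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 4))                  # hypothetical candidate predictors
    y = 2 + X[:, 0] - 0.5 * X[:, 2] + rng.normal(size=100)

    selected, remaining = [], list(range(X.shape[1]))
    best = sm.OLS(y, np.ones((100, 1))).fit()      # step 1: intercept-only model
    while remaining:
        # step 2: try adding each unused predictor to the current model
        fits = {j: sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
                for j in remaining}
        j = min(fits, key=lambda j: fits[j].aic)   # step 3: best of the new models
        selected.append(j); remaining.remove(j)    # step 4: repeat with it included
        if fits[j].aic < best.aic:
            best = fits[j]
    print(selected, best.aic)                      # step 5: compare the models by AIC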
Polynomial Regression: Y = α + β1 x + β2 x2 + · · · + βm xm + ε
Distribution        θ       b(θ)      φ
Normal, N(µ, σ²)    µ       θ²/2      σ²
Obtaining the estimates: Maximising l with respect to the parameters in the linear predictor
Significance of If |β̂| > 2 standard error (β̂), the parameter is significant and should be retained in the
the Parameters model
Scaled Deviance A goodness-of-fit measure of how much the fitted GLM departs from the saturated model:
scaled deviance = 2(lSAT − l) = DM/φ, where DM is the deviance for the current model
Saturated Model The fitted values exactly equal the observed values, µ̂i = yi for all i = 1, . . . , n, under the
saturated model
Scaled Deviance If [(S1 − S2)/q] / [S2/(n − (p + q))] > the 5% value for the F_{q,n−p−q} distribution, Model 2 is a significant
Comparison improvement over Model 1, where Model 1 has p parameters and scaled deviance
S1 and Model 2 has p + q parameters and scaled deviance S2
Akaike Information AIC = −2 × log LM + 2 × number of parameters, where log LM is the log-likelihood of the
Criterion model under consideration; the smaller the AIC, the better the model
BAYESIAN STATISTICS
Bayes' Theorem: P(Br|A) = P(A|Br) P(Br) / P(A), where P(A) = Σ_{i=1}^k P(A|Bi) P(Bi), for r = 1, 2, . . . , k
Conjugate Prior The prior distribution leads to a posterior distribution belonging to the same family
as the prior distribution
Quadratic Loss L(g(x), θ) = [g(x) − θ]²
CREDIBILITY THEORY
Two random variables X1 and X2 are conditionally independent given a third random variable Y if
P(X1 = x1, X2 = x2 | Y = y) = P(X1 = x1 | Y = y) P(X2 = x2 | Y = y)
Bayesian Credibility
Model                          Posterior Distribution                                        Credibility Factor      Posterior Mean
Poisson(λ)/Gamma(α, β)         Gamma(α + Σ_{i=1}^n xi, β + n)                                Z = n/(β + n)           Z Σxi/n + (1 − Z)α/β
Normal N(θ, σ1²)/N(µ, σ2²)     N((nx̄/σ1² + µ/σ2²)/(n/σ1² + 1/σ2²), (n/σ1² + 1/σ2²)^{−1})   Z = n/(n + σ1²/σ2²)     Z x̄ + (1 − Z)µ
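A sketch of the Poisson/Gamma row above, with illustrative prior parameters and claim counts; the posterior mean agrees with the credibility form Z x̄ + (1 − Z)α/β.

    import numpy as np

    alpha, beta = 5.0, 2.0                 # Gamma prior for the Poisson rate
    x = np.array([3, 1, 4, 2, 3])          # observed claim counts
    n = x.size

    Z = n / (beta + n)                     # credibility factor
    post_mean = Z * x.mean() + (1 - Z) * alpha / beta
    print(post_mean, (alpha + x.sum()) / (beta + n))   # same number both ways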
Var[m(θ)] = (N − 1)^{−1} Σ_{i=1}^N (X̄i − X̄)² − (Nn)^{−1} Σ_{i=1}^N (n − 1)^{−1} Σ_{j=1}^n (Xij − X̄i)²
Credibility factor Z = n / (n + E[s²(θ)]/Var[m(θ)])
Credibility Factor Z = Σ_{j=1}^n Pj / (Σ_{j=1}^n Pj + E[s²(θ)]/Var[m(θ)])   Zi = Σ_{j=1}^n Pij / (Σ_{j=1}^n Pij + E[s²(θ)]/Var[m(θ)])
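A sketch of the empirical Bayes estimators above (EBCT Model 1 in the CS1 syllabus) on a hypothetical N × n array of claims, with equal numbers of years n for each of the N risks.

    import numpy as np

    X = np.array([[12., 15., 11., 14.],    # hypothetical claims: N = 3 risks,
                  [20., 18., 22., 19.],    # n = 4 years each
                  [ 9., 11., 10., 12.]])
    N, n = X.shape

    risk_means = X.mean(axis=1)
    E_s2 = X.var(axis=1, ddof=1).mean()        # estimate of E[s^2(theta)]
    var_m = risk_means.var(ddof=1) - E_s2 / n  # estimate of Var[m(theta)]

    Z = n / (n + E_s2 / var_m)                 # credibility factor
    premiums = Z * risk_means + (1 - Z) * X.mean()
    print(Z, premiums)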