cheatsheet
Confounding Variables
A confounding variable is related both to group membership and to the outcome. Its presence makes it hard to establish the outcome as being a direct consequence of group membership.

Inference to populations
Inferences to populations can be drawn from random sampling studies, but not otherwise. Random sampling ensures that all subpopulations are represented in the sample in roughly the same mix as in the overall population.

Simple Random Sample
A simple random sample of size n from a population is a subset of the population consisting of n members selected in such a way that every subset of size n is afforded the same chance of being selected.

Simple Linear Regression
$\mu\{Y|X\} = \beta_0 + \beta_1 X$

Model Assumptions
1. Linearity
2. Normality: $Y|X \sim \mathrm{Normal}$
3. Constant variance: $\sigma(Y|X) = \sigma$
4. Independence

Least Squares Method
Minimize $Q = \sum (Y_i - b_0 - b_1 X_i)^2 = \sum (Y_i - \hat{Y}_i)^2$
$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
$b_0 = \bar{Y} - b_1 \bar{X}$
$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}$

Sums of Squares
Regression sum of squares: $SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
Residual sum of squares: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$
$SST = SSR + SSE$
$R^2 = \frac{SST - SSE}{SST} = \frac{SSR}{SST}$ (often reported as a percentage)
$MSE = \frac{SSE}{n-2}$

Sampling Distributions of the Coefficients
$SD(b_1) = \hat{\sigma} \sqrt{\frac{1}{(n-1)\sigma_x^2}}$
$SD(b_0) = \hat{\sigma} \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n-1)\sigma_x^2}}$
Again, $\frac{b_1 - \beta_1}{SE(b_1)} \sim t(n-2)$ and $\frac{b_0 - \beta_0}{SE(b_0)} \sim t(n-2)$
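A minimal NumPy sketch checking the least-squares, $R^2$, and $SE(b_1)$ formulas above on simulated data (the data, seed, and variable names are illustrative, not from the source):

```python
import numpy as np

# Sketch: simple linear regression by least squares, following the
# formulas above (b1, b0, sigma-hat^2, R^2, SE(b1)).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=50)   # hypothetical data

n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)
ssr = np.sum((y_hat - y.mean()) ** 2)
sst = ssr + sse
r2 = ssr / sst
sigma2_hat = sse / (n - 2)

# SD(b1) = sigma-hat * sqrt(1 / ((n-1) * var(x))); note (n-1)*var(x) = Sxx
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))
t_b1 = b1 / se_b1          # tests H0: beta1 = 0, compare to t(n-2)
print(b0, b1, r2, t_b1)
```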
Extra-Sums-of-Squares F-test (single coefficient)
$H_0: \beta_1 = 0$
$F\text{-stat} = \frac{\text{extra sum of squares} \,/\, \#\beta \text{ being tested}}{\hat{\sigma}^2 \text{ from full model}} = \frac{(SSR_{full} - SSR_{null})/1}{MSE}$

Matrix Form
$Y = X\beta + \epsilon$
$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$
$\Psi = (Y - X\beta)^T (Y - X\beta)$
$\hat{\beta} = (X^T X)^{-1} X^T Y$
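A small NumPy sketch of the closed-form estimate $\hat{\beta} = (X^T X)^{-1} X^T Y$; the design matrix and data are made up for illustration:

```python
import numpy as np

# Sketch: solve the normal equations for beta-hat directly.
rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])          # design matrix with intercept column
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X) beta = X^T y
y_hat = X @ beta_hat
print(beta_hat)
```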
Multiple Regression
$\hat{\sigma}^2 = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-p} = \frac{SSE}{n-p}$
$SD(b_j) = \hat{\sigma} \sqrt{c_{jj}}$, where $c_{jj}$ is the $j$th diagonal element of $(X^T X)^{-1}$
standardized $b_j \sim t(n-p)$

Confidence Intervals
$SD(\mu(Y|X_0)) = \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1)\sigma_x^2}}$
standardized $\mu(Y|X_0) \sim t(n-2)$

Prediction Interval
$SD(Y|X_0) = \hat{\sigma} \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1)\sigma_x^2}}$
standardized $Y|X_0 \sim t(n-2)$
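A sketch of the confidence and prediction intervals at a new point $X_0$, using the two SD formulas above; the data, $x_0 = 5$, and 95% level are illustrative choices:

```python
import numpy as np
from scipy import stats

# Sketch: 95% CI for the mean response and PI for a new observation at x0.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=30)
y = 1.0 + 0.8 * x + rng.normal(0, 1, size=30)
n = len(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
sigma2_hat = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

x0 = 5.0
fit = b0 + b1 * x0
sxx = np.sum((x - x.mean()) ** 2)          # (n-1) * sample variance of x
se_mean = np.sqrt(sigma2_hat * (1 / n + (x0 - x.mean()) ** 2 / sxx))
se_pred = np.sqrt(sigma2_hat * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))
tcrit = stats.t.ppf(0.975, df=n - 2)

ci = (fit - tcrit * se_mean, fit + tcrit * se_mean)   # CI for mu(Y|x0)
pi = (fit - tcrit * se_pred, fit + tcrit * se_pred)   # PI for a new Y at x0
print(ci, pi)
```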
Linear Combination of Coefficients
$H_0: c_0\beta_0 + c_1\beta_1 + \dots + c_p\beta_p = 0$
$H_A: c_0\beta_0 + c_1\beta_1 + \dots + c_p\beta_p \neq 0$
$est = c_0 b_0 + c_1 b_1 + \dots + c_p b_p$
$Var(est) = c_0^2 Var(b_0) + \dots + c_p^2 Var(b_p) + 2 c_0 c_1 Cov(b_0, b_1) + \dots + 2 c_{p-1} c_p Cov(b_{p-1}, b_p) = \hat{\sigma}^2 \, c^T (X^T X)^{-1} c$
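The matrix form of $Var(est)$ is easy to compute directly; a hypothetical NumPy sketch (the contrast vector $c$, data, and variable names are illustrative):

```python
import numpy as np

# Sketch: variance of a linear combination c^T b of fitted coefficients,
# Var(est) = sigma2_hat * c^T (X^T X)^{-1} c.
rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1, size=n)

p = X.shape[1]
b = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = np.sum((y - X @ b) ** 2) / (n - p)

c = np.array([0.0, 1.0, -1.0])            # e.g. tests beta1 - beta2 = 0
est = c @ b
var_est = sigma2_hat * c @ np.linalg.inv(X.T @ X) @ c
t_stat = est / np.sqrt(var_est)           # compare to a t distribution
print(est, t_stat)
```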
Extra-Sums-of-Squares F-test (several coefficients)
$H_0: \beta_1 = \beta_2 = \dots = \beta_p = 0$
$F\text{-stat} = \frac{\text{extra sum of squares} \,/\, \#\text{ of } \beta\text{'s being tested}}{\hat{\sigma}^2 \text{ from full model}} = \frac{(SSR_{full} - SSR_{reduced}) / (df_{reduced} - df_{full})}{MSE}$
$MSE = \frac{SSE}{n-p-1}$
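A sketch of the nested-model F-test; the reduced and full models, data, and helper name `fit_sse` are illustrative, not from the source:

```python
import numpy as np
from scipy import stats

# Sketch: extra-sums-of-squares F-test comparing a reduced model (intercept
# plus x1) against a full model (intercept, x1, x2) on simulated data.
def fit_sse(X, y):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return np.sum((y - X @ b) ** 2)

rng = np.random.default_rng(4)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(0, 1, size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x1])

sse_full, sse_red = fit_sse(X_full, y), fit_sse(X_red, y)
df_full, df_red = n - X_full.shape[1], n - X_red.shape[1]

# extra sum of squares = SSE_reduced - SSE_full = SSR_full - SSR_reduced
f_stat = ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)
p_value = stats.f.sf(f_stat, df_red - df_full, df_full)
print(f_stat, p_value)
```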
Weighted Regression
$var(Y_i|X) = \frac{\sigma^2}{w_i}$
$Q = \sum w_i (Y_i - \hat{Y}_i)^2$
$\hat{\beta} = (X^T W X)^{-1} X^T W Y$
$W = \begin{bmatrix} w_1 & 0 & \dots & 0 \\ 0 & w_2 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & w_n \end{bmatrix}$
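A minimal weighted-least-squares sketch, assuming weights proportional to the inverse variance of each observation (data and weights are illustrative):

```python
import numpy as np

# Sketch: weighted least squares, beta-hat = (X^T W X)^{-1} X^T W Y.
rng = np.random.default_rng(5)
n = 50
x = rng.uniform(1, 10, size=n)
y = 2.0 + 1.0 * x + rng.normal(0, x)       # error SD grows with x
w = 1.0 / x**2                             # assumed known up to a constant

X = np.column_stack([np.ones(n), x])
W = np.diag(w)
beta_w = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_w)
```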
Adjusted R²
Only for model comparison, not for model assessment.
$\text{Adjusted } R^2 = \frac{\frac{SST}{n-1} - \frac{SSE}{n-p}}{\frac{SST}{n-1}}$

Ridge and Lasso Regression
$\sum |\beta_j|$: L1-norm
Lasso: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$
$\sum (\beta_j)^2$: L2-norm
Ridge: $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} (\beta_j)^2$
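A sketch of ridge and lasso fits using scikit-learn on simulated data; `alpha` plays the role of $\lambda$ above (scikit-learn scales its objectives slightly differently, e.g. Lasso divides the squared error by $2n$):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Sketch: ridge shrinks all coefficients toward 0; lasso can set some
# coefficients exactly to 0. Data and penalty values are illustrative.
rng = np.random.default_rng(6)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))   # sparse true signal
y = X @ beta + rng.normal(0, 1, size=n)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(ridge.coef_)
print(lasso.coef_)
```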
Leverage
Measures the distance between an observation's explanatory values and the mean of the explanatory values.
$H = X(X^T X)^{-1} X^T$
For the $i$th observation: $h_i = H_{ii} = \frac{\partial \hat{Y}_i}{\partial Y_i}$
$SD(\text{residual}_i) = \sigma \sqrt{1 - h_i}$, with average leverage $\bar{h} = p/n$
Cutoff: larger than $2p/n$ ($p$: the number of parameters)

Studentized Residual
$\text{studres}_i = \frac{\text{residual}_i}{\hat{\sigma} \sqrt{1 - h_i}}$

Strategies
Forward Selection: start with the null model.
Backward Selection: start with the full model.
Stepwise Selection:
1. Start with the null model.
2. Do one step of forward selection.
3. Do one step of backward elimination.
4. Repeat 2 and 3 until no explanatory variables can be added or removed.
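A sketch computing the hat matrix, leverages, and studentized residuals defined above; the data (including a deliberately high-leverage point) are simulated for illustration:

```python
import numpy as np

# Sketch: leverages h_i from the hat matrix and studentized residuals.
rng = np.random.default_rng(7)
n = 40
x = np.append(rng.uniform(0, 10, size=n - 1), 30.0)   # last point is far out in x
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0, 1, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
h = np.diag(H)                              # leverages h_i
p = X.shape[1]

resid = y - H @ y
sigma_hat = np.sqrt(np.sum(resid**2) / (n - p))
studres = resid / (sigma_hat * np.sqrt(1 - h))

flagged = np.where(h > 2 * p / n)[0]        # leverage cutoff 2p/n
print(flagged, studres[flagged])
```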
Exhaustive Search Through All Subsets
Use the Cp statistic, R², Adjusted R², AIC and BIC.

Cp Statistic
The lower, the better.
$C_p = p + (n - p) \frac{\hat{\sigma}^2 - \hat{\sigma}^2_{full}}{\hat{\sigma}^2_{full}}$

Akaike's Information Criterion (AIC)
The lower, the better.
$AIC = 2p + n \log(\hat{\sigma}^2) = 2p - 2\log(L)$

Bayesian Information Criterion (BIC)
The lower, the better.
$BIC = p \log(n) + n \log(\hat{\sigma}^2) = p \log(n) - 2\log(L)$
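A sketch of an exhaustive search over all subsets of a few candidate predictors, ranking each model by AIC and BIC; the data, the ML-style $\hat{\sigma}^2 = SSE/n$, and the helper `fit_metrics` are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

# Sketch: compare all subsets of 3 candidate predictors by AIC and BIC.
rng = np.random.default_rng(8)
n = 80
Xc = rng.normal(size=(n, 3))                        # candidate predictors
y = 1.0 + 2.0 * Xc[:, 0] + rng.normal(0, 1, size=n) # only the first matters

def fit_metrics(cols):
    X = np.column_stack([np.ones(n)] + [Xc[:, j] for j in cols])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    sigma2_hat = np.sum((y - X @ b) ** 2) / n        # ML-style estimate
    p = X.shape[1]
    aic = 2 * p + n * np.log(sigma2_hat)
    bic = p * np.log(n) + n * np.log(sigma2_hat)
    return aic, bic

for k in range(0, 4):
    for cols in combinations(range(3), k):
        print(cols, fit_metrics(cols))               # lower is better
```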
Model Validation
For a new data set of k observations, define the mean squared prediction error as:
$MSPE = \frac{1}{k} \sum_{i=1}^{k} (Y_i - \hat{Y}_i)^2$
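A sketch of computing the mean squared prediction error on a held-out validation set; the train/validation split and model are illustrative:

```python
import numpy as np

# Sketch: fit on a training split, evaluate MSPE on a held-out split.
rng = np.random.default_rng(9)
n = 120
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.7 * x + rng.normal(0, 1, size=n)

train, valid = np.arange(0, 90), np.arange(90, n)    # simple hold-out split
X_tr = np.column_stack([np.ones(train.size), x[train]])
b = np.linalg.solve(X_tr.T @ X_tr, X_tr.T @ y[train])

X_va = np.column_stack([np.ones(valid.size), x[valid]])
mspe = np.mean((y[valid] - X_va @ b) ** 2)
print(mspe)
```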