Formulae
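$\sigma_X^2$ and $\sigma_X$, normal population:
$$\frac{(n-1)\,s_X^2}{\sigma_X^2} \sim \chi^2_{n-1}$$

$\mu_X - \mu_Y$, normal differences $D_i = X_i - Y_i$, matched pairs:
$$\frac{\bar{D} - \mu_D}{s_D/\sqrt{n}} \sim t_{n-1}$$

$\mu_X - \mu_Y$, normal populations, common variance:
$$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{s_p\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} \sim t_{n_X+n_Y-2}, \quad \text{where } s_p^2 = \frac{(n_X-1)\,s_X^2 + (n_Y-1)\,s_Y^2}{n_X+n_Y-2}$$

$\mu_X - \mu_Y$, normal populations, known variances:
$$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} \sim N(0,1)$$

$\mu_X - \mu_Y$, nonnormal populations, large sample sizes:
$$\frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}} \sim_{\text{approx.}} N(0,1)$$

$p_X - p_Y$, Bernoulli populations, large sample sizes:
$$\frac{\hat{p}_X - \hat{p}_Y - (p_X - p_Y)}{\sqrt{\hat{p}_0(1-\hat{p}_0)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}} \sim_{\text{approx.}} N(0,1), \quad \text{where } \hat{p}_0 = \frac{n_X\hat{p}_X + n_Y\hat{p}_Y}{n_X+n_Y}$$

$\sigma_X^2/\sigma_Y^2$ and $\sigma_X/\sigma_Y$, normal populations:
$$\frac{s_X^2/\sigma_X^2}{s_Y^2/\sigma_Y^2} \sim F_{n_X-1,\,n_Y-1}$$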
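As a rough illustration of the common-variance case, the sketch below evaluates the pooled variance $s_p^2$ and the resulting $t$ statistic using only the Python standard library. The function name `pooled_t_statistic`, the argument `delta0` (the hypothesized value of $\mu_X - \mu_Y$) and the data are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(x, y, delta0=0.0):
    """Return (t, df) for the common-variance two-sample pivotal quantity.

    delta0 is the hypothesized value of mu_X - mu_Y (hypothetical name).
    """
    n_x, n_y = len(x), len(y)
    # Pooled variance s_p^2: weighted average of the two sample variances.
    s2_p = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    t = (mean(x) - mean(y) - delta0) / sqrt(s2_p * (1 / n_x + 1 / n_y))
    return t, n_x + n_y - 2

# Made-up illustrative samples.
x = [5.1, 4.9, 5.6, 5.8, 6.0]
y = [4.2, 4.8, 5.0, 4.4]
t_stat, df = pooled_t_statistic(x, y)
print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The returned statistic would then be compared with a quantile of the $t_{n_X+n_Y-2}$ distribution taken from a table or a statistics library.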
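Example: To construct a $(1-\alpha)$ confidence interval for $\mu_X$ if $X \sim N(\mu_X, \sigma_X^2)$ with $\sigma_X^2$ unknown, we have:
$$CI_{1-\alpha}(\mu_X) = \left( \bar{x} - t_{n-1;\alpha/2}\,\frac{s_x}{\sqrt{n}} \; ; \; \bar{x} + t_{n-1;\alpha/2}\,\frac{s_x}{\sqrt{n}} \right)$$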
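The interval above can be computed in a few lines. This is a minimal sketch, assuming the critical value $t_{n-1;\alpha/2}$ is looked up in a table and passed in by the caller; the function name `t_confidence_interval` and the sample are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(sample, t_crit):
    """CI for the mean; t_crit is the upper quantile t_{n-1;alpha/2} (supplied by the caller)."""
    n = len(sample)
    x_bar = mean(sample)
    # stdev() is the sample standard deviation s_x (divisor n - 1).
    half_width = t_crit * stdev(sample) / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Made-up data with n = 5; 2.776 is the tabulated value of t_{4;0.025}.
sample = [9.8, 10.2, 10.4, 9.9, 10.7]
lower, upper = t_confidence_interval(sample, t_crit=2.776)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```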
To perform a lower-tail test $H_0: \mu_X \ge \mu_0$ versus $H_1: \mu_X < \mu_0$, the rejection region at significance level $\alpha$, $RR_\alpha$, is:
$$RR_\alpha = \left\{ t : \underbrace{\frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}}_{t} < t_{n-1;1-\alpha} \right\}$$
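A small sketch of the lower-tail decision rule follows. It assumes the upper-quantile convention used in the confidence interval example, so the lower critical value $t_{n-1;1-\alpha}$ equals $-t_{n-1;\alpha}$ by symmetry; the function name `lower_tail_t_test` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def lower_tail_t_test(sample, mu0, t_upper):
    """Return (t, reject) for H0: mu >= mu0 versus H1: mu < mu0.

    t_upper is the tabulated upper quantile t_{n-1;alpha}; the lower
    critical value t_{n-1;1-alpha} is -t_upper by symmetry.
    """
    n = len(sample)
    t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t, t < -t_upper

# Made-up data with n = 5; 2.132 is the tabulated value of t_{4;0.05}.
sample = [1, 2, 3, 4, 5]
t_stat, reject = lower_tail_t_test(sample, mu0=5, t_upper=2.132)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```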
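Sample covariance and correlation based on bivariate observations $(x_1, y_1), \ldots, (x_n, y_n)$:
$$\underbrace{\mathrm{cov}(x,y)}_{s_{xy}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{n-1} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n-1}, \qquad \underbrace{\mathrm{cor}(x,y)}_{r_{(x,y)}} = \frac{\mathrm{cov}(x,y)}{s_x s_y} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\,\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}}$$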
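To make the two equivalent forms concrete, the sketch below computes the covariance from the centered sum and the correlation from the shortcut sums. The function name `sample_cov_cor` and the data are illustrative assumptions.

```python
from math import sqrt

def sample_cov_cor(x, y):
    """Return (cov, cor) for paired observations, using the formulas above."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Centered form of the cross-product sum.
    sxy_centered = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    # Shortcut forms: sum of products minus n times the product of means.
    sxy_shortcut = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    sxx = sum(a * a for a in x) - n * x_bar ** 2
    syy = sum(b * b for b in y) - n * y_bar ** 2
    cov = sxy_centered / (n - 1)
    cor = sxy_shortcut / (sqrt(sxx) * sqrt(syy))
    return cov, cor

# Made-up data.
cov_xy, cor_xy = sample_cov_cor([1, 2, 3, 4], [1, 3, 2, 4])
print(f"cov = {cov_xy:.4f}, cor = {cor_xy:.4f}")
```

The shortcut form needs only running sums, but it can lose precision when the means are large relative to the spread; the centered form is usually the safer choice numerically.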
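Slope and intercept estimates in the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + u_i$, where $u_i \sim \text{iid } N(0, \sigma^2)$, to obtain the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$:
$$\hat{\beta}_1 = \frac{\mathrm{cov}(x,y)}{s_x^2} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$$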
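Pivotal quantities for $\beta_1$, $\beta_0$, $\sigma^2$, with residuals $e_i = y_i - \hat{y}_i$ and residual variance $s_R^2 = \dfrac{\sum_{i=1}^{n} e_i^2}{n-2}$:
$$\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\dfrac{s_R^2}{(n-1)\,s_X^2}}} \sim t_{n-2}, \qquad \frac{\hat{\beta}_0 - \beta_0}{\sqrt{s_R^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{(n-1)\,s_X^2}\right)}} \sim t_{n-2}, \qquad \frac{(n-2)\,s_R^2}{\sigma^2} \sim \chi^2_{n-2}$$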
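The estimates and the denominators of the two $t$ pivots can be computed together. The following is a minimal sketch; the function name `simple_ols`, the dictionary keys and the data are assumptions for illustration.

```python
from math import sqrt

def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares and return estimates with standard errors."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)           # equals (n - 1) * s_X^2
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx                                   # slope estimate
    b0 = y_bar - b1 * x_bar                          # intercept estimate
    residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    s2_r = sum(e * e for e in residuals) / (n - 2)   # residual variance s_R^2
    se_b1 = sqrt(s2_r / sxx)                         # denominator of the beta_1 pivot
    se_b0 = sqrt(s2_r * (1 / n + x_bar ** 2 / sxx))  # denominator of the beta_0 pivot
    return {"b0": b0, "b1": b1, "s2_R": s2_r, "se_b0": se_b0, "se_b1": se_b1}

# Made-up data.
fit = simple_ols([1, 2, 3, 4], [1, 3, 2, 4])
print(fit)
```

Dividing `b1` by `se_b1` gives the $t$ statistic for $H_0: \beta_1 = 0$, to be compared with a $t_{n-2}$ quantile.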
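Confidence interval for the mean response and prediction interval for an individual response $y_0$ given $X = x_0$, with $\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$:
$$\hat{y}_0 \pm t_{n-2,\alpha/2}\sqrt{s_R^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\,s_X^2}\right)}, \qquad \hat{y}_0 \pm t_{n-2,\alpha/2}\sqrt{s_R^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\,s_X^2}\right)}$$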
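The two intervals differ only by the extra $1$ under the square root, which the sketch below makes explicit. It assumes the critical value $t_{n-2,\alpha/2}$ is supplied by the caller from a table; the function name `response_intervals` and the data are hypothetical.

```python
from math import sqrt

def response_intervals(x, y, x0, t_crit):
    """Return (mean_interval, individual_interval) at x0; t_crit = t_{n-2,alpha/2}."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)           # equals (n - 1) * s_X^2
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s2_r = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    y0_hat = b0 + b1 * x0
    # Shared term 1/n + (x0 - x_bar)^2 / ((n - 1) s_X^2).
    shared = 1 / n + (x0 - x_bar) ** 2 / sxx
    hw_mean = t_crit * sqrt(s2_r * shared)           # half-width for the mean response
    hw_ind = t_crit * sqrt(s2_r * (1 + shared))      # half-width for an individual response
    return (y0_hat - hw_mean, y0_hat + hw_mean), (y0_hat - hw_ind, y0_hat + hw_ind)

# Made-up data with n = 4; 4.303 is the tabulated value of t_{2;0.025}.
mean_int, ind_int = response_intervals([1, 2, 3, 4], [1, 3, 2, 4], x0=2.5, t_crit=4.303)
print(mean_int, ind_int)
```

The interval for an individual response is always the wider of the two, because it also accounts for the variance of the new error term.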
ANOVA table for the simple linear regression model (R-squared $R^2 = SSM/SST$):
Source of variability | SS | DF | Mean | F ratio
Model | $SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ | $1$ | $SSM/1$ | $SSM/s_R^2$
Residuals/errors | $SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$ | $n-2$ | $SSR/(n-2) = s_R^2$ |
Total | $SST = SSM + SSR$ | $n-1$ | |
To test $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$, the test statistic is $F = SSM/s_R^2 \sim F_{1,n-2}$ under $H_0$, and $RR_\alpha = \{F > F_{1,n-2;\alpha}\}$.
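The ANOVA decomposition and the $F$ ratio can be checked numerically. This is a sketch under the simple linear regression model above; the function name `regression_anova`, the dictionary keys and the data are made up for illustration.

```python
def regression_anova(x, y):
    """Return SSM, SSR, SST, the F ratio and R^2 for a simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * a for a in x]
    ssm = sum((f - y_bar) ** 2 for f in fitted)          # model sum of squares
    ssr = sum((b - f) ** 2 for b, f in zip(y, fitted))   # residual sum of squares
    sst = sum((b - y_bar) ** 2 for b in y)               # total sum of squares
    s2_r = ssr / (n - 2)                                 # residual mean square
    return {"SSM": ssm, "SSR": ssr, "SST": sst, "F": ssm / s2_r, "R2": ssm / sst}

# Made-up data.
table = regression_anova([1, 2, 3, 4], [1, 3, 2, 4])
print(table)
```

In simple linear regression the $F$ ratio equals the square of the $t$ statistic for $H_0: \beta_1 = 0$, so the two tests lead to the same decision.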
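Model formulation, estimates, fitted model and residuals in the multiple linear regression model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i$, where $u_i \sim \text{iid } N(0, \sigma^2)$, in matrix notation:
$$y = X\beta + u, \qquad \hat{\beta} = (X^T X)^{-1} X^T y, \qquad \hat{y} = X\hat{\beta}, \qquad e = y - \hat{y},$$
where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$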
Pivotal quantities for $\sigma^2$ and $\beta_j$, $j = 0, 1, \ldots, k$, with residual variance $s_R^2 = \sum_{i=1}^{n} e_i^2/(n-k-1)$:
$$\frac{(n-k-1)\,s_R^2}{\sigma^2} \sim \chi^2_{n-k-1}, \qquad \frac{\hat{\beta}_j - \beta_j}{s(\hat{\beta}_j)} \sim t_{n-k-1},$$
where $s(\hat{\beta}_j) = \sqrt{s^2(\hat{\beta}_j)}$ and $s^2(\hat{\beta}_j)$ is the $j$-th diagonal element (counting from $j = 0$) of the estimated variance-covariance matrix of $\hat{\beta}$, defined as $S_{\hat{\beta}} = s_R^2 (X^T X)^{-1}$.
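The matrix formulas translate almost line by line into NumPy. The sketch below is one possible rendering, assuming NumPy is available; the function name `multiple_ols`, the variable names and the data are illustrative assumptions, and the explicit inverse is used only to mirror the formula for $\hat{\beta}$.

```python
import numpy as np

def multiple_ols(predictors, y):
    """Return (beta_hat, s2_R, std_errors) for y = X beta + u with an intercept."""
    predictors = np.asarray(predictors, dtype=float)   # shape (n, k), no intercept column
    y = np.asarray(y, dtype=float)
    n, k = predictors.shape
    X = np.column_stack([np.ones(n), predictors])      # design matrix with leading column of ones
    xtx_inv = np.linalg.inv(X.T @ X)                   # (X^T X)^{-1}
    beta_hat = xtx_inv @ X.T @ y                       # least-squares estimates
    residuals = y - X @ beta_hat                       # e = y - y_hat
    s2_r = residuals @ residuals / (n - k - 1)         # residual variance s_R^2
    cov_beta = s2_r * xtx_inv                          # estimated covariance matrix of beta_hat
    std_errors = np.sqrt(np.diag(cov_beta))            # s(beta_hat_j), j = 0, ..., k
    return beta_hat, s2_r, std_errors

# Made-up data with n = 5 observations and k = 2 predictors.
predictors = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [3.1, 3.9, 7.2, 7.8, 11.1]
beta_hat, s2_r, std_errors = multiple_ols(predictors, y)
print(beta_hat, s2_r, std_errors)
```

For ill-conditioned design matrices a solver such as `np.linalg.lstsq` is generally preferable to forming the inverse explicitly.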