
Formulae

This fact sheet summarizes key concepts for statistics exams, including:
- Notation for population and sample parameters like means, variances, and proportions
- Assumptions and pivotal quantities for confidence intervals and hypothesis tests for one and two population means, variances, proportions, and differences between means
- Examples of a confidence interval for a normal population mean with unknown variance and a lower-tail test for the same
- Formulas for sample covariance, correlation, and simple linear regression slope and intercept estimates

The fact sheet is organized by statistical test or method and lists the assumptions, pivotal quantity and its distribution, and examples for each. It covers one and two sample tests and intervals for both normal and non-normal populations.

Uploaded by

silvia.jmez.glez
Copyright
© All Rights Reserved

Statistics II

Fact sheet for exams

Confidence intervals and hypothesis testing in one and two populations.


Notation:
• $\mu_X$ and $\sigma_X^2$: population mean and variance of a random variable/population $X$; $\bar{X}$ and $s_X^2$: sample mean and quasi-variance
• $p_X$: population proportion if $X \sim \text{Bernoulli}(p_X)$; $\hat{p}_X$: sample proportion
• X n : simple random sample (SRS) of size n from X
• $(1 - \alpha)$: confidence level; $\alpha$: significance level
• $z_{\alpha}$: upper $\alpha$ quantile of the $N(0,1)$ distribution; $t_{n-1;\alpha}$: upper $\alpha$ quantile of the $t_{n-1}$ distribution

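| Parameter | Assumptions: SRS(s) and | Pivotal quantity and distribution |
|---|---|---|
| $\mu_X$ | Normal population, known variance | $\dfrac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0,1)$ |
| $\mu_X$ | Normal population, unknown variance | $\dfrac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \sim t_{n-1}$ |
| $\mu_X$ | Nonnormal population, large sample size | $\dfrac{\bar{X} - \mu_X}{s_X/\sqrt{n}} \overset{\text{approx.}}{\sim} N(0,1)$ |
| $p_X$ | Bernoulli population, large sample size | $\dfrac{\hat{p}_X - p_X}{\sqrt{\hat{p}_X(1 - \hat{p}_X)/n}} \overset{\text{approx.}}{\sim} N(0,1)$ (for hypothesis testing replace $\hat{p}_X$ with $p_X$ in the standard error) |
| $\sigma_X^2$ and $\sigma_X$ | Normal population | $\dfrac{(n-1)s_X^2}{\sigma_X^2} \sim \chi^2_{n-1}$ |
| $\mu_X - \mu_Y$ | Normal difference $D_i = X_i - Y_i$, matched pairs | $\dfrac{\bar{D} - \mu_D}{s_D/\sqrt{n}} \sim t_{n-1}$ |
| $\mu_X - \mu_Y$ | Normal populations, common variance | $\dfrac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{s_p\sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} \sim t_{n_X+n_Y-2}$, where $s_p^2 = \dfrac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2}$ |
| $\mu_X - \mu_Y$ | Normal populations, known variances | $\dfrac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}} \sim N(0,1)$ |
| $\mu_X - \mu_Y$ | Nonnormal populations, large sample sizes | $\dfrac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}} \overset{\text{approx.}}{\sim} N(0,1)$ |
| $p_X - p_Y$ | Bernoulli populations, large sample sizes | $\dfrac{\hat{p}_X - \hat{p}_Y - (p_X - p_Y)}{\sqrt{\hat{p}_0(1 - \hat{p}_0)\left(\frac{1}{n_X} + \frac{1}{n_Y}\right)}} \overset{\text{approx.}}{\sim} N(0,1)$, where $\hat{p}_0 = \dfrac{n_X\hat{p}_X + n_Y\hat{p}_Y}{n_X + n_Y}$ |
| $\sigma_X^2/\sigma_Y^2$ and $\sigma_X/\sigma_Y$ | Normal populations | $\dfrac{s_X^2/\sigma_X^2}{s_Y^2/\sigma_Y^2} \sim F_{n_X-1,\,n_Y-1}$ |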
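As an illustration of the common-variance case for $\mu_X - \mu_Y$, the pooled statistic can be sketched in Python using only the standard library. The function name, the `delta_0` argument (the hypothesized value of $\mu_X - \mu_Y$) and the two samples are assumptions made for this example, not part of the fact sheet.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(x, y, delta_0=0.0):
    """Pivotal quantity for mu_X - mu_Y under normality and a common variance.

    delta_0 is the hypothesized value of mu_X - mu_Y (an assumption of this sketch).
    Returns the statistic and its degrees of freedom n_X + n_Y - 2.
    """
    n_x, n_y = len(x), len(y)
    # statistics.variance divides by n - 1, so it is the quasi-variance s^2.
    s2_p = ((n_x - 1) * variance(x) + (n_y - 1) * variance(y)) / (n_x + n_y - 2)
    standard_error = sqrt(s2_p * (1 / n_x + 1 / n_y))
    t_stat = (mean(x) - mean(y) - delta_0) / standard_error
    return t_stat, n_x + n_y - 2


# Hypothetical samples, for illustration only.
x_sample = [5.0, 7.0, 9.0]
y_sample = [2.0, 4.0, 6.0, 8.0]
t_stat, df = pooled_t_statistic(x_sample, y_sample)
print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```

The statistic would then be compared with a quantile of the $t_{n_X+n_Y-2}$ distribution taken from a table or a statistics library.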

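Example: To construct a $(1 - \alpha)$ confidence interval for $\mu_X$ if $X \sim N(\mu_X, \sigma_X^2)$ with $\sigma_X^2$ unknown, we have:
$$\mathrm{CI}_{1-\alpha}(\mu_X) = \left( \bar{x} - t_{n-1;\alpha/2}\,\frac{s_x}{\sqrt{n}} \; ; \; \bar{x} + t_{n-1;\alpha/2}\,\frac{s_x}{\sqrt{n}} \right)$$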
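A minimal Python sketch of this interval follows. The sample is hypothetical, and the quantile $t_{4;0.025} \approx 2.776$ is assumed to be read from a $t$ table, since the Python standard library does not provide $t$ quantiles.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(sample, t_quantile):
    """CI for a normal mean with unknown variance.

    t_quantile is the upper alpha/2 quantile t_{n-1;alpha/2}, supplied by the caller.
    """
    n = len(sample)
    x_bar = mean(sample)
    s_x = stdev(sample)  # quasi-standard deviation (divides by n - 1)
    half_width = t_quantile * s_x / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical data with n = 5, so the interval uses t_{4;0.025}.
sample = [4.0, 6.0, 5.0, 7.0, 3.0]
T_4_0025 = 2.776  # assumed value taken from a t table
low, high = t_confidence_interval(sample, T_4_0025)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```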
To perform a lower-tail test $H_0: \mu_X \geq \mu_0$ versus $H_1: \mu_X < \mu_0$, the rejection region at significance level $\alpha$, $RR_\alpha$, is:
$$RR_\alpha = \left\{ t : \underbrace{\frac{\bar{x} - \mu_0}{s_x/\sqrt{n}}}_{t} < t_{n-1;1-\alpha} \right\}$$
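The decision rule can be sketched as below. Because $t_{n-1;\alpha}$ denotes an upper quantile, $t_{n-1;1-\alpha} = -t_{n-1;\alpha}$ by the symmetry of the $t$ distribution. The data, the function name and the table value $t_{4;0.05} \approx 2.132$ are assumptions for this example.

```python
from math import sqrt
from statistics import mean, stdev


def lower_tail_t_test(sample, mu_0, t_upper_alpha):
    """Test H0: mu >= mu_0 against H1: mu < mu_0.

    t_upper_alpha is the upper alpha quantile t_{n-1;alpha}, supplied by the caller.
    Returns the test statistic and whether it falls in the rejection region.
    """
    n = len(sample)
    t_stat = (mean(sample) - mu_0) / (stdev(sample) / sqrt(n))
    critical_value = -t_upper_alpha  # equals t_{n-1;1-alpha} by symmetry
    return t_stat, t_stat < critical_value


# Hypothetical data with n = 5; 2.132 is t_{4;0.05} from a t table.
sample = [4.0, 6.0, 5.0, 7.0, 3.0]
t_stat, reject = lower_tail_t_test(sample, mu_0=7.0, t_upper_alpha=2.132)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```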

Sample covariance and correlation based on bivariate observations (x1 , y1 ), . . . , (xn , yn ):
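$$\underbrace{\operatorname{cov}(x, y)}_{s_{xy}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{n-1}, \qquad \underbrace{\operatorname{cor}(x, y)}_{r_{(x,y)}} = \frac{\operatorname{cov}(x, y)}{s_x s_y} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\,\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}}$$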
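The two equivalent forms of each formula can be checked numerically. The following is a small sketch on made-up data; the function name and the observations are illustrative assumptions.

```python
from math import sqrt
from statistics import mean


def sample_cov_cor(x, y):
    """Sample covariance and correlation via the computational formulas."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # sum(x_i * y_i) - n * x_bar * y_bar equals sum((x_i - x_bar) * (y_i - y_bar)).
    s_xy_num = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    s_xx_num = sum(xi ** 2 for xi in x) - n * x_bar ** 2
    s_yy_num = sum(yi ** 2 for yi in y) - n * y_bar ** 2
    cov = s_xy_num / (n - 1)
    cor = s_xy_num / (sqrt(s_xx_num) * sqrt(s_yy_num))
    return cov, cor


# Hypothetical bivariate observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
cov_xy, cor_xy = sample_cov_cor(x, y)
print(f"cov = {cov_xy:.4f}, cor = {cor_xy:.4f}")
```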

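Slope and intercept estimates in the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + u_i$, where $u_i \sim \text{iid } N(0, \sigma^2)$, to obtain the fitted line $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$:
$$\hat{\beta}_1 = \frac{\operatorname{cov}(x, y)}{s_x^2} = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$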
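As a rough sketch, the slope and intercept estimates can be computed directly from the centered sums. The data and the function name are assumptions chosen for illustration.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Least-squares slope and intercept for y_i = beta_0 + beta_1 * x_i + u_i."""
    x_bar, y_bar = mean(x), mean(y)
    # The 1 / (n - 1) factors in cov(x, y) and s_x^2 cancel in the ratio.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_1 = s_xy / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    return beta_0, beta_1


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
beta_0, beta_1 = fit_simple_regression(x, y)
print(f"fitted line: y = {beta_0:.2f} + {beta_1:.2f} x")
```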
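Pivotal quantities for $\beta_1$, $\beta_0$, $\sigma^2$, with residuals $e_i = y_i - \hat{y}_i$ and residual variance $s_R^2 = \dfrac{\sum_{i=1}^{n} e_i^2}{n-2}$:
$$\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\dfrac{s_R^2}{(n-1)s_X^2}}} \sim t_{n-2}, \qquad \frac{\hat{\beta}_0 - \beta_0}{\sqrt{s_R^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{(n-1)s_X^2}\right)}} \sim t_{n-2}, \qquad \frac{(n-2)\,s_R^2}{\sigma^2} \sim \chi^2_{n-2}$$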
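The denominators of the two $t$ pivots are the standard errors of $\hat{\beta}_1$ and $\hat{\beta}_0$. The sketch below computes them and a confidence interval for $\beta_1$ on hypothetical data; the quantile $t_{3;0.025} \approx 3.182$ is assumed to come from a $t$ table, and all names are illustrative.

```python
from math import sqrt
from statistics import mean


def slope_intercept_inference(x, y, t_quantile):
    """Standard errors of the estimates and a CI for the slope beta_1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_X^2
    beta_1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    residuals = [yi - (beta_0 + beta_1 * xi) for xi, yi in zip(x, y)]
    s2_r = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    se_beta_1 = sqrt(s2_r / s_xx)
    se_beta_0 = sqrt(s2_r * (1 / n + x_bar ** 2 / s_xx))
    ci_beta_1 = (beta_1 - t_quantile * se_beta_1, beta_1 + t_quantile * se_beta_1)
    return s2_r, se_beta_1, se_beta_0, ci_beta_1


# Hypothetical data with n = 5; 3.182 is t_{3;0.025} from a t table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s2_r, se_b1, se_b0, ci_b1 = slope_intercept_inference(x, y, t_quantile=3.182)
print(f"s_R^2 = {s2_r:.3f}, se(b1) = {se_b1:.4f}, se(b0) = {se_b0:.4f}")
print(f"95% CI for beta_1: ({ci_b1[0]:.3f}, {ci_b1[1]:.3f})")
```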

Confidence intervals for the mean and individual response for y0 given X = x0 :
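$$\hat{y}_0 \pm t_{n-2;\alpha/2}\sqrt{s_R^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\,s_X^2}\right)}, \qquad \hat{y}_0 \pm t_{n-2;\alpha/2}\sqrt{s_R^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{(n-1)\,s_X^2}\right)}$$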
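The two half-widths differ only by the extra $1$ inside the square root for the individual response. A small sketch on hypothetical data, with $t_{3;0.025} \approx 3.182$ assumed from a $t$ table and all names chosen for illustration:

```python
from math import sqrt
from statistics import mean


def response_intervals(x, y, x_0, t_quantile):
    """Fitted value at x_0 with half-widths for the mean and individual response."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_X^2
    beta_1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    s2_r = sum((yi - beta_0 - beta_1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat_0 = beta_0 + beta_1 * x_0
    leverage = 1 / n + (x_0 - x_bar) ** 2 / s_xx
    half_mean = t_quantile * sqrt(s2_r * leverage)
    half_individual = t_quantile * sqrt(s2_r * (1 + leverage))
    return y_hat_0, half_mean, half_individual


# Hypothetical data with n = 5; 3.182 is t_{3;0.025} from a t table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat_0, half_mean, half_ind = response_intervals(x, y, x_0=4, t_quantile=3.182)
print(f"mean response:       {y_hat_0:.2f} +/- {half_mean:.3f}")
print(f"individual response: {y_hat_0:.2f} +/- {half_ind:.3f}")
```

The interval for an individual response is always wider, because it also accounts for the variability of the new error term.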

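ANOVA table for the simple linear regression model (R-squared $R^2 = SSM/SST$):

| Source of variability | SS | DF | Mean | F ratio |
|---|---|---|---|---|
| Model | $SSM = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ | $1$ | $SSM/1$ | $SSM/s_R^2$ |
| Residuals/errors | $SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$ | $n-2$ | $SSR/(n-2) = s_R^2$ | |
| Total | $SST = SSM + SSR$ | $n-1$ | | |

To test $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$, the test statistic is $F = SSM/s_R^2 \sim F_{1,n-2}$ and $RR_\alpha = \{F > F_{1,n-2;\alpha}\}$.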
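The entries of this ANOVA table can be computed as in the following sketch. The data are hypothetical, the critical value $F_{1,3;0.05} \approx 10.128$ is assumed to come from an $F$ table, and the function name is illustrative.

```python
from statistics import mean


def simple_regression_anova(x, y):
    """Sums of squares, R-squared and F ratio for simple linear regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    beta_0 = y_bar - beta_1 * x_bar
    fitted = [beta_0 + beta_1 * xi for xi in x]
    ssm = sum((fi - y_bar) ** 2 for fi in fitted)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = ssm + ssr
    s2_r = ssr / (n - 2)
    return {"SSM": ssm, "SSR": ssr, "SST": sst, "R2": ssm / sst, "F": ssm / s2_r}


# Hypothetical data with n = 5; 10.128 is F_{1,3;0.05} from an F table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = simple_regression_anova(x, y)
F_CRITICAL = 10.128
print(table)
print("reject H0: beta_1 = 0?", table["F"] > F_CRITICAL)
```

For these illustrative data the F ratio stays below the critical value, so $H_0: \beta_1 = 0$ would not be rejected at the 5% level.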

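Model formulation, estimates, fitted model and residuals in the multiple linear regression model
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + u_i$, where $u_i \sim \text{iid } N(0, \sigma^2)$, in matrix notation:
$$y = X\beta + u, \qquad \hat{\beta} = (X^T X)^{-1} X^T y, \qquad \hat{y} = X\hat{\beta}, \qquad e = y - \hat{y},$$
where
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$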
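Pivotal quantities for $\sigma^2$ and $\beta_j$, $j = 0, 1, \ldots, k$, with residual variance $s_R^2 = \sum_{i=1}^{n} e_i^2/(n - k - 1)$:

$$\frac{(n - k - 1)\, s_R^2}{\sigma^2} \sim \chi^2_{n-k-1}, \qquad \frac{\hat{\beta}_j - \beta_j}{s(\hat{\beta}_j)} \sim t_{n-k-1},$$
where $s(\hat{\beta}_j) = \sqrt{s^2(\hat{\beta}_j)}$ and $s^2(\hat{\beta}_j)$ is the $j$-th diagonal element (with the diagonal indexed from $0$) of the (estimated) variance-covariance matrix of $\hat{\beta}$,
with the matrix defined as $S_{\hat{\beta}} = s_R^2 (X^T X)^{-1}$.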
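The matrix estimate can be sketched in plain Python. This example does not form $(X^T X)^{-1}$ explicitly: it solves the equivalent normal equations $X^T X \hat{\beta} = X^T y$ by Gauss-Jordan elimination, which gives the same $\hat{\beta}$ when $X^T X$ is invertible. All helper names and the data are assumptions for illustration.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a x = b for a square matrix a (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def ols(x_rows, y):
    """Return beta_hat, residuals e and residual variance s_R^2."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend the column of ones
    n, p = len(design), len(design[0])  # p = k + 1 coefficients
    xt = transpose(design)
    xtx = matmul(xt, design)
    xty = [sum(u * v for u, v in zip(col, y)) for col in xt]
    beta_hat = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta_hat, row)) for row in design]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    s2_r = sum(e ** 2 for e in residuals) / (n - p)  # divides by n - k - 1
    return beta_hat, residuals, s2_r


# Hypothetical data with k = 2 explanatory variables and n = 5 observations.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1.1, 2.9, 4.2, 5.8, 8.0]
beta_hat, residuals, s2_r = ols(x_rows, y)
print("beta_hat:", [round(b, 4) for b in beta_hat])
print("s_R^2:", round(s2_r, 4))
```

A useful check is that the residuals are orthogonal to every column of $X$, which is exactly what the normal equations state.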

ANOVA table for the multiple linear regression model:


| Source of variability | SS | DF | Mean | F ratio |
|---|---|---|---|---|
| Model | $SSM$ | $k$ | $SSM/k$ | $(SSM/k)/s_R^2$ |
| Residuals/errors | $SSR$ | $n-k-1$ | $SSR/(n-k-1) = s_R^2$ | |
| Total | $SST$ | $n-1$ | | |
