FormaShee FF t2024

This document is a comprehensive formula sheet for statistics, covering basic statistical formulas, special distributions, estimators, confidence intervals, hypothesis testing, and regression analysis. It includes key equations for calculating means, variances, probabilities, and various statistical tests. The content is structured to serve as a quick reference for students and professionals in the field of statistics.

Version: January 9, 2024

Statistical Formulas

Vrije Universiteit
School of Business and Economics

Formula Sheet

1 Basic formulas for Statistics
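Position of the P-th percentile:
$$ \frac{P}{100}\,(n+1) \tag{1.1} $$

Mean:
$$ \mu = \frac{\sum x_i}{N}, \qquad \bar{x} = \frac{\sum x_i}{n} \quad\text{or}\quad \bar{x} = \frac{\sum_{j=1}^{c} f_j m_j}{n} \tag{1.2} $$

Geometric mean:
$$ \bar{x}_G = (x_1 \times x_2 \times \cdots \times x_n)^{1/n} = \exp\left(\frac{1}{n}\bigl(\ln(x_1) + \ldots + \ln(x_n)\bigr)\right) = \exp\left(\overline{\ln(x)}\right) \tag{1.3} $$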
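The product form and the log form of the geometric mean in (1.3) can be checked against each other numerically. The following is a minimal Python sketch; the data values are made up for illustration and are not part of the formula sheet.

```python
import math

# Hypothetical sample (e.g. yearly growth factors); not from the formula sheet.
x = [1.05, 1.10, 0.97, 1.02]
n = len(x)

# Product form: (x1 * x2 * ... * xn)^(1/n)
gm_product = math.prod(x) ** (1 / n)

# Log form: exp of the arithmetic mean of ln(x_i)
gm_log = math.exp(sum(math.log(v) for v in x) / n)

print(gm_product, gm_log)
```

The log form is usually the safer choice for long series, because the running product can overflow or underflow.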

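Variance:
$$ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \tag{1.4} $$
$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{\sum x_i^2 - n\bar{x}^2}{n-1} = \frac{\sum x_i^2 - \left(\sum x_i\right)^2/n}{n-1} \quad\text{or}\quad \frac{\sum f_j (m_j - \bar{x})^2}{n-1} \tag{1.5} $$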
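The three ungrouped expressions for the sample variance in (1.5) are algebraically identical. A small Python sketch, using an invented sample, compares them with the standard library's `statistics.variance`:

```python
import statistics

x = [4.0, 7.0, 1.0, 9.0, 4.0]  # hypothetical sample, for illustration only
n = len(x)
xbar = sum(x) / n

# Definition: squared deviations from the sample mean, divided by n - 1
s2_def = sum((v - xbar) ** 2 for v in x) / (n - 1)

# Shortcut 1: (sum of x_i^2 - n * xbar^2) / (n - 1)
s2_short1 = (sum(v * v for v in x) - n * xbar ** 2) / (n - 1)

# Shortcut 2: (sum of x_i^2 - (sum of x_i)^2 / n) / (n - 1)
s2_short2 = (sum(v * v for v in x) - sum(x) ** 2 / n) / (n - 1)

print(s2_def, s2_short1, s2_short2, statistics.variance(x))
```

The shortcut forms are convenient by hand, but they subtract two large numbers and can lose precision in floating point when the mean is large relative to the spread.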

 
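Chebyshev:
$$ \left(1 - \frac{1}{k^2}\right) \times 100\% \tag{1.6} $$

$$ CV = \left(\frac{\sigma}{\mu}\right) \times 100\%, \qquad CV = \left(\frac{s}{\bar{x}}\right) \times 100\% \tag{1.7} $$

$$ \rho = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y}, \qquad r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \cdot \sqrt{\sum (y_i - \bar{y})^2}} \tag{1.8} $$

Odds:
$$ \frac{P(A)}{1 - P(A)}, \qquad \frac{1 - P(A)}{P(A)} \tag{1.9} $$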

Bayes:
$$ P(B_1 \mid A) = \frac{P(B_1 \cap A)}{P(A)} = \frac{P(A \mid B_1)\,P(B_1)}{\sum_i P(A \cap B_i)} = \frac{P(A \mid B_1)\,P(B_1)}{\sum_i P(A \mid B_i)\,P(B_i)} \tag{1.10} $$
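As a worked illustration of (1.10), the sketch below applies Bayes' rule to an invented example: three machines B1, B2, B3 with assumed output shares and assumed defect rates. All numbers are hypothetical.

```python
# Hypothetical example: machines B1, B2, B3 produce 50%, 30%, 20% of output,
# with assumed defect rates P(A | Bi) of 1%, 2%, 3% (A = "item is defective").
prior = [0.50, 0.30, 0.20]       # P(Bi)
likelihood = [0.01, 0.02, 0.03]  # P(A | Bi)

# Denominator of (1.10): P(A) = sum over i of P(A | Bi) * P(Bi)
p_a = sum(l * p for l, p in zip(likelihood, prior))

# Bayes: P(Bi | A) = P(A | Bi) * P(Bi) / P(A)
posterior = [l * p / p_a for l, p in zip(likelihood, prior)]

print(p_a, posterior)
```

The posterior probabilities sum to one because the events Bi are assumed to partition the sample space.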

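$$ \mu = \mu_X = E(X) = \sum x_i\, P(X = x_i) \tag{1.11} $$

$$ E[h(X)] = \sum h(x_i)\, P(X = x_i) \tag{1.12} $$
e.g. for $h(X) = (X-3)^2$:
$$ E\left[(X-3)^2\right] = \sum (x_i - 3)^2\, P(X = x_i) $$

$$ E(a \cdot X) = a \cdot E(X) \tag{1.13} $$
$$ E(a \cdot X + b \cdot Y + c \cdot Z) = a \cdot E(X) + b \cdot E(Y) + c \cdot E(Z) $$

$$ \sigma^2 = \sigma_X^2 = \operatorname{var}(X) = \sum (x - \mu)^2\, P(X = x) = E\left[(X - \mu)^2\right] = E\left[(X - E X)^2\right] = E[X^2] - (E[X])^2 \tag{1.14} $$

$$ \operatorname{var}(aX) = a^2 \operatorname{var}(X), \qquad \sigma_{aX} = |a|\,\sigma_X \tag{1.15} $$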
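Formulas (1.11), (1.14) and (1.15) can be verified on a small discrete distribution. The values and probabilities below are invented for illustration.

```python
# Hypothetical discrete distribution: possible values and their probabilities.
values = [0, 1, 2, 3]
probs = [0.1, 0.4, 0.3, 0.2]

# (1.11): E[X] = sum of x * P(X = x)
mean = sum(x * p for x, p in zip(values, probs))

# (1.14), definition form: E[(X - mu)^2]
var_def = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# (1.14), shortcut form: E[X^2] - (E[X])^2
ex2 = sum(x * x * p for x, p in zip(values, probs))
var_short = ex2 - mean ** 2

# (1.15): variance of a*X computed directly, to compare with a^2 * var(X)
a = -2
var_scaled = sum((a * x - a * mean) ** 2 * p for x, p in zip(values, probs))

print(mean, var_def, var_short, var_scaled)
```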

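$$ \sigma_{X,Y} = \operatorname{cov}(X, Y) = \sum_{i,j} (x_i - \mu_X)(y_j - \mu_Y)\, P(X = x_i, Y = y_j) \tag{1.16} $$
$$ {} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y) \tag{1.17} $$

$$ s_{X,Y} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \tag{1.18} $$
$$ r_{X,Y} = \frac{s_{X,Y}}{s_X s_Y} \tag{1.19} $$

$$ \operatorname{cov}(aX, bY) = a \cdot b \cdot \operatorname{cov}(X, Y) \tag{1.20} $$

$$ \mu_{X+Y} = \mu_X + \mu_Y \tag{1.21} $$
$$ \sigma^2_{a \cdot X + b \cdot Y} = \operatorname{var}(a \cdot X + b \cdot Y) = a^2 \sigma_X^2 + b^2 \sigma_Y^2 + 2ab\, \sigma_{X,Y} $$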

Variables X and Y are independent if and only if for all x and for all y:

P (X = x, Y = y) = P (X = x) × P (Y = y) (1.22)

If X and Y are independent:
$$ \sigma^2_{X+Y} = \operatorname{var}(X + Y) = \sigma_X^2 + \sigma_Y^2, \qquad \operatorname{cov}(X, Y) = 0 \tag{1.23} $$

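Skewness and (excess) kurtosis:

$$ \text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3 $$
$$ \text{Kurtosis} = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} $$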
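The sample skewness and excess kurtosis can be sketched in Python as follows. The function names are illustrative, and the kurtosis uses the standard bias-adjusted denominator (n − 1)(n − 2)(n − 3), the same form that common spreadsheet functions implement.

```python
import statistics

def sample_skewness(x):
    """Adjusted sample skewness; requires n >= 3."""
    n = len(x)
    xbar = statistics.mean(x)
    s = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((v - xbar) / s) ** 3 for v in x)

def sample_excess_kurtosis(x):
    """Adjusted sample excess kurtosis; requires n >= 4."""
    n = len(x)
    xbar = statistics.mean(x)
    s = statistics.stdev(x)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(((v - xbar) / s) ** 4 for v in x) - correction

data = [1, 2, 3, 4, 5]  # hypothetical, perfectly symmetric sample
print(sample_skewness(data), sample_excess_kurtosis(data))
```

For this symmetric sample the skewness is zero, and the negative excess kurtosis reflects tails that are lighter than those of a normal distribution.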

2 Special Distributions
 
$$ {}_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}, \qquad {}_nP_r = \frac{n!}{(n-r)!} \tag{2.1} $$
Binomial distribution:
$$ P(X = x) = \binom{n}{x} \pi^x (1-\pi)^{n-x}; \quad E X = n\pi; \quad \operatorname{var} X = n\pi(1-\pi) \tag{2.2} $$
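The binomial probabilities in (2.2) can be computed with `math.comb`. The sketch below, with arbitrarily chosen parameters, also recomputes the mean and variance from the probabilities to compare with nπ and nπ(1 − π).

```python
import math

def binom_pmf(x, n, pi):
    """P(X = x) for X ~ Binomial(n, pi), following (2.2)."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

n, pi = 10, 0.3  # hypothetical parameters
pmf = [binom_pmf(x, n, pi) for x in range(n + 1)]

# Mean and variance computed from the probabilities themselves
mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

print(sum(pmf), mean, var)
```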
Hypergeometric distribution ($\pi = S/N$):
$$ P(X = x) = \frac{\binom{S}{x}\binom{N-S}{n-x}}{\binom{N}{n}}; \quad E X = n\pi; \quad \operatorname{var} X = n\pi(1-\pi)\,\frac{N-n}{N-1} \tag{2.4} $$

In this class we allow the application of the z-test or the $\chi^2$-test if the expected frequencies in all cells are at least 5.
The hypergeometric distribution can be approximated by a Binomial($n$, $\pi = S/N$) distribution if $n/N < 0.05$.
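To see how close the binomial approximation is when n/N < 0.05, the sketch below compares the two probability functions for an invented population (N = 1000, S = 300, n = 20). The function names are illustrative.

```python
import math

def hypergeom_pmf(x, N, S, n):
    """P(X = x) when drawing n items without replacement from N, S of them successes."""
    return math.comb(S, x) * math.comb(N - S, n - x) / math.comb(N, n)

def binom_pmf(x, n, pi):
    """P(X = x) for X ~ Binomial(n, pi)."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

N, S, n = 1000, 300, 20  # hypothetical values; n / N = 0.02 < 0.05
pi = S / N

hyper = [hypergeom_pmf(x, N, S, n) for x in range(n + 1)]
binom = [binom_pmf(x, n, pi) for x in range(n + 1)]

# Largest pointwise gap between the exact and the approximating probabilities
max_diff = max(abs(h - b) for h, b in zip(hyper, binom))

# Variance with the finite population correction factor (N - n) / (N - 1)
var_hyper = n * pi * (1 - pi) * (N - n) / (N - 1)

print(max_diff, var_hyper)
```

The only difference between the two variances is the finite population correction factor, which is close to 1 when the sample is a small fraction of the population.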

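Poisson distribution:
$$ P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}; \quad E X = \lambda; \quad \operatorname{var} X = \lambda \tag{2.5} $$
Geometric distribution:
$$ P(X = x) = \pi (1-\pi)^{x-1} \ \text{ for } x = 1, 2, \ldots; \quad E X = \frac{1}{\pi}; \quad \operatorname{var} X = \frac{1-\pi}{\pi^2} \tag{2.6} $$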

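Uniform discrete distribution:
$$ P(X = x) = \frac{1}{b-a+1} \ \text{ for } x = a, a+1, \ldots, b \tag{2.7} $$
$$ E X = \frac{1}{2}(a+b); \quad \operatorname{var} X = \frac{1}{12}\left[(b-a+1)^2 - 1\right] \tag{2.8} $$
Uniform continuous distribution:
$$ f(x) = \frac{1}{b-a} \ \text{ for } a \le x \le b; \quad E X = \frac{1}{2}(a+b); \quad \operatorname{var} X = \frac{1}{12}(b-a)^2 \tag{2.9} $$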
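Exponential distribution:
$$ f(x) = \lambda e^{-\lambda x}, \quad P(X \le x) = 1 - e^{-\lambda x}, \quad E X = \sigma_X = 1/\lambda, \quad \text{for } x \ge 0,\ \lambda > 0 \tag{2.10} $$

Normal distribution:
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right); \quad E X = \mu; \quad \operatorname{var} X = \sigma^2 \tag{2.11} $$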

3 Estimators

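Estimator for $\mu$: $\bar{X}$
$$ E \bar{X} = \mu, \qquad \operatorname{var} \bar{X} = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \tag{3.1} $$
$$ \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1) \tag{3.2} $$
(assume a normal population for n < 15; a symmetric population for n < 30)
$$ \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \tag{3.3} $$
(assume a normal population for n < 15; a symmetric population for n < 30)

Estimator for $\sigma^2$: $S^2$
$$ \frac{(n-1)\,S^2}{\sigma^2} \sim \chi^2_{n-1} \tag{3.4} $$
(assume a normal population)

Estimator for $\pi$: $p$
$$ E p = \pi, \qquad \operatorname{var} p = \sigma_p^2 = \frac{\pi(1-\pi)}{n} \tag{3.5} $$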

4 Confidence intervals and error margins
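$$ \bar{x} \pm z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}} \tag{4.1} $$

$$ \bar{x} \pm t_{n-1;\alpha/2}\, \frac{s}{\sqrt{n}} \tag{4.2} $$

$$ p \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \tag{4.3} $$

$$ \frac{(n-1)\,s^2}{\chi^2_{n-1;\alpha/2}} \le \sigma^2 \le \frac{(n-1)\,s^2}{\chi^2_{n-1;1-\alpha/2}} \tag{4.4} $$
Notation: $t_{n-1;\alpha/2}$ denotes the critical value of a t distribution with $n-1$ degrees of freedom and probability $\alpha/2$ to the right of this critical value. Similarly for $z_{\alpha/2}$ and $\chi^2_{n-1;\alpha/2}$.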
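The z-based intervals (4.1) and (4.3) can be sketched with the standard library alone, because `statistics.NormalDist` provides the inverse normal CDF. The function names and the example numbers are illustrative. The t and chi-square intervals (4.2) and (4.4) need critical values from a table or a statistics package, so they are not covered here.

```python
import math
from statistics import NormalDist

def z_crit(alpha):
    """Critical value z_{alpha/2}: alpha/2 probability lies to the right of it."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_mean_known_sigma(xbar, sigma, n, alpha=0.05):
    """Interval (4.1) for mu when the population sigma is known."""
    margin = z_crit(alpha) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def ci_proportion(p, n, alpha=0.05):
    """Interval (4.3) for pi based on the sample proportion p."""
    margin = z_crit(alpha) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical inputs: xbar = 100, sigma = 15, n = 36; p = 0.4, n = 150
print(ci_mean_known_sigma(100, 15, 36))
print(ci_proportion(0.4, 150))
```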
Note that each of the test statistics in the next section can also be rewritten as a confidence interval. For instance, for a confidence interval for a difference of means with samples that have unequal variances, we can take the test statistic from equation (5.7):
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad df_{Welch} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} $$
and rewrite it into the confidence interval form for $(\mu_1 - \mu_2)$:
$$ (\bar{x}_1 - \bar{x}_2) \pm t^{crit}_{df_{Welch};\,\alpha/2} \cdot \sqrt{s_1^2/n_1 + s_2^2/n_2}. $$

Similar results hold for all other t-type and z-type test statistics.
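The Welch degrees of freedom and the resulting interval can be sketched as follows. The function names are illustrative, and the critical value `t_crit` is assumed to be looked up by the user (from a t table or a statistics package) because the Python standard library has no t distribution.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch degrees of freedom from sample standard deviations and sample sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def welch_interval(x1bar, s1, n1, x2bar, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2; t_crit must be supplied by the caller."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1bar - x2bar
    return diff - t_crit * se, diff + t_crit * se

# Hypothetical summary statistics
df = welch_df(3.0, 12, 1.0, 8)
print(df)

# t_crit = 2.145 is an assumed table value for 14 df and alpha = 0.05
print(welch_interval(10.0, 3.0, 12, 8.0, 1.0, 8, t_crit=2.145))
```

The Welch degrees of freedom always lie between min(n1, n2) − 1 and n1 + n2 − 2, and are usually rounded down before the table lookup.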


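So for instance:
$$ r \pm t_{n-2;\alpha/2} \sqrt{\frac{1 - r^2}{n-2}} \tag{4.5} $$

$$ (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{4.6} $$

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2;\alpha/2} \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \tag{4.7} $$

$$ (\bar{x}_1 - \bar{x}_2) \pm t_{df_{Welch};\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{4.8} $$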

5 Testing hypotheses (assumptions, see section estimators)

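$$ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \tag{5.1} $$
$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \tag{5.2} $$
$$ t = r \sqrt{\frac{n-2}{1 - r^2}}, \qquad df = n - 2 \tag{5.3} $$
$$ z = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}} \quad\text{or}\quad P_{\mathrm{Bin}(n,\pi_0)}(X \le pn) \quad\text{or}\quad P_{\mathrm{Bin}(n,\pi_0)}(X \ge pn) \tag{5.4} $$
$$ \chi^2 = \frac{(n-1)\,s^2}{\sigma_0^2}, \qquad df = n - 1 \quad \text{(normal population)} \tag{5.5} $$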
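$$ z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \tag{5.6} $$
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \tag{5.7} $$
$$ df_{Welch} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \tag{5.8} $$
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}; \quad df = n_1 + n_2 - 2; \quad s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{(n_1 - 1) + (n_2 - 1)} \tag{5.9} $$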
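$$ z = \frac{p_1 - p_2 - (\pi_1 - \pi_2)}{\sqrt{p_1 (1 - p_1)/n_1 + p_2 (1 - p_2)/n_2}} \tag{5.10} $$
$$ z = \frac{p_1 - p_2 - 0}{\sqrt{p_c (1 - p_c)/n_1 + p_c (1 - p_c)/n_2}}; \qquad p_c = \frac{x_1 + x_2}{n_1 + n_2} \tag{5.11} $$
$$ F = \frac{s_1^2}{s_2^2}, \qquad (df_1, df_2) = (n_1 - 1;\ n_2 - 1) \quad \text{(normal populations)} \tag{5.12} $$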
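As an illustration of the pooled two-proportion statistic (5.11), the sketch below computes z and a two-sided p-value from invented success counts. The function name is illustrative, and the p-value uses `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic (5.11) for H0: pi1 = pi2, using the pooled proportion p_c."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # pooled proportion
    se = math.sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)
    return (p1 - p2) / se

# Hypothetical counts: 60 successes out of 200 versus 45 out of 200
z = two_proportion_z(60, 200, 45, 200)

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(z, p_value)
```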
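$$ W = \sum_{i=1}^{n'} R_i^{+}; \quad \mu_W = \frac{n'(n'+1)}{4}; \quad \sigma_W^2 = \frac{n'(n'+1)(2n'+1)}{24}; \quad z_W = \frac{W - \mu_W}{\sqrt{\sigma_W^2}} \tag{5.13} $$

$$ S = \sum_{i=1}^{n} S_i^{+}; \qquad S \sim \mathrm{Bin}(n', 0.5) \tag{5.14} $$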
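$$ \chi^2 = \sum \frac{(f_{jk} - e_{jk})^2}{e_{jk}}; \quad df = (r-1)(c-1); \quad e_{jk} = \frac{R_j C_k}{n} \ge 5 \tag{5.15} $$
$$ \chi^2 = \sum \frac{(f_j - e_j)^2}{e_j}; \quad df = r - 1 - m; \quad e_j = n \cdot \pi_j \ge 5 \tag{5.16} $$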
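The chi-square statistic for independence (5.15) can be computed directly from a table of observed counts. The sketch below uses an invented 2 × 2 table, and the function name is illustrative.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]        # R_j
    col_totals = [sum(col) for col in zip(*table)]  # C_k
    n = sum(row_totals)

    # Expected frequencies e_jk = R_j * C_k / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi2 = sum((f - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for f, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

observed = [[20, 30], [30, 20]]  # hypothetical counts
chi2, df, expected = chi_square_independence(observed)

# Check the rule that every expected frequency is at least 5
all_ok = all(e >= 5 for row in expected for e in row)
print(chi2, df, all_ok)
```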

6 Regression
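Simple Regression:
$$ b_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum x_i y_i - n\bar{x}\,\bar{y}}{\sum x_i^2 - n\bar{x}^2} \tag{6.1} $$
$$ b_0 = \bar{y} - b_1 \bar{x} \tag{6.2} $$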
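The slope and intercept formulas (6.1) and (6.2) translate directly into code. The following sketch uses a small invented data set and checks the deviation form of the slope against the shortcut form.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Deviation form: SSxy / SSxx
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
ss_xx = sum((xi - xbar) ** 2 for xi in x)
b1 = ss_xy / ss_xx

# Shortcut form: (sum x_i y_i - n xbar ybar) / (sum x_i^2 - n xbar^2)
b1_short = ((sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar)
            / (sum(xi * xi for xi in x) - n * xbar ** 2))

# Intercept from (6.2)
b0 = ybar - b1 * xbar

print(b0, b1, b1_short)
```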

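Multiple Regression
$$ \sum e_i^2 = SSE = \sum (y_i - \hat{y}_i)^2 \tag{6.3} $$
$$ \hat{\sigma}_e^2 = s_e^2 = MSE = \frac{\sum e_i^2}{n - k - 1} \tag{6.4} $$
$$ t = \frac{b_i - \beta_{i,0}}{s_{b_i}}, \qquad i = 0, 1, \ldots, k, \qquad df = n - k - 1 \tag{6.5} $$
$$ \hat{y}_i \pm t_{n-k-1} \cdot \frac{s_e}{\sqrt{n}} \quad\text{resp.}\quad \hat{y}_i \pm t_{n-k-1} \cdot s_e \tag{6.6} $$
$$ b_i - t_{n-k-1}\, s_{b_i} \le \beta_i \le b_i + t_{n-k-1}\, s_{b_i}, \qquad i = 0, 1, \ldots, k \tag{6.7} $$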

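$$
\begin{array}{llll}
 & \text{Sum of Squares} & \text{df} & \text{Mean Squares} \\
\text{Regression} & SSR = \sum (\hat{y}_i - \bar{y})^2 & k & \\
\text{Residual} & SSE = \sum (y_i - \hat{y}_i)^2 & n - k - 1 & \hat{\sigma}_e^2 \\
\text{Total} & SST = \sum (y_i - \bar{y})^2 & n - 1 &
\end{array}
\tag{6.8}
$$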

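$$ R^2 = \frac{SSR}{SST} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{\sum e_i^2}{\sum (y_i - \bar{y})^2} \tag{6.9} $$
$$ R^2_{adj} = 1 - \left(1 - R^2\right) \frac{n-1}{n-k-1} \tag{6.10} $$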
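The sums of squares in (6.8) and the fit measures (6.9) and (6.10) can be illustrated on a simple regression (k = 1). The data below are invented, and the fit is recomputed inside the block so that the sketch is self-contained.

```python
# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n, k = len(x), 1  # k = number of explanatory variables
xbar, ybar = sum(x) / n, sum(y) / n

# Least squares fit of y on x
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * xi for xi in x]

# Sums of squares from the ANOVA table (6.8)
ssr = sum((yh - ybar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - ybar) ** 2 for yi in y)

mse = sse / (n - k - 1)                        # (6.4)
r2 = ssr / sst                                 # (6.9), first form
r2_alt = 1 - sse / sst                         # (6.9), second form
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # (6.10)

print(ssr, sse, sst, mse, r2, r2_adj)
```

The two forms of R² agree because SST = SSR + SSE for a least squares fit with an intercept.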

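Partial F:
$$ F = \frac{\left(SSE_{Restricted} - SSE_{Full}\right)/m}{MSE_{Full}} \sim F_{m,\,n-k-1} \tag{6.11} $$

$$ VIF_j = \frac{1}{1 - R_j^2}, \qquad s_{b_j} = \sqrt{\frac{MSE}{SS_{x_j}\left(1 - R_j^2\right)}} \tag{6.12} $$

Leverage:
$$ h_i \ge \frac{2(k+1)}{n} \tag{6.13} $$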

7 ANOVA
ANOVA (c columns, n1 , n2 , . . . , nc observations per column)

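$$
\begin{array}{lllll}
\text{Source of variation} & \text{Sum of Squares} & \text{df} & \text{Mean Square} & \text{F Statistic} \\
\text{Between} & SSB = \sum n_j \left(\bar{y}_j - \bar{y}\right)^2 & c - 1 & MSB = \dfrac{SSB}{c-1} & F = \dfrac{MSB}{MSE} \\
\text{Error (`Within')} & SSE = \sum\sum \left(y_{ij} - \bar{y}_j\right)^2 & n - c & MSE = \dfrac{SSE}{n-c} & \\
\text{Total} & SST = \sum\sum \left(y_{ij} - \bar{y}\right)^2 & n - 1 & &
\end{array}
\tag{7.1}
$$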
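The entries of the ANOVA table (7.1) can be computed from raw group data as in the sketch below. The function name and the three small groups are invented for illustration.

```python
def one_way_anova(groups):
    """Return the ANOVA table quantities for a list of groups (columns)."""
    c = len(groups)                    # number of columns
    n = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group, within-group and total sums of squares
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sst = sum((y - grand_mean) ** 2 for g in groups for y in g)

    msb = ssb / (c - 1)
    mse = sse / (n - c)
    return {"SSB": ssb, "SSE": sse, "SST": sst,
            "MSB": msb, "MSE": mse, "F": msb / mse,
            "df": (c - 1, n - c)}

# Hypothetical data: three columns with three observations each
result = one_way_anova([[4, 5, 6], [7, 8, 9], [1, 2, 3]])
print(result)
```

The computed F statistic is then compared with a critical value of the F distribution with (c − 1, n − c) degrees of freedom, taken from a table or a statistics package.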

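Tukey:
$$ T_{calc} = \frac{\bar{y}_j - \bar{y}_k}{\sqrt{MSE \left(\dfrac{1}{n_j} + \dfrac{1}{n_k}\right)}} \ge T_{c,\,n-c}; \qquad \text{Crit. Range} = T_{c,\,n-c} \sqrt{MSE \left(\frac{1}{n_j} + \frac{1}{n_k}\right)} \tag{7.2} $$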
