FormulaSheet Test 1

The document provides formulas and procedures for various statistical tests that will be covered on Test 1, including: 1) Confidence intervals and hypothesis tests for estimating one population mean when the standard deviation is unknown. 2) Tests for estimating the difference between two means based on independent samples, including formulas for confidence intervals and t-tests. 3) Procedures for estimating the difference between two means based on paired data, including confidence intervals and t-tests. 4) Details on one-factor and two-factor analysis of variance, including calculations of sums of squares, mean squares, and F-ratios.

Uploaded by

bestreview7

STAT4003 Formula Sheet Test 1

9:00 am – 5:00 pm on Monday, October 16, 2023 at SJA-204B

Estimating One Population Mean When the Population Standard Deviation is Unknown

Under the confidence level $1-\alpha$, the confidence interval for the population mean is

$$\bar{x} \pm t^*_{n-1}\, SE(\bar{x}), \qquad SE(\bar{x}) = \frac{s}{\sqrt{n}},$$

where $t^*_{n-1}$ is the critical value with $n-1$ degrees of freedom under the confidence level $1-\alpha$.
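As a rough illustration, the interval above could be computed as follows. This is a minimal sketch: the function name and the data are hypothetical, and the critical value is passed in by hand (2.776 is the two-sided 95% value for 4 degrees of freedom) because the Python standard library has no t-distribution.

```python
import math
import statistics

def mean_confidence_interval(data, t_crit):
    """Return (lower, upper) for the population mean.

    t_crit is the critical t-value with n - 1 degrees of freedom,
    looked up separately (for example with Excel's T.INV.2T).
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
    se = s / math.sqrt(n)           # SE(x_bar) = s / sqrt(n)
    margin = t_crit * se
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 5 observations.
sample = [10.0, 12.0, 11.0, 13.0, 14.0]
low, high = mean_confidence_interval(sample, t_crit=2.776)
print(round(low, 3), round(high, 3))
```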

The sample size is determined by the required margin of error.

$$ME = t^*_{n-1}\, SE(\bar{x}) = t^*_{n-1}\frac{s}{\sqrt{n}}, \qquad n = \left(t^*_{n-1}\frac{s}{ME}\right)^2, \qquad \text{usually used as } n = \left(z^*\frac{s}{ME}\right)^2$$
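The z-based version of the sample size formula can be sketched in Python as below. The function name and the inputs are hypothetical, and rounding up to the next whole number is a common convention assumed here rather than something stated on the sheet.

```python
import math
from statistics import NormalDist

def required_sample_size(s, margin_of_error, confidence=0.95):
    """n = (z* s / ME)^2, rounded up to a whole number of observations."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * s / margin_of_error) ** 2)

# Hypothetical planning values: s = 15, desired margin of error = 2.
n_needed = required_sample_size(s=15.0, margin_of_error=2.0)
print(n_needed)  # 217
```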

Testing One Population Mean When the Population Standard Deviation is Unknown

$$\text{Test statistic:}\quad t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}, \qquad df = n-1$$
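A minimal sketch of this test statistic, using hypothetical data and a hypothetical helper name. The p-value is not computed here; it would come from a t-table or from Excel's T.DIST functions.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0 with unknown population sd."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)                 # sample standard deviation
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical sample tested against mu0 = 10.
t_stat, df = one_sample_t([10.0, 12.0, 11.0, 13.0, 14.0], mu0=10.0)
print(round(t_stat, 3), df)
```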

Estimating the Difference between Two Means Based on Independent Samples

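Under the confidence level $1-\alpha$, when $\sigma_1^2 \ne \sigma_2^2$, the confidence interval for $\mu_1-\mu_2$ is

$$(\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}, \qquad df = \frac{\left(s_1^2/n_1+s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}, \qquad SE(\bar{x}_1-\bar{x}_2)=\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$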
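The unequal-variance standard error and degrees of freedom can be sketched from summary statistics as follows. The function name and the numbers are hypothetical; the degrees of freedom are generally not a whole number.

```python
import math

def welch_se_and_df(s1, n1, s2, n2):
    """Return (SE, df) for the difference of two means, unequal variances."""
    v1 = s1 ** 2 / n1                # s1^2 / n1
    v2 = s2 ** 2 / n2                # s2^2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df

# Hypothetical summary statistics for two independent samples.
se, df = welch_se_and_df(s1=2.0, n1=10, s2=3.0, n2=15)
print(round(se, 4), round(df, 2))
```

The interval is then the difference of the sample means plus or minus the critical t-value (for this df) times `se`.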

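Under the confidence level $1-\alpha$, when $\sigma_1^2 = \sigma_2^2$, the confidence interval for $\mu_1-\mu_2$ is

$$(\bar{x}_1-\bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}, \qquad df = n_1+n_2-2, \qquad s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$

$$SE_{pooled}(\bar{x}_1-\bar{x}_2) = s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$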
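For comparison, a sketch of the pooled quantities from summary statistics. The helper name and inputs are hypothetical.

```python
import math

def pooled_variance_and_se(s1, n1, s2, n2):
    """Return (sp2, SE_pooled, df) under the equal-variance assumption."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))                # s_p * sqrt(1/n1 + 1/n2)
    return sp2, se, df

# Hypothetical summary statistics for two independent samples.
sp2, se_pooled, df_pooled = pooled_variance_and_se(s1=2.0, n1=10, s2=3.0, n2=15)
print(round(sp2, 4), round(se_pooled, 4), df_pooled)
```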

Testing the Difference between Two Means Based on Independent Samples

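Test statistic for $\mu_1-\mu_2$ when $\sigma_1^2 \ne \sigma_2^2$

$$t = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}, \qquad df = \frac{\left(s_1^2/n_1+s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1}+\dfrac{(s_2^2/n_2)^2}{n_2-1}}$$
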
(Pooled t-test) Test statistic for $\mu_1-\mu_2$ when $\sigma_1^2 = \sigma_2^2$

$$t = \frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}, \qquad df = n_1+n_2-2, \qquad s_p^2 = \frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$$

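Estimating the Difference between Two Means Based on Paired Data

Under the confidence level $1-\alpha$, the confidence interval for the mean difference $\mu_D$ is

$$\bar{x}_D \pm t_{\alpha/2}\,\frac{s_D}{\sqrt{n_D}}, \qquad df = n_D-1, \qquad SE(\bar{x}_D) = \frac{s_D}{\sqrt{n_D}},$$

where $\bar{x}_D$ and $s_D$ are the mean and standard deviation of the $n_D$ paired differences.
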
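Testing the Difference between Two Means Based on Paired Data

$$H_0: \mu_D = 0$$

$$H_1: \mu_D \ne 0, \quad\text{or}\quad H_1: \mu_D > 0, \quad\text{or}\quad H_1: \mu_D < 0$$

$$\text{Test statistic:}\quad t = \frac{\bar{x}_D-\mu_D}{s_D/\sqrt{n_D}}, \qquad df = n_D-1,$$

where $\mu_D$ in the test statistic is the hypothesized value, which is 0 under $H_0$.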
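A small sketch of the paired test statistic under $H_0: \mu_D = 0$. The before/after data and the function name are hypothetical; differences are taken as after minus before.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for H0: mean difference = 0, using after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sd of the paired differences
    t = d_bar / (s_d / math.sqrt(n))       # hypothesized difference is 0
    return t, n - 1

# Hypothetical paired measurements on four subjects.
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 15.0, 10.0, 14.0]
t_paired, df_paired = paired_t(before, after)
print(round(t_paired, 3), df_paired)
```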

One-Factor Analysis of Variance

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$$

$$H_1: \text{at least one mean is different}$$

$$\text{Test statistic:}\quad F_{k-1,\,N-k} = \frac{MST}{MSE}, \quad\text{rejecting the null hypothesis when } F_{k-1,\,N-k} \text{ is too large.}$$

$$MST = \frac{SST}{k-1}, \qquad MSE = \frac{SSE}{N-k}$$

Average size of the error standard deviation: $s_p = \sqrt{MSE}$

Variation between the groups

$$\text{Treatment Sum of Squares:}\quad SST = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2, \qquad df = k-1$$

$\bar{x}_i$ is the mean of group $i$ and $\bar{x}$ is the overall mean of all observations.

Variation within a group

$$\text{Error Sum of Squares:}\quad SSE = \sum_{i=1}^{k}(n_i-1)s_i^2, \qquad df = N-k$$

$s_i^2$ is the sample variance of group $i$.

$$SSTotal = SST + SSE$$

ANOVA Table of One Factor Analysis

Source                 df     Sum of Squares   Mean Square   F-Ratio    P(x > F)
Treatment (Between)    k-1    SST              MST           MST/MSE    p-value
Error (Within)         N-k    SSE              MSE
Total                  N-1    SSTotal
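
The entries of the one-factor table can be sketched from raw data as follows. The function name and the three small groups are hypothetical; the p-value column is left out because it needs the F distribution (for example Excel's F.DIST.RT).

```python
import statistics

def one_way_anova(groups):
    """Return SST, SSE, MST, MSE and F for a list of groups of observations."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N
    # Between-group variation: sum of n_i * (group mean - overall mean)^2
    sst = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: sum of (n_i - 1) * s_i^2
    sse = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    mst = sst / (k - 1)
    mse = sse / (N - k)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse, "F": mst / mse}

# Hypothetical data: k = 3 groups, N = 9 observations.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
anova = one_way_anova(groups)
print(anova)
```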
Two-Factor Analysis of Variance

$x_{ij}$ is the observation at the $i$th level of the first factor and the $j$th level of the second factor. The first factor A has $a$ levels and the second factor B has $b$ levels.

Treatment sum of squares for the first factor:

$$SSA = \sum_{j=1}^{b}\sum_{i=1}^{a}(\bar{x}_i-\bar{\bar{x}})^2$$

$\bar{x}_i$ is the mean of all subjects assigned to level $i$ of factor A and $\bar{\bar{x}}$ is the overall mean of all observations.

$$MSA = \frac{SSA}{a-1}$$

Treatment sum of squares for the second factor:

$$SSB = \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{x}_j-\bar{\bar{x}})^2$$

$\bar{x}_j$ is the mean of all subjects assigned to level $j$ of factor B and $\bar{\bar{x}}$ is the overall mean of all observations.

$$MSB = \frac{SSB}{b-1}$$

$$SSE = SSTotal - (SSA + SSB)$$

$$SSTotal = \sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{\bar{x}})^2$$

$$\text{Mean Square for Error:}\quad MSE = \frac{SSE}{N-(a+b-1)}, \qquad\text{where } N = a\times b$$

Test statistic

$$F_{a-1,\,N-(a+b-1)} = \frac{MSA}{MSE}$$

$$F_{b-1,\,N-(a+b-1)} = \frac{MSB}{MSE}$$

ANOVA Table of Two-Factor without Replication

Source                df                 Sum of Squares   Mean Square   F-Ratio    Prob > F
Factor A (Rows)       a-1                SSA              MSA           MSA/MSE    p-value
Factor B (Columns)    b-1                SSB              MSB           MSB/MSE    p-value
Error (Within)        N - (a + b - 1)    SSE              MSE
Total (Corrected)     N-1                SSTotal
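
A sketch of the two-factor computation without replication, with rows as the levels of factor A and columns as the levels of factor B. The function name and the 2 × 3 table are hypothetical. Each term of the double sum for SSA does not depend on $j$, so the sketch multiplies the single sum over $i$ by $b$, and likewise multiplies by $a$ for SSB.

```python
def two_factor_anova(table):
    """table[i][j] is the single observation at level i of A and level j of B."""
    a, b = len(table), len(table[0])
    N = a * b
    grand = sum(sum(row) for row in table) / N
    row_means = [sum(row) / b for row in table]
    col_means = [sum(table[i][j] for i in range(a)) / a for j in range(b)]
    ssa = b * sum((m - grand) ** 2 for m in row_means)
    ssb = a * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    sse = ss_total - (ssa + ssb)
    mse = sse / (N - (a + b - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SSE": sse, "SSTotal": ss_total,
        "F_A": (ssa / (a - 1)) / mse,
        "F_B": (ssb / (b - 1)) / mse,
    }

# Hypothetical 2 x 3 layout: a = 2 rows (factor A), b = 3 columns (factor B).
result = two_factor_anova([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(result)
```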

$$F_{(a-1)(b-1),\,ab(r-1)} = \frac{MSAB}{MSE} \quad\text{if the experiment is replicated } r \text{ times.}$$

$$N = a\times b\times r, \qquad k = a\times b$$

ANOVA Table of Two-Factor with Replication

Source                df                 Sum of Squares   Mean Square   F-Ratio     Prob > F
Factor A (Sample)     a-1                SSA              MSA           MSA/MSE     p-value
Factor B (Columns)    b-1                SSB              MSB           MSB/MSE     p-value
Interaction           (a – 1)(b – 1)     SSAB             MSAB          MSAB/MSE    p-value
Error (Within)        N-k = ab(r-1)      SSE              MSE
Total (Corrected)     N-1                SSTotal

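The three null hypotheses tested in the two-factor analysis with replication are:

• H0: the means of the levels of factor A are equal,
• H0: the means of the levels of factor B are equal, and
• H0: the effects of factor A are constant across the levels of factor B (or vice versa), that is, there is no interaction.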

Chi-Square Test ---- Goodness of Fit Test

$$\text{Residual} = Obs - Exp = \text{Observed Counts} - \text{Expected Counts}$$

$$\text{Standardized Residual} = \frac{Obs - Exp}{\sqrt{Exp}}$$

$$\text{Chi-Square Statistic:}\quad \chi^2 = \sum\frac{(Obs - Exp)^2}{Exp}$$

$$\text{Degrees of Freedom} = \text{Number of cells} - 1$$
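
A minimal sketch of the goodness-of-fit quantities, using hypothetical observed and expected counts for three cells. The function name is illustrative only.

```python
import math

def chi_square_gof(observed, expected):
    """Return (standardized residuals, chi-square statistic, df)."""
    std_residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1            # number of cells minus 1
    return std_residuals, chi2, df

# Hypothetical counts: 60 observations expected to split evenly over 3 cells.
std_res, chi2_gof, df_gof = chi_square_gof([18, 22, 20], [20.0, 20.0, 20.0])
print(round(chi2_gof, 3), df_gof)  # 0.4 2
```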

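Chi-Square Test ---- Homogeneity

$$Exp_{ij} = \frac{\text{Total}_{\text{row } i}\times\text{Total}_{\text{col } j}}{\text{Table Total}}$$

$$\text{Residual} = Obs - Exp = \text{Observed Counts} - \text{Expected Counts}$$

$$\text{Standardized Residual} = \frac{Obs - Exp}{\sqrt{Exp}}$$

$$\text{Chi-Square Statistic:}\quad \chi^2 = \sum\frac{(Obs - Exp)^2}{Exp}$$

$$\text{Degrees of Freedom} = (R-1)(C-1),$$

where $R$ is the number of rows and $C$ is the number of columns.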
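
The expected counts and the statistic for a two-way table can be sketched as below. The function names and the 2 × 2 table of counts are hypothetical.

```python
def expected_counts(table):
    """Exp_ij = (row i total * column j total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[rt * ct / total for ct in col_totals] for rt in row_totals]

def chi_square_table(table):
    """Return (chi-square statistic, df) for an R x C table of counts."""
    exp = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table of observed counts.
counts = [[10, 20], [30, 40]]
exp_table = expected_counts(counts)      # [[12.0, 18.0], [28.0, 42.0]]
chi2_hom, df_hom = chi_square_table(counts)
print(exp_table, round(chi2_hom, 4), df_hom)
```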
List of Excel Functions

NORM.S.DIST(z, 1) gives the value of P(X < z) if X is a standard normal variable.

NORM.DIST(z, µ, σ, 1) gives the value of P(X < z) if X is a normal variable with mean µ and standard deviation σ.

NORM.S.INV(p) gives the value of z such that P(X < z) = p if X is a standard normal variable.

NORM.INV(p, µ, σ) gives the value of z such that P(X < z) = p if X is a normal variable with mean µ and standard deviation σ.

T.DIST(x, degrees of freedom, 1) gives the probability to the left of x.

T.DIST.2T(x, degrees of freedom) gives the probability in the two tails outside of the interval (-x, x).

T.DIST.RT(x, degrees of freedom) gives the probability to the right of x.

T.INV(left side probability, degrees of freedom) gives the t-value with that probability to its left.

T.INV.2T(significance level, degrees of freedom) gives the two-tailed critical t-value, with the significance level split between the two tails.

F.DIST(x, df1, df2, cumulative) gives the probability to the left of x when cumulative is TRUE.

F.INV(probability, df1, df2) gives the critical value with the given probability to its left.

F.DIST.RT(x, df1, df2) gives the probability to the right of x.

F.INV.RT(probability, df1, df2) gives the critical value with the given probability to its right.

CONFIDENCE.NORM(alpha, σ, n) gives the radius (margin of error) of the confidence interval with confidence level “1 – alpha”, population standard deviation σ and sample size n.

CONFIDENCE.T(alpha, s, n) gives the radius (margin of error) of the confidence interval with confidence level “1 – alpha”, sample standard deviation s and sample size n.
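
For readers checking results outside Excel, the normal-distribution functions above have close counterparts in Python's standard library. The sketch below is only an illustration with hypothetical inputs; `confidence_norm` is a made-up helper name, and the t and F functions have no standard-library equivalent (a package such as SciPy would be needed for those).

```python
import math
from statistics import NormalDist

std_normal = NormalDist()                  # mean 0, standard deviation 1
iq_like = NormalDist(mu=100, sigma=15)     # hypothetical normal variable

p_left = std_normal.cdf(1.96)              # like NORM.S.DIST(1.96, 1)
z_crit = std_normal.inv_cdf(0.975)         # like NORM.S.INV(0.975)
p_general = iq_like.cdf(115)               # like NORM.DIST(115, 100, 15, 1)
x_crit = iq_like.inv_cdf(0.975)            # like NORM.INV(0.975, 100, 15)

def confidence_norm(alpha, sigma, n):
    """Margin of error z* sigma / sqrt(n), like CONFIDENCE.NORM(alpha, sigma, n)."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)

margin = confidence_norm(0.05, 15, 36)
print(round(p_left, 4), round(z_crit, 4), round(p_general, 4), round(margin, 4))
```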
Instructions for adding the Data Analysis tool, which is not enabled by default in Excel. All temporary settings on the campus computers are cleared every day, so you may need to add it again each time.

Excel has all these analysis programs built in, but they do not show up in the default setting when you open Excel. Do the following steps in Excel to make the functions available.

➢ Click the “File” button in the top left corner of Excel;
➢ Click “Options” at the bottom of the drop-down menu;
➢ Choose “Add-Ins”, second from the bottom of the list on the left side of the pop-up window;
➢ Choose “Analysis ToolPak” in the middle;
➢ Click “Go” at the bottom of the window;
➢ Check “Analysis ToolPak” at the top of the list in the pop-up window;
➢ Click OK.

Now when you go back to the Excel interface and click the “Data” tab, you will see “Data Analysis” on the right side of the ribbon. Click it to perform different kinds of z-tests, t-tests and other analyses.

AVERAGE(range of data in Excel) gives the mean of the data

STDEV.P(range of data in Excel) gives the population standard deviation

STDEV.S(range of data in Excel) gives the sample standard deviation

VAR.P(range of data in Excel) gives the population variance

VAR.S(range of data in Excel) gives the sample variance
