LECTURE 1.
TWO POPULATION MEANS TESTS
▪ Dependent and Independent Samples
▪ Testing for Two Dependent Means
▪ Testing for Two Independent Means
▪ Testing for Two Variances
1.1. Dependent and Independent Samples
▪ Dependent samples (also called related or paired samples): two
variables $X_1$, $X_2$ measured on the same individuals
▪ The numbers of observations must be equal
▪ The order of the values cannot be changed
▪ Independent samples: observations taken from
different, independent individuals; $X_1$ from one
sample, $X_2$ from the other
▪ The numbers of observations can differ
▪ The order of the values can be changed
Example 1.1
▪ Dependent sample
Store   Before   After
1       72       76
2       75       79
3       70       77
4       82       80
5       70       75
6       83       89
Is the advertising policy good?
▪ Independent sample
Firm A   Firm B
76       90
79       82
77       85
80       90
75       80
89       79
         87
         88
         84
On average, are A and B really different?
1.2. Testing Two Dependent Means
▪ Paired sample: $(X_{1i}, X_{2i})$, $i = 1, 2, \dots, n$
▪ The sample size is $n$ for both $X_1$ and $X_2$
▪ Test whether the population means of $X_1$ and $X_2$ differ:
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
$H_1$ could also be one-sided: $\mu_1 > \mu_2$ or $\mu_1 < \mu_2$
▪ Assumptions:
▪ $X_1$ and $X_2$ are normally distributed
▪ Or the sample size is large enough
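Test Statistic and Critical Value
▪ Let $d = X_1 - X_2$; in the sample, $d_i = X_{1i} - X_{2i}$
▪ Hypotheses: $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \neq 0$, or equivalently $H_0: \mu_d = 0$, $H_1: \mu_d \neq 0$
▪ Test statistic:
$t = \dfrac{\bar{d} - 0}{s_d / \sqrt{n}}$
▪ Critical value: $t^{(n-1)}_{\alpha/2}$
▪ If $|t| > t^{(n-1)}_{\alpha/2}$, reject $H_0$
▪ The cases $H_1: \mu_d > 0$ and $H_1: \mu_d < 0$ are handled similarly, with the one-tailed critical value $t^{(n-1)}_{\alpha}$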
Example 1.2
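▪ Does the advertising policy increase sales? $\alpha = 5\%$
▪ (Sales are normally distributed)
▪ With $d$ = After - Before: $H_0: \mu_d = 0$, $H_1: \mu_d > 0$
Store   Before   After   Difference
1       72       76       4
2       75       79       4
3       70       77       7
4       82       80      -2
5       70       75       5
6       83       89       6
Sum                      24
▪ $\bar{d} = \dfrac{1}{n}\sum d_i = \dfrac{24}{6} = 4$
▪ $s_d = \sqrt{\dfrac{1}{n-1}\sum (d_i - \bar{d})^2} = \sqrt{10} \approx 3.16$
▪ $t = \dfrac{4 - 0}{\sqrt{10}/\sqrt{6}} = 3.1$
▪ Critical value: $t^{(5)}_{0.05} = 2.015$
▪ Since $3.1 > 2.015$, reject $H_0$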
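The arithmetic of Example 1.2 can be checked with a short Python sketch. It uses only the standard library; the critical value 2.015 is copied from the slide rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Before/after sales for the six stores in Example 1.2
before = [72, 75, 70, 82, 70, 83]
after = [76, 79, 77, 80, 75, 89]

# Paired differences d_i = after_i - before_i
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = statistics.mean(d)   # sample mean of the differences
s_d = statistics.stdev(d)    # sample standard deviation (n - 1 denominator)
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

# One-tailed critical value t(5) at alpha = 0.05, taken from the slide
t_crit = 2.015

print(f"d_bar = {d_bar}, s_d = {s_d:.2f}, t = {t_stat:.2f}")
print("reject H0" if t_stat > t_crit else "do not reject H0")
```

Because $H_1$ is one-sided ($\mu_d > 0$), the comparison uses `t_stat > t_crit` rather than the absolute value.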
Estimate the Difference
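▪ 95% confidence interval:
$\bar{d} - t^{(n-1)}_{\alpha/2}\dfrac{s_d}{\sqrt{n}} < \mu_d < \bar{d} + t^{(n-1)}_{\alpha/2}\dfrac{s_d}{\sqrt{n}}$
▪ $t^{(n-1)}_{\alpha/2} = t^{(5)}_{0.025} = 2.57$
$4 - 2.57 \times \dfrac{3.16}{\sqrt{6}} < \mu_d < 4 + 2.57 \times \dfrac{3.16}{\sqrt{6}}$
$0.68 < \mu_d < 7.32$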
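As a rough check of the interval above, the following Python sketch recomputes the bounds from the summary values of Example 1.2. The critical value 2.57 is copied from the slide; the variable names are illustrative.

```python
import math

# Summary values from Example 1.2
n = 6
d_bar = 4.0
s_d = math.sqrt(10)   # about 3.16
t_crit = 2.57         # t(5) at alpha/2 = 0.025, as given on the slide

# Half-width of the interval: t * s_d / sqrt(n)
margin = t_crit * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"{lower:.2f} < mu_d < {upper:.2f}")
```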
In General
▪ Hypotheses:
$H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 \neq D_0$ ⇔ $H_0: \mu_d = D_0$, $H_1: \mu_d \neq D_0$
▪ Test statistic: $t = \dfrac{\bar{d} - D_0}{s_d / \sqrt{n}}$
▪ Critical value: $t^{(n-1)}_{\alpha/2}$
▪ Rule: if $|t| >$ critical value, reject $H_0$
Example 1.3
▪ Using the following data, test the hypothesis that, on average, income is 3,000 dollars higher than expenditure, at the 5% significance level.
▪ Income and expenditure are normally distributed
Household   Income (1000)   Expenditure (1000)    d
1           12              8                     4
2           15              10                    5
3           18              12                    6
4           10              12                   -2
5           16              16                    0
6           16              9                     7
7           14              15                   -1
8           12              7                     5
9           15              8                     7
10          11              13                   -2
Sum                                              29
▪ Summary statistics: $\bar{d} = 2.9$; $s_d^2 = 13.88$
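One way to set up the test in Example 1.3 is sketched below in Python. It assumes the two-sided hypotheses $H_0: \mu_d = 3$ against $H_1: \mu_d \neq 3$ (in thousands); the helper `paired_t` is a hypothetical name, and the critical value 2.262 for $t^{(9)}_{0.025}$ comes from a standard t table, not from the slides.

```python
import math
import statistics

# Household data from Example 1.3, in thousands
income = [12, 15, 18, 10, 16, 16, 14, 12, 15, 11]
expenditure = [8, 10, 12, 12, 16, 9, 15, 7, 8, 13]
d = [x - y for x, y in zip(income, expenditure)]


def paired_t(diffs, d0=0.0):
    """t statistic for H0: mu_d = d0 (hypothetical helper, not from the slides)."""
    n = len(diffs)
    return (statistics.mean(diffs) - d0) / (statistics.stdev(diffs) / math.sqrt(n))


t_stat = paired_t(d, d0=3)
t_crit = 2.262  # t(9) at alpha/2 = 0.025, from a standard t table (assumption)

print(f"sum of d = {sum(d)}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")
```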
1.3. Testing Two Independent Means
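▪ $X_1$ is normally distributed, $N(\mu_1, \sigma_1^2)$; $X_2$ is normally distributed, $N(\mu_2, \sigma_2^2)$
▪ Two independent samples of sizes $n_1$ and $n_2$
▪ Known $\sigma_1^2, \sigma_2^2$: Z-test (self-study)
▪ Unknown $\sigma_1^2, \sigma_2^2$:
▪ Assume $\sigma_1^2 = \sigma_2^2$: t-test with the pooled variance $s_p^2$
▪ Assume $\sigma_1^2 \neq \sigma_2^2$: t-test with the two variances $s_1^2, s_2^2$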
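Population Variances $\sigma_1^2, \sigma_2^2$ Are Known
▪ Z-test statistic:
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
▪ Two-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 \neq D_0$; reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$; P-value $= 2 \times P(Z > |z|)$
▪ One-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 > D_0$; reject $H_0$ if $z > z_{\alpha}$; P-value $= P(Z > z)$
▪ One-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 < D_0$; reject $H_0$ if $z < -z_{\alpha}$; P-value $= P(Z < z)$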
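Since the Z-test is left for self-study, a minimal Python sketch may help. The function `two_sample_z` and all input numbers are hypothetical and serve only to exercise the formula; `statistics.NormalDist` supplies the standard normal distribution.

```python
import math
from statistics import NormalDist


def two_sample_z(x1_bar, x2_bar, var1, var2, n1, n2, d0=0.0):
    """z statistic and two-tailed P-value when both population variances are known."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    z = ((x1_bar - x2_bar) - d0) / se
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


# Hypothetical inputs (not from the lecture)
z, p = two_sample_z(x1_bar=52.0, x2_bar=50.0, var1=16.0, var2=9.0, n1=40, n2=30)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96

print(f"z = {z:.3f}, p = {p:.4f}, z_crit = {z_crit:.3f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```

For a one-tailed test, the critical value would instead be `NormalDist().inv_cdf(1 - alpha)` and the P-value a single tail probability.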
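Unknown, Assumed $\sigma_1^2 = \sigma_2^2$
▪ Pooled variance:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
▪ t-test statistic, with $df = n_1 + n_2 - 2$:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$
▪ Two-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 \neq D_0$; reject $H_0$ if $|t| > t^{(df)}_{\alpha/2}$; P-value $= 2 \times P(T > |t|)$
▪ One-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 > D_0$; reject $H_0$ if $t > t^{(df)}_{\alpha}$; P-value $= P(T > t)$
▪ One-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 < D_0$; reject $H_0$ if $t < -t^{(df)}_{\alpha}$; P-value $= P(T < t)$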
Example 1.4
▪ Sales data of two firms are given below. Assuming equal population variances, test whether the average sales of firms A and B differ, at the 5% and 1% levels.
Firm A ($X_1$)   Firm B ($X_2$)
76               90
79               82
77               85
80               90
75               80
89               79
                 87
                 88
                 84
▪ Assume that the variances are equal
▪ $\bar{x}_1 = \dfrac{476}{6} = 79.33$; $\bar{x}_2 = \dfrac{765}{9} = 85$
▪ $s_1^2 = 25.87$; $s_2^2 = 16.75$
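A possible way to carry out Example 1.4 in Python is sketched below. The critical values 2.160 and 3.012 for $t^{(13)}$ at $\alpha/2 = 0.025$ and $0.005$ are taken from a standard t table and are an assumption of this sketch, not figures from the slides.

```python
import math
import statistics

firm_a = [76, 79, 77, 80, 75, 89]
firm_b = [90, 82, 85, 90, 80, 79, 87, 88, 84]
n1, n2 = len(firm_a), len(firm_b)

x1_bar, x2_bar = statistics.mean(firm_a), statistics.mean(firm_b)
s1_sq, s2_sq = statistics.variance(firm_a), statistics.variance(firm_b)

# Pooled variance and the t statistic for H0: mu1 - mu2 = 0
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t_stat = (x1_bar - x2_bar) / math.sqrt(sp_sq / n1 + sp_sq / n2)
df = n1 + n2 - 2

# Two-tailed critical values t(13) at alpha/2, from a standard t table (assumption)
critical = {0.05: 2.160, 0.01: 3.012}

print(f"sp^2 = {sp_sq:.2f}, t = {t_stat:.3f}, df = {df}")
for alpha, t_crit in critical.items():
    decision = "reject H0" if abs(t_stat) > t_crit else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```

With these table values the decision differs between the two levels, which is presumably why the example asks for both.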
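Unknown, Assumed $\sigma_1^2 \neq \sigma_2^2$
▪ Welch's adjusted degrees of freedom:
$df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
▪ t-test statistic:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
▪ Two-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 \neq D_0$; reject $H_0$ if $|t| > t^{(df)}_{\alpha/2}$; P-value $= 2 \times P(T > |t|)$
▪ One-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 > D_0$; reject $H_0$ if $t > t^{(df)}_{\alpha}$; P-value $= P(T > t)$
▪ One-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_1: \mu_1 - \mu_2 < D_0$; reject $H_0$ if $t < -t^{(df)}_{\alpha}$; P-value $= P(T < t)$
▪ If $n_1, n_2 > 30$, then $t^{(df)}_{\alpha} \approx z_{\alpha}$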
Example 1.5
▪ Test whether the average sales of firms A and B differ
▪ Assume unequal variances
▪ Significance levels of 5% and 2%
Firm A ($X_1$)   Firm B ($X_2$)
76               90
79               82
77               85
80               90
75               80
89               79
                 87
                 88
                 84
▪ $\bar{x}_1 = 79.33$; $\bar{x}_2 = 85$
▪ $s_1^2 = 25.87$; $s_2^2 = 16.75$; $df = 9$
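The Welch computation for Example 1.5 can be sketched in Python as follows. The degrees of freedom are computed from the formula and come out close to the value 9 quoted in the example; the critical values 2.262 and 2.821 for $t^{(9)}$ at $\alpha/2 = 0.025$ and $0.01$ are taken from a standard t table and are an assumption of this sketch.

```python
import math
import statistics

firm_a = [76, 79, 77, 80, 75, 89]
firm_b = [90, 82, 85, 90, 80, 79, 87, 88, 84]
n1, n2 = len(firm_a), len(firm_b)

# Per-sample variance of the mean: s^2 / n
v1 = statistics.variance(firm_a) / n1
v2 = statistics.variance(firm_b) / n2

t_stat = (statistics.mean(firm_a) - statistics.mean(firm_b)) / math.sqrt(v1 + v2)

# Welch's adjusted degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Two-tailed critical values t(9) at alpha/2, from a standard t table (assumption)
critical = {0.05: 2.262, 0.02: 2.821}

print(f"t = {t_stat:.3f}, df = {df:.2f}")
for alpha, t_crit in critical.items():
    decision = "reject H0" if abs(t_stat) > t_crit else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```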
Estimate the difference of two means, 𝝁𝟏 − 𝝁𝟐
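The 95% confidence interval for $\mu_1 - \mu_2$
▪ Assuming equal variances:
$(\mu_1 - \mu_2) \in (\bar{x}_1 - \bar{x}_2) \pm t^{(n_1 + n_2 - 2)}_{\alpha/2} \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$
▪ Assuming unequal variances:
$(\mu_1 - \mu_2) \in (\bar{x}_1 - \bar{x}_2) \pm t^{(df)}_{\alpha/2} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$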
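To illustrate both intervals, the sketch below applies them to the firm A and firm B sales data used in Examples 1.4 and 1.5. The critical values 2.160 for $t^{(13)}_{0.025}$ and 2.262 for $t^{(9)}_{0.025}$ come from a standard t table (an assumption of this sketch), with Welch's $df$ rounded to 9 as in Example 1.5.

```python
import math
import statistics

firm_a = [76, 79, 77, 80, 75, 89]
firm_b = [90, 82, 85, 90, 80, 79, 87, 88, 84]
n1, n2 = len(firm_a), len(firm_b)

diff = statistics.mean(firm_a) - statistics.mean(firm_b)
s1_sq, s2_sq = statistics.variance(firm_a), statistics.variance(firm_b)

# Equal variances: pooled standard error with t(13) at alpha/2 = 0.025
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp_sq / n1 + sp_sq / n2)
ci_pooled = (diff - 2.160 * se_pooled, diff + 2.160 * se_pooled)

# Unequal variances: separate variances with t(9) at alpha/2 = 0.025
se_welch = math.sqrt(s1_sq / n1 + s2_sq / n2)
ci_welch = (diff - 2.262 * se_welch, diff + 2.262 * se_welch)

print(f"equal variances:   ({ci_pooled[0]:.2f}, {ci_pooled[1]:.2f})")
print(f"unequal variances: ({ci_welch[0]:.2f}, {ci_welch[1]:.2f})")
```

The unequal-variance interval is somewhat wider here because its standard error and critical value are both larger.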
Example 1.6
Testing Two Independent Means
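▪ Unknown $\sigma_1^2, \sigma_2^2$: assume $\sigma_1^2 = \sigma_2^2$, or assume $\sigma_1^2 \neq \sigma_2^2$?
▪ How do we know which assumption is correct?
▪ By testing the two variances:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$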
1.4. Testing two variances
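▪ $H_0: \sigma_X^2 = \sigma_Y^2$; $H_1: \sigma_X^2 \neq \sigma_Y^2$
▪ F-statistic:
$F = \dfrac{S_X^2}{S_Y^2}$
▪ If $F > F^{(n_X - 1,\, n_Y - 1)}_{\alpha/2}$ or $F < F^{(n_X - 1,\, n_Y - 1)}_{1 - \alpha/2}$, reject $H_0$
▪ Note:
$F^{(n_X - 1,\, n_Y - 1)}_{1 - \alpha/2} = \dfrac{1}{F^{(n_Y - 1,\, n_X - 1)}_{\alpha/2}}$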
1.4. Testing Two Variances
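▪ F-test statistic:
$F = \dfrac{S_X^2}{S_Y^2}$
▪ $H_0: \sigma_X^2 = \sigma_Y^2$, $H_1: \sigma_X^2 \neq \sigma_Y^2$; reject $H_0$ if $F > F^{(n_X - 1,\, n_Y - 1)}_{\alpha/2}$ or $F < F^{(n_X - 1,\, n_Y - 1)}_{1 - \alpha/2}$
▪ $H_0: \sigma_X^2 = \sigma_Y^2$, $H_1: \sigma_X^2 > \sigma_Y^2$; reject $H_0$ if $F > F^{(n_X - 1,\, n_Y - 1)}_{\alpha}$
▪ $H_0: \sigma_X^2 = \sigma_Y^2$, $H_1: \sigma_X^2 < \sigma_Y^2$; reject $H_0$ if $F < F^{(n_X - 1,\, n_Y - 1)}_{1 - \alpha}$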
Example 1.7
▪ Test the hypothesis that the sales variances of the two firms are different, at the 5% level.
Firm A (X)   Firm B (Y)
76           90
79           82
77           85
80           90
75           80
89           79
             87
             88
             84
▪ $s_1^2 = 25.87$; $s_2^2 = 16.75$
▪ $F^{(6-1,\, 9-1)}_{0.025} = 4.81$
▪ $F^{(6-1,\, 9-1)}_{0.975} = \dfrac{1}{F^{(8,\, 5)}_{0.025}} = 0.148$
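The F statistic for Example 1.7 and the comparison with the two critical values can be sketched in Python as follows. Both critical values (4.81 and 0.148) are copied from the slide; nothing here computes F-distribution quantiles.

```python
import statistics

firm_a = [76, 79, 77, 80, 75, 89]               # X
firm_b = [90, 82, 85, 90, 80, 79, 87, 88, 84]   # Y

# Ratio of the two sample variances
f_stat = statistics.variance(firm_a) / statistics.variance(firm_b)

# Critical values F(5, 8) at 0.025 and 0.975, as given on the slide
f_upper = 4.81
f_lower = 0.148

print(f"F = {f_stat:.3f}")
if f_stat > f_upper or f_stat < f_lower:
    print("reject H0: the variances differ")
else:
    print("do not reject H0: no evidence that the variances differ")
```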
Summary
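Variables $X_1$ and $X_2$
▪ Paired sample?
  ▪ Yes: t-test on $d = X_1 - X_2$: $t = \dfrac{\bar{d}}{\sqrt{S_d^2 / n}}$
  ▪ No: are $\sigma_1^2, \sigma_2^2$ known?
    ▪ Yes: z-test: $z = \dfrac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
    ▪ No: F-test, $F = \dfrac{S_1^2}{S_2^2}$: is $\sigma_1^2 = \sigma_2^2$?
      ▪ Yes: t-test with $s_p^2$: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{S_p^2}{n_1} + \dfrac{S_p^2}{n_2}}}$
      ▪ No: t-test with $s_1^2, s_2^2$: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$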
1.5 Testing Two Proportions
▪ Independent populations and samples 1, 2
▪ Population proportions: $p_1$, $p_2$
▪ Sample 1 from population 1: $\hat{p}_1 = \dfrac{f_1}{n_1}$
▪ Sample 2 from population 2: $\hat{p}_2 = \dfrac{f_2}{n_2}$
1.5 Testing Two Proportions
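▪ Sample proportions: $\hat{p}_1 = \dfrac{f_1}{n_1}$, $\hat{p}_2 = \dfrac{f_2}{n_2}$
▪ Pooled proportion: $\hat{p}_0 = \dfrac{f_1 + f_2}{n_1 + n_2}$
▪ Test statistic:
$Z_{stat} = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_0 (1 - \hat{p}_0)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
▪ $H_0: p_1 \leq p_2$, $H_1: p_1 > p_2$; reject $H_0$ if $Z_{stat} > z_{\alpha}$
▪ $H_0: p_1 \geq p_2$, $H_1: p_1 < p_2$; reject $H_0$ if $Z_{stat} < -z_{\alpha}$
▪ $H_0: p_1 = p_2$, $H_1: p_1 \neq p_2$; reject $H_0$ if $|Z_{stat}| > z_{\alpha/2}$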
Example 1.8
▪ Customers are observed in three stores 1, 2, 3:
          Store 1   Store 2   Store 3
Female    55        115       85
Male      45        55        35
Sum       100       170       120
▪ At the 5% level, test the hypothesis that "the female customer proportions in stores 1 and 2 are equal"
▪ What is the answer if the significance level is 1%?
▪ Compare the female proportions of stores 2 and 3
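The first two questions of Example 1.8 can be explored with the Python sketch below. The function name `two_proportion_z` is hypothetical; critical values are obtained from `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist


def two_proportion_z(f1, n1, f2, n2):
    """Pooled z statistic for H0: p1 = p2 (hypothetical helper name)."""
    p1_hat, p2_hat = f1 / n1, f2 / n2
    p0_hat = (f1 + f2) / (n1 + n2)   # pooled proportion under H0
    se = math.sqrt(p0_hat * (1 - p0_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Female customers: 55 of 100 in store 1, 115 of 170 in store 2
z_stat = two_proportion_z(55, 100, 115, 170)
print(f"z = {z_stat:.3f}")

for alpha in (0.05, 0.01):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    decision = "reject H0" if abs(z_stat) > z_crit else "do not reject H0"
    print(f"alpha = {alpha}: z_crit = {z_crit:.3f}, {decision}")
```

The same function can be reused for stores 2 and 3 by calling `two_proportion_z(115, 170, 85, 120)`.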
BUSINESS STATISTICS – Bui Duong Hai – NEU – www.mfe.edu.vn/buiduonghai
Confidence Interval of Difference
▪ Confidence interval of $p_1 - p_2$: $(\hat{p}_1 - \hat{p}_2) \pm ME$
▪ Margin of error: $ME = z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$
Example 1.9
▪ Among 200 male and 300 female customers, 126 and 144,
respectively, are regular customers. At the 5% significance
level, is the proportion of regular customers among males
higher than that among females? If yes, estimate the
difference at the 95% confidence level.
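A minimal Python sketch of Example 1.9 follows: a one-tailed test with the pooled proportion, then a 95% confidence interval that uses the unpooled standard error from the margin-of-error formula. Variable names are illustrative.

```python
import math
from statistics import NormalDist

f1, n1 = 126, 200   # regular customers among males
f2, n2 = 144, 300   # regular customers among females
p1_hat, p2_hat = f1 / n1, f2 / n2

# One-tailed test of H0: p1 <= p2 against H1: p1 > p2, pooled standard error
p0_hat = (f1 + f2) / (n1 + n2)
se_pooled = math.sqrt(p0_hat * (1 - p0_hat) * (1 / n1 + 1 / n2))
z_stat = (p1_hat - p2_hat) / se_pooled
z_crit = NormalDist().inv_cdf(1 - 0.05)   # one-tailed critical value, about 1.645

# 95% confidence interval for p1 - p2, unpooled standard error
se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
me = NormalDist().inv_cdf(1 - 0.05 / 2) * se
lower, upper = (p1_hat - p2_hat) - me, (p1_hat - p2_hat) + me

print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}")
print(f"95% CI for p1 - p2: ({lower:.4f}, {upper:.4f})")
```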
1.6 Correlation Coefficient
▪ Paired sample data of quantitative variables: $(x_i, y_i)$, $i = 1, \dots, n$
▪ Covariance:
$Cov(X, Y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
▪ $Cov > 0$: positively correlated
▪ $Cov = 0$: uncorrelated
▪ $Cov < 0$: negatively correlated
▪ Correlation coefficient:
$r_{X,Y} = \dfrac{Cov(X, Y)}{s_X s_Y}$
Correlation
▪ (Pearson) sample correlation coefficient $r$
$r = -1$        linear, negatively correlated
$-1 < r < 0$    negatively correlated
$r = 0$         not correlated
$0 < r < 1$     positively correlated
$r = 1$         linear, positively correlated
Example 1.10
i     x_i   y_i   x_i - x̄   y_i - ȳ   (x_i - x̄)(y_i - ȳ)   (x_i - x̄)²   (y_i - ȳ)²
1     1     4     -1.4      -2.2      3.08                 1.96         4.84
2     2     6     -0.4      -0.2      0.08                 0.16         0.04
3     2     5     -0.4      -1.2      0.48                 0.16         1.44
4     3     7      0.6       0.8      0.48                 0.36         0.64
5     4     9      1.6       2.8      4.48                 2.56         7.84
Sum   12    31     0         0        8.6                  5.2          14.8
▪ $\bar{x} = \dfrac{12}{5} = 2.4$; $s_x^2 = \dfrac{5.2}{4} = 1.3$; $Cov = \dfrac{8.6}{4} = 2.15$
▪ $\bar{y} = \dfrac{31}{5} = 6.2$; $s_y^2 = \dfrac{14.8}{4} = 3.7$; $r = \dfrac{2.15}{\sqrt{1.3 \times 3.7}} = 0.9803$
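The table arithmetic of Example 1.10 can be reproduced with a few lines of Python. The sketch builds the covariance and the correlation coefficient directly from their definitions; variable names are illustrative.

```python
import math
import statistics

x = [1, 2, 2, 3, 4]
y = [4, 6, 5, 7, 9]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# Sample covariance: sum of cross-products of deviations over n - 1
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
r = cov / (statistics.stdev(x) * statistics.stdev(y))

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"Cov = {cov:.2f}, r = {r:.4f}")
```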
Correlation Test
▪ In the population, $\rho_{X,Y}$ is unknown
▪ Hypotheses:
▪ $H_0: \rho_{X,Y} = 0$: X and Y are uncorrelated
▪ $H_1: \rho_{X,Y} \neq 0$: X and Y are correlated
▪ Z-statistic: $z = r_{X,Y} \sqrt{n}$
▪ If $|z| > z_{\alpha/2}$, reject $H_0$
▪ Confidence interval of the correlation coefficient:
$r_{X,Y} - z_{\alpha/2} \dfrac{1}{\sqrt{n}} < \rho_{X,Y} < r_{X,Y} + z_{\alpha/2} \dfrac{1}{\sqrt{n}}$
Example 1.11
▪ The following data show the quantity sold (Q), the price
(P), and the competitive price (Z)
Q 20 25 24 26 28 29 25 26 28 26 27 28 26 25 27 28
P 18 18 17 15 15 12 16 17 14 15 14 13 15 16 16 15
Z 15 15 14 14 17 17 15 12 18 13 14 19 12 16 16 18
▪ Use Excel to calculate the correlation coefficients
between Q and P, and between Q and Z
▪ At the 5% significance level, test for correlation between
Q and P, and between Q and Z
▪ Find the 95% confidence interval for each significant correlation
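The example asks for Excel, but the same steps can be sketched in Python for readers who prefer a script. The sketch applies the approximate statistic $z = r\sqrt{n}$ and the interval $r \pm z_{\alpha/2}/\sqrt{n}$ from the Correlation Test slide; the helper `pearson_r` is a hypothetical name.

```python
import math
import statistics
from statistics import NormalDist

Q = [20, 25, 24, 26, 28, 29, 25, 26, 28, 26, 27, 28, 26, 25, 27, 28]
P = [18, 18, 17, 15, 15, 12, 16, 17, 14, 15, 14, 13, 15, 16, 16, 15]
Z = [15, 15, 14, 14, 17, 17, 15, 12, 18, 13, 14, 19, 12, 16, 16, 18]


def pearson_r(x, y):
    """Sample correlation coefficient from the covariance definition."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))


n = len(Q)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96
results = {}
for name, other in (("P", P), ("Z", Z)):
    r = pearson_r(Q, other)
    z_stat = r * math.sqrt(n)        # statistic from the Correlation Test slide
    results[name] = (r, z_stat)
    print(f"Q and {name}: r = {r:.4f}, z = {z_stat:.3f}")
    if abs(z_stat) > z_crit:
        half_width = z_crit / math.sqrt(n)
        print(f"  significant; 95% CI: ({r - half_width:.3f}, {r + half_width:.3f})")
    else:
        print("  not significant at the 5% level")
```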
1.7 Practice
▪ Testing for means - dependent samples
▪ Testing for two variances
▪ Testing for means - independent samples