StatisticalTests (2slidesPerPage)
Slides are at
http://www.design.kyushu-u.ac.jp/~takagi/TAKAGI/StatisticalTests.html
Hideyuki TAKAGI
Kyushu University, Japan
http://www.design.kyushu-u.ac.jp/~takagi/
ver. July 15, 2013
ver. July 11, 2013
ver. April 23, 2013
Contents
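2 groups
  Parametric Test (normality)
    ・unpaired t-test: unpaired (independent) data
    ・paired t-test: paired (related) data
  Non-parametric Test (no normality)
    ・Mann-Whitney U-test: unpaired (independent) data
    ・sign test: paired (related) data
    ・Wilcoxon signed-ranks test: paired (related) data
n groups (n > 2)
  Parametric Test (normality): ANOVA (Analysis of Variance)
    ・one-way ANOVA: one-way (independent) data
    ・two-way ANOVA: two-way (related) data
  Non-parametric Test (no normality)
    ・Kruskal-Wallis test: one-way (independent) data
    ・Friedman test: two-way (related) data
+
Scheffé's method of paired comparison for Human Subjective Tests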
How to Show Significance?
[Figure: fitness vs. generations. Left panel: conventional EC and proposed EC. Right panel: conventional EC, proposed EC1, and proposed EC2.]
Fig. XX Average convergence curves of n trial runs.
To claim "My method is significantly better!", apply a statistical test.
Which Test Should We Use?
Normality Test
Check the data distribution first: with normality, use a parametric test; with no normality, use a non-parametric test.
• Anderson-Darling test
• D'Agostino-Pearson test
• Kolmogorov-Smirnov test
• Shapiro-Wilk test
• Jarque–Bera test
・・・・
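As an illustration of one of the normality tests named in this part, the Jarque–Bera statistic can be sketched with the Python standard library only. This is a minimal sketch, not the author's procedure: the function name `jarque_bera` is hypothetical, the p-value uses the large-sample chi-square approximation with 2 degrees of freedom (rough for small samples), and the data are column A of the t-test sample data in these slides.

```python
import math

def jarque_bera(data):
    """Return (JB statistic, approximate p-value) for a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with the 1/n normalisation used by the JB statistic.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Assumption: JB is treated as chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value

# Column A of the t-test sample data in the slides.
sample_a = [4.23, 3.21, 3.63, 4.42, 4.08, 3.98, 3.68, 4.18, 3.85, 3.71]
jb, p = jarque_bera(sample_a)
print(f"JB = {jb:.4f}, approximate p = {p:.4f}")
```

A large p-value means normality is not rejected; it does not prove that the data are normal.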
Which Test Should We Use?
Unpaired (independent) data: group A data and group B data are obtained separately.
Paired (related) data: both values in a row are obtained from the same initial data, e.g. a conventional GA and the proposed method started from the same initial data.
initial data #   conventional   proposed
1                4.23           2.51
2                3.21           3.30
With normality, use an unpaired t-test or one-way ANOVA for unpaired data, and a paired t-test or two-way ANOVA for paired data.
Which Test Should We Use?
Q1: Which tests are more sensitive, those for unpaired data or those for paired data?
A1: Statistical tests for paired data, because they use more of the information in the data.
Which Test Should We Use?
Q3: Which statistical tests are more sensitive, parametric tests or non-parametric ones, and why?
A3: Parametric tests, because they can use information about the assumed data distribution.
t-Test
[Figure: fitness values of two methods at the n-th generation. Is the difference significant?]
Is there a significant difference between A and B?
A    B
12   10
14   9
14   7
11   15
16   11
19   10
t-Test
Excel (32-bit version only?) provides t-tests and ANOVA in its Data Analysis Tools.
The add-in must be enabled first (File -> Options -> Add-ins).
t-Test
(1) t-Test: Paired Two Sample for Means
sample data
A      B
4.23   2.51
3.21   3.31
3.63   3.75
4.42   3.22
4.08   3.99
3.98   3.65
3.68   3.35
4.18   3.93
3.85   3.91
3.71   3.82
t-Test: Paired Two Sample for Means
                               Variable 1     Variable 2
Mean                           3.897          3.544
Variance                       0.125823333    0.208693333
Observations                   10             10
Pearson Correlation            -0.161190073
Hypothesized Mean Difference   0
df                             9
t Stat                         1.794964241
P(T<=t) one-tail               0.053116886
t Critical one-tail            1.833112933
P(T<=t) two-tail               0.106233772
t Critical two-tail            2.262157163
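The t Stat of the paired t-test output can be reproduced by hand. The sketch below assumes the usual paired t statistic (mean of the differences divided by its standard error); the function name `paired_t` is hypothetical, and the critical value is copied from the Excel output rather than computed.

```python
import math

# Sample data A and B from the slide.
a = [4.23, 3.21, 3.63, 4.42, 4.08, 3.98, 3.68, 4.18, 3.85, 3.71]
b = [2.51, 3.31, 3.75, 3.22, 3.99, 3.65, 3.35, 3.93, 3.91, 3.82]

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) of a paired t-test."""
    d = [xi - yi for xi, yi in zip(x, y)]          # pairwise differences
    n = len(d)
    mean_d = sum(d) / n
    # Unbiased variance of the differences (divide by n - 1).
    var_d = sum((di - mean_d) ** 2 for di in d) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1

t_stat, df = paired_t(a, b)

# "t Critical two-tail" for df = 9, copied from the Excel output.
T_CRITICAL_TWO_TAIL = 2.262157163
significant = abs(t_stat) > T_CRITICAL_TWO_TAIL
print(f"t = {t_stat:.6f}, df = {df}, significant (p < 0.05): {significant}")
```

Comparing |t| with the two-tail critical value is equivalent to checking whether the two-tail p-value is below 0.05.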
t-Test
When the p-value is less than 0.01 or 0.05, we conclude that there is a significant difference at the significance level of (p < 0.01) or (p < 0.05).
A two-tail test at the 5% level places 2.5% in each tail (A>B and A<B, with A ≈ B in between); a one-tail test places the whole 5% in one tail.
When A>B never happens, you may use a one-tail test.
For the sample data above, P(T<=t) two-tail = 0.106 > 0.05, so no significant difference is found.
t-Test
(1) t-Test: Paired Two Sample for Means (paired t-test, for related data)
(2) t-Test: Two-Sample Assuming Equal Variances (unpaired t-test, for independent data)
ANOVA: Analysis of Variance
1. Analysis of more than two data groups.
2. Normality and equal variance are required.
When data are independent, use one-way ANOVA (single-factor ANOVA); the groups form the column factor.
When data correspond to each other, use two-way ANOVA (two-factor ANOVA); the groups form the column factor and the corresponding samples form the sample factor.
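For the independent-data case, the F value of a one-way ANOVA is the between-group mean square divided by the within-group mean square. The following is a minimal sketch with a hypothetical function name and hypothetical data (not taken from the slides); the F value must still be compared with an F critical value or converted to a p-value, for example by Excel.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of data groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the data around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical data for illustration only.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_value, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_value:.3f} with ({df_b}, {df_w}) degrees of freedom")
```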
ANOVA: Analysis of Variance
one-factor (one-way) ANOVA and two-factor (two-way) ANOVA
Sample data (rows: sample factor, columns: column factor):
#2   3.21   3.30   2.89
#3   3.63   3.75   3.55
#4   4.42   3.22   4.39
#5   4.08   3.99   3.86
#6   3.98   3.65   3.50
#7   3.75   2.62   3.60
#8   3.22   2.93   3.21
Excel output (excerpt):
Source of Variation   SS         df   MS         F          P-value    F crit
Interaction           0.139411   2    0.069706   0.508573   0.613752   3.885294
Within                1.644733   12   0.137061
Total                 6.12165    17
Is there a significant difference among A, B, and C?
A      B      C
11.5   9.5    16.8
16.4   14.0   14.3
16.0   15.2   17.0
15.0   13.0   14.6
12.8   12.4   17.0
13.6   15.0   14.3
13.0   12.4   15.6
12.0   17.8   15.0
13.4   12.6   18.6
10.0   13.4   12.4
10.8   16.8   15.4
Non-Parametric Tests
Mann-Whitney U-test
(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)
For 2 groups of unpaired (independent) data with no normality.
[Figure: two groups of data at the n-th generation. Is the difference significant?]
Mann-Whitney U-test
1. Calculate a U value.
For each value in one group, count how many values of the other group it exceeds, and sum the counts. When two values are the same, count as 0.5.
U = 0 + 2 + 3 + 4 = 9
U' = 11 (U + U' = n1 n2)
Further examples: (U, U') = (9, 11), (12, 13), (23.5, 1.5)
Example with ties: U = 2.5 + 4 + 5 + 6 + 6 + 6 = 29.5, U' = 6.5
Since U' > 5, significance is not found (p > 0.05). Here 5 is the table value for n1 = n2 = 6 at (p < 0.05).
Mann-Whitney U-test table
(p < 0.05)
n1 \ n2   4    5    6    7
3         ー   0    1    1
4         0    1    2    3
5              2    3    5
6                   5    6
(p < 0.01)
n1 \ n2   4    5    6    7
3         ー   ー   ー   ー
4         ー   ー   0    0
5              1    1    1
6                   2    3
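The U calculation can be sketched as a double loop over the two groups. This sketch assumes that U counts, for each value of the first group, the values of the second group that are smaller, with ties counted as 0.5. The function name and the two data groups are hypothetical; the data were chosen only so that the counts are 0, 2, 3, and 4 as in the first example.

```python
def mann_whitney_u(x, y):
    """Return (U, U') where U counts pairs with y < x; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if yj < xi:
                u += 1.0
            elif yj == xi:
                u += 0.5          # tie rule from the slide
    u_prime = len(x) * len(y) - u  # U + U' = n1 * n2
    return u, u_prime

# Hypothetical data for illustration only.
group_a = [1, 4, 6, 8]
group_b = [2, 3, 5, 7, 9]
u, u_prime = mann_whitney_u(group_a, group_b)
print(f"U = {u}, U' = {u_prime}, smaller value = {min(u, u_prime)}")
```

The smaller of U and U' is the value to compare with the table entry for (n1, n2).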
Sign Test
For 2 groups of paired (related) data with no normality.
(1) Sign Test
A significance test on the numbers of wins and losses.
Sign Test
[Figure: sign test results over generations 0 to 50 for benchmark functions F1 to F8. Each row compares DE_N with one of DE_LR, DE_LS, DE_FR_GLB_nD, DE_FR_LOC_nD, DE_FR_GLB_1D, and DE_FR_LOC_1D, with a mark at every generation where the difference is significant.]
Fig.3 in Y. Pei and H. Takagi, "Fourier analysis of the fitness landscape for evolutionary search acceleration," IEEE Congress on Evolutionary Computation (CEC), pp.1-7, Brisbane, Australia (June 10-15, 2012).
The (+,-) marks show whether the proposed methods converge significantly better or poorer than normal DE, respectively (p ≤ 0.05).
[Figure: Fig.2 in the same paper.]
Sign Test
[Table: sign test table giving, for each N, the numbers for the two levels of significance (%).]
Task Example
Are the performances of pattern recognition methods A and B significantly different?
How to check?
1. Set N = n2 + n3.
2. Look up the N in the sign test table.
3. If min(n2, n3) is smaller than the number for the N, we can say that there is a significant difference at the significance level of XX.
Exercise
Is there a significant difference for n2 = 12 and n3 = 28?
ANSWER:
Check the table with N = 40.
As n2 is bigger than 11 and smaller than 13, we can say that there is a significant difference between the two with (p < 0.05) but cannot say so with (p < 0.01).
Further examples: 14 vs. 1, 9 vs. 3, 18 vs. 5
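Instead of a table lookup, the sign test can also be evaluated with an exact binomial probability. The sketch below assumes that n2 and n3 are the numbers of wins and losses and that a two-tail test is wanted; the function name `sign_test_p` is hypothetical.

```python
from math import comb

def sign_test_p(wins, losses):
    """Two-tail p-value of the sign test under a fair-coin null hypothesis."""
    n = wins + losses
    k = min(wins, losses)
    # Probability of observing k or fewer of the rarer outcome out of n.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

p_exercise = sign_test_p(12, 28)   # exercise in the slide: n2 = 12, n3 = 28
print(f"12 vs. 28: p = {p_exercise:.4f}")
for wins, losses in [(14, 1), (9, 3), (18, 5)]:
    print(f"{wins} vs. {losses}: p = {sign_test_p(wins, losses):.4f}")
```

For 12 vs. 28 the p-value falls between 0.01 and 0.05, which agrees with the answer obtained from the table.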
Wilcoxon Signed-Ranks Test
Q: When a sign test cannot show significance, what should we do?
A: Try the Wilcoxon signed-ranks test. It is more sensitive than a simple sign test because it uses more information.
Exercise 1: Wilcoxon Signed-Ranks Test
(step 1) difference d, (step 2) rank of |d|, (step 3) add sign to the ranks, (step 4) rank of fewer # of signs
v (system A)   v (system B)   difference d   rank of |d|   signed rank   rank of fewer # of signs
182            163            19             7             7
169            142            27             8             8
173            172            1              1             1
143            137            6              4             4
158            151            7              5             5
156            143            13             6             6
176            172            4              3             3
165            168            -3             2             -2            2
(step 5) n = # of pairs with d ≠ 0, T = sum of (step 4)
(step 6) Compare T with the Wilcoxon test table.
Exercise 2: Wilcoxon Signed-Ranks Test
(step 1) difference d, (step 2) rank of |d|, (step 3) add sign to the ranks, (step 4) rank of fewer # of signs
v (system A)   v (system B)   difference d   rank of |d|   signed rank   rank of fewer # of signs
27             31             -4             5             -5
20             25             -5             6             -6
34             33             1              2             2             2
25             27             -2             4             -4
31             31             0              (Pairs with d = 0 are ignored.)
23             29             -6             7.5           -7.5
26             27             -1             2             -2
24             30             -6             7.5           -7.5
35             34             1              2             2             2
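Steps 1 to 5 of the two exercises can be sketched in code. The function name `wilcoxon_t` is hypothetical, tied |d| values receive the average rank as in Exercise 2, and T is taken as the smaller of the positive and negative rank sums, which in both exercises equals the rank sum of the less frequent sign. Step 6, the table lookup, is left to the reader.

```python
def wilcoxon_t(a, b):
    """Return (T, n) for paired data a and b, ignoring pairs with d = 0."""
    d = [x - y for x, y in zip(a, b) if x != y]     # step 1, drop d = 0
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                     # step 2, average ranks for ties
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    # Steps 3 to 5: split the ranks by the sign of d and sum each side.
    positive = sum(r for r, di in zip(ranks, d) if di > 0)
    negative = sum(r for r, di in zip(ranks, d) if di < 0)
    return min(positive, negative), n

# Exercise 1 data.
a1 = [182, 169, 173, 143, 158, 156, 176, 165]
b1 = [163, 142, 172, 137, 151, 143, 172, 168]
# Exercise 2 data.
a2 = [27, 20, 34, 25, 31, 23, 26, 24, 35]
b2 = [31, 25, 33, 27, 31, 29, 27, 30, 34]

t1, n1 = wilcoxon_t(a1, b1)
t2, n2 = wilcoxon_t(a2, b2)
print(f"Exercise 1: T = {t1}, n = {n1}")
print(f"Exercise 2: T = {t2}, n = {n2}")
```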
Kruskal-Wallis Test
1. Comparison of more than two groups.
2. Data have no normality.
3. The data do not correspond to each other (one-way, independent data).
Kruskal-Wallis Test
Let's use ranks of data.
[Figure: the 17 data of three groups are replaced by their ranks, 1 to 17.]
Kruskal-Wallis Test
N: total # of data
k: # of groups
ni: # of data of group i
Ri: sum of ranks of group i
How to Test
1. Rank all data.
2. Calculate N, k, and Ri.
3. Calculate statistical value H.
$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$
4. If k = 3 and N ≤ 17, compare the H with a significant point in a Kruskal-Wallis test table. Otherwise, assume that H follows the χ2 distribution and test the H using a χ2 distribution table of (k-1) degrees of freedom.
[Figure: ranked data of the three groups, with R1 = 38, R2 = 69, R3 = 46.]
Example: Kruskal-Wallis Test
N = n1+n2+n3 = 17 data
k = 3 groups
(n1, n2, n3) = (6, 5, 6)
(R1, R2, R3) = (38, 69, 46)
$H = \frac{12}{17(17+1)} \left( \frac{38^2}{6} + \frac{69^2}{5} + \frac{46^2}{6} \right) - 3(17+1) = 6.609$
Since the significant points of (p<0.05) and (p<0.01) for (n1, n2, n3) = (6, 5, 6) are 5.765 and 8.124, respectively, and 5.765 < 6.609 < 8.124, there are significant difference(s) somewhere among the three groups (p<0.05).
Kruskal-Wallis Test Table (for k = 3 and N ≤ 17)
n1 n2 n3   p<0.05  p<0.01      n1 n2 n3   p<0.05  p<0.01
2  2  2    -       -           3  3  3    5.606   7.200
2  2  3    4.714   -           3  3  4    5.791   6.746
2  2  4    5.333   -           3  3  5    6.649   7.079
2  2  5    5.160   6.533       3  3  6    5.615   7.410
2  2  6    5.346   6.655       3  3  7    5.620   7.228
2  2  7    5.143   7.000       3  3  8    5.617   7.350
2  2  8    5.356   6.664       3  3  9    5.589   7.422
2  2  9    5.260   6.897       3  3  10   5.588   7.372
2  2  10   5.120   6.537       3  3  11   5.583   7.418
2  2  11   5.164   6.766       3  4  4    5.599   7.144
2  2  12   5.173   6.761       3  4  5    5.656   7.445
2  2  13   5.199   6.792       3  4  6    5.610   7.500
2  3  3    5.361   -           3  4  7    5.623   7.550
2  3  4    5.444   6.444       3  4  8    5.623   7.585
2  3  5    5.251   6.909       3  4  9    5.652   7.614
2  3  6    5.349   6.970       3  4  10   5.661   7.617
2  3  7    5.357   6.839       3  5  5    5.706   7.578
2  3  8    5.316   7.022       3  5  6    5.602   7.591
2  3  9    5.340   7.006       3  5  7    5.607   7.697
2  3  10   5.362   7.042       3  5  8    5.614   7.706
2  3  11   5.374   7.094       3  5  9    5.670   7.733
2  3  12   5.350   7.134       3  6  6    5.625   7.725
2  4  4    5.455   7.036       3  6  7    5.689   7.756
2  4  5    5.273   7.205       3  6  8    5.678   7.796
2  4  6    5.340   7.340       3  7  7    5.688   7.810
2  4  7    5.376   7.321       4  4  4    5.692   7.654
2  4  8    5.393   7.350       4  4  5    5.657   7.760
2  4  9    5.400   7.364       4  4  6    6.681   7.795
2  4  10   5.345   7.357       4  4  7    5.650   7.814
2  4  11   5.365   7.396       4  4  8    5.779   7.853
2  5  5    5.339   7.339       4  4  9    5.704   7.910
2  5  6    5.339   7.376       4  5  5    5.666   7.823
2  5  7    5.393   7.450       4  5  6    5.661   7.936
2  5  8    5.415   7.440       4  5  7    5.733   7.931
2  5  9    5.396   7.447       4  5  8    5.718   7.992
2  5  10   5.420   7.514       4  6  6    5.724   8.000
2  6  6    5.410   7.467       4  6  7    5.706   8.039
2  6  7    5.357   7.491       5  5  5    5.780   8.000
2  6  8    5.404   7.522       5  5  6    5.729   8.028
2  6  9    5.392   7.566       5  5  7    5.708   8.108
2  7  7    5.398   7.491       5  6  6    5.765   8.124
2  7  8    5.403   7.571
[Second example: R1 = 24, R2 = 44, R3 = 23; compare H with the significance points of (p<0.05) and (p<0.01).]
Friedman Test
For n groups (n > 2) of related (two-way) data with no normality.
(ex.) Comparison of recognition rates.
benchmark tasks   methods
                  a      b      c      d
A                 0.92   0.75   0.65   0.81
B                 0.48   0.45   0.41   0.52
C                 0.56   0.41   0.47   0.50
D                 0.61   0.50   0.56   0.54
[Figure: ranks 1 to 4 of methods a, b, c, and d for each task.]
Friedman Test
Step 1: Make a ranking table.
Step 2: Sum ranks of the factor that you want to test.
benchmark tasks   methods (ranking among methods)
                  a    b    c    d
A                 4    2    1    3
B                 3    2    1    4
C                 4    1    2    3
D                 4    1    3    2
Σ                 15   6    7    12
# of data (n = 4), # of methods (k = 4)
Step 3: Calculate the Friedman test value, χ2r.
$\chi_r^2 = \frac{12}{nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3n(k+1) = \frac{12}{4 \cdot 4 \cdot 5} (15^2 + 6^2 + 7^2 + 12^2) - 3 \cdot 4 \cdot 5 = 8.1$
Step 4: Since the significant point for (k,n) = (4,4) is 7.80, there is/are significant difference(s) somewhere among the four methods, a, b, c, and d (p<0.05).
7.8 (significance point of p<0.05) < 8.1 < 9.6 (significance point of p<0.01)
Friedman test table.
k   n   p<0.05   p<0.01
3   3   6.00     -
3   4   6.50     8.00
3   5   6.40     8.40
3   6   7.00     9.00
3   7   7.14     8.86
3   8   6.25     9.00
3   9   6.22     9.56
3   ∞   5.99     9.21
4   3   7.40     9.00
4   4   7.80     9.60
4   5   7.80     9.96
4   ∞   7.81     11.34
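Steps 1 to 3 can be sketched starting from the recognition rates of the example. The function names are hypothetical, rank 1 is given to the lowest rate in each task as in the ranking table, and the helper assumes that no two methods tie within a task (true for this example).

```python
# Recognition rates of methods a, b, c, d on benchmark tasks A to D (from the slide).
rates = {
    "A": [0.92, 0.75, 0.65, 0.81],
    "B": [0.48, 0.45, 0.41, 0.52],
    "C": [0.56, 0.41, 0.47, 0.50],
    "D": [0.61, 0.50, 0.56, 0.54],
}

def rank_row(values):
    """Step 1: rank the values of one task (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def friedman_chi2(rank_sums, n):
    """Step 3: 12 / (n k (k + 1)) * sum(Ri^2) - 3 n (k + 1)."""
    k = len(rank_sums)
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

ranking = [rank_row(row) for row in rates.values()]
rank_sums = [sum(column) for column in zip(*ranking)]   # step 2
chi2_r = friedman_chi2(rank_sums, n=len(ranking))
print(f"rank sums = {rank_sums}, Friedman value = {chi2_r:.1f}")
```

The resulting value is then compared with the significance points for (k, n) = (4, 4) in the Friedman test table.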
Q1: Where is the significant difference among a, b, c, and d?
A1: Apply multiple comparisons between all pairs among columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
Scheffé's Method of Paired Comparison
[Figure: Interactive Evolutionary Computation (IEC), i.e. Evolutionary Computation driven by subjective evaluations. Application examples: room layout planning design, hearing-aid fitting, MEMS design, geological simulation, and measuring mental scale.]
Scheffé's Method of Paired Comparison
Each pair of objects is compared on the scale: better, slightly better, even, slightly better, better (-2, -1, 0, 1, 2).
Procedure: paired comparisons, then ANOVA, then a significance check using a yardstick.
Variations include the original method (1952) and Ura's variation (1956).
Paired comparisons by six subjects (N = 6):
          O1   O2   O3   O4   O5   O6
A1 - A3   2    2    1    1    1    1
A2 - A3   1    0    1    1    -1   0
・・・
Application Example:
What is the best present for becoming her/his boyfriend/girlfriend?
Results of Scheffé's Method of Paired Comparison (Nakaya's variation)
[Figure: presents placed on scales from less effective to more effective (-1 to 1), with significant differences marked. Panels: "I hesitate to accept it" and "Reality is ...".]
Scheffé's Method of Paired Comparison
Modified method by Ura
Step 1: Make paired comparison table of each human subject.
Each ordered pair (A1 A2, A1 A3, A1 A4, A2 A3, A2 A4, A3 A4, ..., and the reverse orders) is rated on the scale:
better, slightly better, even, slightly better, better (-2, -1, 0, 1, 2)
     A1   A2   A3   A4
A1        0    -1   -1
A2   3         0    0
A3   3    1         -1
A4   3    3    1
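Sums of squares ($x_{ijl}$: evaluation value when the $l$-th human subject compares the $i$-th object with the $j$-th object; $t$: # of objects; $N$: # of human subjects; a dot denotes summation over that index):
$S_\alpha = \frac{1}{2tN} \sum_i (x_{i\cdot\cdot} - x_{\cdot i\cdot})^2$
$S_{\alpha(B)} = \frac{1}{2t} \sum_l \sum_i (x_{i\cdot l} - x_{\cdot il})^2 - S_\alpha$
$S_\gamma = \frac{1}{2N} \sum_i \sum_{j>i} (x_{ij\cdot} - x_{ji\cdot})^2 - S_\alpha$
$S_\delta = \frac{1}{Nt(t-1)} x_{\cdot\cdot\cdot}^2$
$S_{\delta(B)} = \frac{1}{t(t-1)} \sum_l x_{\cdot\cdot l}^2 - S_\delta$
$S_\varepsilon = S_T - S_\alpha - S_{\alpha(B)} - S_\gamma - S_\delta - S_{\delta(B)}$
$S_T = \sum_l \sum_i \sum_{j \ne i} x_{ijl}^2$
unbiased variance = S/f, where S is one of $S_\alpha$, $S_{\alpha(B)}$, $S_\gamma$, $S_\delta$, $S_{\delta(B)}$, $S_\varepsilon$, $S_T$ and f is its degree of freedom.
F = (unbiased variance) / (unbiased variance of $S_\varepsilon$), for F tests.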
Scheffé's Method of Paired Comparison
Modified method by Ura
The sums of squares, degrees of freedom, unbiased variances, and F values are arranged in an ANOVA table.
Questionnaire
Each pair is rated on the scale: better, slightly better, even, slightly better, better (-2, -1, 0, 1, 2).
Six human subjects (N = 6), paired comparisons for t=3 objects.
          O1   O2   O3   O4   O5   O6
A1 - A2   2    3    3    2    0    1
A1 - A3   2    0    0    1    1    0
A2 - A3   -3   -2   -1   -1   -3   -2
Scheffé's Method of Paired Comparison
Modified method by Nakaya
Step 1: Make paired comparison table of each human subject.
xijl : evaluation value when the l-th human subject compares the i-th object with the j-th object.
[ANOVA table.]
Scheffé's Method of Paired Comparison
Modified method by Nakaya
Step 4: Apply multiple comparisons between all pairs and find which distance is significant.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each has different characteristics.)
Yardstick: $Y_\alpha = q_\alpha \sqrt{\hat{\sigma}^2 / (tN)}$, where ($\hat{\sigma}^2$, t, N) are an unbiased variance of Sε, the # of objects, and the # of human subjects.
$Y_{0.05} = 4.60 \sqrt{1.79 / (3 \times 6)} = 1.4506$ (See q0.05 (3,5) in the next slide.)
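The yardstick value above can be checked numerically. The sketch assumes the reading Y = q × sqrt(variance / (t × N)) with q = 4.60, an unbiased variance of 1.79, t = 3 objects, and N = 6 human subjects; the function name `yardstick` is hypothetical.

```python
import math

def yardstick(q, unbiased_variance, t, n_subjects):
    """Yardstick Y = q * sqrt(variance / (t * N)) for the significance check."""
    return q * math.sqrt(unbiased_variance / (t * n_subjects))

# Values from the slide: q0.05(3, 5) = 4.60, variance 1.79, t = 3, N = 6.
y_05 = yardstick(4.60, 1.79, 3, 6)
print(f"Y0.05 = {y_05:.4f}")
```

Two objects whose distance on the preference scale exceeds this yardstick are judged significantly different at that level.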
SUMMARY
1. We overviewed which statistical test to use in which case: parametric tests (t-tests, ANOVA) and non-parametric tests (Mann-Whitney U-test, sign test, Wilcoxon signed-ranks test, Kruskal-Wallis test, Friedman test) for 2 groups and n groups (n > 2) of paired or unpaired data, plus Scheffé's method of paired comparison for Human Subjective Tests.