
Statistical Tests for
Computational Intelligence Research and
Human Subjective Tests

Slides are at
http://www.design.kyushu-u.ac.jp/~takagi/TAKAGI/StatisticalTests.html

Hideyuki TAKAGI
Kyushu University, Japan
http://www.design.kyushu-u.ac.jp/~takagi/
ver. July 15, 2013

Contents

                        2 groups                             n groups (n > 2): ANOVA (Analysis of Variance)
                        paired           unpaired            related data       independent data
                        (related)        (independent)       (two-way)          (one-way)
Parametric Test         paired t-test    unpaired t-test     two-way ANOVA      one-way ANOVA
(normality)
Non-parametric Test     sign test,       Mann-Whitney        Friedman test      Kruskal-Wallis test
(no normality)          Wilcoxon         U-test
                        signed-ranks
                        test

Scheffé's method of paired comparison for Human Subjective Tests
How to Show Significance?

Just compare averages visually? It is not scientific.

Fig. XX Average convergence curves of n trial runs
(fitness vs. generations; conventional EC vs. proposed EC1 and EC2).
How to Show Significance?

Sound design concept: exciting.
Sounds made by conventional IEC, proposed IEC1, and proposed IEC2.

Which method is good for making an exciting sound? How do we show it?
You cannot show the superiority of your method without statistical tests.
Papers without statistical tests may be rejected.

"My method is significantly better!" → statistical test

Which Test Should We Use?

(The test-selection chart from the Contents slide is shown again here,
highlighting each branch in turn: 2 groups vs. n groups, paired vs. unpaired
data, and normality vs. no normality.)
Which Test Should we Use?

Use a parametric test only when the data pass a normality test:

• Anderson-Darling test
• D'Agostino-Pearson test
• Kolmogorov-Smirnov test
• Shapiro-Wilk test
• Jarque–Bera test
• ...
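The listed tests are available in common statistics packages. As a rough, minimal pure-Python sketch (not a replacement for a library), the Jarque–Bera statistic can be computed directly from sample skewness and kurtosis; the data below are made up for the demo:

```python
import math

def jarque_bera(xs):
    """Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4),
    where S is sample skewness and K is sample kurtosis
    (both computed from biased central moments, as in the original test)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # 2nd central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n   # 4th central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

# Under normality, JB roughly follows a chi-squared distribution with 2
# degrees of freedom, so JB > 5.991 suggests non-normality (p < 0.05).
data = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up sample
print(round(jarque_bera(data), 3))  # small JB: no evidence against normality
```

In practice the sample is far too small here; the sketch only illustrates the formula.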
Which Test Should We Use?

unpaired data (independent)         paired data (related)
group A    group B                  initial data #   conventional   proposed
 4.23       2.51                         1               4.23          2.51
 3.21       3.3                          2               3.21          3.30
 3.63       3.75                         3               3.63          3.75
 4.42       3.22                         4               4.42          3.22
 4.08       3.99                         5               4.08          3.99
 3.98       3.65                         6               3.98          3.65
Which Test Should We Use?

(The same chart, contrasting empty data-table layouts: unpaired "A group data /
B group data" (independent) vs. paired "initial data # / GA / proposed"
(related).)
Which Test Should We Use?

Q1: Which tests are more sensitive, those for unpaired data or those for
paired data?

A1: Statistical tests for paired data, because paired data carry more
information.

Which Test Should We Use?

Q2: How should you design your experimental conditions to use statistical
tests for paired data and reduce the # of trial runs?

A2: Use the same initialized data for the set of (method A, method B) at each
trial run.
Which Test Should we Use?

Q3: Which statistical tests are more sensitive, parametric tests or
non-parametric ones, and why?

A3: Parametric tests, because they can use the information of the assumed data
distribution.

t-Test

(The t-test covers the parametric, 2-group branch of the chart: a paired
t-test for related data and an unpaired t-test for independent data.)
t-Test

(Figure: average convergence curves; is the difference between the two curves
at the n-th generation significant?)

t-Test

Test this difference under the assumption that there is no difference
(null hypothesis).

  A    B
 12   10     significant
 14    9     difference?
 14    7
 11   15
 16   11
 19   10

Conditions to use t-tests:
(1) normality
(2) equal variances
t-Test
F-Test

Test the difference in variances under the assumption of no difference
(null hypothesis). When (p > 0.05), we assume that there is no significant
difference between σ²A and σ²B.

  A    B
 12   10
 14    9
 14    7
 11   15
 16   11
 19   10

Normality Test
• Anderson-Darling test
• D'Agostino-Pearson test
• Kolmogorov-Smirnov test
• Shapiro-Wilk test
• Jarque–Bera test
• ...

Conditions to use t-tests:
(1) normality
(2) equal variances
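As a minimal sketch of the equal-variance check on the A/B data above, the F statistic is the ratio of the two sample variances (looking up its p-value in an F table, or Bartlett's test for more than two groups, is left to a table or a statistics library):

```python
def sample_var(xs):
    """Unbiased sample variance (denominator n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def f_ratio(a, b):
    """F statistic of the equal-variance test: larger variance over smaller,
    so that F >= 1."""
    va, vb = sample_var(a), sample_var(b)
    return max(va, vb) / min(va, vb)

A = [12, 14, 14, 11, 16, 19]   # data from the slide
B = [10, 9, 7, 15, 11, 10]
F = f_ratio(A, B)
print(round(F, 2))  # compare against an F table with (5, 5) degrees of freedom
```

An F this close to 1 gives no evidence against equal variances.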

t-Test
Excel (32-bit version only?) has t-tests and ANOVA in its Data Analysis Tools.
You must install the add-in (File -> Options -> Add-ins, and enable it there).
t-Test
(1) t-Test: Paired two sample for means
    This is the case where each pair of runs of the two methods starts from
    the same initial condition.
(2) t-Test: Two-sample assuming equal variances
(3) t-Test: Two-sample assuming unequal variances (Welch's t-test)

t-Test
sample data           t-Test: Paired Two Sample for Means
  A      B                                         Variable 1     Variable 2
 4.23   2.51          Mean                         3.897          3.544
 3.21   3.31          Variance                     0.125823333    0.208693333
 3.63   3.75          Observations                 10             10
 4.42   3.22          Pearson Correlation          -0.161190073
 4.08   3.99          Hypothesized Mean Difference 0
 3.98   3.65          df                           9
 3.68   3.35          t Stat                       1.794964241
 4.18   3.93          P(T<=t) one-tail             0.053116886
 3.85   3.91          t Critical one-tail          1.833112933
 3.71   3.82          P(T<=t) two-tail             0.106233772
                      t Critical two-tail          2.262157163
t-Test

When the p-value is less than 0.01 or 0.05, we assume that there is a
significant difference at the level of significance of (p < 0.01) or
(p < 0.05).

Two-tail test: a 2.5% region for A > B, A ≈ B in between, and a 2.5% region
for A < B. One-tail test: a single 5% region. When A > B never happens, you
may use a one-tail test.

t-Test

(1) t-Test: Pairs two sample for means (2) t-Test: Two-sample assuming
equal variances

Difference between two groups We cannot say that there is


is significant (p < 0.01). a significant difference
between two group.
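As a cross-check of the Excel output, the paired t statistic can be computed directly. A minimal pure-Python sketch on the 10-pair sample data shown earlier (Excel reports t = 1.795, two-tail p = 0.106):

```python
import math

def paired_t(a, b):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)), where d_i = a_i - b_i."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # unbiased variance of d
    return mean_d / math.sqrt(var_d / n)

# The sample data from the slide.
A = [4.23, 3.21, 3.63, 4.42, 4.08, 3.98, 3.68, 4.18, 3.85, 3.71]
B = [2.51, 3.31, 3.75, 3.22, 3.99, 3.65, 3.35, 3.93, 3.91, 3.82]
t = paired_t(A, B)
print(round(t, 3))            # 1.795, matching the Excel "t Stat"
print(abs(t) > 2.262157163)   # False: below the two-tail critical value (df = 9)
```

Getting the p-value itself needs a t-distribution table or a statistics library; comparing |t| against the critical value gives the same accept/reject decision.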
ANOVA: Analysis of Variance

(ANOVA covers the parametric, n-group (n > 2) branch of the chart: one-way
ANOVA for independent data and two-way ANOVA for related data. Figure: average
convergence curves of three methods; is the difference at the n-th generation
significant?)
ANOVA: Analysis of Variance
1. Analysis of more than two data groups.
2. Normality and equal variances are required.

Excel has ANOVA in its Data Analysis Tools.

  A     B     C
 11.0  12.8   9.4
  9.3  11.3  12.4
 11.5   9.5  16.8
 16.4  14.0  14.3
 16.0  15.2  17.0
 15.0  13.0  14.6
 12.8  12.4  17.0
 13.6  15.0  14.3
 13.0  12.4  15.6
 12.0  17.8  15.0
 13.4  12.6  18.6
 10.0  13.4  12.4
 10.8  16.8  15.4

ANOVA: Analysis of Variance
1. Analysis of more than two data groups.
2. Normality and equal variances are required.
   Check the equal variances using the Bartlett test.

three t-tests = one ANOVA:
Three t-tests at (p < 0.05) are equivalent to one ANOVA at (p < 0.14),
because the risk accumulates: 1 - (1 - 0.05)^3 = 0.14.
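The accumulated risk above is the family-wise error rate; a one-line sketch makes the arithmetic explicit:

```python
def familywise_error(alpha, k):
    """Probability of at least one false positive across k independent
    comparisons, each run at significance level alpha."""
    return 1 - (1 - alpha) ** k

# Three pairwise t-tests at p < 0.05 inflate the overall risk to about 0.14,
# which is why one ANOVA (followed by multiple comparisons) is preferred.
print(round(familywise_error(0.05, 3), 3))
```

The same formula shows why the number of pairwise tests matters: with 10 comparisons the risk already exceeds 0.4.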
ANOVA: Analysis of Variance

When data are independent, use one-way ANOVA (single-factor ANOVA).
When data correspond to each other, use two-way ANOVA (two-factor ANOVA).

ANOVA: Analysis of Variance

Q1: What are "single factor" and "two factors"?

A1: A column factor (e.g. three groups) and a sample factor (e.g. initialized
condition).

When data are independent, use one-way ANOVA (single-factor ANOVA): the column
factor only. When data correspond to each other, use two-way ANOVA (two-factor
ANOVA): the column factor plus the sample factor.
ANOVA: Analysis of Variance
one-factor (one-way) ANOVA vs. two-factor (two-way) ANOVA

Both use the column factor (group A, group B, group C); the two-factor layout
adds the sample factor (initial condition #1-#8):

initial
condition   group A   group B   group C
   #1        4.23      2.51      3.04
   #2        3.21      3.3       2.89
   #3        3.63      3.75      3.55
   #4        4.42      3.22      4.39
   #5        4.08      3.99      3.86
   #6        3.98      3.65      3.5
   #7        3.75      2.62      3.6
   #8        3.22      2.93      3.21

one-way: We cannot say that the three groups are significantly different
(p = 0.089).
two-way: There are significant differences somewhere among the three groups
(p < 0.05).
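As a sketch of what the one-way case computes, the F statistic is the ratio of between-group to within-group mean squares; run on the three 8-sample groups above (for which the slide's one-way ANOVA reports p = 0.089), F lands below the 5% critical value of F(2, 21), so no significance is shown:

```python
def one_way_anova_F(groups):
    """F statistic of single-factor ANOVA:
    F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    # sum of squares between groups (weighted by group size)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    # sum of squares within groups
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

A = [4.23, 3.21, 3.63, 4.42, 4.08, 3.98, 3.75, 3.22]
B = [2.51, 3.30, 3.75, 3.22, 3.99, 3.65, 2.62, 2.93]
C = [3.04, 2.89, 3.55, 4.39, 3.86, 3.50, 3.60, 3.21]
F = one_way_anova_F([A, B, C])
print(round(F, 2))   # about 2.7, below the 5% critical value of F(2, 21)
```

Converting F into the p-value still needs an F-distribution table or a statistics library.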

ANOVA: Analysis of Variance

Output of the one-way ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups         6.11342    2   3.05671    15.30677   3.6E-05    3.354131
Within Groups          5.39181   27   0.199697
Total                 11.50523   29

When (p-value < 0.01 or 0.05), there is a significant difference somewhere
among the data groups.

Output of the two-way ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                 0.755233   2   0.377617    2.755097  0.103596   3.885294
Columns                3.582272   1   3.582272   26.13631   0.000256   4.747225
Interaction            0.139411   2   0.069706    0.508573  0.613752   3.885294
Within                 1.644733  12   0.137061
Total                  6.12165   17

• A significant difference among Samples (e.g. initial conditions) cannot be
  found (p > 0.05).
• A significant difference can be found somewhere among Columns (e.g. three
  methods) (p < 0.01).
• We need not care about an interaction effect between the two factors
  (e.g. initial condition vs. methods) (p > 0.05).
ANOVA: Analysis of Variance
Q1: Where is the significant difference among A, B, and C?
A1: Apply multiple comparisons between all pairs among the columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method,
Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell
method, Duncan's new multiple range test, Student-Newman-Keuls method, etc.
Each has different characteristics.)

Non-Parametric Tests

If normality and equal variances are not guaranteed, use non-parametric tests:
for 2 groups, the sign test and the Wilcoxon signed-ranks test (paired data)
or the Mann-Whitney U-test (unpaired data); for n groups (n > 2), the Friedman
test (related data) or the Kruskal-Wallis test (independent data).
Mann-Whitney U-test

(Non-parametric test for 2 groups of unpaired, independent data.)

Mann-Whitney U-test
(Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)

1. Comparison of two groups.
2. Data have no normality.
3. There are no data corresponding between the two groups (independent).

(Figure: two non-normal distributions at the n-th generation; are they
different?)
Mann-Whitney U-test
(Wilcoxon-Mann-Whitney test, two-sample Wilcoxon test)

1. Calculate a U value: for each datum of one group, count the data of the
   other group below it (when two values are the same, count 0.5).
   In the figure: U = 0 + 2 + 3 + 4 = 9, and U' = 11 (U + U' = n1 n2).
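The counting rule above can be sketched in a few lines of Python. The data here are made up, chosen only so that the counts reproduce the slide's example (0 + 2 + 3 + 4 = 9 with n1 = 4, n2 = 5):

```python
def mann_whitney_U(a, b):
    """U for group a: for each value of a, count the values of b that are
    smaller (ties count 0.5).  The complement U' satisfies U + U' = n1 * n2."""
    U = 0.0
    for x in a:
        for y in b:
            if y < x:
                U += 1.0
            elif y == x:
                U += 0.5
    return U

a = [1, 4, 6, 8]       # hypothetical group 1 (n1 = 4)
b = [2, 3, 5, 7, 9]    # hypothetical group 2 (n2 = 5)
U = mann_whitney_U(a, b)
U_prime = len(a) * len(b) - U
print(U, U_prime)      # take the smaller of the two to a U-test table
```

The smaller of U and U' is then compared with a Mann-Whitney table entry, as the next slide describes.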

Mann-Whitney U-test (cont.)
(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)

2. See a U-test table.
• Use the smaller value of U or U'.
• When n1 ≤ 20 and n2 ≤ 20, see a Mann-Whitney test table
  (where n1 and n2 are the # of data of the two groups).
• Otherwise, since U roughly follows the normal distribution
  N(μU, σU²) = N(n1·n2/2, n1·n2·(n1 + n2 + 1)/12),
  normalize U as z = (U − μU)/σU and check a standard normal distribution
  table with the z, where μU = n1·n2/2 and σU = sqrt(n1·n2·(n1 + n2 + 1)/12).
Examples: Mann-Whitney U-test
(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)

Ex.1: counts 0, 2, 3, 4        → U = 9,    U' = 11
Ex.2: counts 0, 0.5, 2.5, 4, 5 → U = 12,   U' = 13
Ex.3: counts 3.5, 5, 5, 5, 5   → U = 23.5, U' = 1.5

(Compare the smaller of U and U' with the Mann-Whitney table entries for
(p < 0.05) and (p < 0.01).)

Exercise: Mann-Whitney U-test
(Wilcoxon-Mann-Whitney test, two sample Wilcoxon test)

counts 2.5, 4, 5, 6, 6, 6 → U = 29.5, U' = 6.5

Since U' = 6.5 > 5 (the table entry for n1 = 6, n2 = 6, p < 0.05),
significance is not found (p > 0.05).

Mann-Whitney test table (excerpts):
(p < 0.05)                  (p < 0.01)
     n2                          n2
n1   4   5   6   7          n1   4   5   6   7
 3   ー   0   1   1           3   ー   ー   ー   ー
 4   0   1   2   3           4   ー   ー   0   0
 5       2   3   5           5       1   1   1
 6           5   6           6           2   3
Sign Test

(Non-parametric test for 2 groups of paired, related data.)

Sign Test
(1) Sign Test
    significance test between the # of winnings and losses
(2) Wilcoxon's Signed Ranks Test
    significance test using both the # of winnings and losses and the level of
    winnings/losses

data of 2 groups   # of winnings and losses   level of winnings/losses
 173   174                -  +                      -1
 143   137                +  -                      +6
 158   151                +  -                      +7
 156   143                +  -                     +13
 176   180                -  +                      -4
 165   162                +  -                      +3
Sign Test
1. Calculate the # of winnings and losses by comparing runs with the same
   initial data.
2. Check a sign test table to show the significance between the two methods.

(Figure: two convergence curves compared generation by generation.)
Sign Test

Fig.2 and Fig.3 in Y. Pei and H. Takagi, "Fourier analysis of the fitness
landscape for evolutionary search acceleration," IEEE Congress on Evolutionary
Computation (CEC), pp.1-7, Brisbane, Australia (June 10-15, 2012).

(The figures plot, for benchmark functions F1-F8 over generations 0-50,
comparisons of normal DE (DE_N) vs. DE_LR, DE_LS, DE_FR_GLB_nD, DE_FR_LOC_nD,
DE_FR_GLB_1D, and DE_FR_LOC_1D. The (+,-) marks show whether our proposed
methods converge significantly better or poorer than normal DE, respectively
(p ≤ 0.05).)
Sign Test

Task Example
Are the performances of pattern recognition methods A and B significantly
different?

n1 cases: Both methods succeeded.
n2 cases: Method A succeeded, and method B failed.
n3 cases: Method A failed, and method B succeeded.
n4 cases: Both methods failed.

How to check?
1. Set N = n2 + n3.
2. Check the sign test table (levels of significance 5% and 1%) with the N.
3. If min(n2, n3) is smaller than the table entry for the N, we can say that
   there is a significant difference at that risk level.

Exercise
Is there a significant difference for n2 = 12 and n3 = 28?
ANSWER:
Check the table with N = 40. As min(n2, n3) = 12 is bigger than the 1% entry
(11) and smaller than the 5% entry (13), we can say that there is a
significant difference between the two with (p < 0.05) but cannot say so with
(p < 0.01).

Sign Test

Let's think about the case of N = 17 (levels of significance 5% and 1%).

To say that the numbers of winnings and losses n1 and n2 are significantly
different:
(n1 vs. n2) = (17 vs. 0), (16 vs. 1), or (15 vs. 2)   (p < 0.01)
or
(n1 vs. n2) = (14 vs. 3) or (13 vs. 4)   (p < 0.05)
Exercise: Sign Test

Using a sign test table (levels of significance 5% and 1%), check the
significance of:
16 vs. 4
14 vs. 1
9 vs. 3
18 vs. 5

Wilcoxon Signed-Ranks Test

(Non-parametric test for 2 groups of paired, related data.)
Wilcoxon Signed-Ranks Test
Q: When a sign test could not show significance, what should we do?
A: Try the Wilcoxon signed-ranks test. It is more sensitive than a simple sign
test because it uses more information.
Wilcoxon Signed-Ranks Test

(1) Sign Test
    significance test between the # of winnings and losses
(2) Wilcoxon's Signed Ranks Test
    significance test using both the # of winnings and losses and the level of
    winnings/losses

data of 2 groups   # of winnings and losses   level of winnings/losses
 173   174                -  +                      -1
 143   137                +  -                      +6
 158   151                +  -                      +7
 156   143                +  -                     +13
 176   180                -  +                      -4
 165   162                +  -                      +3
Wilcoxon Signed-Ranks Test

Example:
                             (step 1)       (step 2)      (step 3)      (step 4)
                                                          add sign to   ranks of the
v (system A)  v (system B)   difference d   rank of |d|   the ranks     fewer # of signs
 182           163            19             7             7
 169           142            27             8             8
 172           173            -1             1            -1             1
 143           137             6             4             4
 158           151             7             5             5
 156           143            13             6             6
 176           172             4             3             3
 165           168            -3             2            -2             2

n = 8
(step 5) T = Σ (step-4 ranks) = 1 + 2 = 3
(step 6) Check a Wilcoxon test table.
Wilcoxon Test Table:
(step 6) significance point of T

For n = 8, T = 3:
T = 3 ≤ 3 (n = 8, p < 0.05), then the difference between systems A and B is
significant.
T = 3 > 0 (n = 8, p < 0.01), then we cannot say there is a significant
difference at that level.

        one-tail p < 0.025   p < 0.005
        two-tail p < 0.05    p < 0.01
n =  6         0
     7         2
     8         3              0
     9         5              1
    10         8              3
    11        10              5
    12        13              7
    13        17              9
    14        21             12
    15        25             15
    16        29             19
    17        34             23
    18        40             27
    19        46             32
    20        52             37
    21        58             42
    22        65             48
    23        73             54
    24        81             61
    25        89             68

When n > 25:
Since T roughly follows the normal distribution
N(μT, σT²) = N(n(n + 1)/4, n(n + 1)(2n + 1)/24),
normalize T as z = (T − μT)/σT and check a standard normal distribution table
with the z, where μT and σT are as in the above equation.
Wilcoxon Signed-Ranks Test
                             (step 1)       (step 2)              (step 3)      (step 4)
                                                                  add sign to   ranks of the
v (system A)  v (system B)   difference d   rank of |d|           the ranks     fewer # of signs
 176           163            13             7 → 6.5 (Tip #2)      6.5
 142           142             0             (Tip #1)
 172           173            -1             1                    -1             1
 143           137             6             4                     4
 158           151             7             5                     5
 156           143            13             6 → 6.5 (Tip #2)      6.5
 176           172             4             3                     3
 165           168            -3             2                    -2             2

Tips:
1. When d = 0, ignore the datum.
2. When there are ties in the ranks of |d|, give the average rank
   (e.g. four tied values occupying ranks 5-8 each get (5+6+7+8)/4 = 6.5).

Exercise 1: Wilcoxon Signed-Ranks Test
                             (step 1)       (step 2)      (step 3)      (step 4)
                                                          add sign to   ranks of the
v (system A)  v (system B)   difference d   rank of |d|   the ranks     fewer # of signs
 182           163
 169           142
 173           172
 143           137
 158           151
 156           143
 176           172
 165           168

n =
(step 5) T = Σ (step-4 ranks)
T =
(step 6) Check a Wilcoxon test table.
Exercise 1: Wilcoxon Signed-Ranks Test
                             (step 1)       (step 2)      (step 3)      (step 4)
                                                          add sign to   ranks of the
v (system A)  v (system B)   difference d   rank of |d|   the ranks     fewer # of signs
 182           163            19             7             7
 169           142            27             8             8
 173           172             1             1             1
 143           137             6             4             4
 158           151             7             5             5
 156           143            13             6             6
 176           172             4             3             3
 165           168            -3             2            -2             2

n = 8
(step 5) T = 2
(step 6) As T(= 2) ≤ 3, there is a significant difference between A and B
(p < 0.05). But, as T(= 2) > 0, we cannot say so at the significance level of
(p < 0.01).

Exercise 2: Wilcoxon Signed-Ranks Test
                             (step 1)       (step 2)      (step 3)      (step 4)
                                                          add sign to   ranks of the
v (system A)  v (system B)   difference d   rank of |d|   the ranks     fewer # of signs
 27            31
 20            25
 34            33
 25            27
 31            31
 23            29
 26            27
 24            30
 35            34

n =
(step 5) T = Σ (step-4 ranks)
(step 6) Check a Wilcoxon test table.
Exercise 2: Wilcoxon Signed-Ranks Test
                             (step 1)       (step 2)      (step 3)      (step 4)
                                                          add sign to   ranks of the
v (system A)  v (system B)   difference d   rank of |d|   the ranks     fewer # of signs
 27            31            -4              5            -5
 20            25            -5              6            -6
 34            33             1              2             2             2
 25            27            -2              4            -4
 31            31             0              (no need to consider d = 0)
 23            29            -6              7.5          -7.5
 26            27            -1              2            -2
 24            30            -6              7.5          -7.5
 35            34             1              2             2             2

n = 8 (d = 0 is not counted)
(step 5) T = 4
(step 6) As T > 3 (n = 8, p < 0.05), we cannot say that there is a significant
difference between A and B.

Exercise 3: Wilcoxon Signed-Ranks Test

Explain how to apply this test to check whether two groups are significantly
different at a given generation of their convergence curves.
Kruskal-Wallis Test

(Non-parametric test for n groups (n > 2) of independent data.)

Kruskal-Wallis Test
1. Comparison of more than two groups.
2. Data have no normality.
3. There are no data corresponding among the groups (independent).

Kruskal-Wallis Test
Let's use the ranks of the data: rank all N data from 1 to N across the
groups.
Kruskal-Wallis Test

N: total # of data
k: # of groups
ni: # of data of group i
Ri: sum of ranks of group i

How to Test
1. Rank all data.
2. Calculate N, k, and Ri.
3. Calculate the statistical value H:

   H = 12/(N(N + 1)) × Σi=1..k (Ri²/ni) − 3(N + 1)

4. If k = 3 and N ≤ 17, compare the H with a significance point in a
   Kruskal-Wallis test table. Otherwise, assume that H follows the χ²
   distribution and test the H using a χ² distribution table with (k − 1)
   degrees of freedom.

(Ranking example: R1 = 38, R2 = 69, R3 = 46.)
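Once the rank sums are in hand, H is a one-line formula; a minimal sketch using the slide's example values (and the later exercise's values as a second call):

```python
def kruskal_wallis_H(N, rank_sums, sizes):
    """Kruskal-Wallis statistic: H = 12/(N(N+1)) * sum(Ri^2 / ni) - 3(N+1),
    where N is the total # of data, Ri the rank sum and ni the size of group i."""
    s = sum(R * R / n for R, n in zip(rank_sums, sizes))
    return 12.0 / (N * (N + 1)) * s - 3 * (N + 1)

# The slide's example: N = 17, (n1, n2, n3) = (6, 5, 6), (R1, R2, R3) = (38, 69, 46).
H = kruskal_wallis_H(17, [38, 69, 46], [6, 5, 6])
print(round(H, 3))   # 6.609, to be compared with a Kruskal-Wallis table
```

For k = 3 and N ≤ 17 the result is compared with a Kruskal-Wallis table entry; otherwise with a χ² table of (k − 1) degrees of freedom.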
Example: Kruskal-Wallis Test

N = n1 + n2 + n3 = 17 data
k = 3 groups
(n1, n2, n3) = (6, 5, 6)
(R1, R2, R3) = (38, 69, 46)

H = 12/(N(N + 1)) × Σ (Ri²/ni) − 3(N + 1)
  = 12/(17 × 18) × (38²/6 + 69²/5 + 46²/6) − 3 × (17 + 1)
  = 6.609

Since the significance points of (p < 0.05) and (p < 0.01) for
(n1, n2, n3) = (6, 5, 6) are 5.765 and 8.124, respectively, and
5.765 < 6.609 < 8.124, there are significant difference(s) somewhere among the
three groups (p < 0.05).

(Kruskal-Wallis test table for k = 3 and N ≤ 17: the table lists the
significance points for each (n1, n2, n3); e.g. (3, 3, 3): 5.606 / 7.200,
(5, 6, 6): 5.765 / 8.124.)

Example: Kruskal-Wallis Test

Q1: Where is the significant difference among A, B, and C?

A1: Apply multiple comparisons between all pairs among the groups.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method,
Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell
method, Duncan's new multiple range test, Student-Newman-Keuls method, etc.
Each has different characteristics.)
Exercise: Kruskal-Wallis Test

N = n1 + n2 + n3 = 13 samples
k = 3 groups
(n1, n2, n3) = (5, 4, 4)
(R1, R2, R3) = (24, 44, 23)

    H = 12 / (N(N+1)) · Σ_{i=1..k} Ri²/ni − 3(N+1)
      = 12 / (13·14) · (24²/5 + 44²/4 + 23²/4) − 3·14
      = 6.227

Since the significance points for (n1, n2, n3) = (4, 4, 5) are 5.657
(p<0.05) and 7.760 (p<0.01), and 5.657 < H = 6.227 < 7.760, there
is/are significant difference(s) somewhere among the three groups
(p<0.05).
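The two H computations above can be checked in a few lines; the sketch below implements the slides' rank-sum formula directly (given raw data instead of rank sums, scipy.stats.kruskal computes the same H plus a χ²-based p-value):

```python
# Kruskal-Wallis H from per-group rank sums, as on the slides:
# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)

def kruskal_wallis_h(rank_sums, sizes):
    """H statistic from the group rank sums R_i and group sizes n_i."""
    n_total = sum(sizes)
    s = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)

# Example: (n1, n2, n3) = (6, 5, 6), (R1, R2, R3) = (38, 69, 46)
h_example = kruskal_wallis_h([38, 69, 46], [6, 5, 6])
print(round(h_example, 3))   # 6.609 > 5.765, significant at p < 0.05

# Exercise: (n1, n2, n3) = (5, 4, 4), (R1, R2, R3) = (24, 44, 23)
h_exercise = kruskal_wallis_h([24, 44, 23], [5, 4, 4])
print(round(h_exercise, 3))  # 6.227 > 5.657, significant at p < 0.05
```

Both values are then compared with the tabulated significance points for the given (n1, n2, n3), exactly as in the slides.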

Friedman Test

Where the Friedman test sits among the tests covered:

                                  2 groups                     n groups (n > 2)
Parametric test (normality),
ANOVA (Analysis of Variance):
  unpaired (independent) data:    unpaired t-test              one-way ANOVA
  paired (related) data:          paired t-test                two-way ANOVA
Non-parametric test (no normality):
  unpaired (independent) data:    Mann-Whitney U-test          Kruskal-Wallis test (one-way data)
  paired (related) data:          sign test,
                                  Wilcoxon signed-ranks test   Friedman test (two-way data)
Friedman Test

When
(1) there are more than two groups,
(2) the data have correspondence (are not independent), but
(3) the conditions of two-way ANOVA are not satisfied,
use ranks of the data and the Friedman test.

(Ex.) Comparison of the recognition rates of four methods on four
benchmark tasks.

                     methods
  tasks      a       b       c       d
    A      0.92    0.75    0.65    0.81
    B      0.48    0.45    0.41    0.52
    C      0.56    0.41    0.47    0.50
    D      0.61    0.50    0.56    0.54

Friedman Test

Step 1: Make a ranking table (rank the methods within each task).
Step 2: Sum the ranks of the factor that you want to test.

                     methods
  tasks      a    b    c    d
    A        4    2    1    3
    B        3    2    1    4      # of data (n = 4)
    C        4    1    2    3
    D        4    1    3    2
    Σ       15    6    7   12      # of methods (k = 4)

Step 3: Calculate the Friedman test value, χ²r:

    χ²r = 12 / (nk(k+1)) · Σ_{i=1..k} Ri² − 3n(k+1)

where (k, n) are the # of levels of factors 1 and 2.

Step 4: If k = 3 or 4, compare χ²r with a significance point in a
Friedman test table. Otherwise, use a χ² table with (k−1) degrees of
freedom.
Example: Friedman Test

Step 1: Make a ranking table.
Step 2: Sum the ranks of the factor that you want to test.

                     methods
  tasks      a    b    c    d
    A        4    2    1    3
    B        3    2    1    4      # of data (n = 4)
    C        4    1    2    3
    D        4    1    3    2
    Σ       15    6    7   12      # of methods (k = 4)

Step 3: Calculate the Friedman test value, χ²r:

    χ²r = 12 / (nk(k+1)) · Σ_{i=1..k} Ri² − 3n(k+1)
        = 12 / (4·4·5) · (15² + 6² + 7² + 12²) − 3·4·5
        = 8.1

Step 4: Since the significance point for (k, n) = (4, 4) is 7.80 and
χ²r = 8.1 > 7.80, there are significant difference(s) somewhere among
the four methods a, b, c, and d (p<0.05). (The p<0.01 point, 9.60, is
not exceeded.)

Friedman test table:
  k    n    p<0.05   p<0.01
  3    3     6.00      -
  3    4     6.50     8.00
  3    5     6.40     8.40
  3    6     7.00     9.00
  3    7     7.14     8.86
  3    8     6.25     9.00
  3    9     6.22     9.56
  3    ∞     5.99     9.21
  4    3     7.40     9.00
  4    4     7.80     9.60
  4    5     7.80     9.96
  4    ∞     7.81    11.34
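The example's χ²r can be reproduced either from the rank sums with the slides' formula, or from the raw recognition rates with scipy.stats.friedmanchisquare (which reports a p-value from the χ² approximation with k−1 degrees of freedom rather than from the small-sample table); a sketch:

```python
from scipy.stats import friedmanchisquare

# Recognition rates of methods a-d on tasks A-D (from the example).
a = [0.92, 0.48, 0.56, 0.61]
b = [0.75, 0.45, 0.41, 0.50]
c = [0.65, 0.41, 0.47, 0.56]
d = [0.81, 0.52, 0.50, 0.54]

stat, p = friedmanchisquare(a, b, c, d)
print(round(stat, 3))  # 8.1, with p below 0.05

# The slides' formula applied to the rank sums directly:
n, k, R = 4, 4, [15, 6, 7, 12]
chi2_r = 12.0 / (n * k * (k + 1)) * sum(r * r for r in R) - 3 * n * (k + 1)
print(round(chi2_r, 3))  # 8.1 > 7.80, significant at p < 0.05 for (k, n) = (4, 4)
```

For k = 3 or 4 with small n, the exact table above is preferable to the χ² approximation, as the slides note.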

Example: Friedman Test (cont.)

Q1: Where are the significant differences among a, b, c, and d?
A1: Apply multiple comparisons between all pairs among columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett
method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer
method, Games/Howell method, Duncan's new multiple range test,
Student-Newman-Keuls method, etc. Each has different characteristics.)
Scheffé's Method of Paired Comparison

                                  2 groups                     n groups (n > 2)
normality (parametric):           t-test                       one-way ANOVA / two-way ANOVA
                                                               (ANOVA: Analysis of Variance)
no normality (non-parametric):    sign test,                   Kruskal-Wallis test (one-way data)
                                  Wilcoxon signed-ranks test   Friedman test (two-way data)

and, in addition: Scheffé's method of paired comparison for Human
Subjective Tests.
Scheffé's Method of Paired Comparison

[Figure: application areas of Interactive Evolutionary Computation
(IEC), i.e. Evolutionary Computation driven by human subjective
evaluations, where Scheffé's method is useful: image enhancement
processing, lighting design of 3-D CG, room lighting design by
optimizing LED assignments, room layout planning, hearing-aid fitting,
measuring mental scale, MEMS design, and geological simulation.]
Scheffé's Method of Paired Comparison

ANOVA based on nC2 paired comparisons for n objects: each pair is rated
on a scale (better / slightly better / even / slightly better / better),
ANOVA is applied to the ratings, and significance is checked using a
yardstick.
Scheffé's Method of Paired Comparison

Original method and three modified methods:

                        All subjects must evaluate all pairs.
                        no                        yes
order effect:  yes      original (1952)           Ura's variation (1956)
               no       Haga's variation          Nakaya's variation (1970)
Order Effect

Evaluating (1) object A and then object B, versus (2) object B and then
object A, may result in different evaluations.

Scheffé's Method of Paired Comparison

1. Ask N human subjects to evaluate t objects in 3, 5 or 7 grades.
2. Assign [-1, +1], [-2, +2] or [-3, +3] to these grades, respectively.
3. Then, start the calculation (see other material).

Questionnaire: paired comparisons for t = 3 objects by six subjects
(N = 6), each pair rated on the scale
(better / slightly better / even / slightly better / better).

Total raw data
             O1   O2   O3   O4   O5   O6
  A1 - A2     2    1    1    2    1    2
  A1 - A3     2    2    1    1    1    1
  A2 - A3     1    0    1    1   -1    0

Application Example:
What is the best present to become her/his boy/girl friend?

[SITUATION] She/he is my longing. I want to become her/his girl/boy
friend before we graduate from our university. To get over my one-way
love, I decided to present something of about 3,000 JPY and express my
heart.

I show you 5C2 pairs of presents. Please compare each pair and mark
your relative evaluation in five levels.

The five presents: tea/coffee, a stuffed animal, a fountain pen, a
strap for a mobile phone, and an invitation to a dinner.
Results of Scheffé's Method of Paired Comparison (Nakaya's variation)
What is the best present to become her/his boy/girl friend?

[Figure: each present placed on a "less effective" to "more effective"
scale from -1 to 1, with significant differences marked, in four charts:
a present from a male as males expect it to work ("I will catch her
heart by dinner."), a present from a female as females expect ("How
about tea leaves or a stuffed animal?"), and the reality for each
("I hesitate to accept it as we have not gone out with him." /
"Eat! Eat! Eat!").]

Scheffé's Method of Paired Comparison
Modified methods by Ura and Nakaya

(Of the original method and the three variations introduced above, the
following slides detail Ura's variation and Nakaya's variation, in both
of which all subjects evaluate all pairs.)
Scheffé's Method of Paired Comparison
Modified method by Ura

Pairwise comparisons for objects whose evaluations are affected by
display order (order effect): every pair is evaluated in both
presentation orders, each time on the scale
better (-2) / slightly better (-1) / even (0) / slightly better (+1) / better (+2).

Scheffé's Method of Paired Comparison
Modified method by Ura

Ask N human subjects to evaluate 2×tC2 pairs (every pair of the t
objects, in both presentation orders) in 3, 5 or 7 grades, and assign
[-1, +1], [-2, +2] or [-3, +3] to the grades, respectively.
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 1: Make a paired comparison table for each human subject from
his/her ratings of each ordered pair
(better (-2) / slightly better (-1) / even (0) / slightly better (+1) / better (+2)).

(Ex.) One subject's table:
          A1    A2    A3    A4
    A1           0    -1    -1
    A2     3           0     0
    A3     3     1          -1
    A4     3     3     1
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 1 (cont.): Make such a table for each human subject (here,
subjects O1, O2 and O3).

x_ijl: evaluation value when the l-th human subject compares the i-th
object with the j-th object.
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 2: Make a table summing all subjects' data and calculate the
average evaluations of all objects:

    μ̂_i = (x_i·· − x_·i·) / (2tN)

where t is the # of objects (4) and N is the # of human subjects (3).

Average of the four objects:
  x_i·· − x_·i·:      27       13      -12      -28
  object:             A1       A2       A3       A4
  μ̂_i:            1.1250   0.5417  -0.5000  -1.1667
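Step 2 is just a division of the summed table's marginal totals; a minimal sketch, assuming the totals x_i·· − x_·i· = (27, 13, −12, −28) read off the summed table:

```python
# Step 2 of Ura's variation: average preference of each object,
# mu_i = (x_i.. - x_.i.) / (2 t N), with t = 4 objects and N = 3 subjects.
t, N = 4, 3
totals = {"A1": 27, "A2": 13, "A3": -12, "A4": -28}  # x_i.. - x_.i.

averages = {obj: d / (2 * t * N) for obj, d in totals.items()}
for obj, mu in averages.items():
    print(obj, round(mu, 4))
# A1 1.125, A2 0.5417, A3 -0.5, A4 -1.1667, as on the slide.
# Note that the averages sum to zero by construction.
```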
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 3: Make an ANOVA table.

  S_α    = (1/(2tN)) Σ_i (x_i·· − x_·i·)²                   main effect
  S_α(B) = (1/(2t)) Σ_l Σ_i (x_i·l − x_·il)² − S_α           subjects × main effect
  S_γ    = (1/(2N)) Σ_i Σ_{j>i} (x_ij· − x_ji·)² − S_α       combination effect
  S_δ    = x_···² / (Nt(t−1))                                order effect
  S_δ(B) = (1/(t(t−1))) Σ_l x_··l² − S_δ                     subjects × order effect
  S_T    = Σ_l Σ_i Σ_{j≠i} x_ijl²                            total
  S_ε    = S_T − S_α − S_α(B) − S_γ − S_δ − S_δ(B)           error

unbiased variance = S/f for S ∈ {S_α, S_α(B), S_γ, S_δ, S_δ(B), S_ε},
where f is the corresponding degree of freedom
(f_α = t−1, f_α(B) = (t−1)(N−1), f_γ = (t−1)(t−2)/2, f_δ = 1,
f_δ(B) = N−1, f_T = t(t−1)N, and f_ε is the remainder).

F = (unbiased variance of each effect) / (unbiased variance of S_ε)
for the F tests.
Scheffé's Method of Paired Comparison
Modified method by Ura

  object:     A1       A2       A3       A4
  μ̂_i:    1.1250   0.5417  -0.5000  -1.1667

From the ANOVA table, there are significant differences among A1 - A4.
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 4: Apply multiple comparisons.
Q1: Where are the significant differences among A1 - A4?
A1: Apply multiple comparisons between all pairs.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method,
Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell
method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each
has different characteristics.)
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 4: Apply multiple comparisons between all pairs and find which
distances are significant.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method,
Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell
method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each
has different characteristics.)

Example of a simple multiple comparison:
• Calculate a studentized yardstick.
• When a difference of averages > the studentized yardstick, that
  distance is significant.

  object:     A1       A2       A3       A4
  μ̂_i:    1.1250   0.5417  -0.5000  -1.1667
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 4: Example of a simple multiple comparison.

    Y_φ = q_φ(t, f) √(σ̂² / (2tN))     (studentized yardstick)

where (σ̂², t, N) are the unbiased variance of S_ε, the # of objects, and
the # of human subjects; q_φ(t, f) is a studentized range obtained from
a statistical test table for t, the degrees of freedom f of S_ε, and the
significance level φ; see these variables in the ANOVA table.

When (t, f) = (4, 21), compute the studentized yardsticks for
significance levels of 5% and 1%.
(See q0.05 (4, 21) in the next slide.)

Studentized yardstick q0.05 (t , f)


f t 2 3 4 5 6 7 8 9 10 12 15 20
1 18.0 27.0 32.8 37.1 40.4 43.1 45.4 47.4 49.1 52.0 55.4 59.6
2 6.09 8.30 9.80 10.9 11.7 12.4 13.0 13.5 14.0 14.7 15.7 16.8
3 4.50 5.91 6.82 7.50 8.04 8.48 8.85 9.18 9.46 9.95 10.5 11.2
4 3.93 5.04 5.76 6.29 6.71 7.05 7.35 7.60 7.83 8.21 8.66 9.23
5 3.64 4.60 5.22 5.67 6.03 6.33 6.58 6.80 6.99 7.32 7.72 8.21
6 3.46 4.34 4.90 5.31 5.63 5.89 6.12 6.32 6.49 6.79 7.14 7.59
7 3.34 4.16 4.68 5.06 5.36 5.61 5.82 6.00 6.16 6.43 6.76 7.17
8 3.26 4.04 4.53 4.89 5.17 5.40 5.60 5.77 5.92 6.18 6.48 6.87
9 3.20 3.95 4.42 4.76 5.02 5.24 5.43 5.60 5.74 5.98 6.28 6.64
10 3.15 3.88 4.33 4.65 4.91 5.12 5.30 5.46 5.60 5.83 6.11 6.47
11 3.11 3.82 4.26 4.57 4.82 5.03 5.20 5.35 5.49 5.71 5.99 6.33
12 3.08 3.77 4.20 4.51 4.75 4.95 5.12 5.27 5.40 5.62 5.88 6.21
13 3.06 3.73 4.15 4.45 4.69 4.88 5.05 5.19 5.32 5.53 5.79 6.11
14 3.03 3.70 4.11 4.41 4.67 4.83 4.99 5.10 5.25 5.46 5.72 6.03
15 3.01 3.67 4.08 4.37 4.60 4.78 4.94 5.08 5.20 5.40 5.66 5.96
16 3.00 3.65 4.05 4.33 4.56 4.74 4.90 5.03 5.15 5.35 5.59 5.90
17 2.98 3.63 4.02 4.30 4.52 4.71 4.86 4.99 5.11 5.31 5.55 5.84
18 2.97 3.61 4.00 4.28 4.49 4.67 4.82 4.96 5.07 5.27 5.50 5.79
19 2.96 3.59 3.98 4.25 4.47 4.65 4.79 4.92 5.04 5.23 5.46 5.75
20 2.95 3.58 3.96 4.23 4.45 4.62 4.77 4.90 5.01 5.20 5.43 5.71
24 2.92 3.53 3.90 4.17 4.37 4.54 4.68 4.81 4.92 5.10 5.32 5.59
30 2.89 3.49 3.84 4.10 4.30 4.46 4.60 4.72 4.83 5.00 5.21 5.48
40 2.86 3.44 3.79 4.04 4.23 4.39 4.52 4.63 4.74 4.91 5.11 5.36
60 2.83 3.40 3.74 3.98 4.16 4.31 4.44 4.55 4.65 4.81 5.00 5.24
120 2.80 3.36 3.69 3.92 4.10 4.24 4.36 4.48 4.56 4.72 4.90 5.13
∞ 2.77 3.31 3.63 3.86 4.03 4.17 4.29 4.39 4.47 4.62 4.80 5.01
Scheffé's Method of Paired Comparison
Modified method by Ura

Step 4: Example of a simple multiple comparison (cont.)

Scheffé's Method of Paired Comparison
Modified methods by Ura and Nakaya

(Having covered Ura's variation, the following slides detail Nakaya's
variation (1970): no order effect, and all subjects evaluate all pairs.)
Scheffé's Method of Paired Comparison
Modified method by Nakaya

Pairwise comparisons for objects that can be compared without an order
effect: each pair is evaluated once, on the scale
better (-2) / slightly better (-1) / even (0) / slightly better (+1) / better (+2).

Scheffé's Method of Paired Comparison
Modified method by Nakaya

1. Ask N human subjects to evaluate t objects in 3, 5 or 7 grades.
2. Assign [-1, +1], [-2, +2] or [-3, +3] to these grades, respectively.
3. Then, start the calculation (see other material).

Questionnaire: paired comparisons for t = 3 objects by six human
subjects (N = 6).

             O1   O2   O3   O4   O5   O6
  A1 - A2     2    3    3    2    0    1
  A1 - A3     2    0    0    1    1    0
  A2 - A3    -3   -2   -1   -1   -3   -2
Scheffé's Method of Paired Comparison
Modified method by Nakaya

Step 1: Make a paired comparison table for each human subject.
x_ijl: evaluation value when the l-th human subject compares the i-th
object with the j-th object.
Scheffé's Method of Paired Comparison
Modified method by Nakaya

Step 2: Make a table summing all subjects' data and calculate the
average evaluations of all objects:

    μ̂_i = x_i·· / (tN)

where t is the # of objects (3) and N is the # of human subjects (6).
Scheffé's Method of Paired Comparison
Modified method by Nakaya

Step 3: Make an ANOVA table.

  S_α    = (1/(tN)) Σ_i x_i··²                    main effect
  S_α(B) = (1/t) Σ_l Σ_i x_i·l² − S_α             subjects × main effect
  S_γ    = (1/N) Σ_i Σ_{j>i} x_ij·² − S_α         combination effect
  S_T    = Σ_l Σ_i Σ_{j>i} x_ijl²                 total
  S_ε    = S_T − S_α − S_α(B) − S_γ               error

F = (unbiased variance of each effect) / (unbiased variance of S_ε),
with unbiased variance = S/f.

From the ANOVA table, there are significant differences among A1 - A3.
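The whole of Step 3 can be reproduced from the questionnaire data (t = 3, N = 6); a sketch, assuming the sign convention x_jil = −x_ijl and the decomposition into main effect, its subject interaction, and combination effect:

```python
# Nakaya's variation, Step 3, on the questionnaire data (t = 3, N = 6).
# x[(i, j)][l]: score of subject l for the pair (A_i, A_j), i < j.
x = {
    (1, 2): [2, 3, 3, 2, 0, 1],
    (1, 3): [2, 0, 0, 1, 1, 0],
    (2, 3): [-3, -2, -1, -1, -3, -2],
}
t, N = 3, 6

def row_sum(i, l):
    """x_i.l, the row sum for object i and subject l (using x_ji,l = -x_ij,l)."""
    s = 0
    for (a, b), scores in x.items():
        if a == i:
            s += scores[l]
        elif b == i:
            s -= scores[l]
    return s

S_T = sum(v * v for scores in x.values() for v in scores)                     # total
S_a = sum(sum(row_sum(i, l) for l in range(N)) ** 2 for i in (1, 2, 3)) / (t * N)
S_aB = sum(row_sum(i, l) ** 2 for i in (1, 2, 3) for l in range(N)) / t - S_a
S_g = sum(sum(scores) ** 2 for scores in x.values()) / N - S_a                # combination
S_e = S_T - S_a - S_aB - S_g                                                  # error

# Degrees of freedom: f_T = N t(t-1)/2 minus those of the three effects.
f_e = N * t * (t - 1) // 2 - (t - 1) - (t - 1) * (N - 1) - (t - 1) * (t - 2) // 2
V_e = S_e / f_e
print(round(V_e, 2))  # 1.79, the error variance used by the yardstick example below
```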

Scheffé's Method of Paired Comparison
Modified method by Nakaya

Step 4: Apply multiple comparisons.

Q1: Where are the significant differences among A1, A2, and A3?
A1: Apply multiple comparisons between all pairs among columns.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett
method, Williams method, Tukey method, Nemenyi test, Tukey-Kramer
method, Games/Howell method, Duncan's new multiple range test,
Student-Newman-Keuls method, etc. Each has different characteristics.)
Scheffé's Method of Paired Comparison
Modified method by Nakaya

Step 4: Apply multiple comparisons between all pairs and find which
distances are significant.
(Fisher's PLSD method, Scheffé method, Bonferroni-Dunn test, Dunnett method,
Williams method, Tukey method, Nemenyi test, Tukey-Kramer method, Games/Howell
method, Duncan's new multiple range test, Student-Newman-Keuls method, etc. Each
has different characteristics.)

Example of a simple multiple comparison:
• Calculate a studentized yardstick.
• When a difference of averages > the studentized yardstick, that
  distance is significant.

Scheffé's Method of Paired Comparison
Modified method by Nakaya

Step 4: Example of a simple multiple comparison.

    Y_φ = q_φ(t, f) √(σ̂² / (tN))     (studentized yardstick)

where (σ̂², t, N) are the unbiased variance of S_ε, the # of objects, and
the # of human subjects; q_φ(t, f) is a studentized range obtained from
a statistical test table for t, the degrees of freedom f of S_ε, and the
significance level φ; see these variables in the ANOVA table.

    Y0.05 = 4.60 √(1.79 / (3×6)) = 1.4506    (See q0.05 (3, 5) in the next slide.)
    Y0.01 = 6.97 √(1.79 / (3×6)) = 2.1980
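With the numbers above, each yardstick is one line of arithmetic; a sketch using the slide's values (σ̂² = 1.79, q0.05(3, 5) = 4.60, q0.01(3, 5) = 6.97):

```python
import math

# Studentized yardstick for Nakaya's variation: Y = q(t, f) * sqrt(var_e / (t N)).
t, N, var_e = 3, 6, 1.79
q_005, q_001 = 4.60, 6.97   # q_0.05(3, 5) and q_0.01(3, 5) from the tables

y_005 = q_005 * math.sqrt(var_e / (t * N))
y_001 = q_001 * math.sqrt(var_e / (t * N))
print(round(y_005, 4), round(y_001, 4))  # 1.4506 2.198
# Any pair of averages differing by more than Y is significantly different
# at that level.
```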


Studentized yardstick q0.05 (t , f)
f t 2 3 4 5 6 7 8 9 10 12 15 20
1 18.0 27.0 32.8 37.1 40.4 43.1 45.4 47.4 49.1 52.0 55.4 59.6
2 6.09 8.30 9.80 10.9 11.7 12.4 13.0 13.5 14.0 14.7 15.7 16.8
3 4.50 5.91 6.82 7.50 8.04 8.48 8.85 9.18 9.46 9.95 10.5 11.2
4 3.93 5.04 5.76 6.29 6.71 7.05 7.35 7.60 7.83 8.21 8.66 9.23
5 3.64 4.60 5.22 5.67 6.03 6.33 6.58 6.80 6.99 7.32 7.72 8.21
6 3.46 4.34 4.90 5.31 5.63 5.89 6.12 6.32 6.49 6.79 7.14 7.59
7 3.34 4.16 4.68 5.06 5.36 5.61 5.82 6.00 6.16 6.43 6.76 7.17
8 3.26 4.04 4.53 4.89 5.17 5.40 5.60 5.77 5.92 6.18 6.48 6.87
9 3.20 3.95 4.42 4.76 5.02 5.24 5.43 5.60 5.74 5.98 6.28 6.64
10 3.15 3.88 4.33 4.65 4.91 5.12 5.30 5.46 5.60 5.83 6.11 6.47
11 3.11 3.82 4.26 4.57 4.82 5.03 5.20 5.35 5.49 5.71 5.99 6.33
12 3.08 3.77 4.20 4.51 4.75 4.95 5.12 5.27 5.40 5.62 5.88 6.21
13 3.06 3.73 4.15 4.45 4.69 4.88 5.05 5.19 5.32 5.53 5.79 6.11
14 3.03 3.70 4.11 4.41 4.67 4.83 4.99 5.10 5.25 5.46 5.72 6.03
15 3.01 3.67 4.08 4.37 4.60 4.78 4.94 5.08 5.20 5.40 5.66 5.96
16 3.00 3.65 4.05 4.33 4.56 4.74 4.90 5.03 5.15 5.35 5.59 5.90
17 2.98 3.63 4.02 4.30 4.52 4.71 4.86 4.99 5.11 5.31 5.55 5.84
18 2.97 3.61 4.00 4.28 4.49 4.67 4.82 4.96 5.07 5.27 5.50 5.79
19 2.96 3.59 3.98 4.25 4.47 4.65 4.79 4.92 5.04 5.23 5.46 5.75
20 2.95 3.58 3.96 4.23 4.45 4.62 4.77 4.90 5.01 5.20 5.43 5.71
24 2.92 3.53 3.90 4.17 4.37 4.54 4.68 4.81 4.92 5.10 5.32 5.59
30 2.89 3.49 3.84 4.10 4.30 4.46 4.60 4.72 4.83 5.00 5.21 5.48
40 2.86 3.44 3.79 4.04 4.23 4.39 4.52 4.63 4.74 4.91 5.11 5.36
60 2.83 3.40 3.74 3.98 4.16 4.31 4.44 4.55 4.65 4.81 5.00 5.24
120 2.80 3.36 3.69 3.92 4.10 4.24 4.36 4.48 4.56 4.72 4.90 5.13
∞ 2.77 3.31 3.63 3.86 4.03 4.17 4.29 4.39 4.47 4.62 4.80 5.01

SUMMARY

1. We overviewed which statistical test to use in which case:

                                  2 groups                     n groups (n > 2)
Parametric Test (normality),
ANOVA (Analysis of Variance):
  unpaired (independent) data:    unpaired t-test              one-way ANOVA
  paired (related) data:          paired t-test                two-way ANOVA
Non-parametric Test (no normality):
  unpaired (independent) data:    Mann-Whitney U-test          Kruskal-Wallis test (one-way data)
  paired (related) data:          sign test,
                                  Wilcoxon signed-ranks test   Friedman test (two-way data)

plus Scheffé's method of paired comparison for Human Subjective Tests.

2. We can demonstrate the effectiveness of our experiments through the
correct use of statistical tests.
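The summary table can be encoded as a small selector; a sketch (the function and argument names are my own, not from the slides):

```python
def choose_test(n_groups, normal, paired):
    """Pick a test from the summary table, given the number of groups,
    whether normality can be assumed (parametric vs non-parametric),
    and whether the data are paired (related) or unpaired (independent)."""
    if normal:  # parametric: t-tests and ANOVA
        if n_groups == 2:
            return "paired t-test" if paired else "unpaired t-test"
        return "two-way ANOVA" if paired else "one-way ANOVA"
    # non-parametric: rank-based tests
    if n_groups == 2:
        return "sign test / Wilcoxon signed-ranks test" if paired else "Mann-Whitney U-test"
    return "Friedman test" if paired else "Kruskal-Wallis test"

print(choose_test(2, normal=False, paired=False))  # Mann-Whitney U-test
print(choose_test(3, normal=False, paired=True))   # Friedman test
```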
