Efficient Distributed Genetic Algorithm for Rule Extraction
Applied Soft Computing
Article info
Article history:
Received 15 March 2009
Received in revised form 18 December 2009
Accepted 29 December 2009
Available online 13 January 2010
Keywords:
Classification rules
Rule induction
Distributed computing
Coarse-grained implementation
Parallel genetic algorithms
Abstract
This paper presents an Efficient Distributed Genetic Algorithm for classification Rule extraction in data mining (EDGAR), which promotes a new method of data distribution in computer networks. This is done by spatial partitioning of the population into several semi-isolated nodes, each evolving in parallel and possibly exploring different regions of the search space. The presented algorithm shows some advantages when compared with other distributed algorithms proposed in the specific literature. Results are presented showing a significant speedup of the learning process without compromising accuracy.
© 2010 Elsevier B.V. All rights reserved.
1. Introduction
Nowadays the size of datasets is growing quickly due to the widespread use of automatic processes in commercial and public domains and the lower cost of massive storage. Mining large datasets to obtain classification models with good prediction accuracy can be a very difficult task, because the size of the dataset can make data mining algorithms ineffective and inefficient.
There are three main approaches to tackling the scaling problem:
Use as much a priori knowledge as possible to search in subspaces small enough to be explored.
Perform data reduction.
Algorithm scalability.
The third approach, algorithm scalability, promotes the use
of computation capacity in order to handle the full dataset. The
use of computer grids to achieve a greater amount of computational resources has become more popular over the past few years
because they are much more cost-effective than single computers
of comparable speed. The main challenge when using distributed
computing is the need for new algorithms that take the architecture into account. Genetic algorithms are especially well suited for
this task because of their implicit parallelism. As a typical population-based technique, a GA can be parallelised in several ways; the main strategies described in the literature are the following:
Global parallelisation [8]. Only the evaluation of the individuals' fitness values is parallelised, by assigning a fraction of the population to each processor to be evaluated. This is an equivalent algorithm that produces the same results as the sequential one.
Coarse-grained [8] and fine-grained parallelisation [20]. In the former, the entire population is partitioned into subpopulations (demes). A GA is run on each subpopulation, and an exchange of information between demes (migration) takes place occasionally [8], in analogy with the natural evolution of spatially distributed populations such as the island model (Fig. 1). The fine-grained approach has just one individual per processor and rules to perform crossover in the closest neighbourhood defined by a topology.
Supervised data distribution [11]. A master process uses a group of processors (slaves), sending them partial tasks and a smaller data partition. Each node runs a complete GA or a part of it. The master process uses the partial results to reassign data and tasks [10,1] to the processors until some condition is met.
Unsupervised data distribution [18]. The full dataset is shared out among several processors and moved to the next processor in the topology after a pre-specified number of generations, without removing the existing population. The individuals will try to cover the newly arrived training data.
The proposal presented in this work follows a GCCL approach and, as a parallelisation strategy, uses a coarse-grained implementation together with a master process that builds up the final classifier on the basis of the partial results.
3. Genetic Learning Proposal: EDGAR algorithm
This section describes the characteristics of an Efficient Distributed Genetic Algorithm for classification Rule extraction, from now on designated EDGAR. The proposed algorithm distributes the population (rules, in a GCCL approach) and the training data in a coarse-grained model to achieve scalability.
We start by explaining the distributed model in Section 3.1. Sections 3.2–3.6 describe the components of the genetic algorithm: representation, genetic operators, genetic search and data reduction. Finally, Section 3.7 is devoted to the strategy used to determine the best set of rules that will make up the classifier from the redundant population of rules generated by the GCCL algorithm.
3.1. Distributed model
This subsection explains the main properties of the distributed framework. First, in Section 3.1.1 we describe the use of the coarse-grained implementation with data partitioning.
Only the rules in the last proposed classifier are kept in the pool. Only newly discovered rules are included: the nodes keep track of the rules already sent, and the pool checks the received rules, preventing them from being evaluated again. This check runs in a reasonable time frame thanks to a native Java hash table implementation keyed on the chromosome string representation, which detects the duplicated rules.
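A minimal sketch of this duplicate check, assuming each rule is identified by its chromosome bit string (class and method names are illustrative, not the paper's implementation):

import java.util.HashSet;
import java.util.Set;

/** Elitist pool that accepts a rule only the first time its chromosome is seen. */
public class RulePool {
    // Chromosome strings already received; a hash set gives constant-time look-up.
    private final Set<String> seen = new HashSet<>();

    /** Returns true if the rule is new and was added to the pool. */
    public boolean submit(String chromosome) {
        // add() returns false when the bit string is already present,
        // so duplicated rules are never evaluated again.
        return seen.add(chromosome);
    }
}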
3.2. Genetic algorithm
EDGAR runs a GA in each node, with a ring communication topology in which each node exchanges with its neighbourhood, as in the aforementioned island model, its best rules and some training data corresponding to examples poorly covered by the local classifier.
Each node works on a partition of the full dataset generated by random sampling. The initial population is created by constructing rules that cover some of the examples in the local dataset (seeding). In each generation, the Universal Suffrage (US) operator selects a set of individuals (g) for crossover and mutation. Each offspring replaces a randomly selected individual in the current population.
After a number of generations (Local Number of Generations, LNG), the following operations are performed (see Fig. 4; a schematic outline is given after this list):
Using a greedy algorithm (see Section 3.7), extract the set of rules that best classifies the local data and copy them to the next node in the ring and to the pool.
Randomly replace selected individuals in the current population with the individuals received from the previous node in the ring.
Copy the learning examples that are not covered, or are covered only by low-fitness rules, to the next node in the ring.
Perform training set reduction if the best individual does not change (see Section 3.6).
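The cycle above can be outlined schematically as follows. All names and data structures are assumptions made for this sketch (rules are reduced to chromosome strings), and the GA internals, the DLF step and the data reduction are only indicated by comments.

import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Schematic outline of one EDGAR node's actions after LNG local generations. */
class NodeCycleSketch {
    List<String> population = new ArrayList<>();

    void afterLocalGenerations(Deque<String> fromPreviousNode, Deque<String> toNextNode,
                               Set<String> pool, Random rnd) {
        // 1. Extract the rules that best classify the local data (greedy cover, Section 3.7).
        List<String> proposed = new ArrayList<>(population);   // placeholder for the greedy step
        // 2. Copy them to the next node in the ring and to the elitist pool.
        toNextNode.addAll(proposed);
        pool.addAll(proposed);
        // 3. Replace randomly selected individuals with the migrants received from the previous node.
        while (!fromPreviousNode.isEmpty() && !population.isEmpty()) {
            population.set(rnd.nextInt(population.size()), fromPreviousNode.poll());
        }
        // 4. Examples not covered or covered only by low-fitness rules would be forwarded
        //    to the next node (DLF), and the training set reduced if the best individual
        //    has not changed (Section 3.6).
    }
}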
The node stops searching when there is no data left in the local dataset or when the pool node orders it to.
3.3. Representation
EDGAR uses a fixed-length bit string representation to code a disjunctive rule. The use of fixed-length chromosomes allows simpler genetic operators.
Each different possible value of an attribute in a rule is represented as a single bit. A bit set to 1 signifies the presence of that value in the rule and a bit set to 0 means its absence. One advantage of this representation is that the mutation of a bit of the genotype only leads to a small change in the phenotype.
A rule is composed of characteristics C = c1, c2, ..., cj, where each one can take only one value in each instance of the mined data, although the rule may admit more than one value for a characteristic.
For example, Fig. 5 shows a rule with three antecedents c1(v1, v2, v3), c2(v4, v5), c3(v6, v7, v8) and the consequent class(v9, v10). As seen in this figure, when all the bits corresponding to the same attribute are set to 0, the attribute does not affect the rule.
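The sketch below illustrates this encoding and the matching convention (an attribute whose bits are all 0 does not constrain the rule). The attribute sizes and values follow the example of Fig. 5, but the class and method names are assumptions of this sketch.

import java.util.BitSet;

/** Fixed-length bit-string antecedent: one bit per attribute value. */
class BitRuleSketch {
    private final BitSet bits;            // antecedent chromosome
    private final int[] attributeSizes;   // number of possible values of each attribute

    BitRuleSketch(BitSet bits, int[] attributeSizes) {
        this.bits = bits;
        this.attributeSizes = attributeSizes;
    }

    /** example[i] is the index of the value taken by attribute i. */
    boolean covers(int[] example) {
        int offset = 0;
        for (int a = 0; a < attributeSizes.length; a++) {
            int size = attributeSizes[a];
            boolean attributeUsed = !bits.get(offset, offset + size).isEmpty();
            // if the attribute is used, the bit of the example's value must be set
            if (attributeUsed && !bits.get(offset + example[a])) return false;
            offset += size;
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet b = new BitSet(8);
        b.set(0); b.set(1);   // c1 in {v1, v2}; c2 and c3 left at zero, hence unconstrained
        BitRuleSketch r = new BitRuleSketch(b, new int[] {3, 2, 3});
        System.out.println(r.covers(new int[] {1, 0, 2}));   // true: c1 = v2
    }
}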
Species formation is of great importance in a GCCL algorithm. For this purpose, EDGAR uses the US selection operator first used in [10]. This mechanism creates coverage niches that do not compete with each other (co-evolution) through a voting process: in each generation a set of learning examples is selected, the rules in the population that best cover each example are searched for (fitness), and a weighted roulette is performed based on the fitness and the number of positive cases. In the event that no rule covers an example, a new rule is created by generalising the learning example.
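A simplified illustration of this voting scheme is given below. The Rule interface, the way fitness is obtained and the omission of the seeding step for uncovered examples are assumptions of this sketch.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Universal Suffrage style selection: each chosen example "votes" for one of the
 *  rules covering it through a fitness-weighted roulette. */
class USSelectionSketch {
    interface Rule { boolean covers(int[] example); double fitness(); }

    static List<Rule> select(List<Rule> population, List<int[]> examples, int votes, Random rnd) {
        List<Rule> selected = new ArrayList<>();
        for (int v = 0; v < votes; v++) {
            int[] e = examples.get(rnd.nextInt(examples.size()));   // pick a voter
            List<Rule> candidates = new ArrayList<>();
            double total = 0;
            for (Rule r : population) {
                if (r.covers(e)) { candidates.add(r); total += r.fitness(); }
            }
            if (candidates.isEmpty()) continue;   // in EDGAR a new rule would be seeded here
            double spin = rnd.nextDouble() * total;                  // fitness-weighted roulette
            Rule chosen = candidates.get(candidates.size() - 1);     // fallback for rounding
            for (Rule r : candidates) {
                spin -= r.fitness();
                if (spin <= 0) { chosen = r; break; }
            }
            selected.add(chosen);
        }
        return selected;
    }
}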
Crossover and mutation operators are based on the standard bit string representation and are applied to the selected parents with a given probability. The crossover operator used is two-point crossover, and the offspring must cover at least one example before being inserted in the population. The mutation operator changes one random bit in the chromosome and behaves differently depending on the fitness of the selected individual. In the early stages of the process some of the rules cover only a few examples; in this case the mutation is driven to generalise the rule, increasing the possibility of covering new examples. If the offspring has a higher fitness than its parents, it is inserted in the population; otherwise the parent is used instead.
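The sketch below shows a standard two-point crossover and a fitness-dependent mutation in this spirit. The fitness threshold and the choice of bit operation used to generalise a rule are assumptions, not the paper's exact operators.

import java.util.BitSet;
import java.util.Random;

/** Two-point crossover and fitness-driven mutation on bit-string chromosomes. */
class OperatorsSketch {
    static BitSet twoPointCrossover(BitSet p1, BitSet p2, int length, Random rnd) {
        int a = rnd.nextInt(length), b = rnd.nextInt(length);
        int from = Math.min(a, b), to = Math.max(a, b);
        BitSet child = (BitSet) p1.clone();
        child.clear(from, to);                    // take the segment [from, to) from the second parent
        BitSet segment = p2.get(from, to);        // indices relative to 0
        for (int i = segment.nextSetBit(0); i >= 0; i = segment.nextSetBit(i + 1)) {
            child.set(from + i);
        }
        return child;
    }

    static void mutate(BitSet chromosome, int length, double fitness, Random rnd) {
        int bit = rnd.nextInt(length);
        if (fitness < 0.5) {
            chromosome.set(bit);   // low-fitness rule: admit one more value (generalise); threshold assumed
        } else {
            chromosome.flip(bit);  // otherwise: ordinary bit flip
        }
    }
}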
3.6. Data training set reduction under evolution and covering
US depends on a proper population-to-dataset ratio for good coverage. When the training examples representing a concept are fewer than those of other concepts, the rules for those concepts may disappear under the attraction of the rules with more voters. For instance, with a local dataset of 1000 training examples and 10 individuals in the population, US randomly uses 10 examples each time to select the 10 best rules that represent them. The probability of a particular instance being selected is less than 1%. Once it is selected, there will be at least one rule representing it, but this rule will disappear within the next 100 generations with a probability of 99%.
EDGAR deletes the examples already learned in order to focus on the less represented ones. The process is as follows: when the algorithm detects that the proposed rule set has not changed for a number of consecutive checks (Local Stall Parameter, LSP), the best rule and the data it covers are removed from the node. The remaining examples therefore receive more computational effort, making it possible to induce rules for them.
This strategy makes the algorithm less dependent on the ratio between learning examples and population size, because it guarantees that all the examples will be selected, either in the standard phase with the initial dataset or later, once the examples covered by the already learned rules have been removed from the local dataset.
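A minimal sketch of this reduction step, assuming the local examples and the coverage of the best rule are kept as bit sets of example identifiers:

import java.util.BitSet;

/** Removes the examples covered by the best rule once the rule set stalls LSP times. */
class DataReductionSketch {
    private int stalledChecks = 0;

    void maybeReduce(BitSet localExamples, BitSet bestRuleCoverage, boolean ruleSetChanged, int lsp) {
        stalledChecks = ruleSetChanged ? 0 : stalledChecks + 1;
        if (stalledChecks >= lsp) {
            localExamples.andNot(bestRuleCoverage);   // drop the already-learned examples
            stalledChecks = 0;
        }
    }
}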
Table 1
Compared percentage accuracy.
Dataset        C4.5     EDGAR    REGAL
Monk-1         100      100.0    100.0
Monk-2         67.0     96.0     95.0
Monk-3         100.0    99.6     100.0
Tic-tac-toe    92.9     98.9     98.7
Credit         86.0     85.0     84.0
Breast         94.1     94.0     94.1
Vote           96.4     97.1     96.2
The candidate rules are ordered according to the weight

w(r) = (1 + zeros(r)/length(r)) (1 - FP/TP)

where zeros(r) is the number of bits set to 0 in the chromosome of rule r, length(r) is its length, and TP and FP are the numbers of positive and negative examples covered by the rule.
Once a rule is selected, all its positive cases are removed and the remaining candidate rules are reordered. The process finalises when all the examples are covered or there are no more rules in the rule set.
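The following sketch shows the greedy covering idea. For simplicity it scores each rule by the number of still-uncovered positive cases it adds, instead of the full weight defined above, and represents a rule only by its coverage.

import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Greedy construction of the classifier: pick the rule adding most coverage,
 *  remove the covered positives, and repeat until nothing remains. */
class GreedyCoverSketch {
    static List<Integer> selectRules(List<BitSet> ruleCoverage, BitSet positives) {
        BitSet remaining = (BitSet) positives.clone();
        List<Integer> chosen = new ArrayList<>();
        boolean[] used = new boolean[ruleCoverage.size()];
        while (!remaining.isEmpty()) {
            int best = -1, bestGain = 0;
            for (int i = 0; i < ruleCoverage.size(); i++) {
                if (used[i]) continue;
                BitSet gain = (BitSet) ruleCoverage.get(i).clone();
                gain.and(remaining);                       // positives still uncovered by previous picks
                if (gain.cardinality() > bestGain) { best = i; bestGain = gain.cardinality(); }
            }
            if (best < 0) break;                           // no remaining rule adds coverage
            chosen.add(best);
            used[best] = true;
            remaining.andNot(ruleCoverage.get(best));      // the rest are implicitly re-ordered next pass
        }
        return chosen;
    }
}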
3.8. Architecture scalability
Scalability in parallel implementations is described in the literature [17] as the relation between the number of processors and the execution time. The following paragraphs describe the policy for keeping the execution time scalable, considering the network speed and the synchronisation of distributed processes.
The components of the execution time in distributed systems are [16]: the time of each processor (Tpi), the number of communications (c), the average time per communication (Tc), the idle time waiting for synchronisation (Ti) and the probability of idle status for a node (p).
T = Σ(i=1..n) Tpi + c (Tc + Ti p)
Since the nodes and the pool communicate asynchronously, the idle time waiting for synchronisation (Ti) and the probability of idle status (p) for a node and for the pool become close to zero.
Communication time (c·Tc) depends on the size and frequency of the communication of the best individuals and training data (DLF technique) to the neighbourhood and to the pool. A system parameter (Local Number of Generations, LNG) allows the communication frequency (the number of generations between communications) to be adjusted to the convergence of the local model and to the network speed. If communication is too infrequent, the local model will overlearn the assigned dataset. If it is too frequent, the newly arrived individuals will slow down or even prevent convergence with the local data [8], and the time spent handling communication will increase. When this handling time is less than the time used in sending learning data and rules through the network (Tc), the algorithm execution time will be independent of the network speed.
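As an illustration of this cost model, the following sketch evaluates T for a hypothetical configuration; every number in it is made up for the example.

/** Evaluates the execution-time model T = sum(Tpi) + c * (Tc + Ti * p). */
public class TimeModel {
    static double totalTime(double[] tp, int c, double tc, double ti, double p) {
        double sum = 0.0;
        for (double t : tp) sum += t;      // computation time of each processor
        return sum + c * (tc + ti * p);    // plus communication and expected idle time
    }

    public static void main(String[] args) {
        double[] tp = {0.55, 0.60, 0.58, 0.57};   // hypothetical per-node times (s)
        // 20 communications of 0.4 ms each, negligible idle probability
        System.out.println(totalTime(tp, 20, 0.0004, 0.05, 0.0));
    }
}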
4. Experimental study
In this section, we describe the experimental study carried out
on a variety of datasets, ranging from standard benchmark to test
the accuracy against standard algorithms in rule induction to specic comparison with more complex problems. The experimental
study was carried out in a cluster of 8 workstations with 2 CPU Intel
Xeon 3 GHz each. In order to run more than 16 nodes at a time, each
node was implemented as a thread in a Java VM and communications time was simulated, adding a time equivalent to an Ethernet
100 mbs in a conguration of 8 processors. As an example, usually
the communication unit in Ethernet is 512 bytes per package; if
a rule is 30 bytes in length, the pack will have 25 records, which
means a delay of 0.0512/25 = 0.00015 s per rule.
Section 4.1 compares EDGAR with standard benchmarks. Section 4.2 analyses the effect of data distribution on accuracy and speedup in a case study. Section 4.3 develops a statistical analysis over a set of commonly used datasets.
4.1. Comparison with standard benchmarks
This section is devoted to testing, in a first run, whether EDGAR is able to obtain accuracy similar to distributed and non-distributed learners on a variety of datasets chosen from the University of California at Irvine repository [25]. The selected problems are well known in the literature, so we will simply describe the main characteristics of each. Monk-1, Monk-2 and Monk-3 are artificial classification problems whose aim is to test specific abilities of learning systems. Tic-tac-toe consists in classifying the states of the homonymous strategy game as winning or losing. Credit, Breast and Vote are prediction problems related to the reliability of applicants for credit cards, the prognosis of breast cancer, and the prediction of the vote given by congressmen on the basis of their previous political choices. All dataset testing was performed with 10-fold cross-validation.
Table 1 reports the results on this first group of problems. The systems used for the comparison are C4.5 and REGAL. C4.5 [19] is a classical propositional learner, whose results are used as a baseline. The performance of C4.5 is reported in [2]. REGAL was executed with the parameters shown in Table 2.
Table 2
Execution parameters.
                                         REGAL       EDGAR
Stopping criteria                        500 gen.    CSP = 5, LNG = 20
Mutation percentage                      0.01%       1%
Crossover percentage                     60%         90%
Selection percentage g                   10%         10%
Communication ratio                      10%         10% max
Training dataset reduction               -           LSP = 5
Training dataset communication ratio     -           1% max

Nodes    Comm.        Time    Rules    %Test.    %Tra.
4        318,092      0.79    16       99.93     99.98
8        558,501      0.68    14       99.95     99.99
16       681,782      0.57    15       99.95     99.99
32       1,085,929    0.54    15       99.94     99.99
64       1,267,774    0.43    16       99.95     99.99
Table 4
Results of Mushroom EDGAR.
Nodes    Comm.     Time    Rules    %Test    %Tra.
4        2129      1.20    14       99.92    99.94
8        3907      0.55    13       99.94    99.96
16       8232      0.35    15       99.96    99.99
32       17,60     0.23    16       99.95    99.90
64       20,11     0.18    16       99.93    99.95
Table 5
Results of Nursery REGAL.
Nodes    Comm.         Time    Rules    %Test    %Tra.
4        784,870       1.78    290      98.6     98.9
8        2,319,559     1.97    251      99.0     99.3
16       6,304,130     2.32    250      98.9     99.2
32       18,595,490    2.81    268      98.5     98.7
64       50,664,658    3.08    316      97.9     98.0
Nodes    Comm.     Time    Rules    %Test    %Tra.
4        6818      2.89    173      99.4     99.7
8        3356      1.59    209      98.5     98.8
16       4836      1.20    206      98.9     99.2
32       8309      1.11    231      98.3     98.8
64       77,859    1.21    199      98.5     98.6
Table 8
Dataset characteristics.
Dataset        Instances    Features    Classes
Car            1727         6           4
Cleveland      297          13          5
Credit         1000         20          2
Ecoli          336          7           2
Glass          214          9           7
Haberman       305          3           2
House Votes    432          16          2
Hypothyroid    1920         29          4
Iris           150          4           3
Krvskp         3198         37          2
Monks          432          6           2
Mushrooms      8124         22          2
New-Thyroid    215          5           3
Nursery        12,960       6           2
Pima           768          8           2
Segment        2308         19          7
Soybean        307          35          19
Splice         3190         60          3
Tic-tac-toe    958          9           2
Vehicle        846          18          4
Waveform       5000         41          3
Wine           178          13          3
Wisconsin      683          9           2
Vote           435          16          2
Thyroid        7200         21          3
Zoo            100          16          7

Rule order    Positive cases    Negative cases
1             1985              0
2             1943              0
3             1786              0
4             1532              0
5             1098              0
6             11                0
7             9                 0
8             5                 0
9             5                 0
10            2                 0
11            25                22
12            1855              1833
13            1909              1513
In the Wilcoxon signed-ranks test, the sums of ranks for the positive and negative differences are computed as

R+ = Σ(di>0) rank(di) + (1/2) Σ(di=0) rank(di)
R− = Σ(di<0) rank(di) + (1/2) Σ(di=0) rank(di)

where di is the difference between the performance scores of the two algorithms on the i-th dataset.
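For illustration, a simplified computation of R+ and R− from the paired differences could look as follows (ties among equal absolute differences are not average-ranked in this sketch):

import java.util.Arrays;

/** Computes R+ and R- of the Wilcoxon signed-ranks test for paired differences d[i]. */
public class WilcoxonSketch {
    public static double[] signedRankSums(double[] d) {
        Integer[] idx = new Integer[d.length];
        for (int i = 0; i < d.length; i++) idx[i] = i;
        // rank by absolute difference, smallest first (rank 1)
        Arrays.sort(idx, (a, b) -> Double.compare(Math.abs(d[a]), Math.abs(d[b])));
        double rPlus = 0, rMinus = 0;
        for (int r = 0; r < idx.length; r++) {
            double di = d[idx[r]];
            double rank = r + 1;
            if (di > 0) rPlus += rank;
            else if (di < 0) rMinus += rank;
            else { rPlus += rank / 2.0; rMinus += rank / 2.0; }   // zero differences split evenly
        }
        return new double[] { rPlus, rMinus };
    }

    public static void main(String[] args) {
        // hypothetical accuracy differences (EDGAR - REGAL) on a few datasets
        double[] d = { 0.02, -0.01, 0.00, 0.03, 0.01 };
        System.out.println(Arrays.toString(signedRankSums(d)));
    }
}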
Table 9
Average results. For each dataset, the mean and standard deviation (sd) of the number of rules, test accuracy and time are reported for EDGAR and REGAL.
Car
Mean
sd
61
6.06
97%
0.006
0.10
0.07
66
8.13
98%
0.006
63.4
71.6
Cleveland
Mean
sd
73
14.30
91%
0.070
0.07
0.07
62
8.33
96%
0.016
100.5
91.6
Credit
Mean
sd
69
10.62
90%
0.015
0.05
0.03
75
6.31
94%
0.028
Ecoli
Mean
sd
61
9.89
90%
0.036
0.04
0.03
47
7.84
Glass
Mean
sd
61
14.24
92%
0.105
0.07
0.03
Haberman
Mean
29
sd
4.84
82%
0.038
House Votes
Mean
26
sd
5.79
REGAL
Test
Time
Rules
Test
Time
11.7
Waveform
Mean
1026
sd
77.79
95%
0.044
15.15
8.81
1070
77.79
93%
0.044
15.1
8.8
Wine
Mean
sd
57
13.51
97%
0.031
0.03
0.01
60
14.52
97%
0.031
1.2
0.3
69.4
63.3
Wisconsin
Mean
sd
25
4.10
98%
0.056
0.02
0.01
25
3.63
100%
0.026
14.7
5.4
96%
0.022
95.5
86.7
Zoo
Mean
sd
69%
0.314
0.02
0.01
65%
0.311
9.1
14.5
45
9.57
96%
0.025
106.2
98.1
0.04
0.03
52
4.83
92%
0.018
125.4
115.5
98%
0.026
0.02
0.01
26
5.79
97%
0.025
0.7
9.5
Hypothyroid
Mean
200
sd
28.08
95%
0.038
1.00
0.94
178
20.42
95%
0.029
35.6
27.1
Iris
Mean
sd
14
5.55
96%
0.038
0.01
0.01
11
2.55
99%
0.017
36.2
9.6
Krvskp
Mean
sd
62
8.78
98%
0.014
0.10
0.06
54
10.40
86%
0.092
22.5
11.1
Monk
Mean
sd
46
7.25
99%
0.043
0.03
0.03
58
6.51
77%
0.055
61.7
41.3
Mushroom
Mean
13
sd
3.23
100%
0.012
0.14
0.08
16
4.19
100%
0.013
4.9
5.0
New-Thyroid
Mean
14
sd
2.97
99%
0.027
0.81
0.42
14
2.97
99%
0.027
51.9
49.7
Nursery
Mean
sd
270
28.41
99%
0.027
1.63
1.50
309
22.41
98%
0.013
91.0
77.2
Pima
Mean
sd
128
9.34
94%
0.017
0.19
0.22
127
9.49
94%
0.017
65.5
68.5
Segment
Mean
sd
132
13.51
99%
0.012
0.47
0.35
137
22.00
99%
0.013
30.5
29.5
9
5.035
6
4.097
Table 10
Wilcoxon signed-ranks test (REGAL/EDGAR).
Nodes    Measure    R+    R-    Ties    p-Value
4        Rules      16    6     3       0.13
4        Test       17    8     0       0.11
4        Time       0     24    1       0.00
8        Rules      12    9     4       0.49
8        Test       17    8     0       0.31
8        Time       1     24    0       0.00
16       Rules      12    10    3       0.40
16       Test       18    7     0       0.09
16       Time       0     25    0       0.00
32       Rules      10    12    3       0.21
32       Test       17    8     0       0.32
32       Time       0     25    0       0.00
All      Rules      12    10    3       0.73
All      Test       20    4     1       0.01
All      Time       0     24    1       0.00
Soybean
Mean
sd
72
6.97
98%
0.026
0.41
0.36
81
11.94
97%
0.031
21.8
17.6
Splice
Mean
sd
77
10.91
100%
0.027
9.71
4.53
72
12.41
95%
0.026
29.1
4.5
Tic-tac-toe
Mean
87
sd
14.45
99%
0.028
0.06
0.04
50
13.12
91%
0.055
61.4
51.8
Vehicle
Mean
sd
181
14.11
95%
0.019
0.64
1.42
161
18.32
95%
0.028
39.5
46.2
24
7.13
97%
0.021
0.02
0.01
8
9.36
96%
0.442
11.8
Vote
Mean
sd
Observing Tables 9–11 (see Appendix A), we can make the following analysis:
The number of nodes in EDGAR does not follow any trend regarding accuracy; for some of the datasets the results are even better with 32 nodes than with 16 or fewer.
Processing time decreases in EDGAR with the number of nodes, but it does not achieve a linear speedup.
Table 10 shows that, on average (nodes = All), EDGAR wins in accuracy in 20 out of 25 of the datasets with a confidence of 99%. The other configurations show a variety of results that does not allow rejection of the null hypothesis. Processing time is better in EDGAR in 100% of the cases for every configuration.
For the number of rules the null hypothesis cannot be rejected, because the confidence is only 27% for the average case and less than 90% for the rest of the node configurations.
5. Conclusions
This work presents a distributed genetic algorithm for classification rule extraction based on the island model and enhanced for scalability with training data partitioning. To be able to generate an accurate classifier with data partitioning, two techniques were proposed: an elitist pool for rule selection and a novel technique of data distribution (DLF) that uses heuristics based on the local data to dynamically redistribute the training data in the node neighbourhood.
In this study, EDGAR shows a considerable speedup; moreover, this improvement does not compromise the accuracy or the complexity of the classifier.
The complementarity of the proposed techniques results in a low dependency on parameter settings. The proportion of individuals per learning example is compensated by the training set reduction, which handles the removal of the already learned rules, redirecting computational effort towards the more difficult cases. The seeding operator reintroduces rules, preventing loss of diversity. The elitist pool also ensures that already discovered rules will be kept in the final classifier even if they are removed from the nodes.
Finally, we would like to point out the absence of a master process to guide the search. This architecture suggests better scalability by avoiding idle time due to synchronisation issues or network bottlenecks typically associated with a master-slave synchronous relation.
Acknowledgements
This work was supported by the Spanish Ministry of Education and Science under Grant No. TIN2008-06681-C06-06, and by the Andalusian Government under Grant Nos. P05-TIC-00531 and P07-TIC-03179.
Table 11
Detailed results. For each dataset and node configuration (4, 8, 16, 32, All), the mean and standard deviation (sd) of the number of rules, test accuracy and time are reported for EDGAR and REGAL.
Car
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Test
Time
Rules
Test
Time
57
4.34
61
6.85
61
6.57
56
4.72
61
6.06
96%
0.005
95%
0.005
94%
0.003
94%
0.004
97%
0.006
0.09
0.01
0.05
0.00
0.08
0.01
0.22
0.08
0.10
0.07
58
3.52
61
5.99
67
4.17
73
7.03
66
8.13
96%
0.007
95%
0.006
94%
0.005
94%
0.006
98%
0.006
15.8
7.4
28.6
20.4
75.8
45.5
148.6
87.6
63.4
71.6
Cleveland
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
Mean
All
sd
69
7.90
65
22.06
70
7.24
78
7.48
73
14.30
93%
0.054
84%
0.116
87%
0.029
89%
0.026
91%
0.070
0.09
0.10
0.06
0.09
0.05
0.02
0.05
0.01
0.07
0.07
54
4.04
55
4.66
62
4.24
68
6.41
62
8.33
96%
0.053
92%
0.007
91%
0.010
90%
0.013
96%
0.016
16.6
2.6
42.6
7.5
89.6
17.6
234.9
35.0
100.5
91.6
Credit
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
62
6.39
63
7.36
68
7.51
79
9.62
69
10.62
91%
0.027
88%
0.010
87%
0.014
88%
0.013
90%
0.015
0.08
0.03
0.04
0.01
0.03
0.01
0.04
0.03
0.05
0.03
73
5.56
71
5.88
73
7.53
73
6.75
75
6.31
94%
0.059
89%
0.022
90%
0.025
89%
0.028
94%
0.028
15.2
2.7
32.5
11.5
62.3
27.8
147.4
56.3
69.4
63.3
52
4.88
57
6.19
55
7.15
70
6.87
61
9.89
92%
0.053
84%
0.035
89%
0.016
84%
0.029
90%
0.036
0.04
0.03
0.04
0.04
0.03
0.01
0.06
0.03
0.04
0.03
41
5.23
41
4.01
43
4.94
53
6.52
47
7.84
95%
0.056
91%
0.016
91%
0.019
94%
0.019
96%
0.022
21.2
5.4
48.2
16.2
89.3
53.9
158.0
97.8
95.5
86.7
55
10.22
53
6.21
51
23.40
65
9.63
61
14.24
96%
0.115
85%
0.019
77%
0.216
85%
0.013
92%
0.105
0.11
0.09
0.04
0.02
0.04
0.02
0.05
0.03
0.07
0.03
35
3.68
40
3.64
45
4.14
56
7.96
45
9.57
97%
0.042
93%
0.024
93%
0.020
93%
0.024
96%
0.025
18.1
3.4
42.0
5.9
100.8
16.2
250.9
53.0
106.2
98.1
Haberman
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
Mean
All
sd
31
5.73
25
5.33
24
5.14
21
3.61
29
4.84
83%
0.110
74%
0.030
73%
0.023
54%
0.009
82%
0.038
0.04
0.01
0.03
0.01
0.03
0.04
0.05
0.09
0.04
0.03
53
4.50
50
4.91
51
4.22
49
5.14
52
4.83
91%
0.036
88%
0.014
89%
0.016
91%
0.015
92%
0.018
15.6
8.7
37.3
21.6
122.9
52.4
228.6
106.9
125.4
115.5
House Votes
Mean
4
sd
Mean
8
sd
24
4.21
21
3.11
98%
0.115
89%
0.006
0.04
0.01
0.02
0.00
24
4.21
21
3.11
96%
0.112
86%
0.006
12.5
0.6
8.0
0.3
Ecoli
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Glass
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Regal
8
16
32
All
4
8
16
32
All
4
8
16
32
All
4
8
16
32
All
Table 11 (Continued)
Edgar
Rules
Mean
sd
Mean
sd
Mean
sd
16
Regal
Test
Time
Rules
Edgar
Test
Time
23
3.92
30
4.83
26
5.79
89%
0.006
89%
0.005
98%
0.026
0.01
0.00
0.01
0.00
0.02
0.01
23
3.92
30
4.83
26
5.79
87%
0.006
87%
0.005
97%
0.025
9.6
0.4
9.3
0.7
0.7
9.5
170
28.72
173
11.98
197
13.67
180
6.40
200
28.08
95%
0.146
85%
0.002
85%
0.003
92%
0.001
95%
0.038
0.52
0.08
0.40
0.07
0.99
0.06
2.57
0.36
1.00
0.94
158
26.63
147
17.50
150
14.11
180
10.71
178
20.42
95%
0.147
85%
0.005
71%
0.001
90%
0.004
95%
0.029
9.6
4.0
13.6
2.8
21.9
6.4
56.5
20.5
35.6
27.1
13
4.06
14
2.77
12
4.87
15
3.47
14
5.55
97%
0.117
88%
0.024
90%
0.030
99%
0.030
96%
0.038
0.02
0.00
0.01
0.00
0.01
0.01
0.01
0.01
0.01
0.01
10
2.32
10
2.34
12
2.74
14
3.23
11
2.55
98%
0.035
95%
0.013
96%
0.008
97%
0.064
99%
0.017
11.8
5.2
30.2
11.4
66.1
29.0
28.8
12.5
36.2
9.6
56
8.14
60
8.42
63
6.59
61
7.99
62
8.78
98%
0.035
93%
0.005
93%
0.006
88%
0.005
98%
0.014
0.15
0.07
0.07
0.05
0.08
0.03
0.07
0.03
0.10
0.06
56
8.99
58
6.68
47
1.41
59
1.41
54
10.40
90%
0.004
90%
0.003
66%
0.004
65%
0.004
86%
0.092
13.3
4.8
33.9
11.3
23.0
1.6
29.0
2.7
22.5
11.1
52
6.40
43
3.90
32
6.42
36
3.12
46
7.25
99%
0.000
88%
0.007
88%
0.005
88%
0.009
99%
0.043
0.05
0.02
0.02
0.01
0.03
0.05
0.02
0.01
0.03
0.03
51
0.00
48
2.73
51
6.59
56
5.93
58
6.51
82%
0.068
70%
0.024
61%
0.018
64%
0.021
77%
0.055
5.6
0.0
39.6
27.1
43.1
19.4
88.2
48.7
61.7
41.3
Mushroom
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
Mean
All
sd
14
3.39
14
3.36
11
1.78
11
3.05
13
3.23
100%
0.035
98%
0.000
95%
0.000
95%
0.000
100%
0.012
0.24
0.07
0.07
0.01
0.07
0.02
0.08
0.03
0.14
0.08
13
2.80
15
4.54
16
2.97
16
5.25
16
4.19
100%
0.054
95%
0.000
95%
0.000
95%
0.000
100%
0.013
3.2
2.4
4.1
2.2
4.1
1.0
7.4
8.8
4.9
5.0
New-Thyroid
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
12
2.50
12
2.13
14
2.56
15
2.52
98%
0.128
90%
0.008
90%
0.009
90%
0.007
0.72
0.12
0.66
0.56
0.80
0.62
0.58
0.73
12
2.50
12
2.13
14
2.56
15
2.52
99%
0.117
90%
0.008
90%
0.008
90%
0.007
12.3
6.7
25.0
13.8
49.5
23.4
103.3
55.9
32
All
Hypothyroid
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
Mean
All
sd
Iris
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Krvskp
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Monk
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
4
8
16
32
All
4
8
16
32
All
4
8
16
32
All
Rules
Mean
sd
Nursery
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Pima
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Segment
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Soybean
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Splice
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
All
4
8
16
32
All
4
8
16
32
All
4
8
16
32
All
4
8
16
32
All
4
8
16
32
All
Tic-tac-toe
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
Mean
All
sd
Regal
Test
Time
Rules
Test
Time
14
2.97
99%
0.027
0.81
0.42
14
2.97
99%
0.027
51.9
49.7
259
26.59
261
14.27
262
24.03
224
34.43
270
28.41
97%
0.116
88%
0.006
92%
0.002
91%
0.001
99%
0.027
0.80
0.54
0.75
1.23
0.98
0.09
3.56
0.34
1.63
1.50
290
25.11
292
23.64
297
11.86
311
15.18
309
22.41
98%
0.053
94%
0.003
94%
0.003
94%
0.004
98%
0.013
13.7
8.0
40.8
20.9
105.3
30.9
187.5
49.3
91.0
77.2
125
11.27
126
9.27
132
8.06
132
11.53
128
9.34
94%
0.051
90%
0.011
90%
0.013
90%
0.013
94%
0.017
0.23
0.31
0.15
0.27
0.14
0.21
0.18
0.26
0.19
0.22
120
10.49
125
8.74
122
8.43
124
8.22
127
9.49
90%
0.052
83%
0.022
82%
0.014
89%
0.028
94%
0.017
12.5
6.2
27.1
12.6
51.7
23.0
159.0
57.8
65.5
68.5
132
13.86
130
13.55
131
11.89
119
11.54
132
13.51
98%
0.034
93%
0.006
95%
0.003
96%
0.003
99%
0.012
0.26
0.20
0.17
0.02
0.37
0.02
0.95
0.16
0.47
0.35
124
12.46
117
11.91
128
13.99
160
12.84
137
22.00
99%
0.054
94%
0.004
94%
0.005
94%
0.005
99%
0.013
8.0
4.2
15.6
6.9
27.9
12.0
65.2
33.6
30.5
29.5
66
7.32
65
7.62
67
4.08
71
6.93
72
6.97
98%
0.116
89%
0.003
89%
0.003
90%
0.003
98%
0.026
0.17
0.02
0.12
0.01
0.33
0.04
0.90
0.14
0.41
0.36
72
8.73
66
3.89
78
7.10
87
9.03
81
11.94
98%
0.118
89%
0.015
89%
0.017
88%
0.022
97%
0.031
5.2
1.3
9.6
1.0
19.4
2.7
45.3
3.8
21.8
17.6
84
10.26
73
7.97
66
6.91
53
8.59
77
10.91
100%
0.095
91%
0.001
91%
0.001
86%
0.000
100%
0.027
11.90
5.81
6.52
2.74
10.22
4.28
5.99
1.19
9.71
4.53
83
9.37
86
8.24
68
6.42
65
8.77
72
12.41
100%
0.088
83%
0.001
86%
0.001
84%
0.000
95%
0.026
24.4
5.5
36.5
2.5
39.7
4.0
25.6
1.2
29.1
4.5
70
10.38
74
6.83
83
4.68
95
7.97
87
14.45
99%
0.119
90%
0.006
90%
0.007
91%
0.002
99%
0.028
0.12
0.03
0.03
0.00
0.04
0.01
0.03
0.01
0.06
0.04
63
11.28
39
13.25
44
11.31
40
7.09
50
13.12
97%
0.113
85%
0.014
82%
0.011
77%
0.021
91%
0.055
5.2
2.8
42.2
17.8
69.4
56.9
106.8
24.1
61.4
51.8
Edgar
Rules
Vehicle
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Regal
Test
Time
Rules
Test
Time
190
13.06
171
10.66
162
7.32
147
5.00
181
14.11
92%
0.049
89%
0.010
90%
0.012
63%
0.018
95%
0.019
1.23
2.40
0.46
0.92
0.23
0.19
0.13
0.02
0.64
1.42
143
17.44
139
8.89
150
10.96
168
10.98
161
18.32
95%
0.115
86%
0.013
86%
0.014
86%
0.011
95%
0.028
6.9
1.5
13.3
3.2
23.1
5.2
101.0
36.1
39.5
46.2
18
3.54
17
4.53
19
3.48
29
5.56
24
7.13
97%
0.115
88%
0.006
89%
0.006
93%
0.008
97%
0.021
0.03
0.00
0.01
0.00
0.01
0.00
0.02
0.01
0.02
0.01
13
1.00
13
2.83
13
1.00
14
1.41
8
9.36
66%
0.000
66%
0.007
66%
0.003
67%
0.004
53%
0.442
12.8
14.9
16.6
0.4
12.6
13.9
17.1
0.1
11.8
11.7
1064
96.97
530
54.27
880
45.21
894
35.87
1026
77.79
93%
0.075
47%
0.075
88%
0.005
85%
0.004
95%
0.044
9.74
4.58
19.93
4.77
18.41
6.03
17.52
5.98
15.15
8.81
1032
96.97
1040
97.84
843
121.09
1069
45.21
1070
77.79
92%
0.074
93%
0.073
46%
0.059
86%
0.005
93%
0.044
9.7
4.6
9.8
4.6
19.9
4.7
18.4
6.0
15.1
8.8
51
9.97
52
5.99
65
15.12
68
14.55
57
13.51
97%
0.055
88%
0.013
89%
0.016
89%
0.017
97%
0.031
0.04
0.02
0.02
0.00
0.02
0.01
0.03
0.01
0.03
0.01
60
11.30
53
5.99
63
14.14
64
17.20
60
14.52
94%
0.054
86%
0.013
88%
0.016
88%
0.019
97%
0.031
1.1
0.5
0.7
0.1
1.1
0.3
1.2
0.3
1.2
0.3
Wisconsin
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
Mean
All
sd
24
4.74
22
3.93
15
3.93
18
4.29
25
4.10
98%
0.115
87%
0.007
48%
0.027
87%
0.047
98%
0.056
0.03
0.00
0.01
0.00
0.05
0.03
0.01
0.00
0.02
0.01
23
4.40
22
2.56
24
4.37
25
2.65
25
3.63
99%
0.117
90%
0.004
91%
0.004
91%
0.004
100%
0.026
4.9
2.8
7.0
4.4
16.0
10.1
25.8
8.2
14.7
5.4
Zoo
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
10
2.21
9
2.52
8
4.80
12
3.63
9
5.04
99%
0.125
90%
0.017
90%
0.016
66%
0.314
69%
0.314
0.04
0.01
0.02
0.01
0.01
0.00
0.01
0.00
0.02
0.01
9
1.50
8
1.49
9
1.70
13
4.59
6
4.10
98%
0.120
89%
0.015
82%
0.034
90%
0.024
65%
0.311
10.2
3.3
15.6
3.8
28.8
3.7
32.9
4.6
9.1
14.5
Vote
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
4
8
16
32
All
4
8
16
32
All
Waveform
Mean
4
sd
Mean
8
sd
Mean
16
sd
Mean
32
sd
Mean
All
sd
Wine
Mean
sd
Mean
sd
Mean
sd
Mean
sd
Mean
sd
4
8
16
32
All
4
8
16
32
All