
Suprb: A Supervised Rule-Based Learning System For Continuous Problems

The SupRB learning system is a Pittsburgh-style learning classifier system designed for supervised learning on multi-dimensional continuous decision problems, particularly in industrial machinery parametrization. It learns quality function approximations from examples and provides human-readable if-then rules, enhancing operator trust through transparency and explainability. The paper outlines the architecture, training process, and a first implementation, SupRB-1, which aims to optimize parametrization choices based on quality measures.


SupRB: A Supervised Rule-based Learning System for Continuous Problems

Michael Heider∗ (michael.heider@informatik.uni-augsburg.de)
David Pätzel∗ (david.paetzel@informatik.uni-augsburg.de)
Jörg Hähner (joerg.haehner@informatik.uni-augsburg.de)
University of Augsburg, Organic Computing Group, Augsburg, Germany

arXiv:2002.10295v1 [cs.LG] 24 Feb 2020

∗ These authors contributed equally to the paper.

ABSTRACT
We propose the SupRB learning system, a new accuracy-based Pittsburgh-style learning classifier system (LCS) for supervised learning on multi-dimensional continuous decision problems. SupRB learns an approximation of a quality function from examples (consisting of situations, choices and associated qualities) and is then able to make an optimal choice as well as predict the quality of a choice in a given situation. One area of application for SupRB is parametrization of industrial machinery. In this field, acceptance of the recommendations of machine learning systems is highly reliant on operators' trust. While an essential and much-researched ingredient for that trust is prediction quality, it seems that this alone is not enough. At least as important is a human-understandable explanation of the reasoning behind a recommendation. While many state-of-the-art methods such as artificial neural networks fall short of this, LCSs such as SupRB provide human-readable rules that can be understood very easily. The prevalent LCSs are not directly applicable to this problem as they lack support for continuous choices. This paper lays the foundations for SupRB and shows its general applicability on a simplified model of an additive manufacturing problem.

KEYWORDS
Learning Classifier Systems, Evolutionary Machine Learning, Manufacturing

1 INTRODUCTION
Parametrization of industrial machinery is often determined by human operators. These specialists have usually obtained most of their expertise through years of experimental exploration based on prior knowledge about the system or process at play. Transferring that knowledge to other operators with as little loss as possible (e.g. to new colleagues whenever experienced operators retire, or to end users of the machinery after commissioning is finished) is a challenge: humans' ability to attribute parameter choices exactly to the situations that led to them, and then to communicate this knowledge in a comprehensible manner, tends to be rather restricted. As a result, new operators are forced to repeat the exploration and learn for themselves. Machine learning (ML) can help with this, for example, by supporting new operators or users with recommendations or simply by recording existing experiences and extracting knowledge to make it available at a later point.

Parts of an operator's knowledge can be seen as a collection of mappings from parametrizations for the machine and variables beyond their influence to an expected process quality resulting from them; abstractly speaking, a collection of if-then rules with outcomes subject to noise. While many ML methods represent knowledge in a less or differently structured way, this is not the case for learning classifier systems (LCSs), whose models are collections of human-readable if-then rules constructed using ML techniques and model structure optimizers [12, 31]. This learning scheme is thus naturally suited to incorporating an operator's knowledge, as externally specified rules can be included directly. Also, due to their inner structure, LCSs can more easily provide explanations for their predictions. Because of this transparency towards human users, operators can be expected to place more trust in the contained knowledge, and thus in the recommendations, than they would with black box systems; this trust is essential for these systems' actual applicability.

This paper proposes the SupRB learning system, a new accuracy-based Pittsburgh-style LCS for supervised learning on continuous multi-dimensional decision problems such as the parametrization of industrial machinery. Pittsburgh-style LCSs [26] have a model structure optimizer (in classic Pittsburgh-style systems, a genetic algorithm (GA)) operate on a population of rule collections of variable length, each of which represents a potential solution to the learning problem at hand.

This work focuses on solving the problem of parametrization optimization of industrial machinery, which is defined in Section 2. An LCS architecture that solves this problem, SupRB, is introduced in Section 3 along with its first implementation SupRB-1 in Section 4. SupRB-1 is evaluated on different function approximation problems in Section 5. Section 6 gives an account of related research.

2 PARAMETRIZATION OPTIMIZATION
Parametrization optimization is the process of finding the best parameter choice, or parametrization, for a given system $S$ with regard to some quality measure $q$. One such parametrization can be viewed as a vector $a \in A \subseteq \mathbb{R}^{D_A}$ where $A$ is the parametrization space, $D_A$ is the number of parameters to be optimized and each component of $a$ corresponds to one adjustable system parameter for $S$. Which parametrization is optimal regarding $q$ depends on a number of environmental factors (e.g. ambient temperature or humidity) in addition to characteristics of process, machine, material and the part to be produced. For a given system $S$, we call one instance of those additional factors a situation; situations can again be assumed to be represented by a vector $x \in X \subseteq \mathbb{R}^{D_X}$ where $X$ is called the situation space and $D_X$ is the fixed dimensionality of situations
for $S$. Having defined parametrizations and situations, we can now specify the quality measure's form as

$$q : \{X, A\}^T \to \mathbb{R} \tag{1}$$

where every $\{x, a\}^T = (x_1, \dots, x_{D_X}, a_1, \dots, a_{D_A})^T \in \{X, A\}^T$ is a stacked vector consisting of a situation $(x_1, \dots, x_{D_X}) \in X$ and a parametrization $(a_1, \dots, a_{D_A}) \in A$. For readability, we write $q(x, a)$ instead of $q(\{x, a\}^T)$. The target of $q$ is a single scalar which is possibly derived appropriately from a vector of multiple quality features. We assume that $q(x, a)$ is at least continuous in $a$, which we think is realistic in most real-world scenarios:

$$\lim_{a \to a_0} q(x, a) = q(x, a_0) \tag{2}$$

With the definition for $q$, we can now define the optimization problem that describes the search for an optimal parametrization for a given situation $x$:

$$\underset{a}{\text{maximize}} \; q(x, a) \tag{3}$$

Note that, realistically, neither $q$ nor its derivative can be assumed to be known (albeit either of those would simplify the problem greatly). Instead, we assume that the only information about $q$ is a fixed set of examples.

Thus, the learning problem we consider is: Given a fixed set of $N$ examples for situations and parametrizations $\{\{x, a\}^T\}$ as well as their respective qualities $\{q(x, a)\}$, learn to predict for a given unknown situation $x \in X$ a parametrization $\hat{a}_{\max}(x) \in A$ for which

$$\hat{a}_{\max}(x) \approx a_{\max}(x) = \operatorname*{argmax}_{a} q(x, a) \tag{4}$$

where $a_{\max}(x)$ is the actual optimal parametrization in situation $x$.

A natural way of measuring improvements on this learning problem is the following: A model can be said to be an improvement over another on a set of situations $X_{\text{eval}} \subset X$ if the actual quality of the predicted optimal parametrizations on those situations is closer to the actual quality of the actual optimal parametrizations. This can be quantified, for example, by using the mean error for an error measure $L$ on the model's prediction:

$$\frac{1}{|X_{\text{eval}}|} \sum_{x \in X_{\text{eval}}} L\bigl(q(x, a_{\max}(x)),\, q(x, \hat{a}_{\max}(x))\bigr) \tag{5}$$

3 AN LCS ARCHITECTURE FOR CONTINUOUS PROBLEMS
This section presents a high-level view of SupRB, the overall LCS architecture we propose in order to solve the parametrization optimization problems introduced in the previous section.

3.1 Model structure
Just like other LCSs, SupRB forms a global model from a population $C$ of local models, called classifiers; in the case of SupRB, the global model is meant to approximate the quality measure $q$ defined in Section 2. Each classifier is responsible for a subspace of the input space $X$; which subspace this is for a certain classifier is specified by that classifier's condition, which is also sometimes called its localization. The set of classifier conditions forms the overall model structure of SupRB; this structure fulfills a similar role as the graph structure of a neural network in that it needs to be chosen carefully in order for the system to perform well. Here, performing well is not just about approximating $q$ as closely as possible; there are usually additional goals such as being explainable (cf. Section 1) which require the model structure's complexity to be as low as possible.

3.2 Local models: Classifiers
A classifier $c$ consists of three main components:
• Some representation of a matching function $m_c : X \to \{\mathrm{T}, \mathrm{F}\}$. We say that $c$ matches situation $x$ iff $m_c(x) = \mathrm{T}$. Correspondingly, we say that $c$ does not match $x$ iff $m_c(x) = \mathrm{F}$.
• Some local model approximating $q$ on all $x \in X$ which the classifier matches.
• An estimation of the classifier's goodness-of-fit on the situations it matches (solely used in classifier mixing).

Be aware that the domain of the classifiers' matching functions is $X$ and not $\{X, A\}^T$. This increases explainability greatly, as an ideal partitioning of $X$ (total, without overlaps) entails that there is exactly one rule regarding the parametrization for each possible situation. Conversely, if we partitioned in $\{X, A\}^T$ optimally, there would possibly still be multiple rules for a given situation as the system might have partitioned in the dimensions of $A$ as well. Since we assume continuity of $q(x, a)$ regarding $a$ (see (2)), partitioning in $A$ would only be necessary if the local models could not capture $q$'s behaviour in $A$, for example because it is highly multi-modal. In that case, partitioning in $A$ might be sensible, as would using more sophisticated local models.

3.3 Epoch-wise training
Training an LCS can generally be divided into two subproblems: First, the classifiers' local models need to be trained so that the predictions they make on the subspace they are responsible for are as accurate as possible. Second, the overall model structure of the LCS has to be optimized: the classifiers' localizations have to be aligned in such a way that every local model can capture the characteristics of the subspace it is assigned to as well as possible.

Michigan-style LCSs such as XCS(F) [6, 37] or ExSTraCS [32] try to solve these problems incrementally by, for each seen example, performing a single update on some of the classifiers' local models and then improving these classifiers' localizations. This approach is especially sensible when learning has to be incremental (e.g. in reinforcement learning settings). However, because the learning problem considered here is non-incremental (all training data is available from the very start), SupRB can be trained non-incrementally. This means that training can be done in two separate phases that are repeated alternately until overall convergence [8], each phase being responsible for solving one of the subproblems:
(1) (Re-)train each local classifier model on the data that it (now) matches.
(2) Optimize the model structure (i.e. the set of classifier conditions), for example using a heuristic such as a GA.

Each phase is executed until it converges or some termination criterion, such as a fixed number of updates, is met. It is important to note that during fitting of each phase's parameters, the parameters of the respective other phase are considered fixed; otherwise, convergence cannot be guaranteed. If a GA is used for the model
structure optimization, then this GA works on a population of classifier populations; systems of this kind are commonly called Pittsburgh-style LCSs. However, since we expect many optimization methods to be applicable to this (see Section 7), an implementation of SupRB does not necessarily contain a GA. Nevertheless, the general SupRB architecture should probably be placed into or close to the Pittsburgh-style category. The implementation of SupRB we present in Section 4 is definitely a Pittsburgh-style LCS since it uses a GA to optimize the model structure.

Dividing the learning process into two distinct phases is advantageous. First of all, optimization of the process's hyperparameters can be done more straightforwardly because hyperparameters are divided into two disjoint sets, one for each of the two phases. Besides that, the learning process is analysed more easily because the overall optimization problem of fitting the model to the data decomposes nicely into the two subproblems solved by the phases [8]; 'nicely' here means that solving the subproblems independently of each other solves the overall problem. For example, if learning does not work and, upon inspection, the classifier weight updates converge correctly and fast enough, it is immediately clear that the model structure optimization is the culprit and corresponding measures can be taken.

3.4 Prediction
After training, SupRB can make two kinds of prediction. A quality prediction consists of using SupRB's internal function approximation to predict the quality resulting from a certain parametrization $a$ given a certain situation $x$. To do so, SupRB retrieves all classifiers from the classifier population that match $x$, that is, the set

$$M(x) = \{c \in C \mid m_c(x) = \mathrm{T}\}. \tag{6}$$

The predictions of these classifiers then need to be mixed in order to yield the overall system prediction for the inputs, $\hat{q}(x, a)$. One way of mixing is a simple sum which is weighted by some accuracy measure $F_c$ defined for each classifier $c$:

$$\hat{q}(x, a) = \sum_{c \in M(x)} F_c\, \hat{q}_c(x, a) \tag{7}$$

Here, $\hat{q}_c(x, a)$ denotes the quality value that the local model of $c$ predicts for parametrization $a$ in situation $x$. It is important that

$$\sum_{M(x)} F_c = 1 \tag{8}$$

as otherwise the classifiers' combined predictions systematically over- or undershoot the actual value: the local models must be trained independently [8], which means that they are unaware of the other local models' predictions during training.

A parametrization choice (or $\hat{a}_{\max}$-prediction) consists of predicting the best parametrization for a given fixed situation $x_0$, that is, performing (3) to yield (4); this is the more central kind of prediction for parametrization optimization. The way of doing this highly depends on the form of local models used. For example, if the local models are polynomial functions of a degree of less than five, an exact analytical solution exists (Abel-Ruffini theorem) as partial derivatives can be used to find the set of local optima $A_{\text{local}}$ from which SupRB can then retrieve the global optimum by using its function approximation:

$$\hat{a}_{\max} = \operatorname*{argmax}_{a \in A_{\text{local}}} \hat{q}(x_0, a) \tag{9}$$

For other functions, where an exact analytical solution is unknown or impractical, there are other options that range from root-finding algorithms [4] to heuristics such as hill climbing with random restarts [24], genetic algorithms [11] or chemical reaction optimization [15]. Although these non-analytical methods require a comparatively larger amount of computation time, they are feasible in the setting SupRB targets: Industrial processes that are being optimized are usually preplanned anyway, which takes a lot longer than any of the heuristics needs to find SupRB's parametrization choice.

4 SUPRB-1: A FIRST IMPLEMENTATION OF SUPRB
While the previous section introduced SupRB's general architecture, its learning process and its desired prediction capabilities, we now want to give a detailed account of SupRB-1, a first implementation of that system¹.

¹ Which we will make available in the camera-ready version.

4.1 Training and validation sets
SupRB-1 randomly splits the available training data, $\{X, A\}^T$, into two disjoint sets of configurable sizes, $\{X, A\}^T_{\text{train}} \sqcup \{X, A\}^T_{\text{valid}}$, a training and a validation set. The training set is used exclusively to fit the classifiers' local models to the data they match whereas the validation set is used exclusively to optimize the model structure. This approach is rather simplistic; incorporating more sophisticated sample management (k-fold cross validation etc.) is planned for the future.

To simplify representation and computation, we assume that parametrization and situation values are normalized to $[-1, 1]^{D_A}$ and $[-1, 1]^{D_X}$, respectively; this means that $A \simeq A_{\text{actual}}$ needs to hold where $A_{\text{actual}}$ is the actual action value that is reported back to an external system. Given the context of optimizing parameters of industrial machinery it is reasonable to assume that upper and lower bounds for $A_{\text{actual}}$ exist in all cases, which makes this normalization trivial.

4.2 Classifiers
Classifier conditions are interval-based using an ordered bound representation [29, 36]; an extension to hyper-ellipsoids [5] is already in the works.

All classifiers' local models are a simple linear regression² on a subset of the second order polynomial features of the input which is fitted on $\{X, A\}^T_{\text{train}}$ and $q_{\text{train}}$. In order to be able to analytically derive the $\hat{a}_{\max}$-prediction, we exclude all combinations of different dimensions of $A$, resulting in the following feature set:

$$\{x_i x_j,\; x_i a_k,\; a_k^2 \mid i, j \in 1, \dots, D_X,\; k \in 1, \dots, D_A\} \tag{10}$$

² We use the one from the Python library scikit-learn [23].
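The following is a minimal sketch, not the authors' implementation, of why the feature set (10) keeps the parametrization choice analytical. The function names (`matches`, `suprb_features`, `argmax_a`) and the weight layout (`w_xa`, `w_aa`) are hypothetical. The sketch assumes unordered pairs $i \le j$ for the $x_i x_j$ terms, and it clips the vertex to $[-1, 1]$, which is an assumption added here. Because no feature couples two different dimensions of $A$, the fitted model is a separate parabola in each $a_k$ for a fixed situation, so each dimension can be maximised on its own.

```python
from itertools import combinations_with_replacement


def matches(lower, upper, x):
    """Interval-based condition: x lies inside the hyperrectangle [lower, upper]."""
    return all(lo <= xi <= up for lo, xi, up in zip(lower, x, upper))


def suprb_features(x, a):
    """Features as in (10): x_i*x_j, x_i*a_k and a_k**2, but no a_k*a_l for k != l."""
    xx = [xi * xj for xi, xj in combinations_with_replacement(x, 2)]  # assumes i <= j
    xa = [xi * ak for xi in x for ak in a]
    aa = [ak * ak for ak in a]
    return xx + xa + aa


def argmax_a(x0, w_xa, w_aa, low=-1.0, high=1.0):
    """Per-dimension maximiser of sum_k (slope_k * a_k + w_aa[k] * a_k**2).

    w_xa[i][k] is the (hypothetical) fitted weight of x_i*a_k and
    w_aa[k] the weight of a_k**2. The x_i*x_j terms do not depend on a.
    """
    choice = []
    for k, curvature in enumerate(w_aa):
        slope = sum(w_xa[i][k] * xi for i, xi in enumerate(x0))

        def value(ak):
            return slope * ak + curvature * ak * ak

        if curvature < 0:
            # Parabola opens downwards: take the vertex, clipped to [low, high].
            choice.append(min(high, max(low, -slope / (2.0 * curvature))))
        else:
            # Opens upwards (or is linear): the maximum lies on a boundary.
            choice.append(high if value(high) >= value(low) else low)
    return choice
```

In a full system the weights would come from the fitted regression of each matching classifier; here they are passed in directly so that the sketch stays independent of any particular library.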
The reasoning behind our choice for second order polynomial features instead of linear models alone is that a linear model's maximum is always at one of the boundaries of the domain, if they exist.

The classifiers' goodness-of-fit is measured using a mean squared error on $\{X, A\}^T_{\text{train}}$. We don't use a separate validation set for estimating the goodness-of-fit in order to be as sample efficient as possible; in the industrial machinery context this system mainly targets, labeled data sets are comparatively small. Nevertheless, a separate validation set for goodness-of-fit estimation would most certainly help the system to evolve better generalizing solutions faster.

4.3 GA for optimizing the model structure
We optimize the model structure (the classifiers' localizations) using a simple GA whose population consists of classifier populations. As already stated above, this makes SupRB-1 a Pittsburgh-style LCS.

The GA is generational with a configurable number of elitists (cf. for example [11]); a steady-state version was implemented as well but in a few short preliminary tests did not perform significantly better (although there is no conclusive answer yet). The GA performs mutation and crossover on classifier populations.

For a single classifier population, mutation changes the bounds of the interval-based conditions of all classifiers by a normally distributed amount scaled by a step size $s$. Mutation steps are clipped at the minimum lower and maximum upper interval bounds, $-1$ and $1$ respectively (this can be disabled via a hyperparameter), in order to keep the hyperrectangle described by the classifier's condition entirely within $[-1, 1]^{D_X}$ (cf. Section 4.1). Given any bound (lower or upper) $b \in X$, its mutated value is distributed according to

$$\max\bigl(-1, \min\bigl(1,\; b + s \cdot \mathcal{N}(\mu = 0, \sigma = 1)\bigr)\bigr) \tag{11}$$

The step size $s$ is initially set to one thousandth of the maximum interval width, which is $\frac{2}{1000}$ in our normalized case. SupRB-1 adapts $s$ by the well-known one-fifth rule as it is used in [1] with a small update factor of $F = 1.05$.

Crossover is done similarly to [8] but using a normal distribution instead of a uniform one in order to keep offspring sizes closer together and to generate really small offspring less often. Given two parents of lengths $l_1$ and $l_2$, a number $l_1'$ is drawn from $\mathcal{N}(\mu = (l_1 + l_2)/2, \sigma = 1)$ repeatedly until $1 \le l_1' \le l_1 + l_2 - 1$ (the condition ensures that each offspring contains at least one classifier). After that, the classifiers of both parents are shuffled together and divided randomly among two children, one of size $l_1'$, the other of size $l_1 + l_2 - l_1'$. Performing this kind of crossover too often might be too disruptive, which makes a crossover rate necessary.

SupRB selects parents for crossover using a simple tournament selection with tournaments of size 2 and the individual with the highest fitness always winning [21]. We measure fitness relatively between two individuals $i_1$ and $i_2$ based on their respective mean squared errors $e_1$ and $e_2$ on the validation set $\{X, A\}^T_{\text{valid}}$ and their respective lengths $l_1$ and $l_2$, which are a naïve measure for their model structure's complexity. Individual $i_1$ wins the tournament if either of the following is true:

$$e_1 < e_2 \;\land\; l_1 \le \frac{e_2}{e_1}\, l_2 \tag{12}$$

or

$$e_1 \ge e_2 \;\land\; l_1 \le k\, \frac{e_2}{e_1}\, l_2 \tag{13}$$

where $k \in [0, 1]$ is a hyperparameter weighing higher solution complexities against lower errors. This means that if $i_1$'s error on $\{X, A\}^T_{\text{valid}}$ is smaller than $i_2$'s, $i_1$'s complexity is allowed to be up to $\frac{e_2}{e_1}$ (> 1) times that of $i_2$. On the other hand, in order for $i_1$ to win the tournament with a higher error, its complexity must be at most $k \frac{e_2}{e_1}$ (< 1) times that of $i_2$.

4.4 Initialization
The GA's population of classifier populations is initialized by randomly generating a number of individuals of a user-specified fixed size. We experimented with initializing randomly-sized individuals up to an upper bound to have a higher chance of finding the 'correct' individual size early on. However, that did not (yet) work out; it seems that an initially larger overall number of classifiers is more important than finding the correct solution size quickly.

Classifiers for an individual are generated randomly by sampling the bounds of their match function's intervals uniformly from $X$. Although this is one of the least sophisticated methods and results in a high chance of initial overlaps and unmatched examples, it seemed to result in far greater overall stability when compared to initializing classifier populations with evenly spaced individuals. The reason is probably the greater genetic diversity in the system.

At the end of initialization, all classifiers of all classifier populations are fitted once to the examples from $\{X, A\}^T_{\text{train}}$ that they match.

4.5 Fitting classifiers
SupRB uses the simplest linear regression model from scikit-learn³ as the local model for each classifier. These models provide a built-in means of fitting them to data, which we use with standard parametrization; it minimizes the $L_2$-norm by Ordinary Least Squares.

³ sklearn.linear_model.LinearRegression

4.6 Prediction
Due to the simplicity of the classifiers' local models, SupRB-1 can easily perform the two kinds of prediction the SupRB architecture postulates. To predict a quality value given a situation and an action, $\{x, a\}^T$, the linear regression models of all classifiers $c$ that match $x$ are queried for their respective predictions $\hat{q}_c(\{x, a\}^T)$. These predictions are then mixed with weights based on the classifiers' normalized goodness-of-fit $g$, which we calculate from their mean squared error on $\{X, A\}^T_{\text{train}}$. The unnormalized goodness-of-fit of a single classifier is:

$$g'_c = \frac{1}{\frac{e_c + 1}{E}} \tag{14}$$

where $E = \sum_{c' \in M(x)} e_{c'} + 1$ is the sum of the errors of all classifiers matching $x$ and serves as normalization term for the error. Note the (for now) naïve addition of 1 to all errors in order to avoid zero terms. We further have to normalize $g'$ again in order to fulfill (8), yielding

$$g_c = \frac{g'_c}{G'} \tag{15}$$
where $G' = \sum_{c' \in M(x)} g'_{c'}$ is the sum of the unnormalized goodness-of-fit values of the classifiers matching $x$. Substituting into (7) results in the following mixing model:

$$\hat{q}(x, a) = \sum_{c \in M(x)} g_c\, \hat{q}_c(x, a) \tag{16}$$

A parametrization choice $\hat{a}_{\max}(x)$ for a given situation $x$ again results from mixing each matching classifier $c$'s prediction. Each $\hat{a}_{\max_c}(x)$ can be derived analytically based on the implicit paraboloids that performing a linear regression on a second order polynomial feature space yields. For example,
• if the paraboloid opens downwards, the parametrization choice is the position of the vertex in $A$ whereas
• if the paraboloid opens upwards, it is one of the points in

$$\{(a_1, \dots, a_{D_A}) \mid a_1, \dots, a_{D_A} \in \{-1, 1\}\}. \tag{17}$$

The parametrization choices of the matching classifiers are then mixed using the same procedure based on their respective mean squared errors on $\{X, A\}^T_{\text{train}}$, which gives

$$\hat{a}_{\max}(x) = \sum_{c \in M(x)} g_c\, \hat{a}_{\max_c}(x). \tag{18}$$

4.7 Summary of hyperparameters
We now want to give a quick overview of all the hyperparameters introduced so far and a discussion of their impact.

Validation set size is the percentage of the training data used exclusively for evaluating the individuals' fitness (see Section 4.1). This value is only really critical when little data is available, as in that case a trade-off has to be made between giving more data to the process of fitting local model predictions and giving more to the model structure optimization. By incorporating more sophisticated training data organisation techniques, however, this hyperparameter loses some if not most of its impact.

The size of the initial individuals corresponds to the initial number of classifiers in the system (see Section 4.4). Having enough genetic diversity from the start is extremely relevant, so a higher value is generally better. However, higher values naturally tend to increase the time until a compact solution is found.

The GA's population size is a less sensitive hyperparameter as long as there are enough classifiers at the very start. Higher values allow the system to explore more of the search space in the same number of generations, while the generations themselves need more computation time. A similar argument goes for the number of elitists: While it is certainly important to have some amount of elitism for most problems in order not to accidentally forget good solutions, the actual number of elitists in the population seems not to be that relevant.

Last, but not least, $k$ (see (12) and (13)) is the most impactful hyperparameter as it directly interferes with the fitness measure used and thus with the evolutionary pressures within the system. A higher value (closer to 1) emphasizes the generation of less complex solutions while allowing for a higher error. In the long term, this value should probably be made dependent on the dimensionalities of the input space, $D_X$ and $D_A$, because in higher-dimensional spaces, a slight deviation from the intended target can lead to comparatively larger errors than in spaces of lower dimensionality. We expect this effect to lead to different behaviour for the same $k$ on problems that only differ in their dimensionality but not in their general characteristics. However, we defer a closer look at this to future work.

The other hyperparameters do not have as high an impact and are discussed more in depth in the publications referenced at their first mention. Table 1 gives a quick reference of all hyperparameters, their expected impact as well as a proposed default value.

hyperparameter                 impact                           default
validation set size            usually low                      0.5
individuals' initial sizes     medium to high                   30
GA population size             usually low                      30
number of elitists             depends on GA population size    1
k (fitness parameter)          high                             too problem dependent
F (one-fifth rule parameter)   low                              1.05
crossover rate                 medium                           0.9 (more results pending)

Table 1: Overview of hyperparameters of SupRB-1 and their default value.

5 EVALUATION
We evaluated SupRB-1 on two computable problems which are discussed together with the obtained results in the following sections.

5.1 Frog Problem
The 2-dimensional frog problem [38] was already used in the evaluation of systems with similar goals as SupRB, namely GCS [39], XCSFCA [30] and XCSRCFA [14] (cf. Section 6 for more information on these systems), which is why it was chosen for this work as well. It is essentially a reinforcement learning problem with episode length 1 and continuous states, actions and rewards. Achieving maximal performance requires choosing the action $a = 1 - x$, at which (19) attains its maximum of 1.

$$P(x, a) = \begin{cases} x + a, & \text{if } x + a \le 1 \\ 2 - (x + a), & \text{if } x + a \ge 1 \end{cases} \tag{19}$$

We trained and evaluated SupRB-1 with standard parameter settings (cf. Table 1) and $k = 0.1$. After 100 generations with only 50 training and 50 validation examples, the fitness elitist was able to consistently (averaged over 30 runs) reach an MSE of less than 0.05 on choosing the optimal action when given states from a holdout evaluation set. For predicting the quality of a state-action pair, the MSE was below 0.03 on the same data. The fitness elitist of the final generation contained 36 classifiers.

GCS, XCSFCA and XCSRCFA all evaluate the function 100,000 times, which is considerably more than the 100 evaluations used to generate SupRB-1's training data. However, this direct comparison is slightly unfair as the three other systems could probably also have used a sample far smaller than 100,000 with a similar training procedure: They showed system errors below 0.05 after only 10,000 evaluated samples. Nevertheless, given the few examples required for training SupRB-1, a high sample efficiency is very likely. GCS stagnates at an error of 0.05 with about 1400 classifiers, while XCSFCA achieves an error below 0.01 after 30,000 samples with 740 classifiers, whereas XCSRCFA solves the problem perfectly after 18,000 samples using about 740 classifiers [14]. It should be noted that SupRB-1 achieves a much greater rule compactness, while definitely performing worse in terms of overall function approximation error.

It can be assumed that the higher function approximation error originates in the fact that SupRB-1 does not partition the search space in $A$ (see Section 3.2) and therefore has to fit the non-smooth (piecewise linear) function with paraboloids, which cannot achieve a perfect approximation.

5.2 AM-Gauss
The AM-Gauss problem is a simplified model of an FDM-based additive manufacturing (AM) process's part quality and was created

using expert knowledge. The process itself consists of material (usually thermoplastic polymers) being melted and then extruded to gradually construct a part whose quality depends on a number of factors such as the temperature to which the material is heated. For a given material (one dimension of X), the resulting part quality varies at increasing temperatures: Up until the melting point any resulting part’s quality can be expected to be zero as no part can be produced at all. With a further increase in temperature, quality tends to increase as well at a rate depending on material properties up until some unknown point where the material becomes too soft to remain in shape, which degrades part quality. At even higher temperatures, material might simply fail to successfully construct the part at all, at which point quality can effectively be treated as zero again. This relationship of material, temperature and resulting quality can be simplified to a Gaussian function.

The FDM-based AM process we consider contains five continuous (obviously a simplification by itself) situation dimensions: Material, printer, room temperature, humidity and the kind of part to produce. These situations interact with six continuous parameters: Extrusion temperature, print bed temperature, cooling fan speed, extruder movement speed, material retraction speed and retraction distance (the first four parameters are rather self-explanatory; the latter two come into effect whenever the extruder cannot construct the part using continuous movement and has to move without extruding material). Assuming that every combination of situation dimensions and parameters can be modeled by a Gaussian function as motivated above leads to the following overall model for the quality function:

\[
q(y) = q\left( (x_1, \dots, x_5, a_1, \dots, a_6)^T \right)
     = \sum_{\substack{j \in 1, \dots, 11, \\ k \in 1, \dots, 11, \\ k \neq j}}
       \exp\left( - \left( \begin{pmatrix} y_j \\ y_k \end{pmatrix} - s \right)^T P_{j,k} \left( \begin{pmatrix} y_j \\ y_k \end{pmatrix} - s \right) \right)
\tag{20}
\]

Here, the P_{j,k} are randomly generated positive semi-definite matrices in R^{2×2} with eigenvalues in [0, 30] (ensures sensible scaling) and s is a randomly generated vector in [−1, 1]^2 representing the shift of the Gaussian function. We did not include noise in our model; however, an evaluation on more realistic noisy environments is already planned.

We generated 30 such functions from consecutive random seeds and used these to create 30 sets of training data for SupRB-1. These sets each contained 2000 examples for training (1000 training and 1000 validation examples) and 1000 examples we held out for evaluation. On each of those data sets SupRB-1 was run once for 500 generations using all standard parameters but initial individual sizes of 50 and k = 10^{−6} and then evaluated; the results are shown in Figures 1 and 2. Note that having 30 different functions to test SupRB-1 on leads to a better estimate of its general performance at the cost of having a higher variance of results than when repeatedly testing on a single function.

We compare SupRB-1’s results with those achieved by a two-layer fully connected artificial neural network (ANN) trained on identical data and functions. We performed simple automated architecture optimization on the ANN in terms of error during validation, determining optimal architecture for the given problems at 512 and 8 hidden cells respectively, while using ReLU activation functions twice; model complexity was not factored into the architecture optimization strategy. We show the results of this architecture on the holdout datasets as a baseline.

It can be seen in Figure 1a that SupRB-1’s quality predictions’ RMSE on holdout data improves rapidly over the first 50 generations and then seems to converge at around 1.02 in generation 100, which is on par with the ANN baseline. At around generation 200, however, the error starts to increase again and later fluctuates around a value of 1.1. The same behaviour can be observed on parametrization choices’ RMSE on holdout data (Figure 1b) although the baseline is missed on that metric. The same can be seen, however, when looking at the quality prediction’s RMSE on the training data (Figure 1c); this means that the problem can be detected and averted during training, especially since the number of examples that are not matched by any classifier increases in a similar manner (Figure 2b).

When looking at the number of classifiers in SupRB-1’s elitist (Figure 2a), a steady decrease up to a convergence at only 2 to 4 classifiers can be observed. By construction it is highly unlikely that the AM-Gauss problem can be solved satisfactorily by so few local models. We tried to alleviate that problem by adding a random classifier with a probability of 0.5 during mutation (this is also used for the shown runs), but to no avail. It can be seen that, between generations 100 and 200, the number of classifiers lies between 13 and 35, which seems to be the sweet spot with the used hyperparameters. The fact that after finding that sweet spot model complexity still decreases leads us to believe that there is an issue with SupRB-1’s fitness measure. And indeed: It accepts individuals with slightly worse error in favour of a lower complexity (see (13)), which, when applied repeatedly can result in classifier population deterioration
Figure 1: Root mean squared error (with standard deviation (SD)) of different metrics on SupRB-1’s elitist’s performance, averaged over 30 random AM-Gauss problems. Panels (x-axis: generations): (a) quality predictions on holdout data with ANN baseline; (b) parametrization choices on holdout data with ANN baseline; (c) quality predictions on training data {X, A}T.
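
The random AM-Gauss problems over which Figure 1 averages follow the quality function (20). A minimal sketch of how one such function might be generated is given below. Only the structure of (20), the eigenvalue range [0, 30] and the shift range [−1, 1]^2 are taken from the text; the function names, the use of NumPy, the way the matrices are sampled (random rotation plus uniformly drawn eigenvalues) and the use of one shared shift vector s for all pairs are assumptions made for illustration.

```python
import numpy as np

D_X, D_A = 5, 6      # situation and parameter dimensions of the AM-Gauss problem
DIM = D_X + D_A      # y = (x_1, ..., x_5, a_1, ..., a_6) has 11 components


def random_psd_2x2(rng, max_eig=30.0):
    """Symmetric positive semi-definite 2x2 matrix with eigenvalues in [0, max_eig].

    Sampling via a random rotation is an assumption; the text only fixes
    the eigenvalue range.
    """
    eigvals = rng.uniform(0.0, max_eig, size=2)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return rot @ np.diag(eigvals) @ rot.T


def make_am_gauss(seed):
    """Return one randomly generated quality function q following (20)."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(-1.0, 1.0, size=2)  # shift of the Gaussians
    # One matrix P_{j,k} per ordered pair (j, k) with k != j.
    matrices = {(j, k): random_psd_2x2(rng)
                for j in range(DIM) for k in range(DIM) if k != j}

    def q(y):
        y = np.asarray(y, dtype=float)
        total = 0.0
        for (j, k), p_jk in matrices.items():
            d = np.array([y[j], y[k]]) - s
            total += np.exp(-(d @ p_jk @ d))
        return total

    return q


# Hypothetical usage: 30 functions from consecutive random seeds.
functions = [make_am_gauss(seed) for seed in range(30)]
example_quality = functions[0](np.zeros(DIM))
```

Because every summand lies in (0, 1] and there are 11 · 10 ordered pairs, each such q is bounded by 110, which gives a quick sanity check for generated data.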

such as the one observed. This problem can easily be fixed by making (13) dependent on the error of the best individual ever seen. Besides, due to the k hyperparameter being more delicate than expected (see Section 4.7) we assume that the value we determined for these runs was far from optimal.

Figure 2: Simplistic complexity and knowledge gap measurements with SD of SupRB-1’s elitist. Panels (x-axis: generations): (a) number of classifiers in SupRB-1’s elitist on AM-Gauss problem holdout data; (b) number of unmatched examples of training data {X, A}T.

6 RELATED WORK
The SupRB learning system is inspired by previous research work in the field of learning classifier systems. LCS have been applied to a diverse field of problems resulting in a diverse field of applications, e. g. function approximation [33, 37], complex multiplexers [32], robot kinematics [20, 27]. LCS research can be further divided into Michigan- and Pittsburgh-style systems, with Michigan-style systems featuring a GA operating on the level of individual rules, where the entirety of rules represents a solution to the problem, while in Pittsburgh-style (also abbreviated as Pitt-style) systems the GA operates on sets of rules, where each set represents a complete solution to the problem.

The most famous and well-researched family of LCS stems from Wilson’s XCS classifier system [34] following the Michigan-style. XCSR [35] expanded rule representations and therefore input options from ternary to continuous by using interval predicates, a representation used by many following systems such as XCSF [37], BioHEL [3] and of course SupRB-1. XCSF is an extension for XCS to perform function approximation by replacing the constant payoff prediction with a local linear model, thus performing a piecewise-linear approximation of the overall function. The linear local models have subsequently been replaced by more complex models, such as higher order polynomials [17], interpolation [28] or neural networks [16].

Pittsburgh-style systems perform well when following a supervised learning setup and have usually been applied to classification/data mining problems. GALE [18] performed data mining for various classification tasks such as the detection of breast cancer, solving multiplexers and the classification of irises. NAX [19] has been applied to the diagnosis of prostate cancer without human input. GAssist [2, 9] was built for supervised learning of classification tasks and uses a standard GA to evolve rules based on GABIL [7]. As typical for Pittsburgh-style LCS, individuals consist of a set of rules that represent a complete problem solution of variable length. The solution returned at the end of training is the highest fitness individual. The basic system uses discrete inputs, and continuous inputs get discretised into dynamically generated micro-intervals. Samples that are not covered get predicted as a default class whose samples were not used for training.

A more recent example of Pittsburgh-style systems for classification on discretised data is EDARIC [25]. It is designed to deal with both over-fitting and class-imbalance by evolving populations for each class separately and using ensemble techniques for unknown samples. Generalisation is achieved by starting from maximally specific rules and gradually deleting constraints on less relevant input attributes. It was shown to perform well when compared to XCS, decision trees and GAssist for a number of classification datasets.

BioHEL [3, 9], a descendant of GAssist leaving the traditional Pittsburgh-style behind, is using an iterative rule learning approach to learn continuous and discrete attributes for bioinformatic datasets. It uses XCSR’s hyper-rectangle representation for continuous inputs and GABIL’s representation for discrete inputs. Fitness is based on GAssist’s fitness function with the addition of including coverage of rules. It uses the default classification mechanism of GAssist.

In reinforcement learning real world applications can often not be represented by discrete actions. Thus, the field of continuous
actions in XCS has found much research. While the problem described in Section 1 is not understood to be in a reinforcement learning context and would only be of a single-step nature, the optimal parametrization follows a similar design principle to multi-dimensional continuous actions. Wilson proposed three architectures [39]: IAL, a second XCSF interpolating the choices of the decision-making XCSF, CAC, an actor-critic approach, and GCS, where a continuous action is aggregated with the input and a function of both is learned using XCSF. In GCS the optimal action is the action maximizing the learned function. The general approach of GCS is thus related to SupRB with the important distinction that GCS matches on both action and state.

Tran et al. introduce XCSFCA [30] as another way to deal with continuous actions by computing the action directly from the input. XCSFCA approximates a function (X, (X → A)) → R which, due to currying, corresponds to X → ((X → A) → R). Since A is never a domain, XCSFCA only learns exactly one (the best regarding the quality measure) action â_max(x) for each x ∈ X. That optimal â_max(x) is modelled using a linear model which is optimized separately using (1+1)-ES. SupRB instead learns (X, A) → R which can be written as X → (A → R). Thus it is able to approximate the quality q(x, a) of every possible action a ∈ A, which is far more informative than only getting the best possible action, as the resulting action quality function q̂ could be analysed at will afterwards. The same argument can be made for all systems using computed actions. Besides the above, the structure of Tran et al.’s system is deliberately close to the one of XCS(F).

Howard et al. [13] expanded on the idea of computed actions in XCSF by using a neural network to determine both matching and actions from the given inputs. Iqbal et al. [14] also dealt with continuous actions by computing them in XCSRCFA, where the action is represented by a code fragment of a two branches deep binary tree that is evolved when creating new classifiers, similar to genetic programming. Naqvi and Browne [22] incorporated this approach to solve symbolic regression problems.

Hashemnia et al. [10] incorporate continuous actions into XCSR to balance an unmanned bicycle in simulation. However, to choose an action to execute they discretise the actions, determine a discrete set by a fitness-weighted roulette wheel selection mechanism and choose the continuous action of the fittest classifier within the determined set.

7 FUTURE WORK
Given that SupRB-1 was deliberately designed to implement SupRB while being as simplistic as possible there are numerous additions that can and will be made. Some of them are already in the works, such as an expansion to hyper-ellipsoid conditions [5] and testing of different polynomial and non-polynomial local models such as sine, exponential and radial basis functions (e. g. similar to the Gaussians already used as a testbed for SupRB-1). An expansion to neural networks seems plausible as well [16], although, in order to keep the degree of explainability high, they have to be kept as simplistic as possible.

Further, an investigation of a heterogeneous model landscape seems desirable, as some parts of a function might be harder to approximate even with very specific classifier conditions while others can be described linearly with ease. SupRB keeps model complexity low while keeping performance high (see Section 3.1); we can thus expect that simpler models will be chosen where appropriate, for example, by including model type into the mutation operator.

In SupRB-1, the model structure is evolved by a GA; however, other algorithms such as CRO [15] are capable of improving an underlying model structure. Although this would arguably make SupRB no longer a strict representative of Pittsburgh-style LCS, as this name is strongly linked to GAs, an investigation seems appropriate. Both improvement techniques for GAs (e. g. n-point-crossover, different crossover and mutation rates) and different model structure optimizers should thus be investigated. As explainability is a key feature of SupRB, we will compare it with other machine learning techniques commonly seen as explainable such as decision trees.

In SupRB-1 the search space is not partitioned in A to increase explainability, as understanding a singular function is much easier than understanding a combination (possibly including mixing models) of multiple heterogeneous functions. Note that understanding a single function even of low order polynomials is non-trivial. A hierarchical approach where multiple classifiers will be located within a classifier matching a situation will be investigated in terms of performance and critically evaluated with regard to explainability.

Finally, SupRB-1 should be applied to real industrial datasets.

8 CONCLUSIONS
We introduced the SupRB learning system, a general accuracy-based Pittsburgh-style LCS architecture for supervised learning on continuous multi-dimensional decision problems. We laid the groundwork for further investigation of this system by clearly defining parametrization optimization, the task SupRB is primarily meant to perform, describing the overall architecture and providing a first, deliberately simplistic, implementation (SupRB-1) of it. Said implementation was evaluated on a problem from the continuous-action LCS literature as well as on an abstract, simplified model of an industrial FDM manufacturing process. SupRB-1 has some shortcomings but these can be attributed to its simplicity. The overall approach shows considerable promise.

ACKNOWLEDGMENTS
This work was in part supported by the German Federal Ministry for Economic Affairs and Energy (BMWi).

REFERENCES
[1] Anne Auger. 2009. Benchmarking the (1+1) Evolution Strategy with One-fifth Success Rule on the BBOB-2009 Function Testbed. In Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers (GECCO ’09). Association for Computing Machinery, New York, NY, USA, 2447–2452. https://doi.org/10.1145/1570256.1570342
[2] Jaume Bacardit. 2004. Pittsburgh Genetic-based Machine Learning in the Data Mining Era: Representations, Generalization, and Run-time. Ph.D. Dissertation. Universitat Ramon Llull.
[3] Jaume Bacardit, Edmund K. Burke, and Natalio Krasnogor. 2009. Improving the Scalability of Rule-based Evolutionary Learning. Memetic Computing 1, 1 (01 Mar 2009), 55–67. https://doi.org/10.1007/s12293-008-0005-4
[4] Richard P. Brent. 1973. Algorithms for Minimization Without Derivatives. Prentice-Hall.
[5] Martin V. Butz. 2005. Kernel-based, Ellipsoidal Conditions in the Real-valued XCS Classifier System. In Proceedings of the 7th Annual Conference on Genetic and
Evolutionary Computation (GECCO ’05). Association for Computing Machinery, New York, NY, USA, 1835–1842. https://doi.org/10.1145/1068009.1068320
[6] Martin V. Butz and Stewart W. Wilson. 2002. An Algorithmic Description of XCS. Soft Computing 6, 3 (Jun 2002), 144–153. https://doi.org/10.1007/s005000100111
[7] Kenneth A. DeJong and William M. Spears. 1991. Learning Concept Classification Rules Using Genetic Algorithms. In Proceedings of the 12th International Joint Conference on Artificial Intelligence, vol. 2. Morgan Kaufmann Publishers Inc., 651–656.
[8] Jan Drugowitsch. 2007. Learning Classifier Systems from First Principles: A Probabilistic Reformulation of Learning Classifier Systems from the Perspective of Machine Learning. Ph.D. Dissertation. University of Bath (United Kingdom).
[9] María A. Franco, Natalio Krasnogor, and Jaume Bacardit. 2013. GAssist vs. BioHEL: Critical Assessment of Two Paradigms of Genetics-based Machine Learning. Soft Computing 17, 6 (01 Jun 2013), 953–981. https://doi.org/10.1007/s00500-013-1016-8
[10] Saeed Hashemnia, Masoud Shariat Panahi, and Mohammad Mahjoob. 2018. Continuous-action XCSR with Dynamic Reward Assignment Dedicated to Control of Black-Box Mechanical Systems. Asian Journal of Control 20, 1 (2018), 356–369. https://doi.org/10.1002/asjc.1659
[11] John H. Holland. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, USA. second edition, 1992.
[12] John H. Holland. 1976. Adaptation. In Progress in Theoretical Biology. Vol. 4. Academic Press, New York, 263–293.
[13] Gerard D. Howard, Larry Bull, and Pier-Luca Lanzi. 2009. Towards Continuous Actions in Continuous Space and Time Using Self-Adaptive Constructivism in Neural XCSF. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO ’09). Association for Computing Machinery, New York, NY, USA, 1219–1226. https://doi.org/10.1145/1569901.1570065
[14] Muhammad Iqbal, Will N. Browne, and Mengjie Zhang. 2012. XCSR with Computed Continuous Action. In AI 2012: Advances in Artificial Intelligence, Michael Thielscher and Dongmo Zhang (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 350–361.
[15] Albert Y. S. Lam and Victor O. K. Li. 2010. Chemical-reaction-inspired Metaheuristic for Optimization. IEEE Transactions on Evolutionary Computation 14, 3 (June 2010), 381–399. https://doi.org/10.1109/TEVC.2009.2033580
[16] Pier-Luca Lanzi and Daniele Loiacono. 2006. XCSF with Neural Prediction. In 2006 IEEE International Conference on Evolutionary Computation. 2270–2276. https://doi.org/10.1109/CEC.2006.1688588
[17] Pier-Luca Lanzi, Daniele Loiacono, Stewart W. Wilson, and David E. Goldberg. 2005. Extending XCSF beyond Linear Approximation. In Proceedings of the 7th Annual Conference on Genetic and Evolutionary Computation (GECCO ’05). Association for Computing Machinery, New York, NY, USA, 1827–1834. https://doi.org/10.1145/1068009.1068319
[18] Xavier Llorà and Josep M. Garrell. 2001. Knowledge-Independent Data Mining with Fine-Grained Parallel Evolutionary Algorithms. In Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation (GECCO ’01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 461–468.
[19] Xavier Llorà, Rohith Reddy, Brian Matesic, and Rohit Bhargava. 2007. Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation (GECCO ’07). Association for Computing Machinery, New York, NY, USA, 2098–2105. https://doi.org/10.1145/1276958.1277366
[20] Didier Marin, Jérémie Decock, Lionel Rigoux, and Olivier Sigaud. 2011. Learning Cost-Efficient Control Policies with XCSF: Generalization Capabilities and Further Improvement. In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO ’11). Association for Computing Machinery, New York, NY, USA, 1235–1242. https://doi.org/10.1145/2001576.2001743
[21] Brad L. Miller, David E. Goldberg, et al. 1995. Genetic Algorithms, Tournament Selection, and the Effects of Noise. Complex systems 9, 3 (1995), 193–212.
[22] Syed S. Naqvi and Will N. Browne. 2016. Adapting Learning Classifier Systems to Symbolic Regression. In 2016 IEEE Congress on Evolutionary Computation (CEC). 2209–2216. https://doi.org/10.1109/CEC.2016.7744061
[23] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[24] Stuart Russell and Peter Norvig. 2009. Artificial Intelligence: A Modern Approach (3rd ed.). Prentice Hall Press, USA.
[25] Shubhra K. K. Santu, Mustafizur Rahman, Monirul Islam, and Kazuyuki Murase. 2014. Towards Better Generalization in Pittsburgh Learning Classifier Systems. In 2014 IEEE Congress on Evolutionary Computation (CEC). 1666–1673. https://doi.org/10.1109/CEC.2014.6900388
[26] Stephen F. Smith. 1980. A Learning System Based on Genetic Adaptive Algorithms. Ph.D. Dissertation. USA. AAI8112638.
[27] Patrick O. Stalph and Martin V. Butz. 2012. Learning Local Linear Jacobians for Flexible and Adaptive Robot Arm Control. Genetic Programming and Evolvable Machines 13, 2 (01 Jun 2012), 137–157. https://doi.org/10.1007/s10710-011-9147-0
[28] Anthony Stein, Simon Menssen, and Jörg Hähner. 2018. What about Interpolation? A Radial Basis Function Approach to Classifier Prediction Modeling in XCSF. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’18). Association for Computing Machinery, New York, NY, USA, 537–544. https://doi.org/10.1145/3205455.3205599
[29] Christopher Stone and Larry Bull. 2003. For Real! XCS with Continuous-Valued Inputs. Evolutionary Computation 11, 3 (Sep 2003), 299–336.
[30] Trung Tran, Cédric Sanza, Yves Duthen, and Thuc Nguyen. 2007. XCSF with Computed Continuous Action. Proceedings of GECCO 2007: Genetic and Evolutionary Computation Conference, 1861–1869. https://doi.org/10.1145/1276958.1277327
[31] Ryan J. Urbanowicz and Jason H. Moore. 2009. Learning Classifier Systems: A Complete Introduction, Review, and Roadmap. J. Artif. Evol. App. 2009, Article 1 (Jan. 2009), 25 pages.
[32] Ryan J. Urbanowicz and Jason H. Moore. 2015. ExSTraCS 2.0: Description and Evaluation of a Scalable Learning Classifier System. Evolutionary Intelligence 8, 2 (01 Sep 2015), 89–116. https://doi.org/10.1007/s12065-015-0128-8
[33] Ryan J. Urbanowicz, Niranjan Ramanand, and Jason Moore. 2015. Continuous Endpoint Data Mining with ExSTraCS: A Supervised Learning Classifier System. In Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation (GECCO Companion ’15). Association for Computing Machinery, New York, NY, USA, 1029–1036. https://doi.org/10.1145/2739482.2768453
[34] Stewart W. Wilson. 1995. Classifier Fitness Based on Accuracy. Evolutionary Computation 3, 2 (1995), 149–175.
[35] Stewart W. Wilson. 2000. Get Real! XCS with Continuous-Valued Inputs. In Learning Classifier Systems, Pier-Luca Lanzi, Wolfgang Stolzmann, and Stewart W. Wilson (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 209–219.
[36] Stewart W. Wilson. 2001. Mining Oblique Data with XCS. In Advances in Learning Classifier Systems, Pier Luca Lanzi, Wolfgang Stolzmann, and Stewart W. Wilson (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 158–174.
[37] Stewart W. Wilson. 2002. Classifiers That Approximate Functions. Natural Computing 1, 2 (01 Jun 2002), 211–234. https://doi.org/10.1023/A:1016535925043
[38] Stewart W. Wilson. 2004. Classifier Systems for Continuous Payoff Environments. In Genetic and Evolutionary Computation – GECCO 2004, Kalyanmoy Deb (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 824–835.
[39] Stewart W. Wilson. 2007. Three Architectures for Continuous Action. In Proceedings of the 2003-2005 International Conference on Learning Classifier Systems (IWLCS ’03–05). Springer-Verlag, Berlin, Heidelberg, 239–257.
