A Layered Learning Approach To Scaling in Learning Classifier Systems For Boolean Problems
Abstract
Learning classifier systems (LCSs) originated from cognitive-science research but have
since migrated to become powerful classification techniques. Modern LCSs can
be used to extract building blocks of knowledge to solve more difficult problems in
the same or a related domain. Recent works on LCSs showed that the knowledge
reuse through the adoption of Code Fragments, GP-like tree-based programs, into
LCSs could provide advances in scaling. However, since solving hard problems often
requires constructing high-level building blocks, which also results in an intractable
search space, a limit of scaling will eventually be reached. Inspired by human problem-
solving abilities, XCSCF* can reuse learned knowledge and learned functionality to
scale to complex problems by transferring them from simpler problems using layered
learning. However, this method was unrefined and suited to only the Multiplexer
problem domain. In this paper, we propose improvements to XCSCF* to enable it to
be robust across multiple problem domains. This is demonstrated on the benchmarks
Multiplexer, Carry-one, Majority-on, and Even-parity domains. The base axioms necessary for learning are proposed, methods for transfer learning in LCSs are developed, and learning is recast as a decomposition into a series of subordinate problems.
Results show that from a conventional tabula rasa, with only a vague notion of what
subordinate problems might be relevant, it is possible to capture the general logic be-
hind the tested domains, so the advanced system is capable of solving any individual
n-bit Multiplexer, n-bit Carry-one, n-bit Majority-on, or n-bit Even-parity problem.
Keywords
Learning Classifier Systems, Code Fragments, Layered Learning, Scalability, Building
Blocks, Genetic Programming.
1 Introduction
Learning Classifier Systems (LCSs) were first introduced by Holland (1975) as cogni-
tive systems designed to evolve a set of rules. LCSs were inspired by the principles of
stimulus-response in cognitive psychology (Holland, 1975, 1976; Schaffer, 1985) for in-
teraction with environments. LCSs morphed from being platforms to study cognition
to become powerful classification techniques (Bull, 2015; Butz, 2006; Lanzi and Riolo,
2000).
The objectives of this work are to:
• Develop methods such that learned knowledge and learned functionality can be reused for Transfer Learning of Threshold Concepts through Layered Learning (LL).
• Determine the necessary axioms of knowledge, functions and skills needed for
any system from which to commence learning.
• Demonstrate the efficacy of the introduced methods in complex domains, i.e. the
Multiplexer, Carry-one, Majority-on, and Even-parity domains.
2 Background
Figure 1 depicts the main components of XCS, a Michigan-style LCS developed by Wilson (1995). On receiving a problem instance, a.k.a. an environment state, a match set [M] of classifiers that match the state is created from the rule population. Each
available action from the match set is assigned a possible payoff. Based on this array
of predicted payoffs, an action is chosen. The chosen action is used to form an action
set [A] from the match set. The system executes the chosen action and receives a cor-
responding reward ρ. The action set is updated regarding the reward and the Genetic
Algorithm (GA) may be applied (Wilson, 1995; Butz and Wilson, 2000). Subsumption
takes place before the offspring are added to the population. If the new population size
exceeds the limit, classifiers are chosen to be deleted until the population size is within
the valid size.
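The cycle just described can be summarised as the minimal Python sketch below; the classifier attributes, the environment interface (env.get_state(), env.execute()), and the covering and run_ga callables are illustrative assumptions rather than the actual implementation.

import random

def xcs_explore_trial(population, env, covering, run_ga):
    """One simplified XCS explore trial (illustrative sketch, not the full algorithm)."""
    state = env.get_state()                                   # problem instance

    # Form the match set [M]; cover if no classifier matches the state.
    match_set = [cl for cl in population if cl.matches(state)]
    if not match_set:
        match_set = [covering(state)]                         # create a matching rule
        population.extend(match_set)

    # Prediction array: fitness-weighted payoff prediction per available action.
    predictions = {}
    for action in {cl.action for cl in match_set}:
        advocates = [cl for cl in match_set if cl.action == action]
        total_fitness = sum(cl.fitness for cl in advocates)
        predictions[action] = sum(cl.prediction * cl.fitness
                                  for cl in advocates) / total_fitness

    action = random.choice(list(predictions))                 # explore: random action
    action_set = [cl for cl in match_set if cl.action == action]

    reward = env.execute(action)                              # payoff for the chosen action
    for cl in action_set:                                     # reinforcement update
        cl.update(reward)

    run_ga(action_set, population)                            # niche GA, subsumption, deletion
    return reward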
[Figure 1: Schematic of XCS. The environment provides a state and a reward; rules are selected from the population to form the match and action sets; reinforcement updates the current and previous action sets; and the genetic algorithm selects parents and inserts progeny into the population.]
XCS differs from its predecessors in a number of key ways: (1) XCS uses the prediction accuracy to estimate rule fitness, which promotes a solution encompassing a full map of the problem via accurate and optimally general rules; (2) evolutionary operations operate within niches instead of the whole population; and (3) unlike the traditional LCS, XCS has no message list and is therefore suitable for learning Markov environments only (Butz and Wilson, 2000; Wilson, 1995).
Using Reinforcement Learning (RL), XCS guides the population of classifiers towards increasing usefulness via numerous parameters, e.g. fitness. The main uses of this RL mechanism are to: 1) identify classifiers that are useful in obtaining future rewards; and 2) encourage the discovery of better rules (Urbanowicz and Moore, 2009). RL acts independently of covering, whereby, if the match set is empty, new rules are created to match the current situation (Bull, 2015). The rules, or classifiers, are composed of two main parts, the condition and the action. Originally the condition part utilised a ternary alphabet composed of {0, 1, #} and the action part utilised the binary alphabet {0, 1} (Urbanowicz and Browne, 2017).
If < Conditions > Then < Actions >   (1)

If < Input > Then < Output >   (2)

Function(Arguments < Input > Return < Output >)   (3)
Eq. 1 is the standard way that a classifier processes its conditions to produce an action, which is analogous to Eq. 2. Eq. 3 is the analogy of a function: such functions take a number of arguments as their input (the rule conditions) and return an output (the effected action of the rule set) (Alvarez et al., 2014a).
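To make the analogy between Eqs. 1-3 concrete, the following sketch treats a ternary-alphabet classifier as a callable function from its input (the matched condition) to its output (the advocated action); the class and the example rule are illustrative assumptions, not the paper's implementation.

class TernaryClassifier:
    """A rule 'If <Conditions> Then <Actions>' viewed as a function of its input."""

    def __init__(self, condition, action):
        self.condition = condition      # string over {'0', '1', '#'}
        self.action = action            # binary action

    def matches(self, state):
        # '#' is the don't-care symbol: it matches either input bit.
        return all(c == '#' or c == s for c, s in zip(self.condition, state))

    def __call__(self, state):
        # Function view (Eq. 3): arguments <Input> return <Output>.
        return self.action if self.matches(state) else None

# Example: the 6-bit Multiplexer rule "01#1##:1" returns 1 for state "011100".
rule = TernaryClassifier("01#1##", 1)
assert rule("011100") == 1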
The technique used in XCSCF2 places emphasis on user-specified problems, rather than user-specified instances, which is a subtle but important change in emphasis for EC approaches. That is, the function set is partly formed from past problems rather than preset functions. The advantage of learning functions is that the related CFs (associated building blocks) are also formed and transferred to the new problem, which can bootstrap the search. However, this technique lacks a rich representation in the action part, which needs adapting to the different types of action values expected in the current work, e.g. binary, integer, and bitstring.
Previously, other Boolean problems have been solved successfully by using tech-
niques similar to the proposed work. One of these is a general solution to the Parity
problem described in (Huelsbergen, 1998). The technique is similar to the proposed
work because it evolves a general solution that is capable of solving parity problems of
any length. It can also address repeating patterns similar to the loop mechanism of the
proposed work. On the other hand, this technique makes use of predefined functions, making it a top-down approach. The proposed technique learns new functions, making it more flexible.
The preliminary work proposed XCSCF* composed of various components (Alvarez et al., 2016). Since different types of actions are expected, e.g. Binary, Integer, Real,
and String (Bitstring); it is proposed that the functions be created by a system with CFs
in the action (XCSCFA), although any rule production system can also be used, e.g.
XCS, XCSCFC, etc. This will facilitate the use of real and integer values for the action
as well as enabling it to represent complex functionality. The proposed solution will
reuse learned functionality at the terminal nodes as well as the root nodes of the CFs
since this has been shown to be beneficial for scaling. XCSR would not be helpful here
because on a number of the steps, the permitted actions are not a number but a string
e.g., kBitString. Moreover, XCSR with Computed Continuous Action would present
unnecessary complications to the work because the conditions of the classifiers do not
require real values (Iqbal et al., 2012). Accordingly, it is necessary to explore further
ways to expand the preliminary work to adapt to different domains.
3 The Problems
In this section, we provide an analytical introduction to the tested problems that en-
ables the training flow in layers. These flows help formalise the intermediate layers in
Section 3.4. The problem understanding also provides an initial guess of the required
building blocks (functions and skills) that should be provided beforehand to bootstrap
the learning process of the system. Although even these pre-provided building blocks could themselves be decomposed into more elemental knowledge, this work does not aim to imitate the education of machine intelligence from scratch. Instead, this paper aims to show the ability of XCSCF* to learn progressively more complex tasks in a manner that resembles human learning.
One of the underlying reasons for choosing the Boolean problem domains for the
proposed work is that humans can solve this kind of problem by naturally combining
functions from other related domains along with functionalities from other Boolean
problems. Humans are also able to reason that some functions in their ‘experiential
toolbox’ may be appropriate for solving the problem. The experiential toolbox is the
whole of learned functionality for the agent. These functions include multiplication,
addition, power, and the notion of a number line. Therefore, the agent here must build up its toolbox of functions and associated pieces of knowledge (CFs). A computer program would make use of these functions and potentially many more, but it cannot intuit which are appropriate to the problem and which are not. Therefore, the agent
will need guidance in its learning so that it may have enough cross-domain functions to
solve the problem successfully. It will need to perform well with more functions than
necessary as the exact useful functions may not be known a priori. However, at this
stage of paradigm development, the agent is not expected to be able to adjust to fewer
functions than necessary. The other reason is that Boolean problems are interrogatable
so that solutions to problems at scales beyond enumeration can still be verified.
Figure 2: The 6-bit Multiplexer problem (example condition 0 1 1 1 0 0 with action 1), showing the address bits and the data bits of the condition; this distinction is not provided to the learning system.
Besides functions, the experiential toolbox will also contain skills. These are ca-
pabilities that the agent will have learned or will have been given beforehand; one ex-
ample is the looping skill. Skills, unlike functions, do not have a return value, but can
manipulate pointers to values (e.g. move around a bitstring). For example, a human
understands all the operations required for counting k number of bits, starting from the
left of the input string. Then a human would have to conceptualize how to convert the
address bits to decimal, which requires the ability to multiply and add. If we wanted
to increase the difficulty level, we could have the human determine the number k of address bits required for a particular problem:
k = ⌊log2 L⌋   (4)
Equation 4 determines the number k of address bits from the length L of the input. In this case the person would need familiarity with the log-base-2 function as well as the floor function. A human could eventually determine the address bits despite the increasing difficulty, but a software system would have to learn this functionality before even attempting to solve the n-bit Multiplexer problem.
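As an illustration, the decomposition just described (Eq. 4, address extraction, binary-to-decimal conversion, and data-bit lookup) can be written as a chain of small functions. The names below (kl, ks, b2d, dc) mirror the subproblem tags used later in the paper, but the code is only a sketch of the ground-truth logic, not the learned rule sets.

from math import floor, log2

def kl(length):
    """Multiplexer Address Length: k = floor(log2 L) (Eq. 4)."""
    return floor(log2(length))

def ks(bits):
    """Multiplexer Address Bits: the first k bits of the input string."""
    return bits[:kl(len(bits))]

def b2d(binary_string):
    """Multiplexer Data Channel: convert a binary string to a decimal index."""
    return int(binary_string, 2)

def dc(bits):
    """Data Bit Position: address length plus the decoded channel index."""
    return kl(len(bits)) + b2d(ks(bits))

def multiplexer(bits):
    """Multiplexer Data Bit: the value of the addressed data bit."""
    return int(bits[dc(bits)])

# 6-bit example from Figure 2: address '01' selects data channel 1, so the
# answer is the bit at position 2 + 1 = 3, which is '1'.
assert multiplexer("011100") == 1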
Figure 3: Multiplexer training flow, from the input bitstring through the Address Length, Address Bits, and Data Channel to the Data Bit. Each stage of this flow is designed to obtain an aspect of the logic behind the Multiplexer domain. These stages follow the analysis of the Multiplexer problem in Section 3.
At each step, the system has access to leaf node candidates, hard-coded functions,
the learned CFs, and learned CF-ruleset functions. Table 1 shows a listing of all the
skills made available to the system along with their system tags (used to interpret re-
sults) and their input/output data types. This function list is anticipated to be useful
based on the above analysis of the benchmark problem domains. In addition to re-
quired skills, we also provide extra skills that complement the anticipated ones. These skills could possibly provide unexpected solutions or at least test the ability of XCSCF* to ignore redundant or irrelevant skills.
It is important to note that the work presented here does not seek to provide a
learning plan for a system to follow and ultimately arrive at the solution to a given
problem. The aim here is to facilitate learning in a series of steps, where in this case
the learned functionality could potentially help a system to arrive at a general solution
to any set problem. In other words, it is important for the system to learn to mix and match the different learned functions in a way that contributes to learning, i.e. a way that will produce a general solution. The number of subordinate problems can always be
increased in the future, e.g. learning basic functions such as an adder or a multiplier
via Boolean functions or even learning the log function via training data.
[Figure: Carry-one training flow. From the input bitstring (A: first number, B: second number, L: length of input), the Half Length (L/2), the Binary Sum bitstring(A+B), the Length of the Binary Sum, Is Carried, and finally the Carried Bit are learned in sequence.]
Figure 5: Training encompasses different types of functions, skills, and axioms. The experiential toolbox will contain general and problem-specific learned functionality. The question marks indicate the next domain and the functionality learned from it.
sion of XCSCF* (Alvarez et al., 2016). The training data-set used consists of instances of
possible lengths and the corresponding number of address bits.
Is Carry-one - isCarried
This requires the general logic behind the Carry-one problem domain: determining whether a 1 is carried out of the highest bit when the binary number formed by the first half of the input is added to the binary number formed by the second half. The scales of this problem were set to vary from 2-bit to 12-bit.
Is Majority On - isMajorityOn
This problem is the final stage of training the Majority-on domain. It expects a returned value of True if more than half the bits in the input are on (1), and False otherwise. The size of the input bitstrings is randomly selected from the range of [1, 7] bits.
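For clarity, the target concepts described above can be written as ground-truth oracles from which training labels could be generated. The sketch below decomposes the Carry-one oracle along the same lines as its training flow (Half Length, Binary Sum, Is Carried); the helper names are ours, not the learned rule-set functions.

def half_length(bits):
    """'Half Length' subproblem: L/2 for an even-length input."""
    return len(bits) // 2

def binary_sum(bits):
    """'Binary Sum' subproblem: the bitstring of A + B, where A and B are the
    binary numbers encoded by the first and second halves of the input."""
    h = half_length(bits)
    return bin(int(bits[:h], 2) + int(bits[h:], 2))[2:]

def is_carried(bits):
    """'Is Carried': a 1 is carried out of the highest bit exactly when the
    binary sum is longer than half the input length."""
    return len(binary_sum(bits)) > half_length(bits)

def is_majority_on(bits):
    """Majority-on target: True if more than half of the bits are on (1)."""
    return bits.count('1') > len(bits) / 2

assert is_carried("1101") is True        # 11 + 01 = 100, so a 1 is carried
assert is_carried("0101") is False       # 01 + 01 = 10, no carry
assert is_majority_on("10110") is True   # three of five bits are on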
This work disrupts the standard learning paradigms in EC, where the goal is to learn
abilities using a top-down approach, by aligning it with LL. The proposed work uses a
bottom-up approach by learning functions and using parts or entire functions to solve
more difficult problems. In other words, the method here is to specify the order of
problems/domains (together with robust parameter values) while allowing the system
to automatically adjust the terminal set through feature construction and selection, and
ultimately develop the function set. This is analogous to a school teacher determining the order of threshold concepts for a student in a curriculum (Meyer and Land, 2006). The
system can use learned rule-sets as functions along with the associated building blocks,
i.e. CFs, that capture any associated patterns; this is an advantage over pre-specifying
functionality.
This method modifies the intrinsic problem from finding an overarching ‘single’
solution that covers all instances or features of a problem to finding the structure (links)
of sub-problems that construct the overall solution. The underlying patterns that describe the domain are anticipated to be more compact and reusable, as they do not grow as the domain scales (unlike individual solutions, which can grow impractically large as the problem grows, e.g. DNF solutions to the Multiplexer problem).
We employed an adapted XCSCFA as the algorithm for the agents of XCSCF* that learn subproblems. This XCSCF* has a type-fitting property that can: (1) verify the type compatibility between connected nodes within generated CFs; and (2) ensure that the output type of a CF is compatible with the required actions of the current problem environment2.
2 There are sufficient novel contributions to XCSCF* to warrant a new acronym, but as the old one is now
superseded and the LCS field already has many acronyms, XCSCF* is retained.
Algorithm 1 Typed CFs are generated based on a recursive function for generating nodes. The function is given the set of action types Ta, the type set of base CFs Tb, the expected output types To, the expected input types Ti, the intermediate level li, and a clustered set of all functions Sf.
1: procedure GenNode(To, Ti, li)
2:   T'i = ∅
3:   if li = 2 then
4:     Output types To = Ta
5:   if li = 1 then
6:     T'i = Tb ∪ {integer}
7:   Filter function set Sfiltered from Sf by required output types To and input types Ti
8:   Function f = randomSelect(Sfiltered)
9:   for index i in f.inputs do
10:    if li − f.level > 0 and random[0, 1) < 0.5 then
11:      f.inputs[i] = GenNode(f.input_types[i], T'i, li − f.level)
12:    else
13:      Set of compatible base CFs SbCF = ∅
14:      if integer ∈ f.input_types[i] then
15:        c = randomSelect([1, ..., len(Atts)])
16:      for cfbase in all base CFs do
17:        if cfbase.out_types ∩ f.input_types[i] ≠ ∅ then
18:          Add cfbase to SbCF
19:      f.inputs[i] = randomSelect(SbCF)
compatibility rules among the four value types (Binary, Integer, Real, and String). We
followed the sense of numerics as well as the type compatibility of the programming
language (Python) to devise compatibility rules among types. Boolean variables are
compatible with integers and floats, and integers are compatible with floats, the com-
patibility does not follow the opposite way. Lists are not compatible with other types.
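A minimal sketch of such one-directional compatibility rules, as they might be used when filtering candidate functions (cf. line 7 of Algorithm 1), is given below; the type tags, dictionary, and helper names are illustrative assumptions rather than the actual implementation.

# One-directional widening: a value of the key type may be supplied where any
# of the listed types is expected (Boolean -> Integer/Real, Integer -> Real).
COMPATIBLE_WITH = {
    'binary':  {'binary', 'integer', 'real'},
    'integer': {'integer', 'real'},
    'real':    {'real'},
    'string':  {'string'},   # lists/bitstrings are not compatible with other types
}

def is_compatible(source_type, expected_types):
    """Can a node producing source_type feed an input expecting expected_types?"""
    return bool(COMPATIBLE_WITH[source_type] & set(expected_types))

def filter_functions(functions, required_output_types, available_input_types):
    """Keep only functions whose output fits the requirement and whose inputs
    can all be fed from the available types (sketch of Algorithm 1, line 7)."""
    kept = []
    for f in functions:
        output_ok = any(is_compatible(t, required_output_types) for t in f['out_types'])
        inputs_ok = all(any(is_compatible(a, expected) for a in available_input_types)
                        for expected in f['in_types'])
        if output_ok and inputs_ok:
            kept.append(f)
    return kept

# A binary-returning node can feed an integer slot, but a string-returning node cannot.
assert is_compatible('binary', {'integer'})
assert not is_compatible('string', {'integer'})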
5 Results
5.1 Experimental Setup
The experiments were executed 30 times with each having an independent random
seed. The stopping criterion was when the agent completed the number of training
instances allocated, which were chosen based on preliminary empirical tests on the
convergence of systems. The proposed systems were compared with XCSCFC and XCS.
The settings for the experiments are common to the LCS field (Urbanowicz and Browne,
2017) and similar to the settings of the previous version of XCSCF* (Alvarez et al.,
2016). They were as follows: payoff 1,000; the learning rate β = 0.2; the probability of applying crossover to an offspring χ = 0.8; the probability of mutation µ = 0.04; the probability of using a don't-care symbol when covering P_don'tCare = 0.33; the experience required for a classifier to be a subsumer Θ_sub = 20; the initial fitness value when generating a new classifier F_I = 0.01; the fraction of classifiers participating in a tournament from an action set 0.4. In addition, the error threshold ε_0 was set to 10.0. This new XCSCF* naively uses the same population size N = 1000 for all problems.
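For reference, these settings can be gathered into a single configuration, e.g. the dictionary sketched below; the key names are ours, while the values are those reported above.

XCSCF_STAR_PARAMS = {
    'payoff': 1000,              # reward for a correct action
    'N': 1000,                   # population size (same for all problems)
    'beta': 0.2,                 # learning rate
    'chi': 0.8,                  # crossover probability
    'mu': 0.04,                  # mutation probability
    'p_dont_care': 0.33,         # don't-care probability when covering
    'theta_sub': 20,             # subsumption experience threshold
    'F_I': 0.01,                 # initial fitness of a new classifier
    'tournament_fraction': 0.4,  # fraction of the action set in a tournament
    'epsilon_0': 10.0,           # error threshold
}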
Figure 9 shows that only the proposed system XCSCF* and XCSCFC were able to solve the 135-bit Multiplexer problem. These experiments followed the standard explore and exploit phases of XCS. This demonstrates scaling by relearning, but it is the capture of the underlying patterns without retraining that is the aim of this work.
Tests were conducted on the final rules produced by the final subproblem of the
Multiplexer, the Carry-one, the Even-parity, and the Majority-on domains to determine
if they were general enough to solve the corresponding problems at very large scales.
Table 4 shows that the rule produced by the small-scale Multiplexer problem was able to solve the 1034-bit and even the 8205-bit Multiplexer problems3. Similarly, the final rules of the final subproblems of the other domains also achieved 100% accuracy on cor-
responding problems at all tested large scales. The system used to test the generality
of the rules was in straight exploitation: there was no covering, rule generation, or rule
update.
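This generality test amounts to the evaluation loop sketched below, in which fixed rules are applied in straight exploitation with covering, rule discovery, and parameter updates disabled; the rule attributes and the oracle callable are illustrative assumptions.

def evaluate_generality(final_rules, instances, oracle):
    """Accuracy of fixed rules on large-scale instances: no covering, no GA, no updates."""
    correct = 0
    for state in instances:
        matching = [rule for rule in final_rules if rule.matches(state)]
        if not matching:
            continue                                 # no covering in exploitation
        # Best-action selection from the (typically single, fully general) rule.
        predicted = max(matching, key=lambda r: r.fitness).predict_action(state)
        correct += int(predicted == oracle(state))
    return correct / len(instances)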
Figure 8: Learning curves (performance versus training instances) of the subproblems of the Even-parity ((a) and (b)) and Majority-on (c) domains. The Majority-on domain also utilises the "Half Length" subproblem of the Carry-one domain (Figure 7a).
Table 4: Accuracy tested on large-scale problems reusing solutions from final subprob-
lems without training.
Problems Accuracies
1034-bit Multiplexer 100%
8205-bit Multiplexer 100%
100-bit Carry-one 100%
200-bit Carry-one 100%
50-bit Even-parity 100%
100-bit Even-parity 100%
50-bit Majority-on 100%
105-bit Majority-on 100%
“Multiplexer Data Bit” problem, the final subproblem of the Multiplexer domain, is illustrated in Figure 10. Function nodes follow the function tags in Table 1. The dashed boxes in this figure are the reused learned functions, with names defined in Table 2.
The tree in Figure 10 is the rule action of the single compacted rule for the n-bit Multiplexer problem. It accumulates into a high-level function with many nested layers of complexity. This complex tree can encapsulate the logic behind the n-bit Multiplexer
problem through the guidance of all Multiplexer subproblems. For instance, the main building block of this tree is the code fragment CF61, which provides the data-bit position in the input bitstring using the function dc learned from the “Data Bit Position” problem. This function dc is itself a complex function involving an addition (+) of the outputs from two reused functions within it, k from the “Multiplexer Address Length” problem and b2d from the “Multiplexer Data Channel” problem. The function b2d converts the binary-string output of the function ks from the “Multiplexer Address Bits” problem to a decimal value. ks returns the first “Multiplexer Address Length” bits from attlst (the input bitstring) using the function k. The function k is itself a nested function, reusing a simpler function kl from the “Multiplexer Address Length”
problem (with Multiplexer scale as the input). The block k is reused twice in the final
solution M@ for the Multiplexer domain. The logic of the n-bit Multiplexer problem in the compacted rule with M@ in Table 5 was validated on the 1034-bit and 8205-bit Multiplexer problems (see Table 4).
Table 5: Final rules learned before compaction while solving the “Multiplexer Data Bit”
problem, the final subproblem of the Multiplexer domain, cf. Figure 10.
Condition        Action
# # ... # #      attlst CF61 @
Other final rules of the Carry-one, Majority-on, and Even-parity domains also
achieve maximal generality with all “don’t-care” bits in the condition part. These rules
were also validated on the corresponding domains at very large scales. The trees in the rule actions of these final rules are illustrated in Figures 11, 12, and 13, respectively. Besides the Multiplexer domain, the Carry-one problem domain requires six subproblems to obtain the final logic, which resulted in a highly complex rule action. The final function iC has five distinct nested functions within it and three occurrences of the function h. The complexity of the solution for the n-bit Carry-one problem is equivalent to the complexity of the function for the n-bit Multiplexer.
As the training flows of the Majority-on and Even-parity domains are straightforward, XCSCF* also discovered simpler rule actions in the final rules. XCSCF* yielded several different solutions for the Even-parity domain. The two most popular ones are illustrated in Figure 13. Solution 1 in this figure appeared in most runs. Another solution, found in only two runs, is identical in logic to solution 2, but the node c1 (a constant CF of value 1) is replaced with another CF that uses the division operator
Figure 10: Multiplexer solution (M@). Function nodes follow the tags in Table 1. This solution uses nested learned functionalities, shown in dashed boxes, which follow the tags in Table 2.
6 Discussions
It can be said that the reason XCSCF* is capable of solving problems to a much larger
scale than previously is that human knowledge separated the problem into appropriate
and simpler sub-problems. Nevertheless, it is still a difficult task to learn each sub-task in such a manner that the learned knowledge/functionality can be transferred, and then to learn to combine these blocks effectively. It is considered that the solutions of
the tested problem domains, i.e. the Multiplexer, Carry-one, Even-parity, and Majority-
on domains, yielded by XCSCF* contain the general logic of these domains and can
solve these problems at any scale.
The way that humans select sub-problems is similar to that of humans selecting function sets in standard EC approaches, where selecting too few or inappropriate components prevents effective learning, while selecting too many unnecessary components could inhibit training. In these experiments, a number of redundant functions, such as the
ceiling and the multiplication, and functions useful for only one specific problem do-
main, were never used by the final evolved solutions. XCSCF*, however, can identify
the correct combination of accumulated knowledge to build complex solutions for the
tested tasks.
[Figure 11: The rule-action tree of the Carry-one solution (iC), built from nested learned functions applied to attlst.]
[Figure 12: The rule-action tree of the Majority-on solution (iM), comparing the sum of the bits in attlst against half the input length.]
Two main components of XCSCF* enable it to solve the tested problems fully. First, the supply of constants furnishes the required functionalities in the Carry-one, Even-parity, and Majority-on domains, as shown in the CFs of the final solutions. Second, the availability of the CF attlst also contributes to solving the Carry-one, Majority-on, and Even-parity domains because it provides appropriate input for general functions, e.g. StringSum, HeadList, and TailList. It is argued that we can still input
the environment state implicitly to all such functions. However, this method creates
the complication of defining the environment state when these functions are nested
in rule-set functions.
Figure 13: The two most common solutions of the Even-parity domain. Solution 1 is discovered more often than solution 2.
Furthermore, deciding which functions should take the environ-
ment state by default and which functions should choose other string inputs requires
extra human intervention. An extra benefit of using attlst is that XCSCF* can now
solve variable-scale problems in the tested domain. Previously, supplying a constant L
meant that the problem scale could not change.
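To illustrate why the explicit string argument matters, the sketch below gives assumed signatures for such general string skills; the actual provided or learned functions may differ, but an explicit argument lets the same skill operate on attlst or on any intermediate string produced inside a nested rule-set function.

def string_sum(bits):
    """StringSum-style skill: count of '1' bits in an explicit string argument."""
    return sum(int(b) for b in bits)

def head_list(bits, n):
    """HeadList-style skill: the first n symbols of the given string."""
    return bits[:n]

def tail_list(bits, n):
    """TailList-style skill: the last n symbols of the given string."""
    return bits[-n:] if n > 0 else ""

# Because the string is an explicit argument, the same skills can be applied to
# attlst (the environment state) or to an intermediate string such as a binary sum.
state = "10110"
assert string_sum(head_list(state, 2)) == 1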
It is evident that the proposed work has benefited from the transfer of learned
information from each of the sub-problems. Reusing functionalities enables the system
to achieve neat and abstract solutions, although these solutions are actually complex, yet without bloat, when fully expanded. Although a defined recipe was not furnished to
the system, it was able to form logical determinations as to the flow of the accumulated
functionality, see Figure 10. This property of the system is similar to deriving a set of
Threshold Concepts where significant learning towards the final target problem only
advances once the proper chain of functionality is formed and evaluated.
vided attribute list, a more general input to replace the human-generated constant L.
This explicit connection allows these functions to take any string-type inputs. Therefore, the new XCSCF* provides more flexible logic and reduces the need for customisation
to a given task. Also, the type-fitting property assures that generated CFs are compat-
ible within themselves as well as with the target problem, which results in the ability
to divide the search space by input and output types of available functions. Thus, this
style of learning system can have access to more functionality than necessary for a single problem without inhibiting learning.
The general solutions from XCSCF* were validated by solving very difficult problems, i.e. the n-bit Multiplexer, n-bit Carry-one, n-bit Even-parity, and n-bit Majority-on problems. Although the aforementioned problems comprise vast search spaces, the proposed technique successfully discovered a minimal number (mostly one) of general rules. An advancement of this work was that the logic of complex problems was captured by simple trees when described in terms of the learned functionalities. However, once fully expanded, the CF trees contain certain complex nested patterns. Thus, LL can facilitate interpreting complex tree solutions using the learned components from the intermediate stages of LL.
Future work seeks to create a continuous-learning system with base axioms and a
number of problems, including their possible subproblems, to be solved in a parallel
architecture simultaneously. The ‘toolbox’ of functions (learned functions and axioms)
plus the complementary knowledge (CFs) will grow as problems are solved and will
be available for addressing future problems. The linked knowledge of solved problems
would demonstrate interesting meta-knowledge, a form of learning curricula, and pos-
sible relationships among known problems, such as n-bit Multiplexer, n-bit Carry-one,
etc. Furthermore, an open research question is whether an XCS-based system with LL or parallel learning can solve real-valued datasets. The first step is to establish real-valued datasets that are suitable for LL.
References
Alvarez, I. M., Browne, W. N., and Zhang, M. (2014a). Reusing learned functionality in XCS: Code fragments with constructed functionality and constructed features. In Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO Comp '14, pages 969–976, New York, NY, USA. Association for Computing Machinery.
Alvarez, I. M., Browne, W. N., and Zhang, M. (2014b). Reusing learned functionality to address
complex boolean functions. In Simulated Evolution and Learning, Lecture Notes in Computer
Science, pages 383–394. Springer International Publishing.
Alvarez, I. M., Browne, W. N., and Zhang, M. (2016). Human-inspired scaling in learning classifier systems: Case study on the n-bit multiplexer problem set. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO '16, pages 429–436, New York, NY, USA. Association for Computing Machinery.
Bull, L. (2015). A brief history of learning classifier systems: from CS-1 to XCS and its variants.
Evolutionary Intelligence, 8(2-3):55–70.
Butz, M. V. (2006). Rule-Based Evolutionary Online Learning Systems. Number 191 in Studies in Fuzziness and Soft Computing. Springer-Verlag, Berlin, Germany.
Falkner, N. J. G., Vivian, R. J., and Falkner, K. E. (2013). Computer science education: The first
threshold concept. In 2013 Learning and Teaching in Computing and Engineering, pages 39–46.
IEEE.
Feng, L., Ong, Y.-S., Tan, A.-H., and Tsang, I. W. (2015). Memes as building blocks: a case study
on evolutionary optimization + transfer learning for routing problems. Memetic Computing,
7(3):159–180.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. The University of Michigan Press, Ann Arbor, MI.
Huelsbergen, L. (1998). Finding general solutions to the parity problem by evolving machine-
language representations. Genetic Programming, pages 158–166.
Ioannides, C. and Browne, W. (2008). Investigating scaling of an abstracted LCS utilising ternary
and S-expression alphabets. In Bacardit, J., Bernadó-Mansilla, E., Butz, M. V., Kovacs, T.,
Llorà, X., and Takadama, K., editors, Learning Classifier Systems, pages 46–56. Springer Berlin
Heidelberg, Berlin, Heidelberg.
Iqbal, M., Browne, W. N., and Zhang, M. (2012). XCSR with computed continuous action. In
AI 2012: Advances in Artificial Intelligence, pages 350–361, Berlin, Heidelberg. Springer Berlin
Heidelberg.
Iqbal, M., Browne, W. N., and Zhang, M. (2013a). Evolving optimum populations with XCS
classifier systems. Soft Computing, 17(3):503–518.
Iqbal, M., Browne, W. N., and Zhang, M. (2013b). Extending learning classifier system with cyclic graphs for scalability on complex, large-scale boolean problems. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, GECCO '13, pages 1045–1052, New York, NY, USA. Association for Computing Machinery.
Iqbal, M., Browne, W. N., and Zhang, M. (2013c). Learning overlapping natured and niche im-
balance boolean problems using XCS classifier systems. In 2013 IEEE Congress on Evolutionary
Computation, pages 1818–1825. IEEE.
Iqbal, M., Browne, W. N., and Zhang, M. (2014). Reusing building blocks of extracted knowledge
to solve complex, large-scale boolean problems. IEEE Transactions on Evolutionary Computation,
18(4):465–480.
Koza, J. R. (1991). A hierarchical approach to learning the boolean multiplexer function. Foundations of Genetic Algorithms, 1:171–192.
Lanzi, P. L. and Riolo, R. L. (2000). A roadmap to the last decade of learning classifier system
research (from 1989 to 1999). In Lanzi, P. L., Stolzmann, W., and Wilson, S. W., editors, Learning
Classifier Systems, pages 33–61, Berlin, Heidelberg. Springer Berlin Heidelberg.
Meyer, J. H. F. and Land, R. (2006). Overcoming Barriers to Student Understanding: Threshold con-
cepts and troublesome knowledge. Routledge.
Nguyen, T. B., Browne, W. N., and Zhang, M. (2019a). Improvement of code fragment fitness to guide feature construction in XCS. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '19, pages 428–436, New York, NY, USA. Association for Computing Machinery.
Nguyen, T. B., Browne, W. N., and Zhang, M. (2019b). Online feature-generation of code frag-
ments for XCS to guide feature construction. In 2019 IEEE Congress on Evolutionary Computation
(CEC), pages 3308–3315. IEEE.
Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and
Data Engineering, 22(10):1345–1359.
Price, C. J. and Friston, K. J. (2005). Functional ontologies for cognition: The systematic definition
of structure and function. Cognitive Neuropsychology, 22(3-4):262–275.
Schaffer, J. D. (1985). Learning multiclass pattern discrimination. In Proceedings of the 1st International Conference on Genetic Algorithms, pages 74–79, USA. L. Erlbaum Associates Inc.
Stone, P. and Veloso, M. (2000). Layered learning. In López de Mántaras, R. and Plaza, E., editors,
Machine Learning: ECML 2000, pages 369–381, Berlin, Heidelberg. Springer Berlin Heidelberg.
Urbanowicz, R., Granizo-Mackenzie, A., and Moore, J. (2012). Instance-linked attribute tracking and feedback for michigan-style supervised learning classifier systems. In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, GECCO '12, pages 927–934, New York, NY, USA. Association for Computing Machinery.
Urbanowicz, R. J. and Browne, W. N. (2017). Introduction to Learning Classifier Systems. Springer-
Briefs in Intelligent Systems. Springer-Verlag, Berlin Heidelberg.