A Layered Learning Approach To Scaling in Learning Classifier Systems For Boolean Problems

This paper presents improvements to the XCSCF* learning classifier system (LCS) to enhance its scalability across multiple Boolean problem domains, including Multiplexer, Carry-one, Majority-on, and Even-parity. By employing a layered learning approach, the authors aim to enable the reuse of learned knowledge and functionality, facilitating the solving of complex problems through a structured progression of simpler subproblems. The proposed methods demonstrate the potential for continuous learning systems to transition from theoretical Boolean tasks to practical real-world applications.

A Layered Learning Approach to Scaling in Learning Classifier Systems for Boolean Problems

Isidro M. Alvarez
Trung B. Nguyen
Will N. Browne
Mengjie Zhang
School of Engineering and Computer Science, Victoria University of Wellington,
Kelburn, Wellington 6140, New Zealand

arXiv:2006.01415v1 [cs.NE] 2 Jun 2020

Abstract
Learning classifier systems (LCSs) originated from cognitive-science research but
migrated such that LCSs became powerful classification techniques. Modern LCSs can
be used to extract building blocks of knowledge to solve more difficult problems in
the same or a related domain. Recent work on LCSs showed that knowledge
reuse through the adoption of Code Fragments, GP-like tree-based programs, into
LCSs could provide advances in scaling. However, since solving hard problems often
requires constructing high-level building blocks, which also results in an intractable
search space, a limit of scaling will eventually be reached. Inspired by human
problem-solving abilities, XCSCF* can reuse learned knowledge and learned functionality to
scale to complex problems by transferring them from simpler problems using layered
learning. However, this method was unrefined and suited to only the Multiplexer
problem domain. In this paper, we propose improvements to XCSCF* to enable it to
be robust across multiple problem domains. This is demonstrated on the benchmark
Multiplexer, Carry-one, Majority-on, and Even-parity domains. The base
axioms necessary for learning are proposed, methods for transfer learning in LCSs are
developed, and learning is recast as a decomposition into a series of subordinate problems.
Results show that from a conventional tabula rasa, with only a vague notion of what
subordinate problems might be relevant, it is possible to capture the general logic
behind the tested domains, so the advanced system is capable of solving any individual
n-bit Multiplexer, n-bit Carry-one, n-bit Majority-on, or n-bit Even-parity problem.
Keywords
Learning Classifier Systems, Code Fragments, Layered Learning, Scalability, Building
Blocks, Genetic Programming.

1 Introduction
Learning Classifier Systems (LCSs) were first introduced by Holland (1975) as cogni-
tive systems designed to evolve a set of rules. LCSs were inspired by the principles of
stimulus-response in cognitive psychology (Holland, 1975, 1976; Schaffer, 1985) for in-
teraction with environments. LCSs morphed from being platforms to study cognition
to become powerful classification techniques (Bull, 2015; Butz, 2006; Lanzi and Riolo,
2000).

© 2020 by the Massachusetts Institute of Technology. Evolutionary Computation x(x): xxx-xxx


I M Alvarez et al.

An important strength of LCSs is their capability to subdivide the problem into
niches that can be solved efficiently. This is made possible by integrating generality
into the rules produced. This pressure towards generality means that one classifier
could be a solution to a bigger set of problem instances. In the proposed work, we seek
to go beyond what the Michigan-style LCS currently offers with its niching strength.
We start with XCS, an accuracy-based LCS that creates accurate building blocks of
knowledge for each experienced niche, and develop a system that scales to problems of any
size in a domain (Wilson, 1995; Butz and Wilson, 2000).
Although LCS techniques have facilitated progress in the field of machine
learning, they have a fundamental weakness. Each time a solution is produced for a given
problem, the techniques tend to ‘jettison’ any learned knowledge and must start from
a blank slate when tasked with a new problem.
The field of Developmental Learning in cognitive systems contains an idea known
as the Threshold Concept (Falkner et al., 2013). This idea conveys the fact that in human
learning there exist certain pieces of knowledge that are transformative in advocating
the learning of a task. These concepts need to be learned in a particular order, thus
providing the learner with viable progress towards learning more difficult ideas at a
faster pace than otherwise. For instance, humans are taught mathematics in a certain
progression; arithmetic is taught before trigonometry, and these two are taught before
calculus. The empirical evidence indicates that this sequence will be more effective in
fostering the learning of progressively more difficult mathematics (Falkner et al., 2013).
Related to the benefits of the threshold concept are Layered Learning (LL) and
Transfer Learning (TL) in artificial systems. In LL, a sequence of knowledge is learned
(Stone and Veloso, 2000). LL requires crafting a series of problems, which enables the
learning agent to learn successively harder problems. The benefits of TL are actualised
when learning from one domain is transferred to aid learning in a similar
or related domain (see footnote 1). In essence, TL aims to extract the knowledge from one or more
source tasks and apply the knowledge to the target task (Feng et al., 2015).
Current LCSs can be utilised to extract building blocks of knowledge in the form of
GP-like trees, called Code Fragments (CFs). TL can then reuse these building blocks to
solve more difficult problems in the same or a related domain. Past work showed
that the reuse of knowledge through the integration of CFs into XCS, as a framework,
can provide dividends in scaling (Iqbal et al., 2014).
Numerous systems using CFs have been developed. XCSCFC is a system that
extends XCS by replacing the condition of the classifiers with a number of CFs (Iqbal
et al., 2014). Although XCSCFC exhibits better scalability than XCS, a
computational limit in scalability will eventually be reached (Iqbal et al., 2013b). The reason is
that multiple CFs can be used at the terminals, so as the problem increases in size,
trees of any depth could be created. Instead of using CFs in rule conditions, XCSCFA
integrates CFs in the action part of classifiers (Iqbal et al., 2013a). This method
produced optimal populations in both discrete domain problems and continuous domain
problems. However, XCSCFA did not scale to very large problems, even where they
had repeated patterns in the data.
In the preliminary work, XCSCF* (Alvarez et al., 2016) applied the threshold
concepts, LL, and TL to solve the n-bit Multiplexer problem. However, this
was only a single domain, so the question remains: is the approach robust and easy
to implement across multiple domains? Furthermore, the system output was human
interpretable only after two days’ work, whereas more transparent
solutions to n-bit problems are needed. It is also important to discover ontologies of functions that
will map to numerous, heterogeneous patterns in data at scale. This will aid in evolving
a compact and optimal set of classifiers at each of the proposed steps (Price and Friston,
2005). This work requires hand-crafted layers, as it is usual for humans to specify
the problems for learning systems.

Footnote 1: Some fields define TL as transferring the underlying model (Pan and Yang, 2010).

In this paper, we aim to develop improvements to XCSCF* that enable it to solve
more general Boolean problems using LL. The idea behind this system is still to learn
progressively more complex problems using hand-crafted layers. For each tested
problem domain, i.e. the Multiplexer, Carry-one, Majority-on, and Even-parity domains, we
propose a series of subproblems to enable the LL system to evolve the complex logic
behind the tested problems. The Multiplexer and Carry-one problems lend
themselves to research because they are difficult, highly non-linear and have epistasis.
In the Multiplexer domain, the importance of the data bits is dependent on the address
bits, while, in the Carry-one domain, the first bits of the two half bitstrings occur more
frequently with larger niches in the search space. The Majority-on domain is known for
its highly overlapped niches, which tend to overwrite optimal rules with over-general
ones. Lastly, the Even-parity domain usually requires complex combinations of input
attributes to generalise.
The specific research objectives are as follows:

• Develop methods such that learned knowledge and learned functionality can be
reused for Transfer Learning of Threshold Concepts through LL.

• Determine the necessary axioms of knowledge, functions and skills needed for
any system from which to commence learning.

• Demonstrate the efficacy of the introduced methods in complex domains, i.e. the
Multiplexer, Carry-one, Majority-on, and Even-parity domains.

It is hypothesised that crafting solutions at low-scale problems that scale to any
problem in a domain is more plausible and practical than tackling each
large-scale problem individually. This is considered a necessary step towards continuous learning
systems, which will transition from interrogatable Boolean systems to practical
real-world classification tasks.

2 Background
Figure 1 depicts the main highlights of XCS, a Michigan-style LCS developed by Wilson
(Wilson, 1995). On receiving a problem instance, a.k.a. an environment state, a match
set [M ] of classifiers that match the state is created from the rule population. Each
available action from the match set is assigned a possible payoff. Based on this array
of predicted payoffs, an action is chosen. The chosen action is used to form an action
set [A] from the match set. The system executes the chosen action and receives a cor-
responding reward ρ. The action set is updated regarding the reward and the Genetic
Algorithm (GA) may be applied (Wilson, 1995; Butz and Wilson, 2000). Subsumption
takes place before the offspring are added to the population. If the new population size
exceeds the limit, classifiers are chosen to be deleted until the population size is within
the valid size.
XCS differs from its predecessors in a number of key ways: (1) XCS uses the
prediction accuracy to estimate rule fitness, which promotes a solution encompassing a
full map of the problem via accurate and optimally general rules; (2) evolutionary
operations operate within niches instead of the whole population; and (3) unlike the
traditional LCS, XCS has no message list and therefore it is suitable for learning Markov
environments only (Butz and Wilson, 2000; Wilson, 1995).

[Figure 1 diagram components: Environment, State, Match Set, Action Selection, Rule
Population, Reward, Action Set, Select Parents, Genetic Algorithm, Insert Progeny,
Reinforcement, Update Current Action Set, Update Previous Action Set.]

Figure 1: XCS framework showing the processes in the main loop.

Using Reinforcement Learning (RL), XCS guides the population of classifiers
towards increasing usefulness via numerous parameters, e.g. fitness. The main uses of
this mechanism are to: (1) identify classifiers that are useful in obtaining future
rewards; and (2) encourage the discovery of better rules (Urbanowicz and Moore, 2009). RL
acts independently of covering, where, if the match set is empty, new rules
are created to match the new situation (Bull, 2015). The rules or classifiers are
composed of two main parts, the condition and the action. Originally the condition part
utilised a ternary alphabet composed of {0, 1, #} and the action part utilised the binary
alphabet {0, 1} (Urbanowicz and Browne, 2017).
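
The matching and action-selection steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `Classifier` fields, the example population, and the greedy choice of the best action are assumptions made here, while the fitness-weighted prediction array follows the usual XCS convention (Butz and Wilson, 2000). The reward update, the GA, subsumption and deletion are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str     # ternary string over {0, 1, #}
    action: int        # binary action, 0 or 1
    prediction: float  # predicted payoff
    fitness: float     # accuracy-based fitness


def matches(condition: str, state: str) -> bool:
    """A condition matches when every non-# symbol equals the state bit."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )


def match_set(population, state):
    """Build [M]: all classifiers whose condition matches the state."""
    return [cl for cl in population if matches(cl.condition, state)]


def prediction_array(mset):
    """Fitness-weighted average of the predictions, per advocated action."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for cl in mset:
        numerator[cl.action] += cl.prediction * cl.fitness
        denominator[cl.action] += cl.fitness
    return {a: numerator[a] / denominator[a] for a in numerator if denominator[a] > 0}


def action_set(mset, action):
    """Build [A]: the members of [M] that advocate the chosen action."""
    return [cl for cl in mset if cl.action == action]


# Hypothetical population for a 6-bit input (values chosen for illustration only).
population = [
    Classifier("01#1##", 1, 1000.0, 0.9),
    Classifier("01#0##", 0, 1000.0, 0.9),
    Classifier("0#####", 0, 500.0, 0.1),
]
state = "011100"
M = match_set(population, state)
PA = prediction_array(M)
best = max(PA, key=PA.get)  # greedy (exploit) selection, an assumption of this sketch
A = action_set(M, best)
```

In a full system the chosen action would be executed, the reward used to update the members of `A`, and the GA applied within that action set.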

2.1 CF-based XCSs


LCSs can select/deselect features using generality through the “don’t care” opera-
tor. Originally the don’t care symbol was a ‘#’ hash-mark, which comprised part of
the ternary alphabet {0, 1, #} (Holland, 1976). Since the initial introduction of LCSs,
the number of applicable alphabets has been expanded to include more representa-
tions such as Messy Genetic Algorithms (mGAs), S-Expressions, Automatically De-
fined Functions, and Code Fragments. A Code Fragment (CF) is an expression, similar
to a tree generated in Genetic Programming (Iqbal et al., 2012). CFs generate small
blocks of code in binary trees with an initial maximum depth of two. CFs have also
been expressed using sub-trees with more than two children, with varying degrees of
success (Alvarez et al., 2014b). The initial depth was chosen, based on empirical
evidence, to limit bloating caused by the introduction of large numbers of introns.
Analysis suggests that there is an implicit pressure for parsimony (Iqbal et al., 2013c).
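
To make the idea of a CF concrete, the sketch below represents a fragment as a small binary tree over input bits and evaluates it on a state. The operator set (AND, OR, NAND, NOR), the class names and the example fragment are assumptions chosen for illustration; the text above only fixes that CFs are GP-like binary trees with an initial maximum depth of two.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical Boolean function set for the internal nodes.
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}


@dataclass(frozen=True)
class Leaf:
    index: int  # position of an input bit in the state


@dataclass(frozen=True)
class Node:
    op: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def depth(tree: Tree) -> int:
    """Depth of a tree, counting a single leaf as depth 0."""
    if isinstance(tree, Leaf):
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))


def evaluate(tree: Tree, state: str) -> int:
    """Evaluate a code fragment on a bitstring state, returning 0 or 1."""
    if isinstance(tree, Leaf):
        return int(state[tree.index])
    return OPS[tree.op](evaluate(tree.left, state), evaluate(tree.right, state))


# Example fragment of depth two: D1 AND (D2 OR D4).
cf = Node("AND", Leaf(1), Node("OR", Leaf(2), Leaf(4)))
```

Because a leaf names the input position it reads, the fragment is independent of where it sits in a rule condition.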
LCSs based on CFs can reuse learned information to scale to problems beyond the
capabilities of non-scaling techniques. One such technique is XCSCFC. This approach
uses CFs to represent each condition bit, enabling feature construction in the condition
of the classifiers. The action part uses the binary alphabet {0, 1} (Iqbal et al., 2014).
An important benefit inherent in CFs is the decoupling between a CF and a position
within the condition, i.e. the ordering of the CFs is unimportant. High-level CFs can
capture the underlying complex patterns of data, but also pose a large search space. The recent
LCS, XOF, introduced the Observed List to enable learning useful high-level CFs in rule
conditions (Nguyen et al., 2019b,a). Another way to capture the complex patterns of
data is to utilise CFs as rule actions while keeping the ternary representation for rule
conditions (Iqbal et al., 2013a).
Previously it has been shown that rule-sets learned by a modified CF-based LCS
system, termed XCSCF², can be reused in a manner similar to functions and their
parameters in a procedural programming language (Alvarez et al., 2014a). These learned
functions then become available to any subsequent tasks. These functions are
composed of previously learned rule-sets that map inputs to outputs, which is a
straightforward reformatting of the conditions and actions of rules:

    'If <Conditions> Then <Actions>'                 (1)
    'If <Input> Then <Output>'                       (2)
    Function(Arguments <Input> Return <Output>)      (3)

Eq. 1 is the standard way that a classifier would process its conditions to achieve an
action, which is analogous to Eq. 2. Eq. 3 is the analogy of a function. These functions
take a number of arguments as their input (rule conditions) and return an
output (the effected action of the ruleset) (Alvarez et al., 2014a).
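
The reformatting in Eqs. 1 to 3 can be illustrated by wrapping a learned rule-set in a callable. The sketch below is an assumption-laden simplification: `make_function` is a hypothetical helper, the 3-bit Multiplexer rule-set is a hand-written example, and returning the action of the first matching rule stands in for the prediction-based action selection a real LCS would use.

```python
def make_function(ruleset, default=None):
    """Wrap (condition, action) rules as Function(Arguments <Input> Return <Output>)."""

    def function(arguments: str):
        for condition, action in ruleset:
            same_length = len(condition) == len(arguments)
            if same_length and all(c in ("#", a) for c, a in zip(condition, arguments)):
                return action  # output of the first rule whose condition matches
        return default  # no rule covers this input

    return function


# Hypothetical rule-set for the 3-bit Multiplexer (1 address bit, 2 data bits).
mux3_rules = [
    ("00#", 0),
    ("01#", 1),
    ("1#0", 0),
    ("1#1", 1),
]
mux3 = make_function(mux3_rules)
```

Once wrapped, `mux3` can be called like any other function, which is what allows a learned rule-set to appear as a node in later code fragments.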
The technique used in XCSCF² places emphasis on user-specified problems, rather
than user-specified instances, which is a subtle but important change in emphasis in
EC approaches. That is, the function set is partly formed from past problems rather
than preset functions. The advantage of learning functions is that the related CFs (asso-
ciated building blocks) are also formed and transferred to the new problem, which can
bootstrap the search. However, this technique lacks a rich representation at the action
part, which will need adapting due to the different types of action values expected in
this current work, e.g. binary, integer, and bitstring.

2.2 Scaling Methods for XCS


An early attempt at scaling was the S-XCS system that utilizes optimal populations of
rules, which are learned in the same manner as classical XCS (Ioannides and Browne,
2008). These optimal rules are then imported into S-XCS as messages, thus enabling
abstraction. The system uses human-constructed functions, such as Multiply, Divide,
PowerOf, ValueAt, and AddrOf, among others (Ioannides and Browne, 2008). Al-
though these key functions provide the system with the scaffolding to piece together
the necessary knowledge blocks, they have an inherent bias and might not be avail-
able to the system in large problem domains. For example, in the Boolean domain,
the log and multiplication functions do not exist. It also assumes completely accurate
populations, whereas the proposed system is required to learn both the population and
functionality, from scratch. If supervised learning is permitted (unlike in this work), the
heterogeneous approach of ExSTraCS scales well; up to the 135-bit Multiplexer problem
(Urbanowicz et al., 2012).

Previously, other Boolean problems have been solved successfully by using tech-
niques similar to the proposed work. One of these is a general solution to the Parity
problem described in (Huelsbergen, 1998). The technique is similar to the proposed
work because it evolves a general solution that is capable of solving parity problems of
any length. It can also address repeating patterns similar to the loop mechanism of the
proposed work. On the other hand, this technique makes use of predefined functions
making it a top-down approach. The proposed technique learns new functions, making
it more flexible.
The preliminary work proposed XCSCF* with various components (Alvarez
et al., 2016). Since different types of actions are expected, e.g. Binary, Integer, Real,
and String (Bitstring); it is proposed that the functions be created by a system with CFs
in the action (XCSCFA), although any rule production system can also be used, e.g.
XCS, XCSCFC, etc. This will facilitate the use of real and integer values for the action
as well as enabling it to represent complex functionality. The proposed solution will
reuse learned functionality at the terminal nodes as well as the root nodes of the CFs
since this has been shown to be beneficial for scaling. XCSR would not be helpful here
because on a number of the steps, the permitted actions are not a number but a string
e.g., kBitString. Moreover, XCSR with Computed Continuous Action would present
unnecessary complications to the work because the conditions of the classifiers do not
require real values (Iqbal et al., 2012). Accordingly, it is necessary to explore further
ways to expand the preliminary work to adapt to different domains.

3 The Problems

In this section, we provide an analytical introduction to the tested problems that en-
ables the training flow in layers. These flows help formalise the intermediate layers in
Section 3.4. The problem understanding also provides an initial guess of the required
building blocks (functions and skills) that should be provided beforehand to bootstrap
the learning progress of the system. Although even these pre-provided building blocks
could be divided into more elemental knowledge, this work does not aim to imitate the
education of machine intelligence from scratch. Instead, this paper aims to show the
ability of XCSCF* to learn progressively more complex tasks, which resembles human
intelligence.
One of the underlying reasons for choosing the Boolean problem domains for the
proposed work is that humans can solve this kind of problem by naturally combining
functions from other related domains along with functionalities from other Boolean
problems. Humans are also able to reason that some functions in their ‘experiential
toolbox’ may be appropriate for solving the problem. The experiential toolbox is the
whole of learned functionality for the agent. These functions include multiplication,
addition, power, and the notion of a number line. Therefore, the agent here must build
up its toolbox of functions and associated pieces of knowledge (CFs). A computer
program would make use of these functions and potentially many more, but it cannot
intuit which are appropriate to the problem, and which are not. Therefore, the agent
will need guidance in its learning so that it may have enough cross-domain functions to
solve the problem successfully. It will need to perform well with more functions than
necessary as the exact useful functions may not be known a priori. However, at this
stage of paradigm development, the agent is not expected to be able to adjust to fewer
functions than necessary. The other reason is that Boolean problems are interrogatable,
so that solutions to problems at scales beyond enumeration can still be verified.

3.1 The Multiplexer Domain


In the Multiplexer problems, the number of address bits is related to the length of the
message string and grows along with the length. The search space of the problem is
also large enough to show the benefits of the proposed work. For example, for the
135-bit Multiplexer the search space consists of 2^135 combinations, which is far
beyond enumerated search (Koza, 1991).
An example of a 6-bit Multiplexer is depicted in Figure 2. Determining the
number of address bits k requires using the log function, as given in Equation 4; in this
example k is 2. Then k bits must be extracted from the string of bits to produce the
two address bits. The next step is to convert the address bits into decimal form; this
requires knowledge of the power base 2 function as well as elementary looping, addition
and subtraction functions. Depending on the approach to this step, multiplication may
also be required. The two address bits translate to 1 in decimal form, as shown in Figure
2, i.e. D0 and D1. The decimal number points to the data bit D1 that contains the value
to be returned. The index begins at 0 and proceeds from the left towards the right, as
shown in Figure 2.

[Figure 2 diagram: 6-bit Multiplexer, Condition : Action = 0 1 1 1 0 0 : 1. The condition
is split into Address and Data Channels, with positions labelled D0 D1 D2 D3 D4 D5.]

Figure 2: 6-bit Multiplexer problem showing the address bits and the data bits of the
condition; this distinction is not provided to the learning system.

Besides functions, the experiential toolbox will also contain skills. These are ca-
pabilities that the agent will have learned or will have been given beforehand; one ex-
ample is the looping skill. Skills, unlike functions, do not have a return value, but can
manipulate pointers to values (e.g. move around a bitstring). For example, a human
understands all the operations required for counting k number of bits, starting from the
left of the input string. Then a human would have to conceptualize how to convert the
address bits to decimal, which requires the ability to multiply and add. If we wanted
to increase the difficulty level, we could have the human determine the number of k
address bits required for a particular problem:

k = \lfloor \log_2 L \rfloor    (4)

Equation 4 determines the number of k address bits by using the length of the input.
In this case the person would need familiarity with the log base 2 function as well as
the floor function. A human would eventually determine the address bits with increas-
ing difficulty but a software system would have to learn this functionality before even
attempting to solve the n-bit Multiplexer problem.
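
The chain of steps analysed in this section (address length, address bits, binary-to-decimal conversion, data-bit lookup) can be written out directly as a worked example. This is a hand-coded reference sketch of the target behaviour, not the rules that XCSCF* learns; the function names are assumptions chosen for readability.

```python
import math


def address_length(length: int) -> int:
    """Equation 4: k = floor(log2(L)) for an input of length L."""
    return math.floor(math.log2(length))


def bin_to_int(bits: str) -> int:
    """Convert a bitstring to decimal with an explicit power-of-2 loop."""
    total = 0
    for i, bit in enumerate(bits):
        total += int(bit) * 2 ** (len(bits) - 1 - i)
    return total


def multiplexer(state: str) -> int:
    """Return the data bit selected by the address bits of the state."""
    k = address_length(len(state))  # number of address bits
    address = state[:k]             # the first k bits
    d = bin_to_int(address)         # index among the data bits, starting at 0
    return int(state[k + d])        # position of the data bit in the full input
```

For the instance in Figure 2, `multiplexer("011100")` computes k = 2, reads the address `01` as 1, and returns the data bit at index 1, which is 1 and agrees with the action shown in the figure. For lengths far beyond those used here, the integer-only `length.bit_length() - 1` would avoid any floating-point rounding in the logarithm.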

3.2 The Carry-one Domain


The Carry-one domain is the set of problems that checks whether the addition of two
numbers, in the form of binary numbers, carries one in the addition of the highest-level
bits of the two numbers. Binary numbers are represented by bitstrings. The input of
Carry-one problems is a bitstring concatenating the two bitstrings of the two binary
numbers to be added.
Humans can approach this problem in various ways. In this work, we design a
training flow to check whether the summation of two binary numbers, as a bitstring,
is longer than each of the two half-bitstrings representing the two
binary numbers. First, the learning agent should learn to detach the two half-bitstrings to
obtain the two binary numbers. However, the learning agent has no idea about which
parts represent the bitstrings of the two binary numbers. Therefore, the first training
step is to teach the learning agent to obtain the half length of the input attributes. Then,
the next step is to train the agent to extract two half-bitstrings as the two binary num-
bers to be added with the knowledge of how many bits to be used. After that, the
learning agent is required to obtain the bitstring representing the result of the binary
summation. Finally, the last training stage is to check whether a bit 1 is carried at the
highest-level bit. Humans can anticipate that the solutions for these processes would
possibly require the following skills and functions: binary addition, head list extrac-
tion, tail list extraction, value comparison, division, and constant (the “Half Length”
problem would require a constant number of value 2).
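
The Carry-one training flow described above can be sketched as a chain of small functions, one per stage. This is a hand-written reference for the intended behaviour under the assumption of an even-length input; the function names mirror the stage names but are otherwise hypothetical, and Python's built-in integer conversion stands in for the binary addition skill.

```python
def half_length(state: str) -> int:
    """Half Length: divide the input length by the constant 2."""
    return len(state) // 2


def head_string(state: str, n: int) -> str:
    """Head String: the first n bits (the first binary number)."""
    return state[:n]


def tail_string(state: str, n: int) -> str:
    """Tail String: the last n bits (the second binary number)."""
    return state[len(state) - n:]


def binary_sum(a: str, b: str) -> str:
    """Binary Sum: add two bitstrings and return the result as a bitstring."""
    return format(int(a, 2) + int(b, 2), "b")


def is_carried(state: str) -> bool:
    """Is Carried: the sum is longer than each half when a 1 is carried out."""
    h = half_length(state)
    total = binary_sum(head_string(state, h), tail_string(state, h))
    return len(total) > h
```

For example, `is_carried("1101")` adds `11` and `01` to obtain `100`, whose length 3 exceeds the half length 2, so a bit is carried.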

3.3 The Even-parity and Majority-on Domains


Even-parity problems check whether the number of 1 bits in the input bitstring is even.
The operations of this problem domain are straightforward. We devised the training
flow with two steps in Section 3.4, comprising the “Sum Modulo 2” and “Is Even-parity”
problems. For the Majority-on domain, the learning agent is asked to check whether the
majority of bits in the input are 1s. We can anticipate that this problem domain would
involve the subproblem “Half Length” from the Carry-one domain. The second
training step is to teach the learning agent to compare the summation of 1 bits with
the output of the “Half Length” problem. These two domains would be expected to
require summation of bitstring, modulo 2, constant, and comparison skills.
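
Both training flows reduce to a bit count followed by one comparison, as the sketch below shows. The function names are hypothetical, and two details are assumptions of this sketch rather than statements from the text: the half length is computed with true division, and a tie (exactly half the bits set) is treated as not Majority-on.

```python
def string_sum(state: str) -> int:
    """StringSum: count the 1 bits in a bitstring."""
    return sum(int(bit) for bit in state)


def sum_mod2(state: str) -> int:
    """Sum Modulo 2: parity of the number of 1 bits."""
    return string_sum(state) % 2


def is_even_parity(state: str) -> bool:
    """Is Even-parity: true when the number of 1 bits is even."""
    return sum_mod2(state) == 0


def is_majority_on(state: str) -> bool:
    """Majority-on: the number of 1 bits exceeds half the input length."""
    return string_sum(state) > len(state) / 2
```

For instance, `0110` has even parity but is not Majority-on, while `0111` has odd parity and is Majority-on.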

3.4 Individual Detailed Components


According to the analysis above, the probable methods of separating the Multiplexer
and Carry-one domains are shown in Figure 3 and Figure 4 respectively. These train-
ing flows require a “human teacher” to form these “curricula”. The training flow of
the Multiplexer domain has five main steps corresponding to the first six subproblems
listed below, while the Carry-one domain is divided into six other subproblems from
“Half Length” to “Is Carried” problems. Each subsequent part builds upon the rules
learned from the previous step as well as from the Axioms provided. Figure 5 illus-
trates the relationships between the Axioms, skills and learned functionality and their
CF representation in Multiplexer subproblems. The figure also depicts how the type
of problem faced can feed domain specific functionality into the experiential toolbox
of the system. This is shown by the arrow flowing from the Multiplexer domain to-
wards the Experiential Toolbox. All subproblems for the four benchmark problems
are described below with samples provided in the Supplementary material. Table 2
shows the set of functions to be learned; note that these were furnished in order, as
a curriculum. These functions correspond to the subproblems of the curriculum.

[Figure 3 diagram. Stages, from input to output: Multiplexer Input bitstring, Address
Length, Address Bits, Data channel, Data bit Position, Mux data bit, Data Bit. Values
passed between stages: int(k), String(A), int(d), int(D). Legend: k: length of address
bits; A: address bits; d: data bit index (among data bits); D: data bit position.]

Figure 3: Multiplexer training flow. Each stage of this flow is designed to obtain an
aspect of the logic behind the Multiplexer domain. These stages follow the analysis of
the Multiplexer problem in Section 3.

At each step, the system has access to leaf node candidates, hard-coded functions,
the learned CFs, and learned CF-ruleset functions. Table 1 shows a listing of all the
skills made available to the system along with their system tags (used to interpret re-
sults) and their input/output data types. This function list is anticipated to be useful
based on the above analysis of the benchmark problem domains. In addition to re-
quired skills, we also provide extra skills that complement the anticipated ones. These
skills could possibly provide unexpected solutions or at least test the ability of XCSCF*
to ignore redundant irrelevant skills.
It is important to note that the work presented here does not seek to provide a
learning plan for a system to follow and ultimately arrive at the solution to a given
problem. The aim here is to facilitate learning in a series of steps, where in this case
the learned functionality could potentially help a system to arrive at a general solution
to any set problem. In other words, it is important for the system to learn to mix and
match the different learned functions in a way contributive to learning; a way that
will produce a general solution. The number of subordinate problems can always be
increased in the future, e.g. learning basic functions such as an adder or a multiplier
via Boolean functions or even learning the log function via training data.

Multiplexer Address Length - kBitsGivenLength


The first step is to determine the number of k address bits that will contribute to the
solution for the n-bit Multiplexer. The length function (Table 1) furnishes the system
with the length of the environment state instead of the constant L in the previous ver-

Evolutionary Computation Volume x, Number x 9


I M Alvarez et al.

Carried Bit

Carry-one
A: first number Is Carried
B: second number
L: length of input Length(A+B)
Length of
Binary Sum
bitstring(A+B)

Binary Sum

bitstring(A) bitstring(B)

Head String Tail String

L/2
Half Length

Carry-one
Input bitstring

Figure 4: Carry-one training flow.

Table 1: Functionality Provided (Hard-coded functions)

Functions Tags Input Output


Floor [ Float Integer
Ceiling ] Float Integer
Log { Float Float
Length L String Integer
Power 2 Loop (binary to decimal) 2d String Integer
Add + Floats, Integers Integer
Subtract − Floats, Integers Float
Multiply ∗ Floats Integer
Divide / Floats Integer
ValueAt @ String, Integer Binary
Constant c None Integer
StringSum sum String Integer
BinaryAddition ⊕ Strings String
BinarySubtraction Strings String
HeadList ( String, Integer String
TailList ) String, Integer String
isGreater > Floats, Integers Binary
Modulo % Integers Integer

10 Evolutionary Computation Volume x, Number x


A Layered Learning Approach for LCSs

[Figure 5 diagram. Labels: LCS; Axiomatic (Functions: Floor, Ceiling, etc.; Constants: 1, 2, ..., L; Skills: Move Left, Move Right, Loop); Leaf-node Candidates; Experiential Toolbox (General, Mux specific, Carry specific, ?); problems: Mux Problem, Carry-one Problem, ?.]

Figure 5: Training encompasses different types of functions, skills and axioms. The
experiential toolbox will contain general and problem specific learned functionality.
The question marks indicate the next domain and functionality learned from it.

Table 2: Functions to be learned.

Functions          Tags  Input    Output
KBitsGivenLength   kl    Integer  Integer
KBits              k     String   Integer
KBitString         ks    String   Integer
Bin2Int            b2d   String   Float
AddressOf          dc    String   Float
ValueAt            M@    String   Binary
HalfLength         h     Integer  Integer
HeadString         Sh    String   String
TailString         St    String   String
BinarySum          S+    String   String
LengthBinarySum    L+    String   Integer
isCarried          iC    String   Boolean
SumMod2            sm2   String   Integer
isEvenParity       iPe   String   Boolean
isMajorityOn       iM    String   Boolean

sion of XCSCF* (Alvarez et al., 2016). The training data-set used consists of instances of
possible lengths and the corresponding number of address bits.
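As an illustration of the mapping this subproblem targets, the sketch below builds (length, k) pairs for valid Multiplexer scales and checks one candidate relation, the floor of a logarithm, which is consistent with the Floor and Log functions in Table 1. The base-2 logarithm and the names `k_bits_given_length` and `make_training_set` are assumptions made for this example, not details taken from the system.

```python
import math


def k_bits_given_length(length: int) -> int:
    """Hypothetical stand-in for the learned function kl.

    Assumes a base-2 logarithm: for a Multiplexer of length n = k + 2**k,
    floor(log2(n)) recovers k because 2**k <= n < 2**(k + 1).
    """
    return math.floor(math.log2(length))


def make_training_set(max_k: int) -> list[tuple[int, int]]:
    """Build (length, k) pairs for valid Multiplexer scales up to max_k."""
    return [(k + 2 ** k, k) for k in range(1, max_k + 1)]


# Scales from the 3-bit (k = 1) to the 135-bit (k = 7) Multiplexer.
training_set = make_training_set(7)
```

The relation holds for every valid scale because k is always smaller than 2**k, so the length never reaches the next power of two.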

Multiplexer Address Length - kBits


This step is to determine the number k of address bits when the input is the original
input of the Multiplexer problem. The training dataset in this problem replaces the
input lengths of the previous “kBits given Length” problem (kl) with the input bitstring
of the Multiplexer problem at various scales.


Multiplexer Address Bits - kBitString


This part extracts the first k bits from a given input string. The dataset consists of random
bitstrings (of length 6, say) together with a given length k; the action is the first k bits.

Multiplexer Data Channel - Bin2Int

This problem trains the learning agent to convert a binary number to a decimal integer.
This is crucial because the system needs this information to determine the position of
the data bit. However, this is not a trivial task, as the system needs to be cognizant
of many functions that a human would potentially already have in their experiential
toolbox. The dataset consists of random bitstrings, with the action being the equivalent
integer.
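The target behaviour can be sketched as a loop over powers of 2, in the spirit of the "Power 2 Loop" entry in Table 1. The function name `bin2int` and the most-significant-bit-first ordering are assumptions made for this example.

```python
def bin2int(bits: str) -> int:
    """Convert a bitstring to an integer by summing powers of 2.

    Assumes the most significant bit comes first, as in ordinary
    binary notation.
    """
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position  # weight of this bit position
    return total
```

A training instance would then pair a random bitstring such as "101" with the action 5.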
Multiplexer Data Bit Position - AddressOf

This functionality determines the location of the data bit given the input bitstring. This
problem guides the learning agent to discover the addition of the address length
and the decoded data channel. The dataset consists of random bitstrings and their decoded
addresses, with an integer action.

Multiplexer Data Bit - ValueAt

The functionality to be learned is to return the bit referenced from a bitstring. The
system is trained using a dataset of bitstrings of varied lengths (from 3 to 20) with a
reference integer and the corresponding output bit. This problem is effectively a Multiplexer
problem with variable scales.
Sum Modulo 2 - SumMod2

This problem is the first step of training in the Even-parity problem domain. It determines
whether the total number of 1-bits in the input bitstring is even or odd. The ground
truth of this problem is the sum of all bits in the input bitstring modulo 2.

Is Even-parity - isEvenParity

This is the final step of training in the Even-parity domain. This problem expects
True if the number of 1-bits in the input bitstring is even and False otherwise. It is
variable in size (from 1-bit to 11-bit Even-parity problems) to encourage only general
solutions for the Even-parity problem domain. Because it is learned at varied scales, the solution can
solve the problem at any scale. Even-parity problems of relatively small scale
are already intractable for traditional XCS with the ternary encoding because XCS must
form a one-to-one mapping of instances to rules.
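The following sketch illustrates both steps and the reason behind the one-to-one mapping: flipping any single bit always changes the parity class, so no position of an accurate ternary condition can be generalised to a don't care symbol. The function names are hypothetical and the check is exhaustive only for the small n passed in.

```python
from itertools import product


def sum_mod2(bits: str) -> int:
    """Ground truth of the SumMod2 step: sum of all bits modulo 2."""
    return sum(int(b) for b in bits) % 2


def is_even_parity(bits: str) -> bool:
    """True when the number of 1-bits is even."""
    return sum_mod2(bits) == 0


def flip(bits: str, index: int) -> str:
    """Return a copy of bits with the bit at index inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


def every_bit_matters(n: int) -> bool:
    """True if flipping any single bit of any n-bit input changes the class."""
    return all(
        is_even_parity("".join(s)) != is_even_parity(flip("".join(s), i))
        for s in product("01", repeat=n)
        for i in range(n)
    )
```

Since every bit matters, an accurate ternary rule set for the n-bit problem needs 2**n fully specified conditions, which is what makes a computed action such as the sum modulo 2 attractive.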
Half Length - HalfLength

This problem is a regression problem, which requires the learning agent to return the
half-length of the input bitstring. This problem can provide prerequisite knowledge for
both the Carry-one and Majority-on domains.

Head String - HeadString

This is a step in the training flow of the Carry-one domain, see Figure 4. It trains the
learning agent to obtain the first number of the addition, which is the first half of the
input bitstring. The outputs are binary numbers represented by bitstrings.

Tail String - TailString

This step is similar to the “Head String” problem but the expected output is the latter
half of the input bitstring or the second number of the addition.


Binary Summation of Two Strings - BinarySum


This problem requires the learning agent to add the outputs of the two preceding problems,
i.e. the two input numbers of the Carry-one domain. The sum is also
represented as a bitstring.

Length of Binary Sum - SumStringLength


The expected output of this problem is the length of the output from the preceding
problem. The aim is to learn to predict the length of the binary number that results from adding
the binary number of the first half and the binary number of the second half.

Is Carry-one - isCarried
This requires the general logic behind the Carry-one problem domain. It is to determine
whether 1 is carried at the highest bit when adding the binary number of the first half
and the binary number of the second half. The scales of this problem were set to vary
from 2-bit to 12-bit.
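The Carry-one training flow (Figure 4) can be sketched end to end as below. This is a hand-written illustration under stated assumptions, not the learned rule: the input length is assumed to be even, the sum is written without leading zeros, and all function names are hypothetical.

```python
def half_length(bits: str) -> int:
    """Half of the input length (the input is assumed to have even length)."""
    return len(bits) // 2


def head_string(bits: str) -> str:
    """First number of the addition: the first half of the input."""
    return bits[:half_length(bits)]


def tail_string(bits: str) -> str:
    """Second number of the addition: the second half of the input."""
    return bits[half_length(bits):]


def binary_sum(a: str, b: str) -> str:
    """Add two bitstrings and return the sum as a bitstring (no zero padding)."""
    return format(int(a, 2) + int(b, 2), "b")


def is_carried(bits: str) -> bool:
    """A 1 is carried out when the sum needs more bits than either operand."""
    total = binary_sum(head_string(bits), tail_string(bits))
    return len(total) > half_length(bits)
```

For the 4-bit input "1101", the operands are "11" and "01", the sum is "100", and its length 3 exceeds the half length 2, so a bit is carried.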

Is Majority On - isMajorityOn
This problem is the final stage of training in the Majority-on domain. It expects a returned
value of True if more than half the bits in the input are on (1), and False otherwise. The
sizes of the input bitstrings are randomly selected from the range [1, 7] bits.

4 The New Method

This work disrupts the standard learning paradigms in EC, where the goal is to learn
abilities using a top-down approach, by aligning it with LL. The proposed work uses a
bottom-up approach by learning functions and using parts or entire functions to solve
more difficult problems. In other words, the method here is to specify the order of
problems/domains (together with robust parameter values) while allowing the system
to automatically adjust the terminal set through feature construction and selection, and
ultimately develop the function set. This is analogous to a school teacher determining
the order of threshold concepts for a student in a curriculum (Meyer and Land, 2006). The
system can use learned rule-sets as functions along with the associated building blocks,
i.e. CFs, that capture any associated patterns; this is an advantage over pre-specifying
functionality.
This method modifies the intrinsic problem from finding an overarching ‘single’
solution that covers all instances or features of a problem to finding the structure (links)
of sub-problems that construct the overall solution. The underlying patterns
that describe the domain are anticipated to be more compact and reusable, as they do
not grow as the domain scales (unlike individual solutions, which can grow impractically
large as the problem grows, e.g. DNF solutions to the Multiplexer problem).
We employed an adapted XCSCFA as the algorithm for the agents of XCSCF* that
learn subproblems. This XCSCF* has the type-fitting property, which: (1) verifies the
type compatibility between connected nodes within generated CFs; and (2) ensures that the output
type of CFs is compatible with the required actions of the current problem environment².

2 There are sufficient novel contributions to XCSCF* to warrant a new acronym, but as the old one is now
superseded and the LCS field already has many acronyms, XCSCF* is retained.


Table 3: Leaf node candidates. LEN is the length of input bitstring.

Leaf node candidates              Tag             Type
Base CFs of separated attributes  D0, D1, etc.    type of attributes
List of all attributes            attlst          String
Constants                         1, 2, ..., LEN  Integer

4.1 Type-fitting XCSCFA


In addition to type-fitting CFs, the next important adjustment is that pre-provided CFs have
been introduced to this version of XCSCF*. Table 3 describes the pre-provided CFs, which
are all candidates for leaf nodes of CFs. First, leaf node candidates include the CFs
representing input attributes, called “base CFs”, which resemble base CFs in XOFs
(Nguyen et al., 2019b).
In the previous version of XCSCF* (Alvarez et al., 2016), a constant L for the length
of input bitstrings was provided as a possible leaf node for CFs. This feature is infeasible
when the subproblems have variable scales. Therefore, the second change to XCSCF*
is to provide another base CF listing all attributes (termed attlst) in the order provided
by the problem. We hypothesise that this new feature improves the generality of the
system by providing inborn knowledge. Lastly, to learn more general problems, it is
necessary to provide the system with arbitrary constants. Within the limits of Boolean
problems, CF generation can access constants (constant CFs) with values from 1 to the ‘length
of the current input attributes’ as possible leaf nodes. The constant L of the previous
implementation can be obtained by the provided function Length in Table 1.
Type-fitting Code Fragments
Inspired by Strongly-Typed GP (Montana, 1995), we propose here the type-fitting prop-
erty for CFs to reduce the search space by fitting each node with only compatible inputs
and outputs. CFs with the new type-fitting property, called typed CFs, are designed to
create workable and eligible CFs. Being workable refers to the compatibility of the out-
put of CFs with the expected actions of the target problem and the compatibility among
the function nodes of a CF. The output type of a function in the node cf_i must be compatible
with the input types of the function in the node cf_j that takes cf_i as input.
Being eligible includes two conditions: the output type of the root node function must
be compatible with at least one of the action types of the problem, and the leaf nodes
are CFs from the leaf-node candidates. For example, when selecting a leaf node that is
the first input to a function that requires the first input to be of type String (e.g. sum,
@, 2d, etc., see Section 3.4), the only compatible leaf node candidate is the attlst. Ul-
timately, the type-fitting property keeps learning agents from generating unworkable
CFs.
Accordingly, generating typed CFs applies a top-down recursive process of gen-
erating tree nodes, i.e. the function genNode illustrated in Algorithm 1. We keep the
depth limit of CFs as 2, as is the original definition of CFs (Iqbal et al., 2014). Generat-
ing a new CF needs to match with the action types of the problem and available output
types from the leaf-node candidates. First, the top node of a typed CF must employ a
function with output types compatible with the action types of the problem. Then the
process recursively builds lower-level nodes that satisfy the type-fitting property. At
any point when generating nodes, there is also a fixed probability of 0.5 for generating
a leaf node from the leaf-node candidates, which stops the CF from going any deeper.
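The top-down recursive generation described above can be sketched as follows. This is a simplified illustration rather than the procedure of Algorithm 1: the miniature function set, the leaf-node table, exact type matching (no compatibility between different types), and the names `gen_cf`, `gen_node`, `output_type`, and `is_well_typed` are all assumptions made for the example, and depth is counted in function levels.

```python
import random

# Hypothetical miniature function set: tag -> (input types, output type).
FUNCTIONS = {
    "sum": (("String",), "Integer"),
    "L": (("String",), "Integer"),
    "%": (("Integer", "Integer"), "Integer"),
    ">": (("Integer", "Integer"), "Binary"),
}
# Hypothetical leaf-node candidates grouped by output type (cf. Table 3).
LEAVES = {"String": ["attlst"], "Integer": ["c1", "c2"], "Binary": ["D0", "D1"]}


def gen_node(out_type, depth, rng):
    """Return a typed subtree (nested tuples) that outputs out_type."""
    candidates = [t for t, (_, out) in FUNCTIONS.items() if out == out_type]
    # Stop at the depth limit, when no function fits, or with probability 0.5.
    if depth == 0 or not candidates or rng.random() < 0.5:
        return rng.choice(LEAVES[out_type])
    tag = rng.choice(candidates)
    in_types, _ = FUNCTIONS[tag]
    return (tag,) + tuple(gen_node(t, depth - 1, rng) for t in in_types)


def gen_cf(action_type, rng, depth_limit=2):
    """The root must be a function whose output matches the action type."""
    tag = rng.choice([t for t, (_, out) in FUNCTIONS.items() if out == action_type])
    in_types, _ = FUNCTIONS[tag]
    return (tag,) + tuple(gen_node(t, depth_limit - 1, rng) for t in in_types)


def output_type(node):
    """Output type of a leaf name or of a function node."""
    if isinstance(node, str):
        return next(t for t, names in LEAVES.items() if node in names)
    return FUNCTIONS[node[0]][1]


def is_well_typed(node):
    """Check that every child outputs the type its parent expects."""
    if isinstance(node, str):
        return True
    in_types, _ = FUNCTIONS[node[0]]
    return all(
        output_type(child) == t and is_well_typed(child)
        for child, t in zip(node[1:], in_types)
    )
```

Because each child is generated for the exact input type its parent requires, every tree produced this way is well typed by construction, so the validity check only serves to confirm the property.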
To reduce the search space and generate verifiable CFs, it is necessary to have


Algorithm 1 Typed CFs are generated based on a recursive function for generating
nodes. The function is given the set of action types T_a, the type set of base CFs T_b, the
expected output types T_o, the expected input types T_i, the intermediate level l_i, and a
clustered set of all functions S_f.

procedure GENNODE(T_o, T_i, l_i)
    T_i' = ∅
    if l_i = 2 then
        Output types T_o = T_a
    if l_i = 1 then
        Output types T_i' = T_b ∪ {integer}
    Filter function set S_filtered from S_f by required output types T_o and input types T_i
    Function f = randomSelect(S_filtered)
    for index i in f.inputs do
        if l_i − f.level > 0 and random[0, 1) < 0.5 then
            f.inputs[i] = GENNODE(f.input_types[i], T_i', l_i − f.level)
        else
            Set of compatible base CFs S_bCF = ∅
            if integer ∈ f.input_types[i] then
                c = randomSelect([1, ..., len(Atts)])
            for cf_base in all base CFs do
                if cf_base.out_types & f.input_types[i] ≠ ∅ then
                    Add cf_base to S_bCF
            f.inputs[i] = randomSelect(S_bCF)

compatibility rules among the four value types (Binary, Integer, Real, and String). We
followed the sense of numerics as well as the type compatibility of the programming
language (Python) to devise compatibility rules among types. Boolean variables are
compatible with integers and floats, and integers are compatible with floats; the
compatibility does not hold in the opposite direction. Lists are not compatible with other types.
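These directed rules can be written down as a small lookup table. The sketch below is one possible encoding, assuming the four type names listed above; the table `COMPATIBLE_AS` and the helpers `is_compatible` and `accepts` are hypothetical names introduced for illustration.

```python
# Directed compatibility: each type maps to the types it may be used as.
COMPATIBLE_AS = {
    "Binary": {"Binary", "Integer", "Real"},
    "Integer": {"Integer", "Real"},
    "Real": {"Real"},
    "String": {"String"},
}


def is_compatible(output_type: str, input_type: str) -> bool:
    """True if a node producing output_type can feed an input of input_type."""
    return input_type in COMPATIBLE_AS[output_type]


def accepts(output_types: set[str], input_types: set[str]) -> bool:
    """Set version: some output type is usable as some accepted input type."""
    return any(is_compatible(o, i) for o in output_types for i in input_types)
```

The one-way direction mirrors Python's numeric behaviour, where a Boolean can be used wherever an integer is expected (`isinstance(True, int)` is true) but not the reverse.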

5 Results
5.1 Experimental Setup
The experiments were executed 30 times with each having an independent random
seed. The stopping criterion was when the agent completed the number of training
instances allocated, which were chosen based on preliminary empirical tests on the
convergence of systems. The proposed systems were compared with XCSCFC and XCS.
The settings for the experiments are common to the LCS field (Urbanowicz and Browne,
2017) and similar to the settings of the previous version of XCSCF* (Alvarez et al.,
2016). They were as follows: payoff 1,000; the learning rate β = 0.2; the probability
of applying crossover to an offspring χ = 0.8; the probability of mutation μ = 0.04;
the probability of using a don’t care symbol when covering P_don’tCare = 0.33; the
experience required for a classifier to be a subsumer θ_sub = 20; the initial fitness value
when generating a new classifier F_I = 0.01; the fraction of classifiers participating in a
tournament from an action set 0.4. In addition, the error threshold ε_0 was set to 10.0. This
new XCSCF* naively uses the same population size N = 1000 for all problems.
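For reproduction purposes, the settings above can be collected in a single immutable configuration object. The sketch below only restates the values listed in this section; the class name, the field names, and the choice of seeds 0 to 29 are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XCSParameters:
    """Parameter values as listed in the experimental setup."""
    payoff: float = 1000.0
    beta: float = 0.2                  # learning rate
    chi: float = 0.8                   # crossover probability
    mu: float = 0.04                   # mutation probability
    p_dont_care: float = 0.33          # don't care probability when covering
    theta_sub: int = 20                # experience required to subsume
    initial_fitness: float = 0.01
    tournament_fraction: float = 0.4   # fraction of the action set
    epsilon_0: float = 10.0            # error threshold
    population_size: int = 1000
    runs: int = 30


PARAMS = XCSParameters()
# Hypothetical seeds: one independent seed per run.
SEEDS = list(range(PARAMS.runs))
```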

5.2 Experimental Tests


Figures 6a - 6f show that training was successful in the sub-problems, which enabled
XCSCF* to reuse the learned CF functionality of the Multiplexer problem. XCSCF*
also successfully solved the subproblems of the Carry-one domain (Figures 7a - 7f),
the Even-parity domain (Figures 8a and 8b), and the Majority-on domain (Figures 7a
and 8c) (note the use of the HalfLength problem twice). The numbers of rules after
compaction and of CFs generated were generally only 1 for all problems, except for the
“Is Even-parity” problem, whose final solutions showed some diversity in their genotypes (see
Section 5.3). Reusing solutions from small-scale problems to solve large-scale problems
is plausible because maximally general rules remain general, without specific condition
bits, when used in larger-scale problems. This requires the logic behind the rule actions
of the final solutions to be generalisable to the learned problems.
[Figure 6 panels, each plotting Performance against Instances (x 1000, from 0 to 500): (a) kBitsGivenLength, (b) kBits, (c) kBitString, (d) Bin2Int, (e) AddressOf, (f) ValueAt.]

Figure 6: Learning curves of the subproblems of the Multiplexer domain.

Figure 9 shows that only the proposed system XCSCF* and XCSCFC were able
to solve the 135-bit Multiplexer problem. These experiments followed the standard
explore and exploit phases of XCS. This shows scaling by relearning, whereas the aim of this
work is to capture the underlying patterns without retraining.


[Figure 7 panels, each plotting Performance against Instances (x 1000, from 0 to 500): (a) HalfLength, (b) HeadString, (c) TailString, (d) BinarySum, (e) SumStringLength, (f) isCarried.]

Figure 7: Learning curves of the subproblems of the Carry-one domain.

Tests were conducted on the final rules produced by the final subproblem of the
Multiplexer, the Carry-one, the Even-parity, and the Majority-on domains to determine
if they were general enough to solve the corresponding problems at very large scales.
Table 4 shows that the rule produced by the small-scale Multiplexer problem was able
to solve the 1034-bit and even the 8205-bit Multiplexer problems³. Similarly, the final
rules of the final subproblems of the other domains also achieved 100% accuracy on the
corresponding problems at all tested large scales. The system used to test the generality
of the rules was in straight exploitation: there was no covering, rule generation, or rule
update.
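The exploitation-only protocol can be sketched as a loop that scores a frozen rule on randomly sampled instances. The rule below is a hand-written stand-in for a learned general rule, not the rule produced by XCSCF*; the function names, the sample size, and the use of the integer bit length to obtain floor(log2(length)) are assumptions made for this example.

```python
import random


def multiplexer_oracle(bits: str, k: int) -> str:
    """Ground truth: the k address bits select one of the 2**k data bits."""
    return bits[k + int(bits[:k], 2)]


def general_rule(bits: str) -> str:
    """Frozen rule: derives k from the input length instead of being told k."""
    k = len(bits).bit_length() - 1  # equals floor(log2(length))
    return bits[k + int(bits[:k], 2)]


def exploit_accuracy(k: int, samples: int, rng: random.Random) -> float:
    """Exploitation only: no covering, rule generation, or rule update."""
    length = k + 2 ** k
    correct = 0
    for _ in range(samples):
        bits = "".join(rng.choice("01") for _ in range(length))
        correct += general_rule(bits) == multiplexer_oracle(bits, k)
    return correct / samples


# 1034-bit Multiplexer: k = 10 address bits and 1024 data bits.
accuracy_1034 = exploit_accuracy(k=10, samples=200, rng=random.Random(1))
```

Sampling is unavoidable at these scales, since the instance space of the 1034-bit problem alone has 2**1034 members.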

5.3 Rules Generated by the Final Subproblems


The rule produced by the “Multiplexer Data Bit” problem, the final subproblem of the
Multiplexer domain, is illustrated in Table 5. This rule is maximally general with no
specified bit in its rule condition. Also, it seems very simple and neat for the general
logic of the Multiplexer domain. This is not surprising given the functions accumulated
by the experiential toolbox. The fully expanded tree in the rule action produced by the
3 Note that 2^8205 is a vast number, meaning that testing a million instances covers only a
fractionally small sub-sample, but it will still identify many deficiencies.


[Figure 8 panels, each plotting Performance against Instances (x 1000, from 0 to 500): (a) SumModulo2, (b) isEvenParity, (c) isMajorityOn.]

Figure 8: Learning curves of the subproblems of the Even-parity ((a) and (b)) and
Majority-on (c) domains. The Majority-on domain also utilises the “Half Length”
subproblem of the Carry-one domain (Figure 7a).

Table 4: Accuracy tested on large-scale problems reusing solutions from final subproblems
without training.

Problems              Accuracies
1034-bit Multiplexer  100%
8205-bit Multiplexer  100%
100-bit Carry-one     100%
200-bit Carry-one     100%
50-bit Even-parity    100%
100-bit Even-parity   100%
50-bit Majority-on    100%
105-bit Majority-on   100%

“Multiplexer Data Bit” problem, the final subproblem of the Multiplexer domain, is
illustrated in Figure 10. Function nodes follow the function tags in Table 1. The dashed
boxes in this figure are the reused learned functions with names defined in Figure 2.
The tree in Figure 10 is the rule action of the one compacted rule for the n-bit
Multiplexer problem. It accumulates a high-level function with many nested-layers of
complexity. This complex tree can encapsulate the logic behind the n-bit Multiplexer
problem through the guidance of all Multiplexer subproblems. For instance, the main
building block of this tree is in the code fragment CF 61, which provides the data bit
position in the input bitstring using the function dc learned from the “Data Bit Posi-
tion” problem. This function dc is also a complex function involving an addition (+)
of the outputs from two reused functions within it, k from the “Multiplexer Address
Length” problem and b2d from the “Multiplexer Data Channel” problem. The function
b2d converts the binary-string output of the function ks from the “Multiplexer Address
Bits” problem to a decimal value. ks returns the first “Multiplexer Address Length”
bits from the attlst (the input bitstring) using the function k. The function k is also a
nested function reusing a simpler function kl from the “Multiplexer Address Length”


[Figure 9 plot: Performance against Instances (x 1000, from 0 to 4000) on the 135-bit Multiplexer. Curves: [1] XCSCF*, [2] XCSCFC, [3] XCS.]

Figure 9: Performance of XCSCF* and XCSCFC on the 135-bit Multiplexer problem.
A Wilcoxon signed-rank test shows no significant difference when converged.

problem (with Multiplexer scale as the input). The block k is reused twice in the final
solution M@ for the Multiplexer domain. The logic of the n-bit Multiplexer problem
in the compacted rule with M@ in Table 5 was validated on the 1034-bit and 8205-bit
Multiplexer problems (see Table 4).
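The nesting described above can be mirrored with one small function per learned tag. The sketch is an interpretation of the description, not the evolved tree itself: the function bodies, the base-2 logarithm, and zero-based indexing for ValueAt are assumptions of this example, and `m_at` is a hypothetical name standing in for M@.

```python
import math


def kl(length: int) -> int:
    """Address length from a given length (floor of an assumed base-2 log)."""
    return math.floor(math.log2(length))


def k(attlst: str) -> int:
    """Address length from the input itself; reuses kl on Length(attlst)."""
    return kl(len(attlst))


def ks(attlst: str) -> str:
    """Address bits: the first k bits of the input; reuses k."""
    return attlst[:k(attlst)]


def b2d(bits: str) -> int:
    """Binary string to decimal value."""
    return int(bits, 2)


def dc(attlst: str) -> int:
    """Data bit position: address length plus decoded address; reuses k, b2d, ks."""
    return k(attlst) + b2d(ks(attlst))


def m_at(attlst: str) -> str:
    """Final solution: the value of the input at position dc."""
    return attlst[dc(attlst)]
```

In this rendering, k is called once directly in dc and once through ks, matching the observation that the block k appears twice in the final solution, and nothing in the chain depends on a fixed problem scale.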

Table 5: Final rules learned before compaction while solving the “Multiplexer Data Bit”
problem, the final subproblem of the Multiplexer domain, cf. Figure 10.

Condition Action
# # ... # # attlst CF 61 @

Other final rules of the Carry-one, Majority-on, and Even-parity domains also
achieve maximal generality with all “don’t-care” bits in the condition part. These rules
were also validated on the corresponding domains at very large scales. The trees in the
rule actions of these final rules are illustrated in Figures 11, 12, and 13, respectively. Like
the Multiplexer domain, the Carry-one problem domain requires six subproblems
to obtain the final logic, which resulted in a high complexity of the rule action. The final
function iC has five distinct nested functions within it and three occurrences of function
h. The complexity of the solution for the n-bit Carry-one problem is equivalent to
the complexity of the function for the n-bit Multiplexer.
As the training flows of the Majority-on and Even-parity domains are straightforward,
XCSCF* also discovered simpler rule actions in the final rules. XCSCF* yielded
several different solutions for the Even-parity domain. The two most common ones
are illustrated in Figure 13. Solution 1 in this figure appeared in most runs. Another
solution, found in only two runs, is identical in logic to solution 2, but the node c1
(constant CF of value 1) is replaced with another CF that uses the division operator


[Figure 10 tree. Root: @ (M@), producing the MUX output from attlst and CF61. Function tags shown: @, +, [, {, (, 2d, L. Learned functions in dashed boxes: kl, k, ks, b2d, dc. Leaf: attlst.]

Figure 10: Multiplexer solution. Function nodes follow the tags in Table 1. This solution
uses nested learned functionalities in dashed boxes, which follow the tags in Table 2.

between c1 and a value of more than 1.
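The reported Even-parity solutions can be compared directly. The sketch below encodes one reading of the trees in Figure 13 and of the rarer variant; the operand order of the comparison, the divisor value 3, and the function names are assumptions made for this example.

```python
def string_sum(attlst: str) -> int:
    """Number of 1-bits in the input."""
    return sum(int(b) for b in attlst)


def solution_1(attlst: str) -> bool:
    """Negation of the sum modulo 2:  !(sum % c2)."""
    return not (string_sum(attlst) % 2)


def solution_2(attlst: str) -> bool:
    """Comparison with the constant 1:  c1 > (sum % c2)."""
    return 1 > (string_sum(attlst) % 2)


def solution_2_variant(attlst: str, divisor: int = 3) -> bool:
    """c1 replaced by c1 divided by a value above 1:  (c1 / c) > (sum % c2)."""
    return (1 / divisor) > (string_sum(attlst) % 2)
```

All three agree because the sum modulo 2 is either 0 or 1, so any threshold strictly between 0 and 1, or exactly 1, separates the two cases in the same way.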

6 Discussions
It can be said that the reason XCSCF* is capable of solving problems to a much larger
scale than previously is that human knowledge separated the problem into appropriate
and simpler sub-problems. Nevertheless, it is still a difficult task to learn each sub-task
in such a manner that the learned knowledge/functionality could be transferred and
then to learn to combine these blocks effectively. It is considered that the solutions of
the tested problem domains, i.e. the Multiplexer, Carry-one, Even-parity, and Majority-
on domains, yielded by XCSCF* contain the general logic of these domains and can
solve these problems at any scale.
The way that humans select sub-problems is similar to the way humans select
function sets in standard EC approaches: selecting too few or inappropriate components
prevents effective learning, while selecting too many unnecessary components could
inhibit training. In these experiments, a number of redundant functions, such as the
ceiling and the multiplication, and functions useful for only one specific problem
domain, were never used by the final evolved solutions. XCSCF*, however, can identify
the correct combination of accumulated knowledge to build complex solutions for the
tested tasks.


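[Figure 11 tree. Root: > (iC), producing the Carry-one output. Function tags shown: >, L, /, (, ), c2. Learned functions in dashed boxes: h, Sh, St, S+, L+. Leaf: attlst.]

Figure 11: Carry-one solution.

[Figure 12 tree. Root: > (iM), producing the Majority-on output by comparing sum with h, where h is L / c2. Leaf: attlst.]

Figure 12: Majority-on solution.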

Two main components of XCSCF* enable it to solve the tested problems fully.
First, the supply of constants furnishes the required functionalities in the Carry-one,
Even-parity, and Majority-on domains, as shown in the CFs of the final solutions. Second,
the availability of the CF attlst also contributes to solving the Carry-one, Majority-on,
and Even-parity domains because it provides appropriate input for general functions,
e.g. StringSum, HeadList, and TailList. It could be argued that we could still input
the environment state implicitly to all such functions. However, this method creates
the complication of defining the environment state when these functions are nested


[Figure 13 trees, both labelled Even-parity (iPe) with leaf attlst. (a) Solution 1: root !, applied to sm2, where sm2 is sum % c2. (b) Solution 2: root >, comparing c1 with sm2, where sm2 is sum % c2.]

Figure 13: Two most common solutions of the Even-parity domain. Solution 1 is discovered
more often than solution 2.

in rule-set functions. Furthermore, deciding which functions should take the environ-
ment state by default and which functions should choose other string inputs requires
extra human intervention. An extra benefit of using attlst is that XCSCF* can now
solve variable-scale problems in the tested domains. Previously, supplying a constant L
meant that the problem scale could not change.
It is evident that the proposed work has benefited from the transfer of learned
information from each of the sub-problems. Reusing functionalities enables the system
to achieve neat and abstract solutions, even though these solutions are complex,
albeit without bloat, when fully expanded. Although a defined recipe was not furnished to
the system, it was able to form logical determinations as to the flow of the accumulated
functionality, see Figure 10. This property of the system is similar to deriving a set of
Threshold Concepts where significant learning towards the final target problem only
advances once the proper chain of functionality is formed and evaluated.

7 Conclusions and Future Works


In this paper, we have introduced a developed LL system, i.e. XCSCF*, that can trans-
fer learned knowledge and functionality. Starting from having minimal general knowl-
edge (functions and skills) on the Boolean domain and some specific basic knowledge
necessary for the target problems, XCSCF* is capable of learning general solutions to
complex problem domains, i.e. the Multiplexer, Carry-one, Majority-on, and Even-
parity domains, through analogies to the LL approach. By breaking down the problem
domain into component sub-problems, providing the necessary axioms and transfer-
ring learned functionality in addition to knowledge, it is possible to identify general
rules that can then be applied to any-scale problems in the domain. Another impor-
tant observation is that not all of the provided functionality was utilised in the final
solutions.
Certain improvements of XCSCF* have been developed to enable learning the logic
behind more general problems, such as the Multiplexer, Carry-one, Majority-on, and
Even-parity domains. Removing the implicit connections between the instance and a
few functions requires explicit connections between such functions and a newly provided
attribute list, a more general input that replaces the human-generated constant L.
These explicit connections allow these functions to take any string-type inputs. Therefore,
the new XCSCF* provides more flexible logic and reduces the need for customisation
to a given task. Also, the type-fitting property assures that generated CFs are compatible
within themselves as well as with the target problem, which results in the ability
to divide the search space by the input and output types of available functions. Thus, this
style of learning system can have access to more functionality than necessary for a single
problem without inhibiting learning.
The general solutions from XCSCF* were validated by solving very difficult problems
such as the n-bit Multiplexer, n-bit Carry-one, n-bit Even-parity, and n-bit Majority-on
problems. Although the aforementioned problems have vast
search spaces, the proposed technique successfully discovered a minimal number
(mostly one) of general rules. An advancement of this work was that the logic of
complex problems was captured by simple trees when described in terms of the learned
functionalities. However, once fully expanded, the CF trees contain certain complex
nested patterns. Thus, LL can facilitate interpreting complex tree solutions using the
learned components from the intermediate stages of LL.
Future work seeks to create a continuous-learning system with base axioms and a
number of problems, including their possible subproblems, to be solved in a parallel
architecture simultaneously. The ‘toolbox’ of functions (learned functions and axioms)
plus the complementary knowledge (CFs) will grow as problems are solved and will
be available for addressing future problems. The linked knowledge of solved problems
would demonstrate interesting meta-knowledge, a form of learning curricula, and pos-
sible relationships among known problems, such as n-bit Multiplexer, n-bit Carry-one,
etc. A further research question is whether an XCS-based system with LL or
parallel learning can solve real-valued datasets. The first step is to establish
real-valued datasets that are suitable for LL.

References
Alvarez, I. M., Browne, W. N., and Zhang, M. (2014a). Reusing learned functionality in XCS:
Code fragments with constructed functionality and constructed features. In Proceedings of the
Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation,
GECCO Comp '14, pages 969–976, New York, NY, USA. Association for Computing Machinery.

Alvarez, I. M., Browne, W. N., and Zhang, M. (2014b). Reusing learned functionality to address
complex boolean functions. In Simulated Evolution and Learning, Lecture Notes in Computer
Science, pages 383–394. Springer International Publishing.

Alvarez, I. M., Browne, W. N., and Zhang, M. (2016). Human-inspired scaling in learning clas-
sifier systems: Case study on the n-bit multiplexer problem set. In Proceedings of the Genetic
and Evolutionary Computation Conference 2016, GECCO '16, pages 429–436, New York, NY, USA.
Association for Computing Machinery.

Bull, L. (2015). A brief history of learning classifier systems: from CS-1 to XCS and its variants.
Evolutionary Intelligence, 8(2-3):55–70.

Butz, M. V. (2006). Rule-Based Evolutionary Online Learning Systems. Number v. 191 in Studies in
fuzziness and soft computing. Springer-Verlag, Berlin, Germany. OCLC: ocm61219110.

Butz, M. V. and Wilson, S. W. (2000). An algorithmic description of XCS. pages 253–272.

Falkner, N. J. G., Vivian, R. J., and Falkner, K. E. (2013). Computer science education: The first
threshold concept. In 2013 Learning and Teaching in Computing and Engineering, pages 39–46.
IEEE.


Feng, L., Ong, Y.-S., Tan, A.-H., and Tsang, I. W. (2015). Memes as building blocks: a case study
on evolutionary optimization + transfer learning for routing problems. Memetic Computing,
7(3):159–180.

Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with
applications to biology, control, and artificial intelligence. The
University of Michigan Press, Ann Arbor, Oxford, England.

Holland, J. H. (1976). Adaptation. Progress in Theoretical Biology, pages 263–293.

Huelsbergen, L. (1998). Finding general solutions to the parity problem by evolving machine-
language representations. Genetic Programming, pages 158–166.

Ioannides, C. and Browne, W. (2008). Investigating scaling of an abstracted LCS utilising ternary
and S-expression alphabets. In Bacardit, J., Bernadó-Mansilla, E., Butz, M. V., Kovacs, T.,
Llorà, X., and Takadama, K., editors, Learning Classifier Systems, pages 46–56. Springer Berlin
Heidelberg, Berlin, Heidelberg.

Iqbal, M., Browne, W. N., and Zhang, M. (2012). XCSR with computed continuous action. In
AI 2012: Advances in Artificial Intelligence, pages 350–361, Berlin, Heidelberg. Springer Berlin
Heidelberg.

Iqbal, M., Browne, W. N., and Zhang, M. (2013a). Evolving optimum populations with XCS
classifier systems. Soft Computing, 17(3):503–518.

Iqbal, M., Browne, W. N., and Zhang, M. (2013b). Extending learning classifier system with
cyclic graphs for scalability on complex, large-scale boolean problems. In Proceedings of the
15th Annual Conference on Genetic and Evolutionary Computation, GECCO '13, pages 1045–1052,
New York, NY, USA. Association for Computing Machinery.

Iqbal, M., Browne, W. N., and Zhang, M. (2013c). Learning overlapping natured and niche im-
balance boolean problems using XCS classifier systems. In 2013 IEEE Congress on Evolutionary
Computation, pages 1818–1825. IEEE.

Iqbal, M., Browne, W. N., and Zhang, M. (2014). Reusing building blocks of extracted knowledge
to solve complex, large-scale boolean problems. IEEE Transactions on Evolutionary Computation,
18(4):465–480.

Koza, J. R. (1991). A hierarchical approach to learning the boolean multiplexer function. 1:171–192.

Lanzi, P. L. and Riolo, R. L. (2000). A roadmap to the last decade of learning classifier system
research (from 1989 to 1999). In Lanzi, P. L., Stolzmann, W., and Wilson, S. W., editors, Learning
Classifier Systems, pages 33–61, Berlin, Heidelberg. Springer Berlin Heidelberg.

Meyer, J. H. F. and Land, R. (2006). Overcoming Barriers to Student Understanding: Threshold con-
cepts and troublesome knowledge. Routledge.

Montana, D. J. (1995). Strongly typed genetic programming. Evolutionary Computation, 3:199–230.

Nguyen, T. B., Browne, W. N., and Zhang, M. (2019a). Improvement of code fragment fitness to
guide feature construction in XCS. In Proceedings of the Genetic and Evolutionary Computation
Conference, GECCO '19, pages 428–436, New York, NY, USA. Association for Computing Machinery.

Nguyen, T. B., Browne, W. N., and Zhang, M. (2019b). Online feature-generation of code frag-
ments for XCS to guide feature construction. In 2019 IEEE Congress on Evolutionary Computation
(CEC), pages 3308–3315. IEEE.

Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and
Data Engineering, 22(10):1345–1359.

Price, C. J. and Friston, K. J. (2005). Functional ontologies for cognition: The systematic definition
of structure and function. Cognitive Neuropsychology, 22(3-4):262–275.

Schaffer, J. D. (1985). Learning multiclass pattern discrimination. In Proceedings of the 1st International
Conference on Genetic Algorithms, pages 74–79, USA. L. Erlbaum Associates Inc.

Stone, P. and Veloso, M. (2000). Layered learning. In López de Mántaras, R. and Plaza, E., editors,
Machine Learning: ECML 2000, pages 369–381, Berlin, Heidelberg. Springer Berlin Heidelberg.

Urbanowicz, R., Granizo-Mackenzie, A., and Moore, J. (2012). Instance-linked attribute tracking
and feedback for Michigan-style supervised learning classifier systems. In Proceedings of the
14th Annual Conference on Genetic and Evolutionary Computation, GECCO '12, pages 927–934, New
York, NY, USA. Association for Computing Machinery.

Urbanowicz, R. J. and Browne, W. N. (2017). Introduction to Learning Classifier Systems.
SpringerBriefs in Intelligent Systems. Springer-Verlag, Berlin Heidelberg.

Urbanowicz, R. J. and Moore, J. H. (2009). Learning classifier systems: A complete introduction,
review, and roadmap. Journal of Artificial Evolution and Applications, 2009.

Wilson, S. W. (1995). Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175.
