Algorithm Quasi-Optimal (AQ) Learning
mode is particularly useful for mining very large and noisy datasets.

The core of the AQ algorithm is the so-called star generation, which can proceed in two different ways depending on the mode of operation (TF or PD). In TF mode, star generation proceeds by selecting a random positive example (called a seed) and then generalizing it in various ways to create a set of consistent generalizations (rules that cover the positive example and do not cover any of the negative examples). In PD mode, rules are generated similarly, but the program seeks strong patterns (which may be partially inconsistent) rather than fully consistent rules. This star generation process is repeated until all the positive events are covered. Additionally, when run in PD mode, the generated rules go through an optimization process that generalizes and/or specializes the learned descriptions to simplify the patterns.
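To make the covering loop concrete, the following C++ sketch implements a toy version of the process just described, for continuous attributes only: rules are reduced to axis-aligned boxes, and the star is reduced to a single candidate per seed. The data, the box representation, and the helper names are assumptions made for illustration; this is not the AQ4SD code.

```cpp
// Toy version of the covering loop: pick an uncovered positive seed, build a
// (degenerate, single-rule) star by extending the seed against the negatives,
// keep the rule, and drop the positives it covers. Rules are axis-aligned
// boxes over continuous attributes; all names and data are illustrative.
#include <cmath>
#include <cstdio>
#include <vector>

using Event = std::vector<double>;

struct Box {                                   // one interval per attribute
    std::vector<double> lo, hi;
    bool covers(const Event& e) const {
        for (size_t a = 0; a < e.size(); ++a)
            if (e[a] < lo[a] || e[a] > hi[a]) return false;
        return true;
    }
};

// Extension-against: start from an unbounded box and, for every negative the
// box still covers, shrink it halfway toward the seed (epsilon = 0.5) along
// the attribute where the seed and the negative differ the most.
Box extendAgainst(const Event& seed, const std::vector<Event>& negatives) {
    const size_t n = seed.size();
    Box b{std::vector<double>(n, -1e9), std::vector<double>(n, 1e9)};
    for (const Event& x : negatives) {
        if (!b.covers(x)) continue;
        size_t a = 0;
        double bestDiff = -1.0;
        for (size_t i = 0; i < n; ++i)
            if (std::fabs(x[i] - seed[i]) > bestDiff) {
                bestDiff = std::fabs(x[i] - seed[i]);
                a = i;
            }
        double mid = 0.5 * (seed[a] + x[a]);
        if (x[a] > seed[a]) b.hi[a] = mid; else b.lo[a] = mid;
    }
    return b;
}

int main() {
    std::vector<Event> positives = {{1.0, 1.0}, {2.0, 1.5}, {1.5, 2.0}};
    std::vector<Event> negatives = {{5.0, 5.0}, {6.0, 1.0}, {1.0, 6.0}};
    std::vector<Box> cover;
    while (!positives.empty()) {                       // TF-style covering loop
        Box rule = extendAgainst(positives.front(), negatives);
        cover.push_back(rule);
        std::vector<Event> stillUncovered;
        for (const Event& p : positives)
            if (!rule.covers(p)) stillUncovered.push_back(p);
        positives.swap(stillUncovered);
    }
    std::printf("learned a cover with %zu rule(s)\n", cover.size());
    return 0;
}
```

In PD mode the same loop would also accept partially inconsistent rules and score them with the Q measure discussed later, instead of insisting on consistency with all negatives.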
AQ4SD Features and Implementation
AQ4SD is, as noted above, a total rewrite of the AQ algorithm, specifically optimized to solve the problem of source detection of atmospheric releases (see Section Source Detection of Atmospheric Releases). It shares many parts with the earlier version of AQ20 (Ref 11) but includes new features and optimization algorithms for the source detection problem. The development of AQ20 was led by the first author in close collaboration with many faculty and student members of the Computer Science Department and Machine Learning Laboratory at George Mason University.
AQ4SD is written in C++, making extensive use of the Standard Template Library (STL)13 and generic design patterns.14 The entire code comprises about 250,000 lines. The goal of the AQ4SD algorithm was to be suitable as the main engine of evolution in a non-Darwinian evolutionary process (see Section Evolutionary Computation Guided by AQ) to find the sources of atmospheric releases, using sensor concentration measurements and forward atmospheric transport and dispersion numerical models.
AQ4SD was thus optimized to be used iteratively, because evolutionary computation is based on iterative processes. It is tailored primarily toward real-valued (continuous) attributes, and it uses a novel method that does not discretize real-valued attributes into ordinal attributes during preprocessing. It is also optimized to work with noisy data, as sensor concentration measurements often contain errors and missing values. Finally, as sensors are usually very limited in number but record very long time series, AQ4SD is optimized to run with a very large number of events with a small number of attributes. Experiments, for example, were performed with up to 1,000,000 training events, each comprised of 20 real-valued attributes.
Although a formal analysis of the complexity of the AQ algorithm is beyond the scope of this article, experimental runs showed that AQ complexity is polynomial. In particular, it is a low-order polynomial in the number of events and a higher-order polynomial in the number of negative events. The lower complexity increase associated with an increase in positive events is due to the fact that during learning, only uncovered positive events are used to evaluate rules (whereas all negatives, or a sample of the negatives, are used to evaluate the rules).
The following sections describe the algorithms and data types used and implemented in AQ4SD. Because AQ4SD is an implementation of a general methodology, when AQ4SD is specified in the text, it refers to specific features or implementation details of AQ4SD itself, whereas when AQ is specified, it refers to concepts and theories that apply to the general AQ methodology.

AQ Events
The AQ input data consists of a sequence of events. An event is a vector of values, where each value corresponds to a measurement associated with a particular attribute. An event can be seen as a row in a database, with each value an observation of a particular attribute and each column corresponding to a different attribute. AQ events are a form of labeled data, meaning that they are or can be classified into one of two or more classes. Therefore, each event contains a special attribute, class, which identifies the class it belongs to. A sequence of events belonging to the same class is called an eventset.
Additionally, two different types of events can be used by AQ: training and testing. Training events are used by AQ to learn rules. Testing events are used to compute the statistical correctness of the learned rules on events not used during learning.
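A minimal sketch of how events, eventsets, and the training/testing split described above might be organized in C++ follows; the type and field names are illustrative assumptions rather than the AQ4SD input format.

```cpp
// Sketch of the basic AQ data objects: an event is a vector of attribute
// values plus the special class attribute; an eventset groups the events of
// one class; training and testing events are kept separate.
#include <string>
#include <vector>

struct Event {
    std::vector<double> values;   // one value per attribute (a database row)
    std::string         label;    // the special "class" attribute
};

struct EventSet {                 // all events belonging to the same class
    std::string        label;
    std::vector<Event> events;
};

struct Dataset {
    std::vector<EventSet> training;  // used to learn rules
    std::vector<EventSet> testing;   // used to estimate rule correctness
};

int main() {
    Event    e{{10.3, 281.2, 7.9}, "cluster1"};   // one labeled measurement row
    EventSet s{"cluster1", {e}};
    Dataset  d{{s}, {}};
    return d.training.size() == 1 ? 0 : 1;
}
```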
AQ Rules
AQ uses a highly descriptive representation language to represent the learned knowledge. In particular, it uses rules to describe patterns in the data. A prototypical AQ rule is defined in logical Eq. (1):

Consequent ← Premise Exception    (1)
The consequent, premise, and exception are made of conditions of the form

[Attribute Relation Value(s)]    (2)

Depending on the attribute type, different relations may exist. For example, for unordered categorical attributes, the relations < or > cannot be used as they are undefined. A complete set of the relations allowed with each attribute type is given in Section Attribute Types. Typically, the consequent consists of a single condition, whereas the premise consists of a conjunction of several conditions. Equation (3) shows a sample rule relating a particular cluster to a set of input parameters. The annotations p and n indicate the number of positive and negative events covered by this rule.

[Cluster = 1] ← [WindDir = N..E] [WindSpeed > 10 m/s] [Temp > 22°C] : p = 11, n = 3    (3)

This type of rule is usually called attributional, to distinguish it from more traditional rules that use a simpler representation language. The main difference from traditional rules is that the referee (attribute), relation, and reference may include internal disjunctions of attribute values, ranges of values, internal conjunctions of attributes, and other constructs. Such a rich representation language means that very complex concepts can be represented using a compact description. However, attributional rules have the disadvantage of being more prone to overfitting with noisy data.
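The condition and rule structure of Eqs. (1)–(3) can be sketched as follows. The enumeration of relations, the degree encoding of wind direction (N..E taken as 0°..90°), and the attribute ordering are assumptions made only for the example; matching a rule is simply the conjunction of its conditions.

```cpp
// Sketch of an attributional condition [Attribute Relation Value(s)] and of a
// rule as a conjunction of conditions, loosely following Eq. (3). Attribute
// indices, the degree encoding of wind direction, and the thresholds are
// assumptions made only for this example.
#include <cstdio>
#include <vector>

enum class Rel { Eq, Neq, Lt, Gt, Le, Ge, InRange };

struct Condition {
    int    attr;     // index of the referee attribute
    Rel    rel;      // relation
    double lo, hi;   // reference: single value (lo == hi) or range lo..hi
    bool satisfied(const std::vector<double>& e) const {
        const double v = e[attr];
        switch (rel) {
            case Rel::Eq:      return v == lo;
            case Rel::Neq:     return v != lo;
            case Rel::Lt:      return v <  lo;
            case Rel::Gt:      return v >  lo;
            case Rel::Le:      return v <= lo;
            case Rel::Ge:      return v >= lo;
            case Rel::InRange: return v >= lo && v <= hi;
        }
        return false;
    }
};

struct Rule {
    std::vector<Condition> premise;   // conjunction of conditions
    int cluster;                      // consequent, e.g., [Cluster = 1]
    int p, n;                         // positives / negatives covered
    bool covers(const std::vector<double>& e) const {
        for (const Condition& c : premise)
            if (!c.satisfied(e)) return false;
        return true;
    }
};

int main() {
    // Roughly Eq. (3): [Cluster = 1] <- [WindDir = N..E][WindSpeed > 10][Temp > 22]
    Rule r{{{0, Rel::InRange, 0.0, 90.0},
            {1, Rel::Gt, 10.0, 10.0},
            {2, Rel::Gt, 22.0, 22.0}},
           1, 11, 3};
    std::vector<double> event = {45.0, 12.5, 24.1};   // WindDir, WindSpeed, Temp
    std::printf("rule %s the event\n", r.covers(event) ? "covers" : "does not cover");
    return 0;
}
```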
Multiple rules are learned for each cluster, and together they are called a ruleset. A ruleset for a specific consequent is also called a cover. A ruleset is a disjunction of rules, meaning that if even only one rule is satisfied, then the consequent is true. Multiple rules can be satisfied at one time because the learned rules could be intersecting each other. Equation (4) shows a sample ruleset learned for cluster 1.
Each rule has a different statistical value. Assuming 13 positive events associated with cluster 1, the first rule in Eq. (4) covers not only most positive events in the cluster (11 of the 13 events) but also three negative events. This means that AQ was run in PD mode, allowing inconsistencies in order to gain simpler rules. The second rule covers less than 50% of the events and the third covers only 1, but both without covering any elements in other clusters. Therefore, there is a tradeoff between completeness, namely the number of events covered out of all the clouds in the cluster, and consistency, namely the coverage of events from other clusters.

Attribute Types
AQ4SD allows for four different types of attributes: nominal, linear, integer, and continuous. Each attribute type is associated with specific relations that can be used in rule conditions.
Nominal: Unordered categorical attribute for which a distance metric cannot be defined. Nominal attributes do not naturally or necessarily fall into any particular order or rank, like colors, blood types, or city names. The domain of nominal attributes is thus that of unordered sets. The following relations are allowed in rule conditions: equal (=) and not equal (≠).

Linear: Ordinal categorical attribute that is rankable, but not capable of being arithmetically operated upon. Examples of linear attributes are small, medium, large, or good, better, best. Such attributes can be sorted and ranked but cannot be multiplied or subtracted from one another. The following relations are allowed in rule conditions: equal (=), not equal (≠), lesser (<), greater (>), lesser or equal (≤), and greater or equal (≥).

Integer: Ordinal integer-valued attribute without a prefixed discretization and without decimal values. Integer attributes allow only whole numbers, such as 20 or −77. The following relations are allowed in rule conditions: equal (=), not equal (≠), lesser (<), greater (>), lesser or equal (≤), and greater or equal (≥).

Continuous: Ordinal real-valued attribute without a prefixed discretization but which contains a decimal point and a fractional portion. The following relations are allowed in rule conditions: equal (=), not equal (≠), lesser (<), greater (>), lesser or equal (≤), and greater or equal (≥). Previous versions of AQ dealt with continuous variables by discretizing them into a number of discrete units and then treating them as linear attributes. AQ4SD does not require such discretization, as it automatically determines ranges of continuous values for each variable occurrence in a rule during the star generation process.
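The association between attribute types and admissible relations can be expressed in a few lines; this is a sketch under the assumption that the relation set is exactly the one listed above.

```cpp
// Sketch of the four AQ4SD attribute types and a check of which relations a
// condition may use for each type (nominal attributes admit only = and !=).
#include <cstdio>

enum class AttrType { Nominal, Linear, Integer, Continuous };
enum class Rel { Eq, Neq, Lt, Gt, Le, Ge };

bool relationAllowed(AttrType t, Rel r) {
    if (t == AttrType::Nominal)               // unordered: no <, >, <=, >=
        return r == Rel::Eq || r == Rel::Neq;
    return true;                              // ordered types allow all six
}

int main() {
    std::printf("nominal  < allowed: %d\n", relationAllowed(AttrType::Nominal, Rel::Lt));
    std::printf("linear   < allowed: %d\n", relationAllowed(AttrType::Linear,  Rel::Lt));
}
```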
AQ Algorithm
The AQ learning process can be divided into four different parts: data preparation, rule learning, postprocessing, and optional testing. The following sections address each part individually.
The input data is made of a definition of the attributes (variables), AQ control parameters for each of the four parts mentioned above, and the raw events. The output of AQ consists of the learned rules, which can be displayed in textual or graphical form. Different versions of AQ used different ways to define the format of the input and output. Because the different methods do not affect learning, they are not discussed in this article. AQ4SD uses the input/output format described in Ref 12.

Data Preparation
The AQ learning process starts with data being read from a file (when used as a stand-alone classifier) or from memory (when embedded in a larger system). The data is processed by the data preparation mechanism, which checks the data format for correctness, corrects or removes ambiguities, selects the relevant attributes (a.k.a. feature selection), and applies rules for incremental learning.
Some versions of AQ can also, automatically or through user input, generate new attributes to change the data representation. This feature, called constructive induction, was first implemented in a specialized version of AQ17 (Refs 15,16) and is not implemented in AQ4SD.
Resolving Ambiguities
An ambiguity is an event that belongs to two or more classes. For the purpose of learning, each event must be unique and belong to a single class. AQ has four different strategies to resolve ambiguities:

Positives: The ambiguous event is kept in the positive class (the class rules are being learned from) and eliminated from all the other classes.

Negatives: The ambiguous event is eliminated from the positive class.

Eliminate: The ambiguous event is eliminated and not used for learning.

Majority: The ambiguous event is associated with the class in which it appears most often.

Attribute Selection
In general, AQ learns rules to discriminate between classes using only the smallest number of attributes.a Therefore, AQ performs an automatic attribute selection during the learning phase, selecting the most relevant attributes and disregarding those that appear irrelevant. Unfortunately, especially for large noisy problems, irrelevant attributes can lead to the generation of incorrect rules. To avoid this problem, AQ can be set to create statistics for each of the attribute values, namely a measure of how many positive and negative examples, respectively, each attribute value covers. AQ can then try to keep only those attributes that seem to have more discriminatory information between classes.
This is only a rough approximation, as individual attributes might have little discriminatory information when considered singly but can help the generation of excellent rules in combination with others. Creating such statistics is a quick linear operation that requires a one-time analysis of the entire data or of a statistical sample of the data. Because such statistics are also used by the learning and optimization algorithms, there is no significant computational overhead introduced by this attribute selection method.
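The attribute-value statistics described above amount to a single linear pass over the data. The sketch below counts, for each (attribute, value) pair, how many positive and how many negative events carry that value; the discrete encoding of the values is an assumption made for brevity.

```cpp
// One-pass sketch of the attribute-value statistics used for attribute
// selection: for each (attribute, value) pair, count how many positive and
// negative events carry that value. Values are assumed already discrete here.
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

using Event = std::vector<int>;                       // discrete values per attribute

struct Counts { long pos = 0, neg = 0; };

std::map<std::pair<int, int>, Counts>
valueStatistics(const std::vector<Event>& positives,
                const std::vector<Event>& negatives) {
    std::map<std::pair<int, int>, Counts> stats;      // key: (attribute, value)
    for (const Event& e : positives)
        for (size_t a = 0; a < e.size(); ++a) stats[std::make_pair((int)a, e[a])].pos++;
    for (const Event& e : negatives)
        for (size_t a = 0; a < e.size(); ++a) stats[std::make_pair((int)a, e[a])].neg++;
    return stats;
}

int main() {
    std::vector<Event> pos = {{0, 2}, {0, 3}}, neg = {{1, 2}};
    auto stats = valueStatistics(pos, neg);
    for (const auto& kv : stats)
        std::printf("attr %d value %d: p=%ld n=%ld\n",
                    kv.first.first, kv.first.second, kv.second.pos, kv.second.neg);
}
```

Because this is a single pass over the events (or a sample of them), it is consistent with the claim that the selection step adds no significant overhead.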
Rules for Incremental Learning
One of the main advantages of AQ (see Section Advantages and Disadvantages of the AQ Methodology) is the ability to refine previously learned rules as new input events become available. The input data can specify rules that describe either a previously learned concept or constraints between attributes. For example, they can specify that a particular combination of attributes cannot appear together in a rule, or that the boundaries of the search space are reduced under particular attribute values.
As described in detail in Section Rule Learning, AQ starts generating rules by comparing positive and negative events and keeps specializing previously learned rules with new conditions when they cover negative events. In incremental learning mode, the set of rules being specialized does not start with an empty set but with those specified in the input data. No other aspects of the learning are affected, except in the case of extreme ambiguities, when the supplied rules do not include any positive examples. In such situations, AQ cannot use the input rules, as it is not able to evaluate their positive and negative coverage.
Rule Learning
This is the core of the AQ methodology, where rules are generated from examples and counterexamples. AQ generates rules by an iterative process aimed at identifying generalizations of the positive examples with respect to the negative examples. Recall that positive examples are those labeled for the target class, and negatives are those belonging to all the other classes. At each iteration, a seed is selected from the set P of positive events still to be covered, a star is generated for it, and the best rule r is selected. The selected rule is usually much more general than the seed but might cover many if not all the events in P. Rule r is then added to the list of rules R to be added to the final answer.

Star Generation
The central concept of the algorithm is a star, defined as a set of alternative general descriptions (rules) of a particular event (a 'seed') that satisfy given constraints, for example, do not cover negative examples, do not contradict prior knowledge, etc.
A star is built by repeatedly applying the extension-against operator, which generalizes the seed against each negative event, one attribute at a time. For a nominal attribute, the resulting condition excludes the value of the negative event; its binary representation is [{0,1,1}], which is exactly the negation of the negative event.
The extension-against operation for linear attributes is slightly different, and it involves flipping the bits only up to the value of the negative event, and not any values beyond. Assuming a linear variable ||y|| = {XS, S, M, L, XL, XXL}, the result of the extension-against operation between a seed with y = S and a negative event with y = L is the rule [positive] ← [y = XS..M]. Its binary representation is [{1,1,1,0,0,0}].
For integer and continuous attributes, the extension-against operator finds a value between the seed and the negative. The degree of generalization can be controlled and, by default, is set to choose the middle point between the two. The ε parameter, defined between 0 and 1, controls the degree of generalization during the extension-against operation, with 0 being most restrictive to the seed, and 1 generalizing up to the negative. Assuming a continuous variable ||z|| = {0..100}, the result of the extension-against operation between a seed with z = 10 and a negative event with z = 30, with ε = 0.5, is the rule [positive] ← [z ≤ 20]. The result of the same extension-against operation with ε = 1 is [positive] ← [z < 30] (note that 30 is not included).
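The placement of the cut point for continuous attributes follows directly from this description; the function name below is an assumption, and the printed cases reproduce the z = 10 versus z = 30 example from the text.

```cpp
// Sketch of the extension-against boundary for a continuous attribute: the
// cut point is placed between the seed and the negative value according to
// epsilon (0 = stay at the seed, 1 = extend up to, but excluding, the negative).
#include <cstdio>

double extensionBoundary(double seed, double negative, double epsilon) {
    return seed + epsilon * (negative - seed);   // epsilon in [0, 1]
}

int main() {
    // Example from the text: seed z = 10, negative z = 30.
    std::printf("eps=0.5 -> [z <= %.1f]\n", extensionBoundary(10.0, 30.0, 0.5)); // 20.0
    std::printf("eps=1.0 -> [z <  %.1f]\n", extensionBoundary(10.0, 30.0, 1.0)); // 30.0 excluded
}
```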
The rules from the extension-against operation are then logically multiplied out with all the rules r to form a star (Algorithm 2, line 4), and the best rule (or rules) according to a given multicriterion functional LEF (Section Lexicographical Evaluation Functions) are selected (line 5). The parameter maxstar is central to the star generation process and defines how many rules are kept for each star.
If AQ is run in TF mode, the result from the intersection of the previously learned rules and the new rule is kept. In PD mode, the function Q [Eq. (5)] is used to compute the tradeoff between the completeness and the consistency of the rules:

Q = (p/P)^w × [ ((P + N)/N) × ( p/(p + n) − P/(P + N) ) ]^(1−w)    (5)

where p and n are the numbers of positive and negative events covered by the rule, and P and N are the total numbers of positive and negative events in the data. The parameter w is defined between 0 and 1 and controls the tradeoff between completeness and consistency.9
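Equation (5) can be computed with a small helper. The clamping of a negative consistency gain to zero and the example value N = 50 are assumptions; p, n, and P follow the cluster-1 example given earlier.

```cpp
// Sketch of the Q(w) rule-quality measure of Eq. (5): a weighted product of
// completeness (p/P) and consistency gain relative to the class distribution.
#include <cmath>
#include <cstdio>

double qValue(double p, double n, double P, double N, double w) {
    double completeness = p / P;
    double consistencyGain = ((P + N) / N) * (p / (p + n) - P / (P + N));
    if (consistencyGain < 0.0) consistencyGain = 0.0;   // rule no better than chance
    return std::pow(completeness, w) * std::pow(consistencyGain, 1.0 - w);
}

int main() {
    // First rule of the cluster-1 example: p = 11, n = 3, with P = 13 positives.
    // The total number of negatives N is not given in the text; 50 is assumed.
    std::printf("Q(w=0.5) = %.3f\n", qValue(11, 3, 13, 50, 0.5));
}
```

Increasing w favors rules that cover more of the positive events; decreasing it favors rules that admit fewer negatives, which is exactly the completeness/consistency tradeoff described above.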
Lexicographical Evaluation Functions
LEF is an evaluation function composed of elementary criteria and their tolerances, and it is used to determine which rules best reflect the needs of the problem at hand. In other words, LEF is used to determine which rules, among those generated, are best suited to be included in the answer. AQ has been described as performing a beam search in the rule space.8 LEF is the parameter that controls the width of the beam.
LEF works as follows:
1. Sort the rules in the star according to LEF, from the best to the worst.
2. Select the first rule and compute the number of examples it covers. Select the next rule and compute the number of new examples it covers.
3. If the number of new examples covered exceeds a new-example threshold, then the rule is selected; otherwise it is discarded. Continue the process until all rules are inspected.
The result of this procedure is a set of rules selected from a star. The list of positive events to cover is updated by deleting all those events that are covered by these rules.
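The three-step selection just described can be sketched as follows. The two elementary criteria used for sorting (more positives first, then fewer negatives) and the omission of criterion tolerances are assumptions; the `minNew` parameter plays the role of the new-example threshold of step 3.

```cpp
// Sketch of the LEF-based selection of rules from a star: sort by a
// lexicographic evaluation, then keep a rule only if it covers at least
// `minNew` not-yet-covered positive events.
#include <algorithm>
#include <cstdio>
#include <set>
#include <vector>

struct StarRule {
    std::set<int> coveredPos;   // ids of positive events the rule covers
    int coveredNeg;             // number of negative events the rule covers
};

std::vector<StarRule> selectFromStar(std::vector<StarRule> star, int minNew) {
    std::sort(star.begin(), star.end(), [](const StarRule& a, const StarRule& b) {
        if (a.coveredPos.size() != b.coveredPos.size())
            return a.coveredPos.size() > b.coveredPos.size();   // criterion 1
        return a.coveredNeg < b.coveredNeg;                     // criterion 2
    });
    std::vector<StarRule> selected;
    std::set<int> alreadyCovered;
    for (const StarRule& r : star) {
        int newExamples = 0;
        for (int id : r.coveredPos)
            if (!alreadyCovered.count(id)) ++newExamples;
        if (newExamples >= minNew) {                 // rule adds enough new coverage
            selected.push_back(r);
            alreadyCovered.insert(r.coveredPos.begin(), r.coveredPos.end());
        }
    }
    return selected;
}

int main() {
    std::vector<StarRule> star = {{{1, 2, 3, 4}, 0}, {{3, 4, 5}, 1}, {{1, 2}, 0}};
    auto kept = selectFromStar(star, 2);
    std::printf("kept %zu of %zu rules\n", kept.size(), star.size());
}
```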
Postprocessing
Postprocessing operations consist in: (1) improvement of the learned rules through generalization and specialization operators, (2) optional generation of alternative covers, and (3) formatting of the output for textual and graphical visualization.

Optimization of Rules
When AQ is run in PD mode, rules can be further optimized during postprocessing. Rules can be generalized by dropping values in the reference of the conditions or by enlarging the ranges for continuous and integer attributes. Rules can be further generalized by dropping conditions altogether. Finally, entire rules can be dropped. The opposite operation of specialization is performed only at the condition level, by adding values in discrete attributes and shrinking domains for integer and continuous attributes.
The optimization operation follows heuristics, and at each step computes what is called in AQ the Q value for the new rule [Eq. (5)]. If the Q value increases, then the modified rule is added to the final answer; otherwise it is disregarded.

Alternative Covers
Some of the rules learned during the star generation process, especially with large maxstar values, might not be required in the final output. The final step of the learning process consists in selecting, from the pool of learned rules, only the minimum set required to
cover the positive examples. Thus, some of the rules might not be included in the final answer and can be used to generate alternative solutions. Depending on the presence of multiple strong patterns in the data, alternative covers might be very useful to discriminate between classes.

Association Graphs
Association graphs are used to visualize attributional rules that characterize multivariate dependencies between target classes and input attributes. A program called concept association graph (CAG) was developed by the first author to automatically display such graphs. Figure 1 is a graphical illustration of the rules discovered from an atmospheric release problem.17 Representing relationships with nodes and links is not new nor unique to AQ and has been used in many applications in statistics and mathematics. Each target class is associated only with unique patterns of input parameters. The thickness of the links indicates the weight of a particular parameter-value combination in the definition of the cluster.

FIGURE 1 | Concept association graph relating target groups (e.g., Group 2 and Group 6) to conditions on input attributes such as DiffSST, Wind0Z500, BlackSST, Pressure, Humidity, Date, SLHF, and air temperature.

ADVANTAGES AND DISADVANTAGES OF THE AQ METHODOLOGY
The AQ methodology has intrinsic advantages and disadvantages with respect to other machine learning classifiers, such as neural networks, decision trees, or decision rules. Some of the original disadvantages have been solved or improved with additional components or optimization processes, often at the expense of a much slower or more complex program. Other issues remain unresolved and open to investigation. The following discussion summarizes those that are believed to be the main issues to consider when choosing the AQ methodology over other methods, in particular C4.5, which is the closest widely used machine learning symbolic classifier.

Rich Representation Language
One of the main advantages of AQ consists in the ability to generate compact descriptions which are easy to read and understand. Unlike neural networks, which are black boxes and use a representation language that cannot be easily visualized, AQ rules can be inspected and validated by human experts. Although decision tree classifiers, such as C4.5, can convert the learned trees into rules, the resulting descriptions are expressed in a much simpler representation language than in AQ. For example, C4.5 rules only allow for atomic relationships between attributes and possible values and do not allow for internal disjunctions or multiple ranges. Figure 2 shows the respective covers generated by AQ (left) and C4.5 (right). In this example, internal disjunction allows for a simpler and more compact representation due to intersecting patterns.
The cover generated by AQ [Eq. (6)] is composed of two rules with a single condition, each covering 20 positives and no negatives.

[Positives = 1] ← [X ≥ 5] : p = 20, n = 0
               ← [Y ≥ 5] : p = 20, n = 0    (6)
FIGURE 2 | Different covers generated by AQ (left) and C4.5 (right) using the same dataset.
In contrast, the tree [Eq. (7)] and the corresponding rules [Eq. (8)] generated by C4.5 cannot represent the intersecting concept because of the simpler representation language.

Root: split on X
  X ≥ 5 → Positives
  X < 5: split on Y
    Y ≥ 5 → Positives
    Y < 5 → Negatives    (7)

[Positives = 1] ← [X ≥ 5] : p = 20, n = 0
               ← [X < 5][Y ≥ 5] : p = 10, n = 0    (8)

The C4.5 cover is composed of two rules, one with a single condition and one with two conditions. The first, identical to the rule learned by AQ, covers 20 positives and no negatives, whereas the second covers only 10 positives and no negatives. Although both covers are complete and consistent, the cover of C4.5 is more complex and cannot represent the intersecting concept.

Speed
AQ is considerably slower than C4.5 because of the underlying differences between the 'separate and conquer' learning strategy of AQ and the 'divide and conquer' strategy of C4.5. In C4.5, at each iteration, the algorithm recursively divides the search space. This means that at each iteration the algorithm analyzes an ever smaller number of events. In contrast, AQ compares each positive with all of the negatives. Effectively, AQ can be optimized to consider only a portion of the positive examples, but it still has to consider all the negatives. This is in part due to the ability of representing intersecting concepts, meaning that rules are not bound to prior partitions.

Quality of Decisions
As previously seen, C4.5 performs consecutive splits on a decreasing number of positive and negative examples. This means that at each iteration, decisions are made on a smaller amount of information. In contrast, AQ considers all the search space at each iteration, meaning that all decisions are made with the maximum amount of information available.

Control Parameters
AQ has a very large number of control variables. Such controls allow for a very fine tuning of the algorithm, which can lead to very high quality descriptions. On the other hand, it is often difficult to determine a priori which set of parameters will generate better rules. Although heuristics on how to set the parameters exist, they are often suboptimal, and user fine tuning is required for optimal descriptions.

Matching of Rules and Events
AQ allows for different methods to test events on a set of learned rules. In C4.5, and most other classifiers which do not allow for intersecting concepts, testing an event usually involves checking whether it is included in a rule or not. This is due to the fact that the entire search space is partitioned into one of the target classes. In AQ, each event can be included in more than one rule, or it could be in an area of the search space which has not been assigned to any class. Assigning an unclassified event to one of the target classes involves computing
different degrees of match between the event and the covers of each of the classes and selecting the class with the highest degree of match. Figure 3 shows the example of an unclassified event that lies in an area of the search space not generalized to any of the classes. The degree of match between the event and the cover of each class is computed, and the event is assigned to the class with the highest score, in this case (d1). Several distance functions can be used to match rules and events; at the top level they differ in whether they are strict or flexible.
In strict matching, AQ counts how many times a particular event is covered by the rules of each of the classes. An event can be covered multiple times by the rules of a particular class because the rules might be intersecting due to internal disjunctions. It can also be covered multiple times by rules of different classes if AQ was run in PD mode and inconsistent covers were generated.
In flexible matching, AQ computes the ratio of how many of the attributes are covered over the total number of attributes. Assuming an event with three attributes, if a rule for class A matches three of them, and a rule for class B matches two of them, the event is classified as type A because of a higher flexible degree of match. If the degree of match falls below a certain threshold, AQ classifies the event as unknown. In case more than one class has the same degree of match, the classification is uncertain, and multiple classes are output.
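A sketch of flexible matching as just described: the degree of match is the fraction of a rule's conditions satisfied by the event, and the event is assigned to the class with the highest degree or left unknown below a threshold. The interval form of the conditions and the omission of tie handling are simplifying assumptions.

```cpp
// Sketch of flexible matching: degree of match = fraction of a rule's
// conditions satisfied by the event; the event goes to the class whose cover
// yields the highest degree, or stays "unknown" below a threshold.
#include <cstdio>
#include <string>
#include <vector>

struct Condition { int attr; double lo, hi; };
using Rule  = std::vector<Condition>;     // conjunction of conditions
using Cover = std::vector<Rule>;          // disjunction of rules for one class

double degreeOfMatch(const Rule& r, const std::vector<double>& e) {
    int matched = 0;
    for (const Condition& c : r)
        if (e[c.attr] >= c.lo && e[c.attr] <= c.hi) ++matched;
    return r.empty() ? 0.0 : static_cast<double>(matched) / r.size();
}

std::string classify(const std::vector<std::pair<std::string, Cover>>& covers,
                     const std::vector<double>& e, double threshold) {
    std::string best = "unknown";
    double bestDeg = threshold;
    for (const auto& classCover : covers)
        for (const Rule& r : classCover.second) {
            double d = degreeOfMatch(r, e);
            if (d > bestDeg) { bestDeg = d; best = classCover.first; }
        }
    return best;
}

int main() {
    Cover a = {{{0, 0.0, 5.0}, {1, 0.0, 5.0}, {2, 0.0, 5.0}}};   // class A: one rule
    Cover b = {{{0, 0.0, 5.0}, {1, 0.0, 5.0}, {2, 9.0, 20.0}}};  // class B: one rule
    std::vector<double> event = {1.0, 2.0, 3.0};   // matches all of A, 2/3 of B
    std::printf("classified as %s\n",
                classify({{"A", a}, {"B", b}}, event, 0.5).c_str());
}
```

Strict matching would instead count, per class, how many rules of the cover the event fully satisfies.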
Multiple Target Classes
Decision tree classifiers are advantaged when learning from data with several target classes. They can learn all the target classes in a single tree, whereas AQ must learn a separate cover for each class, using the events of all the other classes as negatives.

Incremental Learning
Decision rule learners have the intrinsic advantage of being able to refine previous rules as new training data become available. This is because of the sequential nature of the 'separate and conquer' strategy of the algorithms. Refinement of rules involves adding or dropping conditions in previously learned rules or splitting a rule into a number of partially subsumed rules. The main advantage is that modification of a rule in the cover does not affect the coverage of the other rules in the cover (although the overall completeness and consistency of the entire cover might be affected). In contrast, although possible, it is more complicated to update a tree, as it often involves several updates that propagate from the leaves of the tree all the way to the root. Additionally, the resulting tree might be suboptimal and very unbalanced, prompting a complete re-evaluation of each node.

Input Background Knowledge
Because of the ability of AQ to update previously learned rules, it is possible to add background knowledge in the form of input rules. This feature is particularly important when there is existing knowledge of the data, or constraints on the attributes, which can lead to simpler rules and a faster execution.

EVOLUTIONARY COMPUTATION GUIDED BY AQ
The term evolutionary computation was coined in 1991 as an effort to combine the different approaches
to simulating evolution to solve computational problems.18–23 Evolutionary computation algorithms are stochastic methods that evolve in parallel a set of potential solutions through a trial-and-error process. Potential solutions are encoded as vectors of values and evaluated according to an objective function (often called a fitness function). The evolutionary process consists of selecting one or more candidate solutions whose vector values are modified to maximize (or minimize) the objective function. If the newly created solutions better optimize the objective function, they are inserted into the next generation; otherwise they are disregarded. While the methodologies and algorithms that are subsumed by this name are numerous, most of them share one fundamental characteristic: they use nondeterministic operators such as mutation and recombination as the main engine of the evolutionary process.
These operators are semi-blind, and the evolution is not guided by knowledge learned in past generations; rather, it is a form of search process executed in parallel. In fact, most evolutionary computation algorithms are inspired by the principles of Darwinian evolution, defined by '…one general law, leading to the advancement of all organic beings, namely, multiply, vary, let the strongest live and the weakest die'.24 The Darwinian evolution model is simple and fast to simulate, and it is domain independent. Because of these features, evolutionary algorithms have been applied to a wide range of optimization problems.25
There have been several attempts to extend the traditional Darwinian operators with statistical and machine learning approaches that use history information from the evolution to guide the search process. The main challenges are to avoid local maxima and increase the rate of convergence. The majority of such methods use some form of memory and/or learning to direct the evolution toward particular directions thought more promising.26–31
Because evolutionary computation algorithms evolve a number of individuals in parallel, it is possible to learn from the 'experience' of entire populations. There is no similar mechanism in biological evolution, because in nature there is no mechanism to evolve entire species. Estimation of distribution algorithms (EDA) are a form of evolutionary algorithm in which an entire population may be approximated with a probability distribution.32 New candidate solutions are not chosen at random but using statistical information from the sampling distribution. The aim is to avoid premature convergence and to provide a more compact representation.
Discriminating between the best and worst performing individuals could provide additional information on how to guide the evolutionary process. The learnable evolution model (LEM) methodology was proposed, in which a machine learning rule induction algorithm was used to learn attributional rules that discriminate between the best and worst performing candidate solutions.33–35 New individuals were then generated according to inductive hypotheses discovered by the machine learning program. The individuals are thus genetically engineered, in the sense that the values of the variables are not randomly or semi-randomly assigned but set according to the rules discovered by the machine learning program.
The basic algorithm of LEM works like Darwinian-type evolutionary methods, that is, it repetitively executes the following main steps:
1. Create a population of individuals (randomly or by selecting them from a set of candidates using some selection method).
2. Apply operators of mutation and/or recombination to selected individuals to create new individuals.
3. Use a fitness function to evaluate the new individuals.
4. Select the individuals which survive into the next generation.
The main difference from Darwinian-type evolutionary algorithms is in the way LEM generates new individuals. In contrast to the Darwinian operators of mutation and/or recombination, AQ conducts a reasoning process in the creation of new individuals. Specifically, at each step (or selected steps) of evolution, a machine learning method generates hypotheses characterizing differences between high-performing and low-performing individuals. These hypotheses are then instantiated in various ways to generate new individuals. The search conducted by LEM for a global solution can be viewed as a progressive partitioning of the search space.
Each time the machine learning program is applied, it generates hypotheses indicating the areas in the search space that are likely to contain high-performing individuals. New individuals are selected from these areas and then classified as belonging to a high-performance or a low-performance group, depending on their fitness value. These groups are then differentiated by the machine learning program, yielding a new hypothesis as to the likely location of the global solution.
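The LEM loop described in this section can be sketched end to end. Here the 'hypothesis' learned from the high-performing group is a simple per-variable interval, a crude stand-in for the attributional rules AQ would induce, and the fitness function and population parameters are arbitrary assumptions.

```cpp
// Toy sketch of the LEM loop: evaluate a population, split it into high- and
// low-performing groups, learn a description of where the good individuals
// lie, and instantiate new individuals inside that description. The per-
// variable bounding box below is a crude stand-in for AQ rule induction.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

using Individual = std::vector<double>;

double fitness(const Individual& x) {            // toy objective: minimize sum of squares
    double s = 0; for (double v : x) s += v * v; return -s;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> init(-10.0, 10.0);
    std::vector<Individual> pop(100, Individual(2));
    for (auto& x : pop) for (double& v : x) v = init(rng);

    for (int gen = 0; gen < 20; ++gen) {
        // Rank by fitness; the top 30% form the high-performing group.
        std::sort(pop.begin(), pop.end(), [](const Individual& a, const Individual& b) {
            return fitness(a) > fitness(b); });
        size_t nHigh = pop.size() * 3 / 10;
        // "Learn" a description of the high group: one interval per variable.
        Individual lo = pop[0], hi = pop[0];
        for (size_t i = 0; i < nHigh; ++i)
            for (size_t d = 0; d < lo.size(); ++d) {
                lo[d] = std::min(lo[d], pop[i][d]);
                hi[d] = std::max(hi[d], pop[i][d]);
            }
        // Instantiate new individuals inside the learned description,
        // replacing the lower-performing part of the population.
        for (size_t i = nHigh; i < pop.size(); ++i)
            for (size_t d = 0; d < lo.size(); ++d) {
                std::uniform_real_distribution<double> in(lo[d], hi[d]);
                pop[i][d] = in(rng);
            }
    }
    std::printf("best fitness after 20 generations: %.4f\n", fitness(pop[0]));
}
```

In AQ4SD the interval "learner" is replaced by the full rule induction described earlier, and new individuals are instantiated from the learned attributional rules rather than from a bounding box.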
NMSQE = (Co − Cs)² / Co²    (9)

Cs = P1 P2 (P3 + P4)    (10)

where Co is the observed and Cs the simulated ground concentration.
A stable atmosphere causes much narrower plumes, which result in higher ground concentrations.

Results
Experiments were performed for each of the 68 prairie grass releases. The algorithm started by generating a population of random candidate solutions. Each candidate solution is a potential source and is encoded as a vector of eight variables: x, y, z, θ, U, Q, S, and ψ. For each potential source, the resulting concentration field is computed by Eq. (10). The fitness score of each source is defined as the error between the observed ground concentration and the simulated concentration at the same locations, computed according to Eq. (9).
The algorithm proceeds by dividing the candidate solutions into those with high and low fitness scores, and learning patterns (rules) which characterize the attribute-value combinations that discriminate between the two groups. New candidate solutions are generated according to the learned patterns. The process continues for 500 iterations. The algorithm was run using a population of 100 candidate solutions. At each step, the top and lowest 30% of the solutions were used as members of the high- and low-performing groups. Each experiment was repeated 10 times to study the sensitivity of the
algorithm to the initial guess of solutions. The results are also shown in terms of atmospheric type.
A total of 680 AQ4SD source detections were performed, namely 10 for each of the 68 experiments. There is a considerably higher number of experiments of type D, as this was the predominant atmospheric condition at the time of the releases. In order to compensate for the different distributions of experiments, the results are normalized using this information.
Figure 6 shows a summary of the different errors, defined by Eq. (9), achieved by AQ4SD as a function of the atmospheric type. A threshold of 1.0 was assigned as the minimum fitness value to recognize a source, because such a value indicates that AQ4SD identified the source within 50 m of the correct solution. Considerably better results were achieved for atmospheric type D, and worse results for atmospheric types A and F. This pattern reflects the accuracy of the dispersion model (10) in reproducing the concentration field under different stability conditions. The Gaussian model is expected to perform better in neutral conditions (D), whereas convective turbulence (A) and stable stratification (F) involve more complex dispersion mechanisms which cannot be accounted for, resulting in a lack of accuracy. Figure 6 is consistent with the notion that the algorithm performs better when the fit of the dispersion model is better.

FIGURE 6 | Errors of AQ4SD divided by atmosphere type.

Figure 7 shows a summary for all the 68 prairie grass experiments. Each atmosphere type is color coded. With the exception of six experiments (3, 4, 7, 25, 52), each of type A or type F, AQ4SD always achieves a minimum fitness of 1.0, which was the target acceptance threshold for this experiment. The overall average fitness error is 0.6.
Figure 8 shows a summary of the results in terms of x, y, z, θ (called WA, wind angle), and Q. For all experiments, the original source was located at x, y, z = 0, 0, 0. The WA and Q errors are defined, respectively, as the identified angle minus the real angle, and the identified Q minus the real Q. Once again, the atmosphere type is color coded using the same colors as in Figure 7. The ideal solution would be all the points located at 0, for all variables. In the figure, although this is primarily true for most variables, the strong dependence between y and θ is evident. The units for the x, y, and z directions are meters. Therefore, the errors associated with changes in the z values are actually very small, as z only varies 10 m at most. There are larger errors for the alongwind dimension x compared with the crosswind dimension y. This is to be attributed to the concentration field, which has a larger gradient in the crosswind direction compared with the alongwind direction. Note the correlation between the error in θ and the y dimension. Such behavior exemplifies the algorithm's skill at compensating for errors in y through changes in θ. The variable that seems to be harder to optimize is Q. Such results are primarily due to the correlation between Q and U [P1 in Eq. (10)].
FIGURE 8 | Distributions of the errors in X, Y, Z, and Q for all experiments, color coded by atmosphere type (A–F).
DISCUSSION
This article introduces the main concepts of the AQ methodology and discusses its advantages and disadvantages. It describes a new implementation of the AQ methodology, AQ4SD, applied to the problem of source detection of atmospheric releases. In that context, AQ4SD is used as the main engine of evolution for an evolutionary computation process aimed at finding the source of an atmospheric release, using only observed ground measurements and a numerical atmospheric dispersion model. Experiments were performed to identify the source of each of the 68 releases of the prairie grass field experiment.
The numerical experiments show that in all but five cases the methodology was able to achieve a fitness score considered acceptable for the correct identification of the source. The performance of the algorithm has been very satisfactory, considering the error intrinsic in the measured data and the approximation of the dispersion model. AQ4SD also proved to be quite efficient in terms of the number of model simulations required for each optimization case. This is one of the main advantages of the proposed methodology compared with traditional evolutionary algorithms, because a fitness evaluation for a complete source detection procedure may require computationally expensive numerical simulations. In particular, for larger-scale dispersion problems, more sophisticated and computationally expensive meteorological and dispersion models need to be run concurrently to evaluate the fitness of each candidate solution.45
The proposed methodology has a wide domain of applicability, not restricted only to the source detection problem. It can be used for a variety of optimization problems and is particularly advantageous for those problems where the fitness function evaluation involves a computationally expensive operation.

NOTES
a. Some versions of AQ can also be run to generate rules with the largest number of attributes (called characteristic mode), but such a mode merely consists in generating discriminant rules and adding conditions that include all events in the class but carry no discriminatory information.
b. Some versions of AQ sort all or a part of the negative events according to a distance metric. Although such a mechanism has been shown to generate simpler rules in specific cases, because of the additional complexity of defining such distance metrics (which is not always possible, as in the case of nominal attributes), paired with the additional computational resources required, the advantage of such sorting is not clear. AQ4SD can be run with and without sorting, and experiments have shown no or negligible improvements.
ACKNOWLEDGEMENTS
This material is partly based upon work supported by the National Science Foundation under Grant no: AGS
0849191.
REFERENCES
1. Michalski R. On the quasi-minimal solution of the general covering problem. Proceedings of the Fifth International Symposium on Information Processing (FCIP 69), Bled, Yugoslavia, vol A3; October 3–11, 1969, 125–128.
2. Michalski R. A theory and methodology of inductive learning. Mach Learn 1983, 1:83–134.
3. Michalski R. AQVAL/1 computer implementation of a variable-valued logic system VL1 and examples of its application to pattern recognition. First International Joint Conference on Pattern Recognition, Washington, D.C., 1973, 3–17.
4. Chilausky R, Jacobsen B, Michalski R. An application of variable-valued logic to inductive learning of plant disease diagnostic rules. Proceedings of the Sixth International Symposium on Multiple-valued Logic. Logan, UT: IEEE Computer Society Press, Los Alamitos; 1976, 233–240.
5. Steinberg D, Colla P. CART: Tree-structured Non-parametric Data Analysis. San Diego, CA: Salford Systems; 1995.
6. Quinlan J. C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann; 1993.
7. Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and Regression Trees. Wadsworth International Group; 1984.
8. Michalski R, Mozetic I, Hong J, Lavrac N. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. Proceedings of the 1986 AAAI Conference, Philadelphia, PA, vol. 104; August 11–15, 1986, 1041–1045.
9. Kaufman K, Michalski R. The AQ18 Machine Learning and Data Mining System: An Implementation and User's Guide. MLI Report. Fairfax, VA: Machine Learning and Inference Laboratory, George Mason University; 1999.
10. Mitchell T. Machine Learning. New York: McGraw-Hill; 1997.
11. Cervone G, Panait L, Michalski R. The development of the AQ20 learning system and initial experiments. Proceedings of the Fifth International Symposium on Intelligent Information Systems, June 18–22, 2001, Zakopane, Poland: Physica Verlag; 2001, 13.
12. Keesee APK. How Sequential-Cover Data Mining Programs Learn. College of Science. Fairfax, VA: George Mason University; 2006.
13. Austern M. Generic Programming and the STL: Using and Extending the C++ Standard Template Library. 1998.
14. Gamma E, Helm R, Johnson R, Vlissides J. Design Patterns: Elements of Reusable Object-Oriented Software. Westford, MA: Addison-Wesley; 1995.
15. Bloedorn E, Wnek J, Michalski R. Multistrategy constructive induction: AQ17-MCI. Rep Mach Learn Infer Lab 1993, 1051:93–4.
16. Wnek J, Michalski R. Hypothesis-driven constructive induction in AQ17-HCI: a method and experiments. Mach Learn 1994, 14:139–168.
17. Cervone G, Franzese P, Ezber Y, Boybeyi Z. Risk assessment of atmospheric emissions using machine learning. Nat Hazards Earth Syst Sci 2008, 8:991–1000.
18. Holland J. Adaptation in Natural and Artificial Systems. Cambridge, MA: The MIT Press; 1975.
19. Goldberg DE. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison Wesley; 1989.
20. Bäck T. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, and Genetic Algorithms. Oxford, NY: Oxford University Press; 1996.
21. Michalewicz Z. Genetic Algorithms + Data Structures = Evolution Programs. 3rd ed. Berlin: Springer-Verlag; 1996.
22. Fogel L. Intelligence Through Simulated Evolution: Forty Years of Evolutionary Programming. Wiley Series on Intelligent Systems. New York: John Wiley & Sons, Inc.; 1999.
23. De Jong K. Evolutionary computation: a unified approach. Proceedings of the 2008 GECCO Conference on Genetic and Evolutionary Computation. New York: ACM; 2008, 2245–2258.
24. Darwin C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. London: Oxford University Press; 1859.
25. Ashlock D. Evolutionary Computation for Modeling and Optimization. Berlin Heidelberg: Springer-Verlag; 2006.
26. Grefenstette J. Incorporating problem specific knowledge into genetic algorithms. Genetic Alg Simul Annealing 1987, 4:42–60.
27. Grefenstette J. Lamarckian learning in multi-agent environments. Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann Publishers Inc., San Francisco, CA, 1991.
28. Sebag M, Schoenauer M. Controlling Crossover through Inductive Learning. Lecture Notes in Computer Science. London: Springer-Verlag; 1994, 209–209.
29. Sebag M, Schoenauer M, Ravise C. Inductive learning of mutation step-size in evolutionary parameter optimization. Lecture Notes in Computer Science. London: Springer-Verlag; 1997, 247–261.
30. Reynolds R. Cultural Algorithms: Theory and Applications. McGraw-Hill's Advanced Topics in Computer Science Series. Maidenhead, England: McGraw-Hill Ltd.; 1999, 367–378.
31. Hamda H, Jouve F, Lutton E, Schoenauer M, Sebag M. Compact unstructured representations for evolutionary design. Appl Intell 2002, 16:139–155.
32. Lozano J. Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms. Springer; 2006.
33. Michalski R. Learnable evolution: combining symbolic and evolutionary learning. Proceedings of the Fourth International Workshop on Multistrategy Learning (MSL'98). 1999, 14–20.
34. Cervone G, Michalski R, Kaufman K, Panait L. Combining machine learning with evolutionary computation: recent results on LEM. Proceedings of the Fifth International Workshop on Multistrategy Learning (MSL-2000), Guimaraes, Portugal; 2000, 41–58.
35. Cervone G, Kaufman K, Michalski R. Experimental validations of the learnable evolution model. Proceedings of the 2000 Congress on Evolutionary Computation, La Jolla, CA, vol. 2; July 16–19, 2000.
36. Pudykiewicz J. Application of adjoint tracer transport equations for evaluating source parameters. Atmos Environ 1998, 32:3039–3050.
37. Hourdin F, Issartel JP. Sub-surface nuclear tests monitoring through the CTBT xenon network. Geophys Res Lett 2000, 27:2245–2248.
38. Enting I. Inverse Problems in Atmospheric Constituent Transport. Cambridge, NY: Cambridge University Press; 2002, 392.
39. Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis. Chapman & Hall/CRC; 2003, 668 pp.
40. Chow F, Kosović B, Chan T. Source inversion for contaminant plume dispersion in urban environments using building-resolving simulations. Proceedings of the 86th American Meteorological Society Annual Meeting, Atlanta, GA, January 2006, 12–22.
41. Senocak I, Hengartner N, Short M, Daniel W. Stochastic event reconstruction of atmospheric contaminant dispersion using Bayesian inference. Atmos Environ 2008, 42:7718–7727.
42. Haupt SE. A demonstration of coupled receptor/dispersion modeling with a genetic algorithm. Atmos Environ 2005, 39:7181–7189.
43. Haupt SE, Young GS, Allen CT. A genetic algorithm method to assimilate sensor data for a toxic contaminant release. J Comput 2007, 2:85–93.
44. Allen CT, Young GS, Haupt SE. Improving pollutant source characterization by better estimating wind direction with a genetic algorithm. Atmos Environ 2007, 41:2283–2289.
45. Delle Monache L, Lundquist J, Kosović B, Johannesson G, Dyer K, et al. Bayesian inference and Markov chain Monte Carlo sampling to reconstruct a contaminant source on a continental scale. J Appl Meteor Climatol 2008, 47:2600–2613.
46. Pasquill F. The estimation of the dispersion of windborne material. Meteorol Magazine 1961, 90:33–49.
47. Pasquill F, Smith F. Atmospheric Diffusion. Chichester, UK: Ellis Horwood; 1983.
48. Arya PS. Air Pollution Meteorology and Dispersion. Oxford, NY: Oxford University Press; 1999.
49. Barad M, Haugen D. Project Prairie Grass, A Field Program in Diffusion. United States Air Force, Air Research and Development Command, Air Force Cambridge Research Center; Cambridge, MA, 1958.