AI Algorithms, Data Structures, and Idioms in Prolog, Lisp
{
System.out.println(
"CloneNotSupportedException:" + e);
}
}
}
24.7 Design Discussion
In closing out this chapter, we would like to look at two major design
decisions. The first is our separation of representation and search through
the introduction of AbstractSolutionNode and its descendants.
The second is the importance of static structure to the design.
Separating Representation and Search

The separation of representation and search is a common theme in AI
programming. In Chapter 22, for example, our implementation of simple
search engines relied upon this separation for generality. In the reasoning
engine, we bring the relationship between representation and search into
sharper focus. Here, the search engine serves to define the semantics of
our logical representation by implementing a form of logical inference. As
we mentioned before, our approach builds upon the mathematics of the
representation language – in this case, theories of logic inference – to
ensure the quality of our representation.
One detail of our approach bears further discussion. That is the use of the
method, getSolver(RuleSet rules, SubstitutionSet
parentSolution), which was defined in the Goal interface. This
method simplifies the handling of the search space by letting search
algorithms treat goals independently of their type (simple sentence,
conjunction, etc.). It lets us treat nodes in terms of the general methods defined
by AbstractSolutionNode, and to rely upon each goal to return
the proper type of solution node. This approach is beneficial, but as is
typical of object-oriented design, there are other ways to implement it.
One of these alternatives is through a factory pattern. This would replace
the getSolver() method of Goal with a separate class that creates
instances of the needed node. For example:
public class SolutionNodeFactory
{
    public static AbstractSolutionNode getSolver(
            Goal goal,
            RuleSet rules,
            SubstitutionSet parentSolution)
    {
        if (goal instanceof SimpleSentence)
            return new SimpleSentenceSolutionNode(
                goal, rules, parentSolution);
        if (goal instanceof And)
            return new AndSolutionNode(goal, rules,
                parentSolution);
        throw new IllegalArgumentException(
            "Unknown goal type: " + goal);
    }
}
standard accessors:
public class ESRule extends Rule
{
private double certaintyFactor;
public ESRule(ESSimpleSentence head,
double certaintyFactor)
{
this(head, null, certaintyFactor);
}
public ESRule(ESSimpleSentence head, Goal body,
double certaintyFactor)
{
super(head, body);
this.certaintyFactor = certaintyFactor;
}
public double getCertaintyFactor()
{
return certaintyFactor;
}
protected void setCertaintyFactor(double value)
{
this.certaintyFactor = value;
}
}
Note the two constructors, both of which include certainty factors in their
arguments. The first constructor supports rules with conclusions only;
since a fact is simply a rule without a premise, this allows us to add
certainty factors to facts. The second constructor allows definition of full
rules. An obvious extension to this definition would be to add checks
that certainty factors stay in the range -1.0 to 1.0, throwing an
out-of-range exception when they do not. We leave this as an exercise.
This is essentially the only change we will make to our representation. Most
of our changes will be to the solution nodes in the proof tree, since these
define the reasoning strategy. To support this, we will define subclasses
of both SimpleSentence and And to return the appropriate type of solution
node, as required by the interface Goal (these are all defined in the
preceding chapter). The new classes are:
public class ESSimpleSentence extends SimpleSentence
{
public ESSimpleSentence(Constant functor,
Unifiable... args)
{
super(functor, args);
}
ESSolutionNode child =
(ESSolutionNode) getChild();
if(child == null)
{
certainty = rule.getCertaintyFactor();
}
else
{
certainty = child.getCertainty() *
rule.getCertaintyFactor();
}
return solution;
}
We will define ESFrontEnd in an interface:
public interface ESFrontEnd
{
public double ask(ESSimpleSentence goal,
SubstitutionSet subs);
}
Finally, we will introduce a new class, ESRuleSet, to extend RuleSet
to include an instance of ESFrontEnd:
public class ESRuleSet extends RuleSet
{
private ESFrontEnd frontEnd = null;
public ESRuleSet(ESFrontEnd frontEnd,
ESRule... rules)
{
super((Rule[])rules);
this.frontEnd = frontEnd;
}
public ESFrontEnd getFrontEnd()
{
return frontEnd;
}
}
This is only a partial implementation of user interactions for the expert
system shell. We still need to add the ability for users to make a top-level
query to the reasoner, and also the ability to handle “how” and “why”
queries as discussed in (Luger 2009). We leave these as an exercise.
25.4 Design Discussion
Although the extension of the unification problem solver into a simple
expert system shell is, for the most part, straightforward, there are a
couple of interesting design questions. The first is our decision to leave
the definitions of the descendants of PCExpression as unchanged as
possible, and to place most of the new material in extensions
to the solution node classes. Our reason for doing this reflects a theoretical
consideration.
Logic makes a theoretical distinction between syntax and semantics,
between the definition of well-formed expressions and the way they are
used in reasoning. Our decision to define the expert system almost entirely
through changes to the solution node classes reflects this distinction. In
making this decision, we are following a general design heuristic that we
have found useful, particularly in AI implementations: insofar as possible,
define the class structure of code to reflect the concepts in an underlying
mathematical theory. Like most heuristics, the reasons for this are intuitive,
and we leave further analysis to the exercises.
The second major design decision is somewhat more problematic. This is
our decision to use the nextSolution method from the unification solver to
perform the actual search, and compute certainty factors afterwards. The
benefits of this are in not modifying code that has already been written and
tested, which follows standard object-oriented programming practice.
However, in this case, the standard practice leads to certain cons that
should be considered. One of these is that, once a solution is found,
acquiring both the variable substitutions and certainty factor requires two
separate methods: nextSolution and getCertainty. This is error
prone, since the person using the class must ensure that no state changes
occur between these calls. One solution is to write a convenience function
that bundles both values into a new class (say ESSolution) and returns
them. A more aggressive approach would be to ignore the current version
of nextSolution entirely, and to write a brand new version.
This is a very interesting design decision, and we encourage the reader to
try alternative approaches and discuss their trade-offs in the exercises to
this chapter.
Exercises
1. Modify the definition of the nextSolution method of the classes
ESSimpleSolutionNode and ESAndSolutionNode to fail a line of
reasoning if the certainty factor falls below a certain value (0.2 or 0.3 are
typical values). Instrument your code to count the number of nodes visited
and test it both with and without pruning.
2. Add range checks to all methods and classes that allow certainty factors
to be set, throwing an exception if the value is not in the range of -1.0 to
1.0. Either use Java’s built-in IllegalArgumentException or an
exception class of your own definition. Discuss the pros and cons of the
approach you choose.
3. In designing the object model for the unification problem solver, we
followed the standard AI practice of distinguishing between the
representation of well-formed expressions (classes implementing the
interface unifiable) and the definition of the inference strategy in the
Chapter 26

Chapter Objectives: This chapter examines Java expert system shells available
on the world wide web

Chapter Contents:
26.1 Introduction
26.2 JESS
26.3 Other Expert System Shells
26.4 Using Open Source Tools
26.1 Introduction
In the last three chapters we demonstrated the creation of a simple expert
system shell in Java. Chapter 22 presented a representational formalism for
describing predicate calculus expressions, the representation of choice for
expert rule systems and many other AI problem solvers. Chapter 24
created a procedure for unification. We demonstrated this algorithm with a
set of predicate calculus expressions, and then built a simple Prolog in Java
interpreter. Chapter 25 added full backtracking to our unification algorithm
so that it could check all possible unifications in the process of finding
sets of consistent substitutions across sets of predicate calculus
specifications. In Chapter 25 we also created procedures for answering why
and how queries, as well as for setting up a certainty factor algebra.
In this chapter we present a number of expert system shell libraries written
in Java. As mentioned throughout our presentation of Java, the presence of
extensive code libraries is one of the major reasons for the broad
acceptance of Java as a problem-solving tool. We have explored these
expert shells at the time of writing this chapter. We realize that many of
these libraries will change over time and may well differ (or not even exist!)
when our readers consider them. So we present their URLs, current as of
January 2008, with minimal further comment.
26.2 JESS
The first library we present is JESS, the Java Expert System Shell, built and
maintained by programmers at Sandia National Laboratories in
Albuquerque, New Mexico. JESS is a rule engine for the Java platform.
Unlike the unification system presented in Chapters 23 and 24, JESS is
driven by a Lisp-style scripting language built in Java itself. There are
advantages and disadvantages to this approach. One advantage of an
independent scripting language is that it is easier for the rule author to
work with. For example, Prolog has its own language suitable for rule
languages, which makes it easy and clear to write static rule systems.
On the other hand, Prolog is not intended to be embedded in other
applications. In the case of Java, rules may be generated and controlled by
some external mechanism, and in order to use JESS’s approach, the data
needs to be converted into text that this interpreter can handle.
A disadvantage of an independent scripting language is the disconnect
between Java and the rule engine. Once external files and strings are used
to specify rules, standard Java syntax cannot be used to verify and check
syntax. While this is not an issue for stand-alone rule solving systems, once
the user wants to embed the solver into existing Java environments, she
must learn a new language and decide how to interface and adapt the
library to her project.
In an attempt to address standardization of rule systems in Java, the Java
Community Process defined an API for rule engines in Java. The Java
Specification Request #94 defines the javax.rules package and a number of
classes for dealing with rule engines. Our impression of this system is that
it is very vague and seemingly tailored for JESS. It abstracts the rule
system as general objects with general methods for getting/setting
properties on rules.
RuleML, although not Java specific, provides a standardized XML format
for defining rules. This format can theoretically be used for any rule
interpreter, as the information can be converted into the rule interpreter’s
native representations.
JESS has its own JessML format for defining rules in XML, which can be
converted to RuleML and back using XSLT (eXtensible Stylesheet
Language Transformations). These formats, unfortunately, are rather
verbose and not necessarily intended for being read and written by people.
Web links for using JESS include:
http://www.jessrules.com/ - The JESS web site,
http://jcp.org/en/jsr/detail?id=94 - JSR 94: JavaTM Rule Engine API,
http://www.jessrules.com/jess/docs/70/api/javax/rules/package-
summary.html - javadocs about javax.rules (from JSR 94), and
http://www.ruleml.org/ - RuleML.
26.3 Other Expert System Shells
We have done some research into other Java expert rule systems, and
found dozens of them. The following url introduces a number of these
(not all in Java):
http://www.kbsc.com/rulebase.html
The general trend of these libraries is to use some form of scripting-
language based rule engine. There is even a Prolog implementation in Java!
These systems introduce many implementations and related issues,
including RDF, OWL, SPARQL, the Semantic Web, Rete, and more.
The following url takes a high-level look at rule engines in Java (albeit
of data, and then delegates further decision making to child nodes based on
the value of that particular property (Luger 2009, Section 10.3). The leaf
nodes of the decision tree are terminal states that return a class for the
given data collection. We can illustrate decision trees through the example
of a simple credit history evaluator that was used in (Luger 2009) in its
discussion of the ID3 learning algorithm. We refer the reader to this book
for a more detailed discussion, but will review the basic concepts of
decision trees and decision tree induction in this section.
Assume we wish to assign a credit risk of high, moderate, or low to people
based on the following properties of their credit rating:
Collateral, with possible values {adequate, none}
Income, with possible values {"$0 to $15K", "$15K to $35K",
“over $35K”}
Debt, with possible values {high, low}
Credit History, with possible values {good, bad, unknown}
We could represent risk criteria as a set of rules, such as “If debt is low,
and credit history is good, then risk is moderate.” Alternatively, we can
summarize a set of rules as a decision tree, as in figure 27.1. We can
perform a credit evaluation by walking the tree, using the values of the
person’s credit history properties to select a branch. For example, using the
decision tree of figure 27.1, an individual with credit history = unknown,
debt = low, collateral = adequate, and income = $15K to $35K would be
categorized as having low risk. Also note that this particular categorization
does not use the income property. This is a form of generalization, where
people with these values for credit history, debt, and collateral qualify as
having low risk, regardless of income.
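The tree walk just described can be sketched with a small node class. The tiny tree built in the test encodes only the single path given in the example (credit history = unknown, debt = low, collateral = adequate, yielding low risk); the full tree of figure 27.1 has further branches, and the class and field names below are our own:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal decision-tree walk, sketching the evaluation described in
// the text. Internal nodes test one property; leaves carry a category.
public class CreditWalk {
    final String decisionProperty;          // property tested here (null at leaves)
    final String category;                  // non-null only at leaves
    final Map<String, CreditWalk> children = new HashMap<>();

    CreditWalk(String decisionProperty, String category) {
        this.decisionProperty = decisionProperty;
        this.category = category;
    }

    // Walk the tree, selecting a branch by the example's value for
    // this node's decision property.
    String categorize(Map<String, String> example) {
        if (children.isEmpty()) return category;
        CreditWalk child = children.get(example.get(decisionProperty));
        return (child == null) ? null : child.categorize(example);
    }
}
```

Note that an income entry in the example map is simply never consulted on this path, which is exactly the generalization described above.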
Figure 27.1 A Decision Tree for the Credit Risk Problem (Luger 2009)
Now, assume the following set of 14 training examples. Although this does
not cover all possible instances, it is large enough to define a number of
meaningful decision trees, including the tree of figure 27.1 (the reader may
want to construct several such trees. See exercise 1). The challenge facing
any inductive learning algorithm is to produce a tree that both covers all
the training examples correctly, and has the highest probability of being
correct on new instances.
}
public String toString()
{
// to be defined by reader
}
public abstract Set<String> getPropertyNames();
}
This implementation of AbstractExample as an immutable object is
incomplete in that it does not include the techniques demonstrated in
AbstractProperty to enforce the immutability pattern. We leave this
as an exercise.
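As a sketch of the immutability pattern referred to here, the usual Java techniques are final fields set once in the constructor, no setters, and defensive copies or unmodifiable views of any mutable state. The class and field names below are illustrative, not the book's:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative immutability pattern: final class, final fields, no
// setters, and a defensive copy of the mutable collection so later
// changes by the caller cannot leak in.
public final class ImmutableExample {
    private final String category;
    private final Set<String> propertyNames;

    public ImmutableExample(String category, Set<String> propertyNames) {
        this.category = category;
        // Copy, then wrap: callers can neither mutate our copy nor
        // modify the set returned by the accessor.
        this.propertyNames = Collections.unmodifiableSet(
            new HashSet<>(propertyNames));
    }
    public String getCategory() { return category; }
    public Set<String> getPropertyNames() { return propertyNames; }
}
```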
Implementing ExampleSet

ExampleSet, along with AbstractDecisionTreeNode, is one of
the most interesting classes in the implementation. This is because the
decision tree induction algorithm requires a number of fairly complex
operations for partitioning the example set on property values. The
implementation presented here is simple and somewhat inefficient, storing
examples as a simple vector. This requires examination of all examples to
form partitions, retrieve examples with a specific value for a property, etc.
We leave a more efficient implementation as an exercise.
In providing integrity checks on data, we have required that all examples be
categorized, and that all examples belong to the same class.
The basic member variables and accessors are defined as:
public class ExampleSet
{
private Vector<AbstractExample> examples =
new Vector<AbstractExample>();
private HashSet<String> categories =
new HashSet<String>();
private Set<String> propertyNames = null;
public void addExample(AbstractExample e)
throws IllegalArgumentException
{
if(e.getCategory() == null)
throw new IllegalArgumentException(
"Example missing categorization.");
// Check that new example is of same class
// as existing examples
if((examples.isEmpty()) ||
e.getClass() ==
examples.firstElement().getClass())
{
examples.add(e);
categories.add(e.getCategory());
if(propertyNames == null)
propertyNames =
new HashSet<String>(
e.getPropertyNames());
}
else
throw new IllegalArgumentException(
"All examples must be same type.");
}
public int getSize()
{
return examples.size();
}
public boolean isEmpty()
{
return examples.isEmpty();
}
public AbstractExample getExample(int i)
{
return examples.get(i);
}
public Set<String> getCategories()
{
return new HashSet<String>(categories);
}
public Set<String> getPropertyNames()
{
return new HashSet<String>(propertyNames);
}
// More complex methods to be defined.
public int getExampleCountByCategory(String cat)
throws IllegalArgumentException
{
// to be defined below.
}
public HashMap<String, ExampleSet> partition(
String propertyName)
throws IllegalArgumentException
{
// to be defined below.
}
}
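The partition method above is defined later in the chapter; a simplified sketch of the underlying idea, using plain property-value maps rather than the book's AbstractExample and ExampleSet classes, groups examples by their value for a single property:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of partitioning an example set: group examples
// (here, plain property->value maps) by their value for one property.
// The book's version returns HashMap<String, ExampleSet> instead.
public class PartitionSketch {
    public static Map<String, List<Map<String, String>>> partition(
            List<Map<String, String>> examples, String propertyName) {
        Map<String, List<Map<String, String>>> parts = new HashMap<>();
        for (Map<String, String> ex : examples) {
            String value = ex.get(propertyName);
            // Create the bucket for this value on first use.
            parts.computeIfAbsent(value, v -> new ArrayList<>()).add(ex);
        }
        return parts;
    }
}
```

Because the examples are stored in a simple list, this sketch (like the chapter's implementation) must scan every example to form a partition; exercise 4 asks for a map-based version with faster retrieval.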
throws IllegalArgumentException
{
induceTree(examples, selectionProperties);
}
public boolean isLeaf()
{
return children.isEmpty();
}
public String getCategory()
{
return category;
}
public String getDecisionProperty()
{
return decisionPropertyName;
}
public AbstractDecisionTreeNode getChild(String
propertyValue)
{
return children.get(propertyValue);
}
public void addChild(String propertyValue,
AbstractDecisionTreeNode child)
{
children.put(propertyValue, child);
}
public String categorize(AbstractExample ex)
{
// defined below
}
public void induceTree(ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException
{
// defined below
}
public void printTree(int level)
{
// implementation left as an exercise
}
protected abstract double
evaluatePartitionQuality(HashMap<String,
ExampleSet> part, ExampleSet examples)
throws IllegalArgumentException;
protected abstract AbstractDecisionTreeNode
createChildNode(ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException;
}
Note the two abstract methods for evaluating a candidate partition and
creating a new child node. These will be implemented in Section 27.3.
The method categorize classifies a new example by performing a
recursive tree walk:
public String categorize(AbstractExample ex)
{
if(children.isEmpty())
return category;
if(decisionPropertyName == null)
return category;
AbstractProperty prop =
ex.getProperty(decisionPropertyName);
AbstractDecisionTreeNode child =
children.get(prop.getValue());
if(child == null)
return null;
return child.categorize(ex);
}
The method induceTree performs the induction of decision trees. It deals with four
cases. The first is a normal termination: all examples belong to the same
category, so it creates a leaf node of that category. Cases two and three
occur if there is insufficient information to complete a categorization; in
this case, the algorithm creates a leaf node with a null category.
Case four performs the recursive step. It iterates through all properties that
have not been used in the decision tree (these are passed in the parameter
selectionProperties), using each property to partition the example
set. It evaluates the example set using the abstract method,
evaluatePartitionQuality. Once it finds the best evaluated
partition, it constructs child nodes for each branch.
public void induceTree(ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException
{
// Case 1: All instances are the same
// category, the node is a leaf.
if(examples.getCategories().size() == 1)
{
category = examples.getCategories().
iterator().next();
return;
}
//Case 2: Empty example set. Create
// leaf with no classification.
if(examples.isEmpty())
return;
//Case 3: Empty property set; could not classify.
if(selectionProperties.isEmpty())
return;
// Case 4: Choose test and build subtrees.
// Initialize by partitioning on first
// untried property.
Iterator<String> iter =
selectionProperties.iterator();
String bestPropertyName = iter.next();
HashMap<String, ExampleSet> bestPartition =
examples.partition(bestPropertyName);
double bestPartitionEvaluation =
evaluatePartitionQuality(bestPartition,
examples);
// Iterate through remaining properties.
while(iter.hasNext())
{
String nextProp = iter.next();
HashMap<String, ExampleSet> nextPart =
examples.partition(nextProp);
double nextPartitionEvaluation =
evaluatePartitionQuality(nextPart,
examples);
// Better partition found. Save.
if(nextPartitionEvaluation >
bestPartitionEvaluation)
{
bestPartitionEvaluation =
nextPartitionEvaluation;
bestPartition = nextPart;
bestPropertyName = nextProp;
}
}
// Create children; recursively build tree.
this.decisionPropertyName = bestPropertyName;
Set<String> newSelectionPropSet =
new HashSet<String>(selectionProperties);
newSelectionPropSet.remove(decisionPropertyName);
iter = bestPartition.keySet().iterator();
while(iter.hasNext())
{
String value = iter.next();
ExampleSet child = bestPartition.get(value);
children.put(value,
createChildNode(child,
newSelectionPropSet));
}
}
27.4 ID3: An Information Theoretic Tree Induction Algorithm
The heart of the ID3 algorithm is its use of information theory to evaluate
the quality of candidate partitions of the example set by choosing
properties that gain the most information about an example's
categorization. Luger (2009) discusses this approach in detail, but we will
review it briefly here.
Shannon (1948) developed a mathematical theory of information that
allows us to measure the information content of a message. Widely used in
telecommunications to determine such things as the capacity of a channel,
the optimality of encoding schemes, etc., it is a general theory that we will
use to measure the quality of a decision property.
Shannon's insight was that the information content of a message depends
on two factors: the size of the set of all possible messages, and the
probability of each message occurring. Given a set of possible
messages, M = {m1, m2, ..., mn}, the information content of any
individual message is measured in bits as the negative sum, across all
messages in M, of the probability of each message times the log to the
base 2 of that probability:

I(M) = - Σi p(mi) log2 p(mi)
Applying this to the problem of decision tree induction, we can regard a set
of examples as a set of possible messages about the categorization of an
example. The probability of a message (a given category) is the number of
examples with that category divided by the size of the example set. For
example, in the table in section 27.1, there are 14 examples. Six of the
examples have high risk, so p(risk = high) = 6/14. Similarly, p(risk =
moderate) = 3/14, and p(risk = low) = 5/14. So, the information in any
example in the set is:
I(example set) = -6/14 log2 (6/14) - 3/14 log2 (3/14) - 5/14 log2 (5/14)
               = -6/14 * (-1.222) - 3/14 * (-2.222) - 5/14 * (-1.485)
               = 1.531 bits
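The computation above can be checked directly; this small sketch (class and method names ours) reproduces the 1.531-bit figure from the 6/14, 3/14, and 5/14 category frequencies:

```java
// Information content I(M) = -sum of p * log2(p), as in the text.
public class Entropy {
    public static double information(double... probs) {
        double sum = 0.0;
        for (double p : probs)
            sum -= p * (Math.log(p) / Math.log(2));  // log base 2
        return sum;
    }
}
```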
We can think of the recursive tree induction algorithm as gaining
information about the example set at each iteration. If we assume a set of
Exercises
1. Construct two or three different trees that correctly classify the training
examples in the table of section 27.1. Compare their complexity using
average path length from root to leaf as a simple metric. What informal
heuristics would you use in constructing the simplest trees to match the data?
Manually build a tree using the information theoretic test selection
algorithm from the ID3 algorithm. How does this compare with your
informal heuristics?
2. Extend the definition of AbstractExample to enforce the
immutable object pattern using AbstractProperty as an example.
3. The classes AbstractExample and AbstractProperty throw
exceptions defined in Java, such as IllegalArgumentException
or UnsupportedOperationException, when passed illegal values
or when implementers try to violate the immutable object pattern. An alternative
approach would use user-defined exceptions, defined as subclasses of
java.lang.RuntimeException. Implement this approach, and discuss its
advantages and disadvantages.
4. The implementation of ExampleSet in section 27.2.3 stores
component examples as a simple vector. This requires iteration over all
examples to partition the example set on a property, count categories, etc.
Redo the implementation using a set of maps to allow constant time
retrieval of examples having a certain property value, category, etc.
Evaluate performance for this implementation and that given in the
chapter.
5. Complete the implementation for the credit risk example. This will
involve creating subclasses of AbstractProperty for each property,
and an appropriate subclass of AbstractExample. Also, write a class
and methods to test your code.
28.1 Introduction
The genetic algorithm (GA) is one of a number of computer programming
techniques loosely based on the idea of natural selection. The idea of
applying principles of natural selection to computing is not new. By 1948,
Alan Turing proposed “genetical or evolutionary search” (Turing 1948).
Less than two decades later, H. J. Bremermann performed computer
simulations of “optimization through evolution and recombination”
(Eiben and Smith 1998). It was John Holland who coined the term, genetic
algorithm (Holland 1975). However, the GA was not widely studied until
1989, when D.E. Goldberg showed that it could be used to solve a
significant number of difficult problems (Goldberg 1989). Currently, many
of these threads have come together under the heading evolutionary computing
(Luger 2009, Chapter 12).
28.2 The Genetic Algorithm: A First Pass
The Genetic Algorithm is based loosely on the concept of natural
selection. Individual members of a species who are better adapted to a
given environment reproduce more successfully. They pass their
adaptations on to their offspring. Over time, individuals possessing the
adaptation form a new species that is particularly suited to the
environment. The genetic algorithm applies the metaphor of natural
selection to optimization problems. No claim is made about its biological
accuracy, although individual researchers have proposed mechanisms both
with and without a motivating basis from nature.
A candidate solution for a genetic algorithm is often called a chromosome.
The chromosome is composed of multiple genes. A collection of chromosomes forms a population.
WordGuess Example

Consider a simple problem called WordGuess (Haupt and Haupt 1998). The
user enters a target word at the keyboard. The GA guesses the word. In
this case, each letter is a gene, each word a chromosome, and the total
collection of words is the population. To begin, we randomly generate a
sequence of chromosomes of the desired length. Next, we rank the
generated chromosomes for fitness. A chromosome that is identical with
the target has a fitness of zero. A chromosome that differs in one letter has
a fitness of 1 and so on. It is easy to see that the size of the search space
for WordGuess increases exponentially with the length of the word. In the
next few sections, we will develop an object-oriented solution to this
problem.
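The fitness measure just described counts mismatched letters; a minimal sketch (class and method names ours) is:

```java
// WordGuess fitness as described in the text: 0 for an exact match,
// plus 1 for each position where the guess differs from the target.
// Assumes guess and target have the same length.
public class WordGuessFitness {
    public static int fitness(String guess, String target) {
        int cost = 0;
        for (int i = 0; i < target.length(); i++)
            if (guess.charAt(i) != target.charAt(i)) cost++;
        return cost;
    }
}
```

Since each of the n positions ranges over the alphabet independently, the search space grows as 26^n, which is the exponential growth noted above.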
Suppose we begin with a randomly generated population of 128 character
strings. After ranking them, we immediately eliminate the half that is least
fit. Of the 64 remaining chromosomes, the fittest 32 form 16 breeding
pairs. If each pair produces 2 offspring, the next generation will consist of
the 32 parents plus the 32 children.
}
// Remaining method implemented below.
}
Mate takes a population of chromosomes as a parameter and returns a
mated population. The for-loop eliminates the least fit half of the
population to make room for the two children per breeding pair.
Crossover, the only other method in Mate, is presented next. It
implements the algorithm described in Section 28.2. Making use of the
Set/Get methods of Chromosome, Crossover blends the
chromosomes of each breeding pair. When mating is complete, the
breeding pairs are in the top half of the ArrayList, the children are in
the bottom half.
public ArrayList<Chromosome> Crossover(
ArrayList<Chromosome> population, int numPairs)
{
for (int j = 0; j < numPairs; j++)
{
MT_father = population.get(MT_posFather);
MT_mother = population.get(MT_posMother);
MT_child1 = new Chromosome(MT_numGenes);
MT_child2 = new Chromosome(MT_numGenes);
Random rnum = new Random();
int crossPoint = rnum.nextInt(MT_numGenes);
// left side
for (int i = 0; i < crossPoint; i++)
{
MT_child1.SetGene(i,
MT_father.GetGene(i));
MT_child2.SetGene(i,
MT_mother.GetGene(i));
}
// right side
for (int i = crossPoint;
i < MT_numGenes; i++)
{
MT_child1.SetGene(i,
MT_mother.GetGene(i));
MT_child2.SetGene(i,
MT_father.GetGene(i));
}
population.add(MT_posChild1,MT_child1);
population.add(MT_posChild2,MT_child2);
MT_posChild1 = MT_posChild1 + 2;
MT_posChild2 = MT_posChild2 + 2;
MT_posFather = MT_posFather + 2;
MT_posMother = MT_posMother + 2;
}
return population;
}
The GA Class

Having examined its subclasses, it is time to look at the class GA itself. We
never create an instance of class GA. GA exists only so that its member
variables and methods can be inherited, as in Figure 28.3. Classes that may
not be instantiated are called abstract. The classes higher in the hierarchy are
called superclasses. Those lower in the hierarchy are called subclasses. Member
variables and methods designated protected in a super class are
available to its subclasses.
GA contains the population of chromosomes, along with the various
parameters that its subclasses need. The parameters are the size of the
initial population, the size of the pared down population, the number of
genes, the fraction of the total genes to be mutated, and the number of
iterations before the program stops. The parameters are stored in a file
manipulated through the classes Parameters, SetParams, and
GetParams. We use object semantics to manipulate the files. Since file
manipulation is not essential to a GA, we will not discuss it further. The
class declaration GA, its member variables, and its constructor follow.
public abstract class GA extends Object
{
protected int GA_numChromesInit;
protected int GA_numChromes;
protected int GA_numGenes;
protected double GA_mutFact;
protected int GA_numIterations;
protected ArrayList<Chromosome> GA_pop;
public GA(String ParamFile)
{
GetParams GP = new GetParams(ParamFile);
Parameters P = GP.GetParameters();
GA_numChromesInit = P.GetNumChromesI();
GA_numChromes = P.GetNumChromes();
GA_numGenes = P.GetNumGenes();
GA_mutFact = P.GetMutFact();
GA_numIterations = P.GetNumIterations();
GA_pop = new ArrayList<Chromosome>();
}
//Remaining methods implemented below.
}
The first two lines of the constructor create the objects necessary to read
the parameter files. The succeeding lines, except the last, read the file and
store the results in class GA’s members variables. The final line creates the
data structure that is to house the population. Since an ArrayList is an
expandable collector, there is no need to fix the size of the array in
advance.
Class GA can do all of those things common to all of its subclasses. Unless
you are a very careful designer, odds are that you will not know what is
common to all of the subclasses until you start building prototypes. Object-
oriented techniques accommodate an iterative design process quite nicely.
As you discover more methods that can be shared across subclasses, simply
push them up a level to the superclass and recompile the system.
Superclass GA performs general housekeeping tasks along with work
common to all its subclasses. Under housekeeping tasks, we want the
superclass GA to display the entire population, its parameters, a chromosome,
and the best chromosome within the population. We also might want it to
tidy up the population by removing those chromosomes that will play no
part in evolution. This requires a little explanation. Two of the parameters
are GA_numChromesInit and GA_numChromes. Performance of a
GA is sometimes improved if we initially generate more chromosomes
than are used in the GA itself (Haupt and Haupt 1998). The first task, then,
is to winnow down the number of chromosomes from the number initially
generated (GA_numChromesInit) to the number that will be used
(GA_numChromes).
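The winnowing step described above might be sketched as follows. This is our own minimal illustration, not the chapter's code: the stripped-down Chromosome stub, the method name winnow, and the assumption that lower cost is better are all ours.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of winnowing: sort the initial population by cost and keep only
// the best GA_numChromes chromosomes. The Chromosome stub is reduced to
// the one accessor this demo needs.
public class WinnowDemo {
    static class Chromosome {
        int cost;
        Chromosome(int cost) { this.cost = cost; }
        int GetCost() { return cost; }
    }

    // Keep only the numChromes lowest-cost chromosomes.
    static void winnow(List<Chromosome> pop, int numChromes) {
        pop.sort(Comparator.comparingInt(Chromosome::GetCost));
        while (pop.size() > numChromes)
            pop.remove(pop.size() - 1);
    }

    public static void main(String[] args) {
        List<Chromosome> pop = new ArrayList<>();
        for (int c : new int[]{7, 2, 9, 4, 1})    // 5 chromosomes generated
            pop.add(new Chromosome(c));
        winnow(pop, 3);                           // pare down to 3
        for (Chromosome ch : pop)
            System.out.println(ch.GetCost());     // prints 1, 2, 4
    }
}
```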
Under shared tasks, we want the superclass GA to create, rank, and mutate
the population. The housekeeping tasks are very straightforward. The
shared method that initializes the population follows:
protected void InitPop()
{
    Random rnum = new Random();
    char letter;
    for (int index = 0; index < GA_numChromesInit; index++)
    {
        Chromosome Chrom = new Chromosome(GA_numGenes);
        for (int j = 0; j < GA_numGenes; j++)
        {
            letter = (char)(rnum.nextInt(26) + 97);
            Chrom.SetGene(j, letter);
        }
        Chrom.SetCost(0);
        GA_pop.add(Chrom);
    }
}
Initializing the population is clear enough, though it does represent a
design decision. We use a nested for loop to create and initialize all genes.
The method that ranks the population uses the method GetCost to extract
the cost from each chromosome, and the compareTo method of the Integer
wrapper class to determine which of the chromosomes costs more. In keeping
with OO, we give no consideration to the specific algorithm that Java uses.
Java documentation guarantees only that the Comparator class "imposes a
total ordering on some collection of objects" (Interface Comparator 2007).
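A cost-based Comparator of the kind the text describes might look like the sketch below. The minimal Chromosome stub and the constant name BY_COST are our assumptions; only the use of GetCost and Integer's compareTo comes from the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of ranking: a Comparator that orders chromosomes by cost using
// Integer's compareTo, as described in the text.
public class RankDemo {
    static class Chromosome {
        int cost;
        Chromosome(int cost) { this.cost = cost; }
        Integer GetCost() { return cost; }
    }

    // Lower cost sorts first; we delegate the comparison to Integer.
    static final Comparator<Chromosome> BY_COST =
        (a, b) -> a.GetCost().compareTo(b.GetCost());

    public static void main(String[] args) {
        List<Chromosome> pop = new ArrayList<>();
        pop.add(new Chromosome(5));
        pop.add(new Chromosome(1));
        pop.add(new Chromosome(3));
        pop.sort(BY_COST);
        for (Chromosome c : pop)
            System.out.print(c.GetCost() + " ");  // 1 3 5
    }
}
```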
Mutation is the last of the three shared methods that we will consider.
The fraction of the total number of genes that are to be mutated per
generation is a design parameter. The fraction of genes mutated depends
on the size of the population, the number of genes per chromosome, and
the fraction of the total genes to mutate. For each of the mutations, we
randomly choose a gene within a chromosome, and randomly choose a
mutated value. There are two things to notice. First, we never mutate our
best chromosome. Second, the mutation code in GA is specific to genetic
algorithms where genes may be reasonably represented as characters. The
code for Mutation may be found in the Chapter 28 code library.
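The mutation scheme just described can be sketched as below. This is our own reduced illustration, not the library code: chromosomes are bare char arrays, and we assume the best chromosome sits at index 0 after ranking, which is why that index is never chosen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of mutation: the number of mutations is population size times
// genes per chromosome times the mutation fraction; each mutation picks a
// random gene in a random chromosome and replaces it with a random
// lowercase letter. Index 0 (assumed best) is never mutated.
public class MutateDemo {
    static void mutate(List<char[]> pop, double mutFact, Random rnd) {
        int numGenes = pop.get(0).length;
        int numMutations = (int) (pop.size() * numGenes * mutFact);
        for (int m = 0; m < numMutations; m++) {
            int chrom = 1 + rnd.nextInt(pop.size() - 1); // skip best at 0
            int gene = rnd.nextInt(numGenes);
            pop.get(chrom)[gene] = (char) (rnd.nextInt(26) + 97);
        }
    }

    public static void main(String[] args) {
        List<char[]> pop = new ArrayList<>();
        pop.add("best!".toCharArray());
        pop.add("aaaaa".toCharArray());
        pop.add("bbbbb".toCharArray());
        mutate(pop, 0.2, new Random(42));
        System.out.println(new String(pop.get(0))); // prints best! (untouched)
    }
}
```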
28.4 Conclusion: Complex Problem Solving and Adaptation
In this chapter we have shown how Darwin’s observations on speciation
can be adapted to complex problem solving. The GA, like other AI
techniques, is particularly suited to those problems where an optimal
solution may be computationally intractable. Though the GA might
stumble upon the optimal solution, odds are that computing is like nature
in one respect. Solutions and individuals must be content with having
solved the problem of adaptation only well enough to pass their
characteristics into the next generation. The extended example,
WordGuess, was a case in which the GA happens upon an exact
solution. (See the code library for sample runs). This was chosen for ease
of exposition. The exercises ask you to develop a GA solution to a known
NP-Complete problem.
We have implemented the genetic algorithm using object-oriented
programming techniques, because they lend themselves to capturing the
generality of the GA. Java was chosen as the programming language, both
because it is widely used and because its syntax, in the C/C++ tradition,
makes it readable to those with little Java or even OO experience.
As noted, we have not discussed the classes SetParams, GetParams,
and Parameters mentioned in Section 28.3. These classes write to and
read from a file of design parameters. The source code for them can be
found in the auxiliary materials. Also included are instructions for using the
parameter files, and instructions for exercising WordGuess.
Chapter 28 was jointly written with Paul De Palma, Professor of Computer
Science at Gonzaga University.
Exercises
1. The traveling salesperson problem is especially good to exercise the GA,
because it is possible to compute bounds for it. If the GA produces a
solution that falls within these bounds, the solution, while probably not
optimal, is reasonable. See Hoffman and Wolfe (1985) and Overbay, et al.
(2007) for details. The problem is easily stated. Given a collection of cities,
with known distances between any two, a tour is a sequence of cities that
defines a start city, C, visits every city once, and returns to C. The optimal
tour is the tour that covers the shortest distance. Develop a genetic
algorithm solution for the traveling salesperson problem. Create at least
two new classes: TSP, derived from GA, and TSPtst, which sets the
algorithm in motion. See comments on mating algorithms for the traveling
salesperson problem in Luger (2009, Section 12.1.3).
2. Implement the Tournament pairing method of the class Pair.
Tournament chooses a subset of chromosomes from the population. The
most fit chromosome within this subset becomes Parent A. Do the same
thing again to find its mate, Parent B. Now you have a breeding pair.
Continue this process until you have as many breeding pairs as you need.
Tournament is described in detail in Haupt and Haupt (1998). Does
WordGuess behave differently when Tournament is used?
3. As it stands, GA runs under command-line Unix/Linux. Use the
javax.swing package to build a GUI that allows a user to set the
parameters, run the program, and examine the results.
4. Transform the java application code into a java applet. This applet
should allow a web-based user to choose the GA to run (either
WordGuess or TSP), the pairing algorithm to run (Top-Down or
Tournament), and to change the design parameters.
5. WordGuess does not make use of the full generality provided by
object-oriented programming techniques. A more general design would not
represent genes as characters. One possibility is to provide several
representational classes, all inheriting from a modified GA and all being
superclasses of specific genetic algorithm solutions. Thus we might have
CHAR_GA inheriting from GA and WordGuess inheriting from CHAR_GA.
Another possibility is to define chromosomes as collections of genes
that are represented by variables of class Object. Using these, or other,
approaches, modify GA so that it is more general.
6. Develop a two-point crossover method to be included in class Mate.
For each breeding pair, randomly generate two crossover points. Parent A
contributes its genes before the first crossover and after the second to
Child A. It contributes its genes between the crossover points to Child B.
Parent B does just the opposite. See Haupt and Haupt (1998) for still other
possibilities.
Chapter Objectives
This chapter provides a number of sources for open source and free machine
learning software on the web.
Chapter Contents
29.1 Java Machine Learning Software
getRHS(String lhs) returns all RHSs of the grammar rules for
any left-hand side, lhs. If there are no such rules, it indicates this.
isPOS(String lhs) returns true or false based on whether
or not the given lhs is a part of speech.
To make the Grammar class easier to extend to more complicated
grammars, the Grammar class itself does not instantiate any rules. It is a
basic framework for the two important methods, and defines how the rules
are contained and related. To create a specific grammar, the Grammar
class needs to be extended to a new class. This new class will instantiate all
the grammar rules. As just noted, the framework of the grammar rules is a
mapping between a LHS and RHS. For each rule there will be only one
LHS for each RHS, but it is likely in the full set of grammar rules that a
particular LHS will have several possible RHSs. Later in the chapter, the
exact framework for this matching is presented.
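The LHS-to-RHS mapping and the subclassing scheme just described might be sketched as follows. This is our own illustration under stated assumptions: the use of a HashMap, the helper addRule, the tiny rule set, and the reduction of the chapter's RHS class to a plain String array are all ours.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the Grammar framework: the base class holds the LHS-to-RHSs
// mapping and the two key methods; it instantiates no rules itself.
class Grammar {
    protected Map<String, List<String[]>> rules = new HashMap<>();
    protected Set<String> partsOfSpeech = new HashSet<>();

    protected void addRule(String lhs, String... rhs) {
        rules.computeIfAbsent(lhs, k -> new ArrayList<>()).add(rhs);
    }

    // All right-hand sides for lhs; an empty list signals "no such rules".
    public List<String[]> getRHS(String lhs) {
        return rules.getOrDefault(lhs, new ArrayList<>());
    }

    public boolean isPartOfSpeech(String s) {
        return partsOfSpeech.contains(s);
    }
}

// A subclass instantiates the actual rules, as the text prescribes.
class TinyGrammar extends Grammar {
    TinyGrammar() {
        addRule("S", "NP", "VP");     // one LHS may have several RHSs:
        addRule("NP", "Noun");
        addRule("VP", "Verb");
        addRule("VP", "Verb", "NP");
        partsOfSpeech.add("Noun");
        partsOfSpeech.add("Verb");
    }
}
```

Note how "VP" maps to two right-hand sides while each rule keeps exactly one LHS, matching the one-to-many relationship described above.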
The Chart
A chart is an ordered list of the successively produced components of the
parse. A major requirement is to determine whether any newly produced
possible component of the parse is already contained in the chart.
To make it easier to maintain the charts correctly and consistently, we
create a Chart class. We could have used a simpler structure, like
Vector, to contain the states of the parse as they are produced, but the
code to manipulate the Vector would then be distributed throughout the
parser code. This dispersed code makes it much harder to make
corrections, and to debug. When we create the Chart class, the code to
manipulate the chart will be identical for all uses, and since the code is all in
one place, it will be much easier to debug. Notice that the Chart class
represents a single chart, not the evolving set of states that are created by
the Earley algorithm. Since there is no additional functionality needed
beside that already discussed, we make a Chart array for the evolving set
of chart states, rather than making another class.
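Reduced to its two duties, a Chart holds its states in order and refuses duplicates. The sketch below is our own: we stand in a String for the chapter's State class, and the implementation details are assumptions, though the method names follow the parser code shown later.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the Chart class: an ordered list of states that rejects
// duplicates, so the duplicate check lives in one place.
class Chart {
    private final List<String> states = new ArrayList<>();

    // Add the state only if it is not already in the chart.
    public void addState(String state) {
        if (!states.contains(state))
            states.add(state);
    }

    public String getState(int i) { return states.get(i); }

    public int size() { return states.size(); }
}
```

Because every insertion goes through addState, no caller scattered through the parser needs to repeat the containment check.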
The States
A state component for the parser has one left-hand-side, LHS, one
right-hand-side, RHS, for each rule that is instantiated, as well as indices
from the sentence String array, an (i j) pair.
need to be represented, the easiest way to create the problem solving state,
is to make a State class. Since the State class supports the full Earley
algorithm, it will require get methods for returning the LHS, the RHS,
and the i j indices. Also, as seen in the pseudo-code of Section 9.2, we
need the ability to get the term after the dot in the RHS, as well as the
ability to determine whether or not the dot is in the last (terminal) position.
Methods to support these requirements must be provided.
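A reduced State class along these lines might look like the sketch below. This is our own illustration: the chapter's State delegates dot handling to a separate RHS class (shown next) and carries a source-state link, whereas here the RHS is collapsed to a String array with "@" as the dot.

```java
// Sketch of the State class: one LHS, one RHS, the (i, j) indices, and the
// accessors the text names. Details beyond those names are our assumptions.
class State {
    private final String lhs;
    private final String[] rhs;   // contains "@" marking the dot
    private final int i, j;

    State(String lhs, String[] rhs, int i, int j) {
        this.lhs = lhs; this.rhs = rhs; this.i = i; this.j = j;
    }

    public String getLHS() { return lhs; }
    public String[] getRHS() { return rhs; }
    public int getI() { return i; }
    public int getJ() { return j; }

    // Term immediately after the dot, or "" if the dot is last.
    public String getAfterDot() {
        for (int k = 0; k < rhs.length - 1; k++)
            if (rhs[k].equals("@")) return rhs[k + 1];
        return "";
    }

    // True when the dot sits in the final (terminal) position.
    public boolean isDotLast() {
        return rhs.length > 0 && rhs[rhs.length - 1].equals("@");
    }
}
```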
Throughout our discussion, LHS and RHS have been mentioned, but not
their implementation. Since the Earley parser uses context-free grammar
rules, we create the LHS as a String. The RHS on the other hand, is a
sequence of terms, which may or may not include a dot. Because the RHS
is used in two separate classes, and because of the additional requirement
of dot manipulation, we separate the RHS into its own class.
RHS returns its terms, the String array, for use by the
EarleyParser, as well as the String prior to and after the dot. This
enables ease of queries by the EarleyParser regarding the terms in the
RHS of the grammar rule. For example, EarleyParser gets the term
following the dot from RHS, and queries the Grammar to determine if
that term is a part of speech.
public String[] getTerms ()
{
return terms;
}
public String getPriorToDot ()
{
if(hasDot && dot >0)
return terms[dot-1];
return "";
}
public String getAfterDot ()
{
if(hasDot && dot < terms.length-1)
return terms[dot+1];
return "";
}
The final procedures required to implement RHS are manipulation of and
queries concerning the dot. The queries determine whether there is a dot,
and where the dot is located, last or first. When a dot is moved or added to
a RHS, a new RHS is returned. This is done because whenever a dot is
moved a new State must be created for it.
public boolean hasDot ()
{
return hasDot;
}
public boolean isDotLast ()
{
if(hasDot)
return (dot==terms.length-1);
return false;
}
public boolean isDotFirst ()
{
if(hasDot)
return (dot==0);
return false;
}
The parseSentence method loops through the charts and, for every State
in a Chart, checks to determine which procedure is called next:
completer(...) - the dot is last,
scanner(...) - the term following the dot is a part of speech, or
predictor(...) - the term following the dot is not a part of speech.
After all charts are visited, if the last State added to the last Chart is a
finish state, ("$ S @", 0, sentenceLength), the sentence
was successfully parsed.
public boolean parseSentence (String[] s)
{
sentence = s;
charts = new Chart[sentence.length+1];
for (int i=0; i< charts.length; i++)
charts[i] = new Chart ();
String[] start1 = {"@", "S"};
RHS startRHS = new RHS (start1);
State start = new State ("$",startRHS,0,0,null);
charts[0].addState (start);
for (int i=0; i<charts.length; i++)
{
for (int j=0; j<charts[i].size (); j++)
{
State st = charts[i].getState (j);
String next_term = st.getAfterDot ();
if (st.isDotLast ())
// State's RHS = ... @
completer (st);
else
if(grammar.isPartOfSpeech (next_term))
// RHS = ... @ A ..., where A is a part of speech.
scanner (st);
else
predictor (st); // A is NOT a part of speech.
}
}
// Determine whether a successful parse.
String[] fin = {"S","@"};
RHS finRHS = new RHS (fin);
State finish = new State ("$",finRHS,
0,sentence.length,null);
State last = charts[sentence.length].getState
(charts[sentence.length].size ()-1);
return finish.equals (last);
}
6. If the criteria fail for all source states, then this was a dead end,
and no tree is returned. If any of the source states are valid,
start at step 1 with that state as the current state, and update the
current evaluating node. The accepted trees (there may be no
tree possible) are bundled together and returned.
From this algorithm, we can produce the multiple parse trees implicit in the
Earley algorithm’s successful production of the Chart. Example code
implementing this algorithm is included with the Chapter 30 support
materials.
The Earley parser code as well as the first draft of this chapter was written
by Ms Breanna Ammons, MS in Computer Science, University of New
Mexico.
Exercises
1. Describe the role of the dot within the right hand side of the grammar
rules processed by the Earley parser. How is the location of the dot
changed as the parse proceeds? What does it mean that the same right
hand side of a grammar rule can have dots at different locations?
2. In the Earley parser the input word list and the states in the state lists
have indices that are related. Explain how the indices for the states of the
state list are created.
3. Describe in your own words the roles of the predictor,
completer, and scanner procedures in the algorithm for Earley
parsing. What order are these procedures called in when parsing a sentence,
and why is that ordering important? Explain your answers to the order of
procedure invocation in detail.
4. Use the Java parser to consider the sentence “John saw the burglar with
the telescope”. Parse also “Old men and women like dogs”. Comment on
the different parses possible from these sentences and how you might
retrieve them from the chart.
5. Create a Sentence class. Have one of the constructors take a
String, and have it then separate the String into words.
6. Code the algorithm for production of parse trees from the completed
Chart. One method of recording the sources is presented in Section 30.5.
You may find it useful to use a stack to keep track of the states you have
evaluated for the parse tree.
7. Extend EarleyParser to include support for context-sensitive
(Luger 2009, Section 15.9.5) grammar rules. What new structures are
necessary to guarantee constraints across subtrees of the parse?
Chapter Objectives
This chapter provides a number of sources for open source and free natural
language understanding software on the web.
Chapter Contents
31.1 Java NLP Software
31.2 LingPipe from the University of Pennsylvania
31.3 The Stanford Natural Language Processing Group Software
31.4 Sun's Speech API
the tutorials, you will get suggestions for how some of the classes can be
extended to do sentence detection for other corpora.
The AbstractSentenceModel class contains the basic functionality needed to
detect sentences. When extending this class, definitions of possible stops,
impossible penultimates, and impossible starts are needed. Possible stops
are any token that can be placed at the end of a sentence. This includes ‘.’
and ‘?’. Impossible penultimates are tokens that cannot precede an end of
the sentence. An example would be ‘Mr’ or ‘Dr’. Impossible starts are
normally punctuation that should not start a sentence and should be
associated with the end of the last sentence. These can be things like end
quotes.
The AbstractSentenceModel is already extended to the HeuristicSentenceModel,
which is extended to the IndoEuropeanSentenceModel, and the
MedlineSentenceModel. These last two provide good examples of definitions
for the possible stops, impossible penultimates and the impossible starts.
From these examples, HeuristicSentenceModel can be extended to suit a data
set. After creating an example set with known sentence boundaries,
running the evaluator contained in the tutorials for the sentences class
gives an idea of the weaknesses of the current model. From the evaluator’s output
files, corrections can be made to the possible stops, impossible
penultimates, and impossible starts. Be careful though; when attempting to
remove all false positives and all false negatives, the definitions can
become too rigid and cause more errors when run with more than the
example set. So try to find a good balance.
Within the download, the AbstractSentenceModel is only extended to the
HeuristicSentenceModel. This does not mean that you must use the
HeuristicSentenceModel. The HeuristicSentenceModel can be used as an example
to create a new class that extends only AbstractSentenceModel. Therefore if
you have a different type of model that you would like to use, extend
AbstractSentenceModel and try it out.
Part-of-speech Tagger
The part-of-speech (POS) tagger is a little more complicated than the
sentences classes. To use it, the POS tagger must be trained. After it is
trained, the tagger can be used to produce a couple of different statistics
about its confidence of the tags it applies to input. In the download there
are examples of code for the Brown, Genia and MedPost corpora. The
classes used in making a POS tagger come from the com.aliasi.hmm package.
The tagger is a HmmDecoder defined by a HiddenMarkovModel.
To train a tagger, you need first a corpus or test set that has been tagged.
Using this tagged set, the HmmCharLmEstimator (in the com.aliasi.hmm
package) can read the training set and create a HiddenMarkovModel. This
model can be used immediately, or it can be written out to file and used at
a later time. The file can be useful when evaluating different taggers. For
each test on different corpora, the exact tagger can be used without having
to recreate it each time.
Now that we have a tagger, we can use it to tag input. Within one
HmmDecoder there are a couple of different ways to tag; all are methods of
the decoder you create from the HiddenMarkovModel. Based on what kind
of information you need, the options are first-best, n-best and confidence-
based. First-best returns only the “best” tagging of the input. N-best
returns the first n “best” taggings. Confidence-based results are the entire
lattice of forward/backward scores.
Provided in the tutorials is an evaluator of taggers. This uses pre-tagged
corpora and trains a little, then evaluates a little. It parses reference
taggings, uses the model to tag, and evaluates how well the model did. The
reference tags are then added to the training data, and the parser moves on.
The arguments to the evaluator will determine how well the model learns
and how long it will take. Experiment with this package to see what is
appropriate for your own data set.
A tagger produced by this package could be used in other algorithms.
Whether as tags needed for the algorithm or as a source to produce a
grammar, this POS tagger is useful. A future project might be to create a
parse tree from the POS tagger, but that functionality is not within
LingPipe. An exercise might be to extend LingPipe to create parse trees.
31.3 The Stanford Natural Language Processing Group
The Stanford NLP group is a team of faculty, postdoctoral researchers,
graduate, and undergraduate students, members from both the Computer
Science and Linguistics departments. The site http://nlp.stanford.edu
describes the team members, their publications, and the libraries that can
be downloaded.
Exploring this Stanford website, the reader finds, as of January 2008, six
Java libraries available for work in natural language processing. These
include a parser as well as a part-of-speech tagger. Although the
information contained in the introduction for each package is extensive
and contains a set of “frequently asked questions”, the code
documentation is often sparse without sufficient design documentation.
The libraries are licensed under the GNU General Public License, which means
that they are available for research or free software projects.
The Stanford Parser
The parser makes up a major component of the Stanford NLP website.
There is background information for the parser, an on-line demo, and
answers for frequently asked questions. The Stanford group refers to their
parser as “a program that works out the grammatical structure of
sentences”. The software package includes an optimized probabilistic
context-free grammar (Luger 2009, Section 15.4).
Within the download of the Stanford parser is a package called
edu.stanford.nlp.parser. This parser interface contains two functions: One
function determines whether the input sentence can be parsed or not. The
other function determines whether the parse meets particular parsing goals,
for example, that it is a noun phrase followed by a verb phrase sentence.
There are also a number of sub-interfaces provided, including ViterbiParser
(see Chapter 30) and KBestViterbiParser, the first supporting the most likely
probabilistic parse of the sentence and the second giving the K best parses,
where all parses are returned with their “scores”.
Within the interface edu.stanford.nlp.parser there is a further interface,
edu.stanford.nlp.parser.lexparser, which supports parsers for English,
German, Chinese, and Arabic expressions. There are also classes that, once
implemented, can be used to train the parser for use on other languages.
To train the parser, a training set needs to include systematically annotated
data, specifically in the form of a Treebank (a corpus of annotated data that
explicitly specifies parse trees). Once trained the parser contains three
components: grammars, a lexicon, and a set of option values. The grammar
itself consists of two parts, unary (NP → N) and binary (S → NP VP)
rewrite rules. The lexicon is a list of lexical (word) entries, each preceded
by a keyword and followed by a raw count score. The options are persistent
variable-value pairs that remain constant across the training and parsing
stages.
The Stanford tools also include a GUI for easy visualization of the trees
produced through parsing the input streams. The training stages require
much more time and memory than using the already trained parser. Thus,
for experimental purposes, it is convenient to use the already trained
parsers, although there is much that can be learned by stepping through the
creation of a parser.
Named-Entity Recognition
A program that performs named-entity recognition (NER) locates and
classifies elements of input strings (text) into predefined categories. For the
Stanford NER the primary categories for classification are name,
organization, location, and miscellaneous. There are two recognizers, the first
classifying the first three of these categories and trained on a corpus
created from data from conference proceedings. The second is trained on
more specific data, the proceedings from one conference.
Using the text classifiers is straightforward. They can be run either as
embedded in a larger program or by command line. When run as part of a
program the classifier is read in using a function associated with
CRFClassifier. This function returns an AbstractSequenceClassifier that uses
methods to classify the contents of a String. An example of one (of the
three possible) output formats, called /Cat is: My/O name/O is/O
Bill/PERSON Smith/PERSON ./O. /O indicates that the text string is
not recognized as a named-entity. There are a number of issues involved in
this type of classification, for example, that at this point Bill Smith is not
recognized as the name of a single person but rather as two consecutive
PERSON tokens. When working with this type of pattern matching it is
important to monitor issues in over-learning and under-learning: when one
pattern matching component is created to fit a complex problem situation,
another set of patterns may not then be classified properly.
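Because the /Cat format above tags each token separately, "Bill/PERSON Smith/PERSON" arrives as two tokens. The post-processing sketch below is entirely ours (it is not part of the Stanford distribution): it merges runs of identically tagged tokens back into single entities.

```java
import java.util.ArrayList;
import java.util.List;

// Merge consecutive tokens that share a non-O tag, so "Bill/PERSON
// Smith/PERSON" becomes the single entity "Bill Smith/PERSON".
public class MergeEntities {
    static List<String> merge(String tagged) {
        List<String> entities = new ArrayList<>();
        String current = "", tag = "O";
        for (String tok : tagged.split("\\s+")) {
            int slash = tok.lastIndexOf('/');
            String word = tok.substring(0, slash);
            String t = tok.substring(slash + 1);
            if (t.equals(tag) && !t.equals("O")) {
                current += " " + word;            // extend the running entity
            } else {
                if (!tag.equals("O"))
                    entities.add(current + "/" + tag);
                current = word;
                tag = t;
            }
        }
        if (!tag.equals("O"))
            entities.add(current + "/" + tag);
        return entities;
    }

    public static void main(String[] args) {
        System.out.println(
            merge("My/O name/O is/O Bill/PERSON Smith/PERSON ./O"));
        // prints [Bill Smith/PERSON]
    }
}
```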
Unfortunately the documentation for the Named-Entity package is
minimal. Although it contains a set of JavaDocs, these can be wrong
(referring to classes that are not included) or simply unhelpful.
31.4 Sun’s Speech API
To this point we have focused on Java-based natural language processing
tools analyzing written language. Speech recognition and synthesis are also
important components of NLP. To assist developers in these areas Sun
Microsystems has created an API for speech. This Java API can be found
at http://research.sun.com/speech. From this page there is also a link to a
We have come to the end of our task! In Part V we will give a brief
summary of our views of computer language use, especially in a
comparative setting where we have been able to compare and contrast the
idioms of three different language paradigms and their use in building
structures and strategies for complex problem solving. We begin Chapter
32 with a brief review of these paradigm differences, and then follow with
summary comments on paradigm based abstractions and idioms.
But first we briefly review the nature of the programming enterprise and
why we are part of it.
Well, first, we might say that programming offers monetary compensation
to ourselves and our dependents. But this isn’t really why most of us got
into our field. We authors got into this profession because computation
offered us a critical medium for exploring and understanding our world.
And, yes, we mean this in the large sense where computational tools are
seen as epistemological artifacts for comprehending our world and
ourselves.
We see computation as Galileo might have seen his telescope, as a medium
for exploring entities, relationships, and invariances never before perceived
by the human agent. It took Newton and his “laws of motion” almost
another century fully to capture Galileo’s insights. We visualize
computation from exactly this viewpoint, where even as part of our own
and our colleagues’ small research footprint we have explored complex
human phenomena including:
• Human subjects’ neural state and connectivity, using human
testing, fMRI scanning, coupled with dynamic Bayesian
networks and MCMC sampling, none of which would be
possible without computation.
Chapter Objectives
This chapter provides a summary and discussion of the primary idioms and
design patterns presented in our book.
Chapter Contents
32.1 Paradigm-Based Abstractions and Idioms
32.2 Programming as a Tool for Exploring Problem Domains
32.3 Programming as a Social Activity
32.4 Final Thoughts
that called additional main methods in the class. This program might
function correctly, but it would hardly be considered a good Java program.
Instead, quality Java programs distribute their functionality over relatively
large numbers of class definitions, organized into hierarchies by
inheritance, interface definitions, method overloading, etc. The goal is to
reflect the structure of the problem in the implementation of its solution.
This not only brings into focus the use of programming languages to
sharpen our thinking by building epistemological models of a problem
domain, but also supports communication among developers and with
customers by letting people draw on their understanding of the domain.
There are many reasons for the importance of idioms to good
programming. Perhaps the most obvious is that the idiomatic patterns of
language use have evolved to help with the various activities in the
software lifecycle, from program design through maintenance. Adhering to
them is important to gaining the full benefits of the language. For example,
our hypothetical “Java written as C” program would lack the
maintainability of a well-written Java program.
A further reason for adhering to accepted language idioms is for
communication. As we will discuss below, software development (at least
once we move beyond toy programs) is a fundamentally social activity. It is
not enough for our programs to be correct. We also want other
programmers to be able to read them, understand the reasons we wrote the
program as we did, and ultimately modify our code without adding bugs
due to a misunderstanding of our original intent.
Throughout the book, we have tried to communicate these idioms, and
suggested that mastering them, along with the traditional algorithms, data
structures, and languages, is an essential component of programming skill.
analyzed and verified in terms of things and relations, rather than the
complexities of analyzing the many paths a program can take through its
execution. And, it enhances the use of the programming language as a tool
for stating theoretical ideas: as a tool for thinking.
32.3 Programming as a Social Activity
As programming has matured as a discipline, we have also come to
recognize that teams usually write complex software, rather than a single
genius laboring in isolation. Both authors work in research institutions, and
are acutely aware that the complexity of the problems modern computer
science tackles makes the lone genius the exception, rather than the rule.
The most dramatic example of this is open-source software, which is built
by numerous programmers laboring around the world. To support this, we
must recognize that we are writing programs as much to be read by other
engineers as to be executed on a computer.
Software Engineering and AI
This social dimension of programming is most strongly evident in the
discipline of software engineering. We feel it unfortunate that many
textbooks on software engineering emphasize the formal aspects of
documentation, program design, source code control and versioning,
testing, prototyping, release management, and similar engineering practices,
and downplay the basic source of their value: to insure efficient, clear
communication across a software development team.
Both of this book’s authors work in research institutions, and have
encountered the mindset that research programming does not require the
same levels of engineering as applications development. Although research
programming may not involve the need for tutorials, user manuals and
other artifacts of importance to commercial software, we should not forget
that the goal of software engineering is to insure communication. Research
teams require this kind of coordination as much as commercial
development groups. In our own practice, we have found considerable
success with a communications-focused approach to software engineering,
treating documentation, tests, versioning, and other artifacts as tools to
communicate with our team and the larger community. Thinking of
software engineering in these terms allows us to take a “lightweight”
approach that emphasizes the use of software engineering techniques for
communication and coordination within the research team. We urge the
programmer to see their own software engineering skills in this light.
Prototyping
Prototyping is an example of a software engineering practice that has its
roots in the demands of research, and that has found its way into
commercial development. In the early days, software engineering seemed
to aim at “getting it right the first time” through careful specification and
validation of requirements. This is seldom possible in research
environments where the complexity and novelty of problems and the use
of programming as a tool for thinking precludes such perfection.
Interestingly, as applications development has moved into interactive
domains that must blend into the complex communication acts of human
communities, the goal of “getting it right the first time” has been rejected
in favor of a prototyping approach.
We urge the reader to look at the patterns and techniques presented in this
book as tools for building programs quickly and in ways that make their
semantics clear – as tools for prototyping. Metalinguistic abstraction is the
most obvious example of this. In building complex, knowledge-based
systems, the separation of inference engine and knowledge illustrated in
many examples of this book allows the programmer to focus on
representing problem-specific knowledge in the development process.
Similarly, in object-oriented programming, the mechanisms of interfaces,
class inheritance, method extension, encapsulation, and similar techniques
provide a powerful set of tools for prototyping. Although often thought of
as tools for writing reusable software, they give a guiding structure to
prototyping. “Thin-line” prototyping is a technique that draws on these
object-oriented mechanisms. A thin-line prototype is one that implements
all major components of a system, although initially with limited
complexity. For example, assume an implementation of an expert-system
in a complex network environment. A thin-line prototype would include all
parts of the system to test communication, interaction, etc., but with
limited functionality. The expert system may only have enough rules to
solve a few example problems; the network communications may only
implement enough messages to test the efficiency of communications; the
user interface may only consist of enough screens to solve an initial
problem set, and so on.
The power of thin-line prototypes is that they test the overall architecture
of the program without requiring a complete implementation. Once this is
done and evaluated for efficiency and robustness by engineers and for
usability and correctness by end users, we can continue development with a
focused, easily managed cycle of adding functionality, testing it, and
planning. In our experience, most AI programs are built this way.
Reuse It would be nearly impossible to write a book on programming without a
discussion of an idea that has become something of a holy grail to modern
software development: code reuse. Both in industry and academia,
programmers are under pressure, not only to build useful, reliable software,
but also to produce useful, reusable components as a by-product of that
effort. In aiming for this goal, we should be aware of two subtleties.
The first is that reusable software components rarely appear as by-products
of a problem-specific programming effort. The reason is that reuse, by
definition, requires that components be designed, implemented, and tested
for the general case. Unless the programmer steps back from the problem
at hand to define general use cases for a component, and designs, builds,
tests, and documents to the general cases, it is unlikely the component will
be useful to other projects. We have built a number of reusable
components, and all of them have their roots in this effort to define and
build to the general case.
The second thing we should consider is that actual components should not
be the only focus of software reuse. Considerable value can be found in
reusing ideas: the idioms and patterns that we have demonstrated in this
book. These are almost the definition of skill and mastery in a programmer,
and can rightly be seen as the core of design and reuse.
Abelson, H. and Sussman, G. J., 1985. Structure and Interpretation of Computer Programs. Cambridge, MA:
MIT Press.
Alexander, C., Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I. and Angel, S., 1977. A
Pattern Language. New York: Oxford University Press.
Bellman, R. E., 1956. Dynamic Programming. Princeton, NJ: Princeton University Press.
Brachman, R. J. and Levesque, H. J., 1985. Readings in Knowledge Representation, Los Altos CA: Morgan
Kaufmann.
Bundy, A., Byrd, L., Luger, G., Melish, C., Milne, R., and Stone, M. 1979. Solving Mechanics
Problems Using Meta-Inference. In Proceedings of IJCAI-1979, 1017-1027.
Chakrabarti, C., Rammohan, R., and Luger, G. F., 2005. A First-Order Stochastic Prognostic System
for the Diagnosis of Helicopter Rotor Systems for the US Navy. In Proceedings of the 2nd Indian
International Conference on Artificial Intelligence. Pune, India. Elsevier Publications.
Charniak, E., Riesbeck, C. K., McDermott, D.V., and Meehan, J.R., 1987. Artificial Intelligence
Programming, 2nd ed. Hillsdale, NJ: Erlbaum.
Church, A. (1941). The Calculi of Lambda-Conversion. Annals of Mathematical Studies 6 . Princeton NJ:
Princeton University Press.
Clocksin, W. F. and Mellish, C. S., 1984. Programming in Prolog. New York, Springer-Verlag.
Clocksin, W. F. and Mellish, C. S., 2003. Programming in Prolog: Using the ISO Standard. New York,
Springer.
Collins, A. and Quillian, M. R., 1969. Retrieval Time for Semantic Memory. Journal of Verbal Learning
& Verbal Behavior, 8: 240-247.
Colmerauer, A. H., 1975. Les Grammaires de Metamorphose, Groupe Intelligence Artificielle, Universite
Aix-Marseille II, France.
Colmerauer, A., H. Kanoui, H., 1973. Un Systeme de Communication Homme-machine en Francais. Groupe
Intelligence Artificiale, Université Aix-Marseille II, France.
Coplein, J. O. and Schmidt, D. C., 1995. Pattern Languages of Program Design. Reading, MA: Addison-
Wesley.
Dahl, V., 1977. Un Système Deductif d’Interrogation de Banques de Donnes en Espagnol, PhD thesis,
Université Aix-Marseille, France.
Dahl, V. and McCord, M.C. 1983. Treating Coordination in Logic Grammars. American Journal of
Computational Linguistics, 9:69–91.
Darwin, C., 2007. The Voyage of the Beagle. Retrieved 3/23/07 from
http://www.literature.org/authors/darwin-charles/the-voyage-of-the-beagle.
DeJong, G. and Mooney, R., 1986. Explanation-Based Learning: An Alternative View. Machine
Learning, 1(2); 145-176.
Dybvig, R. K., 1996. The Scheme Programming Language. Upper Saddle River, NJ: Prentice Hall.
Earley, J., 1970. An efficient context-free parsing algorithm. Communications of the ACM, 6(8): 451-455.
Eiben, A. E., Smith, J. E., 1998. Introduction to Evolutionary Computing. Berlin: Springer.
Evans, E., 1983. Domain Driven Design: Tackling Complexity in the Heart of Software. Upper Saddle River
NJ: Addison-Wesley.
Feigenbaum, E. A., and Feldman, J., eds., 1963. Computers and Thought. New York: McGraw-Hill.
Fikes, R. E., Hart, P. E., and Nilsson, N. J., 1972. Learning and Executing General Robot Plans.
Artificial Intelligence, 3(4): 251-288.
439
Fikes, R. E. and Nilsson, N. J., 1971. STRIPS: A New Approach to the Application of Theorem
Proving to Artificial Intelligence. Artificial Intelligence, 1(2): 189-208.
Forbus, K. D. and deKleer, J. 1993. Building Problem Solvers. Cambridge, MA: MIT Press.
Gamma, E., Helm, R., Johnson, R, and Vlissides, J., 1995. Design Patterns: Elements of Reusable Object-
oriented Software. Reading, MA: Addison-Wesley.
Ganzerli, S., De Palma, P., Smith, J. D., and Burkhart, M. F., 2003. Efficiency of Genetic Algorithms
for Optimal Structural Design Considering Convex Models of Uncertainty. Proceedings of the
Ninth International Conference on Statistics and Probability in Civil Engineering, Berkeley: 1003-1010.
Rotterdam, NL: Millpress Science Publishers.
Gazdar, G. and Mellish, C., 1989. Natural Language Processing in PROLOG: An Introduction to
Computational Linguistics. Reading, MA: Addison-Wesley.
GECCO, 2007. Genetic and Evolutionary Computing Conference, 2007. Retrieved 3/23/07 from
http://www.sigevo.org/gecco-2007.
Goldberg, D. E., 1989. Genetic Algorithms in Search Optimization and Machine Learning. New York:
Addison-Wesley.
Graham, P. 1993. On LISP: Advanced Techniques for Common LISP. Englewood Cliffs, NJ: Prentice Hall.
Graham, P. 1995. ANSI Common Lisp. Englewood Cliffs, NJ: Prentice Hall.
Halcolm, J. R. and Shultz, R., 2005. Tau: A web-deployed hybrid prover for first-order logic with identity with
optional inductive proof. 12 April 2008, http://www.hsinfosystems.com/Tau_JAR.pdf
Hasemer, T. and Domingue, J., 1989. Common LISP Programming for Artificial Intelligence. Reading, MA:
Addison-Wesley.
Haupt, L. and Haupt, S., 1998. Practical Genetic Algorithms. New York: John Wiley and Sons.
Hayes, P., 1977. In Defense of Logic. Proceedings of IJCAI-77, Cambridge, MA: MIT Press.
Hermenegildo, M. And the Ciao Development Team, 2007. An Overview of The Ciao Multiparadigm
Language and Program Development Environment and its Design Philosophy. ECOOP Workshop on
Multiparadigm Programming with Object-Oriented Languages MPOOL 2007, July 2007.
Hill, P. and Lloyd, J., 1995. The Gödel Programming Language. Cambridge, MA: MIT Press.
Holland, J. H., 1975. Adaptation in Natural and Artificial Systems. Ann Arbor MI: University of
Michigan Press.
Jurafsky, D., and Martin, J. H., 2008. Speech and Language Processing (2nd ed), Upper Saddle River, NJ:
Prentice Hall.
Kedar-Cabelli, S. T. and McCarty, L. T., 1987. Explanation-Based Generalization as Resolution
Theorem Proving. Proceedings of the Fourth International Workshop on Machine Learning.
King, S. H., 1991. Knowledge Systems Through Prolog. Oxford: Oxford University Press.
Kowalski, R., 1979. Algorithm = Logic + Control. Communications of the ACM 22: 424-436.
Kowalski, R., 1979. Logic for Problem Solving. Amsterdam: North Holland.
Krzysztof, R. A. and Wallace, M., 2007. Constraint Logic Programming Using Eclipse. Cambridge UK:
Cambridge University Press.
Lloyd, J. W., 1984. Foundations of Logic Programming. New York: Springer Verlag.
Lucas, R., 1996. Mastering Prolog. London UK: UCL Press.
Luger, G. F., 2009. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, Boston MA:
Addison-Wesley Pearson.
Maclean, N., 1989. A River Runs Through It, Chicago: University of Chicago Press.
Maier, D. and Warren, D. S., 1988. Computing with Logic: Logic Programming with Prolog. Boston MA:
Addison-Wesley.
Malpas, J., 1987. Prolog: A Relational Language and its Applications. Englewood Cliffs NJ: Prentice Hall.
McCarthy, J., 1960. Recursive functions of symbolic expressions and their computation by machine.
Communications of the ACM 3(4).
McCord, M. C., 1982. Using slots and modifiers in logic grammars for natural language. Artificial
Intelligence, 18:327–367.
McCord, M. C., 1986. Design of a Prolog based machine translation system. Proceedings of the Third
International Logic Programming Conference, London.
McCune, W. W. and Wos, L., 1997. Otter: The CADE-13 competition incarnations. Journal of
Automated Reasoning, 18(2): 211-220.
Milner, R., Tofte, M. , Harper , R., and MacQueen, D., 1997. The Definition of Standard ML (Revised).
Cambridge MA: MIT Press.
Minsky, M., 1975. A Framework for Representing Knowledge. In Brachman and Levesque (1985).
Minton, S., 1988. Learning Search Control Knowledge. Dordrecht: Kluwer Academic Publishers.
Mitchell, T. M., 1978. Version Spaces: An Approach to Concept Learning. Report No.STAN-CS-78-
711, Computer Science Dept., Stanford University.
Mitchell, T. M., 1979. An Analysis of Generalization as a Search Problem. Proceedings of IJCAI, 6.
Mitchell, T. M., 1982. Generalization as Search, Artificial Intelligence, 18(2): 203-226.
Mitchell, T. M., Keller, R. M., and Kedar-Cabelli, S. T., 1986. Explanation-Based Generalization: A
Unifying View. Machine Learning, 1(1): 47-80.
Mycroft, A. and O’Keefe, R. A., 1984. A Polymorphic Type System for Prolog. Artificial Intelligence,
23: 295-307.
Neves J. C. F. M., Luger, G. F., and Carvalho, J. M., 1986. A Formalism for Views in a Logic Data
Base. In Proceedings of the ACM Computer Science Conference, Cincinnati OH.
Newell, A., 1982. The Knowledge Level. Artificial Intelligence, 18(1): 87-127.
Newell, A. and Simon, H. A. 1976. Computer Science as Empirical Inquiry: Symbols and Search.
Communications of the ACM, 19(3):113–126.
Nilsson, N. J., 1980. Principles of Artificial Intelligence. Palo Alto, CA: Tioga.
O’Keefe, R., 1990. The Craft of PROLOG. Cambridge, MA: MIT Press.
O’Sullivan, B., 2003. Recent advances in constraints, Joint ERCIM/CologNet International
Workshop on Constraint Solving and Constraint Logic Programming. Lecture Notes in Computer
Science 2627, Berlin: Springer.
Overbay, S., Ganzerli, S., De Palma, P, Brown, A., Stackle, P., 2006. Trusses, NP-Completeness, and
Genetic Algorithms. Proceedings of the 17th Analysis and Computation Specialty Conference. St. Louis,
MO.
Paulson, L. C., 1989. Isabelle: The Next 700 Theorem Provers. Journal of Automated Reasoning 5: 383-
397.
Pereira, L. M. and Warren, D. H. D., 1980. Definite Clause Grammars for Language Analysis – A
Survey of the Formalism and a Comparison with Augmented Transition Networks. Artificial
Intelligence, 13:231–278.
Pless, D. and Luger, G. F., 2003. EM learning of product distributions in a first-order stochastic logic
language. Artificial Intelligence and Soft Computing: Proceedings of the IASTED International Conference.
Anaheim: IASTED/ACTA Press. Also available as University of New Mexico Computer Science
Technical Report TR-CS-2003-01.
Quinlan, J. R., 1986. Induction of Decision Trees. Machine Learning, 1(1):81–106.
Quinlan, J, R., 1996. Bagging, Boosting and C4.5. Proceedings AAAI 96. Menlo Park CA: AAAI Press.
Rajeev, S. and Krishnamoorthy, C. S., 1997. Genetic Algorithms-Based Methodologies for Design
Optimization of Trusses. Journal of Structural Engineering, 123 (3): 350-358.
Robinson, J. A., 1965. A Machine-Oriented Logic Based on the Resolution Principle. Journal of the
ACM, 12: 23-41.
Robinson, J. A. and Voronkov, A., 2001. Handbook of Automated Reasoning: Volume 1. Cambridge MA:
MIT Press.
Ross, P., 1989. Advanced Prolog. Reading, MA: Addison-Wesley.
Roussel, P., 1975. Prolog: Manuel de Reference et d'Utilisation. Luminy, France, Groupe d'Intelligence
Artificialle, Université d' Aix-Marseille.
Sakhanenko, N., Luger, G. F. and Stern, C. R., 2006. Managing Dynamic Contexts using Failure-
Driven Stochastic Models. Proceedings of FLAIRS Conference. Menlo Park CA: AAAI Press.
Seibel, P., 2005. Practical Common Lisp. Berkeley CA: Apress, Inc.
Shannon, C., 1948. A Mathematical Theory of Communication. Murray Hill NJ: Bell System Technical
Journal.
Shapiro, S. C., ed., 1987. Encyclopedia of Artificial Intelligence. New York: Wiley-Interscience.
Smith, J. B., 2006. Practical OCaml. Berkeley CA: Apress, Inc.
Somogyi, Z., Henderson, F., and Conway, T., 1995. Logic Programming for the real world. In
Proceedings of the Eighteenth Australasian Computer Science Conference, R Kotagiri (Editor), 1995,
Australian Computer Science Communications: Glenelg, South Australia. pp. 499-512.
Sowa, J. F., 1984. Conceptual Structures: Information Processing in Mind and Machine. Reading MA: Addison-
Wesley.
Steele, G. L., 1990, Common LISP: The Language, 2nd ed. Bedford, MA: Digital Press.
Sterling, L. and Shapiro, E., 1986. The Art of Prolog. Advanced Programming Techniques.
Cambridge MA: MIT Press.
Sussman, G. and Steele, G., 1975. SCHEME: An Interpreter for Extended Lambda Calculus, AI Memo
349, MIT Artificial Intelligence Laboratory, Cambridge, Mass.
Tanimoto, S. L., 1990. The Elements of Artificial Intelligence using Common LISP. New York: W.H. Freeman.
Touretzky, D. S., 1990. Common LISP: A Gentle Introduction to Symbolic Computation. Redwood City, CA:
Benjamin/Cummings.
Turing, A., 1948. Intelligent Machinery. A report to the National Physical Laboratory. London.
Van Le, T., 1993. Techniques of Prolog Programming with Implementation of Logical Negation and Quantified
Goals. New York: Wiley.
Walker, A., McCord, M., Sowa, J. F., and Wilson, W. G., 1987. Knowledge Systems and Prolog: A Logical
Approach to Expert Systems and Natural Language Processing. Reading, MA: Addison-Wesley.
Warren, D. H. D., Pereira, L. M. and Pereira, F., 1977. Prolog - the language and its implementation
compared with LISP. Proceedings, Symposium on AI and Programming Languages, SIG-PLAN Notices,
12(8).
Warren, D. H. D., Pereira, F. and Pereira, L. M., 1979. User's Guide to DEC-System 10 PROLOG.
Occasional Paper 15, Department of Artificial Intelligence, University of Edinburgh, UK.
Wilensky, R., 1986. Common LISPCraft, New York: Norton Press.
Winston, P. H., Binford, T. O., Katz, B, and Lowry, M., 1983. Learning Physical Descriptions from
Functional Definitions, Examples, and Precedents. Proceedings of National Conference on Artificial
Intelligence, Washington D.C., San Francisco: Morgan Kaufman. 433-439.
Winston, P. H. and Horn, B. K. P., 1984. LISP. Reading, MA: Addison-Wesley.
443