

        {
            System.out.println(
                "CloneNotSupportedException: " + e);
        }
    }
}
24.7 Design Discussion
In closing out this chapter, we would like to look at two major design
decisions. The first is our separation of representation and search through
the introduction of AbstractSolutionNode and its descendants.
The second is the importance of static structure to the design.
Separating Representation and Search

The separation of representation and search is a common theme in AI
programming. In Chapter 22, for example, our implementation of simple
search engines relied upon this separation for generality. In the reasoning
engine, we bring the relationship between representation and search into
sharper focus. Here, the search engine serves to define the semantics of
our logical representation by implementing a form of logical inference. As
we mentioned before, our approach builds upon the mathematics of the
representation language – in this case, theories of logic inference – to
insure the quality of our representation.
One detail of our approach bears further discussion. That is the use of the
method, getSolver(RuleSet rules, SubstitutionSet
parentSolution), which was defined in the Goal interface. This
method simplifies the handling of the search space by letting search
algorithms treat goals independently of their type (simple sentence, and,
etc.). Instead, it lets us treat nodes in terms of the general methods defined
by AbstractSolutionNode, and to rely upon each goal to return
the proper type of solution node. This approach is beneficial, but as is
typical of object-oriented design, there are other ways to implement it.
One of these alternatives is through a factory pattern. This would replace
the getSolver() method of Goal with a separate class that creates
instances of the needed node. For example:
class SolutionNodeFactory
{
    public static AbstractSolutionNode getSolver(
            Goal goal, RuleSet rules,
            SubstitutionSet parentSolution)
    {
        if (goal instanceof SimpleSentence)
            return new SimpleSentenceSolutionNode(
                (SimpleSentence) goal, rules, parentSolution);
        if (goal instanceof And)
            return new AndSolutionNode((And) goal, rules,
                parentSolution);
        // A fallback is needed so that all paths return or throw
        throw new IllegalArgumentException(
            "unknown goal type: " + goal);
    }
}

There are several interesting trade-offs between the approaches. Use of
the Factory sharpens the separation of representation and search. It
even allows us to reuse the representation in contexts that do not involve
reasoning without the difficulty of deciding how to handle the
getSolver method required by the parent interface. On the other
hand, the approach we did use allows us to get the desired solver without
using instanceof to test the type of goal objects explicitly. Because the
instanceof operator is computationally expensive, many programmers
consider it good style to avoid it. Also, when adding a new operator, such
as Or, we only have to change the operator’s class definition, rather than
adding the new class and modifying the Factory object. Both
approaches, however, are good Java style. As with all design decisions, we
encourage the reader to evaluate these and other approaches and make up
their own mind.
The Importance of Static Structure

A more important design decision concerns the static structure of the
implementation. By static structure, we mean the organization of classes in
a program. We call it static because this structure is not changed by
program execution. As shown in Figures 24.6, 24.7, and 24.9, our
approach has a fairly complex static structure. Indeed, in developing the
reasoner, we experimented with several different approaches (this is, we
feel, another good design practice), and many of these had considerably
fewer classes and simpler static structures. We chose this approach
because it is usually better to represent as much of the program’s semantic
structure as is feasible in the class structure of the code. There are several
reasons for this:
1. It makes the code easier to understand. Although our static
structure is complex, it is still much simpler than the dynamic
behavior of even a moderately complex program. Because it is
static, we can make good use of modeling techniques and tools
to understand the program, rather than relying on dynamic
tracing to see what is going on in program executions.
2. It simplifies methods. A well-designed static structure,
although it may be complex, does not necessarily add
complexity to the code. Rather, it moves complexity from
methods to the class structure. Instead of a few classes with
large complex methods, we tend to have more, simpler
methods. If we look at the implementation of our logic-based
reasoner, the majority of the methods were surprisingly simple:
mostly setting or retrieving values from a data structure. This
makes methods easier to write correctly, and easier to debug.
3. It makes it easier to modify the code. As any experienced
programmer has learned, the lifecycle of useful code inevitably
involves enhancements. There is a tendency for these
enhancements to complicate the code, leading to increased
problems with bugs as the software ages. This phenomenon
has been called software entropy. Because it breaks the
program functionality down into many smaller methods
spread among many classes, good static structure can simplify

code maintenance by reducing the need to make complex
changes to existing methods.
This chapter completes the basic implementation of a logic-based
reasoner, except for certain extensions including adding the operators for
or and not. We leave these as an exercise. The next chapter will add a
number of enhancements to the basic reasoner, such as asking users for
input during the reasoning process, or replacing true/false values with
quantitative measures of uncertainty. As we develop these enhancements,
keep in mind how class structure supports these extensions, as well as the
implementation patterns we use to construct them.
Exercises
1. Write a method of AbstractSolutionNode to print out a proof
tree in a readable format. A common approach to this is to indent each
node’s description c * level, where level is its depth in the tree, and c is the
number of spaces each level is indented.
2. Add classes for the logical operators Or and Not. Try following the
pattern of the chapter’s implementation of And, but do so critically. If you
find an alternative approach you prefer, feel free to explore it, rewriting
Or and Not as well. If you do decide on a different approach, explain
why.
3. Extend the "user-friendly" input language from exercise 8 of chapter 22
to include And (∧), Or (∨), Not (¬), and Rule (←).
4. Write a Prolog-style interactive front end to the logical reasoner that will
read in a logical knowledge-base from a file using the language of exercise
3, and then enter a loop where users enter goals in the same language,
printing out the results, and then prompting for another goal.
5. Implement a factory pattern for generating solution nodes, and
compare it to the approach taken in the chapter. A factory would be a
class, named SolutionNodeFactory, with a method that takes any
needed arguments and returns an instance of the appropriate
AbstractSolutionNode subclass.
6. Give a logical proof that the two approaches to representing And nodes
in Figure 24.10 are equivalent.
7. Modify the nextSolution() method in AndSolutionNode to
replace the recursive implementation with one that iterates across all the
operators of an And operator. Discuss the trade-offs between efficiency,
understandability, and maintainability in the two approaches.

25 An Expert System Shell

Chapter Objectives:
    Completing the meta-interpreter for rule systems in Java
    Full backtracking unification algorithm
    A goal-based reasoning shell
    An example rule system demonstration
    The extended functionality for building expert systems
        Askable predicates
        Response to how and why queries
        Structure presented for addition of certainty factors

Chapter Contents:
    25.1 Introduction: Expert Systems
    25.2 Certainty Factors and the Unification Problem Solver
    25.3 Adding User Interactions
    25.4 Design Discussion

25.1 Introduction: Expert Systems


In Chapter 24, we developed a unification-based logic problem solver that
solved queries through a depth-first, backward chaining search. In this
chapter, we will extend those classes to implement two features commonly
found in expert-system shells: the ability to attach confidence estimates, or
certainty factors, to inferences (see Luger 2009 for more on certainty
factors), and the ability to interact with the user during the reasoning
process. Since all the classes in this chapter will extend classes from the
unification problem solver, readers must be sure to have read that chapter
before continuing.
In developing the expert system shell, we have two goals. The first is to
explore the use of simple inheritance to extend an existing body of code.
The second is to provide the reader with a start on more extensive
modifications to the code that will be a valuable learning experience; the
exercises will make several suggestions for such extensions.
Certainty Factors

The first extension to the reasoner will be to implement a simplified
version of the certainty factor algebra described in Luger (2009). Certainty
factors will be numbers between -1.0 and 1.0 that measure our confidence
in an inference: -1.0 indicates the conclusion is false with maximum
certainty, and 1.0 means it is true with maximum certainty. A certainty
value of 0.0 indicates nothing is known about the assertion. Values
between -1.0 and 1.0 indicate varying degrees of confidence.
Rules have an attached certainty factor, which indicates the certainty of
their conclusion if all elements in the premise are known with complete
certainty. Consider the following rule and corresponding certainty factor:
If p then q, CF = 0.5

This means that, if p is true with a confidence of 1.0 (maximum
confidence), then q can be inferred to be true with a confidence of 0.5.
This is the measure of the uncertainty introduced by the rule itself. If our
confidence in p is less, then our confidence in q will be lowered
accordingly.
In the case of the conjunction, or “and,” of two expressions, we compute
the certainty of the conjunction as the minimum of the certainty of the
operands. Note that if we limit certainty values to 1.0 (true) and -1.0 (false),
this reduces to the standard definition of “and.” For the “or” operation,
the certainty of the expressions is the maximum of the certainty of its
individual operands. The “not” operator switches the sign of the certainty
factor of its argument. These are also intuitive extensions of the boolean
meaning of those operators.
Certainty factors propagate upward through the inference chain: given a
rule, we unify the rule premises with matching subgoals. After inferring the
certainties of the individual subgoals, we compute the certainty of the
entire rule premise according to the operators for and, or, and not.
Finally, we multiply the certainty of the premise by the certainty of the rule
to compute the certainty of the rule conclusion.
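The algebra just described is compact enough to capture in a few lines of
Java. The following is a sketch of our own for illustration; the shell itself
embeds these computations in its solution nodes rather than in a separate
class:

// Illustrative helpers for the certainty factor algebra.
public class CertaintyAlgebra
{
    // Conjunction: only as certain as its weakest operand.
    public static double and(double cf1, double cf2)
    {
        return Math.min(cf1, cf2);
    }

    // Disjunction: as certain as its strongest operand.
    public static double or(double cf1, double cf2)
    {
        return Math.max(cf1, cf2);
    }

    // Negation: flips the sign of the certainty.
    public static double not(double cf)
    {
        return -cf;
    }

    // Rule application: scale the premise certainty by the
    // rule's own certainty factor.
    public static double infer(double premiseCf, double ruleCf)
    {
        return premiseCf * ruleCf;
    }
}

For the rule "If p then q, CF = 0.5" with p believed at 0.8, infer(0.8, 0.5)
yields a certainty of 0.4 for q.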
Generally, certainty factor implementations will prune a line of reasoning
when the certainty value falls below a given threshold; a common choice is
a certainty of less than 0.2. This can eliminate many branches of the
search space. We will not include pruning in the implementation of this
chapter, but will leave it as an exercise.
25.2 Certainty Factors and the Unification Problem Solver
Our basic design strategy will be to make minimal changes to the
representation of expressions, and to make most of our changes to the
nodes of the solution tree. The reasoning behind this approach is that the
nodes of the solution tree define the inference strategy, whereas logical
expressions are simply statements about the world, independent of their
truth or of how they are used in reasoning. Since certainty factors are a
variation on truth-values, it follows that we should treat certainty
calculations as a part of the system's inference strategy, implementing
them as extensions to descendents of the class AbstractSolutionNode.
This suggests we let SimpleSentence and the basic operators represent
assertions independently of their certainty, and avoid changing them to
support this new reasoning strategy.
The classes we will define will be in a new package called
expertSystemShell. To make development of the expert system shell
easier to follow, we will name classes in this package by adding the prefix
“ES” to their ancestors in the package unificationSolver defined in
the previous chapter.
Adding Certainty Factors to Expressions

We will support representation of certainty factors as an extension to the
definition of Rule from the unification problem solver. We will define a
new subclass of Rule to attach a certainty factor to the basic
representation. We define ESRule as a straightforward extension of the
Rule class by adding a private variable for certainty values, along with
standard accessors:
public class ESRule extends Rule
{
    private double certaintyFactor;

    public ESRule(ESSimpleSentence head,
                  double certaintyFactor)
    {
        this(head, null, certaintyFactor);
    }

    public ESRule(ESSimpleSentence head, Goal body,
                  double certaintyFactor)
    {
        super(head, body);
        this.certaintyFactor = certaintyFactor;
    }

    public double getCertaintyFactor()
    {
        return certaintyFactor;
    }

    protected void setCertaintyFactor(double value)
    {
        this.certaintyFactor = value;
    }
}
Note the two constructors, both of which include certainty factors in their
arguments. The first constructor supports rules with conclusions only;
since a fact is simply a rule without a premise, this allows us to add
certainty factors to facts. The second constructor allows definition of full
rules. An obvious extension to this definition would be to add checks to
make sure certainty factors stay in the range -1.0 to 1.0, throwing an out of
range exception if they are not in range. We leave this as an exercise.
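As a concrete sketch (our own illustration, assuming the Constant class
from Chapter 24 and the ESSimpleSentence subclass defined below), a
fact and the rule "If p then q, CF = 0.5" could be constructed as:

// A fact: p is true with certainty 0.8 (head-only constructor).
ESSimpleSentence p = new ESSimpleSentence(new Constant("p"));
ESRule fact = new ESRule(p, 0.8);

// A rule: "If p then q, CF = 0.5" (head, body, certainty factor).
ESSimpleSentence q = new ESSimpleSentence(new Constant("q"));
ESRule rule = new ESRule(q, p, 0.5);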
This is essentially the only change we will make to our representation. Most
of our changes will be to the solution nodes in the proof tree, since these
define the reasoning strategy. To support this, we will define subclasses to
both SimpleSentence and And to return the appropriate type of solution
node, as required by the interface Goal (these are all defined in the
preceding chapter). The new classes are:
public class ESSimpleSentence extends SimpleSentence
{
    public ESSimpleSentence(Constant functor,
                            Unifiable... args)
    {
        super(functor, args);
    }

    public AbstractSolutionNode getSolver(RuleSet rules,
        SubstitutionSet parentSolution)
    {
        return new ESSimpleSentenceSolutionNode(this,
            (ESRuleSet) rules, parentSolution);
    }
}

public class ESAnd extends And
{
    public ESAnd(Goal... operands)
    {
        super(operands);
    }

    public ESAnd(ArrayList<Goal> operands)
    {
        super(operands);
    }

    public AbstractSolutionNode getSolver(RuleSet rules,
        SubstitutionSet parentSolution)
    {
        return new ESAndSolutionNode(this, rules,
            parentSolution);
    }
}
These are the only extensions we will make to the representation classes.
Next, we will define reasoning with certainty factors in the classes
ESSimpleSentenceSolutionNode and ESAndSolutionNode.
Reasoning with Certainty Factors

Because the certainty of an expression depends on the inferences that led
to it, the certainty factors computed during reasoning will be held in
solution nodes of the proof tree, rather than the expressions themselves.
Thus, every solution node will define at least a goal, a set of variable
substitutions needed to match the goal during reasoning, and the certainty
of that conclusion. The first two of these were implemented in the
previous chapter in the class AbstractSolutionNode, and its
descendents. These classes located their reasoning in the method,
nextSolution(), defined abstractly in AbstractSolutionNode.
Our strategy will be to use the definitions of nextSolution() from the
classes SimpleSentenceSolutionNode and AndSolutionNode
defined in the previous chapter. So, for example, the basic framework of
ESSimpleSentenceSolutionNode is:


public class ESSimpleSentenceSolutionNode
    extends SimpleSentenceSolutionNode
    implements ESSolutionNode
{
    private double certainty = 0.0;  // default value

    public ESSimpleSentenceSolutionNode(
        ESSimpleSentence goal, ESRuleSet rules,
        SubstitutionSet parentSolution)
    {
        super(goal, rules, parentSolution);
    }

    public synchronized SubstitutionSet nextSolution()
        throws CloneNotSupportedException
    {
        SubstitutionSet solution = super.nextSolution();
        // Compute certainty factor for the solution
        // (see below)
        return solution;
    }

    public double getCertainty()
    {
        return certainty;
    }
}
This schema, which will be the same for the ESAndSolutionNode,
defines ESSimpleSentenceSolutionNode as a subclass of the
SimpleSentenceSolutionNode, adding a member variable for the
certainty associated with the current goal and substitution set. When
finding the next solution for the goal, it will call nextSolution() on the
parent class, and then compute the associated certainty factor.
The justification for this approach is that the unification problem solver of
chapter 24 will find all valid solutions (i.e. sets of variable substitutions) to
a goal through unification search. Adding certainty factors does not lead to
new substitution sets – it only adds further qualifications on our
confidence in those inferences. Note that this does lead to questions
concerning logical not: if the reasoner cannot find a set of substitutions
that make a goal true under the unification problem solver, should it fail or
succeed with a certainty of -1.0? For this chapter, we are avoiding such
semantic questions, but encourage the reader to probe them further.
We complete the definition of nextSolution() as follows:
public synchronized SubstitutionSet nextSolution()
    throws CloneNotSupportedException
{
    SubstitutionSet solution = super.nextSolution();
    if(solution == null)
    {
        certainty = 0.0;
        return null;
    }
    ESRule rule = (ESRule) getCurrentRule();
    ESSolutionNode child = (ESSolutionNode) getChild();
    if(child == null)
    {
        // the rule was a simple fact
        certainty = rule.getCertaintyFactor();
    }
    else
    {
        certainty = child.getCertainty() *
            rule.getCertaintyFactor();
    }
    return solution;
}
After calling super.nextSolution(), the method checks if the value
returned is null, indicating no further solutions were found. If this is the
case, it returns null to the parent class, indicating this branch of the search
space is exhausted.
If there is a solution, the method gets the current rule which was used to
solve the goal, and also gets the child node in the search space. If the child
node is null, this indicates a leaf node, and the certainty factor is simply
that of the associated rule. Otherwise, the method gets the certainty of the
child and multiplies it by the rule’s certainty factor. It saves the result in the
member variable certainty.
Note that this method is synchronized. This is necessary to prevent a
threaded implementation from interrupting the method between
computing the solution substitution set and the associated certainty, as this
might cause an inconsistency.
The implementation of the class ESAndSolutionNode follows the
same pattern, but computes the certainty factor of the node recursively: as
the minimum of the certainty of the first operand (the head operand) and
the certainty of the rest of the operands (the tail operands).


public class ESAndSolutionNode extends AndSolutionNode
    implements ESSolutionNode
{
    private double certainty = 0.0;

    public ESAndSolutionNode(ESAnd goal, RuleSet rules,
                             SubstitutionSet parentSolution)
    {
        super(goal, rules, parentSolution);
    }

    public synchronized SubstitutionSet nextSolution()
        throws CloneNotSupportedException
    {
        SubstitutionSet solution = super.nextSolution();
        if(solution == null)
        {
            certainty = 0.0;
            return null;
        }
        ESSolutionNode head =
            (ESSolutionNode) getHeadSolutionNode();
        ESSolutionNode tail =
            (ESSolutionNode) getTailSolutionNode();
        if(tail == null)
            certainty = head.getCertainty();
        else
            certainty = Math.min(head.getCertainty(),
                                 tail.getCertainty());
        return solution;
    }

    public double getCertainty()
    {
        return certainty;
    }
}
This completes the extension of the unification solver to include certainty
factors.


25.3 Adding User Interactions


Another feature common to expert system shells is the ability to ask users
about the truth of subgoals as determined by the context of the reasoning.
The basic approach to this is to allow certain expressions to be designated
as askable. Following the patterns of the earlier sections of this chapter, we
will define askables as an extension to an existing class.
Looking at the code defined above, an obvious choice for the base class of
askable predicates is the ESSimpleSentence class. It makes sense to limit
user queries to simple sentences, since asking for the truth of a complex
operation would be confusing to users. However, our approach will define
Ask as a subclass of the Rule class. There are two reasons for this:
1. In order to query users for the truth of an expression, the system
will need to access a user interface. Adding user interfaces to
ESSimpleSentences not only complicates their definition, but
also it complicates the architecture of the expert system shell by
closely coupling the interface with knowledge representation
classes.
2. So far, our architecture separates knowledge representation syntax
from semantics, with syntax being defined in descendents of the
PCExpression interface, and the semantics being defined in the
nodes of the search tree. User queries are a form of inference (may
the gods of logic forgive me), and will be handled by the solution nodes.
As we will see shortly, defining Ask as an extension of the Rule class
better supports these design constraints. Although Rule is part of
representation, it is closely tied to reasoning algorithms in the solution
nodes, and we have already used it to define certainty factors. Our basic
scheme will be to modify ESSimpleSentenceSolutionNode as follows:
1. If a goal matches the head of a rule, it is true if the premise of the
rule is true;
2. If a goal matches the head of a rule with no premise, then it is true;
3. If a goal matches the head of an askable rule, then ask the user if it
is true.
Conditions 1 & 2 are already part of the definition of
ESSimpleSentenceSolutionNode. The remainder of this section will
focus on adding #3 to its definition.
Implementing this will require distinguishing whether a rule is askable. We will do
this by adding a boolean variable to the ESRule class:
public class ESRule extends Rule
{
    private double certaintyFactor;
    private boolean ask = false;

    // constructors and certainty factor
    // accessors as defined above

    public boolean ask()
    {
        return ask;
    }

    protected void setAsk(boolean value)
    {
        ask = value;
    }
}
This definition sets ask to false as a default. We define the subclass ESAsk as:
public class ESAsk extends ESRule
{
    public ESAsk(ESSimpleSentence head)
    {
        super(head, 0.0);
        setAsk(true);
    }
}
Note that ESAsk has a single constructor, which enforces the constraint
that an askable assertion be a simple sentence.
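For example, to have the reasoner ask the user about income rather than
search the rule base for it (our own illustration, assuming the Constant
and Variable classes of Chapter 24):

// Mark income(X) as askable: when a goal unifies with this head,
// the solver will query the user instead of chaining on rules.
ESAsk askIncome = new ESAsk(
    new ESSimpleSentence(new Constant("income"),
                         new Variable("X")));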
The next step in adding askables to the expert system shell is to modify the
method nextSolution() of ESSimpleSentenceSolutionNode to test
for askable predicates and query the user for their certainty value. The new
version of nextSolution() is:
public synchronized SubstitutionSet nextSolution()
    throws CloneNotSupportedException
{
    SubstitutionSet solution = super.nextSolution();
    if(solution == null)
    {
        certainty = 0.0;
        return null;
    }
    ESRule rule = (ESRule) getCurrentRule();
    if(rule.ask())
    {
        ESFrontEnd frontEnd =
            ((ESRuleSet) getRuleSet()).getFrontEnd();
        certainty = frontEnd.ask(
            (ESSimpleSentence) rule.getHead(), solution);
        return solution;
    }
    ESSolutionNode child = (ESSolutionNode) getChild();
    if(child == null)
    {
        certainty = rule.getCertaintyFactor();
    }
    else
    {
        certainty = child.getCertainty() *
            rule.getCertaintyFactor();
    }
    return solution;
}
We will define ESFrontEnd in an interface:
public interface ESFrontEnd
{
    public double ask(ESSimpleSentence goal,
                      SubstitutionSet subs);
}
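The shell does not prescribe any particular user interface. As one
possibility, a minimal console implementation might look like the
following sketch of our own; it assumes the replaceVariables method that
Chapter 24's expressions provide for applying a substitution set:

import java.util.Scanner;

public class ConsoleFrontEnd implements ESFrontEnd
{
    private Scanner in = new Scanner(System.in);

    public double ask(ESSimpleSentence goal, SubstitutionSet subs)
    {
        // Show the goal with its variables bound, then read a
        // certainty in the range -1.0 (false) to 1.0 (true).
        System.out.println(
            "With what certainty, from -1.0 to 1.0, is this true?");
        System.out.println("  " + goal.replaceVariables(subs));
        return Double.parseDouble(in.nextLine().trim());
    }
}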
Finally, we will introduce a new class, ESRuleSet, to extend RuleSet
to include an instance of ESFrontEnd:
public class ESRuleSet extends RuleSet
{
    private ESFrontEnd frontEnd = null;

    public ESRuleSet(ESFrontEnd frontEnd,
                     ESRule... rules)
    {
        super((Rule[]) rules);
        this.frontEnd = frontEnd;
    }

    public ESFrontEnd getFrontEnd()
    {
        return frontEnd;
    }
}
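To see how the pieces fit together, the following fragment builds a tiny
knowledge base and solves a goal. It is our own illustration: it assumes a
no-argument SubstitutionSet constructor and the Constant class from
Chapter 24, along with the ConsoleFrontEnd sketched above, and must
appear in a method that declares throws CloneNotSupportedException:

// q :- p with CF 0.5, where p is askable.
ESSimpleSentence p = new ESSimpleSentence(new Constant("p"));
ESSimpleSentence q = new ESSimpleSentence(new Constant("q"));
ESRuleSet rules = new ESRuleSet(new ConsoleFrontEnd(),
                                new ESRule(q, p, 0.5),
                                new ESAsk(p));

// Solving q asks the user about p, then scales the answer by the
// rule's certainty factor: a user reply of 0.8 yields 0.4 for q.
ESSimpleSentenceSolutionNode root = (ESSimpleSentenceSolutionNode)
    q.getSolver(rules, new SubstitutionSet());
SubstitutionSet solution = root.nextSolution();
System.out.println("certainty(q) = " + root.getCertainty());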
This is only a partial implementation of user interactions for the expert
system shell. We still need to add the ability for users to make a top-level
query to the reasoner, and also the ability to handle “how” and “why”
queries as discussed in (Luger 2009). We leave these as an exercise.
25.4 Design Discussion
Although the extension of the unification problem solver into a simple
expert system shell is, for the most part, straightforward, there are a couple
of interesting design questions. The first of these was our decision to leave
the definitions of descendents of PCExpression as unchanged as possible,
and place most of the new material in extensions
to the solution node classes. Our reason for doing this reflects a theoretical
consideration.
Logic makes a theoretical distinction between syntax and semantics,
between the definition of well-formed expressions and the way they are
used in reasoning. Our decision to define the expert system almost entirely
through changes to the solution node classes reflects this distinction. In
making this decision, we are following a general design heuristic that we
have found useful, particularly in AI implementations: insofar as possible,
define the class structure of code to reflect the concepts in an underlying
mathematical theory. Like most heuristics, the reasons for this are intuitive,
and we leave further analysis to the exercises.
The second major design decision is somewhat more problematic. This is
our decision to use the nextSolution method from the unification solver to
perform the actual search, and compute certainty factors afterwards. The
benefits of this are in not modifying code that has already been written and
tested, which follows standard object-oriented programming practice.
However, in this case, the standard practice leads to certain drawbacks that
should be considered. One of these is that, once a solution is found,
acquiring both the variable substitutions and certainty factor requires two
separate methods: nextSolution and getCertainty. This is error
prone, since the person using the class must insure that no state changes
occur between these calls. One solution is to write a convenience function
that bundles both values into a new class (say ESSolution) and returns
them. A more aggressive approach would be to ignore the current version
of nextSolution entirely, and to write a brand new version.
This is a very interesting design decision, and we encourage the reader to
try alternative approaches and discuss their trade-offs in the exercises to
this chapter.
Exercises
1. Modify the definition of the nextSolution method of the classes
ESSimpleSolutionNode and ESAndSolutionNode to fail a line of
reasoning if the certainty factor falls below a certain value (0.2 or 0.3 are
typical values). Instrument your code to count the number of nodes visited
and test it both with and without pruning.
2. Add range checks to all methods and classes that allow certainty factors
to be set, throwing an exception if the value is not in the range of -1.0 to
1.0. Either use Java’s built-in IllegalArgumentException or an
exception class of your own definition. Discuss the pros and cons of the
approach you choose.
3. In designing the object model for the unification problem solver, we
followed the standard AI practice of distinguishing between the
representation of well-formed expressions (classes implementing the
interface Unifiable) and the definition of the inference strategy in the
nodes of the solution tree (descendents of AbstractSolutionNode).
This chapter’s expert system shell built on that distinction. More
importantly, because we were not changing the basic inference strategy
other than to add certainty estimates, we approached the expert system by
defining subclasses to SimpleSentenceSolutionNode and
AndSolutionNode, and reusing the existing nextSolution method. If,
however, we were changing the search strategy drastically, or for other
reasons discussed in 25.4, it might have been more efficient to retain only
the representation and rewrite the inference strategy completely. As an
experiment to explore this option, rewrite the expert system shell without
using AbstractSolutionNode or any of its descendants. This will give
you a clean slate for implementing reasoning strategies. Although this does
not make use of previously implemented code, it may make the
solution simpler, easier to use, and more efficient. Implement an alternative
solution, and discuss the trade-offs between this approach and that taken
in the chapter.
4. Full implementations of certainty factors also allow the combination of
certainty factors when multiple rules lead to the same goal. That is, if the goal g
with substitutions s is supported by multiple lines of reasoning, what is its
certainty? (Luger 2009) discusses how to compute these values. Implement
this approach.
5. A feature common to expert systems that was not implemented in this
chapter is the ability to provide explanations of reasoning through How
and Why queries. As explained in (Luger 2009), How queries explain a fact
by displaying the proof tree that led to it. Why queries explain why a
question was asked by displaying the rule that is the current context of the
question. Implement How and Why queries in the expert system shell, and
support them through a user-friendly front end. This front-end should also
allow users to enter queries, inspect rule sets, etc. It should also support
askable predicates as discussed in the next exercise.
6. Build a front-end to support user interaction around askable predicates.
In particular, it should keep track of answers that have been received, and
avoid asking the same question twice. This means it should keep track of
both expressions and substitutions that have been asked. An additional
feature would be to support asking users for actual substitution values, and
adding them to the substitution set.
7. Revisit the design decision to, so far as possible, locate our changes in
the solution node classes, rather than descendants of PCExpression. In
particular, comment on our heuristic of organizing code to reflect the
structures implied by logical theory. Did this heuristic of following the
structure of theory work well in our implementation? Why? Do you believe
this heuristic to be generalizable beyond logic? Once again, why?

26 Case Studies: JESS and other Expert System Shells in Java

Chapter Objectives:
    This chapter examines Java expert system shells available on the World Wide Web

Chapter Contents:
    26.1 Introduction
    26.2 JESS
    26.3 Other Expert System Shells
    26.4 Using Open Source Tools

26.1 Introduction
In the last three chapters we demonstrated the creation of a simple expert
system shell in Java. Chapter 22 presented a representational formalism for
describing predicate calculus expressions, the representation of choice for
expert rule systems and many other AI problem solvers. Chapter 24
created a procedure for unification. We demonstrated this algorithm with a
set of predicate calculus expressions, and then built a simple Prolog in Java
interpreter. Chapter 25 added full backtracking to our unification algorithm
so that it could check all possible unifications in the processes of finding
sets of consistent substitutions across sets of predicate calculus
specifications. In Chapter 25 we also created procedures for answering why
and how queries, as well as for setting up a certainty factor algebra.
In this chapter we present a number of expert system shell libraries written
in Java. As mentioned throughout our presentation of Java, the presence of
extensive code libraries is one of the major reasons for the broad
acceptance of Java as a problem-solving tool. We have explored these
expert shells at the time of writing this chapter. We realize that many of
these libraries will change over time and may well differ (or not even exist!)
when our readers consider them. So we present their URLs, current as of
January 2008, with minimal further comment.
26.2 JESS
The first library we present is JESS, the Java Expert System Shell, built and
maintained by programmers at Sandia National Laboratories in
Albuquerque, New Mexico. JESS is a rule engine for the Java platform.
Unlike the unification system presented in Chapters 23 and 24, JESS is
driven by a lisp-style scripting language built in Java itself. There are
advantages and disadvantages to this approach. One main advantage of an
independent scripting language is that it is easier to work with for the code
builder. For example, Prolog has its own language that is suitable for rule
languages, which makes it easy and clear to write static rule systems.
On the other hand, Prolog is not intended to be embedded in other
applications. In the case of Java, rules may be generated and controlled by
some external mechanism, and in order to use JESS’s approach, the data
needs to be converted into text that this interpreter can handle.
A disadvantage of an independent scripting language is the disconnect
between Java and the rule engine. Once external files and strings are used
to specify rules, standard Java syntax cannot be used to verify and check
syntax. While this is not an issue for stand-alone rule solving systems, once
the user wants to embed the solver into existing Java environments, she
must learn a new language and decide how to interface and adapt the
library to her project.
In an attempt to address standardization of rule systems in Java, the Java
Community Process defined an API for rule engines in Java. The Java
Specification Request #94 defines the javax.rules package and a number of
classes for dealing with rule engines. Our impression of this system is that
it is very vague and seemingly tailored for JESS. It abstracts the rule
system as general objects with general methods for getting/setting
properties on rules.
RuleML, although not Java specific, provides a standardized XML format
for defining rules. This format can theoretically be used for any rule
interpreter, as the information can be converted into the rule interpreter’s
native representations.
JESS has its own JessML format for defining rules in XML, which can be
converted to RuleML and back using XSLT (eXtensible Stylesheet
Language Transformations). These formats, unfortunately, are rather
verbose and not necessarily intended for being read and written by people.
Web links for using JESS include:
http://www.jessrules.com/ - The JESS web site,
http://jcp.org/en/jsr/detail?id=94 - JSR 94: Java Rule Engine API,
http://www.jessrules.com/jess/docs/70/api/javax/rules/package-summary.html - javadocs about javax.rules (from JSR 94), and
http://www.ruleml.org/ - RuleML.
26.3 Other Expert System Shells
We have done some research into other Java expert rule systems, and
found dozens of them. The following URL introduces a number of these
(not all in Java):
http://www.kbsc.com/rulebase.html
The general trend of these libraries is to use some form of scripting-
language based rule engine. There is even a Prolog implementation in Java!
These systems introduce many implementation choices and related
technologies, including RDF, OWL, SPARQL, the Semantic Web, Rete, and
more.
The following URL discusses a high-level look at rule engines in Java
(albeit from a couple of years ago):
http://today.java.net/pub/a/today/2004/08/19/rulingout.html
Finally, we conclude with a set of links to the seemingly more interesting
rule engines. We only picked the engines listed as free; some are open
source, some are not:
http://www.drools.org/
http://www.agfa.com/w3c/euler/
http://jlogic.sourceforge.net/ - a prolog interpreter in Java
http://jlisa.sourceforge.net/ - a CLIPS-like rule engine (CLIPS is a NASA
rule-based shell written in C) accessible from Java with the power of Common Lisp.
http://mandarax.sourceforge.net/ - this one has some simple
straightforward examples on the site, but the javadocs themselves are
daunting.
http://tyruba.sourceforge.net/
Related to rule interpreters designed to search knowledge-based
specifications are interpreters intended to transfer knowledge, rules, or
general specifications between code modules. These general module
translation and integration programs are often described under the topic of
the Semantic Web:
http://www.w3.org/2001/SW/
26.4 Using Open Source Tools
The primary advantage these tools have over our simple expert system
shell is their range of features. Jess, for example, provides a rule language
that frees the programmer from having to declare each rule as a set of
nested class instances as in our simple set of tools. An interesting thought
experiment would be to consider what it would take to write a parser for a
rule language that would construct these class instantiations from a user-
friendly rule language. A more ambitious effort, possibly suitable for an
advanced undergraduate or masters level thesis project would be to
implement such a front end.
In using these tools, the reader should not forget the lessons in extending
Java classes from the earlier chapter. Inheritance allows the programmer to
extend open source code to include additional functionality if necessary.
More often, we may use these tools as a module in a larger program
simply by including the jar files.
In particular, the authors have seen the Jess tool used in a number of large
applications at Sandia Laboratories and the University of New Mexico.
Typical application architecture uses Jess as an inference engine in a larger
system with databases, html front ends using Java Server Faces or similar
technologies, and various tools to assist in file I/O, session archiving, etc.
For example, a development team at Sandia Laboratories led by Kevin
Stamber, Richard Detry, and Shirley Starks has developed a system called
FAIT (Fast Analysis Infrastructure Tool) for use in the National
Infrastructure Simulation and Analysis Center (NISAC). FAIT addresses
the problem of charting and analyzing the interdependencies between
infrastructure elements to help the Department of Homeland Security
respond to hurricanes and other natural disasters. Although there are
databases that show the location of generating plants, sub-stations, power
lines, gas lines, telecommunication facilities, roads and other infrastructure
elements, there are two problems with this data:
1. Interdependencies between elements are not shown explicitly in
the databases. For example, databases of electrical power
generation elements do not explicitly state which substations
service which generating plants, relying on human experts to infer
this from factors like co-location, ownership by the same utility,
etc.
2. Interactions between different types of utilities, such as the effect
of an electrical power outage on telecommunications hubs or gas
pumping stations must be inferred from multiple data sources.
FAIT uses Jess to apply rules obtained from human experts to solve these
problems. What is especially interesting about the FAIT architecture is its
integration of Jess with multiple sources of infrastructure data, its use of a
geographic information system to display interdependencies on maps, and
its presentation of all this through an html front end.
The success of FAIT is intimately tied to its use of both open-source and
commercially purchased tools. If the development team had faced the
challenge of building all these components from scratch, the system would
have cost an order of magnitude more than it did – if it could have been
built at all. This approach of building extremely large systems from
collections of independently designed components using the techniques
discussed in this section has become an essential part of modern software
development.



27 ID3: Learning from Examples

Chapter Objectives:
    Review of supervised learning and decision tree representation
    Representing decision trees as recursive structures
    A general decision tree induction algorithm
    Information theoretic decision tree test selection heuristic

Chapter Contents:
    27.1 Introduction to Supervised Learning
    27.2 Representing Knowledge as Decision Trees
    27.3 A Decision Tree Induction Program
    27.4 ID3: An Information Theoretic Tree Induction Algorithm

27.1 Introduction to Supervised Learning


In machine learning, inductive learning refers to training a learner through use
of examples. The simplest case of this is rote learning, whereby the learner
simply memorizes the training examples and reuses them in the same
situations. Because they do not generalize from training data, rote learners
can only classify exact matches of previous examples. A further limitation
of rote learning is that the learned examples might contain conflicting
information, and without some form of generalization, the learner cannot
effectively deal with this noise. To be effective, a learner must apply
heuristics to induce reliable generalizations from multiple training examples
that can handle unseen situations with some degree of confidence.
A common inductive learning task is learning to classify specific instances
into general categories. In supervised learning, a teacher provides the system
with categorized training examples. This contrasts with clustering and
similar unsupervised learning tasks where the learner forms its own
categories from training data. See (Luger 2009) for a discussion of these
different learning tasks. An example of a supervised inductive learning
problem, which we will develop throughout the chapter, is a bank wanting
to train a computer learning system to categorize new borrowers according
to credit risk on the basis of properties such as their credit history, current
debt, collateral, and income. One approach would be to use the credit risk
of previous borrowers, as determined over time by their actual debt payoff
history, to provide categorized examples. In this chapter we do exactly
that, using the ID3 algorithm.
27.2 Representing Knowledge as Decision Trees
A decision tree is a simple form of knowledge representation that is widely
used in both advisors and machine learning systems. Decision trees are
recursive structures in which each node examines a property of a collection
of data, and then delegates further decision making to child nodes based on
the value of that particular property (Luger 2009, Section 10.3). The leaf
nodes of the decision tree are terminal states that return a class for the
given data collection. We can illustrate decision trees through the example
of a simple credit history evaluator that was used in (Luger 2009) in its
discussion of the ID3 learning algorithm. We refer the reader to this book
for a more detailed discussion, but will review the basic concepts of
decision trees and decision tree induction in this section.
Assume we wish to assign a credit risk of high, moderate, or low to people
based on the following properties of their credit rating:
Collateral, with possible values {adequate, none}
Income, with possible values {"$0 to $15K", "$15K to $35K", "over $35K"}
Debt, with possible values {high, low}
Credit History, with possible values {good, bad, unknown}
We could represent risk criteria as a set of rules, such as “If debt is low,
and credit history is good, then risk is moderate.” Alternatively, we can
summarize a set of rules as a decision tree, as in figure 27.1. We can
perform a credit evaluation by walking the tree, using the values of the
person’s credit history properties to select a branch. For example, using the
decision tree of figure 27.1, an individual with credit history = unknown,
debt = low, collateral = adequate, and income = $15K to $35K would be
categorized as having low risk. Also note that this particular categorization
does not use the income property. This is a form of generalization, where
people with these values for credit history, debt, and collateral qualify as
having low risk, regardless of income.

Figure 27.1 A Decision Tree for the Credit Risk Problem (Luger 2009)
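As a preview of the implementation in Section 27.3, this walk can be
sketched as a single recursive method. The accessor names here are our
own illustration, not the chapter's actual API:

// A sketch of classification by walking a decision tree.
// isLeaf, getCategory, getTestProperty, getChildFor, and
// getProperty are hypothetical accessors.
public String classify(AbstractExample example)
{
    if (isLeaf())                  // terminal node: return its class
        return getCategory();
    String value =                 // the example's value for the test
        example.getProperty(getTestProperty()).getValue();
    return getChildFor(value).classify(example);  // follow the branch
}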


Now, assume the following set of 14 training examples. Although this does
not cover all possible instances, it is large enough to define a number of
meaningful decision trees, including the tree of figure 27.1 (the reader may
want to construct several such trees. See exercise 1). The challenge facing
any inductive learning algorithm is to produce a tree that both covers all
the training examples correctly, and has the highest probability of being
correct on new instances.

risk       collateral  income         debt  credit history
high       none        $0 to $15K     high  bad
high       none        $15K to $35K   high  unknown
moderate   none        $15K to $35K   low   unknown
high       none        $0 to $15K     low   unknown
low        none        over $35K      low   unknown
low        adequate    over $35K      low   unknown
high       none        $0 to $15K     low   bad
moderate   adequate    over $35K      low   bad
low        none        over $35K      low   good
low        adequate    over $35K      high  good
high       none        $0 to $15K     high  good
moderate   none        $15K to $35K   high  good
low        none        over $35K      high  good
high       none        $15K to $35K   high  bad
A valuable heuristic for producing such decision trees comes from the
time-honored logical principle of Occam’s Razor. This principle, first
articulated by the medieval logician, William of Occam, holds that we
should always prefer the simplest correct solution to any problem. In our
case, this would favor decision trees that not only classify all training
examples, but also that do so, on average, by examining the fewest
properties possible. The reason for this is straightforward: the simplest
decision tree that correctly handles the known examples is the tree that
makes the fewest assumptions about unknown instances. Stating it simply,
the fewer assumptions made, the less likely we are to make an erroneous
one.
Because omitting properties is a way of generalizing decision trees, and
because the order in which the properties are tested determines the ability
of the tree to omit properties while still matching all the test data, the order
of tests from root down to leaf nodes is the major factor in inducing
decision trees. This is captured in the following pseudo code for a recursive
algorithm for inducing trees:
function induce_tree (example_set, Properties)
begin
    if all entries in example_set are the same class
        then return a leaf node labeled with that class
    else if Properties is empty
        then return a leaf node with default class
    else
    begin
        select a property, P, and
            make it the root of the current tree
        delete P from Properties
        for each value V of P
        begin
            create a branch of the tree labeled with V
            let partition_V be elements of
                example_set with values V of P
            let branch_V =
                induce_tree (partition_V, Properties)
            attach branch_V to root for value V of P
        end
        endfor
        return current root
    end
    endif
end
This algorithm builds trees in a top-down fashion. It stops when all
examples have the same categorization, thereby pruning extraneous
branches of the tree. Using this algorithm, production of a simple (i.e.,
generalized) tree depends upon the order in which properties are selected.
This, in turn, depends upon the selection function used to select the
property to check in the current node of the tree.
For the decision tree induction, we use the original approach from the ID3
algorithm of (Quinlan 1986) elaborated by Luger (2009, Section 10.3). This
approach uses information theory to select the property that gains the most
information about the example set. Intuitively, this heuristic should
minimize the number of properties the tree checks. We will explain it in
detail later. We should note, however, that there are several important
extensions of the early ID3 paradigm, differing only in a few operations.
For example, C4.5 and C5.0 are Quinlan's (1996) own extensions that
overcome a number of the original ID3 weaknesses. We will not
implement C4.5/C5.0 here, but we should remember that more
sophisticated or domain-specific modifications to the core decision tree
induction algorithm may be desired by future developers using this code.
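For reference, the standard ID3 selection measure (Quinlan 1986), which
we develop fully in Section 27.4, can be stated briefly. For a set of examples
$S$ with categories $c_1, \ldots, c_n$, the information content of $S$ is

$$I(S) = \sum_{i=1}^{n} -p(c_i) \log_2 p(c_i)$$

and the information gain from testing a property $P$, whose values $v$
partition $S$ into subsets $S_v$, is

$$gain(P) = I(S) - \sum_{v \in values(P)} \frac{|S_v|}{|S|} \, I(S_v)$$

At each node, ID3 selects the property with the largest gain.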
27.3 A Decision Tree Induction Program
Implementing this in Java raises at least two interesting problems.
Managing trees, lists of examples, partitioning examples on various
properties, and so forth is a challenge for designing data structures. Our
example code will not be optimally efficient, but is intended to give the
student opportunities to improve performance by using table lookup and
other techniques to reduce time spent scanning lists of examples. The
other challenge will be in maintaining the quality of training data. We take a
simplified approach of requiring that all examples contain legitimate values for
all desired properties. Although the machine learning literature is filled with
techniques for managing missing or noisy data, this simple assumption will
let us investigate a number of interesting Java techniques, such as
immutable objects, error checks in constructors, etc.
Figure 27.2 shows the five classes that form the basis of our
implementation. AbstractDecisionTreeNode defines the basic
behaviors of a decision tree. It is a recursive structure, as shown by the use
of an assembly link back to itself. AbstractDecisionTreeNode will
define methods to solve a new instance by walking the tree, and the basic
tree induction algorithm mentioned above. The method to evaluate a test
property’s partition of the example space into subproblems into will be
abstract in this class, allowing definition of multiple alternative evaluation
heuristics. The class, InformationTheoreticDecisionTreeNode,
will implement the basic ID3 evaluation heuristic, which uses information
theory to select a property that gives the greatest information gain on the
set of training examples.
The remaining classes define and manage training examples. An
AbstractProperty defines properties as <name, value> pairs. It is an
abstract class, requiring subclasses define a method to test for legal <name,
value> definitions. An AbstractExample defines examples as a set of
properties and a categorization of those properties: i.e. a single row in the
example table given above. Like AbstractProperty, it requires
subclasses define domain specific checks for the validity of examples.
Finally, ExampleSet maintains a set of training examples, such as is
given in the table above. It enforces checks that all examples are of the
same type, provides basic accessors, and also methods to partition an
example set on specific properties.

Figure 27.2 Class structure of decision tree nodes and examples
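As an indication of the partitioning operation's behavior, the following
sketch groups examples by their value for a named property. It is our own
illustration: the getProperty accessor is hypothetical, and ExampleSet
provides an equivalent operation on its own representation:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionSketch
{
    public static Map<String, List<AbstractExample>> partitionOn(
            List<AbstractExample> examples, String propertyName)
    {
        Map<String, List<AbstractExample>> partition =
            new HashMap<String, List<AbstractExample>>();
        for (AbstractExample example : examples)
        {
            // Group each example under its value for the property.
            String value =
                example.getProperty(propertyName).getValue();
            if (!partition.containsKey(value))
                partition.put(value,
                    new ArrayList<AbstractExample>());
            partition.get(value).add(example);
        }
        return partition;
    }
}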


Properties as Immutable Objects

The basic definition of a property is straightforward: it consists of two
strings, defining the name and value respectively. A simple initial
implementation might be:
public class Property
{
    private String name = null;
    private String value = null;

    public Property(String name, String value)
    {
        this.name = name;
        this.value = value;
    }

    public String getName()
    {
        return name;
    }

    public String getValue()
    {
        return value;
    }
}
Although this gives the basic structure of the class, and would work in the
program, it fails to perform any correctness checks on data values. The first
of these is the opportunity to perform type checks on property values.
Referring to the credit evaluation example, the only values for debt are
"high" and "low," and a robust program should check for them.
We can implement this by making Property an abstract class that uses
an abstract method to test for legal property values. Each property type will
be a subclass that defines this method. Our definition then becomes:
public abstract class AbstractProperty
{
    private String value = null;

    public AbstractProperty(String value)
        throws IllegalArgumentException
    {
        if(isLegalValue(value) == false)
            throw new IllegalArgumentException(value +
                " is an illegal value for Property " +
                getName());
        this.value = value;
    }

    public final String getValue()
    {
        return value;
    }

    public abstract boolean isLegalValue(String value);

    public abstract String getName();
}
This version uses the isLegalValue(...) method to check for bad
values in the constructor, throwing an IllegalArgumentException
if one is found. Since AbstractProperty is now an abstract class, each
property type must define its own subclass implementing the abstract
methods. Also note that, since the name of a property is the same for all
instances of a type, we have made getName() an abstract method as
well. An example of how a property can implement this is given by this
implementation of the debt property:
public class DebtProperty extends AbstractProperty
{
    public static final String DEBT = "Debt";
    public static final String HIGH = "high";
    public static final String LOW = "low";

    public DebtProperty(String value)
    {
        super(value);
    }

    public boolean isLegalValue(String value)
    {
        return (value.equals(HIGH) ||
                value.equals(LOW));
    }

    public final String getName()
    {
        return DEBT;
    }
}
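A short usage sketch (our own) shows the constructor check in action:

// The constructor validates the value before storing it.
AbstractProperty lowDebt = new DebtProperty(DebtProperty.LOW);
System.out.println(lowDebt.getName() + " = " + lowDebt.getValue());

// An illegal value fails immediately:
// new DebtProperty("medium");  // throws IllegalArgumentException

Note that the superclass constructor calls isLegalValue() and
getName() before the subclass constructor body runs; this is safe here
only because those methods depend on constants rather than on instance
state.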
Although simple, the implementation of AbstractProperty has
another interesting quality. Note that the member variable value is
private, and we have not provided a set method other than through the
constructor. This means that, once an instance of property is created, its
value cannot change. This pattern is called an immutable object. Because
immutable objects avoid many types of bugs (imagine the effect on the
learning algorithm of changing a property value during execution), this
should be used where it matches our intent. To reduce the chance that a
well-intentioned programmer will change this, we should write code so as
to make our intention clear. We can do this by making our get method
final, to prevent subclasses from violating the immutability pattern, and

also by defining set methods that throw an exception if called. This
completes the definition of AbstractProperty as:
public abstract class AbstractProperty
{
private String value = null;
public AbstractProperty(String value)
throws IllegalArgumentException
{
if(isLegalValue(value) == false)
throw
new IllegalArgumentException(value +
"is an illegal Property Value for " +
getName());
this.value = value;
}
public abstract boolean isLegalValue(String
value);
public abstract String getName();
public final String getValue()
{
return value;
}
//Enforcing Immutable object pattern
public final void setValue(String v)
throws UnsupportedOperationException
{
throw new UnsupportedOperationException();
}
//Enforcing Immutable object pattern
public final void setName(String n)
throws UnsupportedOperationException
{
throw new UnsupportedOperationException();
}
}
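As a brief illustration (our own usage sketch, not code from the book's
library), a DebtProperty behaves as follows: a legal value constructs
normally, an illegal value raises IllegalArgumentException, and any attempt
to change a field raises UnsupportedOperationException:
// Hypothetical usage sketch of the immutable DebtProperty.
AbstractProperty debt = new DebtProperty(DebtProperty.HIGH);
System.out.println(debt.getName() + " = " + debt.getValue());
// prints: Debt = high
// new DebtProperty("medium");
// would throw IllegalArgumentException: illegal value
// debt.setValue(DebtProperty.LOW);
// would throw UnsupportedOperationException: immutable object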
Implementing Examples

Like a property, an example is conceptually simple: it is a collection of
properties describing a problem instance and a categorization of that
instance. In our credit example, the properties that form an example are
debt, collateral, credit history, and income. The example category is a risk
assessment. Each row of the example table in section 27.1 would be
represented as an example. Like the property class, however, it also
presents opportunities for insuring the validity of examples. In this case, we
will require that an example consist only of specified properties, and that a
legal example include all properties. Examples also offer an opportunity to
use an immutable object pattern, since it makes little sense to allow
examples to change during the course of a learning session.
The structure of an example is similar to that of an
AbstractProperty: it is an abstract class that requires subclasses
to define methods to support validity checks. We will follow the immutable
object pattern, providing access methods but no “add,” “set,” or other
modification methods, and requiring that all properties be defined in the
constructor.
The class has two member variables. The category is a String defining
the classification of the example. In our credit example, this would be the
risk level of high, moderate, or low. The properties member variable is
a Map that indexes different properties by their name. We define two
constructors. The primary constructor performs error checks to require that
each example contains all legal properties and only legal properties. The
single-argument constructor allows us to define uncategorized examples.
Both of these call the private method addProperties to add the
elements of the propertyList argument to the properties member
variable. This method also checks that the propertyList argument
contains only legal properties and that all required properties are present.
The implementation of AbstractExample is:
public abstract class AbstractExample
{
private String category = null;
private Map<String, AbstractProperty> properties
= new HashMap <String, AbstractProperty> ();
// Constructor for classified examples
public AbstractExample(String category,
AbstractProperty... propertyList)
throws IllegalArgumentException
{
if(isLegalCategory(category) == false)
throw
new IllegalArgumentException(category +
"is an illegal category for example.");
this.category = category;
addProperties(propertyList);
}
// Constructor for unclassified examples
public AbstractExample(AbstractProperty...
propertyList)
throws IllegalArgumentException
{
addProperties(propertyList);
}

private void addProperties(AbstractProperty[]
propertyList)
throws IllegalArgumentException
{
Set<String> requiredProps =
getPropertyNames();
// check that all properties are legal
for(int i = 0; i < propertyList.length;
i++)
{
AbstractProperty prop =
propertyList[i];
if(requiredProps.contains(
prop.getName()) == false)
throw
new IllegalArgumentException(
prop.getName() +
"illegal Property for example.");
properties.put(prop.getName(), prop);
requiredProps.remove(prop.getName());
}
// Check that all legal properties were used
if(requiredProps.isEmpty() == false)
{
Object[] p = requiredProps.toArray();
String props = "";
for (int i = 0; i < p.length; i++)
props += (String)p[i] + " ";
throw
new IllegalArgumentException(
"Missing Properties in example: " +
props);
}
}
public AbstractProperty getProperty(
String name)
{
return properties.get(name);
}
public String getCategory()
{
return category;

}
public String toString()
{
// to be defined by reader
}
public abstract Set<String> getPropertyNames();
public abstract boolean isLegalCategory(
String category);
}
This implementation of AbstractExample as an immutable object is
incomplete in that it does not include the techniques demonstrated in
AbstractProperty to enforce the immutability pattern. We leave this
as an exercise.
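To make these abstract methods concrete, a subclass for the credit domain
might look like the following sketch. The class name CreditExample and the
category constants are our own illustration of the pattern, not code from
the book's library:
// Hypothetical concrete example class for the credit domain.
public class CreditExample extends AbstractExample
{
// Hypothetical category constants for credit risk.
public static final String HIGH = "high";
public static final String MODERATE = "moderate";
public static final String LOW = "low";
public CreditExample(String category,
AbstractProperty... propertyList)
throws IllegalArgumentException
{
super(category, propertyList);
}
public boolean isLegalCategory(String category)
{
return category.equals(HIGH) ||
category.equals(MODERATE) ||
category.equals(LOW);
}
public Set<String> getPropertyNames()
{
Set<String> names = new HashSet<String>();
names.add("Debt");
names.add("Collateral");
names.add("Credit History");
names.add("Income");
return names;
}
}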
Implementing ExampleSet

ExampleSet, along with AbstractDecisionTreeNode, is one of
the most interesting classes in the implementation. This is because the
decision tree induction algorithm requires a number of fairly complex
operations for partitioning the example set on property values. The
implementation presented here is simple and somewhat inefficient, storing
examples as a simple vector. This requires examination of all examples to
form partitions, retrieve examples with a specific value for a property, etc.
We leave a more efficient implementation as an exercise.

In providing integrity checks on data, we have required that all examples be
categorized, and that all examples belong to the same class.
The basic member variables and accessors are defined as:
public class ExampleSet
{
private Vector<AbstractExample> examples =
new Vector<AbstractExample>();
private HashSet<String> categories =
new HashSet<String>();
private Set<String> propertyNames = null;
public void addExample(AbstractExample e)
throws IllegalArgumentException
{
if(e.getCategory() == null)
throw new IllegalArgumentException(
"Example missing categorization.");
// Check that new example is of same class
// as existing examples
if((examples.isEmpty()) ||
e.getClass() ==
examples.firstElement().getClass())
{
examples.add(e);
categories.add(e.getCategory());

if(propertyNames == null)
propertyNames =
new HashSet<String>(
e.getPropertyNames());
}
else
throw new IllegalArgumentException(
"All examples must be same type.");
}
public int getSize()
{
return examples.size();
}
public boolean isEmpty()
{
return examples.isEmpty();
}
public AbstractExample getExample(int i)
{
return examples.get(i);
}
public Set<String> getCategories()
{
return new HashSet<String>(categories);
}
public Set<String> getPropertyNames()
{
return new HashSet<String>(propertyNames);
}
// More complex methods to be defined.
public int getExampleCountByCategory(String cat)
throws IllegalArgumentException
{
// to be defined below.
}
public HashMap<String, ExampleSet> partition(
String propertyName)
throws IllegalArgumentException
{
// to be defined below.
}
}

As mentioned, this implementation is fairly simple. It stores examples as a
Vector, so most retrieval or partitioning operations will require iterating
through this list. The categories and propertyNames member
variables are a convenience, allowing simpler access of these values. Since
example sets should not change during a learning session, we could use an
immutable object pattern in the ExampleSet implementation. This
implementation does not, since it would lead to extremely complex
constructor implementations. Instead, we implemented an addExample
method. This method performs simple data integrity checks, requiring that
all examples be of the same type, and prohibiting unclassified examples.
Reworking this using an immutable pattern is left as an exercise. The
remaining methods are straightforward accessors.
ExampleSet includes a number of methods to support the induction
algorithm. The first of these counts the number of examples that belong to
a given category:
public int getExampleCountByCategory(String cat)
throws IllegalArgumentException
{
Iterator<AbstractExample> iter =
examples.iterator();
AbstractExample example;
int count = 0;
while(iter.hasNext())
{
example = iter.next();
if(example.getCategory().equals(cat))
count++;
}
return count;
}
A more complex method partitions the example set according to the
values different examples have for a specified property. partition takes
a property name as argument and returns an instance of HashMap<String,
ExampleSet>, where each key is a property value and each value is an
instance of ExampleSet containing the examples that have that value for
the chosen property. partition calls two private methods: getValues,
which returns the set of values for the property that appear in the example
set, and getExamplesByProperty, which constructs a new instance of
ExampleSet in which every example has the same value for the property.
public HashMap<String, ExampleSet> partition(
String propertyName)
throws IllegalArgumentException
{
HashMap<String, ExampleSet> partition =
new HashMap<String, ExampleSet>();

Set<String> values = getValues(propertyName);
Iterator<String> iter = values.iterator();
while(iter.hasNext())
{
String val = iter.next();
ExampleSet examples =
getExamplesByProperty(propertyName,
val);
partition.put(val, examples);
}
return partition;
}
private Set<String> getValues(String propName)
{
HashSet<String>values = new HashSet<String>();
Iterator<AbstractExample> iter =
examples.iterator();
while(iter.hasNext())
{
AbstractExample ex = iter.next();
values.add(ex.getProperty(propName).
getValue());
}
return values;
}
private ExampleSet getExamplesByProperty(
String propName, String value)
throws IllegalArgumentException
{
ExampleSet result = new ExampleSet();
Iterator<AbstractExample> iter =
examples.iterator();
AbstractExample example;
while(iter.hasNext())
{
example = iter.next();
if(example.getProperty(propName).getValue().
equals(value))
result.addExample(example);
}
return result;
}

Placing the partitioning algorithm in a method of ExampleSet, rather
than in the actual decision tree induction algorithm, was an interesting
design decision. The reason for this choice was a desire to treat
ExampleSet as an abstract data type, including all operations on it in its
class definition.
Although this implementation works, it is inefficient, performing multiple
iterations through lists of examples. An alternative approach would
construct more complex sets of indices of examples by property and value
on construction. Trying this approach and evaluating its effectiveness is left
as an exercise.
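As an illustration of how these methods fit together, the following sketch
builds a small example set and partitions it on the debt property. It
assumes the hypothetical CreditExample subclass sketched earlier and the
DebtProperty class shown above; it is a usage sketch, not library code:
// Hypothetical usage sketch of ExampleSet.partition.
ExampleSet examples = new ExampleSet();
examples.addExample(new CreditExample("high",
new DebtProperty(DebtProperty.HIGH)
/* ..., plus the remaining required properties */));
// ... add the other training examples here ...
// Partition on "Debt": one ExampleSet per property value.
HashMap<String, ExampleSet> byDebt =
examples.partition("Debt");
ExampleSet highDebt = byDebt.get(DebtProperty.HIGH);
System.out.println("high-debt examples: " +
highDebt.getSize());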
Implementing Decision Tree Nodes

A decision tree node will define methods to solve problems by walking the
tree, as described in section 27.1. We have also chosen to implement the
basic induction algorithm in the decision tree class. Justification for this
decision was that the inherently recursive nature of the induction algorithm
matches the recursive structure of trees, simplifying the implementation.
Because the induction algorithm is general, and could be used with a
variety of heuristics for evaluating candidate example partitions, we will
make the basic implementation of decision trees an abstract class.
The basic definition of AbstractDecisionTreeNode appears below.
Member variables include category, which is set to a categorization in
leaf nodes; for internal nodes, its value is not defined.
decisionPropertyName is the property on which the node branches;
it is undefined for leaf nodes. children is a HashMap that indexes child
nodes by values of decisionPropertyName. Each constructor calls
induceTree to perform tree induction. Note that the two-argument
constructor is protected. Its second argument is the list of unused
properties for consideration by the induction algorithm, and it is only used
by the induceTree method. The remaining methods defined below are
straightforward accessors.
public abstract class AbstractDecisionTreeNode
{
private String category = null;
private String decisionPropertyName = null;
private HashMap<String,AbstractDecisionTreeNode>
children = new
HashMap<String,AbstractDecisionTreeNode>();
public AbstractDecisionTreeNode (
ExampleSet examples)
throws IllegalArgumentException
{
induceTree(examples,
examples.getPropertyNames());
}
protected AbstractDecisionTreeNode(ExampleSet
examples, Set<String> selectionProperties)

throws IllegalArgumentException
{
induceTree(examples, selectionProperties);
}
public boolean isLeaf()
{
return children.isEmpty();
}
public String getCategory()
{
return category;
}
public String getDecisionProperty()
{
return decisionPropertyName;
}
public AbstractDecisionTreeNode getChild(String
propertyValue)
{
return children.get(propertyValue);
}
public void addChild(String propertyValue,
AbstractDecisionTreeNode child)
{
children.put(propertyValue, child);
}
public String categorize(AbstractExample ex)
{
// defined below
}
public void induceTree(ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException
{
// defined below
}
public void printTree(int level)
{
// implementation left as an exercise
}
protected abstract double

evaluatePartitionQuality(HashMap<String,
ExampleSet> part, ExampleSet examples)
throws IllegalArgumentException;
protected abstract AbstractDecisionTreeNode
createChildNode(ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException;
}
Note the two abstract methods for evaluating a candidate partition and
creating a new child node. These will be implemented in Section 27.4.
The categorize method classifies a new example by performing a
recursive tree walk.
public String categorize(AbstractExample ex)
{
if(children.isEmpty())
return category;
if(decisionPropertyName == null)
return category;

AbstractProperty prop =
ex.getProperty(decisionPropertyName);
AbstractDecisionTreeNode child =
children.get(prop.getValue());
if(child == null)
return null;
return child.categorize(ex);
}
induceTree performs the induction of decision trees. It deals with four
cases. The first is a normal termination: all examples belong to the same
category, so it creates a leaf node of that category. Cases two and three
occur if there is insufficient information to complete a categorization; in
these cases, the algorithm creates a leaf node with a null category.
Case four performs the recursive step. It iterates through all properties that
have not yet been used in the decision tree (these are passed in the parameter
selectionProperties), using each property to partition the example
set. It evaluates each candidate partition using the abstract method
evaluatePartitionQuality. Once it finds the best evaluated
partition, it constructs child nodes for each branch.
public void induceTree(ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException
{
// Case 1: All instances are the same
// category, the node is a leaf.

if(examples.getCategories().size() == 1)
{
category = examples.getCategories().
iterator().next();
return;
}
//Case 2: Empty example set. Create
// leaf with no classification.
if(examples.isEmpty())
return;
//Case 3: Empty property set; could not classify.
if(selectionProperties.isEmpty())
return;
// Case 4: Choose test and build subtrees.
// Initialize by partitioning on first
// untried property.
Iterator<String> iter =
selectionProperties.iterator();
String bestPropertyName = iter.next();
HashMap<String, ExampleSet> bestPartition =
examples.partition(bestPropertyName);
double bestPartitionEvaluation =
evaluatePartitionQuality(bestPartition,
examples);
// Iterate through remaining properties.
while(iter.hasNext())
{
String nextProp = iter.next();
HashMap<String, ExampleSet> nextPart =
examples.partition(nextProp);
double nextPartitionEvaluation =
evaluatePartitionQuality(nextPart,
examples);
// Better partition found. Save.
if(nextPartitionEvaluation >
bestPartitionEvaluation)
{
bestPartitionEvaluation =
nextPartitionEvaluation;
bestPartition = nextPart;
bestPropertyName = nextProp;
}
}
// Create children; recursively build tree.
this.decisionPropertyName = bestPropertyName;

Set<String> newSelectionPropSet =
new HashSet<String>(selectionProperties);
newSelectionPropSet.remove(decisionPropertyName);
iter = bestPartition.keySet().iterator();
while(iter.hasNext())
{
String value = iter.next();
ExampleSet child = bestPartition.get(value);
children.put(value,
createChildNode(child,
newSelectionPropSet));
}
}
27.4 ID3: An Information Theoretic Tree Induction Algorithm
The heart of the ID3 algorithm is its use of information theory to evaluate
the quality of candidate partitions of the example set, choosing the
property that gains the most information about an example's
categorization. Luger (2009) discusses this approach in detail, but we will
review it briefly here.
Shannon (1948) developed a mathematical theory of information that
allows us to measure the information content of a message. Widely used in
telecommunications to determine such things as the capacity of a channel,
the optimality of encoding schemes, etc., it is a general theory that we will
use to measure the quality of a decision property.
Shannon’s insight was that the information content of a message depends
upon two factors. One is the size of the set of all possible messages, and
the other is the probability of each message occurring. Given a set of possible
messages, M = {m1, m2, ..., mn}, the information content of M is
measured in bits by the negative sum, across all messages in M, of the
probability of each message times the log to the base 2 of that
probability:

I(M) = Σ –p(mi) log2 p(mi)
Applying this to the problem of decision tree induction, we can regard a set
of examples as a set of possible messages about the categorization of an
example. The probability of a message (a given category) is the number of
examples with that category divided by the size of the example set. For
example, in the table in section 27.1, there are 14 examples. Six of the
examples have high risk, so p(risk = high) = 6/14. Similarly, p(risk =
moderate) = 3/14, and p(risk = low) = 5/14. So, the information in any
example in the set is:
I(example set) = –6/14 log2(6/14) – 3/14 log2(3/14) – 5/14 log2(5/14)
= –6/14 * (–1.222) – 3/14 * (–2.222) – 5/14 * (–1.485)
= 1.531 bits
We can think of the recursive tree induction algorithm as gaining
information about the example set at each iteration. If we assume a set of
training instances, C, and a property P with n values, then P will partition C
into n subsets, {c1, c2, ..., cn}. The information needed to finish inducing
the tree after partitioning on P can be measured as the sum of the
information in each subset of the partition, weighted by the relative size of
that subset. That is, the expected information needed to complete the tree,
E(P), is computed by:

E(P) = Σ (|ci|/|C|) * I(ci)

Therefore, the information gained by partitioning on property P is:

Gain(P) = I(C) – E(P)
The ID3 algorithm uses this value to rank candidate partitions.
Implementing Information Theoretic Evaluation

We will implement this in a subclass of AbstractDecisionTreeNode
called InformationTheoreticDecisionTreeNode. This class will
implement the two abstract methods of the parent class, along with needed
constructors. The createChildNode method is called in
AbstractDecisionTreeNode to create the proper type of child node.
evaluatePartitionQuality computes the information gain of a
partition. It calls the private methods computeInformation and
log2.
public class InformationTheoreticDecisionTreeNode
extends AbstractDecisionTreeNode
{
public InformationTheoreticDecisionTreeNode(
ExampleSet examples)
throws IllegalArgumentException
{
super(examples);
}
public InformationTheoreticDecisionTreeNode(
ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException
{
super(examples, selectionProperties);
}
protected AbstractDecisionTreeNode
createChildNode(
ExampleSet examples,
Set<String> selectionProperties)
throws IllegalArgumentException
{
return new
InformationTheoreticDecisionTreeNode(
examples, selectionProperties);
}

protected double evaluatePartitionQuality(
HashMap<String, ExampleSet> part,
ExampleSet examples)
throws IllegalArgumentException
{
double examplesInfo =
computeInformation(examples);
int totalSize = examples.getSize();
double expectedInfo = 0.0;
Iterator<String> iter =
part.keySet().iterator();
while(iter.hasNext())
{
ExampleSet ex = part.get(iter.next());
int partSize = ex.getSize();
expectedInfo += computeInformation(ex)
* partSize/totalSize;
}
return examplesInfo - expectedInfo;
}
private double computeInformation(
ExampleSet examples)
throws IllegalArgumentException
{
Set<String> categories =
examples.getCategories();
double info = 0.0;
double totalCount = examples.getSize();
Iterator<String> iter =
categories.iterator();
while (iter.hasNext())
{
String cat = iter.next();
double catCount = examples.
getExampleCountByCategory(cat);
info += -(catCount/totalCount)*
log2(catCount/totalCount);
}
return info;
}
private double log2(double a)
{
return Math.log10(a)/Math.log10(2);
}
}
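Putting the pieces together, inducing a tree and classifying a new instance
takes only a few lines. The sketch below is our own illustration; it assumes
a trainingData ExampleSet built as in Section 27.2 and an uncategorized
example, newExample, built with the single-argument AbstractExample
constructor:
// Hypothetical usage sketch: induce a tree, then classify.
AbstractDecisionTreeNode root =
new InformationTheoreticDecisionTreeNode(
trainingData);
// categorize returns null if the tree cannot
// classify the example.
String risk = root.categorize(newExample);
System.out.println("Predicted risk: " + risk);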

Exercises
1. Construct two or three different trees that correctly classify the training
examples in the table of section 27.1. Compare their complexity using
average path length from root to leaf as a simple metric. What informal
heuristics would you use in constructing the simplest trees to match the data?
Manually build a tree using the information theoretic test selection
algorithm from the ID3 algorithm. How does this compare with your
informal heuristics?
2. Extend the definition of AbstractExample to enforce the
immutable object pattern using AbstractProperty as an example.
3. The classes AbstractExample and AbstractProperty throw
exceptions defined in Java, such as IllegalArgumentException
or UnsupportedOperationException, when passed illegal values
or when implementers try to violate the immutable object pattern. An
alternative approach would use user-defined exceptions, defined as
subclasses of java.lang.RuntimeException. Implement this
approach, and discuss its advantages and disadvantages.
4. The implementation of ExampleSet in section 27.2.3 stores
component examples as a simple vector. This requires iteration over all
examples to partition the example set on a property, count categories, etc.
Redo the implementation using a set of maps to allow constant time
retrieval of examples having a certain property value, category, etc.
Evaluate performance for this implementation and that given in the
chapter.
5. Complete the implementation for the credit risk example. This will
involve creating subclasses of AbstractProperty for each property,
and an appropriate subclass of AbstractExample. Also, write a class
and methods to test your code.

28 Genetic and Evolutionary Computing

Chapter Objectives
A brief introduction to the genetic algorithm (GA)
Genetic operators include:
Mutation
Crossover
An example GA application worked through:
The WordGuess problem
Appropriate object hierarchy created
Generalizable to other GA applications
Exercises emphasize GA interface design

Chapter Contents
28.1 Introduction
28.2 The Genetic Algorithm: A First Pass
28.3 A GA Implementation in Java
28.4 Conclusion: Complex Problem Solving and Adaptation
28.1 Introduction
The genetic algorithm (GA) is one of a number of computer programming
techniques loosely based on the idea of natural selection. The idea of
applying principles of natural selection to computing is not new. By 1948,
Alan Turing proposed “genetical or evolutionary search” (Turing 1948).
Less than two decades later, H.J. Bremmermann performed computer
simulations of “optimization through evolution and recombination”
(Eiben and Smith 1998). It was John Holland who coined the term, genetic
algorithm (Holland 1975). However, the GA was not widely studied until
1989, when D.E. Goldberg showed that it could be used to solve a
significant number of difficult problems (Goldberg 1989). Currently, many
of these threads have come together under the heading evolutionary computing
(Luger 2009, Chapter 12).
28.2 The Genetic Algorithm: A First Pass
The Genetic Algorithm is based loosely on the concept of natural
selection. Individual members of a species who are better adapted to a
given environment reproduce more successfully. They pass their
adaptations on to their offspring. Over time, individuals possessing the
adaptation form a new species that is particularly suited to the
environment. The genetic algorithm applies the metaphor of natural
selection to optimization problems. No claim is made about its biological
accuracy, although individual researchers have proposed mechanisms both
with and without a motivating basis from nature.
A candidate solution for a genetic algorithm is often called a chromosome.
The chromosome is composed of multiple genes. A collection of
chromosomes is called a population. The GA randomly generates an initial
population of chromosomes, which are then ranked according to a fitness
function (Luger 2009, Section 12.1).
Consider an example drawn from structural engineering. Structural
engineers make use of a component known as a truss. Trusses come in
many varieties, the simplest of which should be familiar to anyone who has
noticed the interconnected triangular structures found in bridges and
cranes. Figure 28.1 is an example of the canonical 64-bar truss (Ganzerli et
al. 2003), which appears in the civil engineering literature on optimization.
The arrows are loads, expressed in a unit known as a Kip. Engineers
would like to minimize the volume of a truss, taken as the cross-sectional
area of the bars multiplied by their length.
To solve this problem using a GA, we first randomly generate a population
of trusses. Some of these will stand up under a given load, some will not.
Those that fail to meet the load test are assigned a severe penalty. The
ranking in this problem is based on volume. The smaller the truss volume,
after any penalty has been assigned, the more fit the truss. Only the fittest
individuals are selected for reproduction. It has been shown that the truss
design problem is NP-Complete (Overbay et al. 2006). Engineers have
long recognized the difficulty of truss design, most often developing
good-enough solutions with the calculus-based optimization techniques
available to them (Ganzerli et al. 2003).
By the late nineties, at least two groups were applying genetic algorithms to
very large trusses and getting promising results (Rajeev and
Krishnamoorthy 1997), (Ghasemi et al. 1999). Ganzerli et al. (2003) took
this work a step further by using genetic algorithms to optimize the 64-bar
truss with the added complexity of load uncertainty. The point here is not
simply that the GA is useful in structural engineering, but that it has been
applied in hundreds of ways in recent years, structural engineering being an
especially clear example. A number of other examples, including the
traveling salesperson and SAT problems are presented in Luger (2009,
Section 12.1). The largest venue for genetic algorithm research is The
Genetic and Evolutionary Computation Conference (GECCO 2007). Held in a
different city each summer, the papers presented range from artificial life
through robotics to financial and water quality systems.
Despite the breadth of topics addressed, the basic outline for genetic
algorithm solvers across application domains is very similar. Search
through the problem space is guided by the fitness-function. Once the fitness-
function is designed, the GA traverses the space over many iterations,
called generations, stopping only when some pre-defined convergence
criterion is met. Further, the only substantial differences between one
application of the GA and the next are the representation of the
chromosome for the problem domain and the fitness function that is
applied to it. This lends itself very nicely to an object-oriented
implementation that can be easily generalized to multiple problems. The
technique is to build a generic GA class with specific implementations as
subclasses.

WordGuess Example

Consider a simple problem called WordGuess (Haupt and Haupt 1998). The
user enters a target word at the keyboard. The GA guesses the word. In
this case, each letter is a gene, each word a chromosome, and the total
collection of words is the population. To begin, we randomly generate a
sequence of chromosomes of the desired length. Next, we rank the
generated chromosomes for fitness. A chromosome that is identical with
the target has a fitness of zero. A chromosome that differs in one letter has
a fitness of 1, and so on. It is easy to see that the size of the search space
for WordGuess increases exponentially with the length of the word. In the
next few sections, we will develop an object-oriented solution to this
problem.
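The fitness measure itself is simple enough to sketch here: count the
positions at which a guess differs from the target. The method below is
our own illustration of this measure, not the library's cost code:
// Hedged sketch of the WordGuess cost measure: the
// number of positions where guess and target differ.
// A cost of zero means the word was guessed exactly.
static int cost(String guess, String target)
{
int mismatches = 0;
for (int i = 0; i < target.length(); i++)
if (guess.charAt(i) != target.charAt(i))
mismatches++;
return mismatches;
}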
Suppose we begin with a randomly generated population of 128 character
strings. After ranking them, we immediately eliminate the half that is least
fit. Of the 64 remaining chromosomes, the fittest 32 form 16 breeding
pairs. If each pair produces 2 offspring, the next generation will consist of
the 32 parents plus the 32 children.

Figure 28.1 A system of trusses to be optimized with a set of genetic
operators. (The graphic, showing the 64-bar truss with its node numbers
and loads in Kips, is not reproduced in this text version.)
Having decided who may reproduce, we mate them. The GA literature is
filled with clever mating strategies, having more or less biological
plausibility. We consider two, TopDown and Tournament. In TopDown, the
fittest member of the population mates with the next most fit and so on,
until the breeding population is exhausted. Tournament is a bit more
complex, and slightly more plausible (Haupt and Haupt 1998). Here we
choose a subset of chromosomes from the breeding population. The fittest
chromosome within this subset becomes Parent A. We do the same thing
again, to find its mate, Parent B. Now we have a breeding pair. We
continue with this process until we have created as many breeding pairs as
we need.

Mating is how each chromosome passes its genes to future generations.


Since mating is an attempt to simulate (and simplify) recombinant DNA,
many authors refer to it as recombination (Eiben and Smith 2003). As with
pairing, many techniques are available. WordGuess uses a single technique
called Crossover. Recall that each chromosome consists of length(chromosome)
genes. The most natural data structure to represent a chromosome is an
array of length(chromosome) positions. A gene—in this case an alphabetic
character—is stored in each of these positions. Crossover works like this:
1. Generate a random number n, 0 <= n <
length(chromosome). This is called the Crossover Point.
2. Parent A passes its genes in positions 0 … n – 1 to Child 1.
3. Parent B passes its genes in positions 0 … n – 1 to Child 2.
4. Parent A passes its genes in positions n …
length(chromosome) – 1 to the corresponding positions in
Child 2.
5. Parent B passes its genes in positions n …
length(chromosome) – 1 to the corresponding positions in
Child 1.
Figure 28.2 illustrates mating with n = 4. The parents, PA and PB, produce
the two children, CA and CB.
After the reproducing population has been selected, paired, and mated, the
final ingredient is the application of random mutations. The importance of
random mutation in nature is easy to see. Favorable (as well as
unfavorable!) traits have to arise before they can be passed on to offspring.
This happens through random variation, caused by any number of natural
mutating agents. Chemical mutagens and radiation are examples. Mutation
guarantees that new genes are introduced into the gene pool. Its practical
effect for the GA is to reduce the probability that the algorithm will
converge on a local minimum. The percentage of genes subject to mutation
is a design parameter in the solution process.
The decision of when to stop producing new generations is the final
component of the algorithm. The simplest possibility, the one used in
WordGuess, is to stop either after the GA has guessed the word or 1000
generations have passed. Another halting condition might be to stop when
some parameter P percent of the population is within Q standard
deviations of the population mean.

PA: CHIPOLTE    PB: CHIXLOTI
CA: CHIPLOTI    CB: CHIXOLTE

Figure 28.2 Recombination with crossover at the point n = 4.
The entire process can be compactly expressed through the while-loop:


GA(population)
{
Initialize(population);
ComputeCost(population);
Sort(population);
while (not converged on acceptable solution)
{
Pair(population);
Mate(population);
Mutate(population);
Sort(population);
TestConvergence(population);
}
}
28.3 A GA Implementation in Java
WordGuess is written in the Java programming language with object-
oriented (OO) techniques developed to help manage the search
complexity. An OO software system consists of a set of interrelated
structures known as classes. Each class can perform a well-defined set of
operations on a set of well-defined operands. The operations are referred
to as methods, the operands as member variables, or just variables.
The Class Structure

The classes interrelate in two distinct ways. First, classes may inherit
properties from one another. Thus, we have designed a class called GA. It
defines most of the major operations needed for a genetic algorithm.
Knowing that we want to adapt GA to the problem of guessing a word
typed at the keyboard, we define the class WordGuess. Once having
written code to solve a general problem, that code is available to more
specific instances of the problem. A hypothetical inheritance structure for
the genetic algorithm is shown in Figure 28.3, where the upward pointing
arrows are inheritance links. Thus, WordGuess inherits all methods and
variables defined for the generic GA.
Second, once defined, classes may make use of one another. This
relationship is called compositionality. GA contains several component classes:
• Chromosome is a representation of an individual population
member.
• Pair contains all pairing algorithms developed for the
system. By making Pair its own class, the user can add new
methods to the system without changing the core components
of the code.
• Mate contains all mating algorithms developed for the
system.
• SetParams, GetParams, and Parameters are
mechanisms to store and retrieve parameters.

• WordGuessTst sets the algorithm in motion.


Finally, class GA makes generous use of Java’s pre-defined classes to
represent the population, randomly generate chromosomes, and to handle
files that store both the parameters and an initial population. GA is
character-based. A Graphical User Interface (GUI) can be implemented
with Java’s facilities for GUIs and Event-Driven programming found in
the javax.swing package (see Exercise 28.3).

Figure 28.3 The inheritance hierarchy for implementing the GA.


The Class Chromosome

The variables reflect what a class knows about itself. Class Chromosome
must know how many genes it has and its fitness, and it must have a
representation for its genes. The number of genes and the fitness of the
chromosome can be easily represented as integers. The representation of
the genes poses a design problem. For WordGuess, a character array works
nicely. For an engineering application, we might want the chromosome to
be a vector of floating point variables. The most general representation is
to use Java's class Object and have specific implementations, like
WordGuess, define their own chromosomes (see Exercise 28.4).

The methods describe what a class does. Class Chromosome must be
able to set and return its fitness, set and return the number of its genes,
display its genes, and determine if it is equal to another chromosome. The
Java code that implements the class Chromosome follows.
public class Chromosome
{
private int CH_numGenes;
protected int CH_cost;
private Object[] CH_gene;
public Chromosome(int genesIn)
{
CH_numGenes = genesIn;
// The gene array is typed Object so that subclasses
// may store characters, numbers, or other gene types.
CH_gene = new Object[CH_numGenes];
}
public int GetNumGenes()
{
return CH_numGenes;
}
// GetCost and GetGene are required by the sorting
// and mating code presented below.
public int GetCost()
{
return CH_cost;
}
public void SetCost(int cost)
{
CH_cost = cost;
}
public Object GetGene(int index)
{
return CH_gene[index];
}
public void SetGene(int index, Object value)
{
CH_gene[index] = value;
}
public boolean Equals(String target)
{
// Each gene holds a boxed Character; compare it with
// the corresponding character of the target word.
for (int i = 0; i < CH_numGenes; i++)
if (!CH_gene[i].equals(target.charAt(i)))
return false;
return true;
}
}
Classes Pair and Mate

Chromosomes must be paired and mated. So that we can experiment with
more than a single pairing or mating algorithm, we group multiple versions
into classes Pair and Mate. Since pairing and mating are done over an
entire population, before we define Pair and Mate we must decide upon
a representation for the population. A population is a list of chromosomes.
Java's built-in collection classes are contained in the java.util library
and known as the Java Collection Framework. Two classes, ArrayList and
LinkedList, support list behavior. It is intuitively easy to conceive of a
population as an array of chromosomes. Accordingly, we use the class
ArrayList to define a population as follows:
ArrayList<Chromosome> GA_pop;
GA_pop = new ArrayList<Chromosome>();
The first line defines a variable, GA_pop, as type ArrayList. The
second creates an instance of GA_pop.
WordGuess implements a single pairing algorithm, TopDown.
Tournament pairing is left as an exercise. Pair has to know the
population that is to be paired and the number of mating pairs. Since only
half of the population is fit enough to mate, the number of mating pairs is
the population size divided by 4. Here we can see one of the benefits of
using pre-defined classes: ArrayList provides a method that returns the
size of the list. The code for Pair follows:
public class Pair
{
private ArrayList<Chromosome> PR_pop;

public Pair(ArrayList<Chromosome> population)
{
PR_pop = population;
}
public int TopDown()
{
return (PR_pop.size() / 4);
}
}
Class Mate also implements a single algorithm, Crossover. It is slightly
more complex than Pair. To implement Crossover, we need four
chromosomes, one for each parent, and one for each child. We also need
to know the crossover point, as explained in Section 28.2, the number of
genes in a chromosome, and the size of the population. We now present
the member variables and the constructor for Mate:
public class Mate
{
private Chromosome MT_father,
MT_mother,
MT_child1,
MT_child2;
private int MT_posChild1,
MT_posChild2,
MT_posLastChild,
MT_posFather,
MT_posMother,
MT_numGenes,
MT_numChromes;
public Mate(ArrayList<Chromosome> population,
int numGenes, int numChromes)
{
MT_posFather = 0;
MT_posMother = 1;
MT_numGenes = numGenes;
MT_numChromes = numChromes;
MT_posChild1 = population.size()/2;
MT_posChild2 = MT_posChild1 + 1;
MT_posLastChild = population.size() - 1;
for (int i = MT_posLastChild;
i >= MT_posChild1; i--)
population.remove(i);
MT_posFather = 0;
MT_posMother = 1;

}
// Remaining method implemented below.
}
Mate takes a population of chromosomes as a parameter and returns a
mated population. The for-loop eliminates the least fit half of the
population to make room for the two children per breeding pair.
Crossover, the only other method in Mate, is presented next. It
implements the algorithm described in Section 28.2. Making use of the
Set/Get methods of Chromosome, Crossover blends the
chromosomes of each breeding pair. When mating is complete, the
breeding pairs are in the top half of the ArrayList and the children are in
the bottom half.
public ArrayList<Chromosome> Crossover(
ArrayList<Chromosome> population, int numPairs)
{
for (int j = 0; j < numPairs; j++)
{
MT_father = population.get(MT_posFather);
MT_mother = population.get(MT_posMother);
MT_child1 = new Chromosome(MT_numGenes);
MT_child2 = new Chromosome(MT_numGenes);
Random rnum = new Random();
int crossPoint = rnum.nextInt(MT_numGenes);
// left side
for (int i = 0; i < crossPoint; i++)
{
MT_child1.SetGene(i,
MT_father.GetGene(i));
MT_child2.SetGene(i,
MT_mother.GetGene(i));
}
// right side
for (int i = crossPoint;
i < MT_numGenes; i++)
{
MT_child1.SetGene(i,
MT_mother.GetGene(i));
MT_child2.SetGene(i,
MT_father.GetGene(i));
}
population.add(MT_posChild1,MT_child1);
population.add(MT_posChild2,MT_child2);
MT_posChild1 = MT_posChild1 + 2;
MT_posChild2 = MT_posChild2 + 2;

MT_posFather = MT_posFather + 2;
MT_posMother = MT_posMother + 2;
}
return population;
}
The GA Class

Having examined its component classes, it is time to look at class GA itself.
We never create an instance of class GA. GA exists only so that its member
variables and methods can be inherited, as in Figure 28.3. Classes that may
not be instantiated are called abstract. The classes higher in the hierarchy are
called superclasses; those lower in the hierarchy are called subclasses. Member
variables and methods designated protected in a superclass are
available to its subclasses.
GA contains the population of chromosomes, along with the various
parameters that its subclasses need. The parameters are the size of the
initial population, the size of the pared down population, the number of
genes, the fraction of the total genes to be mutated, and the number of
iterations before the program stops. The parameters are stored in a file
manipulated through the classes Parameters, SetParams, and
GetParams. We use object semantics to manipulate the files. Since file
manipulation is not essential to a GA, we will not discuss it further. The
class declaration GA, its member variables, and its constructor follow.
public abstract class GA extends Object
{
protected int GA_numChromesInit;
protected int GA_numChromes;
protected int GA_numGenes;
protected double GA_mutFact;
protected int GA_numIterations;
protected ArrayList<Chromosome> GA_pop;
public GA(String ParamFile)
{
GetParams GP = new GetParams(ParamFile);
Parameters P = GP.GetParameters();
GA_numChromesInit = P.GetNumChromesI();
GA_numChromes = P.GetNumChromes();
GA_numGenes = P.GetNumGenes();
GA_mutFact = P.GetMutFact();
GA_numIterations = P.GetNumIterations();
GA_pop = new ArrayList<Chromosome>();
}
//Remaining methods implemented below.
}
The first two lines of the constructor create the objects necessary to read
the parameter files. The succeeding lines, except the last, read the file and
store the results in class GA's member variables. The final line creates the
data structure that is to house the population. Since an ArrayList is an
expandable collection, there is no need to fix the size of the array in
advance.
Class GA can do all of those things common to all of its subclasses. Unless
you are a very careful designer, odds are that you will not know what is
common to all of the subclasses until you start building prototypes. Object-
oriented techniques accommodate an iterative design process quite nicely.
As you discover more methods that can be shared across subclasses, simply
push them up a level to the superclass and recompile the system.
Superclass GA performs general housekeeping tasks along with work
common to all its subclasses. Under housekeeping tasks, we want
superclass GA to display the entire population, its parameters, a
chromosome, and the best chromosome within the population. We also
might want it to tidy up the population by removing those chromosomes
that will play no part in evolution. This requires a little explanation. Two of
the parameters are GA_numChromesInit and GA_numChromes.
Performance of a GA is sometimes improved if we initially generate more
chromosomes than are used in the GA itself (Haupt and Haupt 1998). The
first task, then, is to winnow down the number of chromosomes from the
number initially generated (GA_numChromesInit) to the number that
will be used (GA_numChromes).
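The winnowing method itself is not shown in the chapter; a minimal
sketch, assuming the population has already been sorted by cost, simply
removes chromosomes from the unfit tail of the list:
// Hedged sketch of winnowing, assuming GA's member
// variables and a population already sorted by cost.
protected void WinnowPop()
{
while (GA_pop.size() > GA_numChromes)
GA_pop.remove(GA_pop.size() - 1);
}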
Under shared tasks, we want the superclass GA to create, rank, and mutate
the population. The housekeeping tasks are very straightforward. The
shared method that initializes the population follows:
protected void InitPop()
{
Random rnum = new Random();
char letter;
for (int index = 0;
index < GA_numChromesInit; index++)
{
Chromosome Chrom =
new Chromosome(GA_numGenes);
for (int j = 0; j < GA_numGenes; j++)
{
letter = (char)(rnum.nextInt(26) + 97);
Chrom.SetGene(j,letter);
}
Chrom.SetCost(0);
GA_pop.add(Chrom);
}
}
Initializing the population is clear enough, though it does represent a
design decision. We use a nested for loop to create and initialize all genes
within a chromosome and then to add the chromosomes to the population.
Notice the use of Java’s pseudo-random number generator. In keeping
with the object-oriented design, Random is a class with associated
methods. rnum.nextInt(26) generates a pseudo-random number in
the range [0..25]. The design decision is to represent genes as characters.
This is not as general as possible, an issue mentioned earlier and addressed
in the exercises. We add 97 to the generated integer, because the ASCII
position of ‘a’ is 97. Consequently, we transform the generated integer to
characters in the range [‘a’..’z’].
Ranking the population, shown next, is very simple using the sort
method that is part of the static class, Collections. A static class is
one that exists to provide services to other classes. In this case, the
methods in Collections operate on and return classes that implement
the Collection Interface. An interface in Java is a set of specifications that
implementing classes must fulfill. It would have been possible to design GA
as an Interface class, though the presence of common methods among
specific genetic algorithms made the choice of GA as a superclass a more
intuitively clear design. Among the many classes that implement the
methods specified in the Collection interface is ArrayList, the class we
have chosen to represent the population of chromosomes.
protected void SortPop()
{
Collections.sort(GA_pop, new CostComparator());
}
private class CostComparator
implements Comparator <Chromosome>
{
int result;
public int compare(Chromosome obj1,
Chromosome obj2)
{
result = new Integer(obj1.GetCost()).
compareTo(new Integer(obj2.GetCost()));
return result;
}
}
Collections.sort requires two arguments, the object to be sorted—
the ArrayList containing the population—and the mechanism that will
do the sorting:
Collections.sort(GA_pop, new CostComparator());
The second argument creates an instance of a helper class that implements
yet another interface class, this time the Comparator interface. The second
object is sometimes called the comparator object. To implement the
Comparator interface we must specify the type of the objects to be
compared—class Chromosome, in this case—and implement its
compare method. This method takes two chromosomes as arguments,

uses the method GetCost to extract the cost from the chromosome, and
the compareTo method of the Integer wrapper class to determine which
of the chromosomes costs more. In keeping with OO, we give no
consideration to the specific algorithm that Java uses. Java documentation
guarantees only that the Comparator class “imposes a total ordering on
some collection of objects” (Interface Comparator 2007).
Mutation is the last of the three shared methods that we will consider.
The fraction of the total number of genes that are to be mutated per
generation is a design parameter. The number of mutations performed
thus depends on the size of the population, the number of genes per
chromosome, and the mutation fraction. For each mutation, we
randomly choose a gene within a chromosome and randomly choose a
mutated value. There are two things to notice. First, we never mutate our
best chromosome. Second, the mutation code in GA is specific to genetic
algorithms where genes may be reasonably represented as characters. The
code for Mutation may be found in the Chapter 28 code library.
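Although the actual Mutation code is left to the code library, a minimal
sketch conveys the idea. It assumes the member variables of class GA
shown above and never touches position 0, the best chromosome after
sorting:
// Hedged sketch of mutation, assuming GA's member
// variables. The mutation count is a fraction of the
// total gene pool; the best chromosome is never mutated.
protected void MutatePop()
{
Random rnum = new Random();
int numMutations = (int)(GA_mutFact *
GA_numChromes * GA_numGenes);
for (int i = 0; i < numMutations; i++)
{
// any chromosome except the best one (index 0)
int chrome = rnum.nextInt(GA_numChromes - 1) + 1;
int gene = rnum.nextInt(GA_numGenes);
char letter = (char)(rnum.nextInt(26) + 97);
GA_pop.get(chrome).SetGene(gene, letter);
}
}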
28.4 Conclusion: Complex Problem Solving and Adaptation
In this chapter we have shown how Darwin’s observations on speciation
can be adapted to complex problem solving. The GA, like other AI
techniques, is particularly suited to those problems where an optimal
solution may be computationally intractable. Though the GA might
stumble upon the optimal solution, odds are that computing is like nature
in one respect: solutions, like individuals, must be content with having
solved the problem of adaptation only well enough to pass their
characteristics into the next generation. The extended example,
WordGuess, was a case in which the GA happens upon an exact
solution. (See the code library for sample runs). This was chosen for ease
of exposition. The exercises ask you to develop a GA solution to a known
NP-Complete problem.
We have implemented the genetic algorithm using object-oriented
programming techniques, because they lend themselves to capturing the
generality of the GA. Java was chosen as the programming language, both
because it is widely used and because its syntax, in the C/C++ tradition,
makes it readable to those with little Java or even OO experience.
As noted, we have not discussed the classes SetParams, GetParams,
and Parameters mentioned in Section 28.3. These classes write to and
read from a file of design parameters. The source code for them can be
found in the auxiliary materials. Also included are instructions for using the
parameter files, and instructions for exercising WordGuess.
Chapter 28 was jointly written with Paul De Palma, Professor of Computer
Science at Gonzaga University.
Exercises
1. The traveling salesperson problem is especially good to exercise the GA,
because it is possible to compute bounds for it. If the GA produces a
solution that falls within these bounds, the solution, while probably not
optimal, is reasonable. See Hoffman and Wolfe (1985) and Overbay, et al.

(2007) for details. The problem is easily stated. Given a collection of cities,
with known distances between any two, a tour is a sequence of cities that
defines a start city, C, visits every city once and returns to C. The optimal
tour is the tour that covers the shortest distances. Develop a genetic
algorithm solution for the traveling salesperson problem. Create at least
two new classes: TSP, derived from GA, and TSPtst, which sets the
algorithm in motion. See comments on mating algorithms for the traveling
salesperson problem in Luger (2009, Section 12.1.3).
2. Implement the Tournament pairing method of the class Pair.
Tournament chooses a subset of chromosomes from the population. The
most fit chromosome within this subset becomes Parent A. Do the same
thing again, to find its mate, Parent B. Now you have a breeding pair.
Continue this process until you have as many breeding pairs as you need.
Tournament is described in detail in Haupt and Haupt (1998). Does
WordGuess behave differently when Tournament is used?
3. As it stands, GA runs under command-line Unix/Linux. Use the
javax.swing package to build a GUI that allows a user to set the
parameters, run the program, and examine the results.
4. Transform the java application code into a java applet. This applet
should allow a web-based user to choose the GA to run (either
WordGuess or TSP), the pairing algorithm to run (Top-Down or
Tournament), and to change the design parameters.
5. WordGuess does not make use of the full generality provided by
object-oriented programming techniques. A more general design would not
represent genes as characters. One possibility is to provide several
representational classes, all inheriting from a modified GA and all being
super classes of specific genetic algorithm solutions. Thus we might have
CHAR_GA inheriting from GA and WordGuess inheriting from
CHAR_GA. Another possibility is to define chromosomes as collections of genes
that are represented by variables of class Object. Using these, or other,
approaches, modify GA so that it is more general.
6. Develop a two-point crossover method to be included in class Mate.
For each breeding pair, randomly generate two crossover points. Parent A
contributes its genes before the first crossover and after the second to
Child A. It contributes its genes between the crossover points to Child B.
Parent B does just the opposite. See Haupt and Haupt (1998) for still other
possibilities.

29 Case Studies: Java Machine Learning Software Available on the Web

Chapter Objectives
This chapter provides a number of sources for open-source and free
machine learning software on the web.

Chapter Contents
29.1 Java Machine Learning Software

29.1 Java Machine Learning Software

There are many popular Java-based open-source machine learning software
packages available on the internet. Several important and widely used ones
are described below.
Weka

Weka is a Java-based open-source software package distributed under the
GNU General Public License. It was developed at the University of
Waikato in Hamilton, New Zealand in 1993.

Weka is a very popular machine learning package that is widely used for
data-mining problems. The main algorithms implemented in Weka focus
on pattern classification, regression, and clustering. Tools for data
preprocessing and data visualization are also provided. These algorithms
can either be applied directly to a dataset or be called from other Java code.
Weka algorithms can also be used as building blocks for implementing new
machine learning techniques.

http://www.cs.waikato.ac.nz/ml/weka/
ABLE ABLE is a freely-available Java-toolkit for agent-based machine learning
problems developed by the IBM T. J. Watson Research Center in
Yorktown Heights, NY.
The ABLE framework provides a library of packages, classes and interfaces
for implementing machine learning techniques like neural networks,
Bayesian classifiers and decision trees. It also provides a Rule Language for
rule-based inference using Boolean and fuzzy logic. The packages and
classes can be extended for developing custom algorithms. Support is also
provided for reading and writing text and database data, data
transformation and scaling, and invocation of user-defined external
functions.
http://www.alphaworks.ibm.com/tech/able
JOONE
JOONE (Java Object-Oriented Neural Engine) is a free Java framework for implementing, training and testing machine learning algorithms using artificial neural networks (ANN). The software includes algorithms for feed-forward neural networks, recursive neural networks, time-delay neural networks, standard and resilient back propagation, Kohonen self-organizing maps, Principal Component Analysis (PCA), and modular neural networks.
JOONE components can be plugged into other software packages and can
be extended to design more sophisticated algorithms. It comes with a GUI
editor to visually create and test any neural network and a Distributed Test
Environment to train many neural networks in parallel and select the best
one for a given problem.
http://www.jooneworld.com/
LIBSVM
LIBSVM (Library for Support Vector Machines) is an integrated software
solution for classification, regression and distribution-estimation using
support vector machines (SVM) developed at the National Taiwan
University. The source code is freely available with both C++ and Java
versions.
The main features of the LIBSVM software are different SVM
formulations, efficient multi-class classification, cross-validation for model
selection, probability estimates, weighted SVM for unbalanced data and
automatic model selection. It also includes a GUI and interfaces for other
languages like Python, R, MATLAB, Perl, Ruby and Common Lisp.
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
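As a hedged sketch of the Java interface (a toy example of our own; a realistic application would tune the svm_parameter fields rather than use these values), training and querying an SVM looks roughly like this:

import libsvm.*;

public class LibsvmExample
{
    public static void main (String[] args)
    {
        // Two one-dimensional training points, labeled +1 and -1.
        svm_problem prob = new svm_problem ();
        prob.l = 2;
        prob.y = new double[] {1.0, -1.0};
        prob.x = new svm_node[][] {{node (1, 0.9)}, {node (1, -0.9)}};

        // A C-SVC formulation with a radial basis function kernel.
        svm_parameter param = new svm_parameter ();
        param.svm_type = svm_parameter.C_SVC;
        param.kernel_type = svm_parameter.RBF;
        param.gamma = 1.0;
        param.C = 1.0;
        param.cache_size = 100;
        param.eps = 0.001;

        svm_model model = svm.svm_train (prob, param);
        System.out.println (svm.svm_predict (model,
            new svm_node[] {node (1, 0.5)}));
    }

    private static svm_node node (int index, double value)
    {
        svm_node n = new svm_node ();
        n.index = index;
        n.value = value;
        return n;
    }
}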



30 The Earley Parser: Dynamic
Programming in Java

Chapter Objectives
Sentence parsing using dynamic programming
Memoization of subparses
Retaining partial solutions (parses) for reuse
The chart as medium for storage and reuse
Indexes for word list (sentence)
States reflect components of parse
Dot reflects extent of parsing right hand side of grammar rule
Lists of states make up components of chart
Chart linked to word list
Java implementation of an Earley parser
Context free parser
Deterministic
Chart supports multiple parse trees
Forward development of chart composes components of successful parse
Backward search of chart produces possible parses of the sentence
Chapter Contents
30.1 Chart Parsing: An Introduction
30.2 The Earley Parser: Components
30.3 The Earley Parser: Java Code
30.4 The Completed Parser
30.5 Generating Parse Trees from Charts and Grammar Rules (Advanced Section)

30.1 Chart Parsing: An Introduction


The Earley parser (Earley 1970) uses dynamic programming to analyze
strings of words. Traditional dynamic programming techniques (Luger
2009, Section 4.1) use an array to save (memoize) the partial solutions of a
problem for use in the generation of subsequent partial solutions. In Earley
parsing this array is called a chart, and thus this approach to parsing
sentences is often called chart parsing.
In Chapter 9, Sections 1 and 2, we first presented the full algorithms along
with the evolving chart for Earley parsing. In these sections, we presented
pseudo-code, demonstrated the “dot” as a pointer indicating the current
state of the parse for each grammar rule, and explicitly considered the state
of the chart after each step of the algorithm. We refer to Sections 9.1, 9.2,
and Luger (2009, Section 15.2.2) for these specific details if there is any
concern about how the chart-parsing algorithm works. We feel that it is
also interesting to compare the data representation and control structures
used in the declarative Prolog environment, Chapter 9, with what we next
present with the object-oriented representations of Java.




30.2 The Earley Parser: Components


We first discuss the data representations required by the Earley parsing
algorithm presented in Sections 9.1 and 9.2. Of course, in the present
chapter we will be using an object-oriented hierarchy to capture the
components of the parser. Consider what the pseudo-code requires:
• A sentence. The sentence needs to be in a format that
supports pointers to any word located in that sentence so that
appropriate grammar rules can be applied.
• A grammar. The Earley parser needs a set of (context free)
grammar rules that can be applied to interpret the components
of the sentence. The parser itself has no knowledge of the parts
of speech (POS) or production rules of the grammar.
• An evolving chart. The chart is used to save partial solutions
(accepted parts of the parse) for later use. Thus, the chart is
used to contain the states as they are produced during the
algorithm. These states need to be stored in the order of their
production and without repeats.
• The states. State will capture the current activity of the parser.
Thus it will need to be a container for the current rule, which
has a left-hand-side, LHS, and a right-hand-side, RHS. Besides
instantiating a particular rule, state must also have the current
(i, j) pair that represents the seen/unseen parts of the
sentence for that rule.
We next consider how each of these constituents of the algorithm is
represented as data structures in Java. In Section 30.3 we describe the
Earley parser itself that will utilize these components.
The Sentence
First, we consider the set of descriptors that the sentence needs. The
primary thing we require is the ability to index into specific word locations
in the current sentence. This can be handled two ways: the use of a simple
representation, or the use of a class. The simple representation would be a
String array. This array would enable us to index easily to specific words
in the sentence. Everything that we need for the algorithm is present.
We could also use a class to represent each indexed sentence. If
Sentence is a class, we could incorporate other aspects of the sentence
into that class, such as the segmentation of a String into individual
words. If we were using the Earley algorithm in conjunction with another
algorithm (which is often required), we may need to create the Sentence
class so that we can separate the sentence’s parsing from other code.
For this presentation, we use the simple representation of a sentence as a
String array.
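For example, the five-word sentence parsed later in this chapter is simply:

String[] sentence = {"John", "called", "Mary", "from", "Denver"};

Here sentence[0] is "John" and sentence[4] is "Denver", which provides exactly the word-level indexing the algorithm requires.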
The Grammar
For the grammar rule processing required by the Earley parser we create a
class. The application of each rule needs to know the specific rules of a
grammar, and which non-terminals are parts of speech. So that both
characteristics are easily contained, a Grammar class is a good choice. The
Grammar class will need two important methods: getRHS(String
lhs) and isPartOfSpeech(String lhs):




getRHS(String lhs) will return all RHSs of the grammar rules for any left-hand side, lhs. If there are no such rules, it will indicate this.
isPartOfSpeech(String lhs) will return true or false based on whether or not lhs is a part of speech.
To make the Grammar class easier to extend to more complicated
grammars, the Grammar class itself does not instantiate any rules. It is a
basic framework for the two important methods, and defines how the rules
are contained and related. To create a specific grammar, the Grammar
class needs to be extended to a new class. This new class will instantiate all
the grammar rules. As just noted, the framework of the grammar rules is a
mapping between a LHS and RHS. For each rule there will be only one
LHS for each RHS, but it is likely in the full set of grammar rules that a
particular LHS will have several possible RHSs. Later in the chapter, the
exact framework for this matching is presented.
The Chart
A chart is an ordered list of the successively produced components of the
parse. A major requirement is to determine whether any newly produced
possible component of the parse is already contained in the chart.
To make it easier to maintain the charts correctly and consistently, we
create a Chart class. We could have used a simpler structure, like
Vector, to contain the states of the parse as they are produced, but the
code to manipulate the Vector would then be distributed throughout the
parser code. This dispersed code makes it much harder to make
corrections, and to debug. When we create the Chart class, the code to
manipulate the chart will be identical for all uses, and since the code is all in
one place, it will be much easier to debug. Notice that the Chart class
represents a single chart, not the evolving set of states that are created by
the Earley algorithm. Since there is no additional functionality needed
beside that already discussed, we make a Chart array for the evolving set
of chart states, rather than making another class.
The States
A state component for the parser has one left-hand side, LHS, and one right-hand side, RHS, for each rule that is instantiated, as well as indices from the sentence String array, an (i, j) pair. Because these components all need to be represented, the easiest way to create the problem-solving state is to make a State class. Since the State class supports the full Earley algorithm, it will require get methods for returning the LHS, the RHS, and the i and j indices. Also, as seen in the pseudo-code of Section 9.2, we need the ability to get the term after the dot in the RHS, as well as the ability to determine whether or not the dot is in the last (terminal) position. Methods to support these requirements must be provided.
Throughout our discussion, LHS and RHS have been mentioned, but not
their implementation. Since the Earley parser uses context-free grammar
rules, we create the LHS as a String. The RHS on the other hand, is a
sequence of terms, which may or may not include a dot. Because the RHS is used in two separate classes, and because of the additional requirement of dot manipulation, we separate the RHS into its own class.




30.3 The Earley Parser: Java Code


The Earley parser, which manipulates the components described in Section
30.2, will have its own class. This makes it easier to contain and hide the
details of the algorithm. The EarleyParser class can be implemented
in either of two ways.
First, the class could be static. When one wanted to parse a sentence, the
static method would be called, and with the grammar rules and the
sentence to be parsed as arguments, return a boolean indicating whether
the parse was successful. Alternatively, the chart itself could be returned
and examined to determine whether there was a successful parse. Second,
with the class not static, the EarleyParser would be instantiated with
a Grammar, and then a parse method could be called with a sentence.
After the parse method is called, another method would be called to obtain
the charts (if the parse method returns a boolean). We take this second
approach and start with the most basic class, the RHS class, and then work
our way towards creating the EarleyParser class.
The RHS Class
The RHS class contains a String array of terms, a boolean that records whether the terms contain a dot, an int recording the offset of the dot (this will default to –1, indicating no dot), and a final static String containing the representation of the dot. The constructor determines whether there is a dot and updates hasDot and dot accordingly.
public class RHS
{
    private String[] terms;
    private boolean hasDot = false;
    private int dot = -1;
    private final static String DOT = "@";

    public RHS (String[] t)
    {
        terms = t;
        for (int i = 0; i < terms.length; i++)
        {
            if (terms[i].compareTo (DOT) == 0)
            {
                dot = i;
                hasDot = true;
                break;
            }
        }
    }
    // Additional methods defined below.
}




RHS returns its terms, the String array, for use by the
EarleyParser, as well as the String prior to and after the dot. This
enables ease of queries by the EarleyParser regarding the terms in the
RHS of the grammar rule. For example, EarleyParser gets the term
following the dot from RHS, and queries the Grammar to determine if
that term is a part of speech.
public String[] getTerms ()
{
    return terms;
}

public String getPriorToDot ()
{
    if (hasDot && dot > 0)
        return terms[dot-1];
    return "";
}

public String getAfterDot ()
{
    if (hasDot && dot < terms.length-1)
        return terms[dot+1];
    return "";
}
The final procedures required to implement RHS are manipulation of and
queries concerning the dot. The queries determine whether there is a dot,
and where the dot is located, last or first. When a dot is moved or added to
a RHS, a new RHS is returned. This is done because whenever a dot is
moved a new State must be created for it.
public boolean hasDot ()
{
    return hasDot;
}

public boolean isDotLast ()
{
    if (hasDot)
        return (dot == terms.length-1);
    return false;
}

public boolean isDotFirst ()
{
    if (hasDot)
        return (dot == 0);
    return false;
}




public RHS addDot ()
{
    String[] t = new String[terms.length+1];
    t[0] = DOT;
    for (int i = 1; i < t.length; i++)
        t[i] = terms[i-1];
    return new RHS (t);
}

public RHS addDotLast ()
{
    String[] t = new String[terms.length+1];
    for (int i = 0; i < t.length-1; i++)
        t[i] = terms[i];
    t[t.length-1] = DOT;
    return new RHS (t);
}

public RHS moveDot ()
{
    String[] t = new String[terms.length];
    for (int i = 0; i < t.length; i++)
    {
        if (terms[i].compareTo (DOT) == 0)
        {
            t[i] = terms[i+1];
            t[i+1] = DOT;
            i++;
        }
        else
            t[i] = terms[i];
    }
    return new RHS (t);
}
There are two additional methods that we have not included here: overrides of equals(Object o) and toString(). Two RHS objects are equal when they have identical terms and identical placement of the dot. toString() is overridden to make the RHS easier to print during debugging and when the charts are printed.
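A minimal sketch of what these two overrides might look like (our own reconstruction, not the code distributed with the book) is:

public boolean equals (Object o)
{
    if (!(o instanceof RHS))
        return false;
    // The dot is stored as one of the terms, so comparing the term
    // arrays also compares the placement of the dot.
    return java.util.Arrays.equals (terms, ((RHS) o).getTerms ());
}

public String toString ()
{
    StringBuffer buf = new StringBuffer ();
    for (int i = 0; i < terms.length; i++)
        buf.append (terms[i] + " ");
    return buf.toString ().trim ();
}

Next we present one of the two classes that contain a RHS.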
The Grammar Class
The Grammar class does not instantiate the rules of a specific grammar. It contains a HashMap that links the left-hand-side (LHS) of a grammar rule, which is a String, to an array of RHSs, and a Vector of Strings that are the parts of speech of the grammar.




import java.util.HashMap;
import java.util.Vector;

public class Grammar
{
    HashMap<String, RHS[]> Rules;
    Vector<String> POS;

    public Grammar ()
    {
        Rules = new HashMap<String, RHS[]>();
        POS = new Vector<String>();
    }
    // Additional methods defined below.
}
The Grammar class supports two methods: one returning all the RHSs
associated with a LHS, and the second returning whether a String is a part of speech.
public RHS[] getRHS (String lhs)
{
    RHS[] rhs = null;
    if (Rules.containsKey (lhs))
    {
        rhs = Rules.get (lhs);
    }
    return rhs;
}

public boolean isPartOfSpeech (String s)
{
    return POS.contains (s);
}
For EarleyParser to function, the Grammar class must be extended.
To do this we have created SimpleGrammar, which demonstrates both the creation of the rules and how they are added to the rule list.
public class SimpleGrammar extends Grammar
{
    public SimpleGrammar ()
    {
        super();
        initialize();
    }

    private void initialize()
    {
        initRules();
        initPOS();
    }




    private void initRules()
    {
        String[] s1 = {"NP", "VP"};
        RHS[] sRHS = {new RHS(s1)};
        Rules.put ("S", sRHS);

        String[] np1 = {"NP", "PP"};
        String[] np2 = {"Noun"};
        RHS[] npRHS = {new RHS(np1), new RHS(np2)};
        Rules.put ("NP", npRHS);

        String[] vp1 = {"Verb", "NP"};
        String[] vp2 = {"VP", "PP"};
        RHS[] vpRHS = {new RHS(vp1), new RHS(vp2)};
        Rules.put ("VP", vpRHS);

        String[] pp1 = {"Prep", "NP"};
        RHS[] ppRHS = {new RHS(pp1)};
        Rules.put ("PP", ppRHS);

        String[] noun1 = {"John"};
        String[] noun2 = {"Mary"};
        String[] noun3 = {"Denver"};
        RHS[] nounRHS = {new RHS(noun1), new RHS(noun2), new RHS(noun3)};
        Rules.put ("Noun", nounRHS);

        String[] verb = {"called"};
        RHS[] verbRHS = {new RHS(verb)};
        Rules.put ("Verb", verbRHS);

        String[] prep = {"from"};
        RHS[] prepRHS = {new RHS(prep)};
        Rules.put ("Prep", prepRHS);
    }

    private void initPOS()
    {
        POS.add ("Noun");
        POS.add ("Verb");
        POS.add ("Prep");
    }
}
The State Class
The State class contains a String representing the LHS of the rule, a RHS that contains the dotted right-hand-side of the rule, and ints describing the seen/unseen components. There are get methods, and functions for handling the dot.




public class State
{
    private String lhs;
    private RHS rhs;
    private int i, j;

    public State (String lhs, RHS rhs, int i, int j)
    {
        this.lhs = lhs;
        this.rhs = rhs;
        this.i = i;
        this.j = j;
    }

    public String getLHS ()
    {
        return lhs;
    }

    public RHS getRHS ()
    {
        return rhs;
    }

    public int getI ()
    {
        return i;
    }

    public int getJ ()
    {
        return j;
    }

    public String getPriorToDot ()
    {
        return rhs.getPriorToDot ();
    }

    public String getAfterDot ()
    {
        return rhs.getAfterDot ();
    }

    public boolean isDotLast ()
    {
        return rhs.isDotLast ();
    }
}




Finally, again, the overrides of equals(Object o) and toString() are not included. Equivalent states are identified when the LHS, RHS, i, and j are all identical. toString() prints out the State in a readable format.
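A sketch of the State versions (again our own reconstruction) might be:

public boolean equals (Object o)
{
    if (!(o instanceof State))
        return false;
    State other = (State) o;
    // Equal states share the rule, the dot position, and the indices.
    return lhs.equals (other.getLHS ()) && rhs.equals (other.getRHS ()) &&
           i == other.getI () && j == other.getJ ();
}

public String toString ()
{
    return "(" + lhs + " -> " + rhs + ", [" + i + ", " + j + "])";
}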
The Chart Class
The Chart class contains a Vector of States. These are the states produced by the EarleyParser. The States are inserted into the Vector in order; this is necessary for the algorithm.
import java.util.Vector;

public class Chart
{
    Vector<State> chart;

    public Chart ()
    {
        chart = new Vector<State>();
    }

    public void addState (State s)
    {
        if (!chart.contains (s))
        {
            chart.add (s);
        }
    }

    public State getState (int i)
    {
        if (i < 0 || i >= chart.size ())
            return null;
        return chart.get (i);
    }
}
addState(State s) determines whether s is already within the
Chart. If s is not in the Chart, s is added to the end of Vector.
Nothing is done when s is already in the Chart. getState(int i)
returns the State at the i-th offset in Vector. There are checks to
enforce that i is a valid offset. toString() is overridden in Chart, in
addition to a get function that returns the size of the Chart.
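A sketch of these last two methods (our reconstruction, not the code shipped with the book) is:

public int size ()
{
    return chart.size ();
}

public String toString ()
{
    StringBuffer buf = new StringBuffer ();
    for (int i = 0; i < chart.size (); i++)
        buf.append (chart.get (i) + "\n");
    return buf.toString ();
}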
30.4 The Completed Parser
We have now completed the design of the components of the Earley
parser. Figure 30.1 presents the object hierarchy that supports this design.
EarleyParser, which implements this design, is presented in Section 30.4.1, while Section 30.4.2 describes main, which presents two sentences and produces their chart parses.




Figure 30.1 The design hierarchy for the EarleyParser class.


The EarleyParser Class
The EarleyParser class contains a Grammar describing the grammar rules for parsing the prospective sentence. It also maintains a String array containing the sentence to be parsed, and a Chart array containing the evolving states of the chart; both will change with each call of parseSentence(…). Note that each of the methods of the EarleyParser class reflects the design components of Sections 30.2 and 30.3 presented in Figure 30.1.
public class EarleyParser
{
    private Grammar grammar;
    private String[] sentence;
    private Chart[] charts;

    public EarleyParser (Grammar g)
    {
        grammar = g;
    }

    public Grammar getGrammar ()
    {
        return grammar;
    }

    public Chart[] getCharts ()
    {
        return charts;
    }
    // Additional methods defined below.
}
parseSentence(…) takes the sentence to be parsed, uses it to initialize charts to hold one more Chart than the number of words in the sentence, and makes the dummy start state, (“$ → @ S”, 0, 0), the first chart state. parseSentence then iterates through each of
the charts, and for every State in a Chart, checks to determine which procedure is called next: completer(...) if the dot is last, scanner(...) if the term following the dot is a part of speech, or predictor(...) if the term following the dot is not a part of speech. After all charts are visited, if the last State added to the last Chart is the finish state, (“$ → S @”, 0, sentenceLength), the sentence was successfully parsed.
public boolean parseSentence (String[] s)
{
    sentence = s;
    charts = new Chart[sentence.length+1];
    for (int i = 0; i < charts.length; i++)
        charts[i] = new Chart ();

    String[] start1 = {"@", "S"};
    RHS startRHS = new RHS (start1);
    State start = new State ("$", startRHS, 0, 0);
    charts[0].addState (start);

    for (int i = 0; i < charts.length; i++)
    {
        for (int j = 0; j < charts[i].size (); j++)
        {
            State st = charts[i].getState (j);
            String next_term = st.getAfterDot ();
            if (st.isDotLast ())
                // State's RHS = ... @
                completer (st);
            else if (grammar.isPartOfSpeech (next_term))
                // RHS = ... @ A ..., where A is a part of speech.
                scanner (st);
            else
                // A is NOT a part of speech.
                predictor (st);
        }
    }
    // Determine whether a successful parse.
    String[] fin = {"S", "@"};
    RHS finRHS = new RHS (fin);
    State finish = new State ("$", finRHS, 0, sentence.length);
    State last = charts[sentence.length].getState (
        charts[sentence.length].size () - 1);
    return finish.equals (last);
}




We next create the predictor, scanner, and completer procedures. First, predictor(State s) adds states for all the rules whose LHS is the term after the dot in s to the j-th chart.
private void predictor (State s)
{
    String lhs = s.getAfterDot ();
    RHS[] rhs = grammar.getRHS (lhs);
    int j = s.getJ ();
    for (int i = 0; i < rhs.length; i++)
    {
        State ns = new State (lhs, rhs[i].addDot (), j, j);
        charts[j].addState (ns);
    }
}
scanner(State s) determines whether the part of speech term
following the dot in s has a RHS that contains only the j-th word in the
sentence. If so, this new state is added to the (j+1)-th chart.
private void scanner (State s)
{
    String lhs = s.getAfterDot ();
    RHS[] rhs = grammar.getRHS (lhs);
    int j = s.getJ ();
    for (int a = 0; a < rhs.length; a++)
    {
        String[] terms = rhs[a].getTerms ();
        if (terms.length == 1 && j < sentence.length &&
            terms[0].compareToIgnoreCase (sentence[j]) == 0)
        {
            State ns = new State (lhs, rhs[a].addDotLast (), j, j+1);
            charts[j+1].addState (ns);
        }
    }
}
Finally, completer(State s) determines whether any state in the
i-th chart slot has a term following the dot that is the same as the LHS of
s. If so, new States based on those terms found are created with a
moved dot, and an updated j.




private void completer (State s)
{
    String lhs = s.getLHS ();
    for (int a = 0; a < charts[s.getI ()].size (); a++)
    {
        State st = charts[s.getI ()].getState (a);
        String after = st.getAfterDot ();
        if (after != null && lhs.compareTo (after) == 0)
        {
            State ns = new State (st.getLHS (),
                st.getRHS ().moveDot (), st.getI (), s.getJ ());
            charts[s.getJ ()].addState (ns);
        }
    }
}
After EarleyParser completes parseSentence(…), the
getCharts() method is called. These charts in conjunction with the
grammar rules can be used to determine the parse trees of the sentence.
We discuss methods for doing this in Section 30.5.
The EarleyParser: A Test Run
We next create methods that contain sentences that test EarleyParser. In our example, the included Main class contains two sentences and then uses SimpleGrammar to parse them. It then prints out the sentences and the associated Charts for each.

public class Main
{
    public static void main (String[] args)
    {
        String[] sentence1 = {"John", "called", "Mary"};
        String[] sentence2 = {"John", "called", "Mary", "from", "Denver"};
        Grammar grammar = new SimpleGrammar ();
        EarleyParser parser = new EarleyParser (grammar);
        test (sentence1, parser);
        test (sentence2, parser);
    }




    static void test (String[] sent, EarleyParser parser)
    {
        StringBuffer out = new StringBuffer ();
        for (int i = 0; i < sent.length-1; i++)
            out.append (sent[i] + " ");
        out.append (sent[sent.length-1] + ".");
        String sentence = out.toString ();
        System.out.println ("\nSentence: \"" + sentence + "\"");
        boolean successful = parser.parseSentence (sent);
        System.out.println ("Parse Successful:" + successful);
        Chart[] charts = parser.getCharts ();
        System.out.println ("");
        System.out.println ("Charts produced by the sentence \"" +
            sentence + "\"");
        for (int i = 0; i < charts.length; i++)
        {
            System.out.println ("Chart " + i + ":");
            System.out.println (charts[i]);
        }
    }
}
30.5 Generating Parse Trees from Charts and Grammar Rules
(Advanced Section)
All the topics discussed, as well as all the data structures and algorithms we
have designed to this point in Chapter 30, have been used to build a chart
that indicates whether or not a set of grammar rules is sufficient for
parsing a string of words. In this final section we present some ideas for
extracting parse trees from the charts created and the grammar rules that
supported them. Thus, we have completed the forward component of the
forward/backward algorithm (Luger 2009, Section 4.1 and Section 15.2.2)
used in dynamic programming. We now present some ideas for completing
the backward component of the dynamic programming algorithm: how we
can use the chart and set of grammar rules to extract parse trees. We
consider this an advanced topic, and so will present only the main
components of an algorithm and leave its design as an exercise. Included
with the software is our implementation of the stack-based approach to
this problem.
To begin, we must determine how each state in the chart is created. One
method for accomplishing this is to list all the ways that each state of the chart can be produced. Another approach is to record this information
when each of the states of the chart is first produced. We will discuss and
implement this second method. To record the sources of a State we can
add to the State class a Vector of States. This Vector will
contain all States that produced this State. To maintain this, when a State is added to a Chart and the State is already within the Chart, we merge the sources of the State already in the Chart with those of the one we attempted to add. This approach offers a method to look back through the
charts quickly to find the possible parse trees.
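A sketch of this change (our own variation on the classes of Section 30.3; the names sources, getSources, and addSource are ours) is:

// In State: record every state that produced this one.
private Vector<State> sources = new Vector<State> ();

public Vector<State> getSources ()
{
    return sources;
}

public void addSource (State s)
{
    if (s != null && !sources.contains (s))
        sources.add (s);
}

// In Chart: merge sources when the state is already present.
public void addState (State s)
{
    int index = chart.indexOf (s);
    if (index == -1)
        chart.add (s);
    else
    {
        State existing = chart.get (index);
        for (State source : s.getSources ())
            existing.addSource (source);
    }
}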
It is important to remember that more than one parse tree can often be produced from a Chart. An example is that the sentence “John called Mary
from Denver” has two interpretations (parses): John called Mary, who is
from Denver; and John called Mary, and John is in Denver. So however we
produce the parse trees, we need to decide if we will attempt to produce all
trees, or select only one. With appropriate forethought, it is easy to
produce all parse trees.
To generate parse trees, the backward component of the dynamic programming algorithm, we begin with the final state (“$ → S @”, 0, sentenceLength). For each source state of that final state, we iterate through the following:
1. If the source state is the start state (“$ → @ S”, 0, 0), return the tree created. Otherwise continue.
2. Look at the current state we are evaluating. If the dot is last,
then add LHS to the tree as the left-most child of the current
evaluating node. We add it as the left-most child because we are
building the tree right to left. This means that we will find the
state with the last word of the sentence before any other
preceding words. We will find the right-most child first, and
want the subsequent children to be added to the left.
3. If the state’s LHS is a part of speech, POS, then add the RHS’s
first term as the only child of the node we just added. We are
guaranteed that the LHS was just added as a child, because for
this type of state there is a single term in the RHS and a final
dot. For example: “Noun → John @”. We have finished
evaluating this state, so move to its source state and continue
evaluation. There are two cases for this:
a. There is only one source. Easy! Use that one source.
b. There are multiple sources. Now we need to find the
one that matches how we have been building the tree.
To demonstrate what we mean, consider the more
complex sentence “Old men and women like dogs.”
The ambiguity for this sentence is whether only the men are old, or both the men and women are old. “and” is a conjunction, which is a part of speech, so it would
match this rule. Here is the problem:
With 3b we have finished with the state: (“Conj → and @”, 2, 3). The sources of this state are both (“NP → NP @ Conj NP”, 0, 2) and (“NP → NP @ Conj NP”, 1, 2). Which one should
we use? First, notice that the only difference between the two source states is the location of i. (The i describes where the start of each rule is.) In (“NP → NP @ Conj NP”, 0, 2), the start is at the beginning of the sentence. For (“NP → NP @ Conj NP”, 1, 2), the start is “men”. This is the parse difference between “old (men) and women” and “old (men and women)”. Furthermore, we need to consider where the previous states we have used to make the parse tree are looking. If for this particular tree we had used (“NP → NP Conj @ NP”, 0, 3), then we need to use (“NP → NP @ Conj NP”, 0, 2). For (“NP → NP Conj @ NP”, 1, 3) we would use (“NP → NP @ Conj NP”, 1, 2). Therefore we must find the source state that matches the rule, ignoring the position of the dot (this should be off by one) and the i.
4. At this point the current state may have been added to the tree,
or it may not have. In either case we need to determine whether
there are multiple sources for this state, and if there are, we
need to iterate across each of them to determine which of the
sources are valid. Before we do this, we need to update the
current evaluating node. Above, we mentioned a current
evaluating node. This is the node we are adding children to.
Before we can continue, we need to determine if this node
needs to change for the next iteration. There are three cases:
a. The current state’s dot is first. This means there are no
more children that need to be added to this node. So
we move to the parent of the current evaluating node.
b. If the current state’s LHS was just added to the tree and
it was not a part of speech, then we will want to be
adding the next nodes to the node we just added. So
the current evaluating node moves to its left-most
child.
c. Otherwise, continue to use this node. This happens if
we have just added a part of speech, POS, node and its
child.
5. Next, we must iterate through all of these sources, and for each
source state that meets the criteria, we attempt to continue
building the tree from that state. One of the following must be
true for this continuation to be accomplished:
a. The source state’s LHS is equal to the current state’s
term prior to the dot. Remember, we are moving from
right to left, and the dot is moving from right to left. So
if this is true, then the current state was generated
because the source state completed a rule (the dot
moved all the way to the right) and the
completer(…) method was called.
b. The source state’s RHS, with dot moved to the right,
and LHS matches on a state we have already evaluated.




6. If the criteria fail for all source states, then this was a dead end,
and no tree is returned. If any of the source states are valid,
start at step 1 with that state as the current state, and update the
current evaluating node. The accepted trees (there may be no
tree possible) are bundled together and returned.
From this algorithm, we can produce the multiple parse trees implicit in the
Earley algorithm’s successful production of the Chart. Example code
implementing this algorithm is included with the Chapter 30 support
materials.
The Earley parser code as well as the first draft of this chapter was written
by Ms Breanna Ammons, MS in Computer Science, University of New
Mexico.
Exercises
1. Describe the role of the dot within the right hand side of the grammar
rules processed by the Earley parser. How is the location of the dot
changed as the parse proceeds? What does it mean that the same right
hand side of a grammar rule can have dots at different locations?
2. In the Earley parser the input word list and the states in the state lists
have indices that are related. Explain how the indices for the states of the
state list are created.
3. Describe in your own words the roles of the predictor,
completer, and scanner procedures in the algorithm for Earley
parsing. What order are these procedures called in when parsing a sentence,
and why is that ordering important? Explain your answers to the order of
procedure invocation in detail.
4. Use the Java parser to consider the sentence “John saw the burglar with
the telescope”. Parse also “Old men and women like dogs”. Comment on
the different parses possible from these sentences and how you might
retrieve them from the chart.
5. Create a Sentence class. Have one of the constructors take a
String, and have it then separate the String into words.
6. Code the algorithm for production of parse trees from the completed
Chart. One method of recording the sources is presented in Section 30.5.
You may find it useful to use a stack to keep track of the states you have
evaluated for the parse tree.
7. Extend EarleyParser to include support for context-sensitive
(Luger 2009, Section 15.9.5) grammar rules. What new structures are
necessary to guarantee constraints across subtrees of the parse?



31 Case Studies: Java Natural Language
Tools Available on the Web

Chapter Objectives
This chapter provides a number of sources for open source and free natural language understanding software on the web.
Chapter Contents
31.1 Java NLP Software
31.2 LingPipe from the University of Pennsylvania
31.3 The Stanford Natural Language Processing Group Software
31.4 Sun’s Speech API

31.1 Java Natural Language Processing Software


There are several popular Java-based open-source natural language
understanding software packages available on the internet. We describe
three of these in Chapter 31. It must be remembered both that these web
sites may change over time and that new sites will appear focusing on
problems in NLP.
31.2 LingPipe from the University of Pennsylvania
Background
LingPipe is an available Java resource from Alias-i, http://www.alias-i.com/lingpipe/. Alias-i began in 1995 as a collaboration of students at the University of Pennsylvania. After competing in different events (including
DARPA MUC-6), the group was awarded a research contract under the
TIDES (Trans-Lingual Information Detection Extraction and
Summarization) program. Starting as Baldwin Language Technologies, the
company’s name later changed to Alias-i. LingPipe was used in two of
Alias-i’s products, FactTracker and ThreatTracker. In 2003, LingPipe was
released as open source software with commercial licenses available as well.
LingPipe contains many tools for linguistic analysis of human language, including tools for sentence-boundary detection, a part-of-speech tagger, and a phrase chunker.
LingPipe is easy to download from the website. The download contains demos and documentation, and there are models available to download as well. On the website there are tutorials, documentation, and a FAQ. There are also links to the community of LingPipe consumers, including a listing of some commercial customers as well as research patrons. There is a newsgroup for discussion, as well as a blog for keeping up to date on the suite.
We next take a look at some of the tools provided by LingPipe.
Sentence-boundary Detection
To start with, there are tutorials contained in the download for the sentence-boundary detection classes. These tutorials contain example programs that use and extend the com.aliasi.sentences classes. If you follow
the tutorials, you will get suggestions for how some of the classes can be
extended to do sentence detection for other corpora.
The AbstractSentenceModel class contains the basic functionality needed to
detect sentences. When extending this class, definitions of possible stops,
impossible penultimates, and impossible starts are needed. Possible stops
are any tokens that can be placed at the end of a sentence. This includes ‘.’ and ‘?’. Impossible penultimates are tokens that cannot precede an end of the sentence. An example would be ‘Mr’ or ‘Dr’. Impossible starts are
normally punctuation that should not start a sentence and should be
associated with the end of the last sentence. These can be things like end
quotes.
The AbstractSentenceModel is already extended to the HeuristicSentenceModel,
which is extended to the IndoEuropeanSentenceModel, and the
MedlineSentenceModel. These last two provide good examples of definitions
for the possible stops, impossible penultimates and the impossible starts.
From these examples, HeuristicSentenceModel can be extended to suit a data
set. After creating an example set with known sentence boundaries, running the evaluator contained in the tutorials for the sentences class gives an idea of the shortcomings of the current model. From the evaluator’s output files, corrections can be made to the possible stops, impossible penultimates and impossible starts. Be careful though; when attempting to remove all false positives and all false negatives, the definitions can become too rigid and cause more errors when run with more than the example set. So try to find a good balance.
Within the download, the AbstractSentenceModel is only extended to the
HeuristicSentenceModel. This does not mean that you must use the
HeuristicSentenceModel. The HeuristicSentenceModel can be used as an example
to create a new class that extends only AbstractSentenceModel. Therefore if
you have a different type of model that you would like to use, extend
AbstractSentenceModel and try it out.
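As a sketch of this idea (assuming, as the tutorials suggest, a HeuristicSentenceModel constructor that takes the three token sets; the particular tokens below are only illustrative), a custom model might be built like this:

import java.util.HashSet;
import java.util.Set;
import com.aliasi.sentences.HeuristicSentenceModel;
import com.aliasi.sentences.SentenceModel;

public class MySentenceModel
{
    public static SentenceModel build ()
    {
        // Tokens that may end a sentence.
        Set<String> possibleStops = new HashSet<String> ();
        possibleStops.add (".");
        possibleStops.add ("?");
        possibleStops.add ("!");

        // Abbreviations that cannot immediately precede a stop.
        Set<String> impossiblePenultimates = new HashSet<String> ();
        impossiblePenultimates.add ("Mr");
        impossiblePenultimates.add ("Dr");

        // Punctuation that belongs with the previous sentence.
        Set<String> impossibleStarts = new HashSet<String> ();
        impossibleStarts.add ("\"");
        impossibleStarts.add (")");

        return new HeuristicSentenceModel (possibleStops,
            impossiblePenultimates, impossibleStarts);
    }
}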
Part-of-speech Tagger
The part-of-speech (POS) tagger is a little more complicated than the sentences classes. To use it, the POS tagger must be trained. After it is
trained, the tagger can be used to produce a couple of different statistics
about its confidence of the tags it applies to input. In the download there
are examples of code for the Brown, Genia and MedPost corpora. The
classes used in making a POS tagger come from the com.aliasi.hmm package.
The tagger is a HmmDecoder defined by a HiddenMarkovModel.
To train a tagger, you need first a corpus or test set that has been tagged.
Using this tagged set, the HmmCharLmEstimator (in the com.aliasi.hmm
package) can read the training set and create a HiddenMarkovModel. This
model can be used immediately, or it can be written out to file and used at
a later time. The file can be useful when evaluating different taggers. For
each test on different corpora, the exact tagger can be used without having
to recreate it each time.
Now that we have a tagger, we can use it to tag input. Within one
HmmDecoder there are a couple of different ways to tag; all are methods of
the decoder you create from the HiddenMarkovModel. Based on what kind
of information you need, the options are first-best, n-best, and confidence-based. First-best returns only the “best” tagging of the input. N-best
returns the first n “best” taggings. Confidence-based results are the entire
lattice of forward/backward scores.
Provided in the tutorials is an evaluator of taggers. This uses pre-tagged
corpora and trains a little, then evaluates a little. It parses reference
taggings, uses the model to tag, and evaluates how well the model did. The
reference tags are then added to the training data, and the parser moves on.
The arguments to the evaluator will determine how well the model learns
and how long it will take. Experiment with this package to see what is
appropriate for your own data set.
A tagger produced by this package could be used in other algorithms.
Whether as tags needed for the algorithm or as a source to produce a
grammar, this POS tagger is useful. A future project might be to create a
parse tree from the POS tagger, but that functionality is not within
LingPipe. An exercise might be to extend LingPipe to create parse trees.
31.3 The Stanford Natural Language Processing Group
The Stanford NLP group is a team of faculty, postdoctoral researchers,
graduate, and undergraduate students, members from both the Computer
Science and Linguistics departments. The site http://nlp.stanford.edu
describes the team members, their publications, and the libraries that can
be downloaded.
Exploring this Stanford website, the reader finds, as of January 2008, six
Java libraries available for work in natural language processing. These
include a parser as well as a part-of-speech tagger. Although the
information contained in the introduction for each package is extensive
and contains a set of “frequently asked questions”, the code
documentation is often sparse, without sufficient design documentation.
The libraries are licensed under the full GNU Public License, which means
that they are available for research or free software projects.
The Stanford Parser
The parser makes up a major component of the Stanford NLP website. There is background information for the parser, an on-line demo, and
answers for frequently asked questions. The Stanford group refers to their
parser as “a program that works out the grammatical structure of
sentences”. The software package includes an optimized probabilistic
context-free grammar (Luger 2009, Section 15.4).
Within the download of the Stanford parser is a package called
edu.stanford.nlp.parser. This parser interface contains two functions: one
function determines whether the input sentence can be parsed or not. The
other function determines whether the parse meets particular parsing goals,
for example, that it is a noun phrase followed by a verb phrase sentence.
There are also a number of sub-interfaces provided, including ViterbiParser (see Chapter 30) and KBestViterbiParser, the first supporting the most likely probabilistic parse of the sentence and the second giving the K best parses, where all parses are returned with their “scores”.
Within the interface edu.stanford.nlp.parser there is a further interface,
edu.stanford.nlp.parser.lexparser, which supports parsers for English, German, Chinese, and Arabic expressions. There are also classes that, once implemented, can be used to train the parser for use on other languages.
To train the parser, a training set needs to include systematically annotated
data, specifically in the form of a Treebank (a corpus of annotated data that
explicitly specifies parse trees). Once trained, the parser contains three components: grammars, a lexicon, and a set of option values. The grammar itself consists of two parts, unary (NP = N) and binary (S = NP VP) rewrite rules. The lexicon is a list of lexical (word) entries, each preceded by a keyword and followed by a raw count score. The options are persistent
variable-value pairs that remain constant across the training and parsing
stages.
The Stanford tools also include a GUI for easy visualization of the trees
produced through parsing the input streams. The training stages require
much more time and memory than using the already trained parser. Thus,
for experimental purposes, it is convenient to use the already trained
parsers, although there is much that can be learned by stepping through the
creation of a parser.
Named-Entity Recognition
A program that performs named-entity recognition (NER) locates and classifies elements of input strings (text) into predefined categories. For the
Stanford NER the primary categories for classification are name,
organization, location, and miscellaneous. There are two recognizers, the first
classifying the first three of these categories and trained on a corpus
created from data from conference proceedings. The second is trained on
more specific data, the proceedings from one conference.
Using the text classifiers is straightforward. They can be run either as
embedded in a larger program or by command line. When run as part of a
program the classifier is read in using a function associated with
CRFClassifier. This function returns an AbstractSequenceClassifier that uses
methods to classify the contents of a String. An example of one (of the
three possible) output formats, called /Cat is: My/O name/O is/O
Bill/PERSON Smith/PERSON ./O. /O indicates that the text string is
not recognized as a named entity. There are a number of issues involved in this type of classification, for example that at this point Bill Smith is not recognized as the name of a single person but rather as two consecutive PERSON tokens. When working with this type of pattern matching it is
important to monitor issues in over-learning and under-learning: when one
pattern matching component is created to fit a complex problem situation,
another set of patterns may not then be classified properly.
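A short sketch of the embedded use (the model file name below is hypothetical, and this assumes the loading and classifying methods documented with CRFClassifier) is:

import edu.stanford.nlp.ie.crf.CRFClassifier;

public class NerExample
{
    public static void main (String[] args)
    {
        // Load a serialized classifier from disk.
        CRFClassifier classifier =
            CRFClassifier.getClassifierNoExceptions ("ner-model.ser.gz");
        // Tag a sentence; the output uses the /Cat format shown above.
        System.out.println (
            classifier.classifyToString ("My name is Bill Smith."));
    }
}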
Unfortunately the documentation for the Named-Entity package is
minimal. Although it contains a set of JavaDocs, these can be wrong (referring to classes that are not included) or simply unhelpful.
31.4 Sun’s Speech API
To this point we have focused on Java-based natural language processing
tools analyzing written language. Speech recognition and synthesis are also
important components of NLP. To assist developers in these areas Sun
Microsystems has created an API for speech. This Java API can be found
at http://research.sun.com/speech. From this page there is also a link to a free speech recognizer developed using this API at Carnegie Mellon University, as well as a speech synthesizer written by Sun that is based on Flite, a speech synthesis engine also developed at Carnegie Mellon University. Running programs written with the Java Speech API requires a compliant speech recognizer and synthesizer, audio hardware for output, and a microphone for input.
The API contains three packages: speech, speech recognition, and speech
synthesis. The speech package contains several packages and interfaces
used by both the recognition and synthesis systems. The main interface is
an Engine that is the parent interface for all speech systems. The engine
contains the procedures for communicating with other classes as well as
allocation/deallocation methods for moving between states. These states
determine whether the engine has acquired resources sufficient for
executing a function. The engine also provides methods to pause and
resume play and to access all properties including, listeners, audio, and
vocabulary managers. The speech class also contains procedures for
listeners as well as a Word class that contains the written and spoken
pronunciation forms for words. The collection of words, the vocabulary, is
controlled by the Vocabulary manager.
The main class of the speech recognizer is Recognizer. An instance of
recognizer creates listener events and passes them to all registered event
listeners much the same way as action listeners work for GUI applications.
The events are either accepted or rejected based on sets of grammar rules.
There are two forms of grammar rules: rule-based and dictation. Dictation
rules offer fewer constraints on what can be said, with a resulting higher cost in computational resources and often lesser-quality results. Rule-based
grammars are constrained to the Java speech grammar format (JSGF) and
as a result impose a greater constraint on the recognizer. They also require
fewer resources with a reasonable freedom for expressions. A tutorial for
the JSGF is located at http://java.sun.com/products/java-media/speech/forDevelopers/JSGF and can be used to create grammars.
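As a small illustration of the format (a hypothetical grammar of our own, not one taken from the tutorial), a JSGF rule-based grammar looks like this:

#JSGF V1.0;
grammar greetings;
// A public rule the recognizer can match against spoken input.
public <greeting> = (hello | good morning) [computer];

A recognizer loaded with this grammar would accept utterances such as “hello” or “good morning computer.”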
Once a grammar is created it is passed to a recognizer and activated. Then
the recognizer begins processing and sending out events to all registered
listeners. Sample applications are linked to the previously mentioned web
site.
The Java speech API also contains a package for synthesis. Analogous to
the recognizer, the synthesizer package contains a Synthesis class. The
synthesizer is able to speak instances of the Speakable class in a voice
constructed by an instance of the Voice class. This class contains both male and female, as well as young, middle-aged, and older voices. Its only task is to output a “speakable” text, as defined by a Java speech markup language (JSML) specification. These specifications can be found at http://java.sun.com/products/java-media/speech/forDevelopers/JSML/. Demonstrations of the Sun speech synthesizer are available at http://freetts.sourceforge.net/docs/index.php, where the source code can be downloaded. The Sun speech API comes with a wealth of documentation and example source code, which is fairly transparent and easy to follow.




There are a number of other web-based sources that support tasks in natural language text and speech understanding. These range from
phoneme capture, the development of word and language models using
probabilistic finite-state acceptors and various forms of Markov models.
There are also parsers and recognizers of sentence structures as well as
more examples of speech recognizers and synthesizers. There are also a
number of tools available for speech to text conversion. Besides tools in
Java, many also exist in other languages including C++ and Python.



PART V: Conclusion: Model Building and the
Master Programmer

The limits of my language mean the limits of my world…


— Ludwig Wittgenstein, “Tractatus Logico-Philosophicus”
Theories are like nets: He who casts, captures…
— Ludwig Wittgenstein, “Tractatus Logico-Philosophicus”
The best you can do by Friday is a form of the best you can do…
— Charles Eames, Noted Twentieth Century Designer

We have come to the end of our task! In Part V we will give a brief
summary of our views of computer language use, especially in a
comparative setting where we have been able to compare and contrast the
idioms of three different language paradigms and their use in building
structures and strategies for complex problem solving. We begin Chapter
32 with a brief review of these paradigm differences, and then follow with
summary comments on paradigm-based abstractions and idioms.
But first we briefly review the nature of the programming enterprise and
why we are part of it.
Well, first, we might say that programming offers monetary compensation
to ourselves and our dependents. But this isn’t really why most of us got
into our field. We authors got into this profession because computation
offered us a critical medium for exploring and understanding our world.
And, yes, we mean this in the large sense where computational tools are
seen as epistemological artifacts for comprehending our world and
ourselves.
We see computation as Galileo might have seen his telescope, as a medium
for exploring entities, relationships, and invariances never before perceived
by the human agent. It took Newton and his “laws of motion” almost
another century fully to capture Galileo’s insights. We visualize
computation from exactly this viewpoint, where even as part of our own
and our colleagues’ small research footprint we have explored complex
human phenomena including:
• Human subjects’ neural state and connectivity, using human
testing, fMRI scanning, coupled with dynamic Bayesian
networks and MCMC sampling, none of which would be
possible without computation.

• Patterns of expressed genes as components of the human genome. These gene expression patterns are assumed to be at
the core of protein creation that enables and supports much of
the human animal’s metabolic system, including cortical activity
and communication.
• Real time diagnostics and prognostics on human and mechanical
systems. These complex tasks often require various forms of
hidden Markov models along with other stochastic tools and
languages.
• Understanding human language and voiced speech also requires
computational tools, including various stochastic tools and
models. Better language tools will require conditioning such
systems with realistic models of human understanding and
intention.
Of course this list could go on to include many of the exciting tasks that
make up the daily challenges of our readers. What is important is that we
see computer programming less in terms of the act of building tools, than
as a medium for creating and debugging models of the world – as an
epistemological medium.
We feel that there are (at least) two consequences of our thinking of
computation as an epistemological medium: First, as programmers we are
model builders. We use our data structures and search strategies to capture
state, relations, and invariances in our application domains. We come to
understand this domain through progressive approximation. And our
domains are rarely static, but change and evolve across time. Thus we often
require stochastic engines and probabilistic relationships to capture these
complex evolving phenomena.
Second, we explore our world by iterative approximation. When we build a
model, we make an approximation of some aspect of reality. The quality of
our model building is often seen through the lens of failure. As the
philosophers of science continue to remind us, good models are falsifiable.
It is through their failure points that we begin to appreciate our own failure
to comprehend aspects of the phenomena we wish to understand. When
our models are carefully designed and crafted, we can then deconstruct
them to address these failure points and attempt to expand our
understanding. Our increased understanding is then reflected in the next
iteration of our model building. Thus the iterative design methodology,
whether used by the individual programmer or, as is more often the case, within collaborating communities of programmers, is a critical methodology in coming to understand our application domains.
We urge the reader to keep these ideas in mind in reading the final chapter
and its reprise of the book’s main themes of language-paradigm-based
abstractions and idioms of the master programmer.



32 Conclusion: The Master Programmer

Chapter Objectives
This chapter provides a summary and discussion of the primary idioms and design patterns presented in our book.
Chapter Contents
32.1 Paradigm-Based Abstractions and Idioms
32.2 Programming as a Tool for Exploring Problem Domains
32.3 Programming as a Social Activity
32.4 Final Thoughts

32.1 Language Paradigm-Based Abstractions and Idioms


In the Introduction to this book, we stated that we wanted to do more
than simply demonstrate the implementation of key AI algorithms in some
of the major languages used in the field. We also wanted to explore the
ways that the problems we try to solve, the programming languages we
create to help in their solution, and the patterns and idioms that arise in the
practice of AI programming have shaped each other. We will conclude
with a few observations on these themes.
More than anything else, the history of programming languages is a history
of increasingly powerful, ever more diverse abstraction mechanisms. Lisp,
the oldest of the languages we have explored, remains one of the most
dramatic examples of this progression. Although procedural in nature, Lisp
was arguably the first to abstract procedural programming from such
patterns as explicit branching, common memory blocks, parameter passing
by reference, pointer arithmetic, global scoping of functions and variables,
and other structures that more or less reflect the underlying machine
architecture. By adopting a model based on the theory of recursive
functions, Lisp provides programmers with a cleaner semantics, including recursive control structures, principled variable scoping, and a variety of tools for implementing symbolic data structures.
Like Lisp, Prolog bases its abstraction on a mathematical theory: in this
case, formal logic and resolution theorem proving. This allows Prolog to
abstract out procedural semantics almost completely (the left to right
handling of goals and such pragmatic mechanisms as the cut are necessary
exceptions). The result is a declarative semantics that allows programmers
to view programs as sets of constraints on problem solutions. Also, because grammars naturally take the form of rules, Prolog has proven its value not only in natural language processing applications, but also as a tool for manipulating formal languages, as in compilers and interpreters.
Drawing in part on the lessons of these earlier languages, object-oriented
languages, such as Java, offer an extremely rich set of abstractions that
support the idea of organizing even the most ordinary program as a model of its application domain. These abstractions include class definitions, inheritance, abstract classes, interfaces, packages, overriding of methods,
and generic collections. In particular, it is interesting to note the close
historical relationship between Lisp and the development of object-oriented languages. Although Smalltalk was the first “pure” object-oriented
language, it was closely followed by many object-oriented Lisp dialects.
This relationship is natural, since Lisp laid a foundation for object-
orientation through such features as the ability to manipulate functions as
s-expressions, and the control over evaluation it gives the programmer.
Java has continued this development, and is particularly notable for
providing powerful software engineering support through development
environments such as Eclipse, and the large number of packages it
provides for user data structures, network programming, user interface
implementation, web-based implementation, Artificial Intelligence, and
other aspects of application development.
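Returning to the abstractions themselves, a few lines suffice to show several of them working together. This is only an illustrative sketch; the Evaluable and Literal names are invented here, not drawn from the book's code:

import java.util.ArrayList;
import java.util.List;

// An interface and a generic collection working together.
interface Evaluable
{
    double value();
}

class Literal implements Evaluable
{
    private final double v;
    Literal(double v)
    {
        this.v = v;
    }
    public double value()
    {
        return v;
    }
}

public class GenericsDemo
{
    public static void main(String[] args)
    {
        // The list is typed to the abstraction, not to a concrete class.
        List<Evaluable> terms = new ArrayList<Evaluable>();
        terms.add(new Literal(2.0));
        terms.add(new Literal(3.0));
        double sum = 0.0;
        for (Evaluable term : terms)
            sum += term.value();
        System.out.println(sum); // prints 5.0
    }
}

Because the collection is typed to the interface, new Evaluable implementations can be added without touching the summation loop.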
In addition to – or perhaps because of – their underlying semantic models,
all these languages support more general forms of abstraction. The
organization of programs around abstract data types, “bundles” of data
structures and operations on them, is a common device used by good
programmers – no matter what language they are using. Meta-linguistic
abstraction is another technique that is particularly important to Artificial
Intelligence programming. The complexity of AI problems clearly requires
powerful forms of problem decomposition, but the ill-formed nature of
many research problems defies such common techniques as top-down
decomposition. Meta-linguistic abstraction addresses this conundrum by
enabling programmers to design languages that are tailored to solving
specific problems. It tames hard problems by abstracting their key features
into a meta language, rather than decomposing them into parts. The
general search algorithms, expert system shells, learning frameworks,
semantic networks, and other techniques illustrated in this book are all
examples of meta-linguistic abstraction.
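To make the idea concrete, the following minimal sketch, written in the spirit of the Goal and And classes of our reasoning engine but greatly simplified, defines a tiny goal language and a fixed interpreter for it. The Fact class and the satisfied method are inventions for this illustration, not that engine's actual interface:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A tiny goal language: knowledge is written as sentences in this
// language, while the interpreter below stays fixed.
interface Goal
{
    boolean satisfied(Set<String> knowledge);
}

// A primitive goal: true if the fact base contains the assertion.
class Fact implements Goal
{
    private final String assertion;
    Fact(String assertion)
    {
        this.assertion = assertion;
    }
    public boolean satisfied(Set<String> knowledge)
    {
        return knowledge.contains(assertion);
    }
}

// A compound goal: true only if every subgoal is satisfied.
class And implements Goal
{
    private final List<Goal> subgoals;
    And(Goal... subgoals)
    {
        this.subgoals = Arrays.asList(subgoals);
    }
    public boolean satisfied(Set<String> knowledge)
    {
        for (Goal g : subgoals)
            if (!g.satisfied(knowledge))
                return false;
        return true;
    }
}

public class MetaDemo
{
    public static void main(String[] args)
    {
        Set<String> facts =
            new HashSet<String>(Arrays.asList("wet", "cold"));
        Goal query = new And(new Fact("wet"), new Fact("cold"));
        System.out.println(query.satisfied(facts)); // prints true
    }
}

Problem-specific knowledge is then written as sentences in the meta language, as in the query above, while the interpreter never changes; the expert system shells and search frameworks of earlier chapters scale this same pattern up.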
This diversity of abstraction mechanisms across languages underlies a
central theme of this book: the relationship between programming
languages and the idioms of their use. Each language suggests a set of
natural ways of achieving common programming tasks. These are refined
through practice and shared throughout the programmer community
through examples, mentoring, conferences, books, and all the mechanisms
through which any language idiom spreads. Lisp’s use of lists and
CAR/CDR recursion to construct complex data structures is one of that
language’s central idioms; indeed, it is almost emblematic of the language.
Similarly, the ordering of rules in Prolog, with non-recursive terminating clauses preceding recursive rules, appears throughout Prolog programs and is one of that language's key idioms. Object-oriented languages
rely upon a particularly rich set of idioms and underscore the importance
of understanding and using them properly.
Java, for example, adopted the C programming language syntax to improve its learnability and readability (whether or not this was a good idea continues to be passionately debated). It would be possible for a programmer to write Java programs that consisted of a single class with a static main method that called additional static methods in the class. This program might function correctly, but it would hardly be considered a good Java program.
Instead, quality Java programs distribute their functionality over relatively
large numbers of class definitions, organized into hierarchies by
inheritance, interface definitions, method overloading, etc. The goal is to
reflect the structure of the problem in the implementation of its solution.
This not only brings into focus the use of programming languages to
sharpen our thinking by building epistemological models of a problem
domain, but also supports communication among developers and with
customers by letting people draw on their understanding of the domain.
There are many reasons for the importance of idioms to good
programming. Perhaps the most obvious is that the idiomatic patterns of
language use have evolved to help with the various activities in the
software lifecycle, from program design through maintenance. Adhering to
them is important to gaining the full benefits of the language. For example,
our hypothetical “Java written as C” program would lack the
maintainability of a well-written Java program.
A further reason for adhering to accepted language idioms is for
communication. As we will discuss below, software development (at least
once we move beyond toy programs) is a fundamentally social activity. It is
not enough for our programs to be correct. We also want other
programmers to be able to read them, understand the reasons we wrote the
program as we did, and ultimately modify our code without adding bugs
due to a misunderstanding of our original intent.
Throughout the book, we have tried to communicate these idioms, and
suggested that mastering them, along with the traditional algorithms, data
structures, and languages, is an essential component of programming skill.

32.2 Programming as a Tool for Exploring Problem Domains


Idioms are also bound up – along with the related concept of design
patterns, also discussed below – with an idea we introduced in the book’s
introduction: programming languages as tools for thinking. In the early
stages of learning to program, the greatest challenges facing the student are
in translating a software requirement, usually a homework assignment, into
a program that works correctly. As we move into professional-level
research or software development, this changes. We are seldom given clear,
stable problem statements; rather, our job is to interpret a vague customer
need or research goal and project it into a program that meets those needs.
The languages we have addressed in this book are the product of many
person-decades of theoretical development, experience, and insight. They
are not only tools for programming computers, but also for refining our
understanding of problems and their solution.
Illustrating this idea of programming languages as tools for thinking has
been one of our primary goals in writing this book. Lisp is the oldest, and
still one of the best, examples of this. The s-expression syntax is ideally
suited for constructing symbolic data structures, and, along with the basic
cons/car/cdr operations, provides an elegant foundation for structures as diverse as lists, trees, frames, networks, and other types of knowledge representation common to Artificial Intelligence. A search of early AI
literature shows the power of s-expressions as both a basis for symbolic
computing and for communication of theoretical ideas: numerous articles
on knowledge representation, learning, reasoning, and other topics use s-
expressions to state theoretical ideas as natural science uses algebra.
Prolog continues this tradition with its use of logical representation and
declarative semantics. Logic is the classic “tool for thinking,” giving a
mathematical foundation to the disciplines of clarity, validity, and proof.
Subtler is the idea of declarative semantics, of stating constraints on a
problem solution independently of the procedural steps used to realize
those constraints. This brings a number of benefits. Prolog programs are
famously concise, since the mechanisms of procedural computing are
abstracted out of the logical statement of problem constraints. This
concision helps give clear formulation to the complex problems faced in
AI programming. Natural language understanding programs are the most
obvious example of this, but we also call the reader’s attention to the
relative ease of writing meta-interpreters in Prolog. This discipline of meta-linguistic abstraction is a quintessential example of how a language can assist our thinking about hard problems.
Java’s core disciplines of encapsulation, inheritance, and method extension
also reflect a heritage of AI thinking. As a tool for thinking, Java brings
these powerful disciplines to problem decomposition and representation,
metalinguistic abstraction, incremental prototyping, and other forms of
problem solving. An interesting example of the subtle influence object-
oriented programming has on our thinking can be found in comparing the
declarative semantics of Prolog with the static structure of an object-
oriented program.
Although we have no “hard” data to prove this, our work as both
engineers and teachers has convinced us that the more experienced a Java
programmer becomes, the more classes and interfaces we find in their
programs. Novice programmers seem to favor fewer classes with longer
methods, most likely because they lack the rich language of idioms and
patterns used by skilled object-oriented designers. Breaking a program
down into a larger number of objects brings several obvious benefits,
including ease of debugging and validating code, and enhanced reuse.
Another benefit of this is a shift of program semantics from procedural
code to the static structure of objects and relations in the class structure.
For example, a well-designed class hierarchy with the use of overridden
methods can eliminate many if-then tests in the program: the class
“knows” which method to use without an explicit test. For this reason,
Java programmers frown on the use of operators like instanceof to
test explicitly for class membership: the object should exploit inheritance to
call the proper method rather than use such tests.
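A minimal sketch of the idiom follows; the node classes are invented for illustration and are not the book's search or reasoning classes:

// Each subclass supplies its own cost; callers never test types.
abstract class Node
{
    abstract double cost();
}

class LeafNode extends Node
{
    double cost()
    {
        return 1.0;
    }
}

class AndNode extends Node
{
    private final Node left, right;
    AndNode(Node left, Node right)
    {
        this.left = left;
        this.right = right;
    }
    double cost()
    {
        return left.cost() + right.cost();
    }
}

public class DispatchDemo
{
    public static void main(String[] args)
    {
        Node n = new AndNode(new LeafNode(), new LeafNode());
        // No if (n instanceof AndNode) test is needed here: inheritance
        // selects the proper cost method at run time.
        System.out.println(n.cost()); // prints 2.0
    }
}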
The analogy of this to Prolog’s declarative semantics is useful: both
techniques move program semantics from dynamic execution to static
structure. The static structure of objects or assertions can be understood by
inspection of code, rather than by stepping through executions. It can be analyzed and verified in terms of things and relations, rather than the
complexities of analyzing the many paths a program can take through its
execution. And, it enhances the use of the programming language as a tool
for stating theoretical ideas: as a tool for thinking.
32.3 Programming as a Social Activity
As programming has matured as a discipline, we have also come to
recognize that complex software is usually written by teams rather than by a single genius laboring in isolation. Both authors work in research institutions, and
are acutely aware that the complexity of the problems modern computer
science tackles makes the lone genius the exception, rather than the rule.
The most dramatic example of this is open-source software, which is built
by numerous programmers laboring around the world. To support this, we
must recognize that we are writing programs as much to be read by other
engineers as to be executed on a computer.
Software Engineering and AI
This social dimension of programming is most strongly evident in the discipline of software engineering. We feel it unfortunate that many
textbooks on software engineering emphasize the formal aspects of
documentation, program design, source code control and versioning,
testing, prototyping, release management, and similar engineering practices,
and downplay the basic source of their value: to ensure efficient, clear
communication across a software development team.
In our research environments, we have encountered the mindset that research programming does not require the
same levels of engineering as applications development. Although research
programming may not involve the need for tutorials, user manuals and
other artifacts of importance to commercial software, we should not forget
that the goal of software engineering is to ensure communication. Research
teams require this kind of coordination as much as commercial
development groups. In our own practice, we have found considerable
success with a communications-focused approach to software engineering,
treating documentation, tests, versioning, and other artifacts as tools to
communicate with our team and the larger community. Thinking of
software engineering in these terms allows us to take a “lightweight”
approach that emphasizes the use of software engineering techniques for
communication and coordination within the research team. We urge the
programmer to see their own software engineering skills in this light.
Prototyping
Prototyping is an example of a software engineering practice that has its
roots in the demands of research, and that has found its way into
commercial development. In the early days, software engineering seemed
to aim at “getting it right the first time” through careful specification and
validation of requirements. This is seldom possible in research
environments where the complexity and novelty of problems and the use
of programming as a tool for thinking precludes such perfection.
Interestingly, as applications development has moved into interactive
domains that must blend into the complex communication acts of human
communities, the goal of “getting it right the first time” has been rejected
in favor of a prototyping approach.

We urge the reader to look at the patterns and techniques presented in this
book as tools for building programs quickly and in ways that make their
semantics clear – as tools for prototyping. Metalinguistic abstraction is the
most obvious example of this. In building complex, knowledge-based
systems, the separation of inference engine and knowledge illustrated in
many examples of this book allows the programmer to focus on
representing problem-specific knowledge in the development process.
Similarly, in object-oriented programming, the mechanisms of interfaces,
class inheritance, method extension, encapsulation, and similar techniques
provide a powerful set of tools for prototyping. Although often thought of
as tools for writing reusable software, they give a guiding structure to
prototyping. “Thin-line” prototyping is a technique that draws on these
object-oriented mechanisms. A thin-line prototype is one that implements
all major components of a system, although initially with limited
complexity. For example, consider an expert system implemented in a complex network environment. A thin-line prototype would include all
parts of the system to test communication, interaction, etc., but with
limited functionality. The expert system may only have enough rules to
solve a few example problems; the network communications may only
implement enough messages to test the efficiency of communications; the
user interface may only consist of enough screens to solve an initial
problem set, and so on.
The power of thin-line prototypes is that they test the overall architecture
of the program without requiring a complete implementation. Once this is
done and evaluated for efficiency and robustness by engineers and for
usability and correctness by end users, we can continue development with a
focused, easily managed cycle of adding functionality, testing it, and
planning. In our experience, most AI programs are built this way.
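As a hedged sketch of this practice, the fragment below wires hypothetical stand-ins for the three components named above, a rule base, a network layer, and a user interface, into one working but deliberately shallow system; all of the names are invented for illustration:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Every major component of the system is present, but shallow.
interface KnowledgeBase
{
    List<String> match(String goal);
}

interface Transport
{
    void send(String message);
}

interface Console
{
    void show(String text);
}

// Stubs: just deep enough to exercise the overall architecture.
class TinyRuleBase implements KnowledgeBase
{
    public List<String> match(String goal)
    {
        if (goal.equals("diagnose"))
            return Arrays.asList("check-power", "check-fuse");
        return new ArrayList<String>();
    }
}

class LoopbackTransport implements Transport
{
    public void send(String message)
    {
        System.out.println("net> " + message);
    }
}

class TextConsole implements Console
{
    public void show(String text)
    {
        System.out.println("ui>  " + text);
    }
}

public class ThinLineDemo
{
    public static void main(String[] args)
    {
        KnowledgeBase rules = new TinyRuleBase();
        Transport net = new LoopbackTransport();
        Console ui = new TextConsole();
        // One end-to-end path through all of the components:
        for (String step : rules.match("diagnose"))
        {
            net.send(step);
            ui.show("advise: " + step);
        }
    }
}

Each stub can later be deepened independently, with more rules, real sockets, or richer screens, without disturbing the architecture that the prototype has already validated.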
Reuse
It would be nearly impossible to write a book on programming without a
discussion of an idea that has become something of a holy grail to modern
software development: code reuse. Both in industry and academia,
programmers are under pressure not only to build useful, reliable software, but also to produce reusable components as a by-product of that effort. In aiming for this goal, we should be aware of two subtleties.
The first is that reusable software components rarely appear as by-products
of a problem-specific programming effort. The reason is that reuse, by
definition, requires that components be designed, implemented, and tested
for the general case. Unless the programmer steps back from the problem
at hand to define general use cases for a component, and designs, builds,
tests, and documents to the general cases, it is unlikely the component will
be useful to other projects. We have built a number of reusable
components, and all of them have their roots in this effort to define and
build to the general case.
The second is that components themselves should not be the only focus of software reuse. Considerable value can be found in
reusing ideas: the idioms and patterns that we have demonstrated in this
book. These are almost the definition of skill and mastery in a programmer,
and can rightly be seen as the core of design and reuse.

32.4 Final Thoughts


It has been our goal to give the reader an understanding not only of the power and beauty of the programming languages Prolog, Lisp, and Java, but also of the intellectual depth involved in mastering them. This mastery involves a language's syntax and semantics, an understanding of its idioms of use, and the ability to project those idioms into the patterns of design and implementation that define a well-written program.
In approaching this goal, we have focused on common problems in
Artificial Intelligence programming, and reasoned our way through their
solution, letting the idioms of language use and the patterns of program
organization emerge from that process. The power of idioms, patterns, and
other forms of engineering mastery is in their application, and they can
have as many realizations, as many implementations as there are problems
that they may fit. We hope our method and its execution in this book have
helped the student understand the deeper reasons, the more nuanced
habits of thinking and perception, behind these patterns. This is, to
paraphrase Einstein, less a matter of knowledge than of imagination.
We hope this book has added some fuel to the fires of our readers’
imaginations.

Index

8-puzzle 289-292
ABLE 403
and/or graph search 324-329
automated reasoning 144-145
best-first search 56-57, 291
breadth-first search 54-56, 289-291
C# 15
C++ 15, 270, 271, 273
candidate elimination algorithm 89-100
case frame 110
certainty factors 73-81, 351-357
chart parsing see Earley parser
Chomsky Hierarchy 322
class 275-276
Common Lisp 14
Common Lisp Object System (CLOS) 8, 14, 15, 269, 287
conceptual graph 108-111
context-free parsers 111-119, 405-422
context-sensitive parsers 119-123
continuation 335
covers 88, 93
decision tree 367-388
declarative semantics 11-12, 17, 142-144, 287, 431
depth-first search 34, 52-54, 289-291
design pattern 3-6, 16
dotted grammar rules 126
dynamic programming 125-140
Earley parser 126-140, 272, 405-422
Eclipse 432
encapsulation 275-276
epistemological artifacts 430
eval & assign pattern 5
evolutionary computing see genetic algorithms
expert system shell 9, 73-81
explanation-based learning 100-106
factory pattern 348-349
feature vector 93
first-order predicate calculus see predicate calculus
fitness function 390
Flavors 14, 269
FP 13
frames 8
framework 288
genetic algorithms 322, 389-402
goal regression 102
heuristics 6
ID3 271, 367-388
idiom 1, 3-11, 16
immutable object 373-374
inductive bias 93
inductive learning 367-388
inference engine 9
information theory 385-387
inheritance 8, 277-280
interface 280-282
Java 269-428
    abstract class 292-293
    abstract method 292-293
    AbstractDecisionTreeNode 371, 381-385
    AbstractExample 371, 375-377
    AbstractOperator 331-333, 338
    AbstractProperty 371, 372-373
    AbstractSolutionNode 340-341
    AbstractSolver 296-297
    And 333
    AndSolutionNode 343-346
    BestFirstSolver 299-300
    BreadthFirstSolver 298-299
    Chart 414
    class 275-276
    clone 313
    Comparator interface 400
    Constant 310-314, 315, 338
    copy constructor 313
    DepthFirstSolver 298
    EarleyParser 414-418
    ESAnd 354
    ESAndSolutionNode 357
    ESAsk 359
    ESRule 353, 358-359
    ESRuleSet 360
    ESSimpleSentence 353-354, 358
    ESSimpleSentenceSolutionNode 355, 358
    ExampleSet 371, 377-381
    farmer, wolf, goat and cabbage 300-303
    final 283
    generics 293-294
    Goal 331-333, 343
    Grammar 406, 411-412
    HashMap 320
    HashSet 296
    history 14-15
    IllegalArgumentException 373
    InformationTheoreticDecisionTreeNode 371, 386-387
    inheritance 277-280
    instanceof 349
    interface 7
    interface 280-282, 293-294, 310
    Java Standard Library 283
    LinkedList 298-299
    Object 280
    PCExpression 310-314, 317-318, 321, 329-331, 337-338
    PriorityQueue 299
    private 282, 284
    public 284
    RHS 408-410
    Rule 333-334
    RuleSet 337
    Set 296
    SimpleSentence 310-314, 316-317, 339
    SimpleSentenceSolutionNode 341-342
    Solver interface 296
    State (Earley parser) 412-414
    State (search) 292-295
    static 283
    SubstitutionSet 314-321
    this 283
    Unifiable 310-314
    unify 314-321
    Variable 310-314, 316, 339
    Vector 5
JESS 271, 363-364
JOONE 403
knowledge level 9-11
LIBSVM 404
LingPipe 272, 423-424
Lisp 149-268
    (see also CLOS, Lisp functions)
    a-list (see association list)
    accessor 239, 254
    and functional programming 149
    and global variables 189
    and symbolic computing 149, 161-163
    applying functions 152
    association list 201
    atom 151
    best-first search 192-193
    binding 153, 171-173
    binding variables 171-173
    bound variable 153, 172
    breadth-first search 189-192
    car/cdr recursion 163-165
    class precedence list 243-244
    CLOS 237-249
    Common Lisp Object System (see CLOS)
    conditional evaluation 159
    conditionals 157-159
    control of evaluation 220-221
    data abstraction 161-163
    data types 175-176
    defining classes 238-240
    defining functions 156-158
    delayed evaluation 219-223
    depth-first search 192
    dotted pairs 201
    evaluation 155-156
    expert system shell 219-232
    farmer, wolf, goat, and cabbage problem 177-182
    filters 185-187
    form 153
    free variable 172-173, 186-187
    function closure 220
    generic functions 241-242
    higher-order functions 185-189
    ID3 251-266
    inheritance 233-236, 243-244
    lambda expressions 188-189
    learning 251-266
    lexical closure 186, 220-221
    list defined 151
    local variables 173-175
    logic programming 207-217
    macro 221-222
    maps 187-189
    meta-interpreters 156, 204-205, 219-231, 244-249
    meta-linguistic abstraction (see meta-interpreters)
    methods 241-243
    multiple inheritance 243-244
    nil 152-153
    occurs check 200
    pattern matching 195-197
    predicates 158
    procedural abstraction 185-189
    program control 157-159
    property lists 233-237
    read-eval-print loop 152, 203-204
    recursion 151-170
    s-expression 151-154
    semantic networks 233-237
    simulation 244-249
    slot options 238-239
    slot-specifiers 238-239
    special declaration 216-217
    state space search 177-182
    streams 209-210
    streams and delayed evaluation 209-217
    thermostat simulation 244-249
    tree-recursion 163-168
    unification 195-202
Lisp functions
    * 152
    + 152
    - 152
    / 151-155
    < 158
    = 153, 158
    > 153, 158
    >= 158
    ' 153-156
    #S 257
    abs 157-158
    acons 202
    and 152, 159
    append 166
    apply 187
    assoc 202
    car 163-165
    case 248
    cdr 163-165
    cond 158-160
    cons 164-165
    declare 216
    defclass 238-240
    defgeneric 241
    defmacro 221-222
    defmethod 241-242
    defstruct 251, 253-254
    defun 156
    do 262-263
    eq 179
    equal 179
    eval 154-155, 203
    funcall 186-187
    gensym 216
    get 234-235
    if 158-159
    length 154
    let 173-175
    list 151-154, 156, 164-165
    listp 175
    mapcar 187-188, 251, 260-261
    member 158-159
    minusp 158
    nth 173-174
    null 154
    numberp 158
    oddp 158
    or 159
    plusp 158
    print 203-204
    quote 154
    read 203-205
    remprop 234-235
    set 171-173
    setf 171-173, 234-235
    setq 171-173
    sqrt 157
    symbol-plist 234-235
    terpri 203
    typep 204
    zerop 158
machine learning 87-106
map pattern 4-6
maximally general concept 90-91
maximally specific generalization 90
memoize 125, 405
Meta-DENDRAL 100
meta-interpreters 60, 69
meta-linguistic abstraction 8-9, 285, 306, 322, 325, 432, 436
modus ponens 308, 326
MYCIN 73
object-oriented programming 14-15, 269-428
Objective C 15, 273
OCAML 13
Occam's Razor 369
occurs check 64, 310
packages 270
pattern language 3-6
pattern matching 7-8
Physical Symbol System Hypothesis 269
planner 82-85
polymorphism 276
predicate calculus 7, 11, 17, 19-148, 271, 306-323, 325
probabilistic parsers 114-119
Prolog 17-148
    ! see cut
    =.. 60
    Abstract Data Type (ADT) 38-41
    add_to_chart 137
    and 19, 20
    anonymous variables 27
    append 64
    askuser 70-71
    assert 24, 60
    asserta 24
    assertz 24
    assignment 145
    atom 18
    backtracking 23
    bagof 54, 95, 99
    call 60
    clause 60
    closed world assumption 22
    completer 136
    conflict resolution 44
    consult 24
    covers 93
    cut (!) 17, 36-38
    earley 134
    exit 25
    exshell 73-81
    extract_support 104
    farmer, wolf, goat and cabbage 46-52
    frame 29-32
    function 19
    functor 60
    generalize 94
    generalize_set 95
    history 11-12
    horn clause 12, 25
    how query 72-73
    implies 19, 20, 21-23
    is 5, 65
    Knight's Tour 33-38, 44-46
    listing 25
    lists 5, 25-28
    member 26-27
    meta-predicate 18, 60
    model 21, 37
    more_general 93
    move (Knight's tour) 34
    negation as failure 22
    nonvar 60
    nospy 25
    not 19, 20, 49
    or 19, 20
    path 33-38, 50
    predicate 19
    priority queue 40
    process 94-95, 97-98
    production system 43-58
    prolog_ebg 103
    read 24
    recognize-act cycle 44
    recursion 25-28
    resolution refutation 21, 25
    retract 24
    retry 25
    reverse_writelist 28
    rule see implies
    scanner 136
    see 24-25
    semantic net 28-29
    set 40-41
    solve 69-73
    specialize_set 98-99
    spy 25
    tell 24-25
    trace 25
    types 61-64
    var 60
    why queries 71-72
    working memory 43
    write 24
    writelist 27
proof tree 72-73, 76-78, 103-106, 321, 328-329, 335-346
prototyping 435-436
quantification 23, 308
queue 39-40, 291
recursive function theory 13
resolution theorem proving 12
reuse 436
Scheme 13
search 6-7, 288-304, 324-329
servlet 270
Smalltalk 8, 14, 23, 33-42, 269, 270, 273, 432
SML-NJ 13
software engineering 435
stack 38-39, 291
standardizing variables apart 337
Stanford NLP 425
static structure 349-350
STRIPS 100
Sun Speech API 426-428
supervised learning 367
symbolic computing 6, 287
thin-line prototype 436
unification 7-8, 17, 23, 25, 64-67, 271, 309-320
version space search 87-100
Weka 403
WordGuess 391-394
XML 270
