Prolog - Programming For Artificial Intelligence PDF
INTERNATIONAL COMPUTER SCIENCE SERIES

Consulting Editor
A.D. McGettrick, University of Strathclyde

Concurrent Systems: An Integrated Approach to Operating Systems, Database and Distributed Systems (2nd edn) Bacon, J.
Programming Language Essentials Bal, H.E. and Grune, D.
Programming in Ada 95 Barnes, J.
Java Gently (3rd edn) Bishop, J.
Java Gently for Engineers and Scientists Bishop, J. and Bishop, N.
Software Design Budgen, D.
Concurrent Programming Burns, A. and Davies, G.
Database Systems: A Practical Approach to Design, Implementation and Management Connolly, T. and Begg, C.
Distributed Systems: Concepts and Design (2nd edn) Coulouris, G., Dollimore, J. and Kindberg, T.
Fortran 90 Programming Ellis, T.M.R., Phillips, I.R. and Lahey, T.M.
Program Verification Francez, N.
Introduction to Programming using SML Hansen, M. and Rischel, H.
Parallel Processing: The Transputer and its Applications Hull, M.E.C., Crookes, D. and Sweeney, P.J.
Introductory Logic and Sets for Computer Scientists (2nd edn) Nissanke, N.
Human-Computer Interaction Preece, J. et al.
Algorithms - A Functional Programming Approach Rabhi, F.A.
Foundations of Computing: System Development with Set Theory and Logic Scheurer, T.
Java from the Beginning Skansholm, J.
Ada from the Beginning (2nd edn) Skansholm, J.
Object-Oriented Programming in Eiffel (2nd edn) Thomas, P. and Weedon, R.
Miranda: The Craft of Functional Programming Thompson, S.
Haskell: The Craft of Functional Programming (2nd edn) Thompson, S.
Discrete Mathematics for Computer Scientists (2nd edn) Truss, J.K.
Compiler Design Wilhelm, R. and Maurer, D.
Discover Delphi: Programming Principles Explained Williams, S.
Comparative Programming Languages (2nd edn) Wilson, L.B. and Clark, R.G.
Software Development with Z Wordsworth, J.B.

Prolog Programming for Artificial Intelligence
Third edition

IVAN BRATKO
Faculty of Computer and Information Science, Ljubljana University
and
J. Stefan Institute

Addison-Wesley
An imprint of Pearson Education
Harlow, England · London · New York · Reading, Massachusetts · San Francisco · Toronto · Don Mills, Ontario · Sydney · Tokyo · Singapore · Hong Kong · Seoul · Taipei · Cape Town · Madrid · Mexico City · Amsterdam · Munich · Paris · Milan
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England

and Associated Companies throughout the world

Visit us on the World Wide Web at:
www.pearsoneduc.com

First edition 1986
Second edition 1990

The programs in this book have been included for their instructional value. They have been tested with care but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations nor does it accept any liabilities with respect to the programs.

ISBN 0-201-40375-7

10 9 8 7 6 5 4 3 2
05 04 03 02 01

I dedicate the third edition of this book to my mother, the kindest person I know, and to my father, who, during World War II, escaped from a concentration camp by digging an underground tunnel, which he described in his novel, The Telescope.
Contents

1 Introduction to Prolog

7.1 Testing the type of terms
7.2 Constructing and decomposing terms: =.., functor, arg, name
7.3 Various kinds of equality and comparison
7.4 Database manipulation
7.5 Control facilities
7.6 bagof, setof and findall

8 Programming Style and Technique
8.1 General principles of good programming
8.2 How to think about Prolog programs
8.3 Programming style
8.4 Debugging
8.5 Improving efficiency

11.4 Analysis of basic search techniques

12 Best-First Heuristic Search
12.1 Best-first search
12.2 Best-first search applied to the eight puzzle
12.3 Best-first search applied to scheduling
12.4 Space-saving techniques for best-first search

13 Problem Decomposition and AND/OR Graphs
13.1 AND/OR graph representation of problems
13.2 Examples of AND/OR representation
13.3 Basic AND/OR search procedures
13.4 Best-first AND/OR search

14.1 Constraint satisfaction and logic programming
14.2 CLP over real numbers: CLP(R)
14.3 Scheduling with CLP
14.4 A simulation program with constraints
14.5 CLP over finite domains: CLP(FD)

15 Knowledge Representation and Expert Systems
15.1 Functions and structure of an expert system
15.2 Representing knowledge with if-then rules
15.3 Forward and backward chaining in rule-based systems
15.4 Generating explanation
15.5 Introducing uncertainty
15.6 Belief networks
15.7 Semantic networks and frames

16 An Expert System Shell
16.1 Knowledge representation format
16.2 Designing the inference engine
16.3 Implementation
16.4 Concluding remarks

17 Planning
17.1 Representing actions
17.2 Deriving plans by means-ends analysis
17.3 Protecting goals
17.4 Procedural aspects and breadth-first regime
17.5 Goal regression
17.6 Combining means-ends planning with best-first heuristic
17.7 Uninstantiated actions and partial-order planning

18.1 Introduction
18.2 The problem of learning concepts from examples
18.3 Learning relational descriptions: a detailed example
18.4 Learning simple if-then rules
18.5 Induction of decision trees
18.6 Learning from noisy data and tree pruning
18.7 Success of learning

19 Inductive Logic Programming
19.1 Introduction
19.2 Constructing Prolog programs from examples
19.3 Program HYPER

20 Qualitative Reasoning
20.1 Common sense, qualitative reasoning and naive physics
20.2 Qualitative reasoning about static systems
20.3 Qualitative reasoning about dynamic systems
20.4 A qualitative simulation program
20.5 Discussion of the qualitative simulation program

21 Language Processing with Grammar Rules
21.1 Grammar rules in Prolog
21.2 Handling meaning
21.3 Defining the meaning of natural language

22 Game Playing
22.1 Two-person, perfect-information games
22.2 The minimax principle
22.3 The alpha-beta algorithm: an efficient implementation of minimax
22.4 Minimax-based programs: refinements and limitations
Foreword
programmer to describe situations and problems, not the detailed means by which the problems are to be solved.

Consequently, an introduction to Prolog is important for all students of computer science, for there is no better way to see what the notion of what-type programming is all about.

In particular, the chapters of this book clearly illustrate the difference between how-type and what-type thinking. In the first chapter, for example, the difference is illustrated through problems dealing with family relations. The Prolog programmer straightforwardly describes the grandfather concept in explicit, natural terms: a grandfather is a father of a parent. Here is the Prolog notation:

grandfather( X, Z) :- father( X, Y), parent( Y, Z).

Once Prolog knows what a grandfather is, it is easy to ask a question: who are Patrick's grandfathers, for example. Here again is the Prolog notation, along with a typical answer:

?- grandfather( X, patrick).
X = james;
X = earl

It is Prolog's job to figure out how to solve the problem by combing through a database of known father and parent relations. The programmer specifies only what is known and what question is to be solved. The programmer is more concerned with knowledge and less concerned with algorithms that exploit the knowledge.

Given that it is important to learn Prolog, the next question is how. I believe that learning a programming language is like learning a natural language in many ways. For example, a reference manual is helpful in learning a programming language, just as a dictionary is helpful in learning a natural language. But no one learns a natural language with only a dictionary, for the words are only part of what must be learned. The student of a natural language must learn the conventions that govern how the words are put legally together, and later, the student should learn the art of those who put the words together with style.

Similarly, no one learns a programming language from only a reference manual, for a reference manual says little or nothing about the way the primitives of the language are put to use by those who use the language well. For this, a textbook is required, and the best textbooks offer copious examples, for good examples are distilled experience, and it is principally through experience that we learn.

In this book, the first example is on the first page, and the remaining pages constitute an example cornucopia, pouring forth Prolog programs written by a passionate Prolog programmer who is dedicated to the Prolog point of view. By carefully studying these examples, the reader acquires not only the mechanics of the language, but also a personal collection of precedents, ready to be taken apart, adapted, and reassembled together into new programs. With this acquisition of precedent knowledge, the transition from novice to skilled programmer is already under way.

Of course, a beneficial side effect of good programming examples is that they expose a bit of interesting science as well as a lot about programming itself. The science behind the examples in this book is Artificial Intelligence. The reader learns about such problem-solving ideas as problem reduction, forward and backward chaining, 'how' and 'why' questioning, and various search techniques.

In fact, one of the great features of Prolog is that it is simple enough for students in introductory Artificial Intelligence subjects to learn to use immediately. I expect that many instructors will use this book as part of their artificial intelligence subjects so that their students can see abstract ideas immediately reduced to concrete, motivating form.

Among Prolog texts, I expect this book to be particularly popular, not only because of its examples, but also because of a number of other features:

• Careful summaries appear throughout.
• Numerous exercises reinforce all concepts.
• Structure selectors introduce the notion of data abstraction.
• Explicit discussions of programming style and technique occupy an entire chapter.
• There is honest attention to the problems to be faced in Prolog programming, as well as the joys.

Features like this make this a well done, enjoyable, and instructive book.

I keep the first edition of this textbook in my library on the outstanding textbooks shelf, programming languages section, for as a textbook it exhibited all the strengths that set the outstanding textbooks apart from the others, including clear and direct writing, copious examples, careful summaries, and numerous exercises. And as a programming language textbook, I especially liked its attention to data abstraction, emphasis on programming style, and honest treatment of Prolog's problems as well as Prolog's advantages.
Preface
Development of Prolog
Prolog stands for programming in logic, an idea that emerged in the early 1970s to use logic as a programming language. The early developers of this idea included Robert Kowalski at Edinburgh (on the theoretical side), Maarten van Emden at Edinburgh (experimental demonstration) and Alain Colmerauer at Marseilles (implementation). David H.D. Warren's efficient implementation at Edinburgh in the mid-1970s greatly contributed to the popularity of Prolog. A more recent development is constraint logic programming (CLP), usually implemented as part of a Prolog system. CLP extends Prolog with constraint processing, which has proved in practice to be an exceptionally flexible tool for problems like scheduling and logistic planning. In 1996 the official ISO standard for Prolog was published.
other hand, in the United States Prolog was generally accepted with some delay, due to several historical factors. One of these was an early American experience with the Microplanner language, also akin to the idea of logic programming, but inefficiently implemented. Some reservations also came in reaction to the early 'orthodox school' of logic programming, which insisted on the use of pure logic that should not be marred by adding practical facilities not related to logic. This led to some widespread misunderstandings about Prolog in the past. For example, some believed that only backward chaining reasoning can be programmed in Prolog. The truth is that Prolog is a general programming language and any algorithm can be programmed in it. The impractical 'orthodox school's' position was modified by Prolog practitioners who adopted a more pragmatic view, benefiting from combining the new, declarative approach with the traditional, procedural one.

Learning Prolog

Since Prolog has its roots in mathematical logic it is often introduced through logic. However, such a mathematically intensive introduction is not very useful if the aim is to teach Prolog as a practical programming tool. Therefore this book is not concerned with the mathematical aspects, but concentrates on the art of making the few basic mechanisms of Prolog solve interesting problems. Whereas conventional languages are procedurally oriented, Prolog introduces the descriptive, or declarative, view. This greatly alters the way of thinking about problems and makes learning to program in Prolog an exciting intellectual challenge. Many believe that every student of computer science should learn something about Prolog at some point because Prolog enforces a different problem-solving paradigm complementary to other programming languages.

Contents of the book

Part I of the book introduces the Prolog language and shows how Prolog programs are developed. Techniques to handle important data structures, such as trees and graphs, are also included because of their general importance. In Part II, Prolog is applied to a number of areas of AI, including problem solving and heuristic search, programming with constraints, knowledge representation and expert systems, planning, machine learning, qualitative reasoning, language processing and game playing. AI techniques are introduced and developed in depth towards their implementation in Prolog, resulting in complete programs. These can be used as building blocks for sophisticated applications. The concluding chapter, on meta-programming, shows how Prolog can be used to implement other languages and programming paradigms, including object-oriented programming, pattern-directed programming and writing interpreters for Prolog in Prolog. Throughout, the emphasis is on the clarity of programs; efficiency tricks that rely on implementation-dependent features are avoided.

Differences between the second and third edition

All the material has been revised and updated. There are new chapters on:

• constraint logic programming (CLP);
• inductive logic programming;
• qualitative reasoning.

Other major changes are:

• addition of belief networks (Bayes networks) in the chapter on knowledge representation and expert systems;
• addition of memory-efficient programs for best-first search (IDA*, RBFS) in the chapter on heuristic search;
• major updates in the chapter on machine learning;
• additional techniques for improving program efficiency in the chapter on programming style and technique.

Throughout, more attention is paid to the differences between Prolog implementations with specific references to the Prolog standard when appropriate (see also Appendix A).

Audience for the book

This book is for students of Prolog and AI. It can be used in a Prolog course or in an AI course in which the principles of AI are brought to life through Prolog. The reader is assumed to have a basic general knowledge of computers, but no knowledge of AI is necessary. No particular programming experience is required; in fact, plentiful experience and devotion to conventional procedural programming - for example in C or Pascal - might even be an impediment to the fresh way of thinking Prolog requires.

The book uses standard syntax

Among several Prolog dialects, the Edinburgh syntax, also known as DEC-10 syntax, is the most widespread, and is the basis of the ISO standard for Prolog. It is also used in this book. For compatibility with the various Prolog implementations, this book only uses a relatively small subset of the built-in features that are shared by many Prologs.

How to read the book

In Part I, the natural reading order corresponds to the order in the book. However, the part of Section 2.4 that describes the procedural meaning of Prolog in a more
formalized way can be skipped. Chapter 4 presents programming examples that can be read (or skipped) selectively. Chapter 10 on advanced tree representations can be skipped.

Part II allows more flexible reading strategies as most of the chapters are intended to be mutually independent. However, some topics will still naturally be covered before others, such as basic search strategies (Chapter 11). Figure P.1 summarizes the natural precedence constraints among the chapters.

Figure P.1 Precedence constraints among the chapters.

Acknowledgements

Donald Michie was responsible for first inducing my interest in Prolog. I am grateful to Lawrence Byrd, Fernando Pereira and David H.D. Warren, once members of the Prolog development team at Edinburgh, for their programming advice and numerous discussions. The book greatly benefited from comments and suggestions to the previous editions by Andrew McGettrick and Patrick H. Winston. Other people who read parts of the manuscript and contributed significant comments include: Damjan Bojadziev, Rod Bristow, Peter Clark, Frans Coenen, David C. Dodson, Saso Dzeroski, Bogdan Filipic, Wan Fokkink, Matjaz Gams, Peter G. Greenfield, Marko Grobelnik, Chris Hinde, Igor Kononenko, Matevz Kovacic, Eduardo Morales, Igor Mozetic, Timothy B. Niblett, Dusan Peterc, Uros Pompe, Robert Rodosek, Agata Saje, Claude Sammut, Cem Say, Ashwin Srinivasan, Dorian Suc, Peter Tancig, Tanja Urbancic, Mark Wallace, William Walley, Simon Weilguny, Blaz Zupan and Darko Zupanic. Special thanks to Cem Say for testing many programs and his gift of finding hidden errors. Several readers helped by pointing out errors in the previous editions, most notably G. Oulsnam and Iztok Tvrdy. I would also like to thank Karen Mosman, Julie Knight and Karen Sutherland of Pearson Education for their work in the process of making this book. Simon Plumtree and Debra Myson-Etherington provided much support in the previous editions. Most of the artwork was done by Darko Simersek. Finally, this book would not be possible without the stimulating creativity of the international logic programming community.

The publisher wishes to thank Plenum Publishing Corporation for their permission to reproduce material similar to that in Chapter 10 of Human and Machine Problem Solving (1989), K. Gilhooly (ed.).

Ivan Bratko
January 2000
Part I

1 Introduction to Prolog
1.1 Defining relations by facts
1.2 Defining relations by rules
1.3 Recursive rules
1.4 How Prolog answers questions
1.5 Declarative and procedural meaning of programs
Figure 1.1 A family tree.

parent( pam, bob).
parent( tom, bob).
parent( tom, liz).
parent( bob, ann).
parent( bob, pat).
parent( pat, jim).

This program consists of six clauses. Each of these clauses declares one fact about the parent relation. For example, parent( tom, bob) is a particular instance of the parent relation. Such an instance is also called a relationship. In general, a relation is defined as the set of all its instances.

When this program has been communicated to the Prolog system, Prolog can be posed some questions about the parent relation. For example: Is Bob a parent of Pat? This question can be communicated to the Prolog system by typing into the terminal:

?- parent( bob, pat).

Having found this as an asserted fact in the program, Prolog will answer:

yes

A further query can be:

?- parent( liz, pat).

Prolog answers:

no

because the program does not mention anything about Liz being a parent of Pat. It also answers 'no' to the question:

?- parent( tom, ben).

because the program has not even heard of the name Ben.

More interesting questions can also be asked. For example: Who is Liz's parent?

?- parent( X, liz).

Prolog's answer will not be just 'yes' or 'no' this time. Prolog will tell us what is the value of X such that the above statement is true. So the answer is:

X = tom

The question Who are Bob's children? can be communicated to Prolog as:

?- parent( bob, X).

This time there is more than just one possible answer. Prolog first answers with one solution:

X = ann

We may now request another solution (by typing a semicolon), and Prolog will find:

X = pat

If we request more solutions again, Prolog will answer 'no' because all the solutions have been exhausted.

Our program can be asked an even broader question: Who is a parent of whom? Another formulation of this question is:

Find X and Y such that X is a parent of Y.

This is expressed in Prolog by:

?- parent( X, Y).

Prolog now finds all the parent-child pairs one after another. The solutions will be displayed one at a time as long as we tell Prolog we want more solutions, until all the solutions have been found. The answers are output as:

X = pam
Y = bob;

X = tom
Y = bob;

X = tom
Y = liz;

X = bob
Y = pat

Yet another question could be: Do Ann and Pat have a common parent? This can be expressed again in two steps:

(1) Who is a parent, X, of Ann?
(2) Is (this same) X a parent of Pat?

The corresponding question to Prolog is then:

?- parent( X, ann), parent( X, pat).

The answer is:

X = bob

Our example program can be asked still more complicated questions like: Who is a grandparent of Jim? As our program does not directly know the grandparent relation, this query has to be broken down into two steps, as illustrated by Figure 1.2.

(1) Who is a parent of Jim? Assume that this is some Y.
(2) Who is a parent of Y? Assume that this is some X.

Figure 1.2 The grandparent relation expressed as a composition of two parent relations.

Such a composed query is written in Prolog as a sequence of two simple ones:

?- parent( Y, jim), parent( X, Y).

The answer will be:

X = bob
Y = pat

Our composed query can be read: Find such X and Y that satisfy the following two requirements:

parent( Y, jim) and parent( X, Y)

If we change the order of the two requirements the logical meaning remains the same:

parent( X, Y) and parent( Y, jim)

We can indeed do this in our Prolog program, and the query:

?- parent( X, Y), parent( Y, jim).

will produce the same result.

In a similar way we can ask: Who are Tom's grandchildren?

?- parent( tom, X), parent( X, Y).

Prolog's answers are:

X = bob
Y = ann;

Our example program has helped to illustrate some important points:

• It is easy in Prolog to define a relation, such as the parent relation, by stating the n-tuples of objects that satisfy the relation.
• The user can easily query the Prolog system about relations defined in the program.
• A Prolog program consists of clauses. Each clause terminates with a full stop.
• The arguments of relations can (among other things) be: concrete objects, or constants (such as tom and ann), or general objects such as X and Y. Objects of the first kind in our program are called atoms. Objects of the second kind are called variables.
• Questions to the system consist of one or more goals. A sequence of goals, such as:
  parent( X, ann), parent( X, pat)
  means the conjunction of the goals:
  X is a parent of Ann, and
  X is a parent of Pat.
  The word 'goals' is used because Prolog accepts questions as goals that are to be satisfied.
• An answer to a question can be either positive or negative, depending on whether the corresponding goal can be satisfied or not. In the case of a positive answer we say that the corresponding goal was satisfiable and that the goal succeeded. Otherwise the goal was unsatisfiable and it failed.
• If several answers satisfy the question then Prolog will find as many of them as desired by the user.
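A composed query like the 'common parent' question above can also be given a name of its own, so that it can be reused without retyping both goals. A minimal sketch (the predicate name have_common_parent is ours, not from the text; it assumes the parent facts of Figure 1.1 are loaded):

```prolog
% A sketch: packaging the 'common parent' question as a named clause.
% have_common_parent( A, B): some X is a parent of both A and B.
have_common_parent( A, B) :-
    parent( X, A),      % X is a parent of A, and
    parent( X, B).      % the same X is a parent of B

% The query ?- have_common_parent( ann, pat). then succeeds,
% just as ?- parent( X, ann), parent( X, pat). does.
```

Naming conjunctions of goals in this way is exactly the mechanism of rules, which the next section develops.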
Exercises

1.1 Assuming the parent relation as defined in this section (see Figure 1.1), what will be Prolog's answers to the following questions?

(a) ?- parent( jim, X).
(b) ?- parent( X, jim).
(c) ?- parent( pam, X), parent( X, pat).
(d) ?- parent( pam, X), parent( X, Y), parent( Y, jim).

1.2 Formulate in Prolog the following questions about the parent relation:

(a) Who is Pat's parent?
(b) Does Liz have a child?
(c) Who is Pat's grandparent?

1.2 Defining relations by rules

Our example program can be easily extended in many interesting ways. Let us first add the information on the sex of the people that occur in the parent relation. This can be done by simply adding the following facts to our program:

female( pam).
male( tom).
male( bob).
female( liz).
female( pat).
female( ann).
male( jim).

The relations introduced here are male and female. These relations are unary (or one-place) relations. A binary relation like parent defines a relation between pairs of objects; on the other hand, unary relations can be used to declare simple yes/no properties of objects. The first unary clause above can be read: Pam is a female. We could convey the same information declared in the two unary relations with one binary relation, sex, instead. An alternative piece of program would then be:

sex( pam, feminine).
sex( tom, masculine).
sex( bob, masculine).

As our next extension to the program let us introduce the offspring relation as the inverse of the parent relation. We could define offspring in a similar way as the parent relation; that is, by simply providing a list of simple facts about the offspring relation, each fact mentioning one pair of people such that one is an offspring of the other. For example:

offspring( liz, tom).

However, the offspring relation can be defined much more elegantly by making use of the fact that it is the inverse of parent, and that parent has already been defined. This alternative way can be based on the following logical statement:

For all X and Y,
Y is an offspring of X if
X is a parent of Y.

This formulation is already close to the formalism of Prolog. The corresponding Prolog clause which has the same meaning is:

offspring( Y, X) :- parent( X, Y).

This clause can also be read as:

For all X and Y,
if X is a parent of Y then
Y is an offspring of X.

Prolog clauses such as:

offspring( Y, X) :- parent( X, Y).

are called rules. There is an important difference between facts and rules. A fact like:

parent( tom, liz).

is something that is always, unconditionally, true. On the other hand, rules specify things that are true if some condition is satisfied. Therefore we say that rules have:

• a condition part (the right-hand side of the rule) and
• a conclusion part (the left-hand side of the rule).

The conclusion part is also called the head of a clause and the condition part the body of a clause. For example:

offspring( Y, X) :- parent( X, Y).
     (head)             (body)

If the condition parent( X, Y) is true then a logical consequence of this is offspring( Y, X).

How rules are actually used by Prolog is illustrated by the following example. Let us ask our program whether Liz is an offspring of Tom:

?- offspring( liz, tom).

There is no fact about offsprings in the program, therefore the only way to consider this question is to apply the rule about offsprings. The rule is general in the sense that it is applicable to any objects X and Y; therefore it can also be applied to such particular objects as liz and tom. To apply the rule to liz and tom, Y has to be substituted with liz, and X with tom. We say that the variables X and Y become instantiated to:

X = tom and Y = liz

After the instantiation we have obtained a special case of our general rule. The special case is:

offspring( liz, tom) :- parent( tom, liz).

Figure 1.3 Definition graphs for the relations offspring, mother and grandparent in terms of other relations.
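The other two relations diagrammed in Figure 1.3, mother and grandparent, can be written as rules in the same style. A sketch, following the readings used in the text (both clauses reappear in the complete family program later in the chapter):

```prolog
% mother( X, Y): X is the mother of Y --
% X is a parent of Y and X is female.
mother( X, Y) :-
    parent( X, Y),
    female( X).

% grandparent( X, Z): X is a grandparent of Z --
% X is a parent of some Y, and that Y is a parent of Z.
grandparent( X, Z) :-
    parent( X, Y),
    parent( Y, Z).
```

With these rules loaded alongside the parent and female facts, the query ?- mother( X, jim). answers X = pat.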
14 Introduction to Prolog Recursive rules 15
0 0
1.3 Recursive rules
parent
t
0)
\ parent t \
CR
1\ 1\
rr/
Let us add one more relation to our family program, the predecessor relation. This
predeces.sor I
t
relation will be defined in terms of the parent relation. The whole definition can be
expressed with two rules. The first rule will define the direct (immediate) predeces parent 1 pmP�) predecessor
sors and the second rule the indirect predecessors. We say that some Xis an indirect
predecessor of some Z if there is a parentship chain of people between X and Z, as (if predecessor
illustrated in Figure 1.5. In our example of Figure 1.1, Tom is a direct predecessor of
Liz and an indirect predecessor of Pat.
The first rule is simple and can be formulated as: ""'"'<lf/
For all X and Z,
Xis a predecessor of Z if
Xis a parent of Z.
ctJ
This is straightforwardly translated into Prolog as: Figure 1.6 Predecessor-successor pairs at various distances.
predecessor( X, Z) predecessor( X, Z)
parent( X, Z). parent( X, Yl),
parent( Yl, Y2),
The second rule, on the other hand, is more complicated because the chain of parent( Y2, Z).
parents may present some problems. One attempt to define indirect predecessors
predecessor( X, Z)
could be as shown in Figure 1.6 ..-'\ccording to this, the predecessor relation would be parent( X, Yl),
defined by a set of clauses as follows: parent( YI, Y2),
parent( Y2, Y3),
predecessor( X, Z) parent( Y3, Z).
parent( X, Z).
predecessor( X, Z)
parent( X, Y), This program is lengthy and, more importantly, it only works to some extent. It
parent( Y, Z). would only discover predecessors to a certain depth in a family tree because the
length of the chain of people between the predecessor and the successor would be
0+ > predecessor
0+ \ limited according to the length of our predecessor clauses.
There is, however, an elegant and correct formulation of the predecessor relation: it
will be correct in the sense that it will work for predecessors at any depth. The key idea
is to define the predecessor relation in terms of itself. Figure 1.7 illustrates the idea:

    For all X and Z,
    X is a predecessor of Z if
    there is a Y such that
    (1) X is a parent of Y and
    (2) Y is a predecessor of Z.

A Prolog clause with the above meaning is:

    predecessor( X, Z) :-
        parent( X, Y),
        predecessor( Y, Z).

Figure 1.5 Examples of the predecessor relation: (a) X is a direct predecessor of Z;
(b) X is an indirect predecessor of Z. [diagrams omitted]
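It may help at this point to see the recursive definition at work. Below is a sketch of a session, assuming the parent facts of Figure 1.1 are loaded together with both predecessor rules; the order in which the answers appear follows the order of the clauses and facts:

```prolog
?- predecessor( tom, pat).     % Tom -> Bob -> Pat: an indirect predecessor
yes

?- predecessor( X, jim).       % who are Jim's predecessors?
X = pat;                       % by the first (direct) rule
X = pam;                       % by the recursive rule: Pam -> Bob -> Pat -> Jim
X = tom;
X = bob
```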
Figure 1.7 Recursive formulation of the predecessor relation. [diagram omitted:
X is a parent of Y, and Y is a predecessor of Z, hence X is a predecessor of Z]

We have thus constructed a complete program for the predecessor relation, which
consists of two rules: one for direct predecessors and one for indirect predecessors.
Both rules are rewritten together here:

    predecessor( X, Z) :-
        parent( X, Z).

    predecessor( X, Z) :-
        parent( X, Y),
        predecessor( Y, Z).

The key to this formulation was the use of predecessor itself in its definition. Such a
definition may look surprising in view of the question: When defining something,
can we use this same thing that has not yet been completely defined? Such
definitions are, in general, called recursive definitions. Logically, they are perfectly
correct and understandable, which is also intuitively obvious if we look at Figure 1.7.
But will the Prolog system be able to use recursive rules? It turns out that Prolog can
indeed very easily use recursive definitions. Recursive programming is, in fact, one
of the fundamental principles of programming in Prolog. It is not possible to solve
tasks of any significant complexity in Prolog without the use of recursion.

Going back to our program, we can ask Prolog: Who are Pam's successors? That is:
Who is a person that has Pam as his or her predecessor?

    ?- predecessor( pam, X).
    X = bob;
    X = ann;
    X = pat;
    X = jim

Prolog's answers are, of course, correct and they logically follow from our definition
of the predecessor and the parent relation. There is, however, a rather important
question: How did Prolog actually use the program to find these answers?

An informal explanation of how Prolog does this is given in the next section. But
first let us put together all the pieces of our family program, which was extended
gradually by adding new facts and rules. The final form of the program is shown
in Figure 1.8.

    parent( pam, bob).            % Pam is a parent of Bob
    parent( tom, bob).
    parent( tom, liz).
    parent( bob, ann).
    parent( bob, pat).
    parent( pat, jim).

    female( pam).                 % Pam is female
    male( tom).                   % Tom is male
    male( bob).
    female( liz).
    female( ann).
    female( pat).
    male( jim).

    offspring( Y, X) :-           % Y is an offspring of X if
        parent( X, Y).            % X is a parent of Y

    mother( X, Y) :-              % X is the mother of Y if
        parent( X, Y),            % X is a parent of Y and
        female( X).               % X is female

    grandparent( X, Z) :-         % X is a grandparent of Z if
        parent( X, Y),            % X is a parent of Y and
        parent( Y, Z).            % Y is a parent of Z

    sister( X, Y) :-              % X is a sister of Y if
        parent( Z, X),
        parent( Z, Y),            % X and Y have the same parent and
        female( X),               % X is female and
        different( X, Y).         % X and Y are different

    predecessor( X, Z) :-         % Rule pr1: X is a predecessor of Z
        parent( X, Z).

    predecessor( X, Z) :-         % Rule pr2: X is a predecessor of Z
        parent( X, Y),
        predecessor( Y, Z).

Figure 1.8 The family program.

Looking at Figure 1.8, two further points are in order here: the
first will introduce the term 'procedure', the second will be about comments in
programs.

The program in Figure 1.8 defines several relations - parent, male, female,
predecessor, etc. The predecessor relation, for example, is defined by two clauses.
We say that these two clauses are about the predecessor relation. Sometimes it is
convenient to consider the whole set of clauses about the same relation. Such a set of
clauses is called a procedure.

In Figure 1.8, the two rules about the predecessor relation have been distinguished
by the names 'pr1' and 'pr2', added as comments to the program. These names will be
used later as references to these rules. Comments are, in general, ignored by the
Prolog system. They only serve as a further clarification to the person who reads
the program. Comments are distinguished in Prolog from the rest of the program by
being enclosed in special brackets '/*' and '*/'. Thus comments in Prolog look like
this:

    /* This is a comment */

Another method, more practical for short comments, uses the percent character '%'.
Everything between '%' and the end of the line is interpreted as a comment:

    % This is also a comment

If the question contains variables, Prolog also has to find what are the particular
objects (in place of variables) for which the goals are satisfied. The particular
instantiation of variables to these objects is displayed to the user. If Prolog cannot
demonstrate for some instantiation of variables that the goals logically follow from
the program, then Prolog's answer to the question will be 'no'.

An appropriate view of the interpretation of a Prolog program in mathematical
terms is then as follows: Prolog accepts facts and rules as a set of axioms, and the
user's question as a conjectured theorem; then it tries to prove this theorem - that is, to
demonstrate that it can be logically derived from the axioms.

We will illustrate this view by a classical example. Let the axioms be:

    All men are fallible.
    Socrates is a man.

A theorem that logically follows from these two axioms is:

    Socrates is fallible.

The first axiom above can be rewritten as:

    For all X, if X is a man then X is fallible.

Accordingly, the example can be translated into Prolog as follows:
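The translation itself falls just outside this excerpt; from the two axioms stated above, it reads:

```prolog
fallible( X) :-          % All men are fallible:
    man( X).             % for all X, if X is a man then X is fallible

man( socrates).          % Socrates is a man
```

The conjectured theorem is then posed as a question:

```prolog
?- fallible( socrates).
yes
```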
We have thus shown what can be a sequence of steps that satisfy a goal - that is,
make it clear that the goal is true. Let us call this a proof sequence. We have not,
however, shown how the Prolog system actually finds such a proof sequence.

Prolog finds the proof sequence in the inverse order to that which we have just
used. Instead of starting with simple facts given in the program, Prolog starts with
the goals and, using rules, substitutes the current goals with new goals, until new
goals happen to be simple facts. Given the question:

    ?- predecessor( tom, pat).

Prolog will try to satisfy this goal. In order to do so it will try to find a clause in the
program from which the above goal could immediately follow. Obviously, the only
clauses relevant to this end are pr1 and pr2. These are the rules about the predecessor
relation. We say that the heads of these rules match the goal.

The two clauses, pr1 and pr2, represent two alternative ways for Prolog to proceed.
Prolog first tries that clause which appears first in the program:

    predecessor( X, Z) :- parent( X, Z).

Since the goal is predecessor( tom, pat), the variables in the rule must be instantiated
as follows:

    X = tom, Z = pat

The original goal predecessor( tom, pat) is then replaced by a new goal:

    parent( tom, pat)

This step of using a rule to transform a goal into another goal, as above, is
graphically illustrated in Figure 1.9. There is no clause in the program whose head
matches the goal parent( tom, pat), therefore this goal fails. Now Prolog backtracks
to the original goal in order to try an alternative way to derive the top goal
predecessor( tom, pat). The rule pr2 is thus tried:

    predecessor( X, Z) :-
        parent( X, Y),
        predecessor( Y, Z).

As before, the variables X and Z become instantiated as:

    X = tom, Z = pat

But Y is not instantiated yet. The top goal predecessor( tom, pat) is replaced by two
goals:

    parent( tom, Y),
    predecessor( Y, pat)

This executional step is shown in Figure 1.10, which is an extension to the situation
we had in Figure 1.9.

Being now faced with two goals, Prolog tries to satisfy them in the order in which
they are written. The first one is easy as it matches one of the facts in the program.
The matching forces Y to become instantiated to bob. Thus the first goal has been
satisfied, and the remaining goal has become:

    predecessor( bob, pat)

To satisfy this goal the rule pr1 is used again. Note that this (second) application of
the same rule has nothing to do with its previous application. Therefore, Prolog uses
a new set of variables in the rule each time the rule is applied. To indicate this we
shall rename the variables in rule pr1 for this application as follows:

    predecessor( X', Z') :-
        parent( X', Z').

The head has to match our current goal predecessor( bob, pat). Therefore:

    X' = bob, Z' = pat

The current goal is replaced by:

    parent( bob, pat)

This goal is immediately satisfied because it appears in the program as a fact. This
completes the execution trace, which is graphically shown in Figure 1.11.

Figure 1.9 The first step of the execution. The top goal is true if the bottom goal is true:

    predecessor( tom, pat)
        |  by rule pr1
    parent( tom, pat)

Figure 1.10 Execution trace continued from Figure 1.9:

    predecessor( tom, pat)
        |  by rule pr1              |  by rule pr2
    parent( tom, pat)           parent( tom, Y),
        no                      predecessor( Y, pat)
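Most Prolog systems let one watch exactly this kind of goal-by-goal reduction using the built-in debugger, invoked with trace/0. The trace below is only illustrative; the exact output format and port names vary between implementations:

```prolog
?- trace, predecessor( tom, pat).
   Call: predecessor(tom, pat)     % try rule pr1 first
   Call: parent(tom, pat)
   Fail: parent(tom, pat)          % no such fact; backtrack to rule pr2
   Call: parent(tom, _Y)
   Exit: parent(tom, bob)          % Y = bob
   Call: predecessor(bob, pat)     % a fresh copy of rule pr1 is used
   Call: parent(bob, pat)
   Exit: parent(bob, pat)          % a fact in the program
   Exit: predecessor(bob, pat)
   Exit: predecessor(tom, pat)
yes
```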
1.7 Try to understand how Prolog derives answers to the following questions, using
the program of Figure 1.8. Try to draw the corresponding derivation diagrams in the
style of Figures 1.9 to 1.11. Will any backtracking occur at particular questions?

    (a) ?- parent( pam, bob).
    (b) ?- mother( pam, bob).
    (c) ?- grandparent( pam, ann).
    (d) ?- grandparent( bob, jim).

Summary

• Prolog programming consists of defining relations and querying about relations.
• A program consists of clauses. These are of three types: facts, rules and questions.
• A relation can be specified by facts, simply stating the n-tuples of objects that
  satisfy the relation, or by stating rules about the relation.
References

Various implementations of Prolog use different syntactic conventions. However, most of them
follow the tradition of the so-called Edinburgh syntax (also called DEC-10 syntax), established
by the historically influential implementation of Prolog for the DEC-10 computer (Pereira et al.
1978; Bowen 1981). The Edinburgh syntax also forms the basis of the ISO international
standard for Prolog ISO/IEC 13211-1 (Deransart et al. 1996). Major Prolog implementations
now largely comply with the standard. In this book we use a subset of the standard syntax, with
some small and insignificant differences. In rare cases of such differences, there is a note to this
effect at an appropriate place.

Bowen, D.L. (1981) DECsystem-10 Prolog User's Manual. University of Edinburgh: Department of
Artificial Intelligence.

Deransart, P., Ed-Dbali, A. and Cervoni, L. (1996) Prolog: The Standard. Berlin: Springer-Verlag.

Pereira, L.M., Pereira, F. and Warren, D.H.D. (1978) User's Guide to DECsystem-10 Prolog.
University of Edinburgh: Department of Artificial Intelligence.

2 Syntax and Meaning of Prolog Programs

This chapter gives a systematic treatment of the syntax and semantics of basic
concepts of Prolog, and introduces structured data objects. The topics included are:

• simple data objects (atoms, numbers, variables)
• structured objects
• matching as the fundamental operation on objects
• declarative (or non-procedural) meaning of a program
• procedural meaning of a program
• relation between the declarative and procedural meanings of a program
• altering the procedural meaning by reordering clauses and goals.

Most of these topics have already been reviewed in Chapter 1. Here the treatment
will become more formal and detailed.
2.1 Data objects

Figure 2.1 shows a classification of data objects in Prolog. The Prolog system
recognizes the type of an object in the program by its syntactic form. This is possible
because the syntax of Prolog specifies different forms for each type of data object.
We have already seen a method for distinguishing between atoms and variables in
Chapter 1: variables start with upper-case letters whereas atoms start with lower-case
letters. No additional information (such as data-type declaration) has to be com-
municated to Prolog in order to recognize the type of an object.

Figure 2.1 Data objects in Prolog:

    data objects
        simple objects
            constants
                atoms
                numbers
            variables
        structures
2.1.1 Atoms and numbers
In Chapter 1 we have seen some simple examples of atoms and variables. In general,
however, they can take more complicated forms - that is, strings of the following
characters:

• upper-case letters A, B, ..., Z
• lower-case letters a, b, ..., z
• digits 0, 1, 2, ..., 9
• special characters such as + - * / < > = : . &

Atoms can be constructed in three ways:

(1) Strings of letters, digits and the underscore character, '_', starting with a lower-
    case letter:

        anna
        nil
        x25
        x_25
        x_25AB
        x_
        x__y
        alpha_beta_procedure
        miss_Jones
        sarah_jones

(2) Strings of special characters:

        <--->
        ======>

    When using atoms of this form, some care is necessary because some strings of
    special characters already have a predefined meaning; an example is ':-'.

(3) Strings of characters enclosed in single quotes. This is useful if we want, for
    example, to have an atom that starts with a capital letter. By enclosing it in
    quotes we make it distinguishable from variables:

        'Tom'
        'South_America'
        'Sarah Jones'

Numbers used in Prolog include integer numbers and real numbers. The syntax of
integers is simple, as illustrated by the following examples:

    1313    0    -97

Not all integer numbers can be represented in a computer, therefore the range of
integers is limited to an interval between some smallest and some largest number
permitted by a particular Prolog implementation.

We will assume the simple syntax of real numbers, as shown by the following
examples:

    3.14    -0.0035    100.2

Real numbers are not very heavily used in typical Prolog programming. The reason
for this is that Prolog is primarily a language for symbolic, non-numeric computa-
tion. In symbolic computation, integers are often used, for example, to count the
number of items in a list; but there is typically less need for real numbers.

Apart from this lack of necessity to use real numbers in typical Prolog applica-
tions, there is another reason for avoiding real numbers. In general, we want to keep
the meaning of programs as neat as possible. The introduction of real numbers
somewhat impairs this neatness because of numerical errors that arise due to
rounding when doing arithmetic. For example, the evaluation of the expression

    10000 + 0.0001 - 10000

may result in 0 instead of the correct result 0.0001.
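Arithmetic evaluation is performed with the built-in operator is/2 (only introduced later in the book), but it already lets us observe this rounding effect. The exact outcome depends on the floating-point precision of the particular implementation:

```prolog
?- X is 10000 + 0.0001 - 10000.
% X is typically close to, but not exactly, 0.0001;
% on implementations with low-precision floats it may even be 0

?- X is 0.1 + 0.2.
% on systems with 64-bit floats, X = 0.30000000000000004 rather than 0.3
```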
This rule says: for all X, X has a child if X is a parent of some Y. We are defining the
property hasachild which, as it is meant here, does not depend on the name of
the child. Thus, this is a proper place in which to use an anonymous variable. The
clause above can thus be rewritten:

    hasachild( X) :- parent( X, _).

Each time a single underscore character occurs in a clause it represents a new
anonymous variable. For example, we can say that there is somebody who has a
child if there are two objects such that one is a parent of the other:

    somebody_has_child :- parent( _, _).

This is equivalent to:

    somebody_has_child :- parent( X, Y).

But this is, of course, quite different from:

    somebody_has_child :- parent( X, X).

If the anonymous variable appears in a question clause then its value is not output
when Prolog answers the question. If we are interested in people who have children,
but not in the names of the children, then we can simply ask:

    ?- parent( X, _).

The lexical scope of variable names is one clause. This means that, for example, if
the name X15 occurs in two clauses, then it signifies two different variables. But
each occurrence of X15 within the same clause means the same variable. The
situation is different for constants: the same atom always means the same object in
any clause - that is, throughout the whole program.

Note that Day is a variable and can be instantiated to any object at some later point
in the execution.

This method for data structuring is simple and powerful. It is one of the reasons
why Prolog is so naturally applied to problems that involve symbolic manipulation.
Syntactically, all data objects in Prolog are terms. For example,

    may

and

    date( 1, may, 2001)

are terms.

All structured objects can be pictured as trees (see Figure 2.2 for an example). The
root of the tree is the functor, and the offsprings of the root are the components. If a
component is also a structure then it is a subtree of the tree that corresponds to the
whole structured object.

Figure 2.2 Date is an example of a structured object: (a) as it is represented as a tree
(root: the functor date; leaves: the arguments 1, may, 2001); (b) as it is written in
Prolog: date( 1, may, 2001).

Our next example will show how structures can be used to represent some simple
geometric objects (see Figure 2.3). A point in two-dimensional space is defined by its
two coordinates; a line segment is defined by two points; and a triangle can be
defined by three points. Let us choose the following functors: point, seg and
triangle.

[Figure 2.3: points, a line segment and a triangle in the plane, and their
representation as the terms point, seg and triangle; diagram omitted]

If the same name appears in the program in two different roles, as is the case for
point above, the Prolog system will recognize the difference by the number of
arguments, and will interpret this name as two functors: one of them with two
arguments and the other one with three arguments. This is so because each functor
is defined by two things:

(1) the name, whose syntax is that of atoms;
(2) the arity - that is, the number of arguments.

Figure 2.5 A tree structure that corresponds to the arithmetic expression (a + b) * (c - 5):

        *
       / \
      +   -
     / \ / \
    a  b c  5
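The name and arity of any structured term can be inspected with the standard built-in functor/3; a short illustrative session confirms that the two uses of point are treated as distinct functors:

```prolog
?- functor( point(1,1), Name, Arity).
Name = point,
Arity = 2

?- functor( point(1,1,1), Name, Arity).
Name = point,
Arity = 3
```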
    par( r1, r2)
    par( r1, par( r2, r3))
    par( r1, seq( par( r2, r3), r4))

[Figure: the corresponding electric circuits and their tree representations, omitted]

Exercises

2.1 Which of the following are syntactically correct Prolog objects? What kinds of object
are they (atom, number, variable, structure)?

    (a) Diana
    (b) diana
    (c) 'Diana'
    (d) _diana
    (e) 'Diana goes south'
    (f) goes( diana, south)
    (g) 45
    (h) 5( X, Y)
    (i) +( north, west)
    (j) three( Black( Cats) )

2.2 Suggest a representation for rectangles, squares and circles as structured Prolog
objects. Use an approach similar to that in Figure 2.4. For example, a rectangle can
be represented by four points (or maybe three points only). Write some example
terms that represent some concrete objects of these types using the suggested
representation.
• D is instantiated to D1
• M is instantiated to may
• Y1 is instantiated to 2001

This instantiation is more compactly written in the familiar form in which Prolog
outputs results:

    D = D1
    M = may
    Y1 = 2001

On the other hand, the terms date( D, M, 2001) and date( D1, M1, 1444) do not
match, nor do the terms date( X, Y, Z) and point( X, Y, Z).

Matching is a process that takes as input two terms and checks whether they
match. If the terms do not match we say that this process fails. If they do match then
the process succeeds and it also instantiates the variables in both terms to such values
that the terms become identical.

Let us consider again the matching of the two dates. The request for this
operation can be communicated to the Prolog system by the following question,
using the operator '=':

    ?- date( D, M, 2001) = date( D1, may, Y1).

We have already mentioned the instantiation D = D1, M = may, Y1 = 2001, which
achieves the match. There are, however, other instantiations that also make both
terms identical. Two of them are as follows:

    D = 1           D = third
    D1 = 1          D1 = third
    M = may         M = may
    Y1 = 2001       Y1 = 2001

These two instantiations are said to be less general than the first one because they
constrain the values of the variables D and D1 more strongly than necessary. For
making both terms in our example identical, it is only important that D and D1 have
the same value, although this value can be anything. Matching in Prolog always
results in the most general instantiation. This is the instantiation that commits the
variables to the least possible extent, thus leaving the greatest possible freedom for
further instantiations if further matching is required. As an example consider the
following question:

    ?- date( D, M, 2001) = date( D1, may, Y1),
       date( D, M, 2001) = date( 15, M, Y).

To satisfy the first goal, Prolog instantiates the variables as follows:

    D = D1
    M = may
    Y1 = 2001

After having satisfied the second goal, the instantiation becomes more specific as
follows:

    D = 15
    D1 = 15
    M = may
    Y1 = 2001
    Y = 2001

This example also shows that variables, during the execution of consecutive goals,
typically become instantiated to increasingly more specific values.

The general rules to decide whether two terms, S and T, match are as follows:

(1) If S and T are constants then S and T match only if they are the same object.
(2) If S is a variable and T is anything, then they match, and S is instantiated to
    T. Conversely, if T is a variable then T is instantiated to S.
(3) If S and T are structures then they match only if
    (a) S and T have the same principal functor, and
    (b) all their corresponding components match.
    The resulting instantiation is determined by the matching of the components.

The last of these rules can be visualized by considering the tree representation of
terms, as in the example of Figure 2.7. The matching process starts at the root (the
principal functors). As both functors match, the process proceeds to the arguments
where matching of the pairs of corresponding arguments occurs. So the whole
matching process can be thought of as consisting of the following sequence of
(simpler) matching operations:

    triangle = triangle,
    point(1,1) = X,
    A = point(4,Y),
    point(2,3) = point(2,Z).

The whole matching process succeeds because all the matchings in the sequence
succeed. The resulting instantiation is:

    X = point(1,1)
    A = point(4,Y)
    Z = 3
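These rules can be tried out directly with the '=' operator; each of the following queries is decided by one of the three rules (a few illustrative examples; the rule numbers in the comments refer to the list above):

```prolog
?- ann = ann.          % rule 1: identical constants match
yes

?- ann = pat.          % rule 1: different constants do not
no

?- X = f(a, Y).        % rule 2: a variable matches any term
X = f(a, Y)

?- date(D, M, 2001) = date(15, may, Y).   % rule 3: same functor and arity,
D = 15,                                   % corresponding components match
M = may,
Y = 2001

?- date(D, M) = date(15, may, 2001).      % rule 3 fails: the arities differ
no
```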
Figure 2.7 Matching triangle( point(1,1), A, point(2,3) ) = triangle( X, point(4,Y),
point(2,Z) ). [tree diagrams omitted]

The following example will illustrate how matching alone can be used for
interesting computation. Let us return to the simple geometric objects of Figure
2.4, and define a piece of program for recognizing horizontal and vertical line
segments. 'Vertical' is a property of segments, so it can be formalized in Prolog as a
unary relation. Figure 2.8 helps to formulate this relation. A segment is vertical if the
x-coordinates of its end-points are equal, otherwise there is no other restriction on
the segment. The property 'horizontal' is similarly formulated, with only x and y
interchanged. The following program, consisting of two facts, does the job:

    vertical( seg( point(X,Y), point(X,Y1))).

    horizontal( seg( point(X,Y), point(X1,Y))).

Figure 2.8 Illustration of vertical and horizontal line segments. [diagram omitted]

The following conversation is possible with this program:

    ?- vertical( seg( point(1,1), point(1,2))).
    yes

    ?- vertical( seg( point(1,1), point(2,Y))).
    no

    ?- horizontal( seg( point(1,1), point(2,Y))).
    Y = 1

The first question was answered 'yes' because the goal in the question matched one of
the facts in the program. For the second question no match was possible. In the third
question, Y was forced to become 1 by matching the fact about horizontal segments.

A more general question to the program is: Are there any vertical segments that
start at the point (2,3)?

    ?- vertical( seg( point(2,3), P)).
    P = point(2, Y)

This answer means: Yes, any segment that ends at any point (2,Y), which means
anywhere on the vertical line x = 2. It should be noted that Prolog's actual answer
would probably not look as neat as above, but (depending on the Prolog
implementation used) something like this:

    P = point(2,_136)

This is, however, only a cosmetic difference. Here _136 is a variable that has not been
instantiated. _136 is a legal variable name that the system has constructed during the
execution. The system has to generate new names in order to rename the user's
variables in the program. This is necessary for two reasons: first, because the same
name in different clauses signifies different variables, and second, in successive
applications of the same clause, its 'copy' with a new set of variables is used each
time.

Another interesting question to our program is: Is there a segment that is both
vertical and horizontal?

    ?- vertical( S), horizontal( S).
    S = seg( point(X,Y), point(X,Y))

This answer by Prolog says: Yes, any segment that is degenerated to a point has the
property of being vertical and horizontal at the same time. The answer was, again,
derived simply by matching. As before, some internally generated names may
appear in the answer, instead of the variable names X and Y.
Exercises

2.3 Will the following matching operations succeed or fail? If they succeed, what are the
resulting instantiations of variables?

    (a) point( A, B) = point( 1, 2)
    (b) point( A, B) = point( X, Y, Z)
    (c) plus( 2, 2) = 4
    (d) +( 2, D) = +( E, 2)
    (e) triangle( point(-1,0), P2, P3) = triangle( P1, point(1,0), point(0,Y) )

    The resulting instantiation defines a family of triangles. How would you
    describe this family?

2.4 Using the representation for line segments as described in this section, write a term
that represents any vertical line segment at x = 5.

2.5 Assume that a rectangle is represented by the term rectangle( P1, P2, P3, P4) where
the P's are the vertices of the rectangle positively ordered. Define the relation:

    regular( R)

which is true if R is a rectangle whose sides are vertical and horizontal.

Thus the difference between the declarative readings and the procedural ones is
that the latter do not only define the logical relations between the head of the clause
and the goals in the body, but also the order in which the goals are processed.

Let us now formalize the declarative meaning.

The declarative meaning of programs determines whether a given goal is true,
and if so, for what values of variables it is true. To precisely define the declarative
meaning we need to introduce the concept of instance of a clause. An instance of a
clause C is the clause C with each of its variables substituted by some term. A variant
of a clause C is such an instance of the clause C where each variable is substituted by
another variable. For example, consider the clause:

    hasachild( X) :- parent( X, Y).

Two variants of this clause are:

    hasachild( A) :- parent( A, B).
    hasachild( X1) :- parent( X1, X2).

Instances of this clause are:

    hasachild( peter) :- parent( peter, Z).
    hasachild( barry) :- parent( barry, small(caroline) ).

Given a program and a goal G, the declarative meaning says:

    A goal G is true (that is, satisfiable, or logically follows from the program) if and
    only if there is a clause C in the program and a clause instance I of C such that
    the head of I is identical to G and all the goals in the body of I are true.
The comma binds stronger than the semicolon. So the clause:

    P :- Q, R; S, T, U.

is understood as:

    P :- ( Q, R); ( S, T, U).

and means the same as the clauses:

    P :- Q, R.
    P :- S, T, U.

2.8 Rewrite the following program without using the semicolon notation.

    translate( Number, Word) :-
        Number = 1, Word = one;
        Number = 2, Word = two;
        Number = 3, Word = three.
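As an illustration with a program that appears later in this chapter (clauses 7 and 8 of Figure 2.10), the two clauses for the dark relation can be collapsed into a single clause using the semicolon; both forms have the same declarative meaning:

```prolog
% Two clauses ...
dark( Z) :- black( Z).
dark( Z) :- brown( Z).

% ... are equivalent to one clause with a disjunction:
dark( Z) :-
    black( Z);
    brown( Z).
```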
process, and can be skipped without seriously affecting the understanding of the rest
of the book.

Particular operations in the goal execution process are illustrated by the example
in Figure 2.10. It may be helpful to study Figure 2.10 before reading the following
general description.

To execute a list of goals:

    G1, G2, ..., Gm

the procedure execute does the following:

• If the goal list is empty then terminate with success.
• If the goal list is not empty then continue with (the following) operation
  called 'SCANNING'.
• SCANNING: Scan through the clauses in the program from top to bottom
  until the first clause, C, is found such that the head of C matches the first goal
  G1. If there is no such clause then terminate with failure.
  If there is such a clause C of the form

      H :- B1, ..., Bn.

  then rename the variables in C to obtain a variant C' of C, such that C' and
  the list G1, ..., Gm have no common variables. Let C' be

      H' :- B1', ..., Bn'.

  Match G1 and H'; let the resulting instantiation of variables be S.
  In the goal list G1, G2, ..., Gm, replace G1 with the list B1', ..., Bn',
  obtaining a new goal list

      B1', ..., Bn', G2, ..., Gm

  (Note that if C is a fact then n = 0 and the new goal list is shorter than the
  original one; such shrinking of the goal list may eventually lead to the empty
  list and thereby a successful termination.)
  Substitute the variables in this new goal list with new values as specified in
  the instantiation S, obtaining another goal list

      B1'', ..., Bn'', G2', ..., Gm'

• Execute (recursively with this same procedure) this new goal list. If the
  execution of this new goal list terminates with success then terminate
  the execution of the original goal list also with success. If the execution
  of the new goal list is not successful then abandon this new goal list and go
  back to SCANNING through the program. Continue the scanning with the
  clause that immediately follows the clause C (C is the clause that was last
  used) and try to find a successful termination using some other clause.

PROGRAM

    big( bear).          % Clause 1
    big( elephant).      % Clause 2
    small( cat).         % Clause 3
    brown( bear).        % Clause 4
    black( cat).         % Clause 5
    gray( elephant).     % Clause 6

    dark( Z) :-          % Clause 7: Anything black is dark
        black( Z).

    dark( Z) :-          % Clause 8: Anything brown is dark
        brown( Z).

QUESTION

    ?- dark( X), big( X).    % Who is dark and big?

EXECUTION TRACE

(1) Initial goal list: dark( X), big( X).

(2) Scan the program from top to bottom looking for a clause whose head matches the first goal
    dark( X). Clause 7 found:

        dark( Z) :- black( Z).

    Replace the first goal by the instantiated body of clause 7, giving a new goal list:

        black( X), big( X)

(3) Scan the program to find a match with black( X). Clause 5 found: black( cat). This clause has no
    body, so the goal list, properly instantiated, shrinks to:

        big( cat)

(4) Scan the program for the goal big( cat). No clause found. Therefore backtrack to step (3) and undo
    the instantiation X = cat. Now the goal list is again:

        black( X), big( X)

    Continue scanning the program below clause 5. No clause found. Therefore backtrack to step (2) and
    continue scanning below clause 7. Clause 8 is found:

        dark( Z) :- brown( Z).

    Replace the first goal in the goal list by brown( X), giving:

        brown( X), big( X)

(5) Scan the program to match brown( X), finding brown( bear). This clause has no body, so the goal
    list shrinks to:

        big( bear)

(6) Scan the program and find clause big( bear). It has no body so the goal list shrinks to empty. This
    indicates successful termination, and the corresponding variable instantiation is:

        X = bear

Figure 2.10 An example to illustrate the procedural meaning of Prolog: a sample trace of the procedure
execute.
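The procedure execute can itself be sketched in Prolog as a so-called vanilla meta-interpreter. This is not from the book's text; it is a compact illustration built on the standard built-in clause/2, which retrieves a renamed copy of a program clause whose head matches its first argument, so that matching, renaming and backtracking are inherited from Prolog itself (some systems require the interpreted predicates to be declared dynamic, and built-in goals are not handled by this sketch):

```prolog
solve( true) :- !.             % the empty goal list: success
solve( ( Goal1, Goal2)) :- !,  % a conjunction of goals: solve them in order
    solve( Goal1),
    solve( Goal2).
solve( Goal) :-                % a single goal: SCANNING is done by clause/2,
    clause( Goal, Body),       % which finds a clause whose head matches Goal;
    solve( Body).              % the instantiated body becomes the new goal list
```

The cuts (!) merely prevent the catch-all third clause from being tried on goals already handled by the first two; cut itself is only introduced later in the book. Applied to the program of Figure 2.10, the query solve( dark( X)) succeeds with X = cat, and on backtracking with X = bear.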
This procedure is more compactly written in a Pascal-like notation in Figure 2.11.
Several additional remarks are in order here regarding the procedure execute as presented. First, it was not explicitly described how the final resulting instantiation of variables is produced. It is the instantiation S which led to a successful termination, and was possibly further refined by additional instantiations that were done in the nested recursive calls to execute.
Whenever a recursive call to execute fails, the execution returns to SCANNING, continuing at the program clause C that had been last used before. As the application of the clause C did not lead to a successful termination, Prolog has to try an alternative clause to proceed. What effectively happens is that Prolog abandons this whole part of the unsuccessful execution and backtracks to the point (clause C) where this failed branch of the execution was started. When the procedure backtracks to a certain point, all the variable instantiations that were done after that point are undone. This ensures that Prolog systematically examines all the possible alternative paths of execution until one is found that eventually succeeds, or until all of them have been shown to fail.
We have already seen that even after a successful termination the user can force the system to backtrack to search for more solutions. In our description of execute this detail was left out.
Of course, in actual implementations of Prolog, several other refinements have to be added to execute. One of them is to reduce the amount of scanning through the program clauses to improve efficiency. So a practical Prolog implementation will not scan through all the clauses of the program, but will only consider the clauses about the relation in the current goal.

procedure execute( Program, GoalList, Success);
Input arguments:
  Program: list of clauses
  GoalList: list of goals
Output argument:
  Success: truth value; Success will become true if GoalList is true with respect to Program
Local variables:
  Goal: goal
  OtherGoals: list of goals
  Satisfied: truth value
  MatchOK: truth value
  Instant: instantiation of variables
  H, H', B1, B1', ..., Bn, Bn': goals
Auxiliary functions:
  empty(L): returns true if L is the empty list
  head(L): returns the first element of list L
  tail(L): returns the rest of L
  append(L1,L2): appends list L2 at the end of list L1
  match(T1,T2,MatchOK,Instant): tries to match terms T1 and T2; if it succeeds
    then MatchOK is true and Instant is the corresponding instantiation of variables
  substitute(Instant,Goals): substitutes variables in Goals according to instantiation Instant

begin
  if empty(GoalList) then Success := true
  else
    begin
      Goal := head(GoalList);
      OtherGoals := tail(GoalList);
      Satisfied := false;
      while not Satisfied and "more clauses in program" do
        begin
          Let next clause in Program be
            H :- B1, ..., Bn.
          Construct a variant of this clause
            H' :- B1', ..., Bn'.
          match(Goal,H',MatchOK,Instant);
          if MatchOK then
            begin
              NewGoals := append([B1',...,Bn'], OtherGoals);
              NewGoals := substitute(Instant,NewGoals);
              execute(Program,NewGoals,Satisfied)
            end
        end;
      Success := Satisfied
    end
end;

Figure 2.11 Executing Prolog goals.

Exercise

2.9 Consider the program in Figure 2.10 and simulate, in the style of Figure 2.10, Prolog's execution of the question:
?- big( X), dark( X).
Compare your execution trace with that of Figure 2.10 when the question was essentially the same, but with the goals in the order:
?- dark( X), big( X).
In which of the two cases does Prolog have to do more work before the answer is found?
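The procedure of Figure 2.11 can be imitated outside Prolog. The sketch below is my own Python approximation, not the book's code: terms are tuples such as ("dark", "X"), strings starting with an upper-case letter play the role of variables, matching is simplified (no occurs check), and backtracking is modelled by iterating over a generator.

```python
# A Python approximation of the procedure execute of Figure 2.11 (a sketch,
# not the book's code).

def is_var(t):
    # strings starting with an upper-case letter stand for Prolog variables
    return isinstance(t, str) and t[:1].isupper()

def deref(t, s):
    # follow variable bindings in the instantiation s
    while is_var(t) and t in s:
        t = s[t]
    return t

def match(t1, t2, s):
    # like match(T1, T2, MatchOK, Instant): return an extended
    # instantiation, or None if the two terms cannot be matched
    t1, t2 = deref(t1, s), deref(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return {**s, t1: t2}
    if is_var(t2):
        return {**s, t2: t1}
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            s = match(a, b, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    # construct a variant of a clause with fresh variable names
    if is_var(t):
        return t + "_" + str(n)
    if isinstance(t, tuple):
        return tuple(rename(a, n) for a in t)
    return t

def execute(program, goals, s, depth=0):
    # yield every instantiation that makes the goal list true;
    # backtracking corresponds to resuming the iteration
    if not goals:
        yield s                        # empty goal list: success
        return
    goal, rest = goals[0], goals[1:]
    for head, body in program:         # SCANNING through the clauses
        head = rename(head, depth)
        body = [rename(b, depth) for b in body]
        s2 = match(goal, head, s)
        if s2 is not None:
            yield from execute(program, body + rest, s2, depth + 1)

# The program of Figure 2.10
program = [
    (("big", "bear"), []), (("big", "elephant"), []),
    (("brown", "bear"), []), (("black", "cat"), []),
    (("dark", "Z"), [("black", "Z")]),   # dark(Z) :- black(Z).
    (("dark", "Z"), [("brown", "Z")]),   # dark(Z) :- brown(Z).
]

# ?- dark(X), big(X).
solutions = [deref("X", s)
             for s in execute(program, [("dark", "X"), ("big", "X")], {})]
print(solutions)   # the only answer is X = bear
```

Running it on the question of Figure 2.10 yields the single answer X = bear, after first trying and abandoning the dark(Z) :- black(Z) branch, just as in the trace.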
2.5 Example: monkey and banana
(1) Monkey is at door.
(2) Monkey is on floor.
(3) Box is at window.
(4) Monkey does not have banana.
It is convenient to combine all of these four pieces of information into one structured object. Let us choose the word 'state' as the functor to hold the four components together. Figure 2.12 shows the initial state represented as a structured object.

[Figure 2.12 shows a tree with root state and leaves atdoor, onfloor, atwindow, hasnot.]
Figure 2.12 The initial state of the monkey world represented as a structured object. The four components are: horizontal position of monkey, vertical position of monkey, position of box, monkey has or has not banana.

Our problem can be viewed as a one-person game. Let us now formalize the rules of the game. First, the goal of the game is a situation in which the monkey has the banana; that is, any state in which the last component is 'has':
state( _, _, _, has)
A move is represented by the relation move( State1, Move, State2), where State1 is the state before the move, Move is the move executed and State2 is the state after the move.
The move 'grasp', with its necessary precondition on the state before the move, can be defined by the clause:
move( state( middle, onbox, middle, hasnot),    % Before move
      grasp,                                    % Move
      state( middle, onbox, middle, has) ).     % After move
This fact says that after the move the monkey has the banana, and he has remained on the box in the middle of the room.
In a similar way we can express the fact that the monkey on the floor can walk from any horizontal position Pos1 to any position Pos2. The monkey can do this regardless of the position of the box and whether it has the banana or not. All this can be defined by the following Prolog fact:
move( state( Pos1, onfloor, Box, Has),
      walk( Pos1, Pos2),                        % Walk from Pos1 to Pos2
      state( Pos2, onfloor, Box, Has) ).
Note that this clause says many things, including, for example:
• the move executed was 'walk from some position Pos1 to some position Pos2';
• the monkey is on the floor before and after the move;
• the box is at some point Box which remained the same after the move;
• the 'has banana' status Has remains the same after the move.
The clause actually specifies a whole set of possible moves because it is applicable to any situation that matches the specified state before the move. Such a specification is therefore sometimes also called a move schema. Using Prolog variables, such schemas can be easily programmed in Prolog.
The other two types of moves, 'push' and 'climb', can be similarly specified.
The main kind of question that our program will have to answer is: Can the monkey in some initial state State get the banana? This can be formulated as a predicate
canget( State)
where the argument State is a state of the monkey world. The program for canget can be based on two observations:
(1) For any state in which the monkey already has the banana, the predicate canget must certainly be true; no move is needed in this case. This corresponds to the Prolog fact:
canget( state( _, _, _, has) ).
(2) In other cases one or more moves are necessary. The monkey can get the banana in any state State1 if there is some move Move from State1 to some state State2, such that the monkey can then get the banana in state State2 (in zero or more moves). This principle is illustrated in Figure 2.13. A Prolog clause that corresponds to this rule is:
canget( State1) :-
  move( State1, Move, State2),
  canget( State2).
This completes our program, which is shown in Figure 2.14.

[Figure 2.13 shows State1 connected by Move to State2, which eventually leads to a state with 'has'; canget holds in State1 if it holds in State2.]
Figure 2.13 Recursive formulation of canget.

% move( State1, Move, State2): making Move in State1 results in State2;
% a state is represented by a term:
%    state( MonkeyHorizontal, MonkeyVertical, BoxPosition, HasBanana)

move( state( middle, onbox, middle, hasnot),    % Before move
      grasp,                                    % Grasp banana
      state( middle, onbox, middle, has) ).     % After move

move( state( P, onfloor, P, H),
      climb,                                    % Climb box
      state( P, onbox, P, H) ).

move( state( P1, onfloor, P1, H),
      push( P1, P2),                            % Push box from P1 to P2
      state( P2, onfloor, P2, H) ).

move( state( P1, onfloor, B, H),
      walk( P1, P2),                            % Walk from P1 to P2
      state( P2, onfloor, B, H) ).

% canget( State): monkey can get banana in State
canget( state( _, _, _, has) ).                 % can 1: Monkey already has it
canget( State1) :-                              % can 2: Do some work to get it
  move( State1, Move, State2),                  % Do something
  canget( State2).                              % Get it now

Figure 2.14 A program for the monkey and banana problem.

The formulation of canget is recursive and is similar to that of the predecessor relation of Chapter 1 (compare Figures 2.13 and 1.7). This principle is used in Prolog again and again.
We have developed our monkey and banana program in the non-procedural way. Let us now study its procedural behaviour by considering the following question to the program:
?- canget( state( atdoor, onfloor, atwindow, hasnot) ).
Prolog's answer is 'yes'. The process carried out by Prolog to reach this answer proceeds, according to the procedural semantics of Prolog, through a sequence of goal lists. It involves some search for the right moves among the possible alternative moves. At some point this search will take a wrong move leading to a dead branch. At this stage, backtracking will help it to recover. Figure 2.15 illustrates this search process.
To answer the question Prolog had to backtrack once only. A right sequence of moves was found almost straight away. The reason for this efficiency of the program was the order in which the clauses about the move relation occurred in the program. The order in our case (luckily) turned out to be quite suitable. However, less lucky orderings are possible. According to the rules of the game, the monkey could just as easily try to walk here or there without ever touching the box, or aimlessly push the box around. A more thorough investigation will reveal, as shown in the following section, that the ordering of clauses is, in the case of our program, in fact critical.
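The program of Figure 2.14 can also be explored with a Python sketch (mine, not the book's). It encodes the four move schemas as a function and searches breadth-first with a set of visited states, so that, unlike Prolog's depth-first strategy, no clause ordering can cause looping:

```python
# A Python sketch of the monkey-and-banana search (an approximation of the
# Prolog program in Figure 2.14; the breadth-first strategy and the visited
# set are my additions, since Prolog itself searches depth-first).
from collections import deque

POSITIONS = ["atdoor", "atwindow", "middle"]

def moves(state):
    # yield (Move, State2) pairs, mirroring the four move/3 clauses
    horiz, vert, box, has = state
    if state == ("middle", "onbox", "middle", "hasnot"):
        yield "grasp", ("middle", "onbox", "middle", "has")
    if vert == "onfloor" and horiz == box:
        yield "climb", (horiz, "onbox", box, has)
    if vert == "onfloor":
        for p2 in POSITIONS:
            if horiz == box:
                yield ("push", horiz, p2), (p2, "onfloor", p2, has)
            yield ("walk", horiz, p2), (p2, "onfloor", box, has)

def canget(state):
    seen, queue = {state}, deque([state])
    while queue:
        s = queue.popleft()
        if s[3] == "has":           # can 1: monkey already has it
            return True
        for _, s2 in moves(s):      # can 2: make a move, then get it
            if s2 not in seen:
                seen.add(s2)
                queue.append(s2)
    return False

print(canget(("atdoor", "onfloor", "atwindow", "hasnot")))   # True, as in the text
```

The found solution follows the same walk, push, climb, grasp sequence that the Prolog program discovers.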
[Figure 2.15 shows the search tree: from state( atdoor, onfloor, atwindow, hasnot), the move walk( atdoor, P2) leads to state( P2, onfloor, atwindow, hasnot). Trying climb first (with P2 = atwindow) leads to state( atwindow, onbox, atwindow, hasnot), where no move is possible, so Prolog backtracks and tries push( P2, P2'), giving state( P2', onfloor, P2', hasnot); climb then gives state( P2', onbox, P2', hasnot), and grasp (with P2' = middle) gives state( middle, onbox, middle, has).]
Figure 2.15 The monkey's search for the banana. The search starts at the top node and proceeds downwards, as indicated. Alternative moves are tried in the left-to-right order. Backtracking occurred once only.

2.6.1 Danger of indefinite looping

Consider the following clause:
p :- p.
This says that 'p is true if p is true'. This is declaratively perfectly correct, but procedurally is quite useless. In fact, such a clause can cause problems to Prolog. Consider the question:
?- p.
Using the clause above, the goal p is replaced by the same goal p; this will be in turn replaced by p, etc. In such a case Prolog will enter an infinite loop not noticing that no progress is being made.
This example is a simple way of getting Prolog to loop indefinitely. However, similar looping could have occurred in some of our previous example programs if we changed the order of clauses, or the order of goals in the clauses. It will be instructive to consider some examples.
In the monkey and banana program, the clauses about the move relation were ordered thus: grasp, climb, push, walk (perhaps 'unclimb' should be added for completeness). These clauses say that grasping is possible, climbing is possible, etc. According to the procedural semantics of Prolog, the order of clauses indicates that the monkey prefers grasping to climbing, climbing to pushing, etc. This order of preferences in fact helps the monkey to solve the problem. But what could happen if the order was different? Let us assume that the 'walk' clause appears first. The execution of our original goal of the previous section
?- canget( state( atdoor, onfloor, atwindow, hasnot) ).
would this time produce the following trace. The first four goal lists (with variables appropriately renamed) are the same as before:
(1) canget( state( atdoor, onfloor, atwindow, hasnot))
The second clause of canget ('can2') is applied, producing:
(2) move( state( atdoor, onfloor, atwindow, hasnot), M', S2'),
    canget( S2')
By the move walk( atdoor, P2') we get:
(3) canget( state( P2', onfloor, atwindow, hasnot))
Applying the clause 'can2' we obtain:
(4) move( state( P2', onfloor, atwindow, hasnot), M'', S2''),
    canget( S2'')
Now the difference occurs. The first clause whose head matches the first goal above is now 'walk' (and not 'climb' as before). The instantiation is
S2'' = state( P2'', onfloor, atwindow, hasnot)
Therefore the goal list becomes:
(5) canget( state( P2'', onfloor, atwindow, hasnot))
Applying the clause 'can2' we obtain:
(6) move( state( P2'', onfloor, atwindow, hasnot), M''', S2''' ),
    canget( S2''' )
Again, 'walk' is now tried first, producing:
(7) canget( state( P2''', onfloor, atwindow, hasnot) )
Let us now compare the goals (3), (5) and (7). They are the same apart from one variable; this variable is, in turn, P2', P2'' and P2'''. As we know, the success of a goal does not depend on particular names of variables in the goal. This means that from goal list (3) the execution trace shows no progress. We can see, in fact, that the same two clauses, 'can2' and 'walk', are used repetitively. The monkey walks around without ever trying to use the box. As there is no progress made this will (theoretically) go on for ever: Prolog will not realize that there is no point in continuing along this line.
This example shows Prolog trying to solve a problem in such a way that a solution is never reached, although a solution exists. Such situations are not unusual in Prolog programming. Infinite loops are, also, not unusual in other programming languages. What is unusual in comparison with other languages is that a Prolog program may be declaratively correct, but at the same time be procedurally incorrect in that it is not able to produce an answer to a question. In such cases Prolog may not be able to satisfy a goal because it tries to reach an answer by choosing a wrong path.
A natural question to ask at this point is: Can we not make some more substantial change to our program so as to drastically prevent any danger of looping? Or shall we always have to rely just on a suitable ordering of clauses and goals? As it turns out, programs, especially large ones, would be too fragile if they just had to rely on some suitable ordering. There are several other methods that preclude infinite loops, and these are much more general and robust than the ordering method itself. These techniques will be used regularly later in the book, especially in those chapters that deal with path finding, problem solving and search.

2.6.2 Program variations through reordering of clauses and goals

Already in the example programs of Chapter 1 there was a latent danger of producing a cycling behaviour. Our program to specify the predecessor relation in Chapter 1 was:
predecessor( Parent, Child) :-
  parent( Parent, Child).
predecessor( Predecessor, Successor) :-
  parent( Predecessor, Child),
  predecessor( Child, Successor).
Let us analyze some variations of this program. All the variations will clearly have the same declarative meaning, but not the same procedural meaning. According to the declarative semantics of Prolog we can, without affecting the declarative meaning, change:
(1) the order of clauses in the program, and
(2) the order of goals in the bodies of clauses.
The predecessor procedure consists of two clauses, and one of them has two goals in the body. There are, therefore, four variations of this program, all with the same declarative meaning. The four variations are obtained by:
(1) swapping both clauses, and
(2) swapping the goals for each order of clauses.
The corresponding four procedures, called pred1, pred2, pred3 and pred4, are shown in Figure 2.16.

% Four versions of the predecessor program
% The original version
pred1( X, Z) :-
  parent( X, Z).
pred1( X, Z) :-
  parent( X, Y),
  pred1( Y, Z).

% Variation a: swap clauses of the original version
pred2( X, Z) :-
  parent( X, Y),
  pred2( Y, Z).
pred2( X, Z) :-
  parent( X, Z).

% Variation b: swap goals in second clause of the original version
pred3( X, Z) :-
  parent( X, Z).
pred3( X, Z) :-
  pred3( X, Y),
  parent( Y, Z).

% Variation c: swap goals and clauses of the original version
pred4( X, Z) :-
  pred4( X, Y),
  parent( Y, Z).
pred4( X, Z) :-
  parent( X, Z).

Figure 2.16 Four versions of the predecessor program.
There are important differences in the behaviour of these four declaratively equivalent procedures. To demonstrate these, consider the parent relation as shown in Figure 1.1 of Chapter 1. Now, what happens if we ask whether Tom is a predecessor of Pat using the four variations of the predecessor relation:
?- pred1( tom, pat).
yes
?- pred2( tom, pat).
yes
?- pred3( tom, pat).
yes
?- pred4( tom, pat).
In the last case Prolog cannot find the answer. This is manifested on the terminal by a Prolog message such as 'More core needed' or 'Stack overflow'.
Figure 1.11 in Chapter 1 showed the trace of pred1 (in Chapter 1 called predecessor) produced for the above question. Figure 2.17 shows the corresponding traces for pred2, pred3 and pred4. Figure 2.17(c) clearly shows that pred4 is hopeless, and Figure 2.17(a) indicates that pred2 is rather inefficient compared to pred1: pred2 does much more searching and backtracking in the family tree.

[Figure 2.17(a): the trace of pred2( tom, pat). The goal is repeatedly reduced via parent( X, Y'), pred2( Y', pat), with Y' = bob and further branches; several dead branches (for example parent( pat, pat) and parent( jim, pat)) fail before the answer yes is found.]
Figure 2.17 The behaviour of three formulations of the predecessor relation on the question: Is Tom a predecessor of Pat?

This comparison should remind us of a general practical heuristic in problem solving: it is usually best to try the simplest idea first. In our case, all the versions of the predecessor relation are based on two ideas:
(1) the simpler idea is to check whether the two arguments of the predecessor relation satisfy the parent relation;
(2) the more complicated idea is to find somebody 'between' both people (somebody who is related to them by the parent and predecessor relations).
Of the four variations of the predecessor relation, pred1 does simplest things first. On the contrary, pred4 always tries complicated things first. pred2 and pred3 are in between the two extremes. Even without a detailed study of the execution traces, pred1 should be preferred merely on the grounds of the rule 'try simple things first'. This rule will be in general a useful guide in programming.
Our four variations of the predecessor procedure can be further compared by considering the question: What types of questions can particular variations answer, and what types can they not answer? It turns out that pred1 and pred2 are both able to reach an answer for any type of question about predecessors; pred4 can never reach an answer; and pred3 sometimes can and sometimes cannot. One example in which pred3 fails is:
?- pred3( liz, jim).
This question again brings the system into an infinite sequence of recursive calls. Thus pred3 also cannot be considered procedurally correct.

2.6.3 Combining declarative and procedural views

The foregoing section has shown that the order of goals and clauses does matter. Furthermore, there are programs that are declaratively correct, but do not work in practice. Such discrepancies between the declarative and procedural meaning may
appear annoying. One may argue: Why not simply forget about the declarative meaning? This argument can be brought to an extreme with a clause such as:
predecessor( X, Z) :- predecessor( X, Z).
which is declaratively correct, but is completely useless as a working program.
The reason why we should not forget about the declarative meaning is that progress in programming technology is achieved by moving away from procedural details toward declarative aspects, which are normally easier to formulate and understand. The system itself, not the programmer, should carry the burden of filling in the procedural details. Prolog does help toward this end, although, as we have seen in this section, it only helps partially: sometimes it does work out the procedural details itself properly, and sometimes it does not. The philosophy adopted by many is that it is better to have at least some declarative meaning rather than none ('none' is the case in most other programming languages). The practical aspect of this view is that it is often rather easy to get a working program once we have a program that is declaratively correct. Consequently, a useful practical approach that often works is to concentrate on the declarative aspects of the problem, then test the resulting program, and if it fails procedurally try to rearrange the clauses and goals into a suitable order.

[Figure 2.17(b): the trace of pred3( tom, pat). parent( tom, pat) fails; the second clause then finds pred3( tom, Y') with Y' = bob via parent( tom, Y'), and parent( bob, pat) succeeds, giving yes.]
[Figure 2.17(c): the trace of pred4( tom, pat). The goal pred4( tom, Y) keeps producing new goals pred4( tom, Y'), pred4( tom, Y''), ... without ever reaching an answer.]
Figure 2.17 contd

2.7 The relation between Prolog and logic

Prolog is related to mathematical logic, so its syntax and meaning can be specified most concisely with references to logic. Prolog is indeed often defined that way. However, such an introduction to Prolog assumes that the reader is familiar with certain concepts of mathematical logic. These concepts are, on the other hand, certainly not necessary for understanding and using Prolog as a programming tool, which is the aim of this book. For the reader who is especially interested in the relation between Prolog and logic, the following are some basic links to mathematical logic, together with some appropriate references.
Prolog's syntax is that of the first-order predicate logic formulas written in the so-called clause form (a conjunctive normal form in which quantifiers are not explicitly written), and further restricted to Horn clauses only (clauses that have at most one positive literal). Clocksin and Mellish (1987) give a Prolog program that transforms a first-order predicate calculus formula into the clause form. The procedural meaning of Prolog is based on the resolution principle for mechanical theorem proving introduced by Robinson in his classic paper (1965). Prolog uses a special strategy for resolution theorem proving called SLD. An introduction to the first-order predicate calculus and resolution-based theorem proving can be found in several general books on artificial intelligence (Genesereth and Nilsson 1987; Ginsberg 1993; Poole et al. 1998; Russell and Norvig 1995; see also Flach 1994). Mathematical
questions regarding the properties of Prolog's procedural meaning with respect to logic are analyzed by Lloyd (1991).
Matching in Prolog corresponds to what is called unification in logic. However, we avoid the word unification because matching, for efficiency reasons in most Prolog systems, is implemented in a way that does not exactly correspond to unification (see Exercise 2.10). But from the practical point of view this approximation to unification is quite adequate. Proper unification requires the so-called occurs check: does a given variable occur in a given term? The occurs check would make matching inefficient.

Exercise

2.10 What happens if we ask Prolog:
?- X = f( X).
Should this request for matching succeed or fail? According to the definition of unification in logic this should fail, but what happens according to our definition of matching in Section 2.2? Try to explain why many Prolog implementations answer the question above with:
X = f(f(f(f(f(f(f(f(f(f(f(f(f(f( ...

Summary

So far we have covered a kind of basic Prolog, also called 'pure Prolog'. It is 'pure' because it corresponds closely to formal logic. Extensions whose aim is to tailor the language toward some practical needs will be covered later in the book (Chapters 3, 5, 6, 7). Important points of this chapter are:
• Simple objects in Prolog are atoms, variables and numbers. Structured objects, or structures, are used to represent objects that have several components.
• Structures are constructed by means of functors. Each functor is defined by its name and arity.
• The type of object is recognized entirely by its syntactic form.
• The lexical scope of variables is one clause. Thus the same variable name in two clauses means two different variables.
• Structures can be naturally pictured as trees. Prolog can be viewed as a language for processing trees.
• The matching operation takes two terms and tries to make them identical by instantiating the variables in both terms.
• Matching, if it succeeds, results in the most general instantiation of variables.
• The declarative semantics of Prolog defines whether a goal is true with respect to a given program, and if it is true, for what instantiation of variables it is true.
• A comma between goals means the conjunction of goals. A semicolon between goals means the disjunction of goals.
• The procedural semantics of Prolog is a procedure for satisfying a list of goals in the context of a given program. The procedure outputs the truth or falsity of the goal list and the corresponding instantiations of variables. The procedure automatically backtracks to examine alternatives.
• The declarative meaning of programs in 'pure Prolog' does not depend on the order of clauses and the order of goals in clauses.
• The procedural meaning does depend on the order of goals and clauses. Thus the order can affect the efficiency of the program; an unsuitable order may even lead to infinite recursive calls.
• Given a declaratively correct program, changing the order of clauses and goals can improve the program's efficiency while retaining its declarative correctness. Reordering is one method of preventing indefinite looping.
• There are other more general techniques, apart from reordering, to prevent indefinite looping and thereby make programs procedurally robust.
• Concepts discussed in this chapter are:
  data objects: atom, number, variable, structure
  term
  functor, arity of a functor
  principal functor of a term
  matching of terms
  most general instantiation
  declarative semantics
  instance of a clause, variant of a clause
  procedural semantics
  executing goals

References

Clocksin, W.F. and Mellish, C.S. (1987) Programming in Prolog, second edition. Berlin: Springer-Verlag.
Flach, P. (1994) Simply Logical: Intelligent Reasoning by Example. Chichester, UK: Wiley.
Genesereth, M.R. and Nilsson, N.J. (1987) Logical Foundations of Artificial Intelligence. Palo Alto, CA: Morgan Kaufmann.
Ginsberg, M. (1993) Essentials of Artificial Intelligence. San Francisco, CA: Morgan Kaufmann.
Lloyd, J.W. (1991) Foundations of Logic Programming, second edition. Berlin: Springer-Verlag.
Poole, D., Mackworth, A. and Goebel, R. (1998) Computational Intelligence: A Logical Approach. New York: Oxford University Press.
Robinson, A.J. (1965) A machine-oriented logic based on the resolution principle. JACM 12: 23-41.

chapter 3

Lists, Operators, Arithmetic
In this chapter we will study a special notation for lists, one of the simplest and most
useful structures, and some programs for typical operations on lists. We will also
look at simple arithmetic and the operator notation, which often improves the
readability of programs. Basic Prolog of Chapter 2, extended with these three
additions, becomes a convenient framework for writing interesting programs.

3.1 Representation of lists
The list is a simple data structure widely used in non-numeric programming. A list is
a sequence of any number of items, such as ann, tennis, tom, skiing. Such a list can be
written in Prolog as:
[ ann, tennis, tom, skiing]
This is, however, only the external appearance of lists. As we have already seen in
Chapter 2, all structured objects in Prolog are trees. Lists are no exception to this.
How can a list be represented as a standard Prolog object? We have to consider
two cases: the list is either empty or non-empty. In the first case, the list is simply
written as a Prolog atom, []. In the second case, the list can be viewed as consisting
of two things:
(1) the first item, called the head of the list;
(2) the remaining part of the list, called the tail.
This list has the empty list as its tail:
[ skiing] = .( skiing, [] )

[Figure 3.1 shows a binary tree: root ann, with tennis, tom, skiing nested below and [] as the final tail.]
Figure 3.1 Tree representation of the list [ ann, tennis, tom, skiing].

This example shows how the general principle for structuring data objects in Prolog also applies to lists of any length. As our example also shows, the straightforward notation with dots and possibly deep nesting of subterms in the tail part can produce rather confusing expressions. This is the reason why Prolog provides the neater notation for lists, so that they can be written as sequences of items enclosed in square brackets. A programmer can use both notations, but the square bracket notation is, of course, normally preferred. We will be aware, however, that this is only a notational convenience, as lists are internally represented as binary trees. For a list L with head a and tail Tail we could write:
Tail = [b,c] and L = .( a, Tail)
To express this in the square bracket notation for lists, Prolog provides another notational extension, the vertical bar, which separates the head and the tail:
L = [ a | Tail]
The vertical bar notation is in fact more general: we can list any number of elements followed by '|' and the list of remaining items. Thus alternative ways of writing the above list are:
[a,b,c] = [a | [b,c] ] = [a,b | [c] ] = [a,b,c | [] ]
To summarize:
• A list is a data structure that is either empty or consists of two parts: a head and a tail. The tail itself has to be a list.
• Lists are handled in Prolog as a special case of binary trees. For improved readability Prolog provides a special notation for lists, thus accepting lists written as:
[ Item1, Item2, ... ]
or
[ Head | Tail]
or
[ Item1, Item2, ... | Others]
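To see the internal head/tail structure concretely, here is a small Python sketch of my own, in which the functor '.' is written as the first component of a tuple:

```python
# A Python sketch (not from the book) of the internal representation of
# Prolog lists: the empty list is the atom [], and a non-empty list is the
# term .(Head, Tail).
EMPTY = "[]"

def cons(head, tail):
    return (".", head, tail)          # the term .(Head, Tail)

def from_items(items):
    # build .(a, .(b, .(c, []))) from [a, b, c]
    lst = EMPTY
    for x in reversed(items):
        lst = cons(x, lst)
    return lst

def to_items(lst):
    # read the elements back off the head/tail chain
    out = []
    while lst != EMPTY:
        _, head, lst = lst
        out.append(head)
    return out

L = from_items(["a", "b", "c"])
print(L)            # ('.', 'a', ('.', 'b', ('.', 'c', '[]')))
print(to_items(L))  # ['a', 'b', 'c']
```

Printing L makes the "deep nesting of subterms in the tail part" mentioned above directly visible.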
3.2 Some operations on lists
member( [b,c], [a,[b,c]] )
is true. The program for the membership relation can be based on the following observation:
X is a member of L if either:
(1) X is the head of L, or
(2) X is a member of the tail of L.
This can be written in two clauses; the first is a simple fact and the second is a rule:
member( X, [X | Tail] ).
member( X, [Head | Tail] ) :-
  member( X, Tail).

[Figure 3.2 shows the concatenation of a list [X | L1] with a list L2, giving [X | L3], where L3 is the concatenation of L1 and L2.]
Figure 3.2 Concatenation of lists.
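The two clauses of member translate almost word for word into a recursive function. The following is a Python sketch of mine, with Python lists standing in for Prolog lists:

```python
# A Python sketch of the member relation (mine, not the book's); Python
# lists stand in for Prolog lists.
def member(x, lst):
    if not lst:                 # the empty list has no members
        return False
    head, tail = lst[0], lst[1:]
    return x == head or member(x, tail)   # clause 1 or clause 2

print(member("b", ["a", "b", "c"]))           # True
print(member(["b", "c"], ["a", ["b", "c"]]))  # True, as in the text
print(member("d", ["a", "b", "c"]))           # False
```

As in Prolog, the head of the list may itself be a list, so member( [b,c], [a,[b,c]]) succeeds.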
Although the conc program looks rather simple it can be used flexibly in many other ways. For example, we can use conc in the inverse direction for decomposing a given list into two lists, as follows:

?- conc( L1, L2, [a,b,c] ).
L1 = []
L2 = [a,b,c];

L1 = [a]
L2 = [b,c];

L1 = [a,b]
L2 = [c];

L1 = [a,b,c]
L2 = [];
no

It is possible to decompose the list [a,b,c] in four ways, all of which were found by our program through backtracking.

We can also use our program to look for a certain pattern in a list. For example, we can find the months that precede and the months that follow a given month, as in the following goal:

?- conc( Before, [may | After],
         [jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec] ).
Before = [jan,feb,mar,apr]
After = [jun,jul,aug,sep,oct,nov,dec].

Further we can find the immediate predecessor and the immediate successor of May by asking:

?- conc( _, [Month1,may,Month2 | _],
         [jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec] ).
Month1 = apr
Month2 = jun

Further still, we can, for example, delete from some list, L1, everything that follows three successive occurrences of z in L1 together with the three z's. For example:

?- L1 = [a,b,z,z,c,z,z,z,d,e],
   conc( L2, [z,z,z | _], L1).
L1 = [a,b,z,z,c,z,z,z,d,e]
L2 = [a,b,z,z,c]

We have already programmed the membership relation. Using conc, however, the membership relation could be elegantly programmed by the clause:

member1( X, L) :-
    conc( L1, [X | L2], L).

This clause says: X is a member of list L if L can be decomposed into two lists so that the second one has X as its head. Of course, member1 defines the same relation as member. We have just used a different name to distinguish between the two implementations. Note that the above clause can be written using anonymous variables as:

member1( X, L) :-
    conc( _, [X | _], L).

It is interesting to compare both implementations of the membership relation, member and member1. member has a rather straightforward procedural meaning, which is as follows:

To check whether some X is a member of some list L:
(1) first check whether the head of L is equal to X, and then
(2) check whether X is a member of the tail of L.

On the other hand, the declarative reading of member1 is straightforward, but its procedural meaning is not so obvious. An interesting exercise is to find how member1 actually computes something. An example execution trace will give some idea: let us consider the question:

?- member1( b, [a,b,c] ).

Figure 3.3 shows the execution trace. From the trace we can infer that member1 behaves similarly to member. It scans the list, element by element, until the item in question is found or the list is exhausted.

Exercises

3.1 (a) Write a goal, using conc, to delete the last three elements from a list L producing another list L1. Hint: L is the concatenation of L1 and a three-element list.

    (b) Write a goal to delete the first three elements and the last three elements from a list L producing list L2.

3.2 Define the relation

    last( Item, List)

    so that Item is the last element of a list List. Write two versions: (a) using the conc relation, (b) without conc.
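All of these goals rely on the conc relation defined earlier in the chapter. For reference, its usual two-clause definition is:

```prolog
% conc( L1, L2, L3): L3 is the concatenation of lists L1 and L2
conc( [], L, L).

conc( [X | L1], L2, [X | L3] ) :-
    conc( L1, L2, L3).
```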
The new list is simply [X | L]. So we actually need no procedure for adding a new element in front of the list. Nevertheless, if we want to define such a procedure explicitly, it can be written as the fact:

add( X, L, [X | L] ).

In general, the operation of inserting X at any place in some list List giving BiggerList can be defined by the clause:

insert( X, List, BiggerList) :-
    del( X, BiggerList, List).
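The del relation used in this clause was defined earlier in the chapter; its usual two-clause definition, for reference, and an example of insert enumerating all insertion points through backtracking:

```prolog
% del( X, List, NewList): NewList is List with one occurrence of X removed
del( X, [X | Tail], Tail).

del( X, [Y | Tail], [Y | Tail1] ) :-
    del( X, Tail, Tail1).

% ?- insert( a, [1,2], L).
% L = [a,1,2];
% L = [1,a,2];
% L = [1,2,a]
```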
In member1 we elegantly implemented the membership relation by using conc. We can also use del to test for membership. The idea is simple: some X is a member of List if X can be deleted from List:

member2( X, List) :-
    del( X, List, _ ).

As we have seen before, the conc relation can be used for decomposing lists. Using conc, the sublist relation can be expressed in Prolog as:

sublist( S, L) :-
    conc( L1, L2, L),
    conc( S, L3, L2).

Figure 3.4 The member and sublist relations.

Of course, the sublist procedure can be used flexibly in several ways. Although it was designed to check if some list occurs as a sublist within another list it can also be used, for example, to find all sublists of a given list.

The program for permutation can be, again, based on the consideration of two cases. Two Prolog clauses that correspond to these two cases are:

permutation( [], [] ).

permutation( [X | L], P) :-
    permutation( L, L1),
    insert( X, L1, P).

This program can be used, for example, as follows:

?- permutation( [a,b,c], P).
P = [a,b,c];
P = [a,c,b];
P = [b,a,c];
...
Figure 3.5 One way of constructing a permutation of the list [X | L]: L1 is a permutation of L; insert X to obtain a permutation of [X | L].

Another attempt to use permutation is:

?- permutation( L, [a,b,c] ).

Our first version, permutation, will now instantiate L successfully to all six permutations. If the user then requests more solutions, the program would never answer 'no' because it would get into an infinite loop trying to find another permutation when there is none. Our second version, permutation2, will in this case find only the first (identical) permutation and then immediately get into an infinite loop. Thus, some care is necessary when using these permutation programs.

Exercises

3.3 Define two predicates, evenlength( List) and oddlength( List), so that they are true if their argument is a list of even or odd length respectively. For example, the list [a,b,c,d] is 'evenlength' and [a,b,c] is 'oddlength'.

3.4 Define the relation

    reverse( List, ReversedList)

    so that ReversedList is the list List in reverse order. Use the following as an auxiliary relation:

3.5 Define the relation

    subset( Set, Subset)

    where Set and Subset are two lists representing two sets. We would like to be able to use this relation not only to check for the subset relation, but also to generate all possible subsets of a given set. For example:
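The second version referred to here, permutation2, is defined earlier in the chapter; it is based on deleting an element at a time, essentially as follows (a sketch, assuming del as defined before):

```prolog
permutation2( [], [] ).

permutation2( L, [X | P] ) :-
    del( X, L, L1),
    permutation2( L1, P).
```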
?- subset( [a,b,c], S).
S = [a,b,c];
S = [a,b];
S = [a,c];
S = [a];
S = [b,c];
S = [b];

3.9 Define the relation

    dividelist( List, List1, List2)

    so that the elements of List are partitioned between List1 and List2, and List1 and List2 are of approximately the same length. For example, dividelist( [a,b,c,d,e], [a,c,e], [b,d] ).

3.10 Rewrite the monkey and banana program of Chapter 2 as the relation

    canget( State, Actions)

    to answer not just 'yes' or 'no', but to produce a sequence of monkey's actions represented as a list of moves. For example:

    Actions = [ walk(door,window), push(window,middle), climb, grasp]

3.11 Define the relation

    flatten( List, FlatList)

    where List can be a list of lists, and FlatList is List 'flattened' so that the elements of List's sublists (or sub-sublists) are reorganized as one plain list. For example:

    ?- flatten( [a,b,[c,d],[],[[[e]]],f], L ).
    L = [a,b,c,d,e,f]

3.3 Operator notation

In mathematics we are used to writing expressions like

    2*a + b*c

where + and * are operators, and 2, a, b, c are arguments. In particular, + and * are said to be infix operators because they appear between the two arguments. Such expressions can be represented as trees, as in Figure 3.6, and can be written as Prolog terms with + and * as functors:

    +( *(2,a), *(b,c) )

Figure 3.6 Tree representation of the expression 2*a + b*c.

Since we would normally prefer to have such expressions written in the usual, infix style with operators, Prolog caters for this notational convenience. Prolog will therefore accept our expression written simply as:

    2*a + b*c

This will be, however, only the external representation of this object, which will be automatically converted into the usual form of Prolog terms. Such a term will be output for the user, again, in its external, infix form.

Thus operators in Prolog are merely a notational extension. If we write a + b, Prolog will handle it exactly as if it had been written +(a,b). In order that Prolog properly understands expressions such as a + b*c, Prolog has to know that * binds stronger than +. We say that * has higher precedence than +. So the precedence of operators decides what is the correct interpretation of expressions. For example, the expression a + b*c can be, in principle, understood either as

    +( a, *(b,c))

or as

    *( +(a,b), c)

The general rule is that the operator with the highest precedence is the principal functor of the term. If expressions containing + and * are to be understood according to our normal conventions, then + has to have a higher precedence than *. Then the expression a + b*c means the same as a + (b*c). If another interpretation is intended, then it has to be explicitly indicated by parentheses - for example, (a + b)*c.

A programmer can define his or her own operators. So, for example, we can define the atoms has and supports as infix operators and then write in the program facts like:

    peter has information.
    floor supports table.

These facts are exactly equivalent to:

    has( peter, information).
    supports( floor, table).
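The correspondence between infix notation and the underlying terms can be observed directly with unification; for example, in a typical session:

```prolog
?- a + b*c = X + Y.      % the principal functor is +
X = a
Y = b*c

?- (a + b)*c = X*Y.      % parentheses make * the principal functor
X = a+b
Y = c
```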
A programmer can define new operators by inserting into the program special kinds of clauses, sometimes called directives, which act as operator definitions. An operator definition must appear in the program before any expression containing that operator. For our example, the operator has can be properly defined by the directive:

:- op( 600, xfx, has).

This tells Prolog that we want to use 'has' as an operator, whose precedence is 600 and its type is 'xfx', which is a kind of infix operator. The form of the specifier 'xfx' suggests that the operator, denoted by 'f', is between the two arguments denoted by 'x'.

Notice that operator definitions do not specify any operation or action. In principle, no operation on data is associated with an operator (except in very special cases). Operators are normally used, as functors, only to combine objects into structures and not to invoke actions on data, although the word 'operator' appears to suggest an action.

Operator names are atoms. An operator's precedence must be in some range which depends on the implementation. We will assume that the range is between 1 and 1200.

There are three groups of operator types which are indicated by type specifiers such as xfx. The three groups are:

(1) infix operators of three types:
    xfx  xfy  yfx

(2) prefix operators of two types:
    fx  fy

(3) postfix operators of two types:
    xf  yf

The specifiers are chosen so as to reflect the structure of the expression where 'f' represents the operator and 'x' and 'y' represent arguments. An 'f' appearing between the arguments indicates that the operator is infix. The prefix and postfix specifiers have only one argument, which follows or precedes the operator respectively.

There is a difference between 'x' and 'y'. To explain this we need to introduce the notion of the precedence of an argument. If an argument is enclosed in parentheses or it is an unstructured object then its precedence is 0; if an argument is a structure then its precedence is equal to the precedence of its principal functor. 'x' represents an argument whose precedence must be strictly lower than that of the operator. 'y' represents an argument whose precedence is lower or equal to that of the operator.

These rules help to disambiguate expressions with several operators of the same precedence. For example, the expression

    a - b - c

is normally understood as (a - b) - c, and not as a - (b - c). To achieve the normal interpretation the operator ' - ' has to be defined as yfx. Figure 3.7 shows why the second interpretation is then ruled out.

Figure 3.7 Two interpretations of the expression a - b - c assuming that ' - ' has precedence 500. If ' - ' is of type yfx, then interpretation 2 is invalid because the precedence of b - c is not less than the precedence of ' - '.

As another example consider the prefix operator not. If not is defined as fy then the expression

    not not p

is legal; but if not is defined as fx then this expression is illegal because the argument to the first not is not p, which has the same precedence as not itself. In this case the expression has to be written with parentheses:

    not( not p)

For convenience, some operators are predefined in the Prolog system so that they can be readily used, and no definition is needed for them. What these operators are and what their precedences are depends on the implementation of Prolog. We will assume that this set of 'standard' operators is as if defined by the clauses in Figure 3.8. The operators in this figure are a subset of those defined in the Prolog standard, plus the operator not. As Figure 3.8 also shows, several operators can be declared by one clause if they all have the same precedence and if they are all of the same type. In this case the operators' names are written as a list.

The use of operators can greatly improve the readability of programs. As an example let us assume that we are writing a program for manipulating Boolean expressions. In such a program we may want to state, for example, one of de Morgan's equivalence theorems, which can in mathematics be written as:

    ~(A & B) <=> ~A v ~B

One way to state this in Prolog is by the clause:

    equivalence( not( and( A, B)), or( not( A), not( B))).
op( 1200, xfx, [ :-, --> ] ).
op( 1200, fx, [ :-, ?- ] ).
op( 1100, xfy, ';' ).
op( 1050, xfy, -> ).
op( 1000, xfy, ',' ).
op( 900, fy, [ not, '\+' ] ).
op( 700, xfx, [ =, \=, ==, \==, =.. ] ).
op( 700, xfx, [ is, =:=, =\=, <, =<, >, >=, @<, @=<, @>, @>= ] ).
op( 500, yfx, [ +, - ] ).
op( 400, yfx, [ *, /, //, mod ] ).
op( 200, xfx, ** ).
op( 200, xfy, ^ ).
op( 200, fy, - ).

Figure 3.8 A set of predefined operators.

Figure 3.9 Interpretation of the term ~(A & B) <===> ~A v ~B.
However, it is in general a good programming practice to try to retain as much resemblance as possible between the original problem notation and the notation used in the program. In our example, this can be achieved almost completely by using operators. A suitable set of operators for our purpose can be defined as:

:- op( 800, xfx, <===>).
:- op( 700, xfy, v).
:- op( 600, xfy, &).
:- op( 500, fy, ~).

Now the de Morgan's theorem can be written as the fact:

~(A & B) <===> ~A v ~B.

According to our specification of operators above, this term is understood as shown in Figure 3.9.

To summarize:

• The readability of programs can often be improved by using the operator notation. Operators can be infix, prefix or postfix.

• In principle, no operation on data is associated with an operator except in special cases. Operator definitions do not define any action, they only introduce new notation. Operators, as functors, only hold together components of structures.

• A programmer can define his or her own operators. Each operator is defined by its name, precedence and type.

• The precedence is an integer within some range, usually between 1 and 1200. The operator with the highest precedence in the expression is the principal functor of the expression. Operators with lowest precedence bind strongest.

• The type of an operator depends on two things: (1) the position of the operator with respect to the arguments, and (2) the precedence of the arguments compared to the precedence of the operator itself. In a specifier like xfy, x indicates an argument whose precedence is strictly lower than that of the operator; y indicates an argument whose precedence is less than or equal to that of the operator.

Exercises

3.12 Assuming the operator definitions

    op( 300, xfx, plays).
    op( 200, xfy, and).

    then the following two terms are syntactically legal objects:

    Term1 = jimmy plays football and squash
    Term2 = susan plays tennis and basketball and volleyball

    How are these terms understood by Prolog? What are their principal functors and what is their structure?

3.13 Suggest an appropriate definition of operators ('was', 'of', 'the') to be able to write clauses like

    diana was the secretary of the department.

    and then ask Prolog:

    ?- Who was the secretary of the department.
    Who = diana

    ?- diana was What.
    What = the secretary of the department
3.14 Consider the program:

    t( 0+1, 1+0).
    t( X+0+1, X+1+0).
    t( X+1+1, Z) :-
        t( X+1, X1),
        t( X1+1, Z).

    How will this program answer the following questions if '+' is an infix operator of type yfx (as usual):

    (a) ?- t( 0+1, A).
    (b) ?- t( 0+1+1, B).
    (c) ?- t( 1+0+1+1+1, C).
    (d) ?- t( D, 1+1+1+0).

3.15 In the previous section, relations involving lists were written as:

    member( Element, List),
    conc( List1, List2, List3),
    del( Element, List, NewList), ...

    Suppose that we would prefer to write these relations as:

    Element in List,
    concatenating List1 and List2 gives List3,
    deleting Element from List gives NewList, ...

    Define 'in', 'concatenating', 'and', etc. as operators to make this possible. Also, redefine the corresponding procedures.

3.4 Arithmetic

Some of the predefined operators can be used for basic arithmetic operations. These are:

    +    addition
    -    subtraction
    *    multiplication
    /    division
    **   power
    //   integer division
    mod  modulo, the remainder of integer division

Notice that this is an exceptional case in which an operator may in fact invoke an operation. But even in such cases an additional indication to perform arithmetic will be necessary. The following question is a naive attempt to request arithmetic computation:

?- X = 1 + 2.

Prolog will 'quietly' answer

X = 1+2

and not X = 3 as we might possibly expect. The reason is simple: the expression 1 + 2 merely denotes a Prolog term where + is the functor and 1 and 2 are its arguments. There is nothing in the above goal to force Prolog to actually activate the addition operation. A special predefined operator, is, is provided to circumvent this problem. The is operator will force evaluation. So the right way to invoke arithmetic is:

?- X is 1 + 2.

Now the answer will be:

X = 3

The addition here was carried out by a special procedure that is associated with the operator is. We call such procedures built-in procedures.

Different implementations of Prolog may use somewhat different notations for arithmetics. For example, the '/' operator may denote integer division or real division. In this book, '/' denotes real division, the operator // denotes integer division, and mod denotes the remainder. Accordingly, the question:

?- X is 5/2,
   Y is 5//2,
   Z is 5 mod 2.

is answered by:

X = 2.5
Y = 2
Z = 1

The left argument of the is operator is a simple object. The right argument is an arithmetic expression composed of arithmetic operators, numbers and variables. Since the is operator will force the evaluation, all the variables in the expression must already be instantiated to numbers at the time of execution of this goal. The precedence of the predefined arithmetic operators (see Figure 3.8) is such that the associativity of arguments with operators is the same as normally in mathematics. Parentheses can be used to indicate different associations. Note that +, -, *, / and // are defined as yfx, which means that evaluation is carried out from left to right. For example,

X is 5 - 2 - 1
is interpreted as:

X is (5 - 2) - 1

Prolog implementations usually also provide standard functions such as sin(X), cos(X), atan(X), log(X), exp(X), etc. These functions can appear to the right of the operator is.

Arithmetic is also involved when comparing numerical values. We can, for example, test whether the product of 277 and 37 is greater than 10000 by the goal:

?- 277 * 37 > 10000.
yes

Note that, similarly to is, the '>' operator also forces the evaluation.

Suppose that we have in the program a relation born that relates the names of people with their birth years. Then we can retrieve the names of people born between 1980 and 1990 inclusive with the following question:

?- born( Name, Year),
   Year >= 1980,
   Year =< 1990.

Notice the difference between the matching operator '=' and '=:='; for example, in the goals X = Y and X =:= Y. The first goal will cause the matching of the objects X and Y, and will, if X and Y match, possibly instantiate some variables in X and Y. There will be no evaluation. On the other hand, X =:= Y causes the arithmetic evaluation and cannot cause any instantiation of variables. These differences are illustrated by the following examples:

?- 1 + 2 =:= 2 + 1.
yes

?- 1 + 2 = 2 + 1.
no

?- 1 + A = B + 2.
A = 2
B = 1

Let us further illustrate the use of arithmetic operations by two simple examples. The first is computing the greatest common divisor; the second, counting the items in a list.

Given two positive integers, X and Y, their greatest common divisor, D, can be found according to three cases:

(1) If X and Y are equal then D is equal to X.
(2) If X < Y then D is equal to the greatest common divisor of X and the difference Y - X.
(3) If Y < X then do the same as in case (2) with X and Y interchanged.

It can be easily shown by an example that these three rules actually work. Choosing, for example, X = 20 and Y = 25, the above rules would give D = 5 after a sequence of subtractions.

These rules can be formulated into a Prolog program by defining a three-argument relation, say:

gcd( X, Y, D)

The three cases then correspond to three clauses:

gcd( X, X, X).

gcd( X, Y, D) :-
    X < Y,
    Y1 is Y - X,
    gcd( X, Y1, D).

gcd( X, Y, D) :-
    Y < X,
    gcd( Y, X, D).

Of course, the last goal in the third clause could be equivalently replaced by the two goals:

X1 is X - Y,
gcd( X1, Y, D).

Our next example involves counting, which usually requires some arithmetic. An example of such a task is to establish the length of a list; that is, we have to count the items in the list. Let us define the procedure:

length( List, N)

which will count the elements in a list List and instantiate N to their number. As was the case with our previous relations involving lists, it is useful to consider two cases:

(1) If the list is empty then its length is 0.
(2) If the list is not empty then List = [Head | Tail]; then its length is equal to 1 plus the length of the tail Tail.
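With the three gcd clauses as stated, the example from the text can be checked directly:

```prolog
?- gcd( 20, 25, D).
D = 5
```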
These two cases correspond to the following program:

length( [], 0).

length( [_ | Tail], N) :-
    length( Tail, N1),
    N is 1 + N1.

An application of length can be:

?- length( [a,b,[c,d],e], N).
N = 4

Note that in the second clause of length, the two goals of the body cannot be swapped. The reason for this is that N1 has to be instantiated before the goal:

N is 1 + N1

can be processed. With the built-in procedure is, a relation has been introduced that is sensitive to the order of processing and therefore the procedural considerations have become vital.

It is interesting to see what happens if we try to program the length relation without the use of is. Such an attempt can be:

length1( [], 0).

length1( [_ | Tail], N) :-
    length1( Tail, N1),
    N = 1 + N1.

Now the goal:

?- length1( [a,b,[c,d],e], N).

will produce the answer:

N = 1+(1+(1+(1+0)))

The addition was never explicitly forced and was therefore not carried out at all. But in length1 we can, unlike in length, swap the goals in the second clause:

length1( [_ | Tail], N) :-
    N = 1 + N1,
    length1( Tail, N1).

This version of length1 will produce the same result as the original version. It can also be written shorter, as follows,

length1( [_ | Tail], 1 + N) :-
    length1( Tail, N).

still producing the same result. We can, however, use length1 to find the number of elements in a list as follows:

?- length1( [a,b,c], N), Length is N.
N = 1+(1+(1+0))
Length = 3

Finally we note that the predicate length is often provided as a built-in predicate.

To summarize:

• Built-in procedures can be used for doing arithmetic.

• Arithmetic operations have to be explicitly requested by the built-in procedure is. There are built-in procedures associated with the predefined operators +, -, *, /, // and mod.

• At the time that evaluation is carried out, all arguments must already be instantiated to numbers.

• The values of arithmetic expressions can be compared by operators such as <, =<, etc. These operators force the evaluation of their arguments.

Exercises

3.16 Define the relation

    max( X, Y, Max)

    so that Max is the greater of two numbers X and Y.

3.17 Define the predicate

    maxlist( List, Max)

    so that Max is the greatest number in the list of numbers List.

3.18 Define the predicate

    sumlist( List, Sum)

    so that Sum is the sum of a given list of numbers List.

3.19 Define the predicate

    ordered( List)

    which is true if List is an ordered list of numbers. For example,

    ordered( [1,5,6,6,9,12] ).

3.20 Define the predicate

    subsum( Set, Sum, SubSet)

    so that Set is a list of numbers, SubSet is a subset of these numbers, and the sum of the numbers in SubSet is Sum. For example:
?- subsum( [1,2,5,3,2], 5, Sub).
Sub = [1,2,2];
Sub = [2,3];
Sub = [5];

3.21 Define the procedure

    between( N1, N2, X)

    which, for two given integers N1 and N2, generates through backtracking all the integers X that satisfy the constraint N1 =< X =< N2.

3.22 Define the operators 'if', 'then', 'else' and ':=' so that the following becomes a legal term:

    if X > Y then Z := X else Z := Y

    Choose the precedences so that 'if' will be the principal functor. Then define the relation 'if' as a small interpreter for a kind of 'if-then-else' statement of the form

    if Val1 > Val2 then Var := Val3 else Var := Val4

    where Val1, Val2, Val3 and Val4 are numbers (or variables instantiated to numbers) and Var is a variable. The meaning of the 'if' relation should be: if the value of Val1 is greater than the value of Val2 then Var is instantiated to Val3, otherwise to Val4. Here is an example of the use of this interpreter:

    ?- X = 2, Y = 3,
       Val2 is 2*X,
       Val4 is 4*X,
       if Y > Val2 then Z := Y else Z := Val4,
       if Z > 5 then W := 1 else W := 0.

    X = 2
    Y = 3
    Z = 8
    W = 1
    Val2 = 4
    Val4 = 8

Summary

• The list is a frequently used structure. It is either empty or consists of a head and a tail which is a list as well. Prolog provides a special notation for lists.

• Common operations on lists, programmed in this chapter, are: list membership, concatenation, adding an item, deleting an item, sublist.

• The operator notation allows the programmer to tailor the syntax of programs toward particular needs. Using operators the readability of programs can be greatly improved.

• New operators are defined by the directive op, stating the name of an operator, its type and precedence.

• In principle, there is no operation associated with an operator; operators are merely a syntactic device providing an alternative syntax for terms.

• Arithmetic is done by built-in procedures. Evaluation of an arithmetic expression is forced by the procedure is and by the comparison predicates <, =<, etc.

• Concepts introduced in this chapter are:
    list, head of list, tail of list
    list notation
    operators, operator notation
    infix, prefix and suffix operators
    precedence of an operator
    arithmetic built-in procedures
Retrieving structured information from a database 89
chapter 4

Using Structures: Example Programs

Data structures, with matching, backtracking and arithmetic, are a powerful programming tool. In this chapter we will develop the skill of using this tool through programming examples: retrieving structured information from a database, simulating a non-deterministic automaton, travel planning, and eight queens on the chessboard. We will also see how the principle of data abstraction can be carried out in Prolog. The programming examples in this chapter can be read selectively.

Figure 4.1 The structure of a family, built from the functors family and person.

Our database would then be comprised of a sequence of facts like this describing all families that are of interest to our program.

Prolog is, in fact, a very suitable language for retrieving the desired information from such a database. One nice thing about Prolog is that we can refer to objects without actually specifying all the components of these objects. We can merely indicate the structure of objects that we are interested in, and leave the particular components in the structures unspecified or only partially specified. Figure 4.2 shows some examples. So we can refer to all Armstrong families by:

family( person( _, armstrong, _, _), _, _)
Figure 4.2 Specifying objects by their structural properties: (a) any Armstrong family; (b) any family with exactly three children; (c) any family with at least three children. Structure (c) makes provision for retrieving the wife's name through the instantiation of the variables Name and Surname.

We can use these utilities, for example, in the following queries to the database:

• Find all children born in 2000:

    ?- child( X),
       dateofbirth( X, date( _, _, 2000) ).

• Find all employed wives:

    ?- wife( person( Name, Surname, _, works( _, _) ) ).

• Find the names of unemployed people who were born before 1973:

    ?- exists( person( Name, Surname, date( _, _, Year), unemployed) ),
       Year < 1973.

• Find people born before 1960 whose salary is less than 8000:

    ?- exists( Person),
       dateofbirth( Person, date( _, _, Year) ),
       Year < 1960,
       salary( Person, Salary),
       Salary < 8000.

Let the length relation count the number of elements of a list, as defined in Section 3.4. Then we can specify all families that have an income per family member of less than 2000 by:

    ?- family( Husband, Wife, Children),
       total( [Husband, Wife | Children], Income),
       length( [Husband, Wife | Children], N),    % N is size of family
       Income/N < 2000.

Let us now define some relations through which the user can access particular components of a family without knowing the details of Figure 4.1. Such relations can be called selectors as they select particular components. The name of such a selector relation will be the name of the component to be selected. The relation will have two arguments: first, the object that contains the component, and second, the component itself:

    selector_relation( Object, Component_selected)

One such selector, for example, retrieves the salary of an employed person:

    salary( person( _, _, _, works( _, S) ), S).    % Salary of working person

As a result of such a query (here, one looking for two members of the Fox family), the variables Person1, Person2 and Family are instantiated as:

    Person1 = person( tom, fox, _, _)
    Person2 = person( jim, fox, _, _)
    Family = family( person( tom, fox, _, _), _, [ _, person( jim, fox, _, _) | _ ] )

The use of selector relations also makes programs easier to modify. Imagine that we would like to improve the efficiency of a program by changing the representation of data. All we have to do is to change the definitions of the selector relations, and the rest of the program will work unchanged with the new representation.
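The remaining selectors follow the same pattern; a minimal sketch, assuming the family( Husband, Wife, Children) structure used in the queries above (clause names chosen for illustration):

```prolog
husband( family( Husband, _, _), Husband).

wife( family( _, Wife, _), Wife).

children( family( _, _, ChildList), ChildList).

% A selector can be built on top of other selectors:
firstchild( Family, First) :-
    children( Family, [First | _]).
```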
Figure 4.3 An example of a non-deterministic finite automaton.
Exercise

4.3 Complete the definition of nthchild by defining the relation

    nth_member( N, List, X)

    which is true if X is the Nth member of List.

This move is said to be silent because it occurs without any reading of input, and the observer, viewing the automaton as a black box, will not be able to notice that any transition has occurred. The state s3 is double circled, which indicates that it is a final state. The automaton is said to accept the input string if there is a transition path in the graph such that it starts with the initial state, ends in a final state, and the arc labels along the path correspond to the complete input string.

The example automaton of Figure 4.3 can be described by the following facts:

final( s3).

trans( s1, a, s2).
trans( s1, b, s1).
trans( s2, b, s3).
trans( s3, b, s4).

silent( s2, s4).
silent( s3, s1).

We will represent input strings as Prolog lists. So the string aab will be represented by [a,a,b]. Given the description of the automaton, the simulator will process a given input string and decide whether the string is accepted or rejected. By definition, the non-deterministic automaton accepts a given string if (starting from an initial state), after having read the whole input string, the automaton can (possibly) be in its final state. The simulator is programmed as a binary relation, accepts, which defines the acceptance of a string from a given state. So

accepts( State, String)

is true if the automaton, starting from the state State as initial state, accepts the string String. The accepts relation can be defined by three clauses. They correspond to the following three cases:

(1) The empty string, [], is accepted from a state State if State is a final state.
(2) A non-empty string is accepted from State if reading the first symbol in the string can bring the automaton into some state State1, and the rest of the string is accepted from State1. Figure 4.4(a) illustrates.
(3) A string is accepted from State if the automaton can make a silent move from State to State1 and then accept the (whole) input string from State1. Figure 4.4(b) illustrates.

These rules can be translated into Prolog as:

accepts( State, [] ) :-                 % Accept empty string
    final( State).

accepts( State, [X | Rest] ) :-         % Accept by reading first symbol
    trans( State, X, State1),
    accepts( State1, Rest).

accepts( State, String) :-              % Accept by making silent move
    silent( State, State1),
    accepts( State1, String).

Figure 4.4 Accepting a string: (a) by reading its first symbol X; (b) by making a silent move.

The program can be asked, for example, about the acceptance of the string aaab by:

?- accepts( s1, [a,a,a,b] ).
yes

As we have already seen, Prolog programs are often able to solve more general problems than problems for which they were originally developed. In our case, we can also ask the simulator which state our automaton can be in initially so that it will accept the string ab:

?- accepts( S, [a,b] ).
S = s1;
S = s3

Amusingly, we can also ask: What are all the strings of length 3 that are accepted from state s1?

?- accepts( s1, [X1,X2,X3] ).
X1 = a
X2 = a
X3 = b;

X1 = b
X2 = a
X3 = b;
?- String = [_,_,_], accepts( s1, String).
String = [a,a,b];
String = [b,a,b];
no

We can make further experiments asking even more general questions, such as: From what states will the automaton accept input strings of length 7?

Further experimentation could involve modifications in the structure of the automaton by changing the relations final, trans and silent. The automaton in Figure 4.3 does not contain any cyclic 'silent path' (a path that consists only of silent moves). If in Figure 4.3 a new transition

silent( s1, s3)

is added then a 'silent cycle' is created. But our simulator may now get into trouble. For example, the question

?- accepts( s1, [a] ).

would induce the simulator to cycle in state s1 indefinitely, all the time hoping to find some way to the final state.

Exercises

4.4 Why could cycling not occur in the simulation of the original automaton in Figure 4.3, when there was no 'silent cycle' in the transition graph?

4.5 Cycling in the execution of accepts can be prevented, for example, by counting the

4.3 Travel planning

• I have to visit Milan, Ljubljana and Zurich, starting from London on Tuesday and returning to London on Friday. In what sequence should I visit these cities so that I have no more than one flight each day of the tour?

The program will be centred around a database holding the flight information. This will be represented as a three-argument relation:

timetable( Place1, Place2, ListOfFlights)

where ListOfFlights is a list of structured items of the form:

DepartureTime / ArrivalTime / FlightNumber / ListOfDays

Here the operator '/' only holds together the components of the structure, and of course does not mean arithmetic division. ListOfDays is either a list of weekdays or the atom alldays. One clause of the timetable relation can be, for example:

timetable( london, edinburgh,
    [ 9:40 / 10:50 / ba4733 / alldays,
      19:40 / 20:50 / ba4833 / [mo,tu,we,th,fr,su] ] ).

The times are represented as structured objects with two components, hours and minutes, combined by the operator ':'.

The main problem is to find exact routes between two given cities on a given day of the week. This will be programmed as a four-argument relation:

route( Place1, Place2, Day, Route)

Here Route is a sequence of flights that satisfies the following criteria:

(1) the start point of the route is Place1;
(2) the end point is Place2;
number of moves made so far. The simulator would then be requested to search only (3) all the flights are on the same day of the week, Day;
for paths of some limited length_ Modify the accepts relation this way_ Hint: Add a (4) all the flights in Route are in the timetable relation;
third argument: the maximum number of moves allowed:
(5 ) there is enough time for transfer between flights_
accepts( State, String, MaxMoves)
The route is represented as a list of struchued objects of the form:
From / To / FlightNumber I Departure_time
4_4
!:.?.�-�_\ _
�_9_��-�--------·------------------------------------··-------------------·-------------------·---------------------------
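The example queries later in this section rely on an auxiliary predicate flight/6 that enumerates individual flights from the timetable representation above. The following is only a sketch: the helper flyday/2 and the clause details are one possible formulation, not necessarily the exact code of the full program.

```prolog
% flight( Place1, Place2, Day, FlightNum, DepTime, ArrTime):
% there is a flight FlightNum from Place1 to Place2 on Day,
% departing at DepTime and arriving at ArrTime.
% A sketch; flyday/2 is our own helper name.

flight( Place1, Place2, Day, FlightNum, DepTime, ArrTime)  :-
   timetable( Place1, Place2, FlightList),
   member( DepTime / ArrTime / FlightNum / DayList, FlightList),
   flyday( Day, DayList).

flyday( Day, DayList)  :-
   member( Day, DayList).                   % Day appears in the list of days

flyday( Day, alldays)  :-
   member( Day, [mo,tu,we,th,fr,sa,su] ).   % Any day of the week will do
```

With the timetable clause above, the query ?- flight( london, edinburgh, mo, F, Dep, _) should succeed with F = ba4733 and Dep = 9:40, and on backtracking also find the evening flight ba4833.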
We will also use an auxiliary predicate:

    flight( Place1, Place2, Day, FlightNum, DepTime, ArrTime)

Our program can then answer questions such as the following:

• What days of the week is there a direct evening flight from Ljubljana to London?

    ?- flight( ljubljana, london, Day, _, DeptHour: _, _), DeptHour >= 18.
    Day = mo;
    Day = we;
    ...

• How can I get from Ljubljana to Edinburgh on Thursday?

    ?- route( ljubljana, edinburgh, th, R).
    R = [ ljubljana / zurich / jp322 / 11:30,
          zurich / london / sr806 / 16:10,
          london / edinburgh / ba4822 / 18:40]

• How can I visit Milan, Ljubljana and Zurich, starting from London on Tuesday and returning to London on Friday, with no more than one flight each day of the tour? This question is somewhat trickier. It can be formulated by using the permutation relation, programmed in Chapter 3. We are asking for a permutation of the cities Milan, Ljubljana and Zurich such that the corresponding flights are possible on successive days.

4.5 The eight queens problem

The problem here is to place eight queens on the empty chessboard in such a way that no queen attacks any other queen. The solution will be programmed as a unary predicate

    solution( Pos)

which is true if and only if Pos represents a position with eight queens that do not attack each other. It will be interesting to compare various ideas for programming this problem. Therefore we will present three programs based on somewhat different representations of the problem.

4.5.1 Program 1

First we have to choose a representation of the board position. One natural choice is to represent the position by a list of eight items, each of them corresponding to one queen. Each item in the list will specify a square of the board on which the
104 Using Structures: Example Programs The eight queens problem 105
Figure 4.6 A solution to the eight queens problem. This position can be specified by the list [1/4, 2/2, 3/7, 4/3, 5/6, 6/8, 7/5, 8/1].

corresponding queen is sitting. Further, each square can be specified by a pair of coordinates (X and Y) on the board, where each coordinate is an integer between 1 and 8. In the program we can write such a pair as:

    X/Y

where, of course, the '/' operator is not meant to indicate division, but simply combines both coordinates together into a square. Figure 4.6 shows one solution of the eight queens problem and its list representation.

Having chosen this representation, the problem is to find such a list of the form:

    [ X1/Y1, X2/Y2, X3/Y3, ..., X8/Y8]

which satisfies the no-attack requirement. Our procedure solution will have to search for a proper instantiation of the variables X1, Y1, X2, Y2, ..., X8, Y8. As we know that all the queens will have to be in different columns to prevent vertical attacks, we can immediately constrain the choice and so make the search task easier. We can thus fix the X-coordinates so that the solution list will fit the following, more specific template:

    [ 1/Y1, 2/Y2, 3/Y3, ..., 8/Y8]

We are interested in the solution on a board of size 8 by 8. However, in programming, the key to the solution is often in considering a more general problem. Paradoxically, it is often the case that the solution for the more general problem is easier to formulate than that for the more specific, original problem. The original problem is then simply solved as a special case of the more general problem.

The creative part of the problem is to find the correct generalization of the original problem. In our case, a good idea is to generalize the number of queens (the number of columns in the list) from 8 to any number, including zero. The solution relation can then be formulated by considering two cases:

Case 1  The list of queens is empty: the empty list is certainly a solution because there is no attack.

Case 2  The list of queens is non-empty: then it looks like this:

    [ X/Y | Others]

In case 2, the first queen is at some square X/Y and the other queens are at squares specified by the list Others. If this is to be a solution then the following conditions must hold:

(1) There must be no attack between the queens in the list Others; that is, Others itself must also be a solution.

(2) X and Y must be integers between 1 and 8.

(3) A queen at square X/Y must not attack any of the queens in the list Others.

To program the first condition we can simply use the solution relation itself. The second condition can be specified as follows: Y will have to be a member of the list of integers between 1 and 8 - that is, [1,2,3,4,5,6,7,8]. On the other hand, we do not have to worry about X since the solution list will have to match the template in which the X-coordinates are already specified. So X will be guaranteed to have a proper value between 1 and 8. We can implement the third condition as another relation, noattack. All this can then be written in Prolog as follows:

    solution( [X/Y | Others] )  :-
       solution( Others),
       member( Y, [1,2,3,4,5,6,7,8] ),
       noattack( X/Y, Others).

It now remains to define the noattack relation:

    noattack( Q, Qlist)

Again, this can be broken down into two cases:

(1) If the list Qlist is empty then the relation is certainly true because there is no queen to be attacked.

(2) If Qlist is not empty then it has the form [Q1 | Qlist1] and two conditions must be satisfied:
    (a) the queen at Q must not attack the queen at Q1, and
    (b) the queen at Q must not attack any of the queens in Qlist1.

To specify that a queen at some square does not attack another square is easy: the two squares must not be in the same row, the same column or the same diagonal. Our solution template guarantees that all the queens are in different columns, so it only remains to specify explicitly that:

• the Y-coordinates of the queens are different, and

• they are not in the same diagonal, either upward or downward; that is, the distance between the squares in the X-direction must not be equal to that in the Y-direction.
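These two conditions translate directly into a noattack relation. The following is a sketch written from the conditions above (the complete program appears in Figure 4.7, which is not reproduced in full in this excerpt):

```prolog
% noattack( Queen, Others): the queen at square Queen attacks
% none of the queens in the list Others.
% A sketch derived from the two conditions stated above.

noattack( _, [] ).                    % Nothing to attack

noattack( X/Y, [X1/Y1 | Others] )  :-
   Y =\= Y1,                          % Different Y-coordinates (rows)
   Y1 - Y =\= X1 - X,                 % Different upward diagonals
   Y1 - Y =\= X - X1,                 % Different downward diagonals
   noattack( X/Y, Others).
```

Two squares lie on the same upward diagonal exactly when X - Y = X1 - Y1, and on the same downward diagonal exactly when X + Y = X1 + Y1; the two inequalities above are these conditions rearranged.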
Figure 4.7 shows the complete program. To alleviate its use a template list has been added. This list can be retrieved in a question for generating solutions. So we can now ask:

    ?- template( S), solution( S).

and the program will generate solutions as follows:

    S = [ 1/4, 2/2, 3/7, 4/3, 5/6, 6/8, 7/5, 8/1];
    S = [ 1/5, 2/2, 3/4, 4/7, 5/3, 6/8, 7/6, 8/1];
    S = [ 1/3, 2/5, 3/2, 4/8, 5/6, 6/4, 7/7, 8/1];
    ...

Exercise

4.6 When searching for a solution, the program of Figure 4.7 explores alternative values for the Y-coordinates of the queens. At which place in the program is the order of alternatives defined? How can we easily modify the program to change the order? Experiment with different orders with the view of studying the time efficiency of the program.

4.5.2 Program 2

In program 2 the board position is represented by a list of the queens' Y-coordinates only: the queen in column I is at the Ith position in the list (the complete program is shown in Figure 4.9). A solution S is then a permutation of the list [1,2,...,8] that is 'safe'. Whether a list S is safe can be defined by considering two cases:

(1) S is the empty list: this is certainly safe as there is nothing to be attacked.

(2) S is a non-empty list of the form [Queen | Others]. This is safe if the list Others is safe, and Queen does not attack any queen in the list Others.

In Prolog, this is:

    safe( [] ).

    safe( [Queen | Others] )  :-
       safe( Others),
       noattack( Queen, Others).

The noattack relation here is slightly trickier. The difficulty is that the queens' positions are only defined by their Y-coordinates, and the X-coordinates are not explicitly present. This problem can be circumvented by a small generalization of the noattack relation, as illustrated in Figure 4.8. The goal

    noattack( Queen, Others)

is meant to ensure that Queen does not attack Others when the X-distance between Queen and Others is equal to 1. What is needed is the generalization of the X-distance between Queen and Others. So we add this distance as the third argument of the noattack relation:

    noattack( Queen, Others, Xdist)
Figure 4.8 (a) X-distance between Queen and Others is 1. (b) X-distance between Queen and Others is 3.

Accordingly, the noattack goal in the safe relation has to be modified to:

    noattack( Queen, Others, 1)

The noattack relation can now be formulated according to two cases, depending on the list Others: if Others is empty then there is no target and certainly no attack; if Others is non-empty then Queen must not attack the first queen in Others (which is Xdist columns from Queen) and also the tail of Others at Xdist + 1. This leads to the program shown in Figure 4.9.

    % solution( Queens) if Queens is a list of Y-coordinates of eight non-attacking queens

    solution( Queens)  :-
       permutation( [1,2,3,4,5,6,7,8], Queens),
       safe( Queens).

    permutation( [], [] ).

    permutation( [Head | Tail], PermList)  :-
       permutation( Tail, PermTail),
       del( Head, PermList, PermTail).        % Insert Head in permuted Tail

    % del( Item, List, NewList): deleting Item from List gives NewList

    del( Item, [Item | List], List).

    del( Item, [First | List], [First | List1] )  :-
       del( Item, List, List1).

    % safe( Queens) if Queens is a list of Y-coordinates of non-attacking queens

    safe( [] ).

    safe( [Queen | Others] )  :-
       safe( Others),
       noattack( Queen, Others, 1).

    noattack( _, [], _).

    noattack( Y, [Y1 | Ylist], Xdist)  :-
       Y1 - Y =\= Xdist,
       Y - Y1 =\= Xdist,
       Dist1 is Xdist + 1,
       noattack( Y, Ylist, Dist1).

Figure 4.9 Program 2 for the eight queens problem.

4.5.3 Program 3
Our third program for the eight queens problem will be based on the following reasoning. Each queen has to be placed on some square; that is, into some column, some row, some upward diagonal and some downward diagonal. To make sure that all the queens are safe, each queen must be placed in a different column, a different row, a different upward and a different downward diagonal. It is thus natural to consider a richer representation with four coordinates:

    x    columns
    y    rows
    u    upward diagonals
    v    downward diagonals

The coordinates are not independent: given x and y, u and v are determined (Figure 4.10 illustrates). For example:

    u = x - y
    v = x + y

Figure 4.10 The relation between columns, rows, upward and downward diagonals. The indicated square has coordinates: x = 2, y = 4, u = 2 - 4 = -2, v = 2 + 4 = 6.

The domains for all four dimensions are:

    Dx = [1,2,3,4,5,6,7,8]
    Dy = [1,2,3,4,5,6,7,8]
    Du = [-7,-6,-5,-4,-3,-2,-1,0,1,2,3,4,5,6,7]
    Dv = [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]

The eight queens problem can now be stated as follows: select eight 4-tuples (X,Y,U,V) from the domains (X from Dx, Y from Dy, etc.), never using the same element twice from any of the domains. Of course, once X and Y are chosen, U and V are determined. The solution can then be, roughly speaking, as follows: given all four domains, select the position of the first queen, delete the corresponding items from the four domains, and then use the rest of the domains for placing the rest of the queens. A program based on this idea is shown in Figure 4.11. The board position is, again, represented by a list of Y-coordinates. The key relation in this program is

    sol( Ylist, Dx, Dy, Du, Dv)

which instantiates the Y-coordinates (in Ylist) of the queens, assuming that they are placed in consecutive columns taken from Dx. All Y-coordinates and the corresponding U- and V-coordinates are taken from the lists Dy, Du and Dv. The top procedure, solution, can be invoked by the question:

    ?- solution( S).

This will cause the invocation of sol with the complete domains that correspond to the problem space of eight queens.

The sol procedure is general in the sense that it can be used for solving the N-queens problem (on a chessboard of size N by N). It is only necessary to properly set up the domains Dx, Dy, etc.

It is practical to mechanize the generation of the domains. For that we need a procedure

    gen( N1, N2, List)

which will, for two given integers N1 and N2, produce the list:

    List = [ N1, N1 + 1, N1 + 2, ..., N2 - 1, N2]

Such a procedure is:

    gen( N, N, [N] ).

    gen( N1, N2, [N1 | List] )  :-
       N1 < N2,
       M is N1 + 1,
       gen( M, N2, List).

    % solution( Ylist) if Ylist is a list of Y-coordinates of eight non-attacking queens
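The body of Figure 4.11 does not survive in this excerpt beyond its header comment. A reconstruction of the sol program, sketched from the description above (the clause details are one possible formulation), is:

```prolog
% A reconstruction sketched from the description of sol above;
% clause details are one possible formulation, not the original figure.

solution( Ylist)  :-
   gen( 1, 8, Dx),                  % Domain of columns
   gen( 1, 8, Dy),                  % Domain of rows
   gen( -7, 7, Du),                 % Domain of upward diagonals
   gen( 2, 16, Dv),                 % Domain of downward diagonals
   sol( Ylist, Dx, Dy, Du, Dv).

% sol( Ylist, Dx, Dy, Du, Dv): place queens into consecutive columns
% taken from Dx, never reusing an element of Dy, Du or Dv

sol( [], [], _, _, _).

sol( [Y | Ylist], [X | Dx1], Dy, Du, Dv)  :-
   del( Y, Dy, Dy1),                % Choose a row, removing it from Dy
   U is X - Y,                      % Upward diagonal of square X/Y
   del( U, Du, Du1),                % It must still be available
   V is X + Y,                      % Downward diagonal of square X/Y
   del( V, Dv, Dv1),                % It must still be available
   sol( Ylist, Dx1, Dy1, Du1, Dv1).

% del( Item, List, NewList): deleting Item from List gives NewList

del( Item, [Item | List], List).

del( Item, [First | List], [First | List1] )  :-
   del( Item, List, List1).
```

The N-queens version is obtained by replacing the four gen goals with domains for a board of size N.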
Exercise

(a) Define the relation jump( Square1, Square2) according to the knight jump on the chessboard. Assume that Square1 is always instantiated to a square while Square2 can be uninstantiated. For example:

    ?- jump( 1/1, S).
    S = 3/2;
    S = 2/3;
    no

(b) Define the relation knightpath( Path) where Path is a list of squares that represent a legal path of a knight on the empty chessboard.

(c) Using this knightpath relation, write a question to find any knight's path of length 4 moves from square 2/1 to the opposite edge of the board (Y = 8) that goes through square 5/4 after the second move.
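For part (a), one possible sketch is the following (the helper names dxy and inboard are our own, not given in the text):

```prolog
% jump( Square1, Square2): a knight on Square1 can jump to Square2.
% A sketch for part (a); dxy/2 lists the eight jump offsets and
% inboard/1 keeps the target square on the 8 by 8 board.

jump( X/Y, X1/Y1)  :-
   dxy( Dx, Dy),
   X1 is X + Dx,
   inboard( X1),
   Y1 is Y + Dy,
   inboard( Y1).

dxy( 2, 1).   dxy( 2, -1).   dxy( -2, 1).   dxy( -2, -1).
dxy( 1, 2).   dxy( 1, -2).   dxy( -1, 2).   dxy( -1, -2).

inboard( Coord)  :-
   0 < Coord,  Coord < 9.
```

With this ordering of the dxy facts, the query ?- jump( 1/1, S) produces S = 3/2 and then S = 2/3, as in the example above.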
chapter 5

Controlling Backtracking

5.1 Preventing backtracking 114
5.2 Examples using cut 119
5.3 Negation as failure 124
5.4 Problems with cut and negation 127

Figure 5.1 A double-step function.

This can be written in Prolog as a binary relation:

    f( X, Y)

as follows:

    f( X, 0)  :-  X < 3.        % Rule 1
Figure 5.2 At the point marked 'CUT' we already know that the rules 2 and 3 are bound to fail.

not to backtrack. We can do this by using the cut mechanism. The 'cut' is written as ! and is inserted between goals as a kind of pseudo-goal. Our program, rewritten with cuts, is:

    f( X, 0)  :-  X < 3, !.
    f( X, 2)  :-  3 =< X, X < 6, !.
    f( X, 4)  :-  6 =< X.

The ! symbol will now prevent backtracking at the points at which it appears in the program. If we now ask:

    ?- f( 1, Y), 2 < Y.

Prolog will produce the same left-hand branch as in Figure 5.2. This branch will fail at the goal 2 < 0. Now Prolog will try to backtrack, but not beyond the point marked ! in the program. The alternative branches that correspond to 'rule 2' and 'rule 3' will not be generated.

The new program, equipped with cuts, is in general more efficient than the original version without cuts. When the execution fails, the new program will in general recognize this sooner than the original program.

To conclude, we have improved the efficiency by adding cuts. If the cuts are now removed in this example, the program will still produce the same result; it will perhaps only spend more time. In our case, by introducing the cut we only changed the procedural meaning of the program; that is, the results of the program were not affected. We will see later that using a cut may affect the results as well.

5.1.2 Experiment 2

Let us now perform a second experiment with the second version of our program. Suppose we ask:

    ?- f( 7, Y).
    Y = 4

Let us analyze what has happened. All three rules were tried before the answer was obtained. This produced the following sequence of goals:

    Try rule 1: 7 < 3 fails, backtrack and try rule 2 (cut was not reached)
    Try rule 2: 3 =< 7 succeeds, but then 7 < 6 fails, backtrack and try rule 3 (cut was not reached)
    Try rule 3: 6 =< 7 succeeds

This trace reveals another source of inefficiency. First it is established that X < 3 is not true (7 < 3 fails). The next goal is 3 =< X (3 =< 7 succeeds). But we know that once the first test has failed the second test is bound to succeed as it is the negation of the first. Therefore the second test is redundant and the corresponding goal can be omitted. The same is true about the goal 6 =< X in rule 3. This leads to the following, more economical formulation of the three rules:

    if X < 3 then Y = 0,
    otherwise if X < 6 then Y = 2,
    otherwise Y = 4.

We can now omit the conditions in the program that are guaranteed to be true whenever they are executed. This leads to the third version of the program:

    f( X, 0)  :-  X < 3, !.
    f( X, 2)  :-  X < 6, !.
    f( X, 4).

This program produces the same results as our original version, but is more efficient than both previous versions. But what happens if we now remove the cuts? The program becomes:

    f( X, 0)  :-  X < 3.
    f( X, 2)  :-  X < 6.
    f( X, 4).

This may produce multiple solutions, some of which are not correct. For example:

    ?- f( 1, Y).
    Y = 0;
    Y = 2;
    Y = 4;
    no

It is important to notice that, in contrast to the second version of the program, this time the cuts do not only affect the procedural behaviour, but also change the results of the program.
A more precise meaning of the cut mechanism is as follows:

Let us call the 'parent goal' the goal that matched the head of the clause containing the cut. When the cut is encountered as a goal it succeeds immediately, but it commits the system to all choices made between the time the 'parent goal' was invoked and the time the cut was encountered. All the remaining alternatives between the parent goal and the cut are discarded.

Consider a clause of the form:

    H  :-  B1, B2, ..., Bm, !, ..., Bn.

Let us assume that this clause was invoked by a goal G that matched H. Then G is the parent goal. At the moment that the cut is encountered, the system has already found some solution of the goals B1, ..., Bm. When the cut is executed, this (current) solution of B1, ..., Bm becomes frozen and all possible remaining alternatives are discarded. Also, the goal G now becomes committed to this clause: any attempt to match G with the head of some other clause is precluded.

Let us apply these rules to the following example:

    C  :-  P, Q, R, !, S, T, U.
    C  :-  V.
    A  :-  B, C, D.

    ?- A.

Here A, B, C, D, P, etc. have the syntax of terms. The cut will affect the execution of the goal C as illustrated by Figure 5.3. Backtracking will be possible within the goal list P, Q, R; however, as soon as the cut is reached, all alternative solutions of the goal list P, Q, R are suppressed. The alternative clause about C,

    C  :-  V.

will also be discarded. However, backtracking will still be possible within the goal list S, T, U. The 'parent goal' of the clause containing the cut is the goal C in the clause:

    A  :-  B, C, D.

Figure 5.3 The effect of the cut on the execution. Starting with A, the solid arrows indicate the sequence of calls; the dashed arrows indicate backtracking. There is 'one-way traffic' between R and S.

Therefore the cut will only affect the execution of the goal C. On the other hand, it will be 'invisible' from goal A. So automatic backtracking within the goal list B, C, D will remain active regardless of the cut within the clause used for satisfying C.

5.2 Examples using cut

5.2.1 Computing maximum

The procedure for finding the larger of two numbers can be programmed as a relation

    max( X, Y, Max)

where Max = X if X is greater than or equal to Y, and Max is Y if X is less than Y. This corresponds to the following two clauses:

    max( X, Y, X)  :-  X >= Y.
    max( X, Y, Y)  :-  X < Y.

These two rules are mutually exclusive. If the first one succeeds then the second one will fail. If the first one fails then the second must succeed. Therefore a more economical formulation, with 'otherwise', is possible:

    If X >= Y then Max = X,
    otherwise Max = Y.

This is written in Prolog using a cut as:

    max( X, Y, X)  :-  X >= Y, !.
    max( X, Y, Y).
It should be noted that the use of this procedure requires care. It is safe if in the goal max( X, Y, Max) the argument Max is not instantiated. The following example of incorrect use illustrates the problem:

    ?- max( 3, 1, 1).
    yes

The following reformulation of max overcomes this limitation:

    max( X, Y, Max)  :-
       X >= Y, !, Max = X
       ;
       Max = Y.

5.2.2 Single-solution membership

We have been using the relation

    member( X, L)

for establishing whether X is in list L. The program was:

    member( X, [X | L] ).

    member( X, [Y | L] )  :-  member( X, L).

This is non-deterministic: if X occurs several times then any occurrence can be found. Let us now change member into a deterministic procedure which will find only the first occurrence. The change is simple: we only have to prevent backtracking as soon as X is found, which happens when the first clause succeeds. The modified program is:

    member( X, [X | L] )  :-  !.

    member( X, [Y | L] )  :-  member( X, L).

This program will generate just one solution. For example:

    ?- member( X, [a,b,c] ).
    X = a;
    no

5.2.3 Adding an element to a list without duplication

Often we want to add an item X to a list L so that X is added only if X is not yet in L. If X is already in L then L remains the same because we do not want to have redundant duplicates in L. The add relation has three arguments:

    add( X, L, L1)

where X is the item to be added, L is the list to which X is to be added and L1 is the resulting new list. Our rule for adding can be formulated as:

    If X is a member of list L then L1 = L,
    otherwise L1 is equal to L with X inserted.

It is easiest to insert X in front of L so that X becomes the head of L1. This is then programmed as follows:

    add( X, L, L)  :-  member( X, L), !.

    add( X, L, [X | L] ).

The behaviour of this procedure is illustrated by the following example:

    ?- add( a, [b,c], L).
    L = [a,b,c]

    ?- add( X, [b,c], L).
    L = [b,c]
    X = b

    ?- add( a, [b,c,X], L).
    L = [b,c,a]
    X = a

Similar to the foregoing example with max, add( X, L1, L2) is intended to be called with L2 uninstantiated. Otherwise the result may be unexpected: for example, add( a, [a], [a,a] ) succeeds.

This example is instructive because we cannot easily program the 'non-duplicate add' without the use of cut or another construct derived from the cut. If we omit the cut in the foregoing program then the add relation will also add duplicate items. For example:

    ?- add( a, [a,b,c], L).
    L = [a,b,c];
    L = [a,a,b,c]

So the cut is necessary here to specify the intended relation, and not only to improve efficiency. The next example also illustrates this point.

5.2.4 Classification into categories

Assume we have a database of results of tennis games played by members of a club. The pairings were not arranged in any systematic way, so each player just played some other players. The results are in the program represented as facts like:
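The facts themselves fall outside this excerpt; illustrative entries of the beat relation (the player names are hypothetical) would look like:

```prolog
% beat( X, Y): player X beat player Y.
% Illustrative facts; the names are hypothetical.
beat( tom, jim).
beat( ann, tom).
beat( pat, jim).
```

With such facts, tom both beat somebody and was beaten, ann only beat, and jim was only beaten, which is exactly the distinction the three categories below capture.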
    X is a fighter if
       there is some Y such that X beat Y and
       there is some Z such that Z beat X.

Now a rule for a winner:

    X is a winner if
       X beat some Y and
       X was not beaten by anybody.

This formulation contains 'not' which cannot be directly expressed with our present Prolog facilities. So the formulation of winner appears trickier. The same problem occurs with sportsman. The problem can be circumvented by combining the definition of winner with that of fighter, and using the 'otherwise' connective. Such a formulation is:

    If X beat somebody and X was beaten by somebody
    then X is a fighter,
    otherwise if X beat somebody
    then X is a winner,
    otherwise if X got beaten by somebody
    then X is a sportsman.

This formulation can be readily translated into Prolog. The mutual exclusion of the three alternative categories is indicated by the cuts:

    class( X, fighter)  :-
       beat( X, _),
       beat( _, X), !.

    class( X, winner)  :-
       beat( X, _), !.

    class( X, sportsman)  :-
       beat( _, X).

Exercises

5.1 Let a program be:

    p( 1).
    p( 2)  :-  !.
    p( 3).

Write all Prolog's answers to the following questions:

    (a) ?- p( X).
    (b) ?- p( X), p( Y).
    (c) ?- p( X), !, p( Y).

5.2 The following relation classifies numbers into three classes: positive, zero and negative:

    class( Number, positive)  :-  Number > 0.
    class( 0, zero).
    class( Number, negative)  :-  Number < 0.

Define this procedure in a more efficient way using cuts.

5.3 Define the procedure

    split( Numbers, Positives, Negatives)

which splits a list of numbers into two lists: positive ones (including zero) and negative ones. For example:

    split( [3,-1,0,5,-2], [3,0,5], [-1,-2] )

Propose two versions: one with a cut and one without.
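For Exercise 5.2, one possible answer in the style of the max reformulation above is the following sketch; like max, it assumes that the second argument is uninstantiated in the call:

```prolog
% A sketch for Exercise 5.2; as with max above, the second
% argument should be uninstantiated when this is called.
class( Number, positive)  :-  Number > 0, !.
class( 0, zero)  :-  !.
class( _, negative).
```

Each test is now made at most once; the cut in the second clause is needed so that a zero does not, on backtracking, also fall through to the negative clause.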
5.3 Negation as failure

'Mary likes all animals but snakes'. How can we say this in Prolog? It is easy to express one part of this statement: Mary likes any X if X is an animal. This is in Prolog:

    likes( mary, X)  :-  animal( X).

But we have to exclude snakes. This can be done by using a different formulation:

    If X is a snake then 'Mary likes X' is not true,
    otherwise if X is an animal then Mary likes X.

That something is not true can be said in Prolog by using a special goal, fail, which always fails, thus forcing the parent goal to fail. The above formulation is translated into Prolog, using fail, as follows:

    likes( mary, X)  :-
       snake( X), !, fail.

    likes( mary, X)  :-
       animal( X).

The first rule here will take care of snakes: if X is a snake then the cut will prevent backtracking (thus excluding the second rule) and fail will cause the failure. These two clauses can be written more compactly as one clause:

    likes( mary, X)  :-
       snake( X), !, fail
       ;
       animal( X).

We can use the same idea to define the relation

    different( X, Y)

which is true if X and Y are different. We have to be more precise, however, because 'different' can be understood in several ways:

• X and Y are not literally the same;
• X and Y do not match;
• the values of arithmetic expressions X and Y are not equal.

Let us choose here that X and Y are different if they do not match. The key to saying this in Prolog is:

    If X and Y match then different( X, Y) fails,
    otherwise different( X, Y) succeeds.

We again use the cut and fail combination:

    different( X, X)  :-  !, fail.

    different( X, Y).

This can also be written as one clause:

    different( X, Y)  :-
       X = Y, !, fail
       ;
       true.

true is a goal that always succeeds.

These examples indicate that it would be useful to have a unary predicate 'not' such that

    not( Goal)

is true if Goal is not true. We will now define the not relation as follows:

    If Goal succeeds then not( Goal) fails,
    otherwise not( Goal) succeeds.

This definition can be written in Prolog as:

    not( P)  :-
       P, !, fail
       ;
       true.

Henceforth, we will assume that not is a built-in Prolog procedure that behaves as defined here. We will also assume that not is defined as a prefix operator, so that we can also write the goal

    not( snake(X) )

as:

    not snake( X)

Some Prolog implementations, in fact, support this notation. If not, then we can always define not ourselves. Alternatively, not Goal is written as \+ Goal. This more mysterious notation is also recommended in the Prolog standard for the following reason: not defined as failure, as here, does not exactly correspond to negation in mathematical logic. This difference can cause unexpected behaviour if not is used without care. This will be discussed later in the chapter.

Nevertheless, not is a useful facility and can often be used advantageously in place of cut. Our two examples can be rewritten with not as:

    likes( mary, X)  :-
       animal( X),
       not snake( X).
    different( X, Y)  :-
       not( X = Y).

This certainly looks better than our original formulations. It is more natural and is easier to read.

Our tennis classification program of the previous section can also be rewritten, using not, in a way that is closer to the initial definition of the three categories:

    class( X, fighter)  :-
       beat( X, _),
       beat( _, X).

    class( X, winner)  :-
       beat( X, _),
       not beat( _, X).

    class( X, sportsman)  :-
       beat( _, X),
       not beat( X, _).

As another example of the use of not let us reconsider program 1 for the eight queens problem of the previous chapter (Figure 4.7). We specified the noattack relation between a queen and other queens. This relation can be formulated also as the negation of the attack relation. Figure 5.4 shows a program modified accordingly.

Figure 5.4 Another eight queens program.

Exercises

5.4 Given two lists, Candidates and RuledOut, write a sequence of goals (using member and not) that will through backtracking find all the items in Candidates that are not in RuledOut.

5.5 Define the relation

    unifiable( List1, Term, List2)

where List2 is the list of all the members of List1 that match Term, but are not instantiated by this matching. For example:

    ?- unifiable( [X, b, t(Y)], t(a), List).
    List = [X, t(Y)]

Note that X and Y have to remain uninstantiated although the matching with t(a) does cause their instantiation. Hint: Use not( Term1 = Term2). If Term1 = Term2 succeeds then not( Term1 = Term2) fails and the resulting instantiation is undone!

5.4 Problems with cut and negation

Using the cut facility we get something, but not for nothing. The advantages and disadvantages of using cut were illustrated by examples in the previous sections. Let us summarize, first the advantages:

(1) With cut we can often improve the efficiency of the program. The idea is to explicitly tell Prolog: do not try other alternatives because they are bound to fail.

(2) Using cut we can specify mutually exclusive rules; so we can express rules of the form:

    if condition P then conclusion Q,
    otherwise conclusion R

In this way, cut enhances the expressive power of the language.

The reservations against the use of cut stem from the fact that we can lose the valuable correspondence between the declarative and procedural meaning of programs. If there is no cut in the program we can change the order of clauses and goals, and this will only affect the efficiency or termination of the program, not the declarative meaning. On the other hand, in programs with cuts, a change in the order of clauses may affect the declarative meaning. This means that we can get different results. The following example illustrates:
123 Controiling Backtracking Problems with cut and negation 129
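A small sketch, with hypothetical goals a, b and c (not from the text above), illustrates how reordering clauses can change the declarative meaning once a cut is present:

```prolog
% Hypothetical database: a succeeds, b fails, c succeeds.
a.
b :- fail.
c.

% Original order: once a succeeds, the cut commits to the first clause,
% so p can only succeed if b also succeeds; c is never tried.
p :- a, !, b.
p :- c.

% Reversed order: q succeeds whenever c succeeds; the cut in the
% second clause is reached only after the first clause has failed.
q :- c.
q :- a, !, b.

% ?- p.    fails
% ?- q.    succeeds
```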
The problem with uninstantiated negated goals arises from an unfortunate change
of the quantification of variables in negation as failure. In the usual interpretation in
Prolog, the question:

?- expensive( X).

means: Does there exist X such that expensive( X) is true? If yes, what is X? So X is
existentially quantified. Accordingly Prolog answers X = jeanluis. But the question:

?- not expensive( X).

is not interpreted as: Does there exist X such that not expensive( X)? The expected
answer would be X = francesco. But Prolog answers 'no' because negation as failure
changes the quantification to universal. The question not expensive( X) is interpreted
as:

not( exists X such that expensive( X))

This is equivalent to:

For all X: not expensive( X)

We have discussed problems with cut, which also indirectly occur in not, in
detail. The intention has been to warn users about the necessary care, not to
definitely discourage the use of cut. Cut is useful and often necessary. And after all,
the kind of complications that are incurred by cut in Prolog commonly occur when
programming in other languages as well.

Summary

• The cut facility prevents backtracking. It is used both to improve the efficiency
of programs and to enhance the expressive power of the language.

• Efficiency is improved by explicitly telling Prolog (with cut) not to explore
alternatives that we know are bound to fail.

• Cut makes it possible to formulate mutually exclusive conclusions through rules
of the form:

if Condition then Conclusion1 otherwise Conclusion2

• Cut makes it possible to introduce negation as failure: not Goal is defined through
the failure of Goal.

• Two special goals are sometimes useful: true always succeeds, fail always fails.

• There are also some reservations against cut: inserting a cut may destroy the
correspondence between the declarative and procedural meaning of a program.
Therefore, it is part of good programming style to use cut with care and not to
use it without reason.

• not defined through failure does not exactly correspond to negation in
mathematical logic. Therefore, the use of not also requires special care.

References

The distinction between 'green cuts' and 'red cuts' was proposed by van Emden (1982). Le
(1993) proposes a different negation for Prolog which is mathematically advantageous, but
computationally more expensive.

Le, T.V. (1993) Techniques of Prolog Programming. John Wiley & Sons.

van Emden, M. (1982) Red and green cuts. Logic Programming Newsletter: 2.
chapter 6

Input and Output

6.1 Communication with files 132
6.2 Processing files of terms 135
6.3 Manipulating characters 140

A sequence of goals to read some information from file1, and then redirect
subsequent input back to the terminal, is:

see( file1),
read_from_file( Information),
see( user),
...

The current output stream can be changed by a goal of the form:

tell( Filename)

A sequence of goals to output some information to file3, and then redirect
succeeding output back to the terminal, is:

tell( file3),
write_on_file( Information),
tell( user),
...

The goal

seen

closes the current input file. The goal

told

closes the current output file.

We will assume here that files can only be processed sequentially, although many
Prolog implementations also handle files with random access. Sequential files
behave in the same way as the terminal. Each request to read something from an
input file will cause reading at the current position in the current input stream. After
the reading, the current position will, of course, be moved to the next unread item.
So the next request for reading will start reading at this new current position. If a
request for reading is made at the end of a file, then the information returned by
such a request is the atom end_of_file.

Writing is similar: each request to output information will append this
information at the end of the current output stream. It is not possible to move
backward and to overwrite part of the file.

We will here only consider 'text files' - that is, files of characters. Characters are
letters, digits and special characters. Some of them are said to be non-printable
because when they are output on the terminal they do not appear on the screen.
They may, however, have other effects, such as spacing between columns and lines.

There are two main ways in which files can be viewed in Prolog, depending on
the form of information. One way is to consider the character as the basic element
of the file. Accordingly, one input or output request will cause a single character to
be read or written. We assume the built-in predicates for this are get, get0 and put.
The other way of viewing a file is to consider bigger units of information as basic
building blocks of the file. Such a natural bigger unit is the Prolog term. So each
input/output request of this type would transfer a whole term from the current
input stream or to the current output stream respectively. Predicates for transfer of
terms are read and write. Of course, in this case, the information in the file has to be
in a form that is consistent with the syntax of terms.

Figure 6.1 Communication between a Prolog program and several files: the user
terminal, input streams (file1, file2) and output streams (file3, file4).

What kind of file organization is chosen will, of course, depend on the problem.
Whenever the problem specification will allow the information to be naturally
squeezed into the syntax of terms, we will prefer to use a file of terms. It will then be
possible to transfer a whole meaningful piece of information with a single request.
On the other hand, there are problems whose nature dictates some other
organization of files. An example is the processing of natural language sentences,
say, to generate a dialogue in English between the system and the user. In such
cases, files will have to be viewed as sequences of characters that cannot be parsed
into terms.

6.2 Processing files of terms

6.2.1 read and write

The built-in predicate read is used for reading terms from the current input stream.
The goal

read( X)

will cause the next term, T, to be read, and this term will be matched with X. If X is a
variable then, as a result, X will become instantiated to T. If matching does not
succeed then the goal read( X) fails. The predicate read is deterministic, so in the case
of failure there will be no backtracking to input another term. Each term in the
input file must be followed by a full stop and a space or carriage-return.

If read( X) is executed when the end of the current input file has been reached
then X will become instantiated to the atom end_of_file.

The built-in predicate write outputs a term. So the goal

write( X)

will output the term X on the current output file. X will be output in the same
standard syntactic form in which Prolog normally displays values of variables. A
useful feature of Prolog is that the write procedure 'knows' to display any term no
matter how complicated it may be.

Typically, there are additional built-in predicates for formatting the output. They
insert spaces and new lines into the output stream. The goal

tab( N)

causes N spaces to be output. The predicate nl (which has no arguments) causes the
start of a new line at output.
The following examples will illustrate the use of these procedures.
Let us assume that we have a procedure that computes the cube of a number:

cube( N, C) :-
    C is N * N * N.

Suppose we want to use this for calculating the cubes of a sequence of numbers. We
could do this by a sequence of questions:

?- cube( 2, X).
X = 8

?- cube( 5, Y).
Y = 125

?- cube( 12, Z).
Z = 1728

For each number, we had to type in the corresponding goal. Let us now modify this
program so that the cube procedure will read the data itself. Now the program will
keep reading data and outputting their cubes until the atom stop is read:

cube :-
    read( X),
    process( X).

process( stop) :- !.

process( N) :-
    C is N * N * N,
    write( C),
    cube.

This is an example of a program whose declarative meaning is awkward to
formulate. However, its procedural meaning is straightforward: to execute cube, first
read X and then process it; if X = stop then everything has been done, otherwise
write the cube of X and recursively call the cube procedure to process further data. A
table of the cubes of numbers can be produced using this new procedure as follows:

?- cube.
2.
8
5.
125
12.
1728
stop.
yes

The numbers 2, 5 and 12 were typed in by the user on the terminal; the other
numbers were output by the program. Note that each number entered by the user
had to be followed by a full stop, which signals the end of a term.

It may appear that the above cube procedure could be simplified. However, the
following attempt to simplify is not correct:

cube :-
    read( stop), !.

cube :-
    read( N),
    C is N * N * N,
    write( C),
    cube.

The reason why this is wrong can be seen easily if we trace the program with input
data 5, say. The goal read( stop) will fail when the number is read, and this number
will be lost for ever. The next read goal will input the next term. On the other hand,
it could happen that the stop signal is read by the goal read( N), which would then
cause a request to multiply non-numeric data.

The cube procedure conducts interaction between the user and the program. In
such cases it is usually desirable that the program, before reading new data from the
terminal, signals to the user that it is ready to accept the information, and perhaps
also says what kind of information it is expecting. This is usually done by sending a
'prompt' signal to the user before reading. Our cube procedure would be accordingly
modified, for example, as follows:

cube :-
    write( 'Next item, please: '),
    read( X),
    process( X).

process( stop) :- !.

process( N) :-
    C is N * N * N,
    write( 'Cube of '), write( N), write( ' is '),
    write( C), nl,
    cube.

A conversation with this new version of cube would then be, for example, as follows:

?- cube.
Next item, please: 5.
Cube of 5 is 125
Next item, please: 12.
Cube of 12 is 1728
Next item, please: stop.
yes
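A variant of cube (a sketch; cube2 and process2 are hypothetical names, not from the text) that also terminates when the end of the input file is reached, using the end_of_file convention described earlier:

```prolog
cube2 :-
    read( X),
    process2( X).

process2( stop) :- !.            % stop on the atom stop
process2( end_of_file) :- !.     % or at the end of the input file
process2( N) :-
    C is N * N * N,
    write( C), nl,
    cube2.
```

Such a variant is convenient when the same procedure is used both interactively and on a file of terms redirected with see.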
Depending on the implementation, an extra request (like ttyflush, say) after writing
the prompt might be necessary in order to force the prompt to actually appear
on the screen before reading.

In the following sections we will look at some typical examples of operations that
involve reading and writing.

6.2.2 Displaying lists

Besides the standard Prolog format for lists, there are several other natural forms for
displaying lists which have advantages in some situations. The following procedure

writelist( L)

outputs a list L so that each element of L is written on a separate line:

writelist( [] ).

writelist( [X | L]) :-
    write( X), nl,
    writelist( L).

If we have a list of lists, one natural output form is to write the elements of each list in
one line. To this end, we will define the procedure writelist2. An example of its use is:

?- writelist2( [ [a,b,c], [d,e,f], [g,h,i] ] ).
a b c
d e f
g h i

A procedure that accomplishes this is:

writelist2( [] ).

writelist2( [L | LL]) :-
    doline( L), nl,
    writelist2( LL).

doline( [] ).

doline( [X | L]) :-
    write( X), tab( 1),
    doline( L).

A list of integer numbers can sometimes be conveniently shown as a bar graph.
The following procedure, bars, will display a list in this form, assuming that the
numbers in the list are sufficiently small. An example of using bars is:

?- bars( [3,4,6,5] ).
***
****
******
*****

The bars procedure can be defined as follows:

bars( [] ).

bars( [N | L]) :-
    stars( N), nl,
    bars( L).

stars( N) :-
    N > 0,
    write( *),
    N1 is N - 1,
    stars( N1).

stars( N) :-
    N =< 0.

6.2.3 Processing a file of terms

A typical sequence of goals to process a whole file, F, would look something like this:

..., see( F), processfile, see( user), ...

Here processfile is a procedure to read and process each term in F, one after another,
until the end of the file is encountered. A typical schema for processfile is:

processfile :-
    read( Term),                  % Assuming Term not a variable
    process( Term).

process( end_of_file) :- !.       % All done

process( Term) :-
    treat( Term),                 % Process current item
    processfile.                  % Process rest of file

Here treat( Term) represents whatever is to be done with each term. An example
would be a procedure to display on the terminal each term together with its
consecutive number. Let us call this procedure showfile. It has to have an additional
argument to count the terms read:

showfile( N) :-
    read( Term),
    show( Term, N).

show( end_of_file, _) :- !.

show( Term, N) :-
    write( N), tab( 2), write( Term), nl,
    N1 is N + 1,
    showfile( N1).
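Following the schema at the start of this section, showfile might be driven like this (a sketch; display_file is a hypothetical name, and the file is assumed to contain valid Prolog terms):

```prolog
% Display the terms of file F on the terminal, numbered from 1.
display_file( F) :-
    see( F),          % F becomes the current input stream
    showfile( 1),     % number the terms starting from 1
    see( user).       % revert input to the terminal
```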
Exercises

6.1 Let f be a file of terms. Define a procedure

findterm( Term)

that displays on the terminal the first term in f that matches Term.

6.2 Let f be a file of terms. Write a procedure

findallterms( Term)

that displays on the terminal all the terms in f that match Term. Make sure that Term
is not instantiated in the process (which could prevent its match with terms that
occur later in the file).

6.3 Manipulating characters

A character is written on the current output stream with the goal

put( C)

where C is the ASCII code (a number between 0 and 127) of the character to be
output. For example, the question

?- put( 65), put( 66), put( 67).

would cause the following output:

ABC

As an example, consider a procedure, squeeze, that reads a sentence from the
current input stream and outputs the same sentence, with runs of several blanks
squeezed into single blanks. We assume that each sentence processed by squeeze
ends with a full stop and that words are separated simply by one or more blanks,
but no other character. An acceptable input is then:

The  robot   tried to  pour  wine  out of   the  bottle.

The goal squeeze would output this in the form:

The robot tried to pour wine out of the bottle.

The squeeze procedure will have a similar structure to the procedures for
processing files in the previous section. First it will read the first character, output
this character, and then complete the processing depending on this character. There
are three alternatives that correspond to the following cases: the character is either a
full stop, a blank or a letter. The mutual exclusion of the three alternatives is
achieved in the program by cuts:

squeeze :-
    get0( C),
    put( C),
    dorest( C).

dorest( 46) :- !.        % 46 is ASCII for full stop, all done

dorest( 32) :- !,        % 32 is ASCII for blank
    get( C),             % Skip other blanks
    put( C),
    dorest( C).

dorest( Letter) :-
    squeeze.

If the current input stream is, for example, the sentence

Mary was pleased to see the robot fail.

then the goal getsentence( Sentence) will cause the instantiation:

Sentence = [ 'Mary', was, pleased, to, see, the, robot, fail]

For simplicity, we will assume that each sentence terminates with a full stop and
that there are no punctuation symbols within the sentence.

The program is shown in Figure 6.2. The procedure getsentence first reads the
current input character, Char, and then supplies this character to the procedure
getrest to complete the job. getrest has to react properly according to three cases:

(1) Char is the full stop: then everything has been read.

(2) Char is the blank: ignore it, getsentence from rest of input.

(3) Char is a letter: first read the word, Word, which begins with Char, and then use
getsentence to read the rest of the sentence, producing Wordlist. The cumulative
result is the list [Word | Wordlist].

The procedure that reads the characters of one word is:

getletters( Letter, Letters, Nextchar)

The three arguments are:

(1) Letter is the current letter (already read) of the word being read.

(2) Letters is the list of letters (starting with Letter) up to the end of the word.

(3) Nextchar is the input character that immediately follows the word read.
Nextchar must be a non-letter character.
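Figure 6.2 itself is not reproduced here; a sketch of getsentence along the lines just described (with character codes as in the squeeze example, and name converting a list of ASCII codes into an atom) is:

```prolog
% Read a sentence from the current input stream as a list of words.
getsentence( Wordlist) :-
    get0( Char),
    getrest( Char, Wordlist).

getrest( 46, [] ) :- !.                      % 46 = full stop: end of sentence

getrest( 32, Wordlist) :- !,                 % 32 = blank: ignore it
    getsentence( Wordlist).

getrest( Letter, [Word | Wordlist]) :-       % a letter starts a word
    getletters( Letter, Letters, Nextchar),  % read letters of current word
    name( Word, Letters),                    % make the word an atom
    getrest( Nextchar, Wordlist).

getletters( 46, [], 46) :- !.                % end of word: full stop follows
getletters( 32, [], 32) :- !.                % end of word: blank follows
getletters( Let, [Let | Letters], Nextchar) :-
    get0( Char),
    getletters( Char, Letters, Nextchar).
```

Note that this simple sketch treats every character other than a blank or a full stop as a letter.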
We conclude this example with a comment on the possible use of the getsentence
procedure. It can be used in a program to process text in natural language. Sentences
represented as lists of words are in a form that is suitable for further processing in
Prolog. A simple example is to look for certain keywords in input sentences. A much
more difficult task would be to understand the sentence; that is, to extract from the
sentence its meaning, represented in some chosen formalism. This is an important
research area of Artificial Intelligence, and is introduced in Chapter 21.

Exercises

6.4 Define the relation

starts( Atom, Character)

to check whether Atom starts with Character.

6.5 Define the procedure plural that will convert nouns into their plural form. For
example:

?- plural( table, X).
X = tables

6.6 Write the procedure

search( KeyWord, Sentence)

that will, each time it is called, find a sentence in the current input file that contains
the given KeyWord. Sentence should be in its original form, represented as a sequence
of characters or as an atom (procedure getsentence of this section can be accordingly
modified).

6.5 Reading programs

The effect of consulting a file, say by the goal consult( program3), will be that all the
clauses in file program3 are read and loaded into the memory. So they will be used
by Prolog when answering further questions from the user. Another file may be
'consulted' at some later time during the same session. Basically, the effect is again
that the clauses from this new file are added into the memory. However, details
depend on the implementation and other circumstances. If the new file contains
clauses about a procedure defined in the previously consulted file, then the new
clauses may be simply added at the end of the current set of clauses, or the previous
definition of this procedure may be entirely replaced by the new one.

Several files may be consulted by the same consult goal, for example:

?- consult( [ program3, program4, queens]).

Such a question can also be written more simply as:

?- [ program3, program4, queens].

Consulted programs are used by a Prolog interpreter. If a Prolog implementation also
features a compiler, then programs can be loaded in a compiled form. This enables
more efficient execution, with a typical speed-up factor of 5 or 10 between the
interpreted and compiled code. Programs are loaded into memory in the compiled
form by the built-in predicate compile, for example:

?- compile( program3).

or

?- compile( [ program4, queens, program6]).

Compiled programs are more efficiently executed, but interpreted programs are
easier to debug because they can be inspected and traced by Prolog's debugging
facilities. Therefore an interpreter is typically used in the program development
phase, and a compiler is used with the final program.

It should be noted, again, that the details of consulting and compiling files
depend on the implementation of Prolog. Usually a Prolog implementation also
allows the user to enter and edit the program interactively.
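Exercise 6.5 might be sketched with the built-in predicate name, which converts between an atom and the list of ASCII codes of its characters (conc is the usual list concatenation relation from earlier chapters; this naive sketch simply appends an 's'):

```prolog
% conc( L1, L2, L3): list L3 is the concatenation of lists L1 and L2
conc( [], L, L).
conc( [X | L1], L2, [X | L3]) :-
    conc( L1, L2, L3).

% plural( Noun, Nouns): Nouns is Noun with the letter s appended
plural( Noun, Nouns) :-
    name( Noun, CodeList),          % decompose the atom into codes
    name( s, [SCode]),              % ASCII code of the letter s
    conc( CodeList, [SCode], NewCodeList),
    name( Nouns, NewCodeList).      % construct the new atom

% ?- plural( table, X).    gives X = tables
```

A complete solution would, of course, also have to handle irregular plurals.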
Summary

• Switching between streams is done by:

see( File)      File becomes the current input stream
tell( File)     File becomes the current output stream
seen            close the current input stream
told            close the current output stream

• Files are read and written in two ways:

as sequences of characters
as sequences of terms

• Built-in procedures for reading and writing characters and terms are:

read( Term)        input next term
write( Term)       output Term
put( CharCode)     output character with the given ASCII code
get0( CharCode)    input next character
get( CharCode)     input next 'printable' character

• Two procedures help formatting:

nl          output new line
tab( N)     output N blanks

• The procedure name( Atom, CodeList) decomposes and constructs atoms.
CodeList is the list of ASCII codes of the characters in Atom.

• Many Prolog implementations provide additional facilities to handle non-
sequential files, windows, provide graphics primitives, input information from
the mouse, etc.

Reference to Prolog standard

For some of the predicates mentioned in this chapter, the ISO standard for Prolog
(Deransart et al. 1996) recommends different names from those used in most Prolog
implementations. However, the predicates are conceptually the same, so compatibility is
only a matter of renaming. The concerned predicates in this chapter are: see(Filename),
tell(Filename), get(Code), put(Code), name(Atom,CodeList). The corresponding predicate
names in the standard are: set_input(Filename), set_output(Filename), get_code(Code),
put_code(Code), atom_codes(Atom,CodeList).

Deransart, P., Ed-Dbali, A. and Cervoni, L. (1996) Prolog: The Standard. Berlin: Springer-Verlag.

chapter 7

More Built-in Predicates

7.1 Testing the type of terms 147
7.2 Constructing and decomposing terms: =.., functor, arg, name 155
7.3 Various kinds of equality and comparison 160
7.4 Database manipulation 161
7.5 Control facilities 166
7.6 bagof, setof and findall 167

In this chapter we will examine some more built-in predicates for advanced Prolog
programming. These features enable the programming of operations that are not
possible using only the features introduced so far. One set of such predicates
manipulate terms: testing whether some variable has been instantiated to an
integer, taking terms apart, constructing new terms, etc. Another useful set of
procedures manipulates the 'database': they add new clauses to the program or
remove existing ones.

The built-in predicates largely depend on the implementation of Prolog.
However, the predicates discussed in this chapter are provided by many Prolog
implementations. Various implementations may provide additional features.

7.1 Testing the type of terms
At some point of Prolog execution a variable may be instantiated or
uninstantiated. Further, if it is instantiated, its value can be an atom, a structure, etc.
It is sometimes useful to know what the type of this value is. For example, we may
want to add the values of two variables, X and Y, by:

Z is X + Y

Before this goal is executed, X and Y have to be instantiated to numbers. If we are
not sure that X and Y will indeed be instantiated to numbers at this point then we
should check this in the program before arithmetic is done.

To this end we can use the built-in predicate number. number( X) is true if X is a
number or if it is a variable whose value is a number. We say that X must 'currently
stand for' a number. The goal of adding X and Y can then be protected by the
following test on X and Y:

..., number( X), number( Y), Z is X + Y, ...

If X and Y are not both numbers then no arithmetic will be attempted. So the
number goals 'guard' the goal Z is X + Y against meaningless execution.

Built-in predicates of this sort are: var, nonvar, atom, integer, float, number, atomic,
compound. Their meaning is as follows:

var( X)        succeeds if X is currently an uninstantiated variable
nonvar( X)     succeeds if X is not a variable, or X is an already instantiated variable
atom( X)       is true if X currently stands for an atom
integer( X)    is true if X currently stands for an integer
float( X)      is true if X currently stands for a real number
number( X)     is true if X currently stands for a number
atomic( X)     is true if X currently stands for a number or an atom
compound( X)   is true if X currently stands for a compound term (a structure)

The following example questions to Prolog illustrate the use of these built-in
predicates:

?- var( Z), Z = 2.
Z = 2

?- Z = 2, var( Z).
no

?- integer( Z), Z = 2.
no

?- Z = 2, integer( Z), nonvar( Z).
Z = 2

?- atom( 3.14).
no

?- atomic( 3.14).
yes

?- atom( ==> ).
yes

?- atom( p(1) ).
no

?- compound( 2 + X).
yes

We will illustrate the need for atom by an example. We would like to count how
many times a given atom occurs in a given list of objects. To this purpose we will
define a procedure:

count( A, L, N)

where A is the atom, L is the list and N is the number of occurrences. The first
attempt to define count could be:

count( _, [], 0).

count( A, [A | L], N) :- !,
    count( A, L, N1),     % N1 = number of occurrences in tail
    N is N1 + 1.

count( A, [_ | L], N) :-
    count( A, L, N).

Now let us try to use this procedure on some examples:

?- count( a, [a,b,a,a], N).
N = 3

?- count( a, [a,b,X,Y], Na).
Na = 3

?- count( b, [a,b,X,Y], Nb).
Nb = 3

?- L = [a,b,X,Y], count( a, L, Na), count( b, L, Nb).
Na = 3
Nb = 1
X = a
Y = a
In the last example, X and Y both became instantiated to a and therefore we only got
Nb = 1; but this is not what we had in mind. We are interested in the number of real
occurrences of the given atom, and not in the number of terms that match this atom.
According to this more precise definition of the count relation we have to check
whether the head of the list is an atom. The modified program is as follows:

count( _, [], 0).

count( A, [B | L], N) :-
    atom( B), A = B, !,   % B is atom A?
    count( A, L, N1),     % Count in tail
    N is N1 + 1.

count( A, [_ | L], N) :-
    count( A, L, N).      % Otherwise just count the tail

7.1.2 A cryptarithmetic puzzle using nonvar

A popular example of a cryptarithmetic puzzle is

  DONALD
+ GERALD
  ROBERT

The problem here is to assign decimal digits to the letters D, O, N, etc., so that the
above sum is valid. All letters have to be assigned different digits, otherwise trivial
solutions are possible - for example, all letters equal zero.

We will define a relation

sum( N1, N2, N)

where N1, N2 and N represent the three numbers of a given cryptarithmetic puzzle.
The goal sum( N1, N2, N) is true if there is an assignment of digits to letters such that
N1 + N2 = N.

The first step toward a solution is to decide how to represent the numbers N1, N2
and N in the program. One way of doing this is to represent each number as a list of
decimal digits. For example, the number 225 would be represented by the list [2,2,5].
As these digits are not known in advance, an uninstantiated variable will stand for
each digit. Using this representation, the problem can be depicted as:

  [D, O, N, A, L, D]
+ [G, E, R, A, L, D]
  [R, O, B, E, R, T]

The task is to find such an instantiation of the variables D, O, N, etc., for which the
sum is valid. When the sum relation has been programmed, the puzzle can be stated
to Prolog by the question:

?- sum( [D,O,N,A,L,D], [G,E,R,A,L,D], [R,O,B,E,R,T] ).

To define the sum relation on lists of digits, we have to implement the actual rules
for doing summation in the decimal number system. The summation is done digit
by digit, starting with the right-most digits, continuing toward the left, always
taking into account the carry digit from the right. It is also necessary to maintain a
set of available digits; that is, digits that have not yet been used for instantiating
variables already encountered. So, in general, besides the three numbers N1, N2 and
N, some additional information is involved, as illustrated in Figure 7.1:

• carry digit before the summation of the numbers;
• carry digit after the summation;
• set of digits available before the summation;
• remaining digits, not used in the summation.

Figure 7.1 (diagram) Digit-by-digit summation: Number1 = [D11, D12, ..., D1i, ...]
plus Number2 = [D21, D22, ..., D2i, ...] gives Number3 = [D31, D32, ..., D3i, ...]; at
each digit position a carry from the right is consumed and a carry to the left is
produced.

To formulate the sum relation we will use, once again, the principle of generalization
of the problem: we will introduce an auxiliary, more general relation, sum1. sum1 has
some extra arguments, which correspond to the foregoing additional information:

sum1( N1, N2, N, C1, C, Digits1, Digits)

N1, N2 and N are our three numbers, as in the sum relation, C1 is carry from
the right (before summation of N1 and N2), and C is carry to the left (after the
summation). The following example illustrates:
?- suml( [H,EJ, [6,E], [U,S], l, 1, [l,3,4,7,8,9], Digits). Translating this case into Prolog we have:
I-!�, 8
E =3 suml( [DlINl], [D2IN2], [DI NJ, Cl , C, Digs 1, Digs)
S =7 sum 1( Nl, N2, N, Cl, C2, Digs 1, Digs2),
U =4 digitsum( Dl, D2, C2, D, C, Digs2, Digs).
Digits= [1,9]
It only remains to define the digitsum relation in Prolog. There is one subtle detail
This corresponds to the following summation: that involves the use of the metalogical predicate nonvar. D1, D2 and D have to be
decimal digits. If any of them is not yet instantiated then it has to become
� 1
instantiated to one of the digits in the list DigsZ. This digit has to be deleted from
8 3
the set of available digits. If D1, D2 or D is already instantiated then, of course, none
6 3
4 7
'¾., Solving cryptarithmetic puzzles
As Figure 7.1 shows, Cl and C have to be O if Nl, NZ and N are to satisfy the sum
relation. Digitsl is the list of available digits for instantiating the variables in Nl, NZ sum( Nl, N2, N) :- % Numbers represented as lists of digits
and N; Digits is the list of digits that were not used in the instantiation of these sum l( Nl, N2, N,
variables. Since we allow the use of any decimal digit in satisfying the sum relation, 0, 0, % Carries from right and to left both 0
[0,1,2,3,4,5,6,7,8,9], _). % All digits available
the definition of sum in terms of suml is as follows:
suml( [], [l, [], C, C, Digits, Digits).
sum( Nl, N2, N) :-
suml( [DlINl], [D2INZ], [DINJ, Cl, C, Digsl, Digs)
suml( Nl, N2, N, 0, 0, [0,1,2,3,4,5,6,7,8,9], _ ).
suml( Nl, N2, N, Cl, C2., Digsl, Digs2),
The burden of the problem has now shifted to the suml relation. This relation is, digitsum( Dl, D2, C2, D, C, Digs2, Digs).
however, general enough that it can be defined recursively. We will assume, without digitsum( Dl, D2, Cl, D, C, Digsl, Digs)
loss of generality, that the three lists representing the three numbers are of equal del_var( Dl, Digsl, Digs2), % Select an available digit for D 1
length. Our example problem, of course, satisfies this constraint; if not, a 'shorter' de!_var( D2, Digs2, Digs3), % Select an available digit for D2
del_var( D, Digs3, Digs), % Select an available digit for D
number can be prefixed by zeros.
S is Dl + D2 + Cl,
The definition of suml can be divided into two cases:
D is S mod 1 0, % Remainder
(1) The three numbers are represented by empty lists. Then:
C is SII 10. '¾, Integer division
del_var( A, L, L) :-
suml( [], [], [], C, C, Digs, Digs). nonvar( A), !. % A already instantiated
(2) All three numbers have some left-most digit and the remaining digits on their del_var( A, [A! L], L). o;,, Delete the head
right. So they are of the form: del_var( A, [B I L], [BILl] )
del_var( A, L, Ll). 1,, Delete from tail
0
[DlINll, [D2IN2], [DIN]
% Some puzzles
In this case two conditions must be satisfied:
puzzlel( [D,O,N,A,L,D],
(a) The three numbers Nl, NZ and N have to satisfy the suml relation, giving [G,E,R,A,L,D],
some carry digit, CZ, to the left, and leaving some unused subset of [R,O,B,E,R,T] ).
decimal digits, Digs2. puz.2le2( [0,S,E,N,D],
(b) The left-most digits D1, D2 and D, and the carry digit CZ have to satisfy the [0,M,O,R,E],
[M,O,N,E,Y] ).
relation indicated in Figure 7. l: CZ, D1 and D2 are added giving D and
a carry to the left. This condition will be formulated in our program as a
relation digitsum. Figure 7.2 A program for cryptarithmetic puzzles.
154 More Built-in Predicates Constructing and decomposing terms: - .. , functor, arg, name 155
of the available digits will be spent. This is realized in the program as a non-deterministic deletion of an item from a list. If this item is non-variable then nothing is deleted (no instantiation occurs). This is programmed as:

    del_var( Item, List, List) :-
        nonvar( Item), !.    % Item already instantiated
    del_var( Item, [Item | List], List).    % Delete the head
    del_var( Item, [A | List], [A | List1] ) :-
        del_var( Item, List, List1).    % Delete Item from tail

A complete program for cryptarithmetic puzzles is shown in Figure 7.2. The program also includes the definition of two puzzles. The question to Prolog about DONALD, GERALD and ROBERT, using this program, would be:

    ?- puzzle1( N1, N2, N), sum( N1, N2, N).

Exercise: define the predicate add_to_tail( Item, List) to store a new element into a list. Assume that all of the elements that can be stored are non-variables. List contains all the stored elements followed by a tail that is not instantiated and can thus accommodate new elements. For example, let the existing elements stored be a, b and c. Then

    List = [a, b, c | Tail]

where Tail is a variable. The goal

    add_to_tail( d, List)

will cause the instantiation

    Tail = [d | NewTail]  and  List = [a, b, c, d | NewTail]

Thus the structure can, in effect, grow by accepting new items. Define also the corresponding membership relation.
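One way to define add_to_tail along the lines of the exercise above, together with a membership relation (member_open is a name chosen here; both definitions are a sketch and assume, as the exercise does, that the stored elements are non-variables):

```prolog
% add_to_tail( Item, List): store Item at the first free position
% of the open-ended list List.
add_to_tail( Item, List) :-
    var( List), !,               % Reached the uninstantiated tail
    List = [Item | _].           % Extend it with Item and a new open tail
add_to_tail( Item, [_ | Tail]) :-
    add_to_tail( Item, Tail).    % Skip the elements already stored

% member_open( X, List): X is among the stored elements of the
% open-ended list List (the open tail itself does not count).
member_open( X, List) :-
    nonvar( List),
    List = [X | _].
member_open( X, List) :-
    nonvar( List),
    List = [_ | Tail],
    member_open( X, Tail).
```

After add_to_tail( d, List) with List = [a, b, c | Tail], the goal member_open( d, List) succeeds, while member_open( e, List) fails instead of extending the list, because of the nonvar tests.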
When do we consider two terms to be equal? Until now we have introduced three kinds of equality in Prolog. The first was based on matching, written as:

    X = Y

This is true if X and Y match. Another type of equality was written as:

    X is E

This is true if X matches the value of the arithmetic expression E. We also had:

    E1 =:= E2

This is true if the values of the arithmetic expressions E1 and E2 are equal. In contrast, when the values of two arithmetic expressions are not equal, we write:

    E1 =\= E2

Sometimes we are interested in a stricter kind of equality: the literal equality of two terms. This kind of equality is implemented as another built-in predicate written as an infix operator '==':

    T1 == T2

This is true if terms T1 and T2 are identical; that is, they have exactly the same structure and all the corresponding components are the same. In particular, the names of the variables also have to be the same. The complementary relation is 'not identical', written as:

    T1 \== T2

As an example of using ==, here is a procedure count( Term, List, N) where N is the number of literal occurrences of the term Term in the list List:

    count( _, [], 0).
    count( Term, [Head | L], N) :-
        Term == Head, !,
        count( Term, L, N1),
        N is N1 + 1
        ;
        count( Term, L, N).

We have already seen predicates that compare terms arithmetically, for example X + 2 < 5. Another set of built-in predicates compare terms alphabetically and thus define an ordering relation on terms. For example, the goal

    X @< Y

is read: term X precedes term Y. The precedence between simple terms is determined by alphabetical or numerical ordering. The precedence between structures is determined by the precedence of their principal functors. If the principal functors are equal, then the precedence between the top-most, left-most functors in the subterms in X and Y decides. Examples are:

    ?- paul @< peter.
    yes

    ?- f(2) @< f(3).
    yes

    ?- g(2) @< f(3).
    no
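Because count uses '==' rather than '=', it counts only literal occurrences and never instantiates variables in the list. For instance, with the count procedure above loaded (a query invented here for illustration):

```prolog
?- count( a, [a, b, X, a, f(X)], N).
N = 2.    % X and f(X) are not literally identical to a; X stays unbound
```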
make it possible to update this database during the execution of the program. This is done by adding (during execution) new clauses to the program or by deleting existing clauses. Predicates that serve these purposes are assert, asserta, assertz and retract.

A goal

    assert( C)

always succeeds and, as its side effect, causes a clause C to be 'asserted', that is, added to the database. A goal

    retract( C)

does the opposite: it deletes a clause that matches C. The following conversation with Prolog illustrates:

    ?- crisis.
    no

    ?- assert( crisis).
    yes

    ?- crisis.
    yes

    ?- retract( crisis).
    yes

    ?- crisis.
    no

Clauses thus asserted act exactly as part of the 'original' program. The following example shows the use of assert and retract as one method of handling changing situations. Let us assume that we have the following program about weather:

    nice :-
        sunshine, not raining.
    funny :-
        sunshine, raining.
    disgusting :-
        raining, fog.
    raining.
    fog.

The following conversation with this program will gradually update the database:

    ?- nice.
    no

    ?- disgusting.
    yes

    ?- retract( fog).
    yes

    ?- disgusting.
    no

    ?- assert( sunshine).
    yes

    ?- funny.
    yes

    ?- retract( raining).
    yes

    ?- nice.
    yes

Clauses of any form can be asserted or retracted. However, depending on the implementation of Prolog, it may be required that predicates manipulated through assert/retract be declared as dynamic, using the directive dynamic( PredicateIndicator). Predicates that are only brought in by assert, and not by consult, are automatically assumed as dynamic.

The next example illustrates that retract is also non-deterministic: a whole set of clauses can, through backtracking, be removed by a single retract goal. Let us assume that we have the following facts in the 'consulted' program:

    fast( ann).
    slow( tom).
    slow( pat).

We can add a rule to this program, as follows:

    ?- assert(
         ( faster( X, Y) :-
             fast( X), slow( Y) ) ).
    yes

    ?- faster( A, B).
    A = ann
    B = tom

    ?- retract( slow( X) ).
    X = tom;
    X = pat;
    no
    ?- faster( ann, _).
    no

Notice that when a rule is asserted, the syntax requires that the rule (as an argument of assert) be enclosed in an extra pair of parentheses, as in the example above. The variants asserta and assertz determine where the new clause is added: asserta( C) adds C at the beginning of the database, assertz( C) at the end. For example:

    ?- assert( p(b) ), assertz( p(c) ), assert( p(d) ), asserta( p(a) ).
    yes

    ?- p( X).
    X = a;
    X = b;
    X = c;
    X = d

There is a relation between consult and assertz. Consulting a file can be defined in terms of assertz as follows: to consult a file, read each term (clause) in the file and assert it at the end of the database.

One useful application of asserta is to store already computed answers to questions. For example, let there be a predicate

    solve( Problem, Solution)

defined in the program. We may now ask some question and request that the answer be remembered for future questions:

    ?- solve( problem1, Solution),
       asserta( solve( problem1, Solution) ).

If the first goal above succeeds then the answer (Solution) is stored and used, as any other clause, in answering further questions. The advantage of such a 'memoization' of answers is that a further question that matches the asserted fact will normally be answered much quicker than the first one. The result now will be simply retrieved as a fact, and not computed through a possibly time-consuming process. This technique of storing derived solutions is also called 'caching'.

An extension of this idea is to use asserting for generating all solutions in the form of a table of facts. For example, we can generate a table of products of all pairs of integers between 0 and 9 as follows: generate a pair of integers X and Y, compute Z is X*Y, assert the three numbers as one line of the product table, and then force the failure. The failure will cause, through backtracking, another pair of integers to be found and so another line tabulated, etc. A procedure maketable defined along these lines will, of course, not succeed as a goal, but it will, as a side effect, add the whole product table to the database. After that we can ask, for example, what pairs give the product 8:

    ?- product( A, B, 8).
    A = 1
    B = 8;
    A = 2
    B = 4;

A remark on the style of programming should be made at this stage. The foregoing examples illustrate some obviously useful applications of assert and retract. However, their use requires special care. Excessive and careless use of these facilities cannot be recommended as good programming style. By asserting and retracting we, in fact, modify the program. Therefore relations that hold at some point will not be true at some other time. At different times the same questions receive different answers. A lot of asserting and retracting may thus obscure the meaning of the program. The resulting behaviour of the program may become difficult to understand, difficult to explain and to trust.

Exercises

7.6 (a) Write a Prolog question to remove the whole product table from the database.
    (b) Modify the question so that it only removes those entries where the product is 0.

7.7 Define the relation

        copy_term( Term, Copy)
which will produce a copy of Term so that Copy is Term with all its variables renamed. This can be easily programmed by using asserta and retract. In some Prologs copy_term is provided as a built-in predicate.

7.5 Control facilities

So far we have covered most of the extra control facilities except repeat. For completeness the complete set is presented here.

• cut, written as '!', prevents backtracking. It was introduced in Chapter 5. A useful predicate is once( P) defined in terms of cut as:

    once( P) :- P, !.

once( P) produces one solution only. The cut, nested in once, does not prevent backtracking in other goals.

• fail is a goal that always fails.

• true is a goal that always succeeds.

• not( P) is negation as failure that behaves exactly as if defined as:

    not( P) :- P, !, fail; true.

Some problems with cut and not were discussed in detail in Chapter 5.

• call( P) invokes a goal P. It succeeds if P succeeds.

• repeat is a goal that always succeeds. Its special property is that it is non-deterministic; therefore, each time it is reached by backtracking it generates another alternative execution branch. repeat behaves as if defined by:

    repeat.
    repeat :- repeat.

A typical way of using repeat is illustrated by the following procedure dosquares which reads a sequence of numbers and outputs their squares. The sequence is concluded with the atom stop, which serves as a signal for the procedure to terminate.

    dosquares :-
        repeat,
        read( X),
        ( X = stop, !
          ;
          Y is X*X, write( Y),
          fail
        ).

7.6 bagof, setof and findall

We can generate, by backtracking, all the objects, one by one, that satisfy some goal. Each time a new solution is generated, the previous one disappears and is not accessible any more. However, sometimes we would prefer to have all the generated objects available together - for example, collected into a list. The built-in predicates bagof, setof and findall serve this purpose.

The goal

    bagof( X, P, L)

will produce the list L of all the objects X such that a goal P is satisfied. Of course, this usually makes sense only if X and P have some common variables. For example, let us have these facts in the program:

    age( peter, 7).
    age( ann, 5).
    age( pat, 8).
    age( tom, 5).

Then we can obtain the list of all the children of age 5 by the goal:

    ?- bagof( Child, age( Child, 5), List).
    List = [ ann, tom]

If, in the above goal, we leave the age unspecified, then we get, through backtracking, three lists of children, corresponding to the three age values:

    ?- bagof( Child, age( Child, Age), List).
    Age = 7
    List = [ peter];
    Age = 5
    List = [ ann, tom];
    Age = 8
    List = [ pat];
    no

We may prefer to have all of the children in one list regardless of their age. This can be achieved by explicitly stating in the call of bagof that we do not care about the value of Age as long as such a value exists. This is stated as:

    ?- bagof( Child, Age ^ age( Child, Age), List).
    List = [ peter, ann, pat, tom]

Syntactically, '^' is a predefined infix operator of type xfy.
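The '^' construct works the same way for any variable of the goal. For example, to collect the ages themselves regardless of which child they belong to (using the age facts above; note that bagof, unlike setof, keeps duplicates and does not sort):

```prolog
?- bagof( Age, Child ^ age( Child, Age), Ages).
Ages = [7, 5, 8, 5].    % in the order of the age facts, duplicates kept
```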
If there is no solution for P in the goal bagof( X, P, L), then the bagof goal simply fails. If the same object X is found repeatedly, then all of its occurrences will appear in L, which leads to duplicate items in L.

The predicate setof is similar to bagof. The goal

    setof( X, P, L)

will again produce a list L of objects X that satisfy P. Only this time the list L will be ordered, and duplicate items, if there are any, will be eliminated. The ordering of the objects is according to the built-in predicate @<, which defines the precedence among terms. For example:

    ?- setof( Child, Age ^ age( Child, Age), ChildList),
       setof( Age, Child ^ age( Child, Age), AgeList).
    ChildList = [ ann, pat, peter, tom]
    AgeList = [ 5, 7, 8]

There is no restriction on the kind of objects that are collected. So we can, for example, construct the list of children ordered by their age, by collecting pairs of the form Age/Child:

    ?- setof( Age/Child, age( Child, Age), List).
    List = [ 5/ann, 5/tom, 7/peter, 8/pat]

Another predicate of this family, similar to bagof, is findall.

    findall( X, P, L)

produces, again, a list of objects that satisfy P. The difference with respect to bagof is that all of the objects X are collected regardless of (possibly) different solutions for variables in P that are not shared with X. This difference is shown in the following example:

    ?- findall( Child, age( Child, Age), List).
    List = [ peter, ann, pat, tom]

If there is no object X that satisfies P then findall will succeed with L = [].

If findall is not available as a built-in predicate in the implementation used then it can be easily programmed as follows. All solutions for P are generated by forced backtracking. Each solution is, when generated, immediately asserted into the database so that it is not lost when the next solution is found. After all the solutions have been generated and asserted, they have to be collected into a list and retracted from the database. This whole process can be imagined as all the solutions generated forming a queue. Each newly generated solution is, by assertion, added to the end of this queue. When the solutions are collected the queue dissolves. Note, in addition, that the end of this queue has to be marked, for example, by the atom 'bottom' (which, of course, should be different from any solution that is possibly expected). An implementation of findall along these lines is shown as Figure 7.4.

    findall( X, Goal, Xlist) :-
        call( Goal),                  % Find a solution
        assertz( queue(X) ),          % Assert it
        fail;                         % Try to find more solutions
        assertz( queue(bottom) ),     % Mark end of solutions
        collect( Xlist).              % Collect the solutions

    collect( L) :-
        retract( queue(X) ), !,       % Retract next solution
        ( X == bottom, !, L = []      % End of solutions?
          ;
          L = [X | Rest], collect( Rest) ).   % Otherwise collect the rest

Figure 7.4 An implementation of the findall relation.

Exercises

7.8 Use bagof to define the relation powerset( Set, Subsets) to compute the set of all subsets of a given set (all sets represented as lists).

7.9 Use bagof to define the relation

        copy_term( Term, Copy)

such that Copy is Term with all its variables renamed.

Summary

• A Prolog implementation normally provides a set of built-in procedures to accomplish several useful operations that are not possible in pure Prolog. In this chapter, such a set of predicates available in many Prolog implementations was introduced.

• The type of a term can be tested by the following predicates:

    var( X)         X is a (non-instantiated) variable
    nonvar( X)      X is not a variable
    atom( X)        X is an atom
    integer( X)     X is an integer
    float( X)       X is a real number
    atomic( X)      X is either an atom or a number
    compound( X)    X is a structure
• Terms can be constructed or decomposed:

    Term =.. [ Functor | ArgumentList]
    functor( Term, Functor, Arity)
    arg( N, Term, Argument)
    name( Atom, CharacterCodes)

• Terms can be compared:

    X = Y       X and Y match
    X == Y      X and Y are identical
    X \== Y     X and Y are not identical
    X =:= Y     X and Y are arithmetically equal
    X =\= Y     X and Y are not arithmetically equal
    X < Y       arithmetic value of X is less than Y (related: =<, >, >=)
    X @< Y      term X precedes term Y (related: @=<, @>, @>=)

• A Prolog program can be viewed as a relational database that can be updated by the following procedures:

    assert( Clause)     add Clause to the program
    asserta( Clause)    add at the beginning
    assertz( Clause)    add at the end
    retract( Clause)    remove a clause that matches Clause

• All the objects that satisfy a given condition can be collected into a list by the predicates:

    bagof( X, P, L)     L is the list of all X that satisfy condition P
    setof( X, P, L)     L is the sorted list of all X that satisfy condition P
    findall( X, P, L)   similar to bagof

• repeat is a control facility that generates an unlimited number of alternatives for backtracking.

chapter 8

Programming Style and Technique

8.1 General principles of good programming 171
8.2 How to think about Prolog programs 173
8.3 Programming style 176
8.4 Debugging 179
8.5 Improving efficiency 181

In this chapter we will review some general principles of good programming and discuss the following questions in particular: How to think about Prolog programs? What are elements of good programming style in Prolog? How to debug Prolog programs? How to make Prolog programs more efficient?
8.1 General principles of good programming
What is a good program? Answering this question is not trivial as there are several
criteria for judging how good a program is. Generally accepted criteria include the
following:
• Correctness Above all, a good program should be correct. That is, it should do
what it is supposed to do. This may seem a trivial, self-explanatory requirement.
However, in the case of complex programs, correctness is often not attained. A
common mistake when writing programs is to neglect this obvious criterion and
pay more attention to other criteria, such as efficiency or external glamour of the
program.
• User-friendliness A good program should be easy to use and interact with.
• Efficiency A good program should not needlessly waste computer time and
memory space.
• Readability A good program should be easy to read and easy to understand. It should not be more complicated than necessary. Clever programming tricks that obscure the meaning of the program should be avoided. The general organization of the program and its layout help its readability.

• Modifiability A good program should be easy to modify and to extend. Transparency and modular organization of the program help modifiability.

• Robustness A good program should be robust. It should not crash immediately when the user enters some incorrect or unexpected data. The program should, in the case of such errors, stay 'alive' and behave reasonably (should report errors).

• Documentation A good program should be properly documented. The minimal documentation is the program's listing including sufficient program comments.

The importance of particular criteria depends on the problem and on the circumstances in which the program is written, and on the environment in which it is used. There is no doubt that correctness has the highest priority. The issues of readability, user-friendliness, modifiability, robustness and documentation are usually given, at least, as much priority as the issue of efficiency.

There are some general guidelines for practically achieving the above criteria. One important rule is to first think about the problem to be solved, and only then to start writing the actual code in the programming language used. Once we have developed a good understanding of the problem and the solution is well thought through, the actual coding will be fast and easy, and there is a good chance that we will soon get a correct program.

A common mistake is to start writing the code even before the full definition of the problem has been understood. A fundamental reason why early coding is bad practice is that the thinking about the problem and the ideas for a solution should be done in terms that are most relevant to the problem. These terms are usually far from the syntax of the programming language used, and they may include natural language statements and pictorial representation of ideas.

Such a formulation of the solution will have to be transformed into the programming language, but this transformation process may not be easy. A good approach is to use the principle of stepwise refinement. The initial formulation of the solution is referred to as the 'top-level solution', and the final program as the 'bottom-level solution'.

According to the principle of stepwise refinement, the final program is developed through a sequence of transformations, or 'refinements', of the solution. We start with the first, top-level solution and then proceed through a sequence of solutions; these are all equivalent, but each solution in the sequence is expressed in more detail. In each refinement step, concepts used in previous formulations are elaborated to greater detail and their representation gets closer to the programming language. It should be realized that refinement applies both to procedure definitions and to data structures. In the initial stages we normally work with more abstract, bulky units of information whose structure is refined later.

Such a strategy of top-down stepwise refinement has the following advantages:

• it allows for formulation of rough solutions in terms that are most relevant to the problem;

• in terms of such powerful concepts, the solution should be succinct and simple, and therefore likely to be correct;

• each refinement step should be small enough so that it is intellectually manageable; if so, the transformation of a solution into a new, more detailed representation is likely to be correct, and so is the resulting solution at the next level of detail.

In the case of Prolog we may talk about the stepwise refinement of relations. If the problem suggests thinking in algorithmic terms, then we can also talk about refinement of algorithms, adopting the procedural point of view in Prolog.

In order to properly refine a solution at some level of detail, and to introduce useful concepts at the next lower level, we need ideas. Therefore programming is creative, especially so for beginners. With experience, programming gradually becomes less of an art and more of a craft. But, nevertheless, a major question is: How do we get ideas? Most ideas come from experience, from similar problems whose solutions we know. If we do not know a direct programming solution, another similar problem could be helpful. Another source of ideas is everyday life. For example, if the problem is to write a program to sort a list of items we may get an idea from considering the question: How would I myself sort a set of exam papers according to the alphabetical order of students?

General principles of good programming outlined in this section basically apply to Prolog as well. We will discuss some details with particular reference to Prolog in the following sections.

8.2 How to think about Prolog programs

One characteristic feature of Prolog is that it allows for both the procedural and declarative way of thinking about programs. The two approaches have been discussed in detail in Chapter 2, and illustrated by examples throughout the text. Which approach will be more efficient and practical depends on the problem. Declarative solutions are usually easier to develop, but may lead to an inefficient program.

During the process of developing a solution we have to find ideas for reducing problems to one or more easier subproblems. An important question is: How do we
find proper subproblems? There are several general principles that often work in Prolog programming. These will be discussed in the following sections.

8.2.1 Use of recursion

The principle here is to split the problem into cases belonging to two groups:

(1) trivial, or 'boundary' cases;

(2) 'general' cases where the solution is constructed from solutions of (simpler) versions of the original problem itself.

In Prolog we use this technique all the time. Let us look at one more example: processing a list of items so that each item is transformed by the same transformation rule. Let this procedure be

    maplist( List, F, NewList)

where List is an original list, F is a transformation rule (a binary relation) and NewList is the list of all transformed items. The problem of transforming List can be split into two cases:

(1) Boundary case: List = []

    if List = [] then NewList = [], regardless of F

(2) General case: List = [X | Tail]

    To transform a list of the form [X | Tail], do:
    transform the item X by rule F obtaining NewX, and
    transform the list Tail obtaining NewTail;
    the whole transformed list is [NewX | NewTail].

In Prolog:

    maplist( [], _, [] ).
    maplist( [X | Tail], F, [NewX | NewTail] ) :-
        G =.. [F, X, NewX],
        call( G),
        maplist( Tail, F, NewTail).

Suppose we have a list of numbers and want to compute the list of their squares. maplist can be used for this as follows:

    square( X, Y) :-
        Y is X*X.

    ?- maplist( [2, 6, 5], square, Squares).
    Squares = [ 4, 36, 25]

One reason why recursion so naturally applies to defining relations in Prolog is that data objects themselves often have recursive structure. Lists and trees are such objects. A list is either empty (boundary case) or has a head and a tail that is itself a list (general case). A binary tree is either empty (boundary case) or it has a root and two subtrees that are themselves binary trees (general case). Therefore, to process a whole non-empty tree, we must do something with the root, and process the subtrees.

8.2.2 Generalization

It is often a good idea to generalize the original problem, so that the solution to the generalized problem can be formulated recursively. The original problem is then solved as a special case of its more general version. Generalization of a relation typically involves the introduction of one or more extra arguments. A major problem, which may require deeper insight into the problem, is how to find the right generalization.

As an example let us revisit the eight queens problem. The original problem was to place eight queens on the chessboard so that they do not attack each other. Let us call the corresponding relation:

    eightqueens( Pos)

This is true if Pos is a position with eight non-attacking queens. A good idea in this case is to generalize the number of queens from eight to N. The number of queens now becomes the additional argument:

    nqueens( Pos, N)

The advantage of this generalization is that there is an immediate recursive formulation of the nqueens relation:

(1) Boundary case: N = 0

    To safely place zero queens is trivial.

(2) General case: N > 0

    To safely place N queens on the board, satisfy the following:
    • achieve a safe configuration of (N - 1) queens; and
    • add the remaining queen so that she does not attack any other queen.

Once the generalized problem has been solved, the original problem is easy:

    eightqueens( Pos) :- nqueens( Pos, 8).
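This recursive formulation can be turned into code quite directly. The sketch below is one possible rendering, not the book's program: a position is represented as a list of row numbers, one queen per column, member/2 is assumed as a built-in or library predicate, and noattack/3 is a helper written here to check the new queen against the others at increasing column distance:

```prolog
nqueens( [], 0).
nqueens( [Y | Ys], N) :-
    N > 0,
    N1 is N - 1,
    nqueens( Ys, N1),                 % Safe configuration of N-1 queens
    member( Y, [1,2,3,4,5,6,7,8]),    % Choose a row for the new queen
    noattack( Y, Ys, 1).              % She must not attack the others

% noattack( Y, Ys, D): a queen in row Y attacks no queen in Ys,
% where the first queen in Ys is D columns away.
noattack( _, [], _).
noattack( Y, [Y1 | Ys], D) :-
    Y =\= Y1,                         % Not in the same row
    abs( Y - Y1) =\= D,               % Not on the same diagonal
    D1 is D + 1,
    noattack( Y, Ys, D1).
```

With this sketch, eightqueens( Pos) :- nqueens( Pos, 8) enumerates safe eight-queen positions on backtracking.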
8.2.3 Using pictures

When searching for ideas about a problem, it is often useful to introduce some graphical representation of the problem. A picture may help us to perceive some essential relations in the problem. Then we just have to describe what we see in the picture in the programming language.

The use of pictorial representations is often useful in problem solving in general; it seems, however, that it works with Prolog particularly well. The following arguments explain why:

(1) Prolog is particularly suitable for problems that involve objects and relations between objects. Often, such problems can be naturally illustrated by graphs in which nodes correspond to objects and arcs correspond to relations.

(2) Structured data objects in Prolog are naturally pictured as trees.

(3) The declarative meaning of Prolog facilitates the translation of pictorial representations into Prolog because, in principle, the order in which the picture is described does not matter. We just put what we see into the program in any order. (For practical reasons of the program's efficiency this order will possibly have to be polished later.)

8.3 Programming style

The purpose of conforming to some stylistic conventions is:

• to reduce the danger of programming errors; and

• to produce programs that are readable and easy to understand, easy to debug and to modify.

We will review here some ingredients of good programming style in Prolog: some general rules of good style, tabular organization of long procedures and commenting.

8.3.1 Some rules of good style

• Program clauses should be short. Their body should typically contain no more than a few goals.

• Procedures should be short because long procedures are hard to understand. However, long procedures are acceptable if they have some uniform structure (this will be discussed later in this section).

• Mnemonic names for procedures and variables should be used. Names should indicate the meaning of relations and the role of data objects.

• The layout of programs is important. Spacing, blank lines and indentation should be consistently used for the sake of readability. Clauses about the same procedure should be clustered together; there should be blank lines between clauses (unless, perhaps, there are numerous facts about the same relation); each goal can be placed on a separate line. Prolog programs sometimes resemble poems for the aesthetic appeal of ideas and form.

• Stylistic conventions of this kind may vary from program to program as they depend on the problem and personal taste. It is important, however, that the same conventions are used consistently throughout the whole program.

• The cut operator should be used with care. Cut should not be used if it can be easily avoided. It is better to use, where possible, 'green cuts' rather than 'red cuts'. As discussed in Chapter 5, a cut is called 'green' if it can be removed without altering the declarative meaning of the clause. The use of 'red cuts' should be restricted to clearly defined constructs such as not or the selection between alternatives. An example of the latter construct is:

    if Condition then Goal1 else Goal2

This translates into Prolog, using cut, as:

    Condition, !,    % Condition true?
    Goal1            % If yes then Goal1
    ;
    Goal2            % Otherwise Goal2

• The not procedure can also lead to surprising behaviour, as it is related to cut. We have to be well aware of how not is defined in Prolog. However, if there is a dilemma between not and cut, the former is perhaps better than some obscure construct with cut.

• Program modification by assert and retract can grossly degrade the transparency of the program's behaviour. In particular, the same program will answer the same question differently at different times. In such cases, if we want to reproduce the same behaviour we have to make sure that the whole previous state, which was modified by assertions and retractions, is completely restored.

• The use of a semicolon may obscure the meaning of a clause. The readability can sometimes be improved by splitting the clause containing the semicolon into more clauses; but this will, possibly, be at the expense of the length of the program and its efficiency.

To illustrate some points of this section consider the relation

    merge( List1, List2, List3)

where List1 and List2 are ordered lists that merge into List3. For example:

    merge( [2,4,7], [1,3,4,8], [1,2,3,4,4,7,8] )
The following is an implementation of merge in bad style:

    merge( List1, List2, List3) :-
        List1 = [], !, List3 = List2;       % First list empty
        List2 = [], !, List3 = List1;       % Second list empty
        List1 = [X | Rest1],
        List2 = [Y | Rest2],
        ( X < Y, !,
          Z = X,                            % Z is head of List3
          merge( Rest1, List2, Rest3);
          Z = Y,
          merge( List1, Rest2, Rest3) ),
        List3 = [Z | Rest3].

Here is a better version which avoids semicolons:

    merge( [], List, List) :-
        !.                                  % Prevent redundant solutions
    merge( List, [], List).
    merge( [X | Rest1], [Y | Rest2], [X | Rest3] ) :-
        X < Y, !,
        merge( Rest1, [Y | Rest2], Rest3).
    merge( List1, [Y | Rest2], [Y | Rest3] ) :-
        merge( List1, Rest2, Rest3).

8.3.2 Tabular organization of long procedures

Long procedures are acceptable if they have some uniform structure. Typically, such a form is a set of facts when a relation is effectively defined in the tabular form. Advantages of such an organization of a long procedure are:

• Its structure is easily understood.

• Incrementability: it can be refined by simply adding new facts.

• It is easy to check and correct or modify (by simply replacing some fact independently of other facts).

8.3.3 Commenting

Program comments should explain in the first place what the program is about and how to use it, and only then the details of the solution method used and other programming details. The main purpose of comments is to enable the user to use the program, to understand it and to possibly modify it. Comments should describe, in the shortest form possible, everything that is essential to these ends. Undercommenting is a usual fault, but a program can also be overcommented. Explanation of details that are obvious from the program code itself is only a needless burden to the program.

Long passages of comments should precede the code they refer to, while short comments should be interspersed with the code itself. Information that should, in general, be included in comments comprises the following:

• What the program does, how it is used (for example, what goal is to be invoked and what are the expected results), examples of using the program.

• What are top-level predicates?

• How are main concepts (objects) represented?

• Execution time and memory requirements of the program.

• What are the program's limitations?

• Are there any special system-dependent features used?

• What is the meaning of the predicates in the program? What are their arguments? Which arguments are 'input' and which are 'output', if known? (Input arguments have fully specified values, without uninstantiated variables, when the predicate is called.)

• Algorithmic and implementation details.

The following conventions are often used when describing predicates. References to a predicate are made by stating the predicate's name and its arity, written as:

    PredicateName/Arity

For example, merge( List1, List2, List3) would be referred to as merge/3. The input/output modes of the arguments are indicated by prefixing arguments' names by '+' (input) or '-' (output). For example, merge( +List1, +List2, -List3) indicates that the first two arguments of merge are input, and the third one is output.

8.4 Debugging

When a program does not do what it is expected to do the main problem is to locate the error(s). It is easier to locate an error in a part of the program (or a module) than in the program as a whole. Therefore, a good principle of debugging is to start by testing smaller units of the program, and when these can be trusted, to test bigger modules or the whole program.

Debugging in Prolog is facilitated by two things: first, Prolog is an interactive
describe, in the shortest form possible, everything that is essential to these ends. language so any part of the program can be directly invoked by a proper question to
180 Programming Style and Technique Improving efficiency 181
the Prolog system; second, Prolog implementations usually provide special debug
ging aids. As a result of these two features, debugging of Prolog programs can, in
8.5 Improving efficiency
··········· ···········································-······················· ··································· · · ·················· · ····
general, be done more efficiently than in most other programming languages.
There are several aspects of efficiency, including the most common ones, execution
The basis for debugging aids is tracing. 'Tracing a goal' means that the informa
time and space requirements of a program. Another aspect is the time needed by the
tion regarding the goal's satisfaction is displayed during execution. This information
programmer to develop the program.
includes:
The traditional computer architecture is not particularly suitable for the Prolog
• Entry information: the predicate name and the values of arguments when the style of program execution - that is, satisfying a list of goals. Therefore, the
goal is invoked. limitations of time and space may be experienced earlier in Prolog than in many
• Exit information: in the case of success, the values of arguments that satisfy the
other programming languages. Whether this will cause difficulties in a practical
application depends on the problem. The issue of time efficiency is practically
goal; otherwise an indication of failure. meaningless if a Prolog program that is run a few times per clay takes 1 second of
• Re-entry information: invocation of the same goal caused by backtracking. CPU time and a corresponding program in some other language, say Fortran, takes
0.1 seconds. The difference in efficiency will perhaps matter if the two programs take
Between entry and exit, the trace information for all the subgoals of this goal can be 50 minutes and 5 minutes respectively.
obtained. So we can trace the execution of our question all the way down to the On the other hand, in many areas of application Prolog will greatly reduce the
lowest level goals until facts are encountered. Such detailed tracing may turn out to program development time. Prolog programs will, in general, be easier to write, to
be impractical due to the excessive amount of tracing information; therefore, the understand and to debug than in traditional languages. Problems that gravitate
user can specify selective tracing. There are two selection mechanisms: first, suppress toward the 'Prolog domain' involve symbolic, non-numeric processing, structured
tracing information beyond a certain level; second, trace only some specified subset data objects and relations between them. In particular, Prolog has been successfully
of predicates, and not all of them. applied in areas such as symbolic solving of equations, planning, databases, general
Such debugging aids are activated by system-dependent built-in predicates. A problem solving, prototyping, implementation of programming languages, discrete
typical subset of such predicates is as follows: and qualitative simulation, architectural design, machine learning, natural language
trace
understanding, expert systems, and other areas of artificial intelligence. On the
other hand, numerical mathematics is an area for which Prolog is not a natural
triggers exhaustive tracing of goals that follow. candidate.
With respect to the execution efficiency, executing a compiled program is
notrace generally more efficient than interpreting the program. Therefore, if the Prolog
system contains both an interpreter and a compiler, then the compiler should be
stops further tracing.
used if efficiency is critical.
spy( P) If a program suffers from inefficiency then it can often be radically improved by
improving the algorithm itself. However, to do this, the procedural aspects of the
specifies that a predicate P be traced. This is used when we are particularly interested program have to be studied. A simple way of improving the executional efficiency is
in the named predicate and want to avoid tracing information from other goals to find a better ordering of clauses of procedures, and of goals in the bodies of
(either above or below the level of a call of P). Several predicates can be simul procedures. Another relatively simple method is to provide guidance to the Prolog
taneously active for 'spying'. system by means of cuts.
Ideas for improving the efficiency of a program usually come from a deeper
nospy( P)
understanding of the problem. A more efficient algorithm can, in general, result
stops 'spying' P. from improvements of two kinds:
Tracing beyond a certain depth can be suppressed by special commands during • Improving search efficiency by avoiding unnecessary backtracking and stopping
execution. There may be several other debugging commands available, such as the execution of useless alternatives as soon as possible.
returning to a previous point of execution. After such a return we can, for example, • Using more suitable data structures to represent objects in the program, so that
repeat the execution at a greater detail of tracing. operations on objects can be implemented more efficiently.
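As a minimal illustration of the last point about cuts (an example added here, not part of the original text), consider a predicate that finds the greater of two numbers. Without the cut, after the first clause succeeds Prolog would still retry the second clause on backtracking, even though it cannot contribute another solution:

```prolog
% max( +X, +Y, -Max): Max is the greater of the numbers X and Y
max( X, Y, X) :-
    X >= Y, !.       % First clause applies; cut off the useless alternative
max( X, Y, Y) :-
    X < Y.
```

The test X < Y in the second clause is logically redundant once the cut is there, but keeping it preserves the declarative reading of the program.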
A detailed study of the way Prolog tries to satisfy the colours goal reveals the source of inefficiency. Countries in the country/colour list are arranged in alphabetical order, and this has nothing to do with their geographical arrangement. The order in which the countries are assigned colours corresponds to the order in the list (starting at the end), which is in our case independent of the ngb relation. So the colouring process starts at some end of the map, continues at some other end, etc., moving around more or less randomly. This may easily lead to a situation in which a country that is to be coloured is surrounded by many other countries, already painted with all four available colours. Then backtracking is necessary, which leads to inefficiency.

It is clear, then, that the efficiency depends on the order in which the countries are coloured. Intuition suggests a simple colouring strategy that should be better than random: start with some country that has many neighbours, and then proceed to the neighbours, then to the neighbours of neighbours, etc. For Europe, then, Germany (having most neighbours) is a good candidate to start with. Of course, when the template country/colour list is constructed, Germany has to be put at the end of the list and other countries have to be added at the front of the list. In this way the colouring algorithm, which starts at the rear end, will commence with Germany and proceed from there from neighbour to neighbour.

Such a country/colour template dramatically improves the efficiency with respect to the original, alphabetical order, and possible colourings for the map of Europe will now be produced without difficulty.

We can construct a properly ordered list of countries manually, but we do not have to. The following procedure, makelist, does it. It starts the construction with some specified country (Germany in our case) and collects the countries into a list called Closed. Each country is first put into another list, called Open, before it is transferred to Closed. Each time that a country is transferred from Open to Closed, its neighbours are added to Open.

makelist( List) :-
    collect( [germany], [], List).

collect( [], Closed, Closed).              % No more candidates for Closed

collect( [X | Open], Closed, List) :-
    member( X, Closed), !,                 % X has already been collected?
    collect( Open, Closed, List).          % Discard X

collect( [X | Open], Closed, List) :-
    ngb( X, Ngbs),                         % Find X's neighbours
    conc( Ngbs, Open, Open1),              % Put them to Open1
    collect( Open1, [X | Closed], List).   % Collect the rest

The conc relation is, as usual, the list concatenation relation.

8.5.3 Improving efficiency of list concatenation by difference lists

In our programs so far, the concatenation of lists has been programmed as:

conc( [], L, L).

conc( [X | L1], L2, [X | L3]) :-
    conc( L1, L2, L3).

This is inefficient when the first list is long. The following example explains why:

?- conc( [a,b,c], [d,e], L).

This produces the following sequence of goals:

conc( [a,b,c], [d,e], L)
conc( [b,c], [d,e], L')       where L = [a | L']
conc( [c], [d,e], L'')        where L' = [b | L'']
conc( [], [d,e], L''')        where L'' = [c | L''']
true                          where L''' = [d,e]

From this it is clear that the program in effect scans all of the first list, until the empty list is encountered. But could we not simply skip the whole of the first list in a single step and append the second list, instead of gradually working down the first list? To do this, we need to know where the end of a list is; that is, we need another representation of lists. One solution is the data structure called difference lists. So a list is represented by a pair of lists. For example, the list

[a,b,c]

can be represented by the two lists:

L1 = [a,b,c,d,e]
L2 = [d,e]

Such a pair of lists, which we will for brevity choose to write as L1-L2, represents the 'difference' between L1 and L2. This of course only works under the condition that L2 is a suffix of L1. Note that the same list can be represented by several 'difference pairs'. So the list [a,b,c] can be represented, for example, by

[a,b,c] - []

or

[a,b,c,d,e] - [d,e]

or

[a,b,c,d,e | T] - [d,e | T]

or

[a,b,c | T] - T

where T is any list. The empty list is represented by any pair of the form L-L.
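A difference pair can be turned back into an ordinary list simply by 'closing' its tail with the empty list. The following one-line helper is a sketch added for illustration (the name dl_to_list is not from the text):

```prolog
% dl_to_list( +A-Z, -List): List is the ordinary list represented
% by the difference pair A-Z
dl_to_list( List - [], List).
```

For example, the query ?- dl_to_list( [a,b,c | T] - T, L). instantiates T to [] and L to [a,b,c].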
Figure 8.1 Concatenation of lists represented by difference pairs. L1 is represented by A1-Z1, L2 by A2-Z2, and the result L3 by A1-Z2, where Z1 = A2 must be true.

As the second member of the pair indicates the end of the list, the end is directly accessible. This can be used for an efficient implementation of concatenation. The method is illustrated in Figure 8.1. The corresponding concatenation relation translates into Prolog as the fact:

concat( A1-Z1, Z1-Z2, A1-Z2)

p( ... ) :-
    ... , !,           % The cut ensures no backtracking
    p( ... ).          % Tail-recursive call

In the case of such tail-recursive procedures, no information is needed upon the return from a call. Therefore such recursion can be carried out simply as iteration in which the next cycle in the loop does not require additional memory. A Prolog system will typically notice such an opportunity of saving memory and realize tail recursion as iteration. This is called tail recursion optimization, or last call optimization.

When memory efficiency is critical, tail-recursive formulations of procedures help. Often it is indeed possible to re-formulate a recursive procedure into a tail-recursive one. Let us consider the predicate for computing the sum of a list of numbers:

sumlist( List, Sum)
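The definitions of sumlist that follow at this point are lost in this copy; the versions below are a standard reconstruction, showing first the plain recursive form and then a tail-recursive reformulation with an accumulator argument:

```prolog
% sumlist( +List, -Sum): plain recursive version; not tail recursive,
% because the addition still has to be done after the recursive call
sumlist( [], 0).
sumlist( [X | Rest], Sum) :-
    sumlist( Rest, SumRest),
    Sum is SumRest + X.

% sumlist2( +List, -Sum): tail-recursive version with an accumulator
sumlist2( List, Sum) :-
    sumlist2( List, 0, Sum).

sumlist2( [], Sum, Sum).                  % Accumulated sum is the result
sumlist2( [X | Rest], PartialSum, Sum) :-
    NewPartialSum is PartialSum + X,      % Add X before the recursive call
    sumlist2( Rest, NewPartialSum, Sum).  % Last call: runs as iteration
```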
% reverse( List, Reversed): Reversed is List in reverse order ('naive reverse')
reverse( [], [] ).
reverse( [X | Rest], Reversed) :-
    reverse( Rest, RevRest),
    conc( RevRest, [X], Reversed).

This is not tail recursive. Apart from that, it is also very inefficient because of the goal conc( RevRest, [X], Reversed), which requires time proportional to the length of RevRest. Therefore, to reverse a list of length n, the procedure above will require time proportional to n². But, of course, a list can be reversed in linear time. Therefore, due to its inefficiency, the procedure above is also known as 'naive reverse'. A much more efficient version below introduces an accumulator:

reverse( List, Reversed) :-
    reverse( List, [], Reversed).

% reverse( List, PartReversed, Reversed):
%   Reversed is obtained by adding the elements of List in reverse order
%   to PartReversed

reverse( [], Reversed, Reversed).
reverse( [X | Rest], PartReversed, TotalReversed) :-
    reverse( Rest, [X | PartReversed], TotalReversed).   % Add X to accumulator

This is efficient (time is linear in the length of the list) and tail recursive.

8.5.5 Simulating arrays with arg

The list structure is the easiest representation for sets in Prolog. However, accessing an item in a list is done by scanning the list. This takes time proportional to the length of the list. For long lists this is very inefficient. Tree structures, discussed in Chapters 9 and 10, offer much more efficient access. However, often it is possible to access an element of a structure through the element's index. In such cases, array structures, provided in other programming languages, are the most effective because they enable direct access to a required element.

There is no array facility in Prolog, but arrays can be simulated to some extent by using the built-in predicates arg and functor. Here is an example. The goal:

functor( A, f, 100)

makes a structure with 100 elements:

A = f( _, _, _, ...)

After, for example, the goal

arg( 60, A, 1)

the structure becomes:

A = f( _, ..., _, 1, _, ..., _)         % 60th component equal 1

The point is that the time needed to access the Nth component of a structure does not depend on N. Another typical example statement from other programming languages is:

X := A[60]

This translates into our simulated array in Prolog as:

arg( 60, A, X)

This is much more efficient than having a list of 100 elements and accessing the 60th element by nested recursion down the list. However, the updating of the value of an element in a simulated array is awkward. Once the values in an array have been initialized, they can be changed, for example:

A[60] := A[60] + 1

A straightforward way to simulate such an update of a single value in an array in Prolog would be as follows: build a whole new structure with 100 components using functor, insert the new value at the appropriate place in the structure, and fill all the other components by the corresponding components of the previous structure. All this is awkward and very inefficient. One idea to improve this is to provide uninstantiated 'holes' in the components of the structure, so that future values of array elements can be accommodated in these holes. So we can, for example, store successive update values in a list in which the rest of the list is an uninstantiated variable - a 'hole' for future values. As an example consider the following updates of the value of variable X in a procedural language:

X := 1; X := 2; X := 3

These updates can be simulated in Prolog with the 'holes' technique as follows:

X = [1 | Rest1]          % Corresponds to X := 1; Rest1 is hole for future values
Rest1 = [2 | Rest2]      % Corresponds to X := 2; Rest2 is hole for future values
Rest2 = [3 | Rest3]      % Corresponds to X := 3
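Reading the current value back out of such a 'history' list means finding the element just before the unbound tail. A small helper along these lines (the name current_value is chosen here for illustration) could be:

```prolog
% current_value( +History, -X): X is the most recent value in a list
% whose tail is an uninstantiated 'hole'
current_value( [X | Hole], X) :-
    var( Hole), !.                 % Tail is the hole: X is the current value
current_value( [_ | Rest], X) :-
    current_value( Rest, X).       % Otherwise look deeper in the list
```

For the updates above, ?- current_value( X, V). would give V = 3.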
At this point X = [1, 2, 3 | Rest3]. Obviously the whole history of the values of X is maintained, and the current value is the one just preceding the 'hole'. If there are many successive updates, the 'current' value gets nested deep, and the technique becomes inefficient again. A further idea, to overcome this source of inefficiency, is to throw away the previous values at the moment when a list gets too long, and start again with a list consisting of just a head and an uninstantiated tail.

In spite of these potential complications, in many cases the simulation of arrays with arg is simple and works well. One such example is our solution 3 for the eight queens problem in Chapter 4 (Figure 4.11). This program places a next queen into a currently free column (X-coordinate), row (Y-coordinate), upward diagonal (U-coordinate) and downward diagonal (V-coordinate). The sets of currently free coordinates are maintained, and when a new queen is placed the corresponding occupied coordinates are deleted from these sets. The deletion of U and V coordinates in Figure 4.11 involves scanning the corresponding lists, which is inefficient. Efficiency can easily be improved by simulated arrays. So the set of all 15 upward diagonals can be represented by the following term with 15 components:

Du = u( _, _, _, _, _, _, _, _, _, _, _, _, _, _, _)

Consider placing a queen at the square (X,Y) = (1,1). This square lies on the 8th upward diagonal. The fact that this diagonal is now occupied can be marked by instantiating the 8th component of Du to 1 (that is, the corresponding X-coordinate):

arg( 8, Du, 1)

Now Du becomes:

Du = u( _, _, _, _, _, _, _, 1, _, _, _, _, _, _, _)

If later a queen is attempted to be placed at (X,Y) = (3,3), also lying on the 8th diagonal, this would require:

arg( 8, Du, 3)        % Here X = 3

This will fail because the 8th component of Du is already 1. So the program will not allow another queen to be placed on the same diagonal. This implementation of the sets of upward and downward diagonals leads to a considerably more efficient program than the one in Figure 4.11.

8.5.6 Improving efficiency by asserting derived facts

Sometimes during computation the same goal has to be satisfied again and again. As Prolog has no special mechanism to discover such situations, whole computation sequences are repeated.

As an example consider a program to compute the Nth Fibonacci number for a given N. The Fibonacci sequence is:

1, 1, 2, 3, 5, 8, 13, ...

Each number in the sequence, except for the first two, is the sum of the previous two numbers. We will define a predicate

fib( N, F)

to compute, for a given N, the Nth Fibonacci number, F. We count the numbers in the sequence starting with N = 1. The following fib program deals first with the first two Fibonacci numbers as two special cases, and then specifies the general rule about the Fibonacci sequence:

fib( 1, 1).                        % 1st Fibonacci number
fib( 2, 1).                        % 2nd Fibonacci number

fib( N, F) :-                      % Nth Fibonacci number, N > 2
    N > 2,
    N1 is N - 1, fib( N1, F1),
    N2 is N - 2, fib( N2, F2),
    F is F1 + F2.                  % Nth number is the sum of its two predecessors

This program tends to redo parts of the computation. This is easily seen if we trace the execution of the following goal:

?- fib( 6, F).

Figure 8.2 illustrates the essence of this computational process. For example, the third Fibonacci number, f(3), is needed in three places and the same computation is repeated each time.

This can be easily avoided by remembering each newly computed Fibonacci number. The idea is to use the built-in procedure asserta and to add these (intermediate) results as facts to the database. These facts have to precede other clauses about fib to prevent the use of the general rule in cases where the result is already known. The modified procedure, fib2, differs from fib only in this assertion:

fib2( 1, 1).                       % 1st Fibonacci number
fib2( 2, 1).                       % 2nd Fibonacci number

fib2( N, F) :-                     % Nth Fibonacci number, N > 2
    N > 2,
    N1 is N - 1, fib2( N1, F1),
    N2 is N - 2, fib2( N2, F2),
    F is F1 + F2,
    asserta( fib2( N, F) ).        % Remember Nth number

This program will try to answer any fib2 goal by first looking at stored facts about this relation, and only then resort to the general rule. As a result, when a goal fib2( N, F) is executed, all Fibonacci numbers up to the Nth number will get
Figure 8.2 Computation of the 6th Fibonacci number by procedure fib.

Figure 8.3 Computation of the 6th Fibonacci number by procedure fib2, which remembers previous results. This saves some computation in comparison with fib; see Figure 8.2.
tabulated. Figure 8.3 illustrates the computation of the 6th Fibonacci number by fib2. A comparison with Figure 8.2 shows the saving in the computational complexity. For greater N, the savings would be much more substantial.

Asserting intermediate results, also called caching, is a standard technique for avoiding repeated computations. It should be noted, however, that in the case of Fibonacci numbers we can preferably avoid repeated computation by using another algorithm, rather than by asserting intermediate results. This other algorithm will lead to a program that is more difficult to understand, but more efficient to execute. The idea this time is not to define the Nth Fibonacci number simply as the sum of its two predecessors and leave the recursive calls to unfold the whole computation 'downwards' to the two initial Fibonacci numbers. Instead, we can work 'upwards', starting with the initial two numbers, and compute the numbers in the sequence one by one in the forward direction. We have to stop when we have computed the Nth number. Most of the work in such a program is done by the procedure:

forwardfib( M, N, F1, F2, F)

Figure 8.4 Relations in the Fibonacci sequence. A 'configuration', depicted by a large circle, is defined by three things: an index M and two consecutive Fibonacci numbers f(M - 1) and f(M). The starting configuration has M = 2, a transition leads from M to M + 1, and the final configuration has M = N.

Here, F1 and F2 are the (M - 1)st and Mth Fibonacci numbers, and F is the Nth Fibonacci number. Figure 8.4 helps to understand the forwardfib relation. According to this figure, forwardfib finds a sequence of transformations to reach a final configuration (when M = N) from a given starting configuration. When forwardfib is invoked, all the arguments except F have to be instantiated, and M has to be less than or equal to N. The program is:

fib3( N, F) :-
    forwardfib( 2, N, 1, 1, F).            % The first two Fibonacci numbers are 1

forwardfib( M, N, F1, F2, F2) :-
    M >= N.                                % Nth Fibonacci number reached

forwardfib( M, N, F1, F2, F) :-
    M < N,                                 % Nth number not yet reached
    NextM is M + 1,
    NextF2 is F1 + F2,
    forwardfib( NextM, N, F2, NextF2, F).

Notice that forwardfib is tail recursive, and M, F1 and F2 are accumulator arguments.

Exercises

8.1 Procedures sub1, sub2 and sub3, shown below, all implement the sublist relation. sub1 is a more procedural definition whereas sub2 and sub3 are written in a more declarative style. Study the behaviour, with reference to efficiency, of these three procedures on some sample lists. Two of them behave nearly equivalently and have similar efficiency. Which two? Why is the remaining one less efficient?

sub1( List, Sublist) :-
    prefix( List, Sublist).
sub1( [_ | Tail], Sublist) :-
    sub1( Tail, Sublist).                  % Sublist is sublist of Tail

prefix( _, [] ).
prefix( [X | List1], [X | List2]) :-
    prefix( List1, List2).

sub2( List, Sublist) :-
    conc( List1, List2, List),
    conc( List3, Sublist, List1).

sub3( List, Sublist) :-
    conc( List1, List2, List),
    conc( Sublist, _, List2).

8.2 Define the relation

add_at_end( List, Item, NewList)

to add Item at the end of List producing NewList. Let both lists be represented by difference pairs.

8.3 Define the relation

reverse( List, ReversedList)

where both lists are represented by difference pairs.

8.4 Rewrite the collect procedure of Section 8.5.2 using difference pair representation for lists so that the concatenation can be done more efficiently.

8.5 The following procedure computes the maximum value in a list of numbers:

max( [X], X).

max( [X | Rest], Max) :-
    max( Rest, MaxRest),
    ( MaxRest >= X, !, Max = MaxRest
      ;
      Max = X ).

Transform this into a tail-recursive procedure. Hint: Introduce an accumulator argument MaxSoFar.

8.6 Rewrite program 3 for eight queens (Figure 4.11) using a simulated array with arg to represent the sets of free diagonals, as discussed in Section 8.5.5. Measure the improvement in efficiency.

8.7 Implement the updating of the value of an element of an array simulated by functor and arg, using 'holes' for future values along the lines discussed in Section 8.5.5.

Summary

• There are several criteria for evaluating programs:
  correctness
  user-friendliness
  efficiency
  readability
  modifiability
  robustness
  documentation

• The principle of stepwise refinement is a good way of organizing the program development process. Stepwise refinement applies to relations, algorithms and data structures.

• In Prolog, the following techniques often help to find ideas for refinements:
  Using recursion: identify boundary and general cases of a recursive definition.
  Generalization: consider a more general problem that may be easier to solve than the original one.
  Using pictures: graphical representation may help to identify important relations.

• It is useful to conform to some stylistic conventions to reduce the danger of programming errors, and to make programs easier to read, debug and modify.

• Prolog systems usually provide program debugging aids. Trace facilities are most useful.

• There are many ways of improving the efficiency of a program. Simple techniques include:
  reordering of goals and clauses
  controlling backtracking by inserting cuts
  remembering (by asserta) solutions that would otherwise be computed again

• More sophisticated techniques aim at better algorithms (improving search efficiency in particular) and better data structures. Frequently used programming techniques of this kind are:
  difference lists
  tail recursion, last call optimization
  accumulator arguments
  simulating arrays with functor and arg

chapter 9

Operations on Data Structures

9.1 Sorting lists 197
9.2 Representing sets by binary trees 202
9.3 Insertion and deletion in a binary dictionary 208
9.4 Displaying trees 213
9.5 Graphs 215
9.1 Sorting lists
A list can be sorted if there is an ordering relation between the items in the list. We
will for the purpose of this discussion assume that there is an ordering relation
gt( X, Y)
meaning that X is greater than Y, whatever 'greater than' means. If our items are
numbers then the gt relation will perhaps be defined as:
gt( X, Y) :- X > Y.
If the items are atoms then the gt relation can correspond to the alphabetical order, for example defined by:

gt( X, Y) :- X @> Y.

Remember that this relation also orders compound terms.

To sort a list, List:

• Find two adjacent elements, X and Y, in List such that gt( X, Y) and swap X and Y in List, obtaining List1; then sort List1.
• If there is no pair of adjacent elements, X and Y, in List such that gt( X, Y), then List is already sorted.

The purpose of swapping two elements, X and Y, that occur out of order, is that after the swapping the new list is closer to a sorted list. After a sufficient amount of swapping we should end up with all the elements in order. This principle of sorting is known as bubble sort. The corresponding Prolog procedure will therefore be called bubblesort:

bubblesort( List, Sorted) :-
    swap( List, List1), !,                 % A useful swap in List?
    bubblesort( List1, Sorted).

This translates into Prolog as the following insertsort procedure:

insertsort( [], [] ).

insertsort( [X | Tail], Sorted) :-
    insertsort( Tail, SortedTail),         % Sort the tail
    insert( X, SortedTail, Sorted).        % Insert X at proper place

For long lists, therefore, a much better sorting algorithm is quicksort. This is based on the following idea, which is illustrated in Figure 9.1.

Figure 9.1 Sorting the list [5,3,7,8,1,4,7,6] by quicksort: X = 5 is deleted, the rest is split into smaller and greater elements, the two parts are sorted and then concatenated.
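The terminating clause of bubblesort, the swap relation it uses, and the insert relation used by insertsort are all missing from this copy; the following are standard reconstructions consistent with the text above:

```prolog
bubblesort( Sorted, Sorted).              % No more useful swaps: list is sorted

% swap( +List, -List1): List1 is List with one out-of-order
% adjacent pair swapped
swap( [X, Y | Rest], [Y, X | Rest]) :-
    gt( X, Y).                            % Swap the first two elements
swap( [Z | Rest], [Z | Rest1]) :-
    swap( Rest, Rest1).                   % Or swap a pair in the tail

% insert( +X, +SortedList, -NewList): insert X at the proper place
insert( X, [Y | Sorted], [Y | Sorted1]) :-
    gt( X, Y), !,                         % X comes after Y
    insert( X, Sorted, Sorted1).
insert( X, Sorted, [X | Sorted]).         % Otherwise X goes in front
```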
= = ===
idea:
concatenate
To sort a non-empty list, L: % quicksort( List, SortedList): sort List with the quicksort algorithm
(1) Delete some element X from L and split the rest of L into two lists, called quicksort( List, Sorted) :-
Small and Big, as follows: all elements in L that are greater than X belong to quicksort2( List, Sorted - [] ).
Big, and all others to Small. % quicksort2( List, Sortec!DiffList): sort List, result is represented as difference list
(2) Sort Small obtaining SortedSmall.
quicksort2( [],Z -Z).
(3) Sort Big obtaining SortedBig.
quicksort2( [XI Tail), Al- ZZ)
(4) The whole sorted list is the concatenation of SortedSmall and [ X I SortedBig]. split( X, Tail, Small, I\ig),
    quicksort2( Small, A1 - [X | A2] ),
    quicksort2( Big, A2 - Z2).

Figure 9.3 A more efficient implementation of quicksort using difference-pair representation for lists. Relation split( X, List, Small, Big) is as defined in Figure 9.2.

If the list to be sorted is empty then the result of sorting is also the empty list. A Prolog implementation of quicksort is shown in Figure 9.2. A particular detail of this implementation is that the element, X, that is deleted from L is always simply the head of L. The splitting is programmed as a four-argument relation:

split( X, L, Small, Big)

% quicksort( List, SortedList): sort List by the quicksort algorithm

quicksort( [], [] ).

quicksort( [X | Tail], Sorted)  :-
    split( X, Tail, Small, Big),
    quicksort( Small, SortedSmall),
    quicksort( Big, SortedBig),
    conc( SortedSmall, [X | SortedBig], Sorted).

split( X, [], [], [] ).

split( X, [Y | Tail], [Y | Small], Big)  :-
    gt( X, Y), !,
    split( X, Tail, Small, Big).

split( X, [Y | Tail], Small, [Y | Big] )  :-
    split( X, Tail, Small, Big).

The time complexity of this algorithm depends on how lucky we are when splitting the list to be sorted. If the list is split into two lists of approximately equal lengths then the time complexity of this sorting procedure is of the order n log n, where n is the length of the list to be sorted. If, on the contrary, splitting always results in one list far bigger than the other, then the complexity is in the order of n². Analysis would show that the average performance of quicksort is, fortunately, closer to the best case than to the worst case.

The program in Figure 9.2 can be further improved by a better implementation of the concatenation operation. Using the difference-pair representation of lists, introduced in Chapter 8, concatenation is reduced to triviality. To use this idea in our sorting procedure, the lists in the program of Figure 9.2 can be represented by pairs of lists of the form A-Z as follows:

SortedSmall is represented by A1 - Z1
SortedBig is represented by A2 - Z2

Then the concatenation of the lists SortedSmall and [X | SortedBig] corresponds to the concatenation of pairs:

A1 - Z1 and [X | A2] - Z2

The resulting concatenated list is represented by:

A1 - Z2 where Z1 = [X | A2]

The empty list is represented by any pair Z-Z. Introducing these changes systematically into the program of Figure 9.2 we get a more efficient implementation of quicksort, programmed as quicksort2 in Figure 9.3. The procedure quicksort still uses the usual representation of lists, but the actual sorting is done by the more efficient quicksort2, which uses the difference-pair representation. The relation between the two procedures is:

quicksort( L, S)  :-
    quicksort2( L, S - [] ).
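For experimentation, the two figures can be assembled into one self-contained file. The gt/2 definition below is an assumption of this sketch (it takes the items to be numbers); the rest follows the figures above.

```prolog
% Quicksort with difference-pair representation (Figures 9.2 and 9.3 combined).
% Assumption: items are numbers, so gt/2 is ordinary arithmetic comparison.

gt( X, Y)  :-  X > Y.

quicksort( List, Sorted)  :-
    quicksort2( List, Sorted - [] ).

quicksort2( [], Z - Z).                     % Empty list: any pair Z-Z

quicksort2( [X | Tail], A1 - Z2)  :-
    split( X, Tail, Small, Big),
    quicksort2( Small, A1 - [X | A2] ),     % Z1 = [X | A2] links the two parts
    quicksort2( Big, A2 - Z2).

split( _, [], [], [] ).

split( X, [Y | Tail], [Y | Small], Big)  :-
    gt( X, Y), !,
    split( X, Tail, Small, Big).

split( X, [Y | Tail], Small, [Y | Big] )  :-
    split( X, Tail, Small, Big).
```

A query such as ?- quicksort( [5,3,8,1], S). then yields S = [1,3,5,8].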
Exercises

9.1 Write a procedure to merge two sorted lists producing a third list. For example:

9.2 The difference between the sorting programs of Figures 9.2 and 9.3 is in the

9.3 Our quicksort program performs badly when the list to be sorted is already sorted or almost sorted. Analyze why.

9.4 Another good idea for sorting a list that avoids the weakness of quicksort is based on dividing the list, then sorting smaller lists, and then merging these sorted smaller lists. Accordingly, to sort a list L:

• divide L into two lists, L1 and L2, of approximately equal length;
• sort L1 and L2 giving S1 and S2;
• merge S1 and S2 giving L sorted.

This is known as the merge-sort algorithm. Implement merge-sort and compare its efficiency with the quicksort program.

To find X in a list L, this procedure scans the list element by element until X is found or the end of the list is encountered. This is very inefficient in the case of long lists. For representing sets, there are various tree structures that facilitate more efficient implementation of the set membership relation. We will here consider binary trees.

A binary tree is either empty or it consists of three things:

• a root;
• a left subtree;
• a right subtree.

The root can be anything, but the subtrees have to be binary trees again. Figure 9.4 shows an example. This tree represents the set {a, b, c, d}. The elements of the set are stored as nodes of the tree. In Figure 9.4, the empty subtrees are not pictured; for example, the node b has two subtrees that are both empty.

Figure 9.4 A binary tree.

There are many ways to represent a binary tree by a Prolog term. One simple possibility is to make the root of a binary tree the principal functor of the term, and the subtrees its arguments. Accordingly, the example tree of Figure 9.4 would be represented by:

a( b, c(d))

t( L, X, R)

Figure 9.5 A representation of binary trees.

In this representation, the example tree of Figure 9.4 is represented by the term:

t( t( nil, b, nil), a, t( t( nil, d, nil), c, nil))
204 Operations on Data Structures Representing sets by binary trees 205
Let us now consider the set membership relation, here named in. A goal

in( X, T)

is true if X is a node in a tree T. The in relation can be defined by the following rules:

X is in a tree T if:
• the root of T is X, or
• X is in the left subtree of T, or
• X is in the right subtree of T.

Figure 9.6 A binary dictionary. Item 6 is reached by following the indicated path 5 -> 8 -> 6.

These rules are programmed as the procedure in in Figure 9.7. The relation gt( X, Y) means: X is greater than Y. If the items stored in the tree are numbers then this relation is simply X > Y.

% in( X, Tree): X in binary dictionary Tree

in( X, t( _, X, _) ).

in( X, t( Left, Root, Right) )  :-
    gt( Root, X),                       % Root greater than X
    in( X, Left).                       % Search left subtree

in( X, t( Left, Root, Right) )  :-
    gt( X, Root),                       % X greater than Root
    in( X, Right).                      % Search right subtree

Figure 9.7 Finding an item X in a binary dictionary.

In a way, the in procedure itself can also be used for constructing a binary dictionary. For example, the following sequence of goals will construct a dictionary D that contains the elements 5, 3, 8:

?- in( 5, D), in( 3, D), in( 8, D).
D = t( t( D1, 3, D2), 5, t( D3, 8, D4) ).

The variables D1, D2, D3 and D4 are four unspecified subtrees. They can be anything and D will still contain the given items 3, 5 and 8. The dictionary that is constructed depends on the order of the goals in the question (Figure 9.8).

Figure 9.8 (a) Tree D that results from the sequence of goals: in( 5, D), in( 3, D), in( 8, D). (b) Tree resulting from: in( 3, D), in( 5, D), in( 8, D).

A comment is in order here on the search efficiency in dictionaries. Generally speaking, the search for an item in a dictionary is more efficient than searching in a list. What is the improvement? Let n be the number of items in our data set. If the set is represented by a list then the expected search time will be proportional to its length n. On average, we have to scan the list up to something like half-way through it. If the set is represented by a binary dictionary, the search time will be roughly proportional to the height of the tree. The height of a tree is the length of a longest path between the root and a leaf in the tree. The height, however, depends on the shape of the tree.

We say that a tree is (approximately) balanced if, for each node in the tree, its two subtrees accommodate an approximately equal number of items. If a dictionary with n nodes is nicely balanced then its height is proportional to log n. We say that a balanced tree has logarithmic complexity. The difference between n and log n is the improvement of a balanced dictionary over a list. This holds, unfortunately, only when a tree is approximately balanced. If the tree gets out of balance its performance will degrade. In extreme cases of totally unbalanced trees, a tree is in effect reduced to a list. In such a case the tree's height is n, and the tree's performance is equally poor as that of a list. Therefore we are always interested in balanced dictionaries. Methods of achieving this objective will be discussed in Chapter 10.

Exercises

9.5 Define the predicates

(a) binarytree( Object)
(b) dictionary( Object)

to recognize whether Object is a binary tree or a binary dictionary respectively, written in the notation of this section.

9.6 Define the procedure

height( BinaryTree, Height)

to compute the height of a binary tree. Assume that the height of the empty tree is 0, and that of a one-element tree is 1.

9.7 Define the relation

linearize( Tree, List)

to collect all the nodes in Tree into a list.

9.8 Define the relation

maxelement( D, Item)

so that Item is the largest element stored in the binary dictionary D.
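To make the membership test of Figure 9.7 concrete, here is a self-contained version for numeric items; defining gt/2 as arithmetic comparison is an assumption of this sketch.

```prolog
% in/2 on a binary dictionary of numbers (Figure 9.7 plus a concrete gt/2).

gt( X, Y)  :-  X > Y.

in( X, t( _, X, _) ).

in( X, t( Left, Root, _) )  :-
    gt( Root, X),               % Root greater than X: search left subtree
    in( X, Left).

in( X, t( _, Root, Right) )  :-
    gt( X, Root),               % X greater than Root: search right subtree
    in( X, Right).
```

For the dictionary t( t( nil, 4, nil), 5, t( t( nil, 6, nil), 8, nil)) the query in( 6, ...) succeeds, descending along the path 5 -> 8 -> 6 just as in Figure 9.6.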
Figure 9.10 shows a corresponding program.

Let us now consider the delete operation. It is easy to delete a leaf, but deleting an internal node is more complicated. The deletion of a leaf can in fact be defined as the inverse operation of inserting at the leaf level:

delleaf( D1, X, D2)  :-
    addleaf( D2, X, D1).

Figure 9.9 Insertion into a binary dictionary at the leaf level. The trees correspond to the following sequence of insertions: add( D1, 6, D2), add( D2, 7, D3), add( D3, 4, D4).

Unfortunately, if X is an internal node then this does not work because of the problem illustrated in Figure 9.11. X has two subtrees, Left and Right. After X is removed, we have a hole in the tree and Left and Right are no longer connected to the rest of the tree. They cannot both be directly connected to the father of X, A, because A can accommodate only one of them.

Figure 9.11 Deleting X from a binary dictionary. The problem is how to patch up the tree after X is removed.

If one of the subtrees Left and Right is empty then the solution is simple: the non-empty subtree is connected to A. If they are both non-empty then one idea is as shown in Figure 9.12. The left-most node of Right, Y, is transferred from its current position upwards to fill the gap after X. After this transfer, the tree remains ordered. Of course, the same idea works symmetrically, with the transfer of the right-most node of Left.

Figure 9.12 Filling the gap after removal of X.

According to these considerations, the operation to delete an item from the binary dictionary is programmed in Figure 9.13. The transfer of the left-most node of the right subtree is accomplished by the relation

delmin( Tree, Y, Tree1)

where Y is the minimal (that is, the left-most) node of Tree, and Tree1 is Tree with Y deleted.

% del( Tree, X, NewTree):
%   deleting X from binary dictionary Tree gives NewTree

del( t( nil, X, Right), X, Right).

del( t( Left, X, nil), X, Left).

del( t( Left, X, Right), X, t( Left, Y, Right1) )  :-
    delmin( Right, Y, Right1).

del( t( Left, Root, Right), X, t( Left1, Root, Right) )  :-
    gt( Root, X),
    del( Left, X, Left1).

del( t( Left, Root, Right), X, t( Left, Root, Right1) )  :-
    gt( X, Root),
    del( Right, X, Right1).

% delmin( Tree, Y, NewTree):
%   delete minimal item Y in binary dictionary Tree producing NewTree

delmin( t( nil, Y, Right), Y, Right).

delmin( t( Left, Root, Right), Y, t( Left1, Root, Right) )  :-
    delmin( Left, Y, Left1).

Figure 9.13 Deleting from the binary dictionary.
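The text refers to Figure 9.10 for the leaf-level insertion program, but that figure does not survive in this excerpt. The following is a reconstruction of what the described addleaf relation could look like, not a copy of the book's figure; it assumes the t/3 and nil tree representation and a gt/2 ordering relation as before.

```prolog
% addleaf( Tree, X, NewTree): insert X as a leaf of binary dictionary Tree.
% Reconstructed sketch; gt/2 is assumed defined (e.g. X > Y for numbers).

addleaf( nil, X, t( nil, X, nil) ).                     % Insert into empty tree

addleaf( t( Left, X, Right), X, t( Left, X, Right) ).   % X already in the tree

addleaf( t( Left, Root, Right), X, t( Left1, Root, Right) )  :-
    gt( Root, X),
    addleaf( Left, X, Left1).                           % Insert into left subtree

addleaf( t( Left, Root, Right), X, t( Left, Root, Right1) )  :-
    gt( X, Root),
    addleaf( Right, X, Right1).                         % Insert into right subtree
```

With such a definition, the inverse reading delleaf( D1, X, D2) :- addleaf( D2, X, D1). deletes a leaf, exactly as stated in the text.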
There is another elegant solution to add and delete. The add relation can be defined non-deterministically so that a new item is inserted at any level of the tree, not just at the leaf level. The rules are:

To add X to a binary dictionary D:
• add X at the root of D (so that X becomes the new root), or
• if the root of D is greater than X then insert X into the left subtree of D, otherwise insert X into the right subtree of D.

The difficult part of this is the insertion at the root of D. Let us formulate this operation as a relation

addroot( D, X, D1)

where X is the item to be inserted at the root of D and D1 is the resulting dictionary with X as its root. Figure 9.14 illustrates the relations between X, D and D1.

Figure 9.14 Inserting X at the root of a binary dictionary.
The relation that imposes all these constraints is just our addroot relation. Namely, if X were added as the root into L, then the subtrees of the resulting tree would be just L1 and L2. In Prolog, L1 and L2 must satisfy the goal:

addroot( L, X, t( L1, X, L2) )

add( Tree, X, NewTree)  :-                  % Add X as new root
    addroot( Tree, X, NewTree).

add( t( L, Y, R), X, t( L1, Y, R) )  :-     % Insert X into left subtree
    gt( Y, X),
    add( L, X, L1).

add( t( L, Y, R), X, t( L, Y, R1) )  :-     % Insert X into right subtree
    gt( X, Y),
    add( R, X, R1).

% addroot( Tree, X, NewTree): inserting X as the root into Tree gives NewTree

addroot( nil, X, t( nil, X, nil) ).         % Insert into empty tree

addroot( t( L, Y, R), X, t( L1, X, t( L2, Y, R) ) )  :-
    gt( Y, X),
    addroot( L, X, t( L1, X, L2) ).

addroot( t( L, Y, R), X, t( t( L, Y, R1), X, R2) )  :-
    gt( X, Y),
    addroot( R, X, t( R1, X, R2) ).

Figure 9.15 Insertion into the binary dictionary at any level of the tree.

9.4 Displaying trees

Like all data objects in Prolog, a binary tree, T, can be directly output by the built-in procedure write. However, the goal

Figure 9.16 (a) A tree as normally pictured. (b) The same tree as typed out by the procedure show (arcs are added for clarity).
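Returning to the insertion procedures of Figure 9.15: the scan garbles the first add clause and the left-subtree clause, so they are partly reconstructed in the self-contained sketch below, and gt/2 for numeric items is an added assumption. It makes the non-determinism of add directly observable.

```prolog
% Insertion at any level (after Figure 9.15; first two add clauses reconstructed).

gt( X, Y)  :-  X > Y.       % Assumption: items are numbers

add( Tree, X, NewTree)  :-                  % Add X as new root
    addroot( Tree, X, NewTree).

add( t( L, Y, R), X, t( L1, Y, R) )  :-     % Insert X into left subtree
    gt( Y, X),
    add( L, X, L1).

add( t( L, Y, R), X, t( L, Y, R1) )  :-     % Insert X into right subtree
    gt( X, Y),
    add( R, X, R1).

addroot( nil, X, t( nil, X, nil) ).         % Insert into empty tree

addroot( t( L, Y, R), X, t( L1, X, t( L2, Y, R) ) )  :-
    gt( Y, X),
    addroot( L, X, t( L1, X, L2) ).

addroot( t( L, Y, R), X, t( t( L, Y, R1), X, R2) )  :-
    gt( X, Y),
    addroot( R, X, t( R1, X, R2) ).
```

The query ?- add( t( nil, 5, nil), 3, D). succeeds in two ways on backtracking: D = t( nil, 3, t( nil, 5, nil)) with 3 as the new root, and D = t( t( nil, 3, nil), 5, nil) with 3 inserted at the leaf level.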
to display a tree T in the form indicated in Figure 9.16. The principle is:

Exercise

9.10 Our procedure for displaying trees shows a tree in an unusual orientation, so that the root is on the left and the leaves of the tree are on the right. Write a (more difficult) procedure to display a tree in the usual orientation with the root at the top and the leaves at the bottom.

connected( a, b).
connected( b, c).

Figure 9.18 (a) A graph. (b) A directed graph with costs attached to the arcs.
If each node is connected to at least one other node then we can omit the list of nodes from the representation, as the set of nodes is then implicitly specified by the list of edges.

Yet another method is to associate with each node a list of nodes that are adjacent to that node. Then a graph is a list of pairs consisting of a node plus its adjacency list. Our example graphs can then, for example, be represented by:

G1 = [ a -> [b], b -> [a,c,d], c -> [b,d], d -> [b,c] ]
G2 = [ s -> [t/3], t -> [u/5, v/1], u -> [t/2], v -> [u/2] ]

The symbols '->' and '/' above are, of course, infix operators.

What will be the most suitable representation will depend on the application and on operations to be performed on graphs. Two typical operations are:

• find a path between two given nodes;
• find a subgraph, with some specified properties, of a graph.

Finding a spanning tree of a graph is an example of the latter operation. In the following sections we will look at some simple programs for finding a path and for finding a spanning tree.

9.5.2 Finding a path

Let G be a graph, and A and Z two nodes in G. Let us define the relation

path( A, Z, G, P)

where P is an acyclic path between A and Z in G. P is represented as a list of nodes on the path. If G is the graph in the left-hand side of Figure 9.18 then:

path( a, d, G, [a,b,d] )
path( a, d, G, [a,b,c,d] )

Since a path must not contain any cycle, a node can appear in the path at most once. One method to find a path is:

To find an acyclic path, P, between A and Z in a graph, G:
If A = Z then P = [A], otherwise find an acyclic path, P1, from some node Y to Z, and find a path from A to Y avoiding the nodes in P1.

This formulation implies another relation: find a path under the restriction of avoiding some subset of nodes (P1 above). We will, accordingly, define another procedure:

path1( A, Path1, G, Path)

Figure 9.19 The path1 relation: Path is a path between A and Z; the last part of Path overlaps with Path1.

As illustrated in Figure 9.19, the arguments are:

• A is a node,
• G is a graph,
• Path1 is a path in G,
• Path is an acyclic path in G that goes from A to the beginning of Path1 and continues along Path1 up to its end.

The relation between path and path1 is:

path( A, Z, G, Path)  :-
    path1( A, [Z], G, Path).

Figure 9.19 suggests a recursive definition of path1. The boundary case arises when the start node of Path1 (Y in Figure 9.19) coincides with the start node of Path, A. If the start nodes do not coincide then there must be a node, X, such that:

(1) Y is adjacent to X, and
(2) X is not in Path1, and
(3) Path must satisfy the relation path1( A, [X | Path1], G, Path).

A complete program is shown in Figure 9.20. In this program, member is the list membership relation. The relation

adjacent( X, Y, G)

% path( A, Z, Graph, Path): Path is an acyclic path from A to Z in Graph

path( A, Z, Graph, Path)  :-
    path1( A, [Z], Graph, Path).

path1( A, [A | Path1], _, [A | Path1] ).

path1( A, [Y | Path1], Graph, Path)  :-
    adjacent( X, Y, Graph),
    not member( X, Path1),              % No-cycle condition
    path1( A, [X, Y | Path1], Graph, Path).

Figure 9.20 Finding an acyclic path, Path, from A to Z in Graph.
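The text leaves adjacent/3 open because it depends on the chosen graph representation. For the pair-of-sets representation G = graph( Nodes, Edges) discussed in this section, with edges written as e( X, Y) terms, one plausible definition is the following sketch; the e/2 notation and this adjacent/3 clause are assumptions (the page carrying the book's own clause is damaged), and member/2 is the usual list membership relation.

```prolog
% Figure 9.20 made runnable for the graph( Nodes, Edges) representation.

adjacent( X, Y, graph( _, Edges) )  :-
    (   member( e( X, Y), Edges)
    ;   member( e( Y, X), Edges)
    ).

path( A, Z, Graph, Path)  :-
    path1( A, [Z], Graph, Path).

path1( A, [A | Path1], _, [A | Path1] ).

path1( A, [Y | Path1], Graph, Path)  :-
    adjacent( X, Y, Graph),
    \+ member( X, Path1),               % No-cycle condition
    path1( A, [X, Y | Path1], Graph, Path).
```

For G = graph( [a,b,c,d], [e(a,b), e(b,c), e(b,d), e(c,d)]), the query path( a, d, G, P) yields P = [a,b,d] and, on backtracking, P = [a,b,c,d].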
means that there is an arc from X to Y in graph G. The definition of this relation depends on the representation of graphs. If G is represented as a pair of sets (nodes and edges),

G = graph( Nodes, Edges)

Here, node( N, Graph) means: N is a node in Graph.

We can attach costs to paths. The cost of a path is the sum of the costs of the arcs in the path. If there are no costs attached to the arcs then we can talk about the length instead, counting 1 for each arc in the path. Our path and path1 relations can be modified to handle costs by introducing an additional argument, the cost, for each path:

path( A, Z, G, P, C)
path1( A, P1, C1, G, P, C)

Here, C is the cost of P and C1 is the cost of P1. The relation adjacent now also has an extra argument, the cost of an arc. Figure 9.21 shows a path-finding program that

% path( A, Z, Graph, Path, Cost):
%   Path is an acyclic path with cost Cost from A to Z in Graph

path( A, Z, Graph, Path, Cost)  :-
    path1( A, [Z], 0, Graph, Path, Cost).

It should be noted that this is a very inefficient way for finding minimal or maximal paths. This method unselectively investigates possible paths and is completely unsuitable for large graphs because of its high time complexity. The path-finding

9.5.3 Finding a spanning tree of a graph

A graph is said to be connected if there is a path from any node to any other node. Let G = (V, E) be a connected graph with the set of nodes V and the set of edges E. A spanning tree of G is a connected graph T = (V, E') where E' is a subset of E such that:

(1) T is connected, and
(2) there is no cycle in T.

These two conditions guarantee that T is a tree. For the left-hand side graph of Figure 9.18, there are three spanning trees, which correspond to three lists of edges:

where T is a spanning tree of G. We will assume that G is connected. We can imagine constructing a spanning tree algorithmically as follows: Start with the empty set of edges and gradually add new edges from G, taking care that a cycle is never created,
until no more edge can be added because it would create a cycle. The resulting set of edges defines a spanning tree. The no-cycle condition can be maintained by a simple rule: an edge can be added only if one of its nodes is already in the growing tree, and the other node is not yet in the tree. A program that implements this idea is shown in Figure 9.22. The key relation in this program is:

spread( Tree1, Tree, G)

All three arguments are sets of edges. G is a connected graph; Tree1 and Tree are subsets of G such that they both represent trees. Tree is a spanning tree of G obtained by adding zero or more edges of G to Tree1. We can say that 'Tree1 gets spread to Tree'.

Figure 9.22 Finding a spanning tree of a graph: an 'algorithmic' program. The program assumes that the graph is connected.

It is interesting that we can also develop a working program for constructing a spanning tree in another, completely declarative way, by simply stating mathematical definitions. We will assume that both graphs and trees are represented by lists of their edges, as in the program of Figure 9.22. The definitions we need are:

(1) T is a spanning tree of G if:
    • T is a subset of G, and
    • T is a tree, and
    • T 'covers' G; that is, each node of G is also in T.

(2) A set of edges T is a tree if:
    • T is connected, and
    • T has no cycle.

Figure 9.23 Finding a spanning tree of a graph: a 'declarative' program. Relations node and adjacent are as in Figure 9.22.
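The program of Figure 9.22 itself is missing from this excerpt; the following is a reconstruction of the 'algorithmic' program from its description above, with edges written as pairs X-Y. Treat the details as a sketch rather than the book's exact code.

```prolog
% stree( Graph, Tree): Tree is a spanning tree of Graph (a list of X-Y edges).
% Reconstructed sketch of the spread-based program described in the text.

stree( Graph, Tree)  :-
    member( Edge, Graph),               % Start with some edge ...
    spread( [Edge], Tree, Graph).       % ... and spread it to a spanning tree

spread( Tree1, Tree, Graph)  :-
    addedge( Tree1, Tree2, Graph),
    spread( Tree2, Tree, Graph).

spread( Tree, Tree, Graph)  :-
    \+ addedge( Tree, _, Graph).        % No edge can be added without a cycle

addedge( Tree, [A-B | Tree], Graph)  :-
    adjacent( A, B, Graph),
    node( A, Tree),                     % One endpoint already in the tree ...
    \+ node( B, Tree).                  % ... the other not yet: no cycle created

adjacent( A, B, Graph)  :-
    (   member( A-B, Graph)
    ;   member( B-A, Graph)
    ).

node( A, Edges)  :-                     % A occurs in some edge of Edges
    adjacent( A, _, Edges).
```

A query like ?- stree( [a-b, b-c, b-d, c-d], T). then enumerates spanning trees of the example graph on backtracking.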
Using our path program of the previous section, these definitions can be stated in Prolog as shown in Figure 9.23. It should be noted, however, that this program is, in this form, of little practical interest because of its inefficiency.

Exercises

9.11 Consider spanning trees of graphs that have costs attached to edges. Let the cost of a spanning tree be defined as the sum of the costs of all the edges in the tree. Write a program to find a minimum-cost spanning tree of a given graph.

9.12 Experiment with the spanning tree programs in Figures 9.22 and 9.23, and measure their execution times. Identify the sources of inefficiency in the second program.

Summary

References

In this chapter we have tackled in Prolog classical topics of sorting and of maintaining data structures for representing sets. These topics are covered in general books on algorithms and data structures, for example, Aho, Hopcroft and Ullman (1974, 1983), Cormen, Leiserson and Rivest (1990), Gonnet and Baeza-Yates (1991) and Kingston (1998). The Prolog program for insertion at any level of the binary tree (Section 9.3) was first shown to the author by M. van Emden (personal communication).

Aho, A.V., Hopcroft, J.E. and Ullman, J.D. (1974) The Design and Analysis of Computer Algorithms. Addison-Wesley.
Aho, A.V., Hopcroft, J.E. and Ullman, J.D. (1983) Data Structures and Algorithms. Addison-Wesley.
Cormen, T.H., Leiserson, C.E. and Rivest, R.L. (1990) Introduction to Algorithms (second edition 2000). MIT Press.
Gonnet, G.H. and Baeza-Yates, R. (1991) Handbook of Algorithms and Data Structures in Pascal and C (second edition). Addison-Wesley.
Kingston, J.H. (1998) Algorithms and Data Structures (second edition). Addison-Wesley.
chapter 10

Advanced Tree Representations

10.1 The 2-3 dictionary 224
10.2 AVL-tree: an approximately balanced tree 231

In this chapter we look at advanced techniques for representing data sets by trees. The key idea is to keep the tree balanced, or approximately balanced, in order to prevent the tree from degenerating toward a list. Such tree-balancing schemes guarantee relatively fast, logarithmic-time data access even in the worst case. Two such schemes are presented in this chapter: 2-3 trees and AVL-trees. (The knowledge of this chapter is not a prerequisite to any other chapter.)

10.1 The 2-3 dictionary

A binary tree is said to be well balanced if both its subtrees are of approximately equal height (or size) and they are also balanced. The height of a balanced tree is approximately log n, where n is the number of nodes in the tree. The time needed to evaluate the relations in, add and delete on binary dictionaries grows proportionally with the height of the tree. On balanced dictionaries, then, all these operations can be done in time that is in the order of log n. The logarithmic growth of the complexity of the set membership testing is a definite improvement over the list representation of sets, where the complexity grows linearly with the size of the data set. However, poor balance of a tree will degrade the performance of the dictionary. In extreme cases, the binary dictionary degenerates into a list, as shown in Figure 10.1. The form of the dictionary depends on the sequence in which the data is inserted. In the best case we get a good balance with performance in the order log n, and in the worst case the performance is in the order n. Analysis shows that on average, assuming that any sequence of data is equally likely, the complexity of in, add and delete is still in the order log n. So the average performance is, fortunately, closer to the best case than to the worst case. There are, however, several rather simple schemes for keeping good balance of the tree regardless of the data sequence. Such schemes guarantee the worst case performance of in, add and delete in the order log n. One of them is the 2-3 tree; another scheme is the AVL-tree.

Figure 10.1 A totally unbalanced binary dictionary. Its performance is reduced to that of a list.

The 2-3 tree is defined as follows: it is either empty, or it consists of a single node, or it is a tree that satisfies the following conditions:

• each internal node has two or three children, and
• all the leaves are at the same level.

A 2-3 dictionary is a 2-3 tree in which the data items are stored in the leaves, ordered from left to right. Figure 10.2 shows an example. The internal nodes contain labels that specify the minimal elements of the subtrees as follows:

• if an internal node has two subtrees, this internal node contains the minimal element of the second subtree;
• if an internal node has three subtrees then this node contains the minimal elements of the second and of the third subtree.

Figure 10.2 A 2-3 dictionary. The indicated path corresponds to searching for the item 10.
To search for an item, X, in a 2-3 dictionary we start at the root and move toward the bottom level according to the labels in the internal nodes. Let the root contain the labels M1 and M2. Then:

• if X < M1 then continue the search in the left subtree, otherwise
• if X < M2 then continue the search in the middle subtree, otherwise
• continue the search in the right subtree.

If the root only contains one label, M, then proceed to the left subtree if X < M, and to the right subtree otherwise. This is repeated until the leaf level is reached, and at this point X is either successfully found or the search fails.

As all the leaves are at the same level, the 2-3 tree is perfectly balanced with respect to the heights of the subtrees. All search paths from the root to a leaf are of the same length, which is of the order log n, where n is the number of items stored in the tree.

When inserting new data, the 2-3 tree can also grow in breadth, not only in depth. Each internal node that has two children can accommodate an additional child, which results in the breadth-wise growth. If, on the other hand, a node with three children accepts another child then this node is split into two nodes, each of them taking over two of the total of four children. The so-generated new internal node gets incorporated further up in the tree. If this happens at the top level then the tree is forced to grow upwards. Figure 10.3 illustrates these principles.

Figure 10.3 Inserting into a 2-3 dictionary. The tree first grows in breadth and then upwards.

Insertion into the 2-3 dictionary will be programmed as the relation

add23( Tree, X, NewTree)

where NewTree is obtained by inserting X into Tree. The main burden of insertion will be transferred to two auxiliary relations, both called ins. The first one has three arguments:

ins( Tree, X, NewTree)

where NewTree is the result of inserting X into Tree. Tree and NewTree have the same height. But, of course, it is not always possible to preserve the same height after insertion. Therefore we have another ins relation, with five arguments, to cater for this case:

ins( Tree, X, NTa, Mb, NTb)

Here, when inserting X into Tree, Tree is split into two trees: NTa and NTb. Both NTa and NTb have the same height as Tree. Mb is the minimal element of NTb. Figure 10.4 shows an example.

Figure 10.4 The objects in the figure satisfy the relation ins( Tree, 6, NTa, Mb, NTb).

In the program, a 2-3 tree will be represented, depending on its form, as follows:

• nil represents the empty tree.
• l(X) represents a single node tree, a leaf with item X.
• n2( T1, M, T2) represents a tree with two subtrees, T1 and T2; M is the minimal element of T2.
• n3( T1, M2, T2, M3, T3) represents a tree with three subtrees, T1, T2 and T3; M2 is the minimal element of T2, and M3 is the minimal element of T3.

T1, T2 and T3 are all 2-3 trees.

The relation between add23 and ins is: if after insertion the tree does not grow upwards then simply:

add23( Tree, X, NewTree)  :-
    ins( Tree, X, NewTree).

If, however, the height after insertion increases, then ins determines the two subtrees, T1 and T2, which are then combined into a bigger tree:

add23( Tree, X, n2( T1, M, T2) )  :-
    ins( Tree, X, T1, M, T2).

The ins relation is more complicated because it has to deal with many cases: inserting into the empty tree, a single node tree, a tree of type n2 or n3. Additional subcases arise from insertion into the first, second or third subtree. Accordingly, ins will be defined by a set of rules so that each clause about ins will deal with one of the cases. Figure 10.5 illustrates some of these cases. The cases in this figure translate into Prolog as follows:
Case a

ins( n2( T1, M, T2), X, n2( NT1, M, T2) )  :-
    gt( M, X),                          % M greater than X
    ins( T1, X, NT1).

Case b

ins( n2( T1, M, T2), X, n3( NT1a, Mb, NT1b, M, T2) )  :-
    gt( M, X),
    ins( T1, X, NT1a, Mb, NT1b).

Case c

ins( n3( T1, M2, T2, M3, T3), X, n2( NT1a, Mb, NT1b), M2, n2( T2, M3, T3) )  :-
    gt( M2, X),
    ins( T1, X, NT1a, Mb, NT1b).

Figure 10.5 Some cases of the ins relation.
(a) ins( n2( T1, M, T2), X, n2( NT1, M, T2) );
(b) ins( n2( T1, M, T2), X, n3( NT1a, Mb, NT1b, M, T2) );
(c) ins( n3( T1, M2, T2, M3, T3), X, n2( NT1a, Mb, NT1b), M2, n2( T2, M3, T3) )

Figure 10.6 shows the complete program for inserting into the 2-3 dictionary. Figure 10.7 shows a program for displaying 2-3 trees.

Our program occasionally does some unnecessary backtracking. If the three-argument ins fails then the five-argument ins is called, which redoes part of the

% Insertion in the 2-3 dictionary

add23( Tree, X, Tree1)  :-                  % Add X to Tree giving Tree1
    ins( Tree, X, Tree1).                   % Tree grows in breadth

add23( Tree, X, n2( T1, M2, T2) )  :-       % Tree grows upwards
    ins( Tree, X, T1, M2, T2).

add23( nil, X, l(X) ).

ins( l(A), X, l(A), X, l(X) )  :-
    gt( X, A).

ins( l(A), X, l(X), A, l(A) )  :-
    gt( A, X).

ins( n2( T1, M, T2), X, n2( NT1, M, T2) )  :-
    gt( M, X),
    ins( T1, X, NT1).

ins( n2( T1, M, T2), X, n3( NT1a, Mb, NT1b, M, T2) )  :-
    gt( M, X),
    ins( T1, X, NT1a, Mb, NT1b).

ins( n2( T1, M, T2), X, n2( T1, M, NT2) )  :-
    gt( X, M),
    ins( T2, X, NT2).

ins( n2( T1, M, T2), X, n3( T1, M, NT2a, Mb, NT2b) )  :-
    gt( X, M),
    ins( T2, X, NT2a, Mb, NT2b).

ins( n3( T1, M2, T2, M3, T3), X, n3( NT1, M2, T2, M3, T3) )  :-
    gt( M2, X),
    ins( T1, X, NT1).

ins( n3( T1, M2, T2, M3, T3), X, n2( NT1a, Mb, NT1b), M2, n2( T2, M3, T3) )  :-
    gt( M2, X),
    ins( T1, X, NT1a, Mb, NT1b).

ins( n3( T1, M2, T2, M3, T3), X, n3( T1, M2, NT2, M3, T3) )  :-
    gt( X, M2), gt( M3, X),
    ins( T2, X, NT2).

ins( n3( T1, M2, T2, M3, T3), X, n2( T1, M2, NT2a), Mb, n2( NT2b, M3, T3) )  :-
    gt( X, M2), gt( M3, X),
    ins( T2, X, NT2a, Mb, NT2b).

ins( n3( T1, M2, T2, M3, T3), X, n3( T1, M2, T2, M3, NT3) )  :-
    gt( X, M3),
    ins( T3, X, NT3).

ins( n3( T1, M2, T2, M3, T3), X, n2( T1, M2, T2), M3, n2( NT3a, Mb, NT3b) )  :-
    gt( X, M3),
    ins( T3, X, NT3a, Mb, NT3b).

Figure 10.6 Inserting in the 2-3 dictionary. In this program, an attempt to insert a duplicate item will fail.
where A is the root, Left and Right are the subtrees, and Height is the height of the tree. The empty tree is represented by nil/0. Now let us consider the insertion of X into a non-empty AVL-dictionary:

Tree = t( L, A, R)/H

We will start our discussion by only considering the case where X is greater than A. Then X is to be inserted into R and we have the following relation:

addavl( R, X, t( R1, B, R2)/Hb)

Figure 10.8 illustrates the following ingredients from which NewTree is to be constructed:

L, A, R1, B, R2

What can be the heights of L, R, R1 and R2? L and R can only differ in height by 1 at the most. Figure 10.8 shows what the heights of R1 and R2 can be. As only one item, X, has been inserted into R, at most one of the subtrees R1 and R2 can have the height h + 1.

Figure 10.8 The problem of AVL insertion: (a) AVL-tree before inserting X, X > A; (b) AVL-tree after inserting X into R; (c) ingredients from which the new tree is to be constructed.

In the case that X is less than A then the situation is analogous, with left and right subtrees interchanged. Therefore, in any case, we have to construct NewTree from three trees (let us call them Tree1, Tree2 and Tree3), and two single items, A and B. Let us now consider the question: How can we combine these five ingredients to make NewTree so that NewTree is an AVL-dictionary? The order from left to right in NewTree has to be:

Tree1, A, Tree2, B, Tree3

We have to consider three cases:

(1) The middle tree, Tree2, is taller than both other trees.
(2) Tree1 is at least as tall as Tree2 and Tree3.
(3) Tree3 is at least as tall as Tree2 and Tree1.

Figure 10.9 shows how NewTree can be constructed in each of these cases. In case 1, the middle tree Tree2 has to be decomposed and its parts incorporated into NewTree. The three rules of Figure 10.9 are easily translated into Prolog as a relation:

combine( Tree1, A, Tree2, B, Tree3, NewTree)

Figure 10.9 Three combination rules for AVL-trees. Rule 1: H2 > H1 and H2 > H3. Rule 2: H1 >= H2 and H1 >= H3. Rule 3: H3 >= H2 and H3 >= H1.
234 Advanced Tree Representations References 235
The last argument, NewTree, is an AVL-tree constructed from the five ingredients, the first five arguments. Rule 1, for example, becomes:

combine(
  T1/H1, A, t( T21, B, T22)/H2, C, T3/H3,                        % Five ingredients
  t( t( T1/H1, A, T21)/Ha, B, t( T22, C, T3/H3)/Hc)/Hb) :-       % Their combination
    H2 > H1, H2 > H3,                                            % Middle tree is tallest
    Ha is H1 + 1,                                                % Height of left subtree
    ...

A complete addavl program, which also computes the heights of the tree and the subtrees, is shown as Figure 10.10. Our program works with the heights of trees. A more economical representation is, as said earlier, possible. In fact, we only need the balance, which can only be -1, 0 or +1. The disadvantage of such economization would, however, be somewhat more complicated combination rules.
Part II
Chapter 11

Basic Problem-Solving Strategies
This chapter is centred around a general scheme, called state space, for representing
problems. A state space is a graph whose nodes correspond to problem situations,
and a given problem is reduced to finding a path in this graph. We will study
examples of formulating problems using the state-space approach, and discuss
general methods for solving problems represented in this formalism. Problem
solving involves graph searching and exploring alternatives. The basic strategies
for exploring alternatives, presented in this chapter, are the depth-first search,
breadth-first search and iterative deepening.
11.1 Introductory concepts and examples
Figure 11.1 A blocks rearrangement problem.

We will not seriously consider putting C on the table, as this clearly has no effect on the situation.

As this example illustrates, we have, in such a problem, two types of concept: problem situations, and legal moves, or actions, that transform one problem situation into another.

Problem situations and possible moves form a directed graph, called a state space. A state space for our example problem is shown in Figure 11.2. The nodes of the graph correspond to problem situations, and the arcs correspond to legal transitions between states. The problem of finding a solution plan is equivalent to finding a path between the given initial situation (the start node) and some specified final situation, also called a goal node.

Figure 11.2 A state-space representation of the block manipulation problem. The indicated path is a solution to the problem in Figure 11.1.

Figure 11.3 shows another example problem: an eight puzzle and its representation as a path-finding problem. The puzzle consists of eight sliding tiles, numbered by digits from 1 to 8, and arranged in a 3 by 3 array of nine cells. One of the cells is always empty, and any adjacent tile can be moved into the empty cell. We can say that the empty cell is allowed to move around, swapping its place with any of the adjacent tiles. The final situation is some special arrangement of tiles, as shown for example in Figure 11.3.

It is easy to construct similar graph representations for other popular puzzles. Straightforward examples are the Tower of Hanoi, or getting fox, goose and grain across the river. In the latter problem, the boat can only hold the farmer and one other object, and the farmer has to protect the goose from the fox, and the grain from the goose. Many practical problems also naturally fit this paradigm. Among them is the travelling salesman problem, which is the formal model of many practical optimization problems. The problem is defined by a map with n cities and road distances between the cities. The task is to find a shortest route from some starting city, visiting all the cities and ending in the starting city. No city, with the exception of the starting one, may appear in the tour twice.

Let us summarize the concepts introduced by these examples. The state space of a given problem specifies the 'rules of the game': nodes in the state space correspond to situations, and arcs correspond to 'legal moves', or actions, or solution steps. A particular problem is defined by:

• a state space,
• a start node,
• a goal condition (a condition to be reached); 'goal nodes' are those nodes that satisfy this condition.

We can attach costs to legal moves or actions. For example, costs attached to moving blocks in the block manipulation problem would indicate that some blocks are harder to move than others. In the travelling salesman problem, moves correspond to direct city-to-city journeys. Naturally, the costs of such moves are the distances between the cities.

In cases where costs are attached to moves, we are normally interested in minimum cost solutions. The cost of a solution is the sum of the costs of the arcs along the solution path. Even if no costs are given, we may have an optimization problem: we may be interested in shortest solutions.
Figure 11.3 An eight puzzle and a corresponding state-space representation.

Before presenting some programs that implement classical algorithms for searching state spaces, let us first discuss how a state space can be represented in a Prolog program.

We will represent a state space by a relation

s( X, Y)

which is true if there is a legal move in the state space from a node X to a node Y. We will say that Y is a successor of X. If there are costs associated with moves then we will add a third argument, the cost of the move:

s( X, Y, Cost)

This relation can be represented in the program explicitly by a set of facts. For typical state spaces of any significant complexity this would, however, be impractical or impossible. Therefore the successor relation, s, is usually defined implicitly by stating the rules for computing successor nodes of a given node.

Another question of general importance is how to represent problem situations, that is, nodes themselves. The representation should be compact, but it should also enable efficient execution of the operations required; in particular, the evaluation of the successor relation, and possibly the associated costs.

As an example, let us consider the block manipulation problem of Figure 11.1. We will consider a more general case, so that there are altogether any number of blocks that are arranged in one or more stacks. The number of stacks will be limited to some given maximum to make the problem more interesting. This may also be a realistic constraint because a robot that manipulates blocks may only be given a limited working space on the table.

A problem situation can be represented as a list of stacks. Each stack can be, in turn, represented by a list of blocks in that stack, ordered so that the top block in the stack is the head of the list. Empty stacks are represented by empty lists. The initial situation of the problem in Figure 11.1 can thus be represented by:

[ [c,a,b], [], [] ]

A goal situation is any arrangement with the ordered stack of all the blocks. There are three such situations:

[ [a,b,c], [], [] ]
[ [], [a,b,c], [] ]
[ [], [], [a,b,c] ]

The successor relation can be programmed according to the following rule: Situation2 is a successor of Situation1 if there are two stacks, Stack1 and Stack2, in Situation1, and the top block of Stack1 can be moved to Stack2. As all situations are represented as lists of stacks, this is translated into Prolog as:

s( Stacks, [Stack1, [Top1 | Stack2] | OtherStacks]) :-   % Move Top1 to Stack2
  del( [Top1 | Stack1], Stacks, Stacks1),                % Find first stack
  del( Stack2, Stacks1, OtherStacks).                    % Find second stack

del( X, [X | L], L).

del( X, [Y | L], [Y | L1]) :-
  del( X, L, L1).

The goal condition for our example problem is:

goal( Situation) :-
  member( [a,b,c], Situation).

We will program search algorithms as a relation

solve( Start, Solution)

where Start is the start node in the state space, and Solution is a path between Start and any goal node. For our block manipulation problem the corresponding call can be:

?- solve( [ [c,a,b], [], [] ], Solution).
As the result of a successful search, Solution is instantiated to a list of block arrangements. This list represents a plan for transforming the initial state into a state in which all the three blocks are in one stack arranged as [a,b,c].

11.2 Depth-first search and iterative deepening

Given a state-space formulation of a problem, there are many approaches to finding a solution path. Two basic search strategies are: depth-first search and breadth-first search. In this section we will implement depth-first search and its variation called iterative deepening.

We will start the development of this algorithm and its variations with a simple idea:

To find a solution path, Sol, from a given node, N, to some goal node:
• if N is a goal node then Sol = [N], otherwise
• if there is a successor node, N1, of N, and a solution path Sol1 from N1 to a goal node, then Sol = [N | Sol1].

This translates directly into Prolog:

solve( N, [N]) :-
  goal( N).

solve( N, [N | Sol1]) :-
  s( N, N1),
  solve( N1, Sol1).

This program is in fact an implementation of the depth-first strategy. It is called 'depth-first' because of the order in which the alternatives in the state space are explored. Whenever the depth-first algorithm is given a choice of continuing the search from several nodes, it always decides to choose a deepest one. A 'deepest' node is one that is farthest from the start node. Figure 11.4 illustrates the order in which the nodes are visited. This order corresponds to the Prolog trace when answering the question:

?- solve( a, Sol).

Figure 11.4 A simple state space: a is the start node, f and j are goal nodes. The order in which the depth-first strategy visits the nodes in this state space is: a, b, d, h, e, i, j. The solution found is: [a,b,e,j]. On backtracking, the other solution is discovered: [a,c,f].

The depth-first search is most amenable to the recursive style of programming in Prolog. The reason for this is that Prolog itself, when executing goals, explores alternatives in the depth-first fashion.

The depth-first search is simple and easy to program, and may work well in certain cases. The eight queens programs of Chapter 4 were, in fact, examples of depth-first search. Representing the board position as a list of Y-coordinates of the queens, this can be programmed as:

s( Queens, [Queen | Queens]) :-
  member( Queen, [1,2,3,4,5,6,7,8]),      % Place Queen into any row
  noattack( Queen, Queens).

goal( [_,_,_,_,_,_,_,_]).                 % Position with 8 queens

The noattack relation requires that Queen does not attack any of the Queens; it can be easily programmed as in Chapter 4. The question

?- solve( [], Solution).

will produce a list of board positions with an increasing number of queens. The list will end with a safe configuration of eight queens. It will also find alternative solutions through backtracking.
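The small state space of Figure 11.4 can itself be written down as Prolog facts, which makes it convenient for experimenting with the search programs of this chapter. The arcs below are a reconstruction read off the visit orders stated in the captions of Figures 11.4 and 11.9, not a listing from the book:

```prolog
s( a, b).   s( a, c).
s( b, d).   s( b, e).
s( c, f).   s( c, g).
s( d, h).
s( e, i).   s( e, j).

goal( f).
goal( j).
```

With these facts, the simple depth-first solve answers ?- solve( a, Sol). with Sol = [a,b,e,j], and on backtracking with Sol = [a,c,f], matching the caption of Figure 11.4. Adding the fact s( h, d). reproduces the cycling behaviour of Figure 11.5.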
Figure 11.5 Starting at a, the depth-first search ends in cycling between d and h: a, b, d, h, d, h, d, ...

The depth-first search often works well, as in this example, but there are many ways in which our simple solve procedure can run into trouble. Whether this will actually happen or not depends on the state space. To embarrass our solve procedure with the problem of Figure 11.4, a slight modification of this problem is sufficient: add an arc from h to d, thus creating a cycle (Figure 11.5). The search would in this case proceed as follows: start at a and descend to h following the left-most branch of the graph. At this point, in contrast with Figure 11.4, h has a successor, d. Therefore the execution will not backtrack from h, but proceed to d instead. Then the successor of d, h, will be found, etc., resulting in cycling between d and h.

An obvious improvement of our depth-first program is to add a cycle-detection mechanism. Accordingly, any node that is already in the path from the start node to the current node should not be considered again. We can formulate this as a relation:

depthfirst( Path, Node, Solution)

As illustrated in Figure 11.6, Node is the state from which a path to a goal state is to be found; Path is a path (a list of nodes) between the start node and Node; Solution is Path extended via Node to a goal node.

Figure 11.6 Relation depthfirst( Path, Node, Solution).

For the sake of ease of programming, paths will in our program be represented by lists in the inverse order. The argument Path can be used for two purposes:

(1) to prevent the algorithm from considering those successors of Node that have already been encountered (cycle detection);
(2) to construct a solution path Solution.

A corresponding depth-first search program is shown in Figure 11.7.

% solve( Node, Solution):
%   Solution is an acyclic path (in reverse order) between Node and a goal

solve( Node, Solution) :-
  depthfirst( [], Node, Solution).

% depthfirst( Path, Node, Solution):
%   extending the path [Node | Path] to a goal gives Solution

depthfirst( Path, Node, [Node | Path]) :-
  goal( Node).

depthfirst( Path, Node, Sol) :-
  s( Node, Node1),
  not member( Node1, Path),      % Prevent a cycle
  depthfirst( [Node | Path], Node1, Sol).

Figure 11.7 A depth-first search program that avoids cycling.

With the cycle-detection mechanism, our depth-first procedure will find solution paths in state spaces such as that in Figure 11.5. There are, however, state spaces in which this program will still easily get lost. Many state spaces are infinite. In such a space, the depth-first algorithm may miss a goal node, proceeding along an infinite branch of the graph. The program may then indefinitely explore this infinite part of the space, never getting closer to a goal. The eight queens state space, as defined in this section, may seem to be susceptible to this kind of trap. However, this space is, incidentally, finite, because by the limited choice of Y-coordinates at most eight queens can be placed safely.

To avoid aimless infinite (non-cyclic) branches, we can add another refinement to the basic depth-first search procedure: limiting the depth of search. We then have the following arguments for the depth-first search procedure:

depthfirst2( Node, Solution, Maxdepth)

The search is not allowed to go in depth beyond Maxdepth. This constraint can be programmed by decreasing the depth limit at each recursive call, and not allowing this limit to become negative. The resulting program is shown in Figure 11.8.
% depthfirst2( Node, Solution, Maxdepth):
%   Solution is a path, not longer than Maxdepth, from Node to a goal

depthfirst2( Node, [Node], _) :-
  goal( Node).

depthfirst2( Node, [Node | Sol], Maxdepth) :-
  Maxdepth > 0,
  s( Node, Node1),
  Max1 is Maxdepth - 1,
  depthfirst2( Node1, Sol, Max1).

Figure 11.8 A depth-limited, depth-first search program.

A difficulty with the depth-limited program in Figure 11.8 is that we have to guess a suitable limit in advance. If we set the limit too low (that is, less than any solution path) then the search will fail. If we set the limit too high, the search will become too complex. To circumvent this difficulty, we can execute the depth-limited search iteratively, varying the depth limit: start with a very low depth limit and gradually increase the limit until a solution is found. This technique is called iterative deepening. It can be implemented by modifying the program of Figure 11.8 in the following way: the depthfirst2 procedure can be called from another procedure which would, on each recursive call, increase the limit by 1.

There is, however, a more elegant implementation based on a procedure

path( Node1, Node2, Path)

where Path is an acyclic path, in reverse order, between nodes Node1 and Node2 in the state space. Let the path be represented as a list of nodes in the inverse order. Then path can be written as:

path( Node, Node, [Node]).                % Single node path

path( FirstNode, LastNode, [LastNode | Path]) :-
  path( FirstNode, OneButLast, Path),     % Path up to one-but-last node
  s( OneButLast, LastNode),               % Last step
  not member( LastNode, Path).            % No cycle

Let us find some paths starting with node a in the state space of Figure 11.4:

?- path( a, Last, Path).
Last = a
Path = [a];
Last = b
Path = [b,a];
Last = c
Path = [c,a];
Last = d
Path = [d,b,a];
...

The path procedure generates, for the given initial node, all possible acyclic paths of increasing length. This is exactly what we need in the iterative deepening approach: generate paths of increasing length until a path is generated that ends with a goal node. This immediately gives a depth-first iterative deepening search program:

depth_first_iterative_deepening( Node, Solution) :-
  path( Node, GoalNode, Solution),
  goal( GoalNode).

This procedure is in fact quite useful in practice, as long as the combinatorial complexity of the problem does not require the use of problem-specific heuristics. The procedure is simple and, even if it does not do anything very clever, it does not waste much time or space. In comparison with some other search strategies, such as breadth-first, the main advantage of iterative deepening is that it requires relatively little memory space. At any point of execution, the space requirements are basically reduced to one path between the start node of the search and the current node. Paths are generated, checked and forgotten, which is in contrast to some other search procedures (like breadth-first search) that, during search, keep many candidate paths at the same time. A disadvantage of iterative deepening is the consequence of its main strength: on each iteration, when the depth limit is increased, the paths previously computed have to be recomputed and extended to the new limit. In typical search problems, however, this recomputation does not critically affect the overall computation time. Typically, most computation is done at the deepest level of search; therefore, repeated computation at upper levels adds relatively little to the total time.

Exercises

11.1 Write a depth-first search procedure (with cycle detection)

depthfirst1( CandidatePath, Solution)

to find a solution path Solution as an extension of CandidatePath. Let both paths be represented as lists of nodes in the inverse order, so that the goal node is the head of Solution.

11.2 Write a depth-first procedure that combines both the cycle-detection and the depth-limiting mechanisms of the procedures in Figures 11.7 and 11.8.

11.3 The procedure depth_first_iterative_deepening/2 in this section may get into an indefinite loop if there is no solution path in the state space. It keeps searching for longer solution paths even when it is obvious that there do not exist any longer paths than those already searched. The same problem may occur when the user
requests alternative solutions after all the solutions have already been found. Write an iterative deepening search program that will look for paths of length i+1 only if there was at least one path found of length i.

11.4 Experiment with the depth-first programs of this section in the blocks world planning problem of Figure 11.1.

11.5 Write a procedure

show( Situation)

to display a problem state, Situation, in the blocks world. Let Situation be a list of stacks, and a stack in turn a list of blocks. The goal

show( [ [a], [e,d], [c,b] ])

should display the corresponding situation; for example, as:

    e c
a d b

11.3 Breadth-first search

In contrast to the depth-first search strategy, the breadth-first search strategy chooses to first visit those nodes that are closest to the start node. This results in a search process that tends to develop more into breadth than into depth, as illustrated by Figure 11.9.

Figure 11.9 A simple state space: a is the start node, f and j are goal nodes. The order in which the breadth-first strategy visits the nodes in this state space is: a, b, c, d, e, f. The shorter solution [a,c,f] is found before the longer one [a,b,e,j].

The breadth-first search is not so easy to program as the depth-first search. The reason for this difficulty is that we have to maintain a set of alternative candidate nodes, not just one as in depth-first search. This set of candidates is the whole growing bottom edge of the search tree. However, even this set of nodes is not sufficient if we also want to extract a solution path from the search process. Therefore, instead of maintaining a set of candidate nodes, we maintain a set of candidate paths. Then,

breadthfirst( Paths, Solution)

is true if some path from a candidate set Paths can be extended to a goal node. Solution is such an extended path.

We will use the following representation for the set of candidate paths. The set will be represented as a list of paths, and each path will be a list of nodes in the inverse order; that is, the head will be the most recently generated node, and the last element of the list will be the start node of the search. The search is initiated with a single element candidate set:

[ [StartNode] ]

An outline for breadth-first search, given a set of candidate paths, is:

• if the head of the first path is a goal node then this path is a solution of the problem, otherwise
• remove the first path from the candidate set and generate the set of all possible one-step extensions of this path, add this set of extensions at the end of the candidate set, and execute breadth-first search on this updated set.

For our example problem of Figure 11.9, this process develops as follows:

(1) Start with the initial candidate set:
    [ [a] ]
(2) Generate extensions of [a]:
    [ [b,a], [c,a] ]
    (Note that all paths are represented in the inverse order.)
(3) Remove the first candidate path, [b,a], from the set and generate extensions of this path:
    [ [d,b,a], [e,b,a] ]
    Add the list of extensions to the end of the candidate set:
    [ [c,a], [d,b,a], [e,b,a] ]
(4) Remove [c,a] and add its extensions to the end of the candidate set, producing:
    [ [d,b,a], [e,b,a], [f,c,a], [g,c,a] ]
In further steps, [d,b,a] and [e,b,a] are extended and the modified candidate set becomes:

[ [f,c,a], [g,c,a], [h,d,b,a], [i,e,b,a], [j,e,b,a] ]

Now the search process encounters [f,c,a], which contains a goal node, f. Therefore this path is returned as a solution.

A program that carries out this process is shown in Figure 11.10. In this program all one-step extensions are generated by using the built-in procedure bagof. A test to prevent the generation of cyclic paths is also made. Note that in the case that no extension is possible, bagof fails and therefore an alternative call to breadthfirst is provided. member and conc are the list membership and list concatenation relations.

% solve( Start, Solution):
%   Solution is a path (in reverse order) from Start to a goal

solve( Start, Solution) :-
  breadthfirst( [ [Start] ], Solution).

% breadthfirst( [ Path1, Path2, ... ], Solution):
%   Solution is an extension to a goal of one of the paths

breadthfirst( [ [Node | Path] | _ ], [Node | Path]) :-
  goal( Node).

breadthfirst( [Path | Paths], Solution) :-
  extend( Path, NewPaths),
  conc( Paths, NewPaths, Paths1),
  breadthfirst( Paths1, Solution).

extend( [Node | Path], NewPaths) :-
  bagof( [NewNode, Node | Path],
         ( s( Node, NewNode), not member( NewNode, [Node | Path]) ),
         NewPaths),
  !.

extend( Path, []).      % bagof failed: Node has no successor

Figure 11.10 An implementation of breadth-first search.

A more efficient representation of the set of candidate paths is a difference pair of lists, Paths - Z. Introducing this representation into the program of Figure 11.10, it can be systematically transformed into the program shown in Figure 11.11. This transformation is left as an exercise for the reader.

% solve( Start, Solution):
%   Solution is a path (in reverse order) from Start to a goal

solve( Start, Solution) :-
  breadthfirst( [ [Start] | Z] - Z, Solution).

breadthfirst( [ [Node | Path] | _ ] - _, [Node | Path]) :-
  goal( Node).

breadthfirst( [Path | Paths] - Z, Solution) :-
  extend( Path, NewPaths),
  conc( NewPaths, Z1, Z),              % Add NewPaths at end
  Paths \== Z1,                        % Set of candidates not empty
  breadthfirst( Paths - Z1, Solution).

Figure 11.11 A breadth-first search program based on the difference-pair representation of the candidate set.

Exercises

11.6 Let the state space be a tree with uniform branching b, and let the solution length be d. For the special case b = 2 and d = 3, how many nodes are generated in the worst case by breadth-first search and by iterative deepening (counting regenerated nodes as well)? Denote by N(b, d) the number of nodes generated by iterative deepening in the general case. Find a recursive formula giving N(b, d) in terms of N(b, d-1).

11.7 Rewrite the breadth-first program of Figure 11.10 using the difference-pair representation for the list of candidate paths, and show that the result can be the program in Figure 11.11. In Figure 11.11, what is the purpose of the goal:

Paths \== Z1

Test what happens if this goal is omitted; use the state space of Figure 11.9. The difference should only show when trying to find more solutions when there are none left.

11.8 How can the search programs of this section be used for searching from a starting set of nodes instead of a single start node?

11.9 How can the search programs of this chapter be used to search in the backward direction; that is, starting from a goal node and progressing toward the start node (or a start node in the case of multiple start nodes)? Hint: redefine the s relation. In what situations would the backward search be advantageous over the forward search?
11.10 Sometimes it is beneficial to search bidirectionally; that is, to work from both ends, the start and the goal. The search ends when both ends come together. Define the search space (relation s) and the goal relation for a given graph so that our search procedures would, in effect, perform bidirectional search.

11.11 Three search procedures find1, find2 and find3 defined below use different search strategies. Identify these strategies.

find1( Node, [Node]) :-
  goal( Node).

find1( Node, [Node | Path]) :-
  s( Node, Node1),
  find1( Node1, Path).

find2( Node, Path) :-
  conc( Path, _, _),           % Usual conc/3 for list concatenation
  find1( Node, Path).

find3( Node, Path) :-
  goal( Goal),
  find3( Node, [Goal], Path).

find3( Node, [Node | Path], [Node | Path]).

find3( Node, [Node2 | Path2], Path) :-
  s( Node1, Node2),
  find3( Node, [Node1, Node2 | Path2], Path).

11.12 Study the following search program and describe its search strategy:

% search( Start, Path1 - Path2): find a path from start node S to a goal node
%   The solution path is represented by two lists, Path1 and Path2

search( S, P1 - P2) :-
  similar_length( P1, P2),     % Lists of approximately equal length
  goal( G),
  path2( G, P2, N),
  path1( S, P1, N).

path1( N, [N], N).

path1( First, [First | Rest], Last) :-
  s( First, Second),
  path1( Second, Rest, Last).

path2( N, [N], N).

path2( First, [First | Rest], Last) :-
  s( Second, First),
  path2( Second, Rest, Last).

similar_length( List1, List2) :-          % Lists of similar length
  equal_length( List2, List),             % Lists of equal length
  ( List1 = List ; List1 = [_ | List]).

equal_length( [], []).

equal_length( [X1 | L1], [X2 | L2]) :-
  equal_length( L1, L2).

11.13 Experiment with various search techniques in the blocks world planning problem.

11.14 The breadth-first programs of this chapter only check for repeated nodes that appear in the same candidate path. In graphs, a node can be reached by different paths. This is not detected by our programs, which, therefore, duplicate the search below such nodes. Modify the programs of Figures 11.10 and 11.11 to prevent this unnecessary work.

11.4 Analysis of basic search techniques

We will now analyze and compare the basic search techniques. First we will consider their application to searching graphs, then comment on the optimality of the solutions they produce. Finally we will analyze their time and space complexity.

Examples so far might have made the wrong impression that our search programs only work for state spaces that are trees and not general graphs. However, when a graph is searched it, in effect, unfolds into a tree, so that some paths are possibly copied in other parts of the tree. Figure 11.12 illustrates this. So our programs work on graphs as well, although they may unnecessarily duplicate some work in cases where a node is reached by various paths. This can be prevented by checking for repetition of a node in all the candidate paths, and not only in the path in which the node was generated. Of course, such checking is only possible in our breadth-first programs, where alternative paths are available for checking.

Figure 11.12 (a) A state space: a is the start node. (b) The tree of all possible non-cyclic paths from a as effectively developed by the breadth-first search program of Figure 11.10.

Our breadth-first search programs generate solution paths, one after another, ordered according to their lengths: shortest solutions come first. This is important if
optimality (with respect to length) is a concern. The breadth-first strategy is guaranteed to produce a shortest solution first. This is, of course, not true for the depth-first strategy. However, depth-first iterative deepening performs depth-first search to increasing depth limits and is thus bound to find the shortest solutions first. So iterative deepening in a way simulates breadth-first search.

Our programs do not, however, take into account any costs associated with the arcs in the state space. If the minimal cost of the solution path is the optimization criterion (and not its length) then the breadth-first search is not sufficient. The best-first search of Chapter 12 will aspire to optimize the cost.

The typical problem associated with search is the combinatorial complexity. For non-trivial problem domains the number of alternatives to be explored is so high that the complexity becomes most critical. It is easy to see how this happens. To simplify the analysis, let us assume the state space is a tree with uniform branching b. That is, each node in the tree, except the leaves, has exactly b successors. Assume a shortest solution path has length d, and there are no leaves in the tree at depth d or less. The number of alternative paths of length d from the start node is b^d. Breadth-first search will explore a number of paths of the order of b^d, that is, O(b^d). The number of candidate paths grows very fast with their length, which leads to what is called combinatorial explosion.

Let us now compare the complexity of the basic search algorithms. The time complexity is usually measured as the number of nodes generated by a search algorithm. The space complexity is usually measured as the maximum number of nodes that have to be stored in memory during search.

Consider breadth-first search in a tree with branching factor b and a shortest solution path of length d. The number of nodes at consecutive levels in the tree grows exponentially with depth, so the number of nodes generated by breadth-first search is:

1 + b + b^2 + b^3 + ... + b^d

Iterative deepening regenerates the shallow levels of the tree on each iteration: the start node is generated d+1 times, the children of the start node d times, etc. In the worst case the number of nodes generated is:

(d+1)*1 + d*b + (d-1)*b^2 + ... + 1*b^d

This is also O(b^d). In fact, the overhead, in comparison with breadth-first search, of regenerating shallow nodes is surprisingly small. It can be shown that the ratio between the number of nodes generated by iterative deepening and those generated by breadth-first search is approximately b/(b-1). For b >= 2 this overhead of iterative deepening is relatively small in view of the enormous space advantage over breadth-first search. In this sense iterative deepening combines the best properties of breadth-first search (optimality guarantee) and depth-first search (space economy), and it is therefore in practice often the best choice among the basic search methods.

Let us also consider bidirectional search (Exercises 11.10-11.12). In cases when it is applicable (goal node known) it may result in considerable savings. Assume a search graph with uniform branching b in both directions, and that the bidirectional search is realized as breadth-first search in both directions. Let a shortest solution path have length d, so the bidirectional search will stop when both breadth-first searches meet, that is, when they both get half-way between the start and the goal node. That is when each of them has progressed to depth about d/2 from their corresponding ends. The complexity of each of them is thus roughly b^(d/2). Under these favourable circumstances bidirectional search succeeds in finding a solution of length d needing approximately equal resources as breadth-first search would need to solve a simpler problem of length d/2. Table 11.1 summarizes the comparison of the basic search techniques.

The basic search techniques do not do anything clever about the combinatorial explosion. They treat all the candidates as equally promising, and do not use any problem-specific information to guide the search in a more promising direction. They are uninformed in this sense. Therefore, basic search techniques are not
The total number of nodes up to the solution depth cl is O(bd )_ So the time sufficient for solving large-scale problems. For such problems, problem-specific
complexity of breadth-first search is O(b1' ). Breadth-first search maintains all the
candidate paths in memory, so its space complexity is also O(b").
Analysis of unlimited depth-first search is less clear because it may completely
Table 11.1 Approximate complexities of the basic search techniques.bis the branching
miss the solution path of lengthd and get lost in an infinite subtree. To facilitate the factor, d is the shortest solution length, dmax is the depth limit for depth-first search,
analysis let us consider depth-first search limited to a maximum depth dmax so that
d :S:: dmax·
d :S:: dmax- Time complexity of this is O(b"m,,). Space complexity is however only
Od Shortest solution
( maxl- Depth-first search essentially only maintains the currently explored path
Time Space guaranteed
between the start node and the current node of the search. Compared to breadth
first search, depth-first search has the advantage of much lower space complexity, Breadth-first bd b" yes
and the disadvantage of no guarantee regarding the optimality. Depth-first bdma:< dma x no
Iterative deepening b" d yes
Iterative deepening performs (d + 1) depth-first searches to increasing depths: 0, bd/2 bd/2
Bidirectional, if applicable yes
1, . .. , d. So its space complexity is O(d). It visits the start node (d + 1) times, the
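The node-count formulas above are easy to check numerically. The following Python sketch (my illustration, not from the book) computes the exact counts for breadth-first search and iterative deepening, their ratio, and the cost of two half-depth searches as in bidirectional search:

```python
# Exact node counts in a uniform tree with branching factor b and
# shortest solution depth d, following the formulas in the text.

def bfs_nodes(b, d):
    # 1 + b + b^2 + ... + b^d
    return sum(b**i for i in range(d + 1))

def id_nodes(b, d):
    # (d+1)*1 + d*b + (d-1)*b^2 + ... + 1*b^d
    return sum((d + 1 - i) * b**i for i in range(d + 1))

b, d = 2, 10
print(bfs_nodes(b, d))                                # 2047
print(id_nodes(b, d))                                 # 4083
print(round(id_nodes(b, d) / bfs_nodes(b, d), 2))     # 1.99, close to b/(b-1) = 2
print(2 * bfs_nodes(b, d // 2))                       # 126: two depth-d/2 searches
```

Even for the small branching factor b = 2, the iterative-deepening overhead stays near the b/(b-1) bound, while the bidirectional count is a tiny fraction of either.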
Such guiding information is called a heuristic. Algorithms that use heuristics perform heuristic search. The next chapter presents such a search method.

References

The basic search strategies are described in any general text on artificial intelligence, for example Russell and Norvig (1995) and Winston (1992). Kowalski (1980) showed how logic can be used for implementing these principles. Korf (1985) analyzed the comparative advantages of iterative deepening.

Korf, R.E. (1985) Depth-first iterative deepening: an optimal admissible tree search. Artificial Intelligence, 27: 97-109.
Kowalski, R. (1980) Logic for Problem Solving. North-Holland.
Russell, S. and Norvig, P. (1995) Artificial Intelligence: A Modern Approach. Prentice Hall.
Winston, P.H. (1992) Artificial Intelligence, third edition. Addison-Wesley.

Summary

• State space is a formalism for representing problems.
• State space is a directed graph whose nodes correspond to problem situations and arcs to possible moves. A particular problem is defined by a start node and a goal condition. A solution of the problem then corresponds to a path in the graph.
Thus problem solving is reduced to searching for a path in a graph.
• Optimization problems can be modelled by attaching costs to the arcs of a state
space.
• Three basic search strategies that systematically explore a state space are depth
first, breadth first, and iterative deepening.
• The depth-first search is easiest to program, but is susceptible to cycling. Two
simple methods to prevent cycling are: limit the depth of search; test for
repeated nodes.
• Implementation of the breadth-first strategy is more complicated as it requires
maintaining the set of candidates. This can be most easily represented as a list of
lists.
• The breadth-first search always finds a shortest solution path first, but this is not
the case with the depth-first strategy.
• Breadth-first search requires more space than depth-first search. In practice,
space is often the critical limitation.
• Depth-first iterative deepening combines the desirable properties of depth-first
and breadth-first search.
• In the case of large state spaces there is the danger of combinatorial explosion. The
basic search strategies are poor tools for combating this difficulty. Heuristic
guidance is required in such cases.
• Concepts introduced in this chapter are:
state space
start node, goal condition, solution path
search strategy
depth-first search
breadth-first search
iterative deepening search
bidirectional search
heuristic search
Chapter 12

Best-First Heuristic Search
Graph searching in problem solving typically leads to the problem of combinatorial complexity due to the proliferation of alternatives. Heuristic search aspires to fight this problem efficiently. One way of using heuristic information about a problem is to compute numerical heuristic estimates for the nodes in the state space. Such an estimate of a node indicates how promising a node is with respect to reaching a goal node. The idea is to continue the search always from the most promising node in the candidate set. The best-first search programs of this chapter are based on this principle.

12.1 Best-first search

A best-first search program can be derived as a refinement of a breadth-first search program. The best-first search also starts at the start node and maintains the set of candidate paths. The breadth-first search always chooses for expansion a shortest candidate path (that is, shallowest tip nodes of the search). The best-first search refines this principle by computing a heuristic estimate for each candidate and chooses for expansion the best candidate according to this estimate.

We will from now on assume that a cost function is defined for the arcs of the state space. So c(n,n') is the cost of moving from a node n to its successor n' in the state space.

Let the heuristic estimator be a function f, such that for each node n of the space, f(n) estimates the 'difficulty' of n. Accordingly, the most promising current candidate node is the one that minimizes f. We will use here a specially constructed function f which leads to the well-known A* algorithm. f(n) will be constructed so as to estimate the cost of a best solution path from the start node, s, to a goal node, under the constraint that this path goes through n. Let us suppose that there is such a path and that a goal node that minimizes its cost is t. Then the estimate f(n) can be constructed as the sum of two terms, as illustrated in Figure 12.1:

    f(n) = g(n) + h(n)

g(n) is an estimate of the cost of an optimal path from s to n; h(n) is an estimate of the cost of an optimal path from n to t.

When a node n is encountered by the search process we have the following situation: a path from s to n must have already been found, and its cost can be computed as the sum of the arc costs on the path. This path is not necessarily an optimal path from s to n (there may be a better path from s to n, not yet found by the search), but its cost can serve as an estimate g(n) of the minimal cost from s to n. The other term, h(n), is more problematic because the 'world' between n and t has not been explored by the search until this point. Therefore, h(n) is typically a real heuristic guess, based on the algorithm's general knowledge about the particular problem. As h depends on the problem domain there is no universal method for constructing h. Concrete examples of how such a heuristic guess can be made will be shown later. But let us assume for now that a function h is given, and concentrate on details of our best-first program.

We can imagine the best-first search to work as follows. The search process consists of a number of competing subprocesses, each of them exploring its own alternative; that is, exploring its own subtree. Subtrees have subtrees: these are explored by subprocesses of subprocesses, etc. Among all these competing processes, only one is active at each time: the one that deals with the currently most promising alternative; that is, the alternative with the lowest f-value. The remaining processes
have to wait quietly until the current f-estimates change so that some other alternative becomes more promising. Then the activity is switched to this alternative. We can imagine this activate-deactivate mechanism as functioning as follows: the process working on the currently top-priority alternative is given some budget and the process is active until this budget is exhausted. During this activity, the process keeps expanding its subtree and reports a solution if a goal node was encountered. The budget for this run is defined by the heuristic estimate of the closest competing alternative.

Figure 12.2 shows an example of such behaviour. Given a map, the task is to find the shortest route between the start city s and the goal city t. In estimating the cost of the remaining route from a city X to the goal we simply use the straight-line distance, denoted by dist(X, t). So:

    f(X) = g(X) + h(X) = g(X) + dist(X, t)

In this example, we can imagine the best-first search as consisting of two processes, each of them exploring one of the two alternative paths: Process 1 the path via a, Process 2 the path via e. In initial stages, Process 1 is more active because f-values along its path are lower than along the other path. At the moment that Process 1 is at c and Process 2 still at e, the situation changes:

    f(c) = g(c) + h(c) = 6 + 4 = 10
    f(e) = g(e) + h(e) = 2 + 7 = 9

So f(e) < f(c), and now Process 2 proceeds to f and Process 1 waits. Here, however,

    f(f) = 7 + 4 = 11
    f(c) = 10
    f(c) < f(f)

Therefore Process 2 is stopped and Process 1 is allowed to proceed, but only to d when f(d) = 12 > 11. Process 2, invoked at this point, now runs smoothly up to the goal t.

Figure 12.2 Finding the shortest route from s to t in a map. (a) The map with links labelled by their lengths; the numbers in the boxes are straight-line distances to t. (b) The order in which the map is explored by a best-first search. Heuristic estimates are based on straight-line distances. The dotted line indicates the switching of activity between alternative paths. The line shows the ordering of nodes according to their f-values; that is, the order in which the nodes are expanded (not the order in which they are generated).

The search thus outlined, starting with the start node, keeps generating new successor nodes, always expanding in the most promising direction according to the f-values. During this process, a search tree is generated whose root is the start node of the search. Our best-first search program will thus keep expanding this search tree until a solution is found. This tree will be represented in the program by terms of two forms:

(1) l( N, F/G) represents a single-node tree (a leaf); N is a node in the state space, G is g(N) (cost of the path found from the start node to N); F is f(N) = G + h(N).

(2) t( N, F/G, Subs) represents a tree with non-empty subtrees; N is the root of the tree, Subs is a list of its subtrees; G is g(N); F is the updated f-value of N; that is, the f-value of the most promising successor of N; the list Subs is ordered according to increasing f-values of the subtrees.
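The switching decisions in the route example can be replayed in a few lines of Python (my illustration; the g-values and straight-line distances are the ones quoted in the worked example and figure):

```python
# f(X) = g(X) + h(X), with h(X) = dist(X, t), using the example's values.
g = {'a': 2, 'e': 2, 'c': 6, 'f': 7}   # costs of the paths found so far
h = {'a': 5, 'e': 7, 'c': 4, 'f': 4}   # straight-line distances to t

def f(x):
    return g[x] + h[x]

print(f('c'), f('e'))   # 10 9  -> f(e) < f(c): activity switches to Process 2
print(f('f'), f('c'))   # 11 10 -> f(c) < f(f): activity switches back to Process 1
```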
For example, consider the search in Figure 12.2 again. At the moment that the node s has been expanded, the search tree consists of three nodes: the root s and its children, a and e. In our program this search tree will be represented by the term:

    t( s, 7/0, [ l( a, 7/2), l( e, 9/2)])

The f-value of the root s is equal to 7, that is the f-value of the root's most promising successor a. The search tree is now expanded by expanding the most promising subtree a. The closest competitor to a is e, whose f-value is 9. Therefore, a is allowed to expand as long as the f-value of a does not exceed 9. Thus, the nodes b and c are generated. f(c) = 10, so the bound for expansion has been exceeded and alternative a is no longer allowed to grow. At this moment, the search tree is:

    t( s, 9/0, [ l( e, 9/2), t( a, 10/2, [ t( b, 10/4, [ l( c, 10/6)])])])

Notice that now the f-value of node a is 10 while that of node s is 9. They have been updated because new nodes, b and c, have been generated. Now the most promising successor of s is e, whose f-value is 9.

The updating of the f-values is necessary to enable the program to recognize the most promising subtree at each level of the search tree (that is, the tree that contains the most promising tip node). This modification of f-estimates leads, in fact, to a generalization of the definition of f. The generalization extends the definition of the function f from nodes to trees. For a single-node tree (a leaf), n, we have the original definition:

    f(n) = g(n) + h(n)

For a tree, T, whose root is n, and n's subtrees are S1, S2, etc.:

    f(T) = min over i of f(Si)

A best-first program along these lines is shown as Figure 12.3. Some more explanation of this program follows. The key procedure is expand, which has six arguments:

    expand( P, Tree, Bound, Tree1, Solved, Solution)

It expands a current (sub)tree as long as the f-value of this tree remains less than or equal to Bound. The arguments of expand are:

    P         Path between the start node and Tree.
    Tree      Current search (sub)tree.
    Bound     f-limit for expansion of Tree.
    Tree1     Tree expanded within Bound; consequently, the f-value of Tree1 is
              greater than Bound (unless a goal node has been found during the
              expansion).
    Solved    Indicator whose value is 'yes', 'no' or 'never'.
    Solution  A solution path from the start node 'through Tree1' to a goal node
              within Bound (if such a goal node exists).

    % bestfirst( Start, Solution): Solution is a path from Start to a goal

    bestfirst( Start, Solution)  :-
      expand( [], l( Start, 0/0), 9999, _, yes, Solution).   % Assume 9999 is > any f-value

    % expand( Path, Tree, Bound, Tree1, Solved, Solution):
    %   Path is path between start node of search and subtree Tree,
    %   Tree1 is Tree expanded within Bound,
    %   if goal found then Solution is solution path and Solved = yes

    % Case 1: goal leaf-node, construct a solution path

    expand( P, l( N, _), _, _, yes, [N | P])  :-
      goal( N).

    % Case 2: leaf-node, f-value less than Bound
    % Generate successors and expand them within Bound

    expand( P, l( N, F/G), Bound, Tree1, Solved, Sol)  :-
      F =< Bound,
      ( bagof( M/C, ( s( N, M, C), not member( M, P)), Succ),
        !,                                  % Node N has successors
        succlist( G, Succ, Ts),             % Make subtrees Ts
        bestf( Ts, F1),                     % f-value of best successor
        expand( P, t( N, F1/G, Ts), Bound, Tree1, Solved, Sol)
        ;
        Solved = never                      % N has no successors - dead end
      ).

    % Case 3: non-leaf, f-value less than Bound
    % Expand the most promising subtree; depending on
    % results, procedure continue will decide how to proceed

    expand( P, t( N, F/G, [T | Ts]), Bound, Tree1, Solved, Sol)  :-
      F =< Bound,
      bestf( Ts, BF), min( Bound, BF, Bound1),   % Bound1 = min( Bound, BF)
      expand( [N | P], T, Bound1, T1, Solved1, Sol),
      continue( P, t( N, F/G, [T1 | Ts]), Bound, Tree1, Solved1, Solved, Sol).

    % Case 4: non-leaf with empty subtrees
    % This is a dead end which will never be solved

    expand( _, t( _, _, []), _, _, never, _)  :-  !.

    % Case 5: f-value greater than Bound
    % Tree may not grow

    expand( _, Tree, Bound, Tree, no, _)  :-
      f( Tree, F), F > Bound.

Figure 12.3 A best-first search program.
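The backed-up rule f(T) = min over i of f(Si) can be checked on the example tree above; a small Python sketch (my illustration, using tuples in place of the Prolog terms l(N, F/G) and t(N, F/G, Subs), with g- and h-values taken from the example):

```python
# A leaf is ('l', name, g); an inner node is ('t', name, g, subtrees).

def backed_up_f(tree, h):
    if tree[0] == 'l':                        # leaf: f(n) = g(n) + h(n)
        _, name, g = tree
        return g + h[name]
    _, name, g, subs = tree                   # tree: f(T) = min of subtree f-values
    return min(backed_up_f(s, h) for s in subs)

h = {'a': 5, 'e': 7, 'b': 6, 'c': 4}          # heuristic values from the map example
tree = ('t', 's', 0, [('l', 'e', 2),
                      ('t', 'a', 2, [('t', 'b', 4, [('l', 'c', 6)])])])

print(backed_up_f(tree, h))           # 9: the best tip is e, f(e) = 2 + 7
print(backed_up_f(tree[3][1], h))     # 10: subtree a is backed up from c, 6 + 4
```

These are exactly the F-values 9/0 and 10/2 recorded in the term t( s, 9/0, ...) above.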
    % Extract f-value

    f( l( _, F/_), F).          % f-value of a leaf

    f( t( _, F/_, _), F).       % f-value of a tree

    bestf( [T | _], F)  :-      % Best f-value of a list of trees
      f( T, F).

    bestf( [], 9999).           % No trees: bad f-value

P, Tree and Bound are 'input' parameters to expand; that is, they are already instantiated whenever expand is called. expand produces three kinds of results, which is indicated by the value of the argument Solved as follows:

(1) Solved = yes.
    Solution = a solution path found by expanding Tree within Bound.
    Tree1 = uninstantiated.

Figure 12.4 The expand relation: expanding Tree until the f-value exceeds Bound results in Tree1.
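For cross-checking the control strategy, the same best-first idea can be written compactly in Python with a priority queue instead of the book's recursive bound mechanism (an illustrative sketch of A*, not the book's program; the graph at the end is made up):

```python
import heapq

def astar(succ, start, goal, h):
    # Frontier of (f, g, node, path) entries; always pop the lowest f first.
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, n, path = heapq.heappop(frontier)
        if n == goal:
            return g, path
        for m, c in succ.get(n, []):
            g2 = g + c
            if g2 < best_g.get(m, float('inf')):
                best_g[m] = g2
                heapq.heappush(frontier, (g2 + h(m), g2, m, path + [m]))
    return None

# A made-up weighted graph: the direct arc s->t is longer than s->a->t.
succ = {'s': [('a', 1), ('t', 5)], 'a': [('t', 2)]}
print(astar(succ, 's', 't', lambda n: 0))   # (3, ['s', 'a', 't'])
```

Run with the trivial estimate h = 0, as here, the search degenerates to uniform-cost (breadth-first-like) search but still returns a minimum-cost path.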
the type of result produced by this expansion. If a solution was found then this is returned, otherwise expansion continues.

The clause that deals with the case

    Tree = l( N, F/G)

generates successor nodes of N together with the costs of the arcs between N and successor nodes. Procedure succlist makes a list of subtrees from these successor nodes, also computing their g-values and f-values as shown in Figure 12.5. The resulting tree is then further expanded as far as Bound permits. If, on the other hand, there were no successors, then this leaf is abandoned for ever by instantiating Solved = 'never'.

Figure 12.5 Relation between the g-value of node N, and the f- and g-values of its children in the search space: g(M) = g(N) + C; f(M) = g(M) + h(M).

Other relations are:

    s( N, M, C)   M is a successor node of N in the state space; C is the cost of
                  the arc from N to M.
    h( N, H)      H is a heuristic estimate of the cost of the best path from node
                  N to a goal node.

Application of this best-first search program to some example problems will be shown in the next section. But first some general, concluding comments on this program. It is a variation of a heuristic algorithm known in the literature as the A* algorithm (see references at the end of the chapter). A* has attracted a great deal of attention. It is one of the fundamental algorithms of artificial intelligence. We will mention here an important result from the mathematical analysis of A*:

A search algorithm is said to be admissible if it always produces an optimal solution (that is, a minimum-cost path) provided that a solution exists at all. Our implementation, which produces all solutions through backtracking, can be considered admissible if the first solution found is optimal. Let, for each node n in the state space, h*(n) denote the cost of an optimal path from n to a goal node. A theorem about the admissibility of A* says: an A* algorithm that uses a heuristic function h such that for all nodes n in the state space

    h(n) ≤ h*(n)

is admissible.

This result is of great practical value. Even if we do not know the exact value of h*, we just have to find a lower bound of h* and use it as h in A*. This is a sufficient guarantee that A* will produce an optimal solution.

There is a trivial lower bound, namely:

    h(n) = 0, for all n in the state space

This indeed guarantees admissibility. The disadvantage of h = 0 is, however, that it has no heuristic power and does not provide any guidance for the search. A* using h = 0 behaves similarly to the breadth-first search. It in fact reduces to the breadth-first search in the case that the arc-cost function c(n, n') = 1 for all arcs (n, n') in the state space. The lack of heuristic power results in high complexity. We would therefore like to have an h which is a lower bound of h* (to ensure admissibility), and which is also as close as possible to h* (to ensure efficiency). Ideally, if we knew h*, we would use h* itself: A* using h* finds an optimal solution directly, without any backtracking at all.

Exercises

12.1 Define the problem-specific relations s, goal and h for the route-finding problem of Figure 12.2. Inspect the behaviour of our A* program on this problem.

12.2 The following statement resembles the admissibility theorem: 'For every search problem, if A* finds an optimal solution then h(n) ≤ h*(n) for all nodes n in the state space.' Is this correct?

12.3 Let h1, h2 and h3 be three admissible heuristic functions (hi ≤ h*) alternatively used by A* on the same state space. Combine these three functions into another heuristic function h which will also be admissible and guide the search at least as well as any of the three functions hi alone.

12.4 A mobile robot moves in the x-y plane among obstacles. All the obstacles are rectangles aligned with the x and y axes. The robot can only move in the directions x and y, and is so small that it can be approximated by a point. The robot has to plan collision-free paths between its current position and some given goal position. The robot aims at minimizing the path length and the changes of the direction of movement (let the cost of one change of direction be equal to one unit of length travelled). The robot uses the A* algorithm to find optimal paths. Define the
predicates s( State, NewState, Cost) and h( State, H) (preferably admissible) to be used by the A* program for this search problem. Assume that the goal position for the robot is defined by the predicate goal( Xg/Yg) where Xg and Yg are the x and y coordinates of the goal point. The obstacles are represented by the predicate

    obstacle( Xmin/Ymin, Xmax/Ymax)

where Xmin/Ymin is the bottom left corner of the obstacle, and Xmax/Ymax is its top right corner.

12.2 Best-first search applied to the eight puzzle

If we want to apply the best-first search program of Figure 12.3 to some particular problem we have to add problem-specific relations. These relations define the particular problem ('rules of the game') and also convey heuristic information about how to solve that problem. This heuristic information is supplied in the form of a heuristic function.

Problem-specific predicates are:

    s( Node, Node1, Cost)

This is true if there is an arc, costing Cost, between Node and Node1 in the state space.

    goal( Node)

This is true if Node is a goal node in the state space.

    h( Node, H)

H is a heuristic estimate of the cost of a cheapest path from Node to a goal node.

In this and the following sections we will define these relations for two example problem domains: the eight puzzle (described in Section 11.1) and the task-scheduling problem.

Problem-specific relations for the eight puzzle are shown in Figure 12.6. A node in the state space is some configuration of the tiles on the board. In the program, this is represented by a list of the current positions of the tiles. Each position is specified by a pair of coordinates: X/Y. The order of items in the list is as follows:

(1) the current position of the empty square,
(2) the current position of tile 1,
(3) the current position of tile 2,
...

The goal situation (see Figure 11.3) is defined by the clause:

    goal( [2/2, 1/3, 2/3, 3/3, 3/2, 3/1, 2/1, 1/1, 1/2]).

    /* Problem-specific procedures for the eight puzzle

    Current situation is represented as a list of positions of the tiles, with the first
    item in the list corresponding to the empty square.

    Example:

        1 2 3
        8   4
        7 6 5

    This position is represented by:

        [2/2, 1/3, 2/3, 3/3, 3/2, 3/1, 2/1, 1/1, 1/2]

    'Empty' can move to any of its neighbours, which means that 'empty' and its
    neighbour interchange their positions.
    */

    % s( Node, SuccessorNode, Cost)

    s( [Empty | Tiles], [Tile | Tiles1], 1)  :-      % All arc costs are 1
      swap( Empty, Tile, Tiles, Tiles1).             % Swap Empty and Tile in Tiles

    swap( Empty, Tile, [Tile | Ts], [Empty | Ts])  :-
      mandist( Empty, Tile, 1).                      % Manhattan distance = 1

    swap( Empty, Tile, [T1 | Ts], [T1 | Ts1])  :-
      swap( Empty, Tile, Ts, Ts1).

    mandist( X/Y, X1/Y1, D)  :-      % D is Manhattan dist. between two squares
      dif( X, X1, Dx),
      dif( Y, Y1, Dy),
      D is Dx + Dy.

    dif( A, B, D)  :-                % D is |A-B|
      D is A-B, D >= 0, !
      ;
      D is B-A.

    % Heuristic estimate h is the sum of distances of each tile
    % from its 'home' square plus 3 times 'sequence' score

    h( [Empty | Tiles], H)  :-
      goal( [Empty1 | GoalSquares]),
      totdist( Tiles, GoalSquares, D),   % Total distance from home squares
      seq( Tiles, S),                    % Sequence score
      H is D + 3*S.

    totdist( [], [], 0).

Figure 12.6 Problem-specific procedures for the eight puzzle, to be used in best-first search of Figure 12.3.
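The mandist and totdist computations translate directly into other languages; a Python rendering for checking values (my illustration, with positions as (x, y) pairs):

```python
# Goal configuration, following the goal/1 clause:
# goal([2/2, 1/3, 2/3, 3/3, 3/2, 3/1, 2/1, 1/1, 1/2]).
goal = [(2, 2), (1, 3), (2, 3), (3, 3), (3, 2), (3, 1), (2, 1), (1, 1), (1, 2)]

def mandist(p, q):
    # Manhattan distance between two squares
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def totdist(tiles, goal_squares):
    # Total distance of the tiles (excluding the empty square) from home
    return sum(mandist(t, g) for t, g in zip(tiles, goal_squares))

print(mandist((2, 2), (2, 3)))         # 1: a legal swap with the empty square
print(totdist(goal[1:], goal[1:]))     # 0: every tile is home in the goal position
```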
    showsol( [P | L])  :-
      showsol( L),
      nl, write( '---'),
      showpos( P).

    % Display a board position

    showpos( [S0,S1,S2,S3,S4,S5,S6,S7,S8])  :-
      member( Y, [3,2,1]),                 % Order of Y-coordinates
      nl, member( X, [1,2,3]),             % Order of X-coordinates
      member( Tile-X/Y,                    % Tile on square X/Y
              [' '-S0, 1-S1, 2-S2, 3-S3, 4-S4, 5-S5, 6-S6, 7-S7, 8-S8]),
      write( Tile),
      fail                                 % Backtrack to next square
      ;
      true.                                % All squares done

• a tile on a non-central square scores 0 if the tile is, in the clockwise direction, followed by its proper successor;
• such a tile scores 2 if it is not followed by its proper successor.

For example, for the starting position of the puzzle in Figure 12.7(a), seq = 6.

Figure 12.7 Three starting positions for the eight puzzle: (a) requires four steps; (b) requires five steps; (c) requires 18 steps.
The heuristic estimate, H, is computed as:

    H = totdist + 3 * seq

This heuristic function works well in the sense that it very efficiently directs the search toward the goal. For example, when solving the puzzles of Figure 12.7(a) and (b), no node outside the shortest solution path is ever expanded before the first solution is found. This means that the shortest solutions are found directly in these cases without any backtracking. Even the difficult puzzle of Figure 12.7(c) is solved almost directly. A drawback of this heuristic is, however, that it is not admissible: it does not guarantee that the shortest solution path will always be found before any longer solution. The h function used does not satisfy the admissibility condition h ≤ h* for all the nodes. For example, for the initial position in Figure 12.7(a),

    h = 4 + 3 * 6 = 22,    h* = 4

On the other hand, the 'total distance' measure itself is admissible: for all positions,

    totdist ≤ h*

This relation can be easily proved by the following argument: if we relaxed the problem by allowing the tiles to climb on top of each other, then each tile could travel to its home square along a trajectory whose length is exactly the Manhattan distance between the tile's initial square and its home square. So the optimal solution in the relaxed puzzle would be exactly of length totdist. In the original problem, however, there is interaction between the tiles and they are in each other's way. This can prevent the tiles from moving along the shortest trajectories, which ensures that the optimal solution's length is equal to or greater than totdist.

Exercise

12.5 Modify the best-first search program of Figure 12.3 to count the number of nodes generated in the search. One easy way is to keep the current number of nodes asserted as a fact, and update it by retract and assert whenever new nodes are generated. Experiment with various heuristic functions for the eight puzzle with respect to their heuristic power, which is reflected in the number of nodes generated.

12.3 Best-first search applied to scheduling

Let us consider the following task-scheduling problem. We are given a collection of tasks, t1, t2, ..., with their execution times D1, D2, ... respectively. The tasks are to be executed on a set of m identical processors. Any task can be executed on any processor, but each processor can only execute one task at a time. There is a precedence relation between tasks which tells what tasks, if any, have to be completed before some other task can be started. The scheduling problem is to assign tasks to processors so that the precedence relation is not violated and all the tasks together are processed in the shortest possible time. The time that the last task in a schedule is completed is called the finishing time of the schedule. We want to minimize the finishing time over all permissible schedules.

Figure 12.8 shows such a task-scheduling problem and two permissible schedules, one of which is optimal. This example shows an interesting property of optimal schedules; namely, that they may include 'idle time' for processors. In the optimal schedule of Figure 12.8, processor 2 after having executed task t2 waits for two time units although it could start executing task t7.

Figure 12.8 A task-scheduling problem with seven tasks and three processors. The top part of the diagram shows the task precedence relation and the duration of the tasks. Task t5, for example, requires 20 time units, and its execution can only start after three other tasks, t1, t2 and t3, have been completed. Two permissible schedules are shown; an optimal one with the finishing time 24, and a suboptimal one with the finishing time 33. In this problem any optimal schedule has to include idle time. Coffman/Denning, Operating Systems Theory, © 1973, p. 86. Adapted by permission of Prentice Hall, Englewood Cliffs, New Jersey.

One way to construct a schedule is roughly as follows. We start with the empty schedule (with void time slots for each processor) and gradually insert tasks one by one into the schedule until all the tasks have been inserted. Usually there are
alternatives at any such insertion step because there are several candidate tasks waiting to be processed. Therefore, the scheduling problem is one of search. Accordingly, we can formulate the scheduling problem as a state-space search problem as follows:

• states are partial schedules;
• a successor state of some partial schedule is obtained by adding a not yet scheduled task to this schedule; another possibility is to leave a processor that has completed its current task idle;
• the start state is the empty schedule;
• any schedule that includes all the tasks in the problem is a goal state;
• the cost of a solution (which is to be minimized) is the finishing time of a goal schedule;
• accordingly, the cost of a transition between two (partial) schedules whose finishing times are F1 and F2 respectively is the difference F2 - F1.

Some refinements are needed to this rough scenario. First, we decide to fill the schedule according to increasing times so that tasks are inserted into the schedule from left to right. Also, each time a task is added, the precedence constraint has to be checked. Further, there is no point in leaving a processor idle indefinitely if there are still some candidate tasks waiting. So we decide to leave a processor idle only until some other processor finishes its current task, and then consider again assigning a task to it.

Now let us decide on the representation of problem situations; that is, partial schedules.

There are m such pairs in the list, one for each processor. We will always add a new task to a schedule at the moment that the first current execution is completed. To this end, the list of current engagements will be kept ordered according to increasing finishing times. The three components of a partial schedule (waiting tasks, current engagements and finishing time) will be combined in the program into a single expression of the form:

    WaitingList * ActiveTasks * FinishingTime

In addition to this information we have the precedence constraint, which will be specified in the program as a relation:

    prec( TaskX, TaskY)

Now let us consider a heuristic estimate. We will use a rather straightforward heuristic function, which will not provide very efficient guidance to the search algorithm. The function will be admissible and will hence guarantee an optimal schedule. It should be noted, however, that a much more powerful heuristic would be needed for large scheduling problems.

Our heuristic function will be an optimistic estimate of the finishing time of a partial schedule completed with all currently waiting tasks. This optimistic estimate will be computed under the assumption that two constraints on the actual schedule are relaxed:

(1) remove the precedence constraint;

(2) allow (unrealistically) that a task can be executed in a distributed fashion on several processors, and that the sum of the execution times of this task over all
schedules. We need the following information: these processors is equal to the originally specified execution time of this task
on a single processor.
(1) list of waiting tasks and their execution times,
Let the execution times of the currently waiting tasks be D i , D2 , .•. , and the
(2) current engagements of the processors; finishing times of the current processors engagements be F i , F2 , .... Such an
optimistically estimated finishing time, Final/, to complete all the currently active
We will also add for convenience: and all the waiting tasks, is:
(3) the finishing time of the (partial) schedule; that is, the latest end-time of the Final/= (I::D; + LJj)/111
current engagements of the processors.
The list of waiting tasks and their execution times will be represented in the program where m is the number of processors. Let the finishing time of the current partial
as a list of the form: schedule be:
Fin= max(F;)
[ Taskl/D1, Task2/D2, ...) I
The current engagements of the processors will be represented by a list of tasks Then the heuristic estimate H (an extra time needed to complete the partial
currently being processed; that is, pairs of the form: schedule with the waiting tasks) is:
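The arithmetic of this estimate is easy to check with a short sketch (written in Python for brevity rather than Prolog; the function name h_estimate is ours, not the book's). For the start state of the problem of Figure 12.8, all seven tasks (total duration 70) are waiting and the three processors are idle, so the estimate is 70/3 ≈ 23.3, which indeed does not exceed the optimal finishing time 24:

```python
def h_estimate(waiting_durations, finishing_times):
    """Optimistic extra time H needed to complete a partial schedule.

    waiting_durations: durations D1, D2, ... of the waiting tasks
    finishing_times:   finishing times F1, F2, ... of the m current engagements
    """
    m = len(finishing_times)
    # Relaxed (precedence-free, divisible-task) finishing-time estimate Finall
    finall = (sum(waiting_durations) + sum(finishing_times)) / m
    fin = max(finishing_times)        # finishing time Fin of the partial schedule
    return max(0.0, finall - fin)     # H = Finall - Fin if positive, else 0

# Start state of Figure 12.8: all tasks waiting, three idle processors
H0 = h_estimate([4, 2, 2, 20, 20, 11, 11], [0, 0, 0])
print(H0)   # about 23.3, an admissible lower bound; the optimum is 24
```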
A complete program that defines the state-space relations for task scheduling as outlined above is shown in Figure 12.9. The figure also includes a specification of the particular scheduling problem of Figure 12.8. These definitions can now be used by the best-first search program of Figure 12.3. One of the optimal solutions produced by best-first search in the thus specified problem space is the optimal schedule of Figure 12.8.

    /*  Problem-specific relations for task scheduling

        Nodes in the state space are partial schedules specified by:

          [ WaitingTask1/D1, WaitingTask2/D2, ...] * [ Task1/F1, Task2/F2, ...] * FinTime

        The first list specifies the waiting tasks and their durations; the second list
        specifies the currently executed tasks and their finishing times, ordered so that
        F1 =< F2, F2 =< F3, .... FinTime is the latest completion time of current
        engagements of the processors.
    */

    % s( Node, SuccessorNode, Cost)

    s( Tasks1 * [_/F | Active1] * Fin1, Tasks2 * Active2 * Fin2, Cost)  :-
       del( Task/D, Tasks1, Tasks2),                                % Pick a waiting task
       not ( member( T/_, Tasks2), before( T, Task) ),              % Check precedence
       not ( member( T1/F1, Active1), F < F1, before( T1, Task) ),  % Active tasks too
       Time is F + D,                            % Finishing time of activated task
       insert( Task/Time, Active1, Active2, Fin1, Fin2),
       Cost is Fin2 - Fin1.

    s( Tasks * [_/F | Active1] * Fin, Tasks * Active2 * Fin, 0)  :-
       insertidle( F, Active1, Active2).         % Leave processor idle

    before( T1, T2)  :-          % Task T1 before T2
       prec( T1, T2).            % according to precedence

    before( T1, T2)  :-
       prec( T, T2),
       before( T1, T).

    insert( S/A, [T/B | L], [S/A, T/B | L], F, F)  :-   % Task lists are ordered
       A =< B, !.

    insert( S/A, [T/B | L], [T/B | L1], F1, F2)  :-
       insert( S/A, L, L1, F1, F2).

    insert( S/A, [ ], [S/A], _, A).

    insertidle( A, [T/B | L], [idle/B, T/B | L])  :-    % Leave processor idle
       A < B, !.                                        % until first greater finishing time

    insertidle( A, [T/B | L], [T/B | L1])  :-
       insertidle( A, L, L1).

    del( A, [A | L], L).         % Delete item from list

    del( A, [B | L], [B | L1])  :-
       del( A, L, L1).

    goal( [ ] * _ * _).          % Goal state: no task waiting

    % Heuristic estimate of a partial schedule is based on an
    % optimistic estimate of the final finishing time of this
    % partial schedule extended by all the remaining waiting tasks.

    h( Tasks * Processors * Fin, H)  :-
       totaltime( Tasks, Tottime),       % Total duration of waiting tasks
       sumnum( Processors, Ftime, N),    % Ftime is sum of finishing times
                                         % of processors, N is their number
       Finall is ( Tottime + Ftime)/N,
       ( Finall > Fin, !, H is Finall - Fin
         ;
         H = 0
       ).

    totaltime( [ ], 0).

    totaltime( [_/D | Tasks], T)  :-
       totaltime( Tasks, T1),
       T is T1 + D.

    sumnum( [ ], 0, 0).

    sumnum( [_/T | Procs], FT, N)  :-
       sumnum( Procs, FT1, N1),
       N is N1 + 1,
       FT is FT1 + T.

    % A task-precedence graph

    prec( t1, t4).   prec( t1, t5).   prec( t2, t4).   prec( t2, t5).
    prec( t3, t5).   prec( t3, t6).   prec( t3, t7).

    % A start node

    start( [t1/4, t2/2, t3/2, t4/20, t5/20, t6/11, t7/11] * [idle/0, idle/0, idle/0] * 0).

    % An example query: ?- start( Problem), bestfirst( Problem, Sol).

Figure 12.9 Problem-specific relations for the task-scheduling problem. The particular scheduling problem of Figure 12.8 is also defined by its precedence graph and an initial (empty) schedule as a start node for search.

Project

In general, scheduling problems are known to be combinatorially difficult. Our simple heuristic function does not provide very powerful guidance. Propose other functions and experiment with them.
that f is an evaluation function, not necessarily of the form f = g + h. If the function f is not monotonic then the best-first order is not guaranteed. The function f is said to be monotonic if its value monotonically increases along the paths in the state space. That is: f is monotonic if for all pairs of nodes N and N': if s(N, N') then f(N) ≤ f(N'). The reason for a non-best-first order is that with a non-monotonic f, the f-bound may become so large that nodes with different f-values will be expanded for the first time by this depth-first search. This depth-first search will keep expanding nodes as long as they are within the f-bound, and will not care about the order in which they are expanded. In principle we are interested in best-first order because we expect that the function f reflects the quality of solutions.

One easy way of implementing IDA* in Prolog is shown in Figure 12.10. This program largely exploits Prolog's backtracking mechanism. The f-bound is maintained as a Prolog fact of the form next_bound( Bound).

    % idastar( Start, Solution):
    %   Perform IDA* search; Start is the start node, Solution is solution path

    idastar( Start, Solution)  :-
       retract( next_bound(_)), fail    % Clear next_bound
       ;
       asserta( next_bound( 0)),        % Initialize bound
       idastar0( Start, Solution).

    idastar0( Start, Sol)  :-
       retract( next_bound( Bound)),    % Current bound
       asserta( next_bound( 99999)),    % Initialize next bound
       f( Start, F),                    % f-value of start node
       df( [Start], F, Bound, Sol)      % Find solution; if not, change bound
although more complicated space-saving technique is the so-called RBFS ('recursive best-first search'). RBFS is very similar to our A* program of Figure 12.3 (which also is recursive in the same sense as RBFS!). The difference between our A* program and RBFS is that A* keeps in memory all the already generated nodes, whereas RBFS only keeps the current search path and the sibling nodes along this path. When RBFS temporarily suspends the search of a subtree (because it no longer looks the best), it 'forgets' that subtree to save space. So RBFS's space complexity is (as in IDA*) only linear in the depth of search. The only thing that RBFS remembers of such an abandoned subtree is the updated f-value of the root of the subtree. The f-values are updated through backing-up the f-values in the same way as in the A* program. To distinguish between the 'static' evaluation function f and these backed-up values, we write (for a node N):

    f(N) = value of node N returned by the evaluation function (always the same during search)

    F(N) = backed-up f-value (changes during search because it depends on the descendant nodes of N)

F(N) is defined as follows:

    F(N) = f(N),  if N has never been expanded by the search
    F(N) = min { F(Ni) | Ni is a child of N },  otherwise

As the A* program, RBFS also explores subtrees within a given f-bound. The bound is determined by the F-values of the siblings along the current search path (the smallest F-value of the siblings, that is the F-value of the closest competitor to the current node). Suppose that a node N is currently the best node in the search tree (i.e. has the lowest F-value). Then N is expanded and N's children are explored up to some f-bound Bound. When this bound is exceeded (manifested by F(N) > Bound) then all the nodes generated below N are 'forgotten'. However, the updated value F(N) is retained and is used in deciding about how to continue the search.

The F-values are not only determined by backing-up the values from a node's children, but can also be inherited from the node's parents. Such inheritance occurs as follows. Let there be a node N which is about to be expanded by the search. If F(N) > f(N) then we know that N must have already been expanded earlier and F(N) was determined from N's children, but the children have then been removed from the memory. Now suppose a child Ni of N is generated again and Ni's static value f(Ni) is also computed again. Now F(Ni) is determined as follows:

    if f(Ni) < F(N) then F(Ni) = F(N) else F(Ni) = f(Ni)

This can be written shorter as:

    F(Ni) = max{ F(N), f(Ni) }

Thus in the case f(Ni) < F(N), Ni's F-value is inherited from Ni's parent N. This is justified by the following argument: when Ni was generated (and removed) earlier, the value F(Ni) was necessarily ≥ F(N). Otherwise, by the back-up rule, F(N) would have been smaller.

To illustrate how RBFS works, consider the route-finding problem of Figure 12.2. Figure 12.11 shows selected snapshots of the current path (including the siblings along the path) kept in the memory. The search keeps switching (as in A*) between the alternative paths. However, when such a switch occurs, the previous path is removed from the memory to save space. In Figure 12.11, the numbers written next to the nodes are the nodes' F-values. At snapshot (A), node a is the best candidate node (F(a) < F(e)). Therefore the subtree below a is searched with Bound = 9

[Figure 12.11: snapshots (A)-(H) of the search, each showing the current path and its siblings with the nodes' F-values.]

Figure 12.11 The trace of the RBFS algorithm on the route problem of Figure 12.2. The figures next to the nodes are the nodes' F-values (which change during search).
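The two rules that govern F-values - backing up the minimum over the children, and inheritance on regeneration - condense into two one-line functions. A small sketch (in Python, with names of our own; the example numbers echo the trace above, taking f(a) = 7 as a hypothetical static value):

```python
def backed_up(child_Fs):
    """Back-up rule: F(N) = min of the children's F-values."""
    return min(child_Fs)

def regenerated_child_F(parent_F, parent_f, child_f):
    """Inheritance rule, applied when a child Ni of N is generated again."""
    if parent_F > parent_f:             # N was expanded before; its children were forgotten
        return max(parent_F, child_f)   # F(Ni) = max{ F(N), f(Ni) }
    return child_f                      # first expansion: F(Ni) is just f(Ni)

# From the trace of Figure 12.11: F(a) = 10, f(b) = 8 (and, hypothetically, f(a) = 7)
print(regenerated_child_F(10, 7, 8))    # -> 10: b inherits its F-value from a
```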
(i.e. F(e), the closest - the only - competitor). When this search reaches node c (snapshot B), it is found that F(c) = 10 > Bound. Therefore this path is (temporarily) abandoned, the nodes c and b removed from the memory and the value F(c) = 10 backed-up to node a, so F(a) becomes 10 (snapshot C). Now e is the best competitor (F(e) = 9 < 10 = F(a)), and its subtree is searched with Bound = 10 = F(a). This search stops at node f because F(f) = 11 > Bound (snapshot D). Node f is removed, and F(e) becomes 11 (snapshot E). Now the search switches to a again with Bound = 11. When b is regenerated, it is found that f(b) = 8 < F(a). Therefore node b inherits its F-value from node a, so F(b) becomes 10. Next c is regenerated and c also inherits its F-value from b, so F(c) becomes 10. The Bound = 11 is exceeded at snapshot F, the nodes d, c and b are removed and F(d) = 12 is backed-up to node a (snapshot G). Now the search switches to node e and runs smoothly to the goal t.

Let us now formulate the RBFS algorithm more formally. The algorithm is centred around the updating of the F-values of nodes. So a good way to formulate the algorithm is by defining a function:

    NewF( N, F(N), Bound)

where N is a node whose current F-value is F(N). The function carries out the search within Bound, starting with node N, and computes the new F-value of N, NewF, resulting from this search. The new value is determined at the moment that the bound is exceeded. If, however, a goal is found before this, then the function terminates, signalling success as a side effect. The function NewF is defined in Figure 12.12.

Figure 12.12 The updating of F-values in RBFS search.

A Prolog implementation of RBFS is shown in Figure 12.13. This program is similar to the A* program of Figure 12.3. The heart of the program is the procedure

    rbfs( Path, SiblingNodes, Bound, NewBestFF, Solved, Solution)

which carries out the RBFS algorithm. The arguments are:

    Path          Path so far from the start node of search, in reverse order
    SiblingNodes  The children of the last node in the path so far, i.e. of the head of Path
    Bound         Upper bound on F-values for extending the search from SiblingNodes
    NewBestFF     The best F-value after extending search just beyond Bound
    Solved        Indicates the success of search below SiblingNodes (Solved = yes if a goal
                  was found, no if search went just beyond Bound, never if this is a dead end)
    Solution      Solution path if Solved = yes, otherwise undefined

The representation of the nodes includes, besides a state in the state space, also the path costs, f-values and F-values, as follows:

    Node = ( State, G/F/FF)

Figure 12.13 A best-first search program that only requires space linear in the depth of search (RBFS algorithm).
where G is the cost of the path from the start state to State, F is the static value f(State), and FF is the current backed-up value F(State). It should be noted that the variable F in the program denotes an f-value, and FF in the program denotes an F-value. The procedure rbfs carries out the search below SiblingNodes within Bound, and computes NewBestFF according to the function NewF in Figure 12.12.

Figure 12.13 contd

    rbfs( Path, [ (Node, G/F/FF) | _], _, _, yes, [Node | Path])  :-
       F = FF,               % Only report solution once, when first reached; then F = FF
       goal( Node).

    rbfs( _, [ ], _, _, never, _)  :-  !.     % No candidates, dead end!

    rbfs( Path, [ (Node, G/F/FF) | Ns], Bound, NewFF, Solved, Sol)  :-
       FF =< Bound,          % Within Bound: generate children
       findall( Child/Cost,
                ( s( Node, Child, Cost), not member( Child, Path) ),
                Children),
       inherit( F, FF, InheritedFF),                      % Children may inherit FF
       succlist( G, InheritedFF, Children, SuccNodes),    % Order children
       bestff( Ns, NextBestFF),        % Closest competitor FF among siblings
       min( Bound, NextBestFF, Bound2),  !,
       rbfs( [Node | Path], SuccNodes, Bound2, NewFF2, Solved2, Sol),
       continue( Path, [ (Node, G/F/NewFF2) | Ns], Bound, NewFF, Solved2, Solved, Sol).

    % continue( Path, Nodes, Bound, NewFF, ChildSolved, Solved, Solution)

    continue( Path, [N | Ns], Bound, NewFF, never, Solved, Sol)  :-  !,
       rbfs( Path, Ns, Bound, NewFF, Solved, Sol).        % Node N a dead end

    continue( _, _, _, _, yes, yes, Sol).

    continue( Path, [N | Ns], Bound, NewFF, no, Solved, Sol)  :-
       insert( N, Ns, NewNs),  !,      % Ensure siblings are ordered by FF-values
       rbfs( Path, NewNs, Bound, NewFF, Solved, Sol).

    succlist( _, _, [ ], [ ]).

    succlist( G0, InheritedFF, [Node/C | NCs], Nodes)  :-
       G is G0 + C,
       h( Node, H),
       F is G + H,
       max( F, InheritedFF, FF),
       succlist( G0, InheritedFF, NCs, Nodes2),
       insert( (Node, G/F/FF), Nodes2, Nodes).

Let us summarize the important properties of the RBFS algorithm. First, its space complexity is linear in the depth of search. The price for this is the time needed to regenerate already generated nodes. However, these overheads are substantially smaller than in IDA*. Second, like A* and unlike IDA*, RBFS expands the nodes in the best-first order even in the case of a non-monotonic f-function.

Exercises

12.8  Consider the route-finding problem of Figure 12.2. How many nodes are generated by A*, IDA* and RBFS on this problem (counting also the regenerated nodes)?

12.9  Consider the state space in Figure 12.14. Let a be the start node and l the goal node. Give the order in which nodes are generated (including regeneration) by the RBFS algorithm. How are the backed-up values F(b) and F(c) changing during this process?

12.10  Inheritance of F-values in RBFS saves time (prevents unnecessary regeneration of nodes). Explain how. Hint: Consider the search by RBFS of the binary tree in which for each node its f-value is equal to the depth of the node in the tree. Study the execution of RBFS with and without inheritance of F-values on this state space up to depth 8.

12.11  Compare the A*, IDA* and RBFS programs on the eight puzzle problems. Measure execution times and the number of nodes generated during search, including the regenerated nodes (add node counters to the programs).
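For Exercise 12.11, node counters need not be threaded through every search clause. One convenient pattern (shown here as a Python sketch of ours, not the book's code; in Prolog one would typically assert and retract a counter fact instead) is to wrap the successor function so that every generated child is counted:

```python
def counting(successors):
    """Wrap a successor function so that every generated child is counted."""
    counter = {'generated': 0}
    def wrapped(node):
        children = successors(node)
        counter['generated'] += len(children)   # regenerated nodes are counted again
        return children
    return wrapped, counter

# Hypothetical successor function of a tiny search space
succ = lambda n: [(n + 1, 1), (n + 2, 1)] if n < 2 else []
wrapped, counter = counting(succ)
wrapped(0)
wrapped(1)
print(counter['generated'])   # -> 4
```

The search program itself stays untouched: it is simply run with the wrapped successor function, and the counter is read afterwards.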
Summary

• Heuristic information can be used to estimate how far a node is from a nearest goal node in the state space. In this chapter we considered the use of numerical heuristic estimates.

• The best-first heuristic principle guides the search process so as to always expand the node that is currently the most promising according to the heuristic estimates. The well-known A* algorithm that uses this principle was programmed in this chapter.

• To use A* for solving a concrete problem, a state space, a goal predicate and a heuristic function have to be defined. For complex problems, the difficult part is to find a good heuristic function.

• The admissibility theorem helps to establish whether A*, using a particular heuristic function, will always find an optimal solution.

• The time and space requirements of A* typically grow exponentially with solution length. In practical applications, space is often more critical than time. Special techniques for best-first search aim at saving space at the expense of time.

• IDA* is a simple space-efficient best-first search algorithm based on a similar idea as iterative deepening. Overheads due to node regeneration in IDA* are acceptable in cases when many nodes in the state space have equal f-values. When the nodes tend not to share f-values the overheads become unacceptable.

• RBFS is a more sophisticated space-efficient best-first search algorithm that generally regenerates fewer nodes than IDA*.

• The space requirements of both IDA* and RBFS are very modest. They only grow linearly with the depth of search.

• In this chapter best-first search was applied to the eight puzzle problem and a task-scheduling problem.

• Concepts discussed in this chapter are:

    heuristic estimates
    heuristic search
    best-first search
    algorithms A*, IDA*, RBFS
    admissibility of search algorithms, admissibility theorem
    space-efficiency of best-first search
    monotonicity of evaluation function

References

The best-first search program of this chapter is a variation of many similar algorithms of which A* is the most popular. Descriptions of A* can be found in general textbooks on AI, such as Nilsson (1980), Winston (1992) and Russell and Norvig (1995). Doran and Michie (1966) originated the best-first search guided by a distance-to-goal estimate. The admissibility theorem was discovered by Hart, Nilsson and Raphael (1968). In the literature, the property h ≤ h' is often included in the definition of A*. An excellent and rigorous treatment of many variations of best-first search algorithms and related mathematical results is provided by Pearl (1984). Kanal and Kumar (1988) edited a collection of interesting papers dealing with various advanced aspects of search. An early description of IDA* appeared in Korf (1985). The RBFS algorithm was introduced and analyzed by Korf (1993). Russell and Norvig (1995) discuss further ideas for space-bounded search.

The eight puzzle was used in artificial intelligence as a test problem for studying heuristic principles by several researchers - for example, Doran and Michie (1966), Michie and Ross (1970), and Gaschnig (1979).

Our task-scheduling problem and its variations arise in numerous applications in which servicing of requests for resources is to be planned. Our example task-scheduling problem in Section 12.3 is borrowed from Coffman and Denning (1973).

Finding good heuristics is important and difficult; therefore, the study of heuristics is one of the central themes of artificial intelligence. There are, however, also some limitations on how far we can get in the refinement of heuristics. It may appear that to solve any combinatorial problem efficiently we only have to find a powerful heuristic. However, there are problems (including many scheduling problems) for which no general heuristic exists that would guarantee both efficiency and admissibility in all cases. Many theoretical results that pertain to this limitation issue are collected in Garey and Johnson (1979).

Coffman, E.G. and Denning, P.J. (1973) Operating Systems Theory. Prentice Hall.

Doran, J. and Michie, D. (1966) Experiments with the graph traverser program. Proc. Royal Society of London, 294(A): 235-259.

Garey, M.R. and Johnson, D.S. (1979) Computers and Intractability. W.H. Freeman.

Gaschnig, J. (1979) Performance Measurement and Analysis of Certain Search Algorithms. Carnegie Mellon University, Computer Science Department, Technical Report CMU-CS-79-124 (PhD Thesis).

Hart, P.E., Nilsson, N.J. and Raphael, B. (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, SSC-4(2): 100-107.

Kanal, L. and Kumar, V. (eds) (1988) Search in Artificial Intelligence. Springer-Verlag.

Korf, R.E. (1985) Depth-first iterative-deepening: an optimal admissible tree search. Artificial Intelligence, 27: 97-109.

Korf, R.E. (1993) Linear-space best-first search. Artificial Intelligence, 62: 41-78.

Michie, D. and Ross, R. (1970) Experiments with the adaptive graph traverser. Machine Intelligence, 5: 301-308.

Nilsson, N.J. (1980) Principles of Artificial Intelligence. Tioga; also Springer-Verlag.

Pearl, J. (1984) Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley.

Russell, S.J. and Norvig, P. (1995) Artificial Intelligence: A Modern Approach. Prentice Hall.

Winston, P.H. (1992) Artificial Intelligence, third edition. Addison-Wesley.
chapter 13

Problem Decomposition and AND/OR Graphs

13.1 AND/OR graph representation of problems
Figure 13.2 An AND/OR representation of the route-finding problem of Figure 13.1. Nodes correspond to problems or subproblems, and curved arcs indicate that all (both) subproblems have to be solved.

other. Such a decomposition can be pictured as an AND/OR graph (Figure 13.2). Notice the curved arcs which indicate the AND relationship between subproblems. Of course, the graph in Figure 13.2 is only the top part of a corresponding AND/OR tree. Further decomposition of subproblems could be based on the introduction of additional intermediate cities.

What are goal nodes in such an AND/OR graph? Goal nodes correspond to subproblems that are trivial or 'primitive'. In our example, such a subproblem would be 'find a route from a to c', for there is a direct connection between cities a and c in the road map.

Some important concepts have been introduced in this example. An AND/OR graph is a directed graph in which nodes correspond to problems, and arcs indicate relations between problems. There are also relations among the arcs themselves. These relations are AND and OR, depending on whether we have to solve just one of the successor problems or several (see Figure 13.3). In principle, a node can issue both AND-related arcs and OR-related arcs. We will, however, assume that each node has either only AND successors or only OR successors. Each AND/OR graph can be transformed into this form by introducing auxiliary OR nodes if necessary. Then, a node that only issues AND arcs is called an AND node; a node that only issues OR arcs is called an OR node.

Figure 13.3 (a) To solve P solve any of P1 or P2 or .... (b) To solve Q solve all Q1 and Q2 ....

In the state-space representation, a solution to the problem was a path in the state space. What is a solution in the AND/OR representation? A solution, of course, has to include all the subproblems of an AND node. Therefore, a solution is not a path any more, but it is a tree. Such a solution tree, T, is defined as follows:

• the original problem, P, is the root node of T;

• if P is an OR node then exactly one of its successors (in the AND/OR graph), together with its own solution tree, is in T;

• if P is an AND node then all of its successors (in the AND/OR graph), together with their solution trees, are in T.

Figure 13.4 illustrates this definition. In this figure, there are costs attached to arcs. Using costs we can formulate an optimization criterion. We can, for example, define

Figure 13.4 (a) An AND/OR graph: d, g and h are goal nodes; a is the problem to be solved. (b) and (c) Two solution trees whose costs are 9 and 8 respectively. The cost of a solution tree is here defined as the sum of all the arc costs in the solution tree.
the cost of a solution graph as the sum of all the arc costs in the graph. As we are normally interested in the minimum cost, the solution graph in Figure 13.4(c) will be preferred.

But we do not have to base our optimization measure on the costs of arcs. Sometimes it is more natural to associate costs with nodes rather than arcs, or with both arcs and nodes.

To summarize:

• The AND/OR representation is based on the philosophy of decomposing a problem into subproblems.

• Nodes in an AND/OR graph correspond to problems; links between nodes indicate relations between problems.

• A node that issues OR links is an OR node. To solve an OR node, one of its successor nodes has to be solved.

• A node that issues AND links is an AND node. To solve an AND node, all of its successors have to be solved.

• For a given AND/OR graph, a particular problem is specified by two things: a start node, and a goal condition for recognizing goal nodes.

• Goal nodes (or 'terminal' nodes) correspond to trivial (or 'primitive') problems.

• A solution is represented by a solution graph, a subgraph of the AND/OR graph.

• The state-space representation can be viewed as a special case of the AND/OR representation in which all the nodes are OR nodes.

• To benefit from the AND/OR representation, AND-related nodes should represent subproblems that can be solved independently of each other. The independency criterion can be somewhat relaxed, as follows: there must exist an ordering of the AND subproblems such that solutions of subproblems that come earlier in this ordering are not destroyed when solving the later subproblems.

• Costs can be attached to arcs or nodes or both in order to formulate an optimization criterion.

13.2 Examples of AND/OR representation

13.2.1 AND/OR representation of route finding

For the shortest route problem of Figure 13.1, an AND/OR graph including a cost function can be defined as follows:

• OR nodes are of the form X-Z, meaning: find a shortest path from X to Z.

• AND nodes are of the form

      X-Z via Y

  meaning: find a shortest path from X to Z under the constraint that the path goes through Y.

• A node X-Z is a goal node (primitive problem) if X and Z are directly connected in the map.

• The cost of each goal node X-Z is the given road distance between X and Z.

• The costs of all other (non-terminal) nodes are 0.

The cost of a solution graph is the sum of the costs of all the nodes in the solution graph (in our case, this is just the sum over the terminal nodes). For the problem of Figure 13.1, the start node is a-z. Figure 13.5 shows a solution tree of cost 9. This tree corresponds to the path [a,b,d,f,i,z]. This path can be reconstructed from the solution tree by visiting all the leaves in this tree in the left-to-right order.

Figure 13.5 The cheapest solution tree for the route problem of Figure 13.1 formulated as an AND/OR graph.
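Under either cost measure, the cost of the cheapest solution tree can be computed by one simple recursion: an OR node takes the minimum over its successors, an AND node the sum over all of them. The following sketch (in Python, on a small made-up graph - not the graph of Figure 13.4 - with all names ours) uses arc costs:

```python
# A made-up AND/OR graph: each internal node maps to (child, arc_cost) pairs
graph = {'a': [('b', 1), ('c', 2)],
         'b': [('d', 1), ('e', 1)]}
node_type = {'a': 'or', 'b': 'and'}   # c, d and e are primitive (goal) nodes

def min_cost(node):
    """Cost of the cheapest solution tree rooted at node (sum of its arc costs)."""
    if node not in graph:             # primitive problem: nothing more to solve
        return 0
    parts = [cost + min_cost(child) for child, cost in graph[node]]
    # an OR node needs just one successor solved; an AND node needs all of them
    return min(parts) if node_type[node] == 'or' else sum(parts)

print(min_cost('a'))   # -> 2: solving a via c (cost 2) beats the AND node b (1 + 2 = 3)
```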
13.2.2 The Tower of Hanoi problem Disk c can only move from 1 to 3 if both a and b are stacked on peg 2. Then, our
initial problem of moving a, b and c from peg 1 to peg 3 is reduced to three
The Tower of Hanoi problem, shown in Figure 13.6, is another, classical example of subproblems:
effective application of the AND/OR decomposition scheme. For simplicity, we will
consider a simple version of this problem, containing three disks only: To move a, band c from 1 to 3:
(1) move a and b from 1 to 2, and
There are three pegs, 1, 2 and 3, and three disks, a, band c (a being the smallest (2) move cfrom 1 to 3, and
and c being the biggest). Initially, all the disks are stacked on peg 1. The problem (3) move a and b from 2 to 3.
is to transfer them all on to peg 3. Only one disk can be moved at a time, and no
disk can ever be placed on top of a smaller disk. Problem 2 is trivial (one-step solution). The other two subproblems can be solved
independently of problem 2 because disks a and b can be moved regardless of the
This problem can be viewed as the problem of achieving the following set of goals: position of disk c. To solve problems 1 and 3, the same decomposition principle can
(1) Disk a on peg 3. be applied (disk b is the hardest this time). Accordingly, problem 1 is reduced to
three trivial subproblems:
(2) Disk b on peg 3.
(3) Disk con peg 3. To move a and b from 1 to 2:
(1) move a from 1 to 3, and
These goals are, unfortunately, not independent. For example, disk a can immedi (2) move bfrom 1 to 2, and
ately be placed on peg 3, satisfying the first goal. This will, however, prevent the (3) move a from 3 to 2.
fulfilment of the other two goals (unless we undo the first goal again). Fortunately,
there is a convenient ordering of these goals so that a solution can easily be derived
13.2.3 AND/OR formulation of game playing

Games like chess and checkers can naturally be viewed as problems, represented by
AND/OR graphs. Such games are called two-person, perfect-information games, and
we will assume here that there are only two possible outcomes: WIN or LOSS. (We
can think of games with three outcomes - WIN, LOSS and DRAW - as also having
just two outcomes: WIN and NO-WIN.) As the two players move in turn we have two
kinds of positions, depending on who is to move. Let us call the two players 'us' and
'them', so the two kinds of positions are: 'us-to-move' positions and 'them-to-move'
positions. Assume that the game starts in an us-to-move position P. Each alternative
us-move in this position leads to one of the them-to-move positions Q1, Q2, ...
(Figure 13.7). Further, each alternative them-move in Q1 leads to one of the
positions R11, R12, .... In the AND/OR tree of Figure 13.7, nodes correspond to
positions, and arcs correspond to possible moves. Us-to-move levels alternate with
them-to-move levels. To win in the initial position, P, we have to find a move from P
to Qi, for some i, so that the position Qi is won. Thus, P is won if Q1 or Q2 or ... is
won. Therefore position P is an OR node. For each i, position Qi is a them-to-move
position, so if it is to be won for us it has to be won after each them-move. Thus Qi is
won if all of Ri1 and Ri2 and ... are won. Accordingly, all them-to-move positions are
AND nodes. Goal nodes are those positions that are won by the rules of the game; for
example, their king checkmated in chess. Those positions that are lost by the rules of
the game correspond to unsolvable problems. To solve the game we have to find a
solution tree that guarantees our victory regardless of the opponent's replies.

Figure 13.7 An AND/OR tree for a two-person game: the root is an us-to-move
position, and its successors are them-to-move positions.
13.3 Basic AND/OR search procedures
In this section we will only be interested in finding some solution of the problem,
regardless of its cost. So for the purposes of this section we can ignore the costs of
links or nodes in an AND/OR graph.

The simplest way of searching AND/OR graphs in Prolog is to use Prolog's own
search mechanism. This happens to be trivial as Prolog's procedural meaning itself is
nothing but a procedure for searching AND/OR graphs. For example, the AND/OR
graph of Figure 13.4 (ignoring the arc costs) can be specified by the following
clauses:

a :- b.        % a is an OR node with two successors, b and c
a :- c.
b :- d, e.     % b is an AND node with two successors, d and e
c :- f, g.
e :- h.
f :- h, i.
d.  g.  h.     % d, g and h are goal nodes

To ask whether problem a can be solved we can simply ask:

?- a.

Now Prolog will effectively search the graph of Figure 13.4 in the depth-first fashion
and answer 'yes', after having visited that part of the search graph corresponding to
the solution tree in Figure 13.4(b).

The advantage of this approach to programming AND/OR search is its simplicity.
There are disadvantages, however:

•  We only get the answer 'yes' or 'no', not a solution tree as well. We could
   reconstruct the solution tree from the program's trace, but this can be awkward
   and insufficient if we want a solution tree explicitly accessible as an object in the
   program.

To represent AND/OR graphs as explicit objects instead, we can define two operators:

op( 600, xfx, --->).
op( 500, xfx, :).

The complete AND/OR graph of Figure 13.4 is thus specified by the clauses:

a ---> or: [b, c].
b ---> and: [d, e].
c ---> and: [f, g].
e ---> or: [h].
f ---> or: [h, i].

goal( d).  goal( g).  goal( h).

The depth-first AND/OR procedure can be constructed from the following
principles:

To solve a node, N, use the following rules:

(1) If N is a goal node then it is trivially solved.
(2) If N has OR successors then solve one of them (attempt them one after
    another until a solvable one is found);
(3) If N has AND successors then solve all of them (attempt them one after
    another until they have all been solved).

If the above rules do not produce a solution then assume the problem cannot be
solved.

A corresponding program can be as follows:

solve( Node) :-
  goal( Node).

solve( Node) :-
  Node ---> or: Nodes,           % Node is an OR node
  member( Node1, Nodes),         % Select a successor Node1 of Node
  solve( Node1).

solve( Node) :-
  Node ---> and: Nodes,          % Node is an AND node
  solveall( Nodes).              % Solve all Node's successors

solveall( [] ).

solveall( [Node | Nodes]) :-
  solve( Node),
  solveall( Nodes).

member is the usual list membership relation.

This program still has the following disadvantages:

•  it does not produce a solution tree, and
•  it is susceptible to infinite loops, depending on the properties of the AND/OR
   graph (cycles).

The program can easily be modified to produce a solution tree. We have to modify
the solve relation so that it has two arguments:

solve( Node, SolutionTree)

Let us represent a solution tree as follows. We have three cases:

(1) If Node is a goal node then the corresponding solution tree is Node itself.
(2) If Node is an OR node then its solution tree has the form:
      Node ---> Subtree
    where Subtree is a solution tree for one of the successors of Node.
(3) If Node is an AND node then its solution tree has the form:
      Node ---> and: Subtrees
    where Subtrees is the list of solution trees of all of the successors of Node.

For example, in the AND/OR graph of Figure 13.4, the first solution of the top node a
is represented by:

a ---> b ---> and: [d, e ---> h]

The three forms of a solution tree correspond to the three clauses about our solve
relation. So our initial solve procedure can be altered by simply modifying each of
the three clauses; that is, by just adding the solution tree as the second argument to
solve. The resulting program is shown as Figure 13.8. An additional procedure in this
program is show for displaying solution trees. For example, the solution tree of
Figure 13.4 is displayed by show in the following form:

a ---> b ---> d
              e ---> h

The program of Figure 13.8 is still prone to infinite loops. One simple way to
prevent infinite loops is to keep track of the current depth of the search and prevent
the program from searching beyond some depth limit. We can do this by simply
introducing another argument to the solve relation:

solve( Node, SolutionTree, MaxDepth)

As before, Node represents a problem to be solved, and SolutionTree is a solution not
deeper than MaxDepth. MaxDepth is the allowed depth of search in the graph. In the
case that MaxDepth = 0 no further expansion is allowed; otherwise, if MaxDepth > 0
then Node can be expanded and its successors are attempted with the lower depth limit
MaxDepth - 1. This can easily be incorporated into the program of Figure 13.8. For
example, the second clause about solve becomes:

solve( Node, Node ---> Tree, MaxDepth) :-
  MaxDepth > 0,
  Node ---> or: Nodes,           % Node is an OR node
  member( Node1, Nodes),         % Select a successor Node1 of Node
  Depth1 is MaxDepth - 1,        % New depth limit
  solve( Node1, Tree, Depth1).   % Solve the successor with the lower limit

This depth-limited, depth-first procedure can also be used in the iterative
deepening regime, thereby simulating breadth-first search. The idea is to do
the depth-first search repetitively, each time with a greater depth limit, until a
solution is found. That is, try to solve the problem with depth limit 0, then with 1,
then with 2, etc. Such a program is:

iterative_deepening( Node, SolTree) :-
  trydepths( Node, SolTree, 0).        % Try search with increasing depth limits, starting with 0

trydepths( Node, SolTree, Depth) :-
  solve( Node, SolTree, Depth)
  ;
  Depth1 is Depth + 1,                 % New depth limit
  trydepths( Node, SolTree, Depth1).   % Try higher depth limit
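Assuming the goal clause and the AND clause of the three-argument solve are extended with the depth limit analogously to the OR clause shown above, the iterative deepening program can be run on the graph given earlier; the following query is our illustration, not the book's:

```prolog
?- iterative_deepening( a, SolTree).
```

The depth limits 0, 1 and 2 fail, and with depth limit 3 the first solution found is the tree a ---> b ---> and: [d, e ---> h] shown earlier in this section.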
As with iterative deepening in state space (see Chapter 11), a disadvantage of this
breadth-first simulation is that the program re-searches the top parts of the search space
each time that the depth limit is increased. On the other hand, the important
advantage as compared with genuine breadth-first search is space economy.

solve( Node, Node) :-
  goal( Node).

solve( Node, Node ---> Tree) :-
  Node ---> or: Nodes,           % Node is an OR node
  member( Node1, Nodes),         % Select a successor Node1 of Node
  solve( Node1, Tree).

solve( Node, Node ---> and: Trees) :-
  Node ---> and: Nodes,          % Node is an AND node
  solveall( Nodes, Trees).       % Solve all Node's successors

% solveall( [Node1, Node2, ...], [SolutionTree1, SolutionTree2, ...])

solveall( [], []).

solveall( [Node | Nodes], [Tree | Trees]) :-
  solve( Node, Tree),
  solveall( Nodes, Trees).

show( Tree) :-                   % Display solution tree, indented by 0
  show( Tree, 0).

% show( Tree, H): display solution tree indented by H

show( Node ---> Tree, H) :- !,
  write( Node), write(' --->'),
  H1 is H + 7,
  show( Tree, H1).

show( and: [T], H) :- !,         % Display single AND tree
  show( T, H).

show( and: [T | Ts], H) :- !,    % Display AND list of solution trees
  show( T, H),
  tab( H),
  show( and: Ts, H).

show( Node, H) :-
  write( Node), nl.

Figure 13.8 Depth-first search for AND/OR graphs. This program does not avoid infinite
cycling. Procedure solve finds a solution tree and procedure show displays such
a tree. show assumes that each node only takes one character on output.

Exercises

13.3 Consider some simple two-person, perfect-information game without chance and
define its AND/OR representation. Use a depth-first AND/OR search program to find
winning strategies in the form of AND/OR trees.

13.4 Best-first AND/OR search

13.4.1 Heuristic estimates and the search algorithm

The basic search procedures of the previous section search AND/OR graphs system-
atically and exhaustively, without any heuristic guidance. For complex problems
such procedures are too inefficient due to the combinatorial complexity of the
search space. Heuristic guidance that aims to reduce the complexity by avoiding
useless alternatives becomes necessary. The heuristic guidance introduced in this
section will be based on numerical heuristic estimates of the difficulty of problems
in the AND/OR graph. The program that we shall develop is an implementation of
the algorithm known as AO*. It can be viewed as a generalization of the A* best-first
search program for the state-space representation of Chapter 12.

Let us begin by introducing an optimization criterion based on the costs of arcs in
the AND/OR graph. First, we extend our representation of AND/OR graphs to
include arc costs. For example, the AND/OR graph of Figure 13.4 can be represented
by the following clauses:

a ---> or: [b/1, c/3].
b ---> and: [d/1, e/1].
c ---> and: [f/2, g/1].
e ---> or: [h/6].
f ---> or: [h/2, i/3].

goal( d).  goal( g).  goal( h).

We shall define the cost of a solution tree as the sum of all the arc costs in the tree.
The optimization objective is to find a minimum-cost solution tree. For illustration,
see Figure 13.4 again.
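The cost of a given solution tree, as just defined, can be computed by a small Prolog sketch along the following lines. The predicate names cost_of/2, root/2 and arccost/3 are ours, not the book's; member/2 is the usual list membership relation:

```prolog
% cost_of( Tree, Cost): Cost is the sum of the arc costs in solution tree Tree,
% for the costed graph representation Node ---> or/and : [Succ/Cost, ...].
cost_of( Leaf, 0) :-                % a goal leaf contributes no arc cost
  goal( Leaf).
cost_of( Node ---> and: Trees, Cost) :-
  costs_of( Node, Trees, Cost).     % sum over all AND subtrees
cost_of( Node ---> Tree, Cost) :-   % OR node: one chosen subtree
  Tree \= and: _,
  root( Tree, Succ),
  arccost( Node, Succ, C),
  cost_of( Tree, C1),
  Cost is C + C1.

costs_of( _, [], 0).
costs_of( Node, [Tree | Trees], Cost) :-
  root( Tree, Succ),
  arccost( Node, Succ, C),
  cost_of( Tree, C1),
  costs_of( Node, Trees, Cost2),
  Cost is C + C1 + Cost2.

root( Node ---> _, Node) :- !.      % root of a non-trivial tree
root( Node, Node).                  % a leaf is its own root

arccost( Node, Succ, C) :-          % look up the arc cost in the graph clauses
  ( Node ---> or: Succs ; Node ---> and: Succs ),
  member( Succ/C, Succs).
```

For the first solution tree of node a, written with explicit parentheses as a ---> (b ---> and: [d, e ---> h]), this yields 1 + ((1 + 0) + (1 + 6 + 0)) = 9, in agreement with the trace discussed below.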
It is useful to define the cost of a node in the AND/OR graph as the cost of
the node's optimal solution tree. So defined, the cost of a node corresponds to the
difficulty of the node.

We shall now assume that we can estimate the costs of nodes (without knowing
their solution trees) in the AND/OR graph with some heuristic function h. Such
estimates will be used for guiding the search. Our heuristic search program will
begin the search with the start node and, by expanding already visited nodes,
gradually grow a search tree. This process will grow a tree even in cases where the
AND/OR graph itself is not a tree; in such a case the graph unfolds into a tree by
duplicating parts of the graph.

The search process will at any time of the search select the 'most promising'
candidate solution tree for the next expansion. Now, how is the function h used to
estimate how promising a candidate solution tree is? Or, how promising a node (the
root of a candidate solution tree) is?

For a node N in the search tree, H(N) will denote its estimated difficulty. For a tip
node N of the current search tree, H(N) is simply h(N). On the other hand, for an
interior node of the search tree we do not have to use the function h directly because we
already have some additional information about such a node; that is, we already
know its successors. Therefore, as Figure 13.9 shows, for an interior OR node N we
approximate its difficulty as:

H(N) = min ( cost(N, Ni) + H(Ni))

where cost(N, Ni) is the cost of the arc from N to Ni, and the minimum is taken over
all successors Ni of N. The minimization rule in this formula is justified by the fact
that, to solve N, we just have to solve one of its successors. The difficulty of an
interior AND node N is approximated by:

H(N) = sum ( cost(N, Ni) + H(Ni))

where the sum is taken over all successors Ni of N. We say that the H-value of an
interior node is a 'backed-up' estimate.

Figure 13.9 Estimating the difficulty, H, of problems in the AND/OR graph.
For an OR node N: H(N) = min ( cost(N, Ni) + H(Ni));
for an AND node N: H(N) = sum ( cost(N, Ni) + H(Ni)).

In our search program, it will be more practical to use (instead of the H-values)
another measure, F, defined in terms of H, as follows. Let a node M be the predecessor
of N in the search tree, and the cost of the arc from M to N be cost(M, N); then we
define:

F(N) = cost(M, N) + H(N)

Accordingly, if M is the parent node of N, and N1, N2, ... are N's children, then:

F(N) = cost(M, N) + min F(Ni),   if N is an OR node
F(N) = cost(M, N) + sum F(Ni),   if N is an AND node

The start node S of the search has no predecessor, but let us choose the cost of its
(virtual) incoming arc as 0. Now, if h for all goal nodes in the AND/OR graph is 0,
and an optimal solution tree has been found, then F(S) is just the cost of this
solution tree (that is, the sum of all the costs of its arcs).

At any stage of the search, each successor of an OR node represents an alternative
candidate solution subtree. The search process will always decide to continue the
exploration at that successor whose F-value is minimal. Let us return to Figure 13.4
again and trace such a search process when searching the AND/OR graph of this
figure. Initially, the search tree is just the start node a, and then the tree grows until
a solution tree is found. Figure 13.10 shows some snapshots taken during the growth
of the search tree. We shall assume for simplicity that h = 0 for all the nodes.
Numbers attached to nodes in Figure 13.10 are the F-values of the nodes (of course,
these change during the search as more information is accumulated). Here are some
explanatory remarks to Figure 13.10.

Expanding the initial search tree (snapshot A) produces tree B. Node a is an OR
node, so we now have two candidate solution subtrees: b and c. As F(b) = 1 < 3
= F(c), alternative b is selected for expansion. Now, how far can alternative b be
expanded? The expansion can proceed until either:

(1) the F-value of node b has become greater than that of its competitor c, or
(2) it has become clear that a solution tree has been found.

So candidate b starts to grow with the upper bound for F(b): F(b) =< 3 = F(c). First, b's
successors d and e are generated (snapshot C), and the F-value of b is increased to 3.
As this does not exceed the upper bound, the candidate tree rooted in b continues to
expand. Node d is found to be a goal node, and then node e is expanded, resulting in
snapshot D. At this point F(b) = 9 > 3, which stops the expansion of alternative b.
This prevents the process from realizing that h is also a goal node and that a solution
tree has already been generated. Instead, the activity now switches to the competing
alternative c. The bound on F(c) for expanding this alternative is set to 9, since at
this point F(b) = 9. Within this bound the candidate tree rooted in c is expanded
until the situation of snapshot E is reached. Now the process realizes that a solution
tree (which includes goal nodes h and g) has been found, and the whole process
terminates. Notice that the cheaper of the two possible solution trees was reported as a
solution by this process - that is, the solution tree in Figure 13.4(c).

Figure 13.10 Snapshots (A) to (E) of the growth of the search tree, with the F-values
attached to the nodes and the competing candidate solution subtrees
(candidate 1 and candidate 2) indicated.

A program that implements the ideas of the previous section is given in Figure 13.12.
Before explaining some details of this program, let us consider the representation of
the search tree that this program uses.

There are several cases, as shown in Figure 13.11. The different forms of the search
tree arise from combining the following possibilities with respect to the tree's size
and 'solution status':

•  Size:
   (1) the tree is either a single-node tree (a leaf), or
   (2) it has a root and (non-empty) subtrees.

•  Solution status:
   (1) the tree has already been discovered to be solved (the tree is a solution
       tree), or
   (2) it is still just a candidate solution tree.

The principal functor used to represent the tree indicates a combination of these
possibilities. This can be one of the following:

leaf    solvedleaf    tree    solvedtree

Figure 13.11 Representation of search trees. Case 1: a search leaf, leaf( N, F, C)
with F = C + h(N). Case 2: a search tree with OR subtrees,
tree( N, F, C, or: [T1, T2, ...]). Case 3: a search tree with AND subtrees.
Case 4: a solved leaf, solvedleaf( N, F) with F = C. Case 5: a solution tree
rooted at an OR node, with F = C + F1.

The list of subtrees is always ordered according to increasing F-values. A subtree can
already be solved. Such subtrees are accommodated at the end of the list.
Now to the program of Figure 13.12. The top-level relation is

andor( Node, SolutionTree)

where Node is the start node of the search. The program produces a solution tree (if
one exists) with the aspiration that this will be an optimal solution. Whether it will
really be a cheapest solution depends on the heuristic function h used by the
algorithm. There is a theorem that talks about this dependence on h. The theorem is
similar to the admissibility theorem about the state-space, best-first search of
Chapter 12 (algorithm A*). Let COST(N) denote the cost of a cheapest solution tree
of a node N. If for each node N in the AND/OR graph the heuristic estimate
h(N) =< COST(N), then andor is guaranteed to find an optimal solution. If h does not
satisfy this condition then the solution found may be suboptimal. A trivial heuristic
function that satisfies the admissibility condition is h = 0 for all the nodes. The
disadvantage of this function is, of course, lack of heuristic power.

A search tree is either:

tree( Node, F, C, SubTrees)     tree of candidate solutions
leaf( Node, F, C)               leaf of a search tree
solvedtree( Node, F, SubTree)   solution tree
solvedleaf( Node, F)            leaf of solution tree

The key relation in the program of Figure 13.12 is:

expand( Tree, Bound, Tree1, Solved)

Tree and Bound are 'input' arguments, and Tree1 and Solved are 'output' arguments.
Their meaning is:

Tree is a search tree that is to be expanded.

Bound is a limit for the F-value within which Tree is allowed to expand.

Solved is an indicator whose value indicates one of the following three cases:
(1) Solved = yes: Tree can be expanded within Bound so as to comprise a solution
    tree Tree1;
(2) Solved = no: Tree can be expanded to Tree1 so that the F-value of Tree1
    exceeds Bound, and there was no solution subtree before the F-value over-
    stepped Bound;
(3) Solved = never: Tree is unsolvable.

Tree1 is, depending on the cases above, either a solution tree, an extension of Tree
just beyond Bound, or uninstantiated in the case Solved = never.

Two of the auxiliary procedures of Figure 13.12 are backup and selecttree:

% backup( Trees, F): F is the backed-up F-value of the AND/OR tree list Trees

backup( or: [Tree | _], F) :-        % First tree in OR list is best
  f( Tree, F), !.

backup( and: [], 0) :- !.

backup( and: [Tree1 | Trees], F) :-
  f( Tree1, F1),
  backup( and: Trees, F2),
  F is F1 + F2, !.

backup( Tree, F) :-
  f( Tree, F).

% Relation selecttree( Trees, BestTree, OtherTrees, Bound, Bound1):
%   OtherTrees is an AND/OR list Trees without its best member
%   BestTree; Bound is the expansion bound for Trees, Bound1 is the
%   expansion bound for BestTree

selecttree( Op : [Tree], Tree, Op : [], Bound, Bound) :- !.    % The only candidate

selecttree( Op : [Tree | Trees], Tree, Op : Trees, Bound, Bound1) :-
  backup( Op : Trees, F),
  ( Op = or, !, min( Bound, F, Bound1)
    ;
    Op = and, Bound1 is Bound - F).

min( A, B, A) :- A < B, !.
min( A, B, B).

Figure 13.12 contd

The procedure

expandlist( Trees, Bound, Trees1, Solved)

is similar to expand. As in expand, Bound is a limit of the expansion of a tree, and
Solved is an indicator of what happened during the expansion ('yes', 'no' or 'never').
The first argument is, however, a list of trees (an AND list or an OR list):

Trees = or: [T1, T2, ...]    or    Trees = and: [T1, T2, ...]

expandlist selects the most promising tree T (according to the F-values) in Trees. Due to
the ordering of the subtrees this is always the first tree in Trees. This most promising
subtree is expanded with a new bound Bound1. Bound1 depends on Bound and also
on the other trees in Trees. If Trees is an OR list then Bound1 is the lower of Bound
and the F-value of the next best tree in Trees. If Trees is an AND list then Bound1 is
Bound minus the sum of the F-values of the remaining trees in Trees. Trees1 depends
on the case indicated by Solved. In the case Solved = no, Trees1 is the list Trees with
the most promising tree in Trees expanded with Bound1. In the case Solved = yes,
Trees1 is a solution of the list Trees (found within Bound). If Solved = never, Trees1 is
uninstantiated.

The procedure continue, which is called after expanding a tree list, decides what
to do next, depending on the results of expandlist. It either constructs a solution
tree, or updates the search tree and continues its expansion, or signals 'never' in the
case that the tree list was found unsolvable.

Another procedure,

combine( OtherTrees, NewTree, Solved1, NewTrees, Solved)

relates several objects dealt with in expandlist. NewTree is the expanded tree in the
tree list of expandlist, OtherTrees are the remaining, unchanged trees in the tree list,
and Solved1 indicates the 'solution status' of NewTree. combine handles several cases,
depending on Solved1 and on whether the tree list is an AND list or an OR list. For
example, the clause

combine( or: _, Tree, yes, Tree, yes).

says: in the case that the tree list is an OR list, and the just-expanded tree was
solved, and its solution tree is Tree, then the whole list has also been solved, and its
solution is Tree itself. Other cases are best understood from the code of combine
itself.

For displaying a solution tree, a procedure similar to show of Figure 13.8 can be
defined. This procedure is left as an exercise for the reader.
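To run the program, the user also supplies the heuristic function. The following is our illustration, assuming the heuristic is given to Figure 13.12 as a predicate h( Node, H), with the costed graph clauses above:

```prolog
% Trivial admissible heuristic: h = 0 underestimates the cost of every node
h( _, 0).

% ?- andor( a, SolutionTree).
```

With this h the search behaves exactly as in the trace of Figure 13.10 and reports the cheaper of the two solution trees.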
Summary
•  Nodes of an AND/OR graph are of two types: AND nodes and OR nodes.

•  A concrete problem is defined by a start node and a goal condition. A solution of
   a problem is represented by a solution graph.

•  Costs of arcs and nodes can be introduced into an AND/OR graph to model
   optimization problems.

Chapter 14

Constraint Logic Programming
For case 2, the arc consistency, of course, does not guarantee that all possible
combinations of domain values are solutions to the constraint problem. It may even
be that no combination of values actually satisfies all the constraints. Therefore
some combinatorial search is needed over the reduced domains to find a solution.
One possibility is to choose one of the multi-valued domains and try to assign
repeatedly its values to the corresponding variable. Assigning a particular value to
the variable means reducing the variable's domain, possibly causing inconsistent
arcs to appear again. So the consistency algorithm can be applied again to further
reduce the domains of the variables, etc. If the domains are finite, this will eventually
result in either an empty domain, or all the domains single-valued. The search can
be done differently, not necessarily by choosing a single value from a domain. An
alternative policy may be to choose a non-singleton domain and split it into two
approximately equal-sized subsets. The algorithm is then applied to both subsets.

For illustration let us consider how this algorithm may work on our scheduling
example. Let the domains of all the variables be the integers between 0 and 10.
Figure 14.2 shows the constraint network and the trace of a constraint satisfaction
algorithm. Initially, at step 'Start', all the domains are 0..10. In each execution
step, one of the arcs in the network is made consistent. In step 1, arc (Tb, Ta)
is considered, which reduces the domain of Tb to 2..10. Next, arc (Td, Tb) is
considered, which reduces the domain of Td to 5..10, etc. After step 8 all the arcs
are consistent and all the reduced domains are multi-valued. Being interested in the
minimal finishing time, assigning Tf = 9 may now be tried. Arc consistency is then
executed again, reducing all the domains to singletons except the domain of Tc,
which is reduced to 2..4. All the starting times are determined except for task c,
which may start at any time in the interval between 2 and 4.

Constraint network (the arcs correspond to the constraints):
Ta + 2 =< Tb,  Ta + 2 =< Tc,  Tb + 3 =< Td,  Td + 4 =< Tf,  Tc + 5 =< Tf

Step    Arc        Ta       Tb       Tc       Td       Tf
Start              0..10    0..10    0..10    0..10    0..10
1       (Tb,Ta)             2..10
2       (Td,Tb)                               5..10
3       (Tf,Td)                                        9..10
4       (Td,Tf)                               5..6
5       (Tb,Td)             2..3
6       (Ta,Tb)    0..1
7       (Tc,Ta)                      2..10
8       (Tc,Tf)                      2..5

Figure 14.2 Top: the constraint network for the scheduling problem. Bottom: an arc
consistency execution trace.

Notice how consistency techniques exploit the constraints to reduce the domains
of the variables as soon as new information is available. New information triggers the
related constraints, which results in reduced domains of the concerned variables.
Such execution can be viewed as data-driven. Constraints are active in the sense that
they do not wait to be explicitly called by the programmer, but activate automatically
when relevant information appears. This idea of data-driven computation is
further discussed in Chapter 23 under 'Pattern-directed programming'.

Exercises

14.1 Try to execute the arc consistency algorithm with different orders of the arcs. What
happens?

14.2 Execute the arc consistency algorithm on the final state of the trace in Figure 14.2,
after Tf is assigned the value 9.
are simplified into the single constraint X :( 2. The extent of simplification depends Each constraint is of the form:
on the current state of the information about the variables, as well as on the abilities
Exprl Operator Expr2
of the particular constraint solver. The remaining goals in the list are executed with
the so updated set of current constraints. Both Exprl and Expr2 are usual arithmetic expressions. They may, depending on the
CLP systems differ in the domains and types of constraints they can process. particular CLP(R) system, also include calls to some standard functions, such as
Families of CLP techniques appear under names of the form CLP(X) where X stands sin(X). The operator can be one of the following, depending on the type of
for the domain. For example, in CLP(R) the domains of the variables are real constraint:
numbers, and constraints are arithmetic equalities, inequalities and disequalities
over real numbers. CLP(X) systems over other domains include: CLP(Z) (integers), for equations
CLP(Q) (rational numbers), CLP(B) (Boolean domains), and CLP(FD) (user-defined =\= for disequations
finite domains). Available domains and types of constraints in actual implementa <, =<, >, >= for inequations
tions largely depend on the available techniques for solving particular types of
constraints. In CLP(R), for example, linear equalities and inequalities are typically Let us now took at some simple examples of using these constraints, and study in
available because efficient techniques exist for handling these types of constraints. particular the flexibility they offer in comparison with the usual built-in numerical
On the other hand, the use of non-linear constraints is very limited. facilities in Prolog.
In the remainder of this chapter we will look in more detail at CLP(R), CLP(Q) and CLP(FD) using the syntactic conventions for CLP in SICStus Prolog (see reference at the end of the chapter).

14.2 CLP over real numbers: CLP(R)

Consider the query:

    ?- 1 + X = 5.

In Prolog this matching fails, so Prolog's answer is 'no'. However, if the user's intention is that X is a number and '+' is arithmetic addition, then the answer X = 4 would be more appropriate. Using the built-in predicate is instead of '=' does not quite achieve this interpretation, but a CLP(R) system does. In our syntactic convention, this CLP(R) query will be written as:

    ?- { 1 + X = 5 }.        % Numerical constraint
    X = 4

The constraint is handled by a specialized constraint solver that 'understands' operations on real numbers, and can typically solve sets of equations or inequations of certain types. According to our syntactic convention, a set of constraints is inserted into a Prolog clause as a goal enclosed in curly brackets. Individual constraints are separated by commas and semicolons. As in Prolog, a comma means conjunction, and a semicolon means disjunction. So the conjunction of constraints C1, C2 and C3 is written as:

    { C1, C2, C3 }

To convert temperature between Centigrade and Fahrenheit, we may in Prolog use the built-in predicate is. For example:

    convert( Centigrade, Fahrenheit) :-
        Centigrade is (Fahrenheit - 32)*5/9.

This can be used to convert a given temperature in Fahrenheit into Centigrade, but not in the opposite direction, because the built-in predicate is expects everything on its right-hand side to be instantiated. To make the procedure work in both directions, we can test which of the arguments is instantiated to a number, and then use the conversion formula properly rewritten for each case. All this is much more elegant in CLP(R), where the same formula, interpreted as a numerical constraint, works in both directions.

    convert( Centigrade, Fahrenheit) :-
        { Centigrade = (Fahrenheit - 32)*5/9 }.

    ?- convert( 35, F).
    F = 95

    ?- convert( C, 95).
    C = 35

Our CLP(R) program even works when neither of the two arguments is instantiated:

    ?- convert( C, F).
    { F = 32.0 + 1.8*C }

As the calculation in this case is not possible, the answer is a formula, meaning: the solution is the set of all F and C that satisfy this formula. Notice that this formula, produced by the CLP system, is a simplification of the constraint in our convert program.
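The two-directional plain-Prolog version alluded to above is not shown in this excerpt. A minimal sketch (the predicate name convert2 and its exact structure are our own, not the book's):

```prolog
% Hypothetical two-way conversion with plain Prolog arithmetic:
% test which argument is instantiated to a number, then use the
% conversion formula rewritten for that direction.
convert2( C, F) :-
    number( C), !,
    F is C*9/5 + 32.
convert2( C, F) :-
    number( F),
    C is (F - 32)*5/9.
```

With this version both ?- convert2( 35, F). and ?- convert2( C, 95). succeed, but unlike the CLP(R) version it fails when neither argument is instantiated.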
326 Constraint Logic Programming CLP over real numbers: CLP(R) 327
A typical constraint solver can handle sets of linear equations, inequations and disequations. Here are some examples:

    ?- { 3*X - 2*Y = 6, 2*Y = X }.
    X = 3.0
    Y = 1.5

    ?- { Z =< X-2, Z =< 6-X, Z+1 = 2 }.
    Z = 1.0
    {X >= 3.0}
    {X =< 5.0}

A CLP(R) solver also includes a linear optimization facility. This finds the extreme value of a given linear expression inside the region that satisfies the given linear constraints. The built-in CLP(R) predicates for this are:

    minimize( Expr)
    maximize( Expr)

In these two predicates, Expr is a linear expression in terms of variables that appear in linear constraints. The predicates find the variable values that satisfy these constraints and respectively minimize or maximize the value of the expression. For example:

    ?- { X =< 5 }, maximize( X).
    X = 5.0

    ?- { X =< 5, 2 =< X }, minimize( 2*X + 3).
    X = 2.0

    ?- { X >= 2, Y >= 2, Y =< X+1, 2*Y =< 8-X, Z = 2*X + 3*Y }, maximize( Z).
    X = 4.0
    Y = 2.0
    Z = 14.0

    ?- { X =< 5 }, minimize( X).
    no

In the last example, X was not bounded downwards, therefore the minimization goal failed.

The following CLP(R) predicates find the supremum (least upper bound) or infimum (greatest lower bound) of an expression:

    sup( Expr, MaxVal)
    inf( Expr, MinVal)

Expr is a linear expression in terms of linearly constrained variables. MaxVal and MinVal are the maximum and the minimum values that this expression takes within the region where the constraints are satisfied. Unlike with maximize/1 and minimize/1, the variables in Expr do not get instantiated to the extreme points. For example:

    ?- { 2 =< X, X =< 5 }, inf( X, Min), sup( X, Max).
    Max = 5.0
    Min = 2.0
    {X >= 2.0}
    {X =< 5.0}

    ?- { X >= 2, Y >= 2, Y =< X+1, 2*Y =< 8-X, Z = 2*X + 3*Y },
       sup( Z, Max), inf( Z, Min), maximize( Z).
    X = 4.0
    Y = 2.0
    Z = 14.0
    Max = 14.0
    Min = 10.0

Our initial simple scheduling example with four tasks a, b, c and d can be stated in CLP(R) in a straightforward way:

    ?- { Ta + 2 =< Tb,      % a precedes b
         Ta + 2 =< Tc,      % a precedes c
         Tb + 3 =< Td,      % b precedes d
         Tc + 5 =< Tf,      % c finished by finishing time Tf
         Td + 4 =< Tf },    % d finished by Tf
       minimize( Tf).
    Ta = 0.0
    Tb = 2.0
    Td = 5.0
    Tf = 9.0
    {Tc =< 4.0}
    {Tc >= 2.0}

The next example further illustrates the flexibility of constraints in comparison with Prolog standard arithmetic facilities. Consider the predicate fib( N, F) for computing F as the Nth Fibonacci number: F(0)=1, F(1)=1, F(2)=2, F(3)=3, F(4)=5, etc. In general, for N>1, F(N) = F(N-1) + F(N-2). Here is a Prolog definition of fib/2:

    fib( N, F) :-
        N = 0, F = 1
        ;
        N = 1, F = 1
        ;
        N > 1,
        N1 is N - 1, fib( N1, F1),
        N2 is N - 2, fib( N2, F2),
        F is F1 + F2.

An intended use of this program is:

    ?- fib( 6, F).
    F = 13
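The CLP(R) counterpart of fib/2 is not reproduced in this excerpt. A sketch, assuming the same { } constraint syntax (the exact formulation in the book may differ):

```prolog
% fib/2 with constraints instead of is/2: the clauses work in both
% directions because { } constraints do not require instantiation.
% The bound { F >= 1 } is our addition; it helps unsatisfiable
% queries fail finitely.
fib( N, F) :-
    { F >= 1 },                 % Every Fibonacci number is at least 1
    (  { N = 0, F = 1 }
    ;  { N = 1, F = 1 }
    ;  { N > 1, F = F1 + F2,
         N1 = N - 1, N2 = N - 2 },
       fib( N1, F1),
       fib( N2, F2)
    ).
```

With this version a query such as ?- fib( N, 13). can in principle also determine N = 6, because the accumulated constraints prune branches in which F could no longer equal 13.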
However, this program still gets into trouble when asked an unsatisfiable question.

A CLP(Q) solver gives: X = 2/3, Y = 1/3

14.3 Scheduling with CLP

The problem is to find a schedule whose finishing time is minimal. A schedule assigns a processor to each task, and states the starting time for each task. Of course, a schedule has to satisfy all the precedence and resource constraints: each task has to be executed by a suitable processor, and no processor can execute two tasks at the same time. The variables in the corresponding CLP formulation of the scheduling problem are: start times S1, ..., Sn, and names P1, ..., Pn of the processors assigned to each task.

An easy special case of this scheduling problem is when there is no constraint on resources. In this case resources are assumed unlimited, so there is a free processor always available to do any task at any time. Therefore, in this case only the constraints corresponding to the precedences among the tasks have to be satisfied. As already shown in the introductory section to this chapter, these can be stated in a straightforward way. Suppose we have a precedence constraint regarding tasks a and b: prec( a, b). Let the duration of task a be Da, and the starting times of a and b be Sa and Sb. To satisfy the precedence constraint, Sa and Sb have to satisfy the constraint:

    { Sa + Da =< Sb }

In addition we require that no task Ti may start before time 0, and all the tasks must be completed by the finishing time FinTime of the schedule:

    { Si >= 0, Si + Di =< FinTime }

We will specify a particular scheduling task by the following predicates:

    tasks( [Task1/Duration1, Task2/Duration2, ... ])

This gives the list of all the task names and their durations.

    prec( Task1, Task2)

This states that task Task1 has to precede task Task2. The predicate

    precedence_constr( TasksDurations, Schedule, FinTime)

constructs the constraints among the start times of the tasks in Schedule, and the schedule's finishing time FinTime. The predicate

    prec_constr( Task/Start/Duration, RestOfSchedule)

constructs the constraints between the start time Start of Task and the start times in RestOfSchedule, so that these constraints on start times correspond to the precedence constraints among the tasks.

The program in Figure 14.3 also comprises the definition of a simple scheduling problem with five tasks. The scheduler is executed by the question:

    ?- schedule( Schedule, FinTime).

    % Scheduling with CLP with unlimited resources

    schedule( Schedule, FinTime) :-
        tasks( TasksDurs),
        precedence_constr( TasksDurs, Schedule, FinTime),  % Construct precedence constraints
        minimize( FinTime).

    precedence_constr( [ ], [ ], FinTime).
    precedence_constr( [T/D | TDs], [T/Start/D | Rest], FinTime) :-
        { Start >= 0,                 % Earliest start at 0
          Start + D =< FinTime },     % Must finish by FinTime
        precedence_constr( TDs, Rest, FinTime),
        prec_constr( T/Start/D, Rest).

    prec_constr( _, [ ]).
    prec_constr( T/S/D, [T1/S1/D1 | Rest]) :-
        (  prec( T, T1), !, { S + D =< S1 }
        ;  prec( T1, T), !, { S1 + D1 =< S }
        ;  true
        ),
        prec_constr( T/S/D, Rest).

    % List of tasks to be scheduled
    tasks( [ t1/5, t2/7, t3/10, t4/2, t5/9]).

    % Precedence constraints
    prec( t1, t2).   prec( t1, t4).   prec( t2, t3).   prec( t4, t5).

    Figure 14.3  Scheduling with precedence constraints and no resource constraints.

A scheduler for the case of limited resources is given in Figure 14.4:

    % Scheduling with CLP with limited resources

    schedule( BestSchedule, BestTime) :-
        tasks( TasksDurs),
        precedence_constr( TasksDurs, Schedule, FinTime),  % Set up precedence inequalities
        initialize_bound,                       % Initialize bound on finishing time
        assign_processors( Schedule, FinTime),  % Assign processors to tasks
        minimize( FinTime),
        update_bound( Schedule, FinTime),
        fail                                    % Backtrack to find more schedules
        ;
        bestsofar( BestSchedule, BestTime).     % Best schedule found

    Figure 14.4 contd

    % resource_constr( ScheduledTask, TaskList):
    %   Construct constraints to ensure no resource conflict
    %   between ScheduledTask and TaskList

    resource_constr( _, [ ]).
    resource_constr( Task, [Task1 | Rest]) :-
        no_conflict( Task, Task1),
        resource_constr( Task, Rest).

    no_conflict( T/P/S/D, T1/P1/S1/D1) :-
        P \== P1, !                    % Different processors
        ;
        prec( T, T1), !                % Already constrained
        ;
        prec( T1, T), !                % Already constrained
        ;
        { S + D =< S1                  % Same processor, no time overlap
          ;
          S1 + D1 =< S }.

    initialize_bound :-
        retract( bestsofar( _, _)), fail
        ;
        assert( bestsofar( dummy_schedule, 9999)).  % Assuming 9999 > any finishing time

    % update_bound( Schedule, FinTime):
    %   update best schedule and time

    update_bound( Schedule, FinTime) :-
        retract( bestsofar( _, _)), !,
        assert( bestsofar( Schedule, FinTime)).

    % List of tasks to be scheduled
    tasks( [t1/4, t2/2, t3/2, t4/20, t5/20, t6/11, t7/11]).

    % Precedence constraints
    prec( t1, t4).   prec( t1, t5).   prec( t2, t4).   prec( t2, t5).
    prec( t2, t6).   prec( t3, t5).   prec( t3, t6).   prec( t3, t7).

    % resource( Task, Processors):
    %   Any processor in Processors suitable for Task
    resource( _, [1,2,3]).     % Three processors, all suitable for any task

The scheduler is again called through the predicate schedule( BestSchedule, BestTime), which finds a schedule with the minimum finishing time BestTime. Inequality constraints on start times due to precedences between tasks are again constructed by the predicate

    precedence_constr( TasksDurations, Schedule, FinTime)

This is almost the same as in Figure 14.3. The only slight difference is due to the different representation of a schedule.

Let us now deal with resource constraints. This cannot be done so efficiently as with precedence constraints. To satisfy resource constraints, an optimal assignment of processors to tasks is needed. This requires a search among possible assignments, and there is no general way to do this in polynomial time. In the program of Figure 14.4, this search is done according to the branch-and-bound method, roughly as follows. Alternative schedules are non-deterministically constructed one by one (generate a schedule and fail). The best schedule so far is asserted in the database as a fact. Each time a new schedule is constructed, the best-so-far is updated. When a new schedule is being constructed, the best-so-far finishing time is used as an upper bound on the finishing time of the new schedule. As soon as it is found that a new, partially built schedule cannot possibly better the best-so-far time, it is abandoned.

This is implemented in Figure 14.4 as follows. Procedure assign_processors non-deterministically assigns suitable processors to tasks, one at a time. Assigning a processor to a task results in additional constraints on start times, to ensure that there is no time overlap between tasks assigned to the same processor. So each time a processor is assigned to a task, the partial schedule is refined. Each time a partial schedule is so refined, it is checked whether it has any chance of bettering the best schedule so far. For the current partial schedule to have any such chance, FinTime has to be less than the best time so far. In the program this is done by constraining:

    { FinTime < BestTimeSoFar }

If this constraint is incompatible with the other current constraints, then this partial schedule has no hope. As more resource constraints will have to be satisfied to complete the schedule, the actual finishing time may eventually only become worse. So if FinTime < BestTimeSoFar is not satisfiable, then the partial schedule is abandoned; otherwise another task is assigned a processor, etc. Whenever a complete schedule is built, its finishing time is guaranteed to be less than the best finishing time found so far. Therefore the best-so-far is updated. Finally, when the best-so-far cannot be bettered, the search stops. Of course, this algorithm only produces one of possibly many best schedules.

It should be noted that this process is combinatorially complex due to the exponential number of possible assignments of processors to tasks. Bounding the current partial schedule by BestTimeSoFar leads to abandoning whole sets of bad schedules before they are completely built. How much computation time is saved by this depends on how good the upper bound is. If the upper bound is tight, bad schedules will be recognized and abandoned at an early stage, thus saving more time. So the sooner some good schedule is found, the sooner a tight upper bound is applied and more search space is pruned away.

Figure 14.4 also includes the specification, according to our representation convention, of the scheduling problem of Figure 12.8. The question to schedule this problem is:
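Figure 14.4's assign_processors procedure does not appear in this excerpt. A sketch of how it might look, consistent with the Task/Processor/Start/Duration representation and the predicates above (the exact code is our assumption, not the book's):

```prolog
% Hypothetical sketch of assign_processors/2: non-deterministically
% assign a suitable processor to each task, constrain away resource
% conflicts with the tasks already assigned, and prune with the
% best-so-far bound (branch and bound).
% Assumes member/2 from library(lists).
assign_processors( Schedule, FinTime) :-
    assign_processors( Schedule, [ ], FinTime).

assign_processors( [ ], _, _).
assign_processors( [T/P/S/D | Rest], Assigned, FinTime) :-
    resource( T, Processors),        % Processors suitable for task T
    member( P, Processors),          % Choose one non-deterministically
    resource_constr( T/P/S/D, Assigned),  % No conflict with assigned tasks
    bestsofar( _, BestTime),
    { FinTime < BestTime },          % Prune: must better best-so-far
    assign_processors( Rest, [T/P/S/D | Assigned], FinTime).
```

Checking conflicts only against already assigned tasks keeps the P \== P1 test in no_conflict meaningful, since all processors in Assigned are bound.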
    ?- schedule( Schedule, FinTime).
    FinTime = 24
    Schedule = [ t1/3/0/4, t2/2/0/2, t3/1/0/2, t4/3/4/20,
                 t5/2/4/20, t6/1/2/11, t7/1/13/11]

Task t1 is executed by processor 3 starting at time 0, task t2 by processor 2 starting at 0, etc. There is one more point of interest in this scheduling problem. All three available processors are equivalent, so permuting the assignments of the processors to the tasks has no effect. Therefore it makes no sense to search through all the permutations as they should give the same results. We could avoid these useless permutations by, for example, fixing processor 1 to task t7, and limiting the choice for task t6 to processors 1 and 2 only. This can be done easily by changing the predicate resource. Although this is in general a good idea, it turns out that it is not worth doing in this particular exercise. Although this would reduce the number of possible assignments by a factor of 6, the time saving is, possibly surprisingly, insignificant. The reason is that, once an optimal schedule is found, this gives a tight upper bound on the finishing time, and then other possible processor assignments are abandoned very quickly.

Exercises

14.4 Experiment with the program in Figure 14.4. Try different resource specifications, aiming at removing useless permutations, and timing the program's running times. What are the improvements?

14.5 The program in Figure 14.4 initializes the upper bound on the finishing times (bestsofar) to a high value that is obviously a gross overestimate. This makes sure that an optimal schedule is within the bound, and will be found by the program. This policy, although safe, is inefficient because such a loose upper bound does not constrain the search well. Investigate different policies of initializing and changing the upper bound, for example, starting with a very low bound and increasing it if necessary (if no schedule exists within this bound). Compare the run times of various policies. Measure the run time for the case when the bound is immediately set to the true minimal finishing time.

14.4 A simulation program with constraints

Numerical simulation can sometimes be done very elegantly with CLP(R). It is particularly appropriate when a simulated system can be viewed as consisting of a number of components and connections among the components. Electrical circuits are examples of such systems. Resistors and capacitors are examples of components. Real-valued parameters and variables are associated with components, such as electrical resistances, voltages and currents. Such a setting fits well the style of constraint programming. The laws of physics impose constraints on the variables associated with components. Connections between components impose additional constraints. So to carry out numerical simulation with CLP(R) for a family of systems, such as electrical networks, we have to define the laws for the types of components in the domain, and the laws for connecting components. These laws are stated as constraints on variables. To simulate a concrete system from such a family, we then have to specify the concrete components and connections in the system. This will cause the CLP interpreter to set up the constraints for the complete system, and carry out the simulation by satisfying the constraints. Of course, this approach is effective if the types of constraints in the simulated domain are handled efficiently by our particular CLP system.

In this section we will apply this approach to the simulation of electrical circuits consisting of resistors, diodes and batteries. The relations between voltages and currents in such circuits are piecewise linear. Given that our CLP(R) system efficiently handles linear equations and inequations, it is a suitable tool for simulating such circuits.

Figure 14.5 shows our components and connections, and the corresponding constraints enforced by the components and connections. We can define these elements in a CLP(R) program as follows. A resistor has some resistance R and two terminals T1 and T2. The variables associated with each terminal are the electrical potential V and the current I (directed into the resistor). So a terminal T is a pair (V,I). The lawful behaviour of the resistor can then be defined by the predicate:

    resistor( (V1,I1), (V2,I2), R) :-
        { I1 = -I2, V1 - V2 = I1*R }.

The behaviour of the battery can be defined similarly, as shown in the program of Figure 14.6. The figure also gives a definition of the diode. For connections, it is best to define the general case when any number of terminals are connected:

    conn( [ Terminal1, Terminal2, ... ])

The voltages at all the terminals must be equal, and the sum of the currents into all of the terminals must be equal to 0.

It is now easy to compose circuits to be simulated. Figure 14.7 shows some circuits. The figure also gives definitions of these circuits executable by our simulator in CLP(R). Consider circuit (a). The following example illustrates that our simulator can, to some extent, also be used for design, not only for simulation. In the definition of predicate circuit_a in Figure 14.7, we have chosen to make the terminal T21 one of the arguments of this predicate. This makes it possible to 'read' the voltage and current at this point in the circuit. The potential at terminal T2 is fixed to 0, the battery has 10 V, but the resistors are left unspecified (they also are arguments of circuit_a).
    [Figure 14.5 (diagrams not reproduced) shows, for each element, the constraints:
       (a) resistor:   I1 = -I2,  V1 - V2 = I1*R
       (b) battery:    I1 = -I2,  V1 - V2 = U
       (c) diode:      I1 + I2 = 0;  I1 > 0 => V1 = V2;  I1 = 0 => V1 =< V2
       (d) two connected terminals:    V1 = V2,  I1 + I2 = 0
       (e) three connected terminals:  V1 = V2 = V3,  I1 + I2 + I3 = 0 ]

    Figure 14.5  Components and connections for electrical circuits, and the corresponding
                 constraints; (a) resistor; (b) battery; (c) diode; (d, e) connection between two or
                 three terminals.

    % Electric circuit simulator in CLP(R)

    % resistor( T1, T2, R):
    %   R = resistance; T1, T2 its terminals

    resistor( (V1,I1), (V2,I2), R) :-
        { I1 = -I2, V1 - V2 = I1*R }.

    % diode( T1, T2):
    %   T1, T2 terminals of a diode
    %   Diode open in direction from T1 to T2

    diode( (V1,I1), (V2,I2)) :-
        { I1 + I2 = 0 },
        (  { I1 > 0, V1 = V2 }        % Diode open
        ;  { I1 = 0, V1 =< V2 }       % Diode closed
        ).

    battery( (V1,I1), (V2,I2), Voltage) :-
        { I1 + I2 = 0, Voltage = V1 - V2 }.

    % conn( [T1,T2,...]):
    %   Terminals T1, T2, ... connected
    %   Therefore all electrical potentials equal, sum of currents = 0

    conn( Terminals) :-
        conn( Terminals, 0).

    conn( [ (V,I) ], Sum) :-
        { Sum + I = 0 }.

    conn( [ (V1,I1), (V2,I2) | Rest], Sum) :-
        { V1 = V2, Sum1 = Sum + I1 },
        conn( [ (V2,I2) | Rest], Sum1).

    Figure 14.6  Constraints for some electrical components and connections.

    [Figure 14.7(a) (diagram not reproduced): a 10 V battery with terminals T1 and T2,
     and two resistors R1 (terminals T11, T12) and R2 (terminals T21, T22) in series.]

    circuit_a( R1, R2, T21) :-
        T2 = (0,_),                % Terminal T2 at potential 0
        battery( T1, T2, 10),      % Battery 10 V
        resistor( T11, T12, R1),
        resistor( T21, T22, R2),
        conn( [ T1, T11]),
        conn( [ T12, T21]),
        conn( [ T2, T22]).

    Figure 14.7  Two electrical circuits.
    Figure 14.7 contd

    [Figure 14.7(b) (diagram not reproduced): a larger circuit containing the 10 V
     battery (terminals T1, T2), several resistors including R1 (terminals T11, T12),
     and a 'middle' resistor R5 with terminals T51 and T52 between nodes T31 and T41.]

Let us now consider the more complex circuit (b). A question may be: Given the battery voltage 10 V, what are the electrical potentials and the current at the 'middle' resistor R5?

    ?- circuit_b( 10, _, _, _, _, T51, T52).
    T51 = (7.340425531914894, 0.0425531914893617)
    T52 = (5.212765957446809, -0.0425531914893617)

So the potentials at the terminals of R5 are 7.340 V and 5.213 V respectively, and the current is 0.04255 A.

Exercise

14.6 Experiment with the program in Figure 14.7. Define other circuits. For example, extend the circuit of Figure 14.7(b) by adding a diode in series with resistor R5. How does this affect the voltage at T51? Try also the opposite orientation of the diode.

14.5 CLP over finite domains: CLP(FD)

The arithmetic constraint relations of CLP(FD) are:

    #=      equal
    #\=     not equal
    #<      less than
    #>      greater than
    #=<     less or equal
    #>=     greater or equal
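These relations propagate through the finite domains of the constrained variables. A small illustrative query of our own (assuming the SICStus-style domain/3, also used in the eight queens program below):

```prolog
% Domain propagation: the solver narrows domains without any search.
?- domain( [X,Y], 1, 10),    % X and Y range over 1..10
   X #> Y + 3,               % X must exceed Y by more than 3
   Y #>= 4.                  % Y is at least 4
% The solver can infer X in 8..10 and Y in 4..6 before any labeling.
```

Concrete solutions are then found by labeling, which enumerates values from the already narrowed domains.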
    % 8 queens in CLP(FD)

    solution( Ys) :-                % Ys is list of Y-coordinates of queens
        Ys = [_,_,_,_,_,_,_,_],     % There are 8 queens
        domain( Ys, 1, 8),          % All the coordinates have domains 1..8
        all_different( Ys),         % All different to avoid horizontal attacks
        safe( Ys),                  % Constrain to prevent diagonal attacks
        labeling( [ ], Ys).         % Find concrete values for Ys

    safe( [ ]).
    safe( [Y | Ys]) :-
        no_attack( Y, Ys, 1),       % 1 = horizontal distance between queen Y and Ys
        safe( Ys).

The labelling option 'ff' stands for 'first fail'. That is, the variable with currently the smallest domain will be assigned a value first. Having the smallest domain, this variable is generally the most likely one to cause a failure. This labelling strategy aims at discovering inconsistency as soon as possible, thus avoiding futile search through inconsistent alternatives. Measure the execution time of the modified program.

14.8 Generalize the eight-queens CLP(FD) program to an N queens program. For large N, a good labelling strategy for N queens is 'middle-out', which starts in the middle of the domain and then continues with values further and further away from the middle. Implement this labelling strategy and compare its efficiency experimentally with the straight labelling (as in Figure 14.9).
    % no_attack( Y, Ys, D):
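The body of no_attack/3 is missing from this excerpt. A sketch consistent with the comment above and the #\= relation (the exact code in Figure 14.9 may differ):

```prolog
% no_attack( Y, Ys, D): the queen with Y-coordinate Y attacks no queen
% in Ys diagonally; D is the horizontal distance between Y's queen and
% the first queen in Ys.
no_attack( _, [ ], _).
no_attack( Y, [Y1 | Ys], D) :-
    Y #\= Y1 + D,            % Not on the same upward diagonal
    Y #\= Y1 - D,            % Not on the same downward diagonal
    D1 is D + 1,
    no_attack( Y, Ys, D1).
```

Horizontal attacks need no treatment here, since all_different( Ys) already excludes them.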
    constraint networks
    arc consistency algorithms
    constraint logic programming (CLP)
    CLP(R), CLP(Q), CLP(FD)
    branch-and-bound method

References

Marriott and Stuckey (1998) is an excellent introduction to techniques of constraint satisfaction and CLP programming. Van Hentenryck (1989) is a well-known discussion of various programming techniques in CLP. Jaffar and Maher (1994), and Mackworth (1992) survey constraint solving techniques. Ongoing research in this area appears in the specialized journal Constraints (published by Kluwer Academic), as well as in the Journal of Logic Programming and the Artificial Intelligence Journal. The Fibonacci example in this chapter is similar to the one given by Cohen (1990).

The syntax for constraints in this chapter is as used in SICStus Prolog (SICStus 1999).

Cohen, J. (1990) Constraint logic programming languages. Communications of the ACM, 33: 52-68.
Jaffar, J. and Maher, M. (1994) Constraint logic programming: a survey. Journal of Logic Programming, 19-20: 503-581.
Mackworth, A.K. (1992) Constraint satisfaction. In: Shapiro, S.C. (ed.) Encyclopedia of Artificial Intelligence, second edition. New York: Wiley.
Marriott, K. and Stuckey, P.J. (1998) Programming with Constraints: an Introduction. Cambridge, MA: The MIT Press.
SICStus Prolog User's Manual (1999) Stockholm: Swedish Institute of Computer Science (http://www.sics.se/sicstus.html).
Van Hentenryck, P. (1989) Constraint Satisfaction in Logic Programming. Cambridge, MA: MIT Press.

chapter 15

Knowledge Representation and Expert Systems

15.1 Functions and structure of an expert system  347
15.2 Representing knowledge with if-then rules  349
15.3 Forward and backward chaining in rule-based systems  352
15.4 Generating explanation  358
15.5 Introducing uncertainty  360
15.6 Belief networks  363
15.7 Semantic networks and frames  372

An expert system is a program that behaves like an expert for some problem domain. It should be capable of explaining its decisions and the underlying reasoning. Often an expert system is expected to be able to deal with uncertain and incomplete information. In this chapter we will review basic concepts in representing knowledge and building expert systems.
348 Knowledge Representation and Expert Systems  Representing knowledge with if-then rules 349
considered an expert system. We will take the view that an expert system also has to be capable, in some way, of explaining its behaviour and its decisions to the user, as human experts do. Such an explanation feature is especially necessary in uncertain domains (like medical diagnosis) to enhance the user's confidence in the system's advice, or to enable the user to detect a possible flaw in the system's reasoning.

    [Figure (diagram not reproduced): the structure of an expert system - a knowledge
     base and an inference engine, with a user interface through which the user
     interacts with the system.]

If-then rules usually turn out to be a natural form of expressing knowledge, and have the following additional desirable features:

•  Modularity: each rule defines a small, relatively independent piece of knowledge.

•  Incrementability: new rules can be added to the knowledge base relatively independently of other rules.

•  Modifiability (as a consequence of modularity): old rules can be changed relatively independently of other rules.

•  Support of the system's transparency.

This last property is an important and distinguishing feature of expert systems. By transparency of the system we mean the system's ability to explain its decisions and solutions. If-then rules facilitate answering the following basic types of user's questions:

(1)  'How' questions: How did you reach this conclusion?

(2)  'Why' questions: Why are you interested in this information?

Mechanisms, based on if-then rules, for answering such questions will be discussed later.

If-then rules often define logical relations between concepts of the problem domain. Purely logical relations can be characterized as belonging to 'categorical knowledge', 'categorical' because they are always meant to be absolutely true. However, in some domains, such as medical diagnosis, 'soft' or probabilistic knowledge prevails. It is 'soft' in the sense that empirical regularities are usually only valid to a certain degree (often but not always). In such cases if-then rules may be modified by adding a likelihood qualification to their logical interpretation. For example:

    if condition A then conclusion B follows with certainty F

Figures 15.2, 15.3 and 15.4 give an idea of the variety of ways of expressing knowledge by if-then rules. They show example rules from three different knowledge-based systems: MYCIN for medical consultation, AL/X for diagnosing equipment failures and AL3 for problem solving in chess.

    if
      1  the infection is primary bacteremia, and
      2  the site of the culture is one of the sterilesites, and
      3  the suspected portal of entry of the organism is the gastrointestinal tract
    then
      there is suggestive evidence (0.7) that the identity of the organism is bacteroides.

    Figure 15.2  An if-then rule from the MYCIN system for medical consultation (Shortliffe 1976).
                 The parameter 0.7 says to what degree the rule can be trusted.

    if
      the pressure in V-01 reached relief valve lift pressure
    then
      the relief valve on V-01 has lifted  [N = 0.005, S = 400]

    if
      NOT the pressure in V-01 reached relief valve lift pressure, and
      the relief valve on V-01 has lifted
    then
      the V-01 relief valve opened early (the set pressure has drifted)  [N = 0.001, S = 2000]

    Figure 15.3  Two rules from an AL/X demonstration knowledge base for fault diagnosis
                 (Reiter 1980). N and S are the 'necessity' and 'sufficiency' measures. S estimates
                 to what degree the condition part of a rule suffices to infer the conclusion part. N
                 estimates to what degree the truth of the condition part is necessary for the
                 conclusion to be true.

    if
      1  there is a hypothesis, H, that a plan P succeeds, and
      2  there are two hypotheses,
           H1, that a plan R1 refutes plan P, and
           H2, that a plan R2 refutes plan P, and
      3  there are facts: H1 is false, and H2 is false
    then
      1  generate the hypothesis, H3, that the combined plan 'R1 or R2' refutes plan P, and
      2  generate the fact: H3 implies not(H)

    Figure 15.4  A rule for plan refinement in chess problem solving from the AL3 system
                 (Bratko 1982).

In general, if you want to develop a serious expert system for some chosen domain then you have to consult actual experts for that domain and learn a great deal about it yourself. Extracting some understanding of the domain from experts and literature, and moulding this understanding into a chosen knowledge representation formalism is called knowledge elicitation. This is, as a rule, a complex effort that we cannot go into here. But we do need some domain and a small knowledge base as material to carry out our examples in this chapter. Consider the toy knowledge base shown in Figure 15.5. It is concerned with diagnosing the problem of water leaking in a flat. A problem can arise either in the bathroom or in the kitchen. In either case, the leakage also causes a problem (water on the floor) in the hall. Apart from its overall naivete, this knowledge base only assumes single faults; that is, the problem may be in the bathroom or the kitchen, but not in both
352 Knowledge Representation and Expert Systems Forward and backward chaining in rule-based systems 353
of them at the same time. This knowledge base is shown in Figure 15.5 as an inference network. Nodes in the network correspond to propositions and links correspond to rules in the knowledge base. Arcs that connect some of the links indicate the conjunctive connection between the corresponding propositions. Accordingly, the rule about a problem in the kitchen in this network is:

    if hall_wet and bathroom_dry then problem_in_kitchen

The network representation of Figure 15.5 is, in fact, an AND/OR graph as discussed in Chapter 13. This indicates the relevance of AND/OR representation of problems in the context of rule-based expert systems.

    [Figure 15.5 (diagram not reproduced): the inference network, with nodes hall_wet,
     kitchen_dry, bathroom_dry, window_closed and no_rain on the left; hall_wet and
     kitchen_dry imply leak_in_bathroom; hall_wet and bathroom_dry imply
     problem_in_kitchen; window_closed or no_rain imply no_water_from_outside;
     problem_in_kitchen and no_water_from_outside imply leak_in_kitchen.]

    Figure 15.5  A toy knowledge base to diagnose leaks in water appliances in the flat shown.

15.3 Forward and backward chaining in rule-based systems

emphasis in expert systems on the style of search, from its semantic point of view, in relation to the human reasoning; that is, it is desirable to sequence the reasoning in ways that humans find natural in the domain of application. This is important when interaction with the user occurs during the reasoning process and we want to make this process transparent to the user. This section sketches, in Prolog, the basic reasoning procedures as they appear in the context of expert systems, although they are similar to searching AND/OR graphs.

15.3.1 Backward chaining

With our example knowledge base of Figure 15.5, reasoning in the backward chaining style may proceed as follows. We start with a hypothesis - for example, leak in kitchen - then we reason backwards in the inference network. To confirm the hypothesis, we need problem_in_kitchen and no_water_from_outside to be true. The former is confirmed if we find that the hall is wet and the bathroom is dry. The latter is confirmed, for example, if we find that the window was closed.

This style of reasoning is called backward chaining because we follow a chain of rules backwards from the hypothesis (leak_in_kitchen) to the pieces of evidence (hall_wet, etc.). This is trivially programmed in Prolog - this is, in fact, Prolog's own built-in style of reasoning. The straightforward way is to state the rules in the knowledge base as Prolog rules:

    leak_in_bathroom :-
        hall_wet,
        kitchen_dry.

    problem_in_kitchen :-
        hall_wet,
        bathroom_dry.

    no_water_from_outside :-
        window_closed.

Using Prolog's own syntax for rules, as in the foregoing, has certain disadvantages, however:

(1)  This syntax may not be the most suitable for a user unfamiliar with Prolog; for example, the domain expert should be able to read the rules, specify new rules and modify them.

(2)  The knowledge base is, therefore, not syntactically distinguishable from the rest of the program; a more explicit distinction between the knowledge base and the rest of the program may be desirable.

It is easiest to tailor the syntax of expert rules to our taste by using Prolog operator notation. For example, we can choose to use 'if', 'then', 'and' and 'or' as operators, appropriately declared as:

    op( 800, fx, if).
    op( 700, xfx, then).
    op( 300, xfy, or).
    op( 200, xfy, and).

This suffices to write our example rules of Figure 15.5 as:

    if
        hall_wet and kitchen_dry
    then
        leak_in_bathroom.

    if
        hall_wet and bathroom_dry
    then
        problem_in_kitchen.

    if
        window_closed or no_rain
    then
        no_water_from_outside.

Let the observable findings be stated as a procedure fact:

    fact( hall_wet).
    fact( bathroom_dry).
    fact( window_closed).

Of course, we now need a new interpreter for rules in the new syntax. Such an interpreter can be defined as the procedure

    is_true( P)

where proposition P is either given in procedure fact or can be derived using rules. The new rule interpreter is given in Figure 15.6.

    % A simple backward chaining rule interpreter

    op( 800, fx, if).
    op( 700, xfx, then).
    op( 300, xfy, or).
    op( 200, xfy, and).

    is_true( P) :-
        fact( P).

    is_true( P) :-
        if Condition then P,        % A relevant rule
        is_true( Condition).        % whose condition is true

    is_true( P1 and P2) :-
        is_true( P1),
        is_true( P2).

    is_true( P1 or P2) :-
        is_true( P1)
        ;
        is_true( P2).

    Figure 15.6  A backward chaining interpreter for if-then rules.

Note that it still does backward chaining in the depth-first manner. The interpreter can now be called by the question:

    ?- is_true( leak_in_kitchen).
    yes

A major practical disadvantage of the simple inference procedures in this section is that the user has to state all the relevant information as facts in advance, before the reasoning process is started. So the user may state too much or too little. Therefore, it would be better for the information to be provided by the user interactively in a dialogue when it is needed. Such a dialogue facility will be programmed in Chapter 16.

15.3.2 Forward chaining

In backward chaining we start with a hypothesis (such as leak in the kitchen) and work backwards, according to the rules in the knowledge base, toward easily confirmed findings (such as the hall is wet). Sometimes it is more natural to reason in the opposite direction, from the 'if' part to the 'then' part. Forward chaining does not start with a hypothesis, but with some confirmed findings. Once we have observed that the hall is wet and the bathroom is dry, we conclude that there is a problem in the kitchen; also, having noticed the kitchen window is closed, we infer
356 Knowledge Representation and Expert Systems Forward and backward chaining in rule-based systems 357
that no water came from the outside; this leads to the final conclusion that there is a leak in the kitchen.

Programming simple forward chaining in Prolog is still easy, if not exactly as trivial as backward chaining. Figure 15.7 shows a forward chaining interpreter, assuming that rules are, as before, in the form:

    if Condition then Conclusion

where Condition can be an AND/OR expression. For simplicity we assume throughout this chapter that rules do not contain variables. This interpreter starts with what is already known (stated in the fact relation), derives all conclusions that follow from this, and adds (using assert) the conclusions to the fact relation. Our example knowledge base is run by this interpreter thus:

    ?- forward.
    Derived: problem_in_kitchen
    Derived: no_water_from_outside
    Derived: leak_in_kitchen
    No more facts

    % Simple forward chaining in Prolog

    forward :-
        new_derived_fact( P),               % A new fact
        !,
        write( 'Derived: '), write( P), nl,
        assert( fact( P)),
        forward                             % Continue
        ;
        write( 'No more facts').            % All facts derived

    new_derived_fact( Concl) :-
        if Cond then Concl,                 % A rule
        not fact( Concl),                   % Rule's conclusion not yet a fact
        composed_fact( Cond).               % Condition true?

    composed_fact( Cond) :-
        fact( Cond).                        % Simple fact

    composed_fact( Cond1 and Cond2) :-
        composed_fact( Cond1),
        composed_fact( Cond2).              % Both conjuncts true

    composed_fact( Cond1 or Cond2) :-
        composed_fact( Cond1)
        ;
        composed_fact( Cond2).

Figure 15.7  A forward chaining rule interpreter.

15.3.3 Forward chaining vs backward chaining

If-then rules form chains that, in Figure 15.5, go from left to right. The elements on the left-hand side of these chains are input information, while those on the right-hand side are derived information. These two kinds of information have a variety of names, depending on the context in which they are used. Input information can be called data (for example, measurement data), or findings, or manifestations. Derived information can be called hypotheses to be proved, or causes of manifestations, or diagnoses, or explanations that explain findings. So chains of inference steps connect various types of information: data with goals, findings with diagnoses or explanations, and so on.

Both forward and backward chaining involve search, but they differ in the direction of search. Backward chaining searches from goals to data, from diagnoses to findings, etc. In contrast, forward chaining searches from data to goals, from findings to explanations or diagnoses, etc. As backward chaining starts with goals, we say that it is goal driven. Similarly, since forward chaining starts with data, we say that it is data driven.

An obvious question is: which is better, forward or backward chaining? This question is similar to the dilemma between forward and backward search in state space (see Chapter 11). As there, the answer here also depends on the problem. If we want to check whether a particular hypothesis is true, then it is more natural to chain backward, starting with the hypothesis in question. On the other hand, if there are many competing hypotheses, and there is no reason to start with one rather than another, it may be better to chain forward. In particular, forward chaining is more natural in monitoring tasks where the data are acquired continuously and the system has to detect whether an anomalous situation has arisen; a change in the input data can be propagated in the forward chaining fashion to see whether this change indicates some fault in the monitored process or a change in the performance level.

In choosing between forward and backward chaining, simply the shape of the rule network can also help. If there are few data nodes (the left flank of the network) and many goal nodes (the right flank), then forward chaining looks more appropriate; if there are few goal nodes and many data nodes, then vice versa.

Expert tasks are usually more intricate and call for a combination of chaining in both directions. In medicine, for example, some initial observations in the patient typically trigger the doctor's reasoning in the forward direction to generate some
initial diagnostic hypothesis. This initial hypothesis has to be confirmed or rejected by additional evidence, which is done in the backward chaining style. In our example of Figure 15.5, observing that the hall is wet may trigger, in the forward direction, the hypothesis leak_in_bathroom, which is then checked in the backward direction against the remaining condition of the corresponding rule, kitchen_dry.

Exercise

15.1 Write a program that combines forward and backward chaining in the style discussed in this section.

15.4 Generating explanation

There are standard ways of generating explanation in rule-based systems. Two usual types of explanation are called 'how' and 'why' explanation. Let us consider first the 'how' explanation. When the system comes up with an answer, the user may ask: How did you find this answer? The typical explanation consists of presenting the user with the trace of how the answer was derived. Suppose the system has just found that there is a leak in the kitchen and the user is asking 'How?'. The explanation can be along the following lines:

Because:

(1) there is a problem in the kitchen, which was concluded from hall wet and bathroom dry, and

(2) no water came from outside, which was concluded from window closed.

Such an explanation is in fact a proof tree of how the final conclusion follows from rules and facts in the knowledge base. Let '<=' be defined as an infix operator. Then we can choose to represent the proof tree of a proposition P in one of the following forms, depending on the case:

(1) If P is a fact then the proof tree is P.

(2) If P was derived using a rule

    if Cond then P

then the proof tree is

    P <= ProofCond

where ProofCond is a proof tree of Cond.

(3) Let P1 and P2 be propositions whose proof trees are Proof1 and Proof2. If P is P1 and P2 then the proof tree is Proof1 and Proof2. If P is P1 or P2 then the proof tree is either Proof1 or Proof2.

Constructing proof trees in Prolog is straightforward and can be achieved by modifying the predicate is_true of Figure 15.6 according to the three cases above. Figure 15.8 shows such a modified is_true predicate. Notice that proof trees of this kind are essentially the same as solution trees for problems represented by AND/OR graphs. Displaying proof trees in some user-friendly format can be programmed similarly to displaying AND/OR solution trees. More sophisticated output formatting of proof trees is part of the shell program in Chapter 16.

    % is_true( P, Proof):  Proof is a proof that P is true

    :- op( 800, xfx, <=).

    is_true( P, P) :-
        fact( P).

    is_true( P, P <= CondProof) :-
        if Cond then P,
        is_true( Cond, CondProof).

    is_true( P1 and P2, Proof1 and Proof2) :-
        is_true( P1, Proof1),
        is_true( P2, Proof2).

    is_true( P1 or P2, Proof) :-
        is_true( P1, Proof)
        ;
        is_true( P2, Proof).

Figure 15.8  Generating proof trees.

In contrast, the 'why' explanation is required during the reasoning process, not at the end of it. This requires user interaction with the reasoning process. The system asks the user for information at the moment that the information is needed. When asked, the user may answer 'Why?', thus triggering the 'why' explanation. This kind of interaction and 'why' explanation is programmed as part of the shell in Chapter 16.
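To illustrate, here is a sketch of the proof tree that is_true/2 would construct for our leak_in_kitchen example, with the rules and facts of Figure 15.5 loaded (the layout shown here, and the exact bracketing under the operator declarations, will vary between Prolog systems):

```prolog
?- is_true( leak_in_kitchen, Proof).
Proof = leak_in_kitchen <=
            ( problem_in_kitchen <= hall_wet and bathroom_dry) and
            ( no_water_from_outside <= window_closed)
```

The two subtrees correspond directly to points (1) and (2) of the English 'how' explanation above.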
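The queries below use a predicate certainty/2 defined in Section 15.5.1, which falls outside this excerpt. As a rough sketch of such an ultra-simple scheme, assuming rules are annotated with a certainty factor (the ':' notation and the exact combination rules are assumptions, chosen to match the min/max behaviour described below):

```prolog
% A sketch of an ultra-simple certainty scheme (assumed details).
% Rules carry a certainty, e.g.:
%   if hall_wet and bathroom_dry then problem_in_kitchen : 0.9

:- op( 100, xfx, :).

certainty( P, 1) :-
    fact( P).                       % Stated findings are taken as certain

certainty( P1 and P2, C) :-
    certainty( P1, C1),
    certainty( P2, C2),
    C is min( C1, C2).              % Conjunction: minimum

certainty( P1 or P2, C) :-
    certainty( P1, C1),
    certainty( P2, C2),
    C is max( C1, C2).              % Disjunction: maximum

certainty( P, C) :-
    if Cond then P : RuleC,         % A rule with its own certainty
    certainty( Cond, CondC),
    C is CondC * RuleC.             % Attenuate by the rule's certainty
```

With a rule of certainty 0.9 for problem_in_kitchen and a certainty 0.8 for no_water_from_outside, such a scheme yields min(0.8, 0.9) = 0.8 for leak_in_kitchen, as in the worked example that follows.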
Now we can ask about a leak in the kitchen:

    ?- certainty( leak_in_kitchen, C).
    C = 0.8

This is obtained as follows. The facts that the hall is wet and the bathroom is dry indicate a problem in the kitchen with certainty 0.9. Since there was some possibility of rain, the certainty of no_water_from_outside is 0.8. Finally, the certainty of leak_in_kitchen is min(0.8, 0.9) = 0.8.

15.5.2 Difficulties in handling uncertainty

The question of handling uncertain knowledge has been much researched and debated. Typical controversial issues were the usefulness of probability theory in handling uncertainty in expert systems on the one hand, and the drawbacks of ad hoc uncertainty schemes on the other. Our ultra-simple approach in Section 15.5.1 belongs to the latter, and can easily be criticized. For example, suppose the certainty of a is 0.5 and that of b is 0. Then in our scheme the certainty of a or b is 0.5. Now suppose that the certainty of b increases to 0.5. In our scheme this change will not affect the certainty of a or b at all, which is counter-intuitive.

Many schemes for handling uncertainty have been proposed, used and investigated. The most common problem in such schemes typically stems from ignoring some dependences between propositions. For example, let there be a rule:

    if a or b then c

The certainty of c should not only depend on the certainties of a and b, but also on any correlation between a and b; that is, whether they tend to occur together or depend on each other in some other way. Completely correct treatment of these dependences is more complicated than is often considered acceptable, and may require information that is not normally available. The difficulties are therefore often dodged by making the assumption of independence of events, such as a and b in the rule above. Unfortunately, this assumption is not generally justifiable. In practice it is often simply not true, and may therefore lead to incorrect and counter-intuitive results. It has often been admitted that such departures from mathematically sound handling of uncertainty may be unsafe in general, but it has also been argued that they are solutions that work in practice. Along with this, it has been argued that probability theory, although mathematically sound, is impractical and not really appropriate for the following reasons:

• Human experts seem to have trouble thinking in terms of actual probabilities; their likelihood estimates do not quite correspond to probabilities as defined mathematically.

• Mathematically correct probabilistic treatment requires either information that is not available or some simplifying assumptions that are not really quite justified in a practical application. In the latter case, the treatment would become mathematically unsound again.

Conversely, there have been equally eager arguments in favour of mathematically well-justified approaches based on probability theory. Both of the foregoing objections regarding probability have been convincingly answered in favour of probability theory. In ad hoc schemes that 'work in practice', dangers clearly arise from simplifications that involve unsafe assumptions. In the next section we introduce belief networks - a representation that allows correct treatment of probability and at the same time enables relatively economical treatment of dependences.

Exercise

15.2 Let an expert system approximate the probability of the union of two events by the same formula as in our simple uncertainty scheme:

    p(A or B) = max( p(A), p(B))

Under what condition does this formula give probabilistically correct results? In what situation does the formula make the greatest error, and what is the error?

15.6 Belief networks

15.6.1 Probabilities, beliefs, and belief networks

The main question addressed by belief networks, also called Bayesian networks, is: how to handle uncertainty correctly, in a principled way that is at the same time also practical? We will show that these two goals of correctness and practicality are hard to achieve together, but belief networks offer a good solution. But let us first define a framework for discussion.

We will be assuming that the world is defined by a vector of variables that randomly take values from their domains (the sets of their possible values). In all our examples we will limit the discussion to Boolean random variables only, whose possible values are true or false. For example, 'burglary' and 'alarm' are such variables. Variable 'alarm' is true when the alarm is sounding, and 'burglary' is true when the house has been broken into. Otherwise these variables are false. A state of such a world at some time is completely specified by giving the values of all the variables at this time.
When variables are Boolean, it is natural to talk about events. For example, event 'alarm' happens when variable alarm = true.

An agent (a human or an expert system) usually cannot tell for sure whether such a variable is true or false. So instead the agent can only reason about the probability that a variable is true. Probabilities in this context are used as a measure of the agent's beliefs. The agent's beliefs, of course, depend on how much the agent knows about the world. Therefore such beliefs are also called subjective probabilities, meaning that they 'belong to the subject'. 'Subjective' here does not mean 'arbitrary'. Although these probabilities model an agent's subjective beliefs, they still conform to the calculus of probability theory.

Let us introduce some notation. Let X and Y be propositions; then:

    X ∧ Y is the conjunction of X and Y
    X ∨ Y is the disjunction of X and Y
    ~X is the negation of X

p(X) denotes the probability that proposition X is true. p(X|Y) denotes the conditional probability that X is true given that Y is true.

A typical question about the world is: given that the values of some variables have been observed, what are the probabilities of some of the remaining variables? Or: given that some events have been observed, what are the probabilities of some other events? For example: alarm sounding has been observed; what is the probability that burglary has occurred?

The main difficulty is how to handle the dependences among the variables in the problem. Let there be n binary variables in the problem; then 2^n − 1 numbers are needed to define the complete probability distribution over the 2^n possible states of the world. This is usually too many! It is not only impractical and computationally expensive; it is usually impossible to make reasonable estimates of all these probabilities because there is not enough information available.

In fact, usually not all these probabilities are necessary. The complete probability distribution does not make any assumptions regarding independence among the variables. But it is usually unnecessary to be so cautious. Fortunately, some things are independent after all.

Therefore, to make the probabilistic approach practical, we have to exploit these independences. We need economical means of stating dependences among the variables, and at the same time benefit (in terms of complexity) from those things that actually are independent.

Belief networks provide an elegant way of declaring how things depend on each other, and what things are independent of each other. In belief networks this can be stated in a natural and intuitive way.

Figure 15.10 shows an example belief network about a burglary alarm system. The sensor may be triggered by a burglar when the house is broken into, or by strong lightning. The sensor is supposed to trigger a sound alarm and a warning phone call.

    Burglary      Lightning
         \          /
          v        v
            Sensor
          /        \
         v          v
      Alarm        Call

Figure 15.10  A belief network. When a burglar breaks into the house, he is likely to trigger the sensor. The sensor is in turn supposed to trigger a sound alarm, and start an automatic phone call with a warning. A storm with strong lightning may also trigger the sensor.

A typical question that such a belief network helps to answer is something like: suppose the weather is fine and we hear the alarm; given these two facts, what is the probability of burglary?

The structure of this belief network indicates some probabilistic dependences, as well as independences. It says, for example, that burglary is independent of lightning. If, however, it becomes known that alarm is true, then under this condition the probability of burglary is no longer independent of lightning.

It is intuitively obvious that links in the network indicate causality. Burglary is a cause of triggering the sensor. The sensor may in turn cause an alarm. So the structure of the network allows us to reason like this: if alarm is true, then burglary becomes likely, as it is one of the causes that explain the alarm. If we then learn there was a heavy storm, burglary becomes less likely: the alarm is explained by another cause, lightning, so the first possible cause becomes less likely.

In this example the reasoning was both diagnostic and predictive: knowing alarm is true (a consequence, or symptom, of burglary), we inferred diagnostically that it might have been caused by burglary. Then we learned about the storm, and inferred predictively that it might have caused the alarm.

Let us now define more formally what exactly is stated by the links in a belief network. What kinds of probabilistic inference can we make given a belief network? First we have to define a node Z to be a descendant of node X if there is a path, according to the directed links in the network, from X to Z.

Now suppose that Y1, Y2, ... are the parents of X in a belief network. By definition, the belief network implies the following useful relation of probabilistic independence: X is independent of X's non-descendants, given its parents. So to compute the probability of X, it is sufficient to take into account X's descendants and X's parents Y1, Y2, etc. All the possible effects of other variables on X are accumulated through X's parents.

This meaning of the links in a belief network turns out to provide a practical means to (a) define probabilistic relations among the variables in a world, and (b) answer questions about this world.

To understand the way a belief network is used to represent knowledge about a world, consider the example network of Figure 15.10 again. First, the structure of the network specifies the dependences and independences among the variables.
Second, the links also have a natural causal interpretation. To complete the representation, we have to specify some probabilities; that is, give some actual numbers. For the nodes that have no parents ('root causes'), a priori probabilities are specified. In our case burglary and lightning are root causes. For the other nodes X, we have to specify conditional probabilities of the form:

    p( X | State of X's parents)

Sensor has two parents: burglary and lightning. There are four possible combined states of the two parents: burglary and lightning, burglary and not lightning, etc. These states will be written as logical formulas burglary ∧ lightning, burglary ∧ ~lightning, etc. So the complete specification of this belief network can be:

    p(burglary) = 0.001
    p(lightning) = 0.02
    p(sensor | burglary ∧ lightning) = 0.9
    p(sensor | burglary ∧ ~lightning) = 0.9
    p(sensor | ~burglary ∧ lightning) = 0.1
    p(sensor | ~burglary ∧ ~lightning) = 0.001
    p(alarm | sensor) = 0.95
    p(alarm | ~sensor) = 0.001
    p(call | sensor) = 0.9
    p(call | ~sensor) = 0.0

This complete specification comprises ten probabilities. If the structure of the network (stating the independences) were not provided, the complete specification would require 2^5 − 1 = 31 probabilities (there are 2^n possible states of a world comprising n Boolean variables). So the structure in this network saves 21 numbers. In networks with more nodes the savings would of course be much greater.

How much can be saved depends on the problem. If every variable in the problem depends on everything else, then of course no saving is possible. However, if the problem does permit savings, then the savings will depend on the structure of the belief network. Different belief networks can be drawn for the same problem, but some networks are better than others. The general rule is that good networks respect causality between the variables. So we should make a directed link from X to Y if X causes Y. For example, in the burglary domain, although it is possible to reason from alarm to burglary, it would lead to an awkward network if we started constructing the network with a link from alarm to burglary. That would require more links in the network. Also, it would be more difficult to estimate the required probabilities.

15.6.2 Some formulas from probability calculus

In the following we recall some formulas from the probability calculus that will be useful for reasoning in belief networks. Let X and Y be propositions. Then:

    p(~X) = 1 − p(X)
    p(X ∧ Y) = p(X) p(Y|X) = p(Y) p(X|Y)
    p(X ∨ Y) = p(X) + p(Y) − p(X ∧ Y)
    p(X) = p(X ∧ Y) + p(X ∧ ~Y) = p(Y) p(X|Y) + p(~Y) p(X|~Y)

Propositions X and Y are said to be independent if p(X|Y) = p(X) and p(Y|X) = p(Y). That is: knowing Y does not affect the belief in X, and vice versa. If X and Y are independent then:

    p(X ∧ Y) = p(X) p(Y)

Propositions X and Y are disjoint if they cannot both be true at the same time; then:

    p(X ∧ Y) = 0 and p(X|Y) = 0 and p(Y|X) = 0

Let X1, ..., Xn be propositions; then:

    p(X1 ∧ ... ∧ Xn) = p(X1) p(X2|X1) p(X3|X1 ∧ X2) ... p(Xn|X1 ∧ ... ∧ Xn−1)

If all the Xi are independent of each other then this simplifies into:

    p(X1 ∧ ... ∧ Xn) = p(X1) p(X2) p(X3) ... p(Xn)

Finally, we will need Bayes' theorem:

    p(X|Y) = p(X) p(Y|X) / p(Y)

This formula, which follows from the law for the conjunction p(X ∧ Y) above, is useful for reasoning between causes and effects. Considering burglary as a cause of alarm, it is natural to think in terms of what proportion of burglaries trigger the alarm, that is, p(alarm|burglary). But when we hear the alarm, we are interested in knowing the probability of its cause, that is, p(burglary|alarm). Bayes' formula helps:

    p(burglary | alarm) = p(burglary) p(alarm | burglary) / p(alarm)

A variant of Bayes' theorem takes into account background knowledge B. It allows us to reason about the probability of a hypothesis H, given evidence E, all in the presence of background knowledge B:

    p(H | E ∧ B) = p(H | B) p(E | H ∧ B) / p(E | B)

15.6.3 Reasoning in belief networks

In this section we will implement a program that interprets belief networks. Given a belief network, we would like this interpreter to answer queries of the form: what is
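The interpreter developed in this section, prob/3, needs the network itself stated as Prolog facts. Figure 15.12, which defines this representation, falls outside this excerpt, so the following encoding of the Figure 15.10 network is a sketch; the parent/2 relation and the list-with-not notation for conditions are assumptions inferred from the query ?- prob( burglary, [call, not lightning], P) shown later:

```prolog
% A sketch of the burglary network as Prolog facts
% (assumed representation; cf. Figure 15.12)

parent( burglary, sensor).       % Directed links of the network
parent( lightning, sensor).
parent( sensor, alarm).
parent( sensor, call).

p( burglary, 0.001).             % Prior probabilities of root causes
p( lightning, 0.02).

p( sensor, [ burglary, lightning], 0.9).         % Conditional probabilities:
p( sensor, [ burglary, not lightning], 0.9).     % p( X | state of parents)
p( sensor, [ not burglary, lightning], 0.1).
p( sensor, [ not burglary, not lightning], 0.001).
p( alarm, [ sensor], 0.95).
p( alarm, [ not sensor], 0.001).
p( call, [ sensor], 0.9).
p( call, [ not sensor], 0.0).
```

With these numbers, the formulas of Section 15.6.2 give, for instance, p(sensor) ≈ 0.0039 and p(alarm) ≈ 0.0047, so by Bayes' theorem p(burglary | alarm) = 0.001 · 0.8551 / 0.0047 ≈ 0.18: hearing the alarm raises the belief in burglary from 0.001 to roughly 0.18.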
the probability of some propositions, given some other propositions? Example queries are:

    p( burglary | alarm) = ?
    p( burglary ∧ lightning) = ?
    p( burglary | alarm ∧ ~lightning) = ?
    p( alarm ∧ ~call | burglary) = ?

The interpreter will derive an answer to any of these questions by recursively applying the following rules:

(1) Probability of conjunction:

    p( X1 ∧ X2 | Cond) = p( X1 | Cond) p( X2 | X1 ∧ Cond)

(2) Probability of a certain event:

By rule 6:

    p(sensor | burglary) =
        p(sensor | burglary ∧ lightning) p(burglary ∧ lightning | burglary) +
        p(sensor | ~burglary ∧ lightning) p(~burglary ∧ lightning | burglary) +
        p(sensor | burglary ∧ ~lightning) p(burglary ∧ ~lightning | burglary) +
        p(sensor | ~burglary ∧ ~lightning) p(~burglary ∧ ~lightning | burglary)

Using rules 1, 2, 3 and 4 at various places, and the conditional probabilities given in the network, we have:

    p(sensor | burglary) = 0.9 * 0.02 + 0 + 0.9 * 0.98 + 0 = 0.9
    p(alarm | burglary) = 0.95 * 0.9 + 0.001 * (1 − 0.9) = 0.8551

Using rules 1, 4 and 6 several times we get:

As the warning call can be explained by strong lightning, burglary becomes much less likely. However, if the weather was fine then burglary becomes more likely:

    ?- prob( burglary, [call, not lightning], P).
    P = 0.473934

It should be noted that our implementation of belief networks aimed at a short and clear program. As a result, the program is rather inefficient. This is no problem for small belief networks like the one defined by Figure 15.12, but it would be for a larger network. However, a more efficient implementation would be considerably more complicated.
15.7 Semantic networks and frames

In this section we look at two other frameworks for representing knowledge: semantic networks and frames. These differ from rule-based representations in that they are directed at representing, in a structured way, large sets of facts. The set of facts is structured and possibly compressed: facts can be abstracted away when they can be reconstructed through inference. Both semantic networks and frames use the mechanism of inheritance in a similar way as in object-oriented programming.

Semantic networks and frames can be easily implemented in Prolog. Essentially, this amounts to adopting, in a disciplined way, a particular style of programming and organizing a program.

15.7.1 Semantic networks

A semantic network consists of entities and relations between the entities. It is customary to represent a semantic network as a graph. There are various types of semantic network with various conventions, but usually the nodes of the graph correspond to entities, while relations are shown as links labelled by the names of the relations. Figure 15.13 shows such a network. The relation name isa stands for 'is a'.

Figure 15.13  A semantic network. (The figure shows nodes animal, bird, albatross, kiwi, Albert, Ross and Kim; isa links from bird to animal, from albatross and kiwi to bird, from Albert and Ross to albatross, and from Kim to kiwi; and attribute links such as moving_method: fly and active_at: daylight for bird, moving_method: walk, colour: brown and active_at: night for kiwi, and colour: black_and_white for albatross.)

This network represents the following facts:

• A bird is a kind of animal.
• Flying is the normal moving method of birds.
• An albatross is a bird.
• Albert is an albatross, and so is Ross.

Notice that isa sometimes relates a class of objects with a superclass of the class (animal is a superclass of bird), and sometimes an instance of a class with the class itself (Albert is an albatross).

A network of this kind is immediately translated into Prolog facts; for example:

    isa( bird, animal).
    isa( ross, albatross).
    moving_method( bird, fly).
    moving_method( kiwi, walk).

In addition to these facts, which are explicitly stated, some other facts can be inferred from the network. Ways of inferring such facts are built into a semantic network representation as part of the representation. A typical built-in principle of inference is inheritance. So, in Figure 15.13, the fact 'albatross flies' is inherited from the fact 'birds fly'. Similarly, through inheritance, we have 'Ross flies' and 'Kim walks'. Facts are inherited through the isa relation. In Prolog, we can state that the method of moving is inherited as:

    moving_method( X, Method) :-
        isa( X, SuperX),                 % Climb isa hierarchy
        moving_method( SuperX, Method).

It is a little awkward to state a separate inheritance rule for each relation that can be inherited. Therefore, it is better to state a more general rule about facts: facts can be either explicitly stated in the network or inherited:
    fact( Fact) :-                      % Fact not a variable; Fact = Rel( Arg1, Arg2)
        Fact, !.                        % Fact explicit in network - do not inherit

    fact( Fact) :-
        Fact =.. [ Rel, Arg1, Arg2],
        isa( Arg1, SuperArg),           % Climb isa hierarchy
        SuperFact =.. [ Rel, SuperArg, Arg2],
        fact( SuperFact).

Our semantic network can now be asked some questions:

    ?- fact( moving_method( kim, Method)).
    Method = walk

This was inherited from the explicitly given fact that kiwis walk. On the other hand:

    ?- fact( moving_method( albert, Method)).
    Method = fly

This was inherited from the class bird. Note that in climbing up the isa hierarchy, the inherited fact is the one encountered first.

15.7.2 Frames

In frame representation, facts are clustered around objects. 'Object' here means either a concrete physical object or a more abstract concept, such as a class of objects, or even a situation. Good candidates for representation by frames are, for example, the typical meeting situation or the game conflict situation. Such situations have, in general, some common stereotype structure that can be filled with the details of a particular situation.

A frame is a data structure whose components are called slots. Slots have names and accommodate information of various kinds. So, in slots, we can find simple values, references to other frames, or even procedures that can compute the slot value from other information. A slot may also be left unfilled. Unfilled slots can be filled through inference. As in semantic networks, the most common principle of inference is inheritance. When a frame represents a class of objects (such as albatross) and another frame represents a superclass of this class (such as bird), then the class frame can inherit values from the superclass frame.

Some knowledge about birds can be put into frames as follows:

    FRAME: bird
    a_kind_of: animal
    moving_method: fly
    active_at: daylight

This frame stands for the class of all birds. Its three slots say that birds are a kind of animal (animal is a superclass of bird), that a typical bird flies, and that a bird is active during daylight. Here are the frames for two subclasses of bird - albatross and kiwi:

    FRAME: albatross
    a_kind_of: bird
    colour: black_and_white
    size: 115

    FRAME: kiwi
    a_kind_of: bird
    moving_method: walk
    active_at: night
    colour: brown
    size: 40

Albatross is a very typical bird and it inherits the flying ability and daylight activity from the frame bird. Therefore, nothing is stated about moving_method and active_at in the albatross frame. On the other hand, kiwi is a rather untypical bird, and the usual moving_method and active_at values for birds have to be overruled in the case of the kiwi. We can also have a particular instance of a class; for example, an albatross called Albert:

    FRAME: Albert
    instance_of: albatross
    size: 120

Notice the difference between the two relations a_kind_of and instance_of. The former is the relation between a class and a superclass, while the latter is the relation between a member of a class and the class.

The information in our example frames can be represented in Prolog as a set of facts, one fact for each slot value. This can be done in various ways. We will choose the following format for these facts:

    Frame_name( Slot, Value)

The advantage of this format is that all the facts about a particular frame are collected together under the relation whose name is the name of the frame itself. Figure 15.14 gives some frames in this format.

To use such a set of frames, we need a procedure for retrieving facts about slot values. Let us implement such a fact retriever as a Prolog procedure

    value( Frame, Slot, Value)

where Value is the value of slot Slot in frame Frame. If the slot is filled - that is, its value is explicitly stated in the frame - then this is the value; otherwise the value is obtained through inference - for example, inheritance. To find a value by
376 Knowledge Representation and Expert Systems Semantic networks and frames 377
Figure 15.14 Some frames. The way this result is obtained should be something like the following. We start
at the frame ross and, seeing no value for relative_size, climb up the chain of
relations instance_of (to get to frame albatross), a_kind_of (to get to frame bird) and
inheritance, we have to move from the current frame to a more general frame
a_kind_of again, to get finally to frame animal. Here we find the procedure for
according to the a_kind_of relation or instance_of relation between frames. Such a
computing relative size. This procedure needs the values in slot size of frame ross
move leads to a 'parent frame' and the value may be found in this frame explicitly,
and frame albatross. These values can be obtained through inheritance by our
or through further inheritance. This direct retrieval or retrieval by inheritance can be
existing value procedure. It remains to extend the procedure value to handle the
stated in Prolog as:
cases where a procedure in a slot is to be executed. Before we do that, we need to
value( Frame, Slot, Value) consider how such a procedure can be (indirectly) represented as the content of a
Query=.. ( Frame, Slot, Value], slot. Let the procedure for computing the relative size be implemented as a Prolog
call( Query), !. % Value directly retrieved predicate:
    relative_size( Object, RelativeSize) :-
        value( Object, size, ObjSize),
        value( Object, instance_of, ObjClass),
        value( ObjClass, size, ClassSize),
        RelativeSize is ObjSize/ClassSize * 100.    % Percentage of class size

We can now fill the slot relative_size in frame animal with the call of this procedure.
In order to prevent the arguments Object and RelativeSize getting lost in
communication between frames, we also have to state them as part of the
relative_size slot information. The contents of this slot can all be put together as
one Prolog term; for example, as:

    execute( relative_size( Object, RelSize), Object, RelSize)

The relative_size slot of frame animal is then specified by:

    animal( relative_size, execute( relative_size( Obj, Val), Obj, Val)).

Now we are ready to modify the procedure value to handle procedural slots. First, we
have to realize that the information found in a slot can be a procedure call;
therefore, it has to be further processed by carrying out this call. This call may need,
as arguments, slot values of the original frame in question. Our old procedure value
forgets about this frame while climbing up to more general frames. Therefore, we
have to introduce the original frame as an additional argument. The following piece
of program does this:

    value( Frame, Slot, Value) :-
        value( Frame, Frame, Slot, Value).

    % Directly retrieving information in slot of (super)frame

    value( Frame, SuperFrame, Slot, Value) :-
        Query =.. [ SuperFrame, Slot, Information],
        call( Query),                               % Value directly retrieved
        process( Information, Frame, Value), !.     % Information is either a value
                                                    % or a procedure call

    % Inferring value through inheritance

    value( Frame, SuperFrame, Slot, Value) :-
        parent( SuperFrame, ParentSuperFrame),
        value( Frame, ParentSuperFrame, Slot, Value).

    % process( Information, Frame, Value)

    process( execute( Goal, Frame, Value), Frame, Value) :- !,
        call( Goal).

    process( Value, _, Value).                      % A value, not a procedure call

With this extension of our frame interpreter we have got close to the programming
paradigm of object-oriented programming. Although the terminology in that
paradigm is usually different, the computation is essentially based on triggering
the execution of procedures that belong to various frames.

Of the many subtleties of inference among frames, we have not addressed the
question of multiple inheritance. This problem arises when a frame has more than
one 'parent' frame (according to the relation instance_of or a_kind_of). Then, an
inherited slot value may potentially come from more than one parent frame and
the question of which one to adopt arises. Our procedure value, as it stands, simply
takes the first value encountered, found by the depth-first search among the frames
that potentially can supply the value. However, other strategies or tie-breaking rules
may be more appropriate.

Exercises

15.3 Trace the execution of the query

    ?- value( ross, relative_size, Value).

to make sure you understand clearly how information is passed throughout the
frame network in our frame interpreter.

15.4 Let geometric figures be represented as frames. The following clauses represent a
square s1 and a rectangle r2 and specify the method for computing the area of a figure:

    s1( instance_of, square).
    s1( side, 5).

    r2( instance_of, rectangle).
    r2( length, 6).
    r2( width, 4).

    square( a_kind_of, rectangle).
    square( length, execute( value( Obj, side, L), Obj, L)).
    square( width, execute( value( Obj, side, W), Obj, W)).

    rectangle( area, execute( area( Obj, A), Obj, A)).

    area( Obj, A) :-
        value( Obj, length, L), value( Obj, width, W),
        A is L * W.

How will the frame interpreter programmed in this section answer the question:

    ?- value( r2, length, A), value( s1, length, B), value( s1, area, C).

Summary

• Typical functions required of an expert system are: solving problems in a given
  domain, explaining the problem-solving process, and handling uncertain and
  incomplete information.
In this chapter we develop a complete rule-based expert system shell using the
backward chaining approach introduced in Chapter 15. It incorporates proof
construction along the lines introduced in Chapter 15 and some other features not
covered there. We will allow variables in if-then rules and introduce a feature known
as 'query the user', which prompts the user for information only as and when it is
needed during the reasoning process.

We will develop the shell in three steps:

(1) Select and define details of the formalism for representing knowledge in the
shell.

(2) Design details of an inference mechanism that suits this formalism.

(3) Add user-interaction facilities.

Let us start with item 1 - knowledge representation formalism. We will use if-then
rules with a syntax similar to that in Chapter 15. However, we need some extra
information in rules. For example, rules will have names. These extra details are
given in Figure 16.1, which shows a knowledge base in the format accepted by our
intended shell. This knowledge base consists of simple rules that help to identify
animals from their basic characteristics, assuming that the identification problem is
limited just to a small number of animals.
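Figure 16.1 itself is not reproduced in this excerpt. Rules of the kind it contains can be pieced together from the trace and proof-tree examples later in the chapter; in the following sketch, the rule names rule1 and rule3 and their conditions are taken from those examples, while the layout is an assumption:

```prolog
% Two rules in the spirit of Figure 16.1 (reconstructed, not verbatim):

rule1 :: if
             Animal has hair
         then
             Animal isa mammal.

rule3 :: if
             Animal isa mammal and
             Animal eats meat
         then
             Animal isa carnivore.
```

These match the trace example [ (peter isa carnivore) by rule3, ... ] and the proof tree in which 'peter isa mammal' is derived by rule1 from 'peter has hair'.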
These two rules already rely on the facts (about our particular network) that light1 is
connected to fuse1, and that light1 and heater share the same fuse. For another
network we would need another set of rules. Therefore it is better to state rules more
generally, using Prolog variables, so that they can be used for any network, and then
add some extra information about a particular network. Thus one useful rule may
be: if a device is on and not working and its fuse is intact then the device is broken.
This translates into our rule formalism as:

    % A small knowledge base for locating faults in an electric network

    % If a device is on and not working and its fuse is intact
    % then the device is broken

    broken_rule ::
      if
        on( Device) and
        device( Device) and
        (not working( Device)) and
        connected( Device, Fuse) and
        proved( intact( Fuse))
      then
        proved( broken( Device)).

    % If a unit is working then its fuse is OK

    fuse_ok_rule ::
      if
        connected( Device, Fuse) and
        working( Device)
      then
        proved( intact( Fuse)).

    fused_rule ::
      if
        connected( Device1, Fuse) and
        on( Device1) and
        (not working( Device1)) and
        samefuse( Device2, Device1) and
        on( Device2) and
        (not working( Device2))
      then
        proved( failed( Fuse)).

    same_fuse_rule ::
      if
        connected( Device1, Fuse) and
        connected( Device2, Fuse) and
        different( Device1, Device2)
      then
        samefuse( Device1, Device2).

    fact :: different( X, Y) :- not ( X = Y).

    fact :: device( heater).
    fact :: device( light1).
    fact :: device( light2).
    fact :: device( light3).
    fact :: device( light4).

    fact :: connected( light1, fuse1).
    fact :: connected( light2, fuse1).
    fact :: connected( heater, fuse1).
    fact :: connected( light3, fuse2).
    fact :: connected( light4, fuse2).

    askable( on( D), on( 'Device')).
    askable( working( D), working( 'Device')).

Figure 16.2 Connections between fuses and devices in a simple electric network:
heater, light1 and light2 are connected to fuse1; light3 and light4 to fuse2.

Figure 16.3 A knowledge base for locating a fault in a network such as the one in
Figure 16.2.
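A note on syntax: the '::', if-then and and/or notation used in these rules only parses because of operator declarations, which appear as part of Figure 16.6 later in the chapter. Written as directives (the ':-' prefix is how they would be loaded in a standard Prolog; the figure lists the same priorities), the essential ones are:

```prolog
:- op( 900, xfx, ::).     % RuleName :: Rule
:- op( 870, fx,  if).
:- op( 880, xfx, then).
:- op( 550, xfy, or).
:- op( 540, xfy, and).
```

With these in effect, a term such as  broken_rule :: if C then P  is an ordinary Prolog term and can be stored and matched like any other clause.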
Such information cannot be found in the knowledge base or derived from other
information. The user can respond to such queries in two ways:

(1) by supplying the relevant information as an answer to the query, or

(2) by asking the system why this information is needed.

The latter option is useful in order to enable the user to get insight into the system's
current intentions. The user will ask 'why' if the system's query appears irrelevant, or
if answering the query would require additional work on the part of the user. From
the system's explanation the user will judge whether the information the system is
asking for is worth the extra effort of obtaining that information. Suppose, for
example, the system is asking 'Does the animal eat meat?' Then the user, not yet
knowing the answer and not seeing the animal eating anything, may decide that it is
not worth waiting to actually catch the animal at eating meat.

We might use Prolog's trace facility in order to obtain some insight into the
system's reasoning process. But such a trace facility would normally prove to be too
rigid for our purpose. So, instead of using Prolog's own interpreting mechanism,
which falls short of this type of user interaction, we will build a special interpreter
facility on top of Prolog. This new interpreter will include a user-interaction facility.

16.2.2 Outline of the reasoning process

An answer to a given question can be found in several ways, according to the
following principles:

To find an answer Answ to a question Q use one of the following:

• if Q is found as a fact in the knowledge base then Answ is 'Q is true'.

• if there is a rule in the knowledge base of the form

    'if Condition then Q'

  then explore Condition and use the result to construct answer Answ to
  question Q.

• if Q is an 'askable' question then ask the user about Q.

• if Q is of the form Q1 and Q2 then explore Q1 and now:
  if Q1 is false then Answ is 'Q is false', else explore Q2 and appropriately
  combine answers to both Q1 and Q2 into Answ.

• if Q is of the form Q1 or Q2 then explore Q1 and now:
  if Q1 is true then Answ is 'Q is true', else explore Q2 and
  appropriately combine answers to both Q1 and Q2 into Answ.

Questions of the form:

    not Q

are more problematic and will be discussed later.

16.2.3 Answering 'why' questions

A 'why' question occurs when the system asks the user for some information and the
user wants to know why this information is needed. Suppose that the system has
asked:

    Is a true?

The user may reply:

    Why?

An appropriate explanation can be along the following lines:

    Because:
    I can use a to investigate b by rule Ra, and
    I can use b to investigate c by rule Rb, and
    ...
    I can use y to investigate z by rule Ry, and
    z was your original question.

The explanation consists of showing the purpose of the information asked of the
user. The purpose is shown in terms of a chain of rules and goals that connect this
piece of information with the user's original question. We will call such a chain a
trace. We can visualize a trace as a chain of rules that connects the currently explored
goal and the top goal in an AND/OR tree of questions. Figure 16.4 illustrates. So, the
answering of 'why' queries is accomplished by moving from the current goal
upwards in the search tree toward the top goal. To be able to do that we have to
maintain the trace explicitly during the backward chaining process.

16.2.4 Answering 'how' questions

Once the system has come up with an answer to the user's question, the user may
like to see how this conclusion was reached. A proper way of answering such a 'how'
question is to display the evidence: that is, rules and subgoals from which the
conclusion was reached. For our rule language, such evidence consists of a proof
tree. The generation of proof trees was programmed in a stylized form in Chapter 15.
We will use the same principle in a more elaborate way. An effective way of
presenting proof trees as 'how' explanations is to use text indentation to convey the
tree structure. For example:

    peter isa carnivore
    was derived by rule3 from
        peter isa mammal
        was derived by rule1 from
            peter has hair
            was told
        and
        peter eats meat
        was told

Figure 16.4 The 'why' explanation. The question 'Why are you interested in the
current goal?' is explained by the chain of rules and goals between the current goal
and the user's original question at the top. The chain is called a trace.

16.3 Implementation

We will now implement our shell along the ideas developed in the previous section.
Figure 16.5 illustrates the main objects manipulated by the shell. Goal is a question
to be investigated; Trace is a chain of ancestor goals and rules between Goal and the
top-level question; Answer is a proof tree for Goal.

Figure 16.5 The relation explore( Goal, Trace, Answer). Answer is a proof tree for Goal.

The main procedures of the shell will be:

    explore( Goal, Trace, Answer)

which finds an answer Answer to a question Goal;

    useranswer( Goal, Trace, Answer)

generates solutions for an 'askable' Goal by asking the user about Goal and answers
'why' questions;

    present( Answer)

displays the result and answers 'how' questions. These procedures are properly put
into execution by the 'driver' procedure expert. These four main procedures are
explained in the following sections and coded in Prolog in Figures 16.6 to 16.9. The
programs in these figures are to be put together to form the complete shell program.

16.3.1 Procedure explore

The heart of the shell is the procedure:

    explore( Goal, Trace, Answer)

which will find, in the backward chaining style, an answer Answer to a given
question Goal by using the principles outlined in Section 16.2.2: either find Goal as a
fact in the knowledge base, or apply a rule in the knowledge base, or ask the user, or
treat Goal as an AND or OR combination of subgoals.

The meaning and the structure of the arguments are as follows:

Goal is a question to be investigated, represented as an AND/OR combination
of simple assertions. For example:

    (X has feathers) or (X flies) and (X lays eggs)
Trace is a chain of ancestor goals and rules between Goal and the original, top
goal, represented as a list of items of the form:

    Goal by Rule

This means that Goal is being investigated by means of rule Rule. For
example, let the top goal be 'peter isa tiger', and the currently investigated
goal be 'peter eats meat'. The corresponding trace, according to the
knowledge base of Figure 16.1, is:

    [ (peter isa carnivore) by rule3, (peter isa tiger) by rule5 ]

This means the following:

    I can use 'peter eats meat' in order
    to investigate, by rule3, 'peter isa carnivore'.
    Further, I can use 'peter isa carnivore' in order
    to investigate, by rule5, 'peter isa tiger'.

Answer is a proof tree (that is, an AND/OR solution tree) for the question Goal.
The general form for Answer is:

    Conclusion was Found

where Found represents a justification for Conclusion. The following
three example answers illustrate different possibilities:

(1) ( connected( heater, fuse1) is true ) was 'found as a fact'

(2) ( peter eats meat ) is false was told

(3) ( peter isa carnivore ) is true was ( 'derived by' rule3 from
        ( peter isa mammal ) is true was ( 'derived by' rule1 from
            ( peter has hair ) is true was told ) and
        ( peter eats meat ) is true was told )

Figure 16.6 shows the Prolog code for explore. This code implements the principles
of Section 16.2.2, using the data structures specified above.

    % Procedure
    %
    %   explore( Goal, Trace, Answer)
    %
    % finds Answer to a given Goal. Trace is a chain of ancestor
    % goals and rules. 'explore' tends to find a positive answer
    % to a question. Answer is 'false' only when all the
    % possibilities have been investigated and they all resulted
    % in 'false'.

    :- op( 900, xfx, ::).
    :- op( 800, xfx, was).
    :- op( 870, fx,  if).
    :- op( 880, xfx, then).
    :- op( 550, xfy, or).
    :- op( 540, xfy, and).
    :- op( 300, fx,  'derived by').
    :- op( 600, xfx, from).
    :- op( 600, xfx, by).

    % Program assumes: op( 700, xfx, is), op( 900, fx, not)

    explore( Goal, Trace, Goal is true was 'found as a fact') :-
        fact :: Goal.

    % Assume only one rule about each type of goal

    explore( Goal, Trace,
             Goal is TruthValue was 'derived by' Rule from Answer) :-
        Rule :: if Condition then Goal,              % Rule relevant to Goal
        explore( Condition, [ Goal by Rule | Trace], Answer),
        truth( Answer, TruthValue).

    explore( Goal1 and Goal2, Trace, Answer) :- !,
        explore( Goal1, Trace, Answer1),
        continue( Answer1, Goal1 and Goal2, Trace, Answer).

    explore( Goal1 or Goal2, Trace, Answer) :-
        exploreyes( Goal1, Trace, Answer)            % Positive answer to Goal1
        ;
        exploreyes( Goal2, Trace, Answer).           % Positive answer to Goal2

    explore( Goal1 or Goal2, Trace, Answer1 and Answer2) :- !,
        not exploreyes( Goal1, Trace, _),
        not exploreyes( Goal2, Trace, _),            % No positive answer
        explore( Goal1, Trace, Answer1),             % Answer1 must be negative
        explore( Goal2, Trace, Answer2).             % Answer2 must be negative

    explore( Goal, Trace, Goal is Answer was told) :-
        useranswer( Goal, Trace, Answer).            % User-supplied answer

    exploreyes( Goal, Trace, Answer) :-
        explore( Goal, Trace, Answer),
        positive( Answer).

    continue( Answer1, Goal1 and Goal2, Trace, Answer) :-
        positive( Answer1),
        explore( Goal2, Trace, Answer2),
        ( positive( Answer2),
          Answer = Answer1 and Answer2
          ;
          negative( Answer2),
          Answer = Answer2
        ).

    continue( Answer1, Goal1 and Goal2, _, Answer1) :-
        negative( Answer1).

    truth( Question is TruthValue was Found, TruthValue) :- !.

    truth( Answer1 and Answer2, TruthValue) :-
        ( truth( Answer1, true),
          truth( Answer2, true), !,
          TruthValue = true
          ;
          TruthValue = false
        ).

    positive( Answer) :-
        truth( Answer, true).

    negative( Answer) :-
        truth( Answer, false).

Figure 16.6 The core procedure of an expert system shell.

16.3.2 Procedure useranswer

Procedure useranswer is an implementation of the 'query the user' facility. It queries
the user for information when needed, and also explains to the user why it is needed.
Before developing useranswer let us consider a useful auxiliary procedure:

    getreply( Reply)

During conversation, the user is often expected to reply with 'yes', 'no' or 'why'. The
purpose of getreply is to extract such an answer from the user and also to understand
it properly if the user abbreviates ('y' or 'n') or makes a typing error. If the user's
reply is unintelligible then getreply will request another reply from the user.

    getreply( Reply) :-
        read( Answer),
        ( means( Answer, Reply), !                   % Answer means something?
          ;
          nl, write( 'Answer unknown, try again please'), nl,   % No
          getreply( Reply)                           % Try again
        ).

    means( yes, yes).
    means( y, yes).
    means( no, no).
    means( n, no).
    means( why, why).
    means( w, why).

Note that getreply should be used with care because it involves interaction with the
user accomplished through read and write, and can therefore only be understood
procedurally, and not declaratively. For example, if we call getreply with:

    getreply( yes)

and the user types 'no', our procedure will respond with: 'Answer unknown, try
again please.' Therefore, getreply should be called with its argument uninstantiated;
for example, as:

    getreply( Reply),
    ( Reply = yes, interpretyes( ... )
      ;
      Reply = no, interpretno( ... )
    )

The procedure:

    useranswer( Goal, Trace, Answer)

asks the user about Goal. Answer is the result of this inquiry. Trace is used for
explanation in the case that the user asks 'why'. useranswer should first check
whether Goal is the kind of information that can be asked of the user. In our shell,
such kinds of goal are called 'askable'. Suppose, to begin with, that we define what is
askable by a relation:

    askable( Goal)

This will be refined later. If Goal is 'askable' then Goal is displayed and the user will
specify whether it is true or false. In the case where the user asks 'why', Trace will be
displayed. If Goal is true then the user will also specify the values of variables in Goal
(if there are any). This can be programmed as a first attempt as follows:

    useranswer( Goal, Trace, Answer) :-
        askable( Goal),                    % Can Goal be asked of the user?
        ask( Goal, Trace, Answer).         % Ask user about Goal

    ask( Goal, Trace, Answer) :-
        introduce( Goal),                  % Show question to user
        getreply( Reply),                  % Read user's reply
        process( Reply, Goal, Trace, Answer).   % Process the reply

    process( why, Goal, Trace, Answer) :-  % User is asking 'why'
        showtrace( Trace),                 % Show why
        ask( Goal, Trace, Answer).         % Ask again

    process( yes, Goal, Trace, Answer) :-  % User says Goal is true
        Answer = true,
        askvars( Goal)                     % Ask about variables
        ;
        ask( Goal, Trace, Answer).         % Ask for more solutions

    process( no, Goal, Trace, false).      % User says Goal is false

    introduce( Goal) :-
        nl, write( 'Is it true: '),
        write( Goal), write( '?'), nl.
The call askvars( Goal) will ask the user to specify the value of each variable in Goal:

    askvars( Term) :-
        var( Term), !,                     % A variable?
        nl, write( Term), write( ' = '),
        read( Term).                       % Read variable's value

    askvars( Term) :-
        Term =.. [ Functor | Args],        % Get arguments of a structure
        askarglist( Args).                 % Ask about variables in arguments

    askarglist( []).

    askarglist( [ Term | Terms]) :-
        askvars( Term),
        askarglist( Terms).

Let us make a few experiments with this useranswer procedure. For example, let
the binary relation eats be declared as 'askable':

    askable( X eats Y).

(In the following dialogue between Prolog and the user, the user-typed text is in
boldface and Prolog's output is in italics.)

    ?- useranswer( peter eats meat, [], Answer).

    Is it true: peter eats meat?       % Question to user
    yes.                               % User's reply

    Answer = true

A more interesting example that involves variables may look like this:

    ?- useranswer( Who eats What, [], Answer).

    Is it true: _17 eats _18?          % Prolog gives internal names to variables
    yes.
    _17 = peter.                       % Asking about variables
    _18 = meat.

    Answer = true
    Who = peter
    What = meat;                       % Backtrack for more solutions

    Is it true: _17 eats _18?
    yes.
    _17 = susan.
    _18 = bananas.

    Answer = true
    Who = susan
    What = bananas;

    Is it true: _17 eats _18?
    no.

    Answer = false

16.3.3 Refining useranswer

One drawback of our useranswer procedure that shows in the foregoing conversation
is the awkward appearance of Prolog-generated variable names in Prolog's output.
Symbols like _17 should be replaced by some meaningful words when displayed to
the user.

Another, more serious, defect of this version of useranswer is the following. If we
subsequently use useranswer on the same goal, the user will have to repeat all the
solutions. If our expert system would, during its reasoning process, come to explore
the same 'askable' goal twice it would bore the user with exactly the same
conversation again, instead of using the information previously supplied by the user.

Let us now rectify these two defects. First, an improvement of the external
appearance of queries to the user can be based on introducing some standard format
for each 'askable' goal. To this end, a second argument can be added to the relation
askable to specify this format, as shown by the following example:

    askable( X eats Y, 'Animal' eats 'Something').

In querying the user, each variable in the question should then be replaced by
keywords in the question format. For example:

    ?- useranswer( X eats Y, [], Answer).

    Is it true: Animal eats Something?
    yes.
    Animal = peter.
    Something = meat.

    Answer = true
    X = peter
    Y = meat

In an improved version of useranswer, shown in Figure 16.7, this formatting of
queries is done by the procedure:

    format( Goal, ExternFormat, Question, Vars0, Variables)

Goal is a goal to be formatted. ExternFormat specifies the external format for Goal,
defined by:

    askable( Goal, ExternFormat)

Question is Goal formatted according to ExternFormat. Variables is a list of variables
that appear in Goal accompanied by their corresponding keywords (as specified in
ExternFormat), added on a list Vars0. For example:

    ?- format( X gives documents to Y,
               'Who' gives 'What' to 'Whom',
               Question, [], Variables).

    Question = 'Who' gives documents to 'Whom',
    Variables = [ X/'Who', Y/'Whom'].
    % Procedure
    %
    %   useranswer( Goal, Trace, Answer)
    %
    % generates, through backtracking, user-supplied solutions to Goal.
    % Trace is a chain of ancestor goals and rules used for 'why' explanation.

    useranswer( Goal, Trace, Answer) :-
        askable( Goal, _),                 % May be asked of the user
        freshcopy( Goal, Copy),            % Variables in Goal renamed
        useranswer( Goal, Copy, Trace, Answer, 1).

    % Do not ask again about an instantiated goal

    useranswer( Goal, _, _, _, N) :-
        N > 1,                             % Repeated question?
        instantiated( Goal), !,            % No variables in Goal
        fail.                              % Do not ask again

    % Is Goal implied true or false for all instantiations?

    useranswer( Goal, Copy, _, Answer, _) :-
        wastold( Copy, Answer, _),
        instance_of( Copy, Goal), !.       % Answer to Goal implied

    % Retrieve known solutions, indexed from N on, for Goal

    useranswer( Goal, _, _, true, N) :-
        wastold( Goal, true, M),
        M >= N.

    % Has everything already been said about Goal?

    useranswer( Goal, Copy, _, Answer, _) :-
        end_answers( Copy),
        instance_of( Copy, Goal), !,       % Everything was already said about Goal
        fail.

    % Ask the user for (more) solutions

    useranswer( Goal, _, Trace, Answer, N) :-
        askuser( Goal, Trace, Answer, N).

    askuser( Goal, Trace, Answer, N) :-
        askable( Goal, ExternFormat),
        format( Goal, ExternFormat, Question, [], Variables),  % Get question format
        ask( Goal, Question, Variables, Trace, Answer, N).

    ask( Goal, Question, Variables, Trace, Answer, N) :-
        nl,
        ( Variables = [], !,
          write( 'Is it true: ')               % Introduce question
          ;
          write( 'Any (more) solution to: ')   % Introduce question
        ),
        write( Question), write( '? '),
        getreply( Reply), !,                   % Reply = yes/no/why
        process( Reply, Goal, Question, Variables, Trace, Answer, N).

    process( why, Goal, Question, Variables, Trace, Answer, N) :-
        showtrace( Trace),
        ask( Goal, Question, Variables, Trace, Answer, N).

    process( yes, Goal, _, Variables, Trace, true, N) :-
        nextindex( Next),                  % Get new free index for 'wastold'
        Next1 is Next + 1,
        ( askvars( Variables),
          assertz( wastold( Goal, true, Next))   % Record solution
          ;
          freshcopy( Goal, Copy),          % Copy of Goal
          useranswer( Goal, Copy, Trace, Answer, Next1)   % More answers?
        ).

    process( no, Goal, _, _, _, false, N) :-
        freshcopy( Goal, Copy),
        wastold( Copy, true, _), !,        % 'no' means: no more solutions
        assertz( end_answers( Goal)),      % Mark end of answers
        fail
        ;
        nextindex( Next),                  % Next free index for 'wastold'
        assertz( wastold( Goal, false, Next)).   % 'no' means: no solution

    format( Var, Name, Name, Vars, [ Var/Name | Vars]) :-
        var( Var), !.

    format( Atom, Name, Atom, Vars, Vars) :-
        atomic( Atom), !,
        atomic( Name).

    format( Goal, Form, Question, Vars0, Vars) :-
        Goal =.. [ Functor | Args1],
        Form =.. [ Functor | Forms],
        formatall( Args1, Forms, Args2, Vars0, Vars),
        Question =.. [ Functor | Args2].

    formatall( [], [], [], Vars, Vars).

    formatall( [X | XL], [F | FL], [Q | QL], Vars0, Vars) :-
        formatall( XL, FL, QL, Vars0, Vars1),
        format( X, F, Q, Vars1, Vars).

    askvars( []).

Figure 16.7 Expert system shell: querying the user and answering 'why' questions.
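Two utilities used above, freshcopy and instantiated, are defined later in the figure via assert/retract and the hand-written numbervars. In Prologs that provide the standard built-ins copy_term/2 and ground/1, they can be written more directly; this is a sketch for such systems, not part of the original program:

```prolog
freshcopy( Term, FreshTerm) :-    % rename all variables in Term
    copy_term( Term, FreshTerm).

instantiated( Term) :-            % Term contains no unbound variables
    ground( Term).
```

The assert/retract version of freshcopy exploits the fact that asserting a term and retracting it back yields a variant with fresh variables; copy_term/2 does the same in one step without touching the database.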
Figure 16.7 contd

    askvars( [ Variable/Name | Variables]) :-
        nl, write( Name), write( ' = '),
        read( Variable),
        askvars( Variables).

    showtrace( []) :-
        nl, write( 'This was your question'), nl.

    showtrace( [ Goal by Rule | Trace]) :-
        nl, write( 'To investigate, by '),
        write( Rule), write( ', '),
        write( Goal),
        showtrace( Trace).

    instantiated( Term) :-
        numbervars( Term, 0, 0).           % No variables in Term

    % instance_of( T1, T2): instance of T1 is T2; that is,
    % term T1 is more general than T2 or equally general as T2

    instance_of( Term, Term1) :-           % Instance of Term is Term1
        freshcopy( Term1, Term2),          % Copy of Term1 with fresh set of variables
        numbervars( Term2, 0, _), !,
        Term = Term2.                      % This succeeds if Term1 is instance of Term

    freshcopy( Term, FreshTerm) :-         % Make a copy of Term with variables renamed
        asserta( copy( Term)),
        retract( copy( FreshTerm)), !.

    nextindex( Next) :-                    % Next free index for 'wastold'
        retract( lastindex( Last)), !,
        Next is Last + 1,
        assert( lastindex( Next)).

    % Initialize dynamic procedures lastindex/1, wastold/3, end_answers/1

    :- assertz( lastindex( 0)),
       assertz( wastold( dummy, false, 0)),
       assertz( end_answers( dummy)).

The other refinement, to avoid repeated questions to the user, will be more
difficult. First, all user's answers should be remembered so that they can be retrieved
at some later point. This can be accomplished by asserting user's answers as elements
of a relation. For example:

    assert( wastold( mary gives documents to friends, true)).

In a situation where there are several user-supplied solutions to the same goal there
will be several facts asserted about that goal. Here a complication arises. Suppose
that variants of a goal (the goal with variables renamed) appear at several places. For
example:

    ... ( X has Y) and ...             % First occurrence - Goal1
    ... ( X1 has Y1) and ...           % Second occurrence - Goal2

Further suppose that the user will be asked (through backtracking) for several
solutions to Goal1. After that the reasoning process will advance to Goal2. As we
already have some solutions for Goal1 we want the system to apply them
automatically to Goal2 as well (since they obviously satisfy Goal2). Now suppose
that the system tries these solutions for Goal2, but none of them satisfies some
further goal. So the system will backtrack to Goal2 and should ask the user for more
solutions. If the user does supply more solutions then these will have to be
remembered as well. In the case that the system later backtracks to Goal1 these new
solutions will also have to be automatically applied to Goal1.

In order to properly use the information supplied by the user at different places
we will index this information. So the asserted facts will have the form:

    wastold( Goal, TruthValue, Index)

where Index is a counter of user-supplied answers. The procedure

    useranswer( Goal, Trace, Answer)

will have to keep track of the number of solutions already produced through
backtracking. This can be accomplished by means of another procedure, useranswer
with four arguments,

    useranswer( Goal, Trace, Answer, N)

where N is an integer. This call has to produce solutions to Goal indexed N or higher.
A call

    useranswer( Goal, Trace, Answer)

is meant to produce all solutions to Goal. Solutions will be indexed from 1 on, so we
have the following relation:

    useranswer( Goal, Trace, Answer) :-
        useranswer( Goal, Trace, Answer, 1).

An outline of

    useranswer( Goal, Trace, Answer, N)

is: generate solutions to Goal by first retrieving known solutions indexed from N
onwards. When these are exhausted then start querying the user about Goal and
404 An Expert System Shell Implementation 405
assert the thus obtained new solutions properly indexed by consecutive numbers. Such a numbervars procedure is often supplied as a built-in predicate in a Prolog
\Vhen the user says there are no more solutions, assert: system. If not, it can be programmed as follows:
end_answers( Goal) numbervars( Term, N, Nplusl)
If the user says in the first place that there are no solutions at all then assert: var( Term), !, % Variable?
Term= var/N,
wastold( Goal, false, Index) Nplusl is N + 1.
When retrieving solutions, useranswer will have to properly interpret such informa numbervars( Tenn, N, M)
tion. Term = .. [Functor I Args], 01r, Structure or atomic
However, there is a further complication. The user may also specify general numberargs( Args, N .. M). % Number variables in arguments
solutions, leaving some variables uninstantiated. If a positive solution is retrieved numberargs( [ ], N, N) :- !.
which is more general than or as general as Goal, then there is of course no point in numberargs( [X I L], N, M)
further asking about Goal since we already have the most general solution. If numbervars( X, N, NI),
wastold( Goal, false,_) numberargs( L, Nl, M).
numbervars( Term, N, M)
This procedure 'numbers' the variables in Term by replacing each variable in Term by 16.3.5 Top-level driver
some newly generated term so that these 'numbering' terms correspond to integers
between N and M - 1. For example, let these terms be of the form: For a convenient access to the shell from the Prolog interpreter we need a 'driver'
var/0, var/1, var/2, ... procedure, which may look like the procedure expert in Figure 16.9. expert starts the
execution and coordinates the three main modules of the shell shown in Figures
Then 16.6 to 16.8. For example:
?- Term= f( X, t( a, Y, X) ), numbervars( Term, 5, M).
?- expert.
will result in:
Question, please: % Prompt the user
Term= f( var/5, t( a, var/6, var/5) ) X isa animal and goliath isa X. % User's question
X = var/5
Y = var/6 Is it tn1e: goliath has hair?
M =7
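The driver itself can be very small. The following is a minimal sketch consistent with the description above; the helper names getquestion, answeryes and answerno are assumptions here, standing for the shell's question-reading and answering modules (the book's actual code is in Figure 16.9):

```prolog
% A minimal top-level driver sketch (hypothetical helper names).
expert  :-
  getquestion( Question),          % Prompt the user and read a question
  ( answeryes( Question)           % Try to answer the question positively
    ;
    answerno( Question)            % Otherwise explain why not
  ).
```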
(Fragments of Figure 16.9: the top-level driving procedure, and the display of a consultation's conclusion with its 'how' explanation.)

user who keeps interacting with the system. Therefore, we have to implement a particular problem-solving behaviour and not just an input-output relation. Thus the resulting program is in fact more procedurally biased than usual. This is one example when we cannot rely on Prolog's own procedural engine, but have to specify the procedural behaviour in detail.

16.3.7 Negated goals

This is not satisfying. The problem stems from what we mean by a question such as:

  not ( X eats meat)

We in fact want to ask: Is there an X such that X does not eat meat? But the way this question is interpreted by explore (as defined) is as follows:

(1) Is there an X such that X eats meat?
(2) Yes, tiger eats meat.
(3) not ( tiger eats meat) is false.

In short, the interpretation is: Is it true that no X eats meat? So we will get a positive answer only in the case that nobody eats meat. Said another way, explore will answer the question as if X was universally quantified:

  for all X: not ( X eats meat)?

and not as if it was existentially quantified, which was our intention:

  for some X: not ( X eats meat)?

If the question explored is instantiated then this problem disappears. Otherwise, proper treatment is more complicated. Some decisions can be as follows: evaluate a negated goal only once it is sufficiently instantiated. Thus a preceding condition in the same rule, one that instantiates Device, will 'protect' the subsequent condition

  not working( Device)

from being evaluated uninstantiated.

Exercise

16.1 A knowledge base can in principle contain cycles. For example:

  rule1:: if bottle_empty then john_drunk.
  rule2:: if john_drunk then bottle_empty.
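One possible shape of the cycle check this exercise asks for, sketched under the assumption that Trace is a list of Goal by Rule terms as elsewhere in this chapter, and using the numbervars procedure defined above to test whether one goal is an instance of another:

```prolog
% Sketch only: succeed if Goal would repeat a goal already on Trace.
% A repetition does not count as a cycle if Goal is more general than
% the earlier goal, so we require that Goal be an instance of PrevGoal.
cycling( Goal, Trace)  :-
  member( PrevGoal by _, Trace),
  instance_of( PrevGoal, Goal).    % Goal is an instance of PrevGoal

instance_of( Term, Term1)  :-      % Term1 is an instance of Term
  copy_term( Term1, Copy),         % Work on a copy of Term1
  numbervars( Copy, 0, _),         % Ground the copy's variables
  not( not( Term = Copy) ).        % Match without binding Term's variables
```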
Using such a knowledge base, our explore procedure may start cycling between the same goals. Modify explore to prevent such cycling. Trace can be used for this. However, some care is necessary: if the current goal matches a previous goal, this should not be considered a cycle if the current goal is more general than the previous one.

16.4 Concluding remarks

Our expert system shell can be elaborated in a number of ways. Several critical comments and suggestions for elaboration can be made at this point.

Our programs are a straightforward implementation of basic ideas, and do not pay much attention to the issue of efficiency. A more efficient implementation would require more sophisticated data structures, indexing or hierarchy of rules, etc.

Our explore procedure is susceptible to cycling if the rules in the knowledge base 'cyclically' mention the same goal. This can be easily rectified by adding a cycle check in explore: test whether the current goal is an instance of another goal that is already on Trace.

Our 'how' explanation outputs a whole proof tree. In the case of a large proof tree it would be better to output just the top part of the tree and then let the user 'walk' through the rest of the tree as he or she wishes. The user would then inspect the proof tree selectively by using commands such as 'Move down branch 1', 'Move down branch 2', ..., 'Move up', 'Enough'.

In the 'how' and 'why' explanations, our shell just mentions rules by their names, and does not show the rules explicitly. The user should be offered the option to request rules to be displayed explicitly during a consultation session.

Querying the user so that the dialogue looks natural proved to be complicated. Our solution works to some extent, but further problems may appear in several ways, for example:

  Is it true: susan flies?
  no.
  Is it true: susan isa good flyer?

Of course not, if Susan cannot fly at all! Another example is:

  Any (more) solution to: Somebody flies?
  yes.
  Somebody = bird.
  Is it true: albatross flies?

To cope with such defects, additional relations between concepts dealt with by the expert system would have to be added. Typically, these new relations would specify hierarchical relations between objects and how properties are inherited. This can be done using semantic network or frame representations, as introduced in Chapter 15.

Another refinement of the user-querying procedure would involve the planning of an optimal querying strategy. The optimization objective would be to minimize the number of questions asked of the user before a conclusion is reached. There would be, of course, alternative strategies, and which of them would eventually be the shortest would depend on the user's answers. A decision of what alternative strategy to pursue can be based on some a priori probabilities to assess probabilistically the 'cost' of each alternative. This assessment might have to be updated after each user's answer.

There is another measure that can be optimized: the length of the derivation of a conclusion. This would tend to produce simple 'how' explanations. We can reduce the complexity of explanations also by selectively treating individual rules. Thus some rules would not be put into Trace and Answer in the explore procedure. In this case the knowledge base would have to specify which rules are 'traceable', and should therefore appear in explanations, and which should not.

An intelligent expert system should be probabilistically guided so that it concentrates on the currently most likely hypothesis among the competing ones. It should query the user about the information that discriminates best among the top hypotheses.

Our example expert systems were of the classification, or 'analysis', type as opposed to the 'synthesis' type where the task is to construct something. The result can in the latter case be a plan of actions to accomplish some task - for example, a plan for a robot, a computer configuration that satisfies a given specification, or a forced combination in chess. Our fault diagnosis example can be naturally extended to involve actions; for example, if nothing can be inferred because devices are switched off, the system may suggest 'Switch on light 3'. This would entail the problem of optimal plans: minimize the number of actions necessary to reach a conclusion.

Project

Consider critical comments and possible extensions to our expert system shell, as discussed, and design and implement corresponding improvements.

Summary

The shell, developed and programmed in this chapter,

• interprets if-then rules,
• provides 'how' and 'why' explanations, and
• queries the user about the information needed.
References

The design of our expert system shell is to some degree similar to that described by Hammond (1984). Some of the examples used in the text are adapted from Winston (1984).

chapter 17

Planning

17.1 Representing actions
Figure 17.1 shows an example of a planning task. It can be solved by searching in a
corresponding state space, as discussed in Chapter 11. However, here the problem
will be represented in a way that makes it possible to reason explicitly about the
effects of actions among which the planner can choose.
Actions change the current state of the planning world, thereby causing the
transition to a new state. However, an action does not normally change everything
in the current state, just some component(s) of the state. A good representation
should therefore take into account this 'locality' of the effects of actions. To
facilitate the reasoning about such local effects of actions, a state will be represented
as a list of relationships that are currently true. Of course, we choose to mention
only those relationships that pertain to the planning problem. For planning in the blocks world, such relationships are:

  on( Block, Object)

and

  clear( Object)

Figure 17.1 A planning problem in the blocks world: find a sequence of actions that achieve the goals: a on b, and b on c. These actions transform the initial state (on the left) into a final state (on the right).

Here AddRels is a list of relationships that Action establishes. After Action is executed in a state, AddRels is added to the state to obtain the new state. DelRels is a list of relationships that Action destroys. These relationships are removed from the state to which Action is applied.

For our blocks world the only kind of action will be:

  move( Block, From, To)

where Block is the block to be moved, From is the block's current position and To is its new position. A complete definition of this action is given in Figure 17.2.

  % Definition of action move( Block, From, To) in blocks world
  % can( Action, Condition): Action possible if Condition true

  can( move( Block, From, To), [ clear( Block), clear( To), on( Block, From)] )  :-
    block( Block),                      % Block to be moved
    object( To),                        % 'To' is a block or a place
    To \== Block,                       % Block cannot be moved to itself
    object( From),                      % 'From' is a block or a place
    From \== To,                        % Move to new position
    Block \== From.                     % Block not moved from itself

A definition of possible actions for a planning problem implicitly defines the space of all possible plans; therefore, such a definition will also be referred to as a planning space.

Planning in the blocks world is a traditional planning exercise, usually associated with robot programming when a robot is supposed to build structures from blocks. However, we can easily find many examples of planning in our everyday life. Figure 17.3 gives the definition of a planning space for manipulating a camera. This planning problem is concerned with getting a camera ready; that is, loading new film and replacing the battery if necessary. A plan to load a new film in this planning space is as follows: open the case to get the camera out, rewind old film, open the film slot, remove old film, insert new film, close the film slot. A state in which the camera is ready for taking pictures is:

  [ slot_closed( battery), slot_closed( film), in( battery),
    ok( battery), in( film), film_at_start, film_unused]

  % Figure 17.3 (fragments): a planning space for manipulating a camera

  % Closing a slot
  can( close_slot( X), [camera_outside_case, slot_open( X)]).
  adds( close_slot( X), [slot_closed( X)]).
  deletes( close_slot( X), [slot_open( X)]).

  % Rewinding film
  can( rewind, [camera_outside_case, in( film), film_at_end]).
  adds( rewind, [film_at_start]).
  deletes( rewind, [film_at_end]).

  % Removing battery or film
  can( remove( battery), [slot_open( battery), in( battery)]).
  can( remove( film), [slot_open( film), in( film), film_at_start]).
  adds( remove( X), [slot_empty( X)]).
  deletes( remove( X), [in( X)]).

  % Inserting new battery or film
  can( insert_new( X), [slot_open( X), slot_empty( X)]).
  adds( insert_new( battery), [in( battery), ok( battery)]).
  adds( insert_new( film), [in( film), film_at_start, film_unused]).
  deletes( insert_new( X), [slot_empty( X)]).

The goal of a plan is stated in terms of relationships that are to be established. For the blocks world task in Figure 17.1, the goal can be stated as the list of relationships:

  [ on( a, b), on( b, c)]

For the camera problem, a goal list that ensures the camera is ready is:

  [ in( film), film_at_start, film_unused, in( battery),
    ok( battery), slot_closed( film), slot_closed( battery)]
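The surviving text shows only the can clause of Figure 17.2; its adds and deletes relations are not preserved here. A reconstruction consistent with the worked example in Section 17.2 (where move( c, a, 2) removes on( c, a) and clear( 2), and adds on( c, 2) and clear( a)) is:

```prolog
% Reconstructed effects of move in the blocks world, consistent with
% the state transitions worked through in Section 17.2.
adds( move( Block, From, To), [ on( Block, To), clear( From)]).
deletes( move( Block, From, To), [ on( Block, From), clear( To)]).
```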
Exercise

17.2 Define the planning space for the monkey and banana problem introduced in Chapter 2 where the actions are 'walk', 'push', 'climb' and 'grasp'.

17.2 Deriving plans by means-ends analysis

Consider the initial state of the planning problem in Figure 17.1. Let the goal be: on( a, b). The planner's problem is to find a plan - that is, a sequence of actions - that achieves this goal. A typical planner would reason as follows:

(1) Find an action that achieves on( a, b). By looking at the adds relation, it is found that such an action has the form

      move( a, From, b)

    for any From. Such an action will certainly have to be part of the plan, but we cannot execute it immediately in our initial state.

(2) Now enable the action move( a, From, b). Look at the can relation to find the action's precondition. This is:

      [ clear( a), clear( b), on( a, From)]

    In the initial state we already have clear( b) and on( a, From) (where From = 1), but not clear( a). Now the planner concentrates on clear( a) as the new goal to be achieved.

(3) Look at the adds relation again to find an action that achieves clear( a). This is any action of the form

      move( Block, a, To)

    An action of this form that is executable in the initial state is found with

      Block = c and To = 2

So move( c, a, 2) can be executed in the initial state, resulting in a new state. This new state is obtained from the initial state by:

• removing from the initial state all the relationships that the action move( c, a, 2) deletes;
• adding to the resulting list all the relationships that the action adds.

This produces the list:

  [ clear( a), clear( b), clear( c), clear( 4), on( a, 1), on( b, 3), on( c, 2)]

Now the action move( a, 1, b) can be executed, which achieves the final goal on( a, b). The plan found can be written as the list:

  [ move( c, a, 2), move( a, 1, b)]

This style of reasoning is called means-ends analysis. The means are the available actions, the ends are the goals to be achieved. Notice that in the foregoing example a correct plan was found immediately, without any backtracking. The example thus illustrates how the reasoning about goals and effects of actions directs the planning process in a proper direction. Unfortunately, it is not true that backtracking can always be avoided in this way. On the contrary, combinatorial complexity and search are typical of planning.

The principle of planning by means-ends analysis is illustrated in Figure 17.4. It can be stated as follows:

  To solve a list of goals Goals in state State, leading to state FinalState, do:

  If all the Goals are true in State then FinalState = State. Otherwise do the following steps:

  (1) Select a still unsolved goal Goal in Goals.
  (2) Find an action Action that adds Goal to the current state.
  (3) Enable Action by solving the precondition Condition of Action, giving MidState1.
  (4) Apply Action to MidState1, giving MidState2 (in MidState2, Goal is true).
  (5) Solve Goals in MidState2, leading to FinalState.

Figure 17.4 The principle of means-ends planning. PrePlan achieves Condition, leading from State to MidState1; Action then gives MidState2, in which Goal is true; PostPlan achieves the remaining Goals, leading to FinalState.
  % plan( State, Goals, Plan, FinalState): Plan achieves Goals, transforming State into FinalState

  plan( State, Goals, [], State)  :-
    satisfied( State, Goals).                        % Goals true in State

  % The way plan is decomposed into stages by conc, the
  % precondition plan (PrePlan) is found in breadth-first
  % fashion. However, the length of the rest of plan is not
  % restricted and goals are achieved in depth-first style.

  plan( State, Goals, Plan, FinalState)  :-
    conc( PrePlan, [Action | PostPlan], Plan),       % Divide plan
    select( State, Goals, Goal),                     % Select a goal
    achieves( Action, Goal),                         % Relevant action
    can( Action, Condition),
    plan( State, Condition, PrePlan, MidState1),     % Enable Action
    apply( MidState1, Action, MidState2),            % Apply Action
    plan( MidState2, Goals, PostPlan, FinalState).   % Achieve remaining goals

  % satisfied( State, Goals): Goals are true in State

  satisfied( State, []).

  satisfied( State, [Goal | Goals])  :-
    member( Goal, State),
    satisfied( State, Goals).

  select( State, Goals, Goal)  :-
    member( Goal, Goals),
    not member( Goal, State).                        % Goal not satisfied already

  % achieves( Action, Goal): Goal is in add-list of Action

  achieves( Action, Goal)  :-
    adds( Action, Goals),
    member( Goal, Goals).

  % apply( State, Action, NewState): Action executed in State produces NewState

  apply( State, Action, NewState)  :-
    deletes( Action, DelList),
    delete_all( State, DelList, State1), !,
    adds( Action, AddList),
    conc( AddList, State1, NewState).

  % delete_all( L1, L2, Diff): Diff is set-difference of lists L1 and L2

  delete_all( [], _, []).

  delete_all( [X | L1], L2, Diff)  :-
    member( X, L2), !,
    delete_all( L1, L2, Diff).

  delete_all( [X | L1], L2, [X | Diff])  :-          % Keep X
    delete_all( L1, L2, Diff).

Figure 17.5 A simple means-ends planner.

State and FinalState are the initial and final states of the plan respectively. Goals is the list of goals to be achieved and Plan is a list of actions that achieve the goals. It should be noted that this planning program assumes a definition of the planning space in which all the actions and goals are fully instantiated; that is, they do not contain any variables. Variables would require more complicated treatment. This will be discussed later.

The planner can now be used to find a plan for placing block a on b, starting in the initial state of Figure 17.1, as follows:

  ?- Start = [ clear( 2), clear( 4), clear( b), clear( c), on( a, 1), on( b, 3), on( c, a)],
     plan( Start, [ on( a, b)], Plan, FinState).

  Plan = [ move( c, a, 2), move( a, 1, b)]

  FinState = [ on( a, b), clear( 1), on( c, 2), clear( a), clear( 4), clear( c), on( b, 3)]

For the camera world, imagine an initial state with the battery weak and the film in the camera already used. Here are queries to find: a plan FixBattery to fix the battery; a plan FixCamera to get the camera ready for taking pictures:

  ?- Start = [ camera_in_case, slot_closed( film), slot_closed( battery), in( film),
               film_at_end, in( battery)],
     plan( Start, [ ok( battery)], FixBattery, _ ).

  FixBattery = [ open_case, open_slot( battery), remove( battery), insert_new( battery)]

  ?- Start = [ camera_in_case, slot_closed( film), slot_closed( battery), in( film),
               film_at_end, in( battery)],
     can( take_pictures, CameraReady),               % Condition for taking pictures
     plan( Start, CameraReady, FixCamera, FinState).

  CameraReady = [ in( film), film_at_start, film_unused, in( battery),
                  ok( battery), slot_closed( film), slot_closed( battery)]

  FixCamera = [ open_case, rewind, open_slot( film), remove( film),
                insert_new( film), open_slot( battery), remove( battery),
                insert_new( battery), close_slot( film), close_slot( battery)]

  FinState = [ slot_closed( battery), slot_closed( film), in( battery), ok( battery),
               in( film), film_at_start, film_unused, camera_outside_case]

All this is very smooth: the shortest plans are found in all cases. However, further experiments with our planner would reveal some difficulties. We will analyze the defects and introduce improvements in the next section.
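The planner relies on the list-concatenation predicate conc/3, defined earlier in the book; for completeness, its usual definition:

```prolog
% conc( L1, L2, L3): L3 is the concatenation of lists L1 and L2.
% With L3 bound and L1, L2 unbound, backtracking enumerates all splits,
% which is what generates candidate PrePlans of increasing length.
conc( [], L, L).
conc( [X | L1], L2, [X | L3])  :-
  conc( L1, L2, L3).
```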
Exercise

17.3 Trace by hand the means-ends planning process for achieving on( a, 3) from the initial situation of Figure 17.1.

17.3 Protecting goals

If we try our planner on the blocks and camera worlds, it turns out that the blocks world is much more complex. This may appear surprising because the definition of the blocks world looks simpler than that of the camera world. The real reason for the greater difficulty of the blocks world lies in its higher combinatorial complexity. In the blocks world, the planner usually has more choice between several actions that all make sense with respect to the means-ends principle. More choice leads to higher combinatorial complexity. Experimenting with our planner on the blocks domain is thus more critical and can reveal several defects of the planner.

Let us try the task in Figure 17.1. Suppose that Start is a description of the initial state in Figure 17.1. Then the task of Figure 17.1 can be stated as the goal:

  plan( Start, [ on( a, b), on( b, c)], Plan, _)

The plan found by the planner is:

  Plan = [ move( b, 3, c),
           move( b, c, 3),
           move( c, a, 2),
           move( a, 1, b),
           move( a, b, 1),
           move( b, 3, c),
           move( a, 1, b)]

This plan containing seven steps is not exactly the most elegant! The shortest possible plan for this task only needs three steps. Let us analyze why our planner needs so many. The reason is that it pursues different goals at different stages of planning, as follows:

  move( b, 3, c)   to achieve goal on( b, c)
  move( b, c, 3)   to achieve clear( c) to enable next move
  move( c, a, 2)   to achieve clear( a) to enable move( a, 1, b)
  move( a, 1, b)   to achieve goal on( a, b)
  move( a, b, 1)   to achieve clear( b) to enable move( b, 3, c)
  move( b, 3, c)   to achieve goal on( b, c) (again)
  move( a, 1, b)   to achieve goal on( a, b) (again)

The defect revealed here is that the planner sometimes destroys goals that have already been achieved. The planner easily achieved one of the two given goals, on( b, c), but then destroyed it immediately when it started to work on the other goal on( a, b). Then it attempted the goal on( b, c) again. This was reachieved in two moves, but on( a, b) was destroyed in the meantime. Luckily, on( a, b) was then reachieved without destroying on( b, c) again. This rather disorganized behaviour leads, even more drastically, to total failure in the next example:

  plan( Start, [ clear( 2), clear( 3)], Plan, _)

The planner now indefinitely keeps extending the following sequence of moves:

  move( b, 3, 2)   to achieve clear( 3)
  move( b, 2, 3)   to achieve clear( 2)
  move( b, 3, 2)   to achieve clear( 3)
  move( b, 2, 3)   to achieve clear( 2)

Each move achieves one of the goals and at the same time destroys the other one. The planning space is unfortunately defined so that the places 2 and 3 are always considered first in moving b from its current position to a new position.

One idea that is obvious from the foregoing examples is that the planner should try to preserve the goals that have already been achieved. This can be done by maintaining the list of already achieved goals and, in the sequel, avoiding those actions that destroy goals in this list. Thus, we introduce a new argument into our plan relation:

  plan( State, Goals, ProtectedGoals, Plan, FinalState)

Here ProtectedGoals is a list of goals that Plan 'protects'. That is, no action in Plan may delete any of the goals in ProtectedGoals. Once a new goal is achieved, it is added to the list of protected goals. The program in Figure 17.6 is a modification of the planner of Figure 17.5 with goal protection introduced. The task of clearing 2 and 3 is now solved by a two-step plan:

  move( b, 3, 2)   achieving clear( 3)
  move( b, 2, 4)   achieving clear( 2) while protecting clear( 3)

This is now clearly better than before, although still not optimal because only one move is really necessary: move( b, 3, 4).

Unnecessarily long plans result from the search strategy that our planner uses. To optimize the length of plans, the search behaviour of the planner should be studied. This will be done in the next section.
  % A means-ends planner with goal protection

  plan( InitialState, Goals, Plan, FinalState)  :-
    plan( InitialState, Goals, [], Plan, FinalState).

  % plan( InitialState, Goals, ProtectedGoals, Plan, FinalState):
  %   Goals true in FinalState, ProtectedGoals never destroyed by Plan

  plan( State, Goals, _, [], State)  :-
    satisfied( State, Goals).                        % Goals true in State

  plan( State, Goals, Protected, Plan, FinalState)  :-
    conc( PrePlan, [Action | PostPlan], Plan),       % Divide plan
    select( State, Goals, Goal),                     % Select an unsatisfied goal
    achieves( Action, Goal),
    can( Action, Condition),
    preserves( Action, Protected),                   % Do not destroy protected goals
    plan( State, Condition, Protected, PrePlan, MidState1),
    apply( MidState1, Action, MidState2),
    plan( MidState2, Goals, [Goal | Protected], PostPlan, FinalState).

  % preserves( Action, Goals): Action does not destroy any one of Goals

  preserves( Action, Goals)  :-
    deletes( Action, Relations),
    not ( member( Goal, Relations),
          member( Goal, Goals) ).

Figure 17.6 A means-ends planner with goal protection. Predicates satisfied, select, achieves and apply are defined as in Figure 17.5.

17.4 Procedural aspects and breadth-first regime

The planners in Figures 17.5 and 17.6 use essentially depth-first search strategies, although not entirely. To get a clear insight into what is going on, we have to find in what order candidate plans are generated by the planner. The goal

  conc( PrePlan, [Action | PostPlan], Plan)

in procedure plan is important in this respect. Plan is not yet instantiated at this point and conc generates, through backtracking, alternative candidates for PrePlan. Short candidates for PrePlan come first. PrePlan establishes a precondition for Action. This entails the finding of an action whose precondition can be achieved by as short a plan as possible (in the iterative deepening fashion). On the other hand, the candidate list for PostPlan is completely uninstantiated, and thus its length is unlimited. Therefore, the resulting search behaviour is 'globally' depth first and 'locally' breadth first. It is depth-first search with respect to the forward chaining of actions that are appended to the emerging plan. Each action is enabled by a 'preplan'. This preplan is, on the other hand, found in the breadth-first fashion.

One way of minimizing the length of plans is to force the planner into the breadth-first regime so that all alternative short plans are considered before any longer one. We can impose this strategy by embedding our planner into a procedure that generates candidate plans in the order of increasing length. For example, the following results in iterative deepening:

  breadth_first_plan( State, Goals, Plan, FinalState)  :-
    candidate( Plan),                                % Generate short plans first
    plan( State, Goals, Plan, FinalState).

  candidate( []).

  candidate( [First | Rest])  :-
    candidate( Rest).

More elegantly, the same effect can be programmed by inserting a corresponding candidate plan generator directly into our plan procedure. Such a suitable generator is simply

  conc( Plan, _, _)

which through backtracking generates lists of increasing length. Our planner of Figure 17.5 then becomes modified as follows:

  plan( State, Goals, Plan, FinState)  :-
    conc( Plan, _, _),                               % Generate short candidates first
    conc( PrePlan, [Action | PostPlan], Plan),
    ...

Similar modification also brings the goal-protecting planner of Figure 17.6 into the breadth-first regime.

Let us try our modified planners, working now in the breadth-first regime, on our two example tasks. Assuming that Start is the initial situation of Figure 17.1, the task of clearing 2 and 3, stated as plan( Start, [ clear( 2), clear( 3)], Plan, _), is now solved by the single move move( b, 3, 4).
This is now optimal. But the task of Figure 17.1 will still be somewhat problematical. The goal

  plan( Start, [ on( a, b), on( b, c)], Plan, _)

produces the plan:

  move( c, a, 2)
  move( b, 3, a)
  move( b, a, c)
  move( a, 1, b)

We get this result with both of the planners, with or without protection, working in the breadth-first regime. The second move above is superfluous and apparently makes no sense. Let us investigate how it came to be included in the plan at all and why even the breadth-first search resulted in a plan longer than optimal.

We have to answer two questions. First, what reasoning led the planner to construct the funny plan above? Second, why did the planner not find the optimal plan in which the mysterious move( b, 3, a) is not included? Let us start with the first question. The last move, move( a, 1, b), achieves the goal on( a, b). The first three moves achieve the precondition for move( a, 1, b), in particular the condition clear( a). The third move clears a, and part of the precondition for the third move is on( b, a). This is achieved by the second move, move( b, 3, a). The first move clears a to enable the second move. This explains the reasoning behind our awkward plan and also illustrates what sort of exotic ideas may appear during means-ends planning.

The second question is: Why after move( c, a, 2) did the planner not immediately consider move( b, 3, c), which leads to the optimal plan? The reason is that the planner was working on the goal on( a, b) all the time. The action move( b, 3, c) is completely superfluous to this goal and hence not tried. Our four-step plan achieves on( a, b) and, by chance, also on( b, c). So on( b, c) is a result of pure luck and not of any conscious effort by the planner. Blindly pursuing just the goal on( a, b) and relevant preconditions, the planner saw no reason for move( b, 3, c) before move( b, 3, a).

It follows from the above example that the means-ends mechanism of planning as implemented in our planners is incomplete. It does not suggest all relevant actions to the planning process. The reason for this lies in its locality. The planner only considers those actions that pertain to the current goal and disregards other goals until the current goal is achieved. Therefore, it does not (unless by accident) produce plans in which actions that pertain to different goals are interleaved. The key question to completeness, thereby ensuring that optimal plans are within the planning scheme, is to enable interaction between different goals. This will be done in the next section through the mechanism of goal regression.

Before concluding this section, a note is needed regarding the efficiency of the depth-first and breadth-first planners discussed here. Although the breadth-first regime produced a much shorter plan for our example problem (although still suboptimal!), the computation time needed for this shorter plan is much longer than the time needed by the depth-first planner to find the longer seven-step plan. So the depth-first planner should not be considered a priori inferior to the breadth-first planner, even if it is inferior with respect to the length of plans. It should also be noted that the breadth-first effect was in our planners achieved through the technique of iterative deepening, discussed in Chapter 11.

Exercise

17.4 The natural places where domain-specific planning knowledge can be introduced into our planner are the predicates select and achieves. They select the next goal to be attempted (determining the order in which goals are achieved) and the action to be tried. Redefine these two predicates for the blocks world, so that the goals and actions are more intelligently selected. For this purpose, it is useful to add the current state State as an extra argument to the predicate achieves.

17.5 Goal regression

Suppose we are interested in a list of goals Goals being true in some state S. Let the state just before S be S0 and the action in S0 be A. Now let us ask the question: What goals Goals0 have to be true in S0 to make Goals true in S?

  state S0: Goals0   --- A --->   state S: Goals

Goals0 must have the following properties:

(1) Action A must be possible in S0, therefore Goals0 must imply the precondition for A.
(2) For each goal G in Goals either:
    • action A adds G, or
    • G is in Goals0 and A does not delete G.

Determining Goals0 from given Goals and action A will be called regressing Goals through action A. Of course, we are only interested in those actions that add some goal G in Goals. The relations between various sets of goals and conditions are illustrated in Figure 17.7.
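Properties (1) and (2) translate quite directly into Prolog. The following is a simplified sketch, not the book's exact Figure 17.8 code; it assumes the addnew and delete_all predicates of Figure 17.8 (addnew also checks that the combined goals are compatible) and the adds, deletes and can relations of this chapter:

```prolog
% Sketch: regress Goals through Action, giving RegressedGoals.
regress( Goals, Action, RegressedGoals)  :-
  adds( Action, NewRelations),                  % Relations added by Action
  delete_all( Goals, NewRelations, RestGoals),  % Goals not added by Action
  deletes( Action, DelRelations),
  not ( member( G, DelRelations),               % Property (2): Action must not
        member( G, RestGoals) ),                %   delete any remaining goal
  can( Action, Condition),                      % Property (1): precondition of Action
  addnew( Condition, RestGoals, RegressedGoals).
```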
428 Planning Goal regression 429
can(A) Goals
'
in the procedure plan. This planner finds the optimal, three-step plan for the
' problem of Figure 17.1.
.,""";), i;.,o;;:/,('!);>
; ;,;c;'(,\T�\��ifW:.ttS,1�1�·��,'i=�-�•iifffl\N"�l§-.j�}t,1•"
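The two regression properties stated above translate quite directly into Prolog. The following sketch mirrors the regress relation of Figure 17.8, using the adds/deletes/can action representation and the addnew and delete_all relations of that figure (the exact formulation in the figure may differ in detail):

```prolog
% regress( Goals, Action, RegressedGoals):
%   RegressedGoals are the goals that must hold before Action
%   so that Goals hold after it (a sketch after Figure 17.8)

regress( Goals, Action, RegressedGoals)  :-
  adds( Action, NewRelations),                     % Relations established by Action
  delete_all( Goals, NewRelations, RestGoals),     % Goals not achieved by Action itself
  can( Action, Condition),                         % Precondition of Action
  addnew( Condition, RestGoals, RegressedGoals).   % Fails if Condition incompatible
```

Note that addnew fails when the precondition is incompatible with the remaining goals, which is exactly how impossible goal combinations are rejected during regression.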
Figure 17.8 contd

% addnew( NewGoals, OldGoals, AllGoals):
%   AllGoals is the union of NewGoals and OldGoals;
%   NewGoals and OldGoals must be compatible

addnew( [ ], L, L).

addnew( [Goal | _], Goals, _)  :-
  impossible( Goal, Goals),            % Goal incompatible with Goals
  !,
  fail.                                % Cannot be added

addnew( [X | L1], L2, L3)  :-
  member( X, L2),  !,                  % Ignore duplicate
  addnew( L1, L2, L3).

addnew( [X | L1], L2, [X | L3])  :-
  addnew( L1, L2, L3).

% delete_all( L1, L2, Diff): Diff is set-difference of lists L1 and L2

delete_all( [ ], _, [ ]).

delete_all( [X | L1], L2, Diff)  :-
  member( X, L2),  !,
  delete_all( L1, L2, Diff).

delete_all( [X | L1], L2, [X | Diff])  :-
  delete_all( L1, L2, Diff).

17.6 Combining means-ends planning with best-first heuristic
............................................................

The planners developed thus far only use very basic search strategies: depth-first or
breadth-first search (iterative deepening) or a combination of these two. These
strategies are completely uninformed in the sense that they do not use any
domain-specific knowledge in choosing among alternatives. Consequently, they are in
principle very inefficient, except in some special cases. There are several ways of
introducing heuristic guidance, based on domain knowledge, into our planners.
Some obvious places where domain-specific planning knowledge can be introduced
into the planners are as follows:

• relation select( State, Goals, Goal), which decides in what order the goals are
  attempted. For example, a piece of useful knowledge about building block
  structures is that, at any time, everything has to be properly supported, and
  therefore structures have to be built in the bottom-up order. A heuristic selection
  rule based on this would say: the 'top-most' on relations should be achieved last
  (that is, they should be selected first by the goal regression planner as it builds
  plans backwards). Another heuristic would suggest that the selection of those
  goals that are already true in the initial situation should be deferred.

• relation achieves( Action, Goal), which decides which alternative action will be
  tried to achieve the given goal. (In fact, our planners also generate alternatives
  when executing the can predicate, when actions become instantiated.) Some
  actions seem better, for example, because they achieve several goals simultaneously;
  alternatively, through experience, we may be able to tell that some
  action's precondition will be easier to satisfy than others.

• decision about which of the alternative regressed goal sets to consider next:
  continue working on the one that looks easiest, because that one will probably
  be accomplished by the shortest plan.

This last possibility shows how we can impose the best-first search regime on our
planner. This involves heuristically estimating the difficulty of alternative goal sets
and then continuing to expand the most promising alternative goal set.

To use our best-first search programs of Chapter 12, we have to specify
the corresponding state space and a heuristic function; that is, we have to define the
following:

(1) A successor relation between the nodes in the state space: s( Node1, Node2, Cost).

(2) The goal nodes of the search by relation goal( Node).

(3) A heuristic function by relation h( Node, HeuristicEstimate).

(4) The start node of the search.

One way of formulating the state space is for goal sets to correspond to nodes in the
state space. Then, in the state space there will be a link between two goal sets Goals1
and Goals2 if there is an action A such that:

(1) A adds some goal in Goals1,

(2) A does not destroy any goal in Goals1, and

(3) Goals2 is a result of regressing Goals1 through action A, as already defined by
    relation regress in Figure 17.8:

    regress( Goals1, A, Goals2)

For simplicity we will assume that all the actions have the same cost, and will
accordingly assign cost 1 to all the links in the state space. Thus the successor
relation will look like this:

s( Goals1, Goals2, 1)  :-
  member( Goal, Goals1),               % Select a goal
  achieves( Action, Goal),             % A relevant action
  can( Action, Condition),
  preserves( Action, Goals1),
  regress( Goals1, Action, Goals2).
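Of the four items in the specification, only the successor relation is shown here. Items (2) and (3) could be completed along the following lines; this is a sketch, assuming the start relation that describes the initial situation, and using the number of goals not yet true in the initial situation as a crude heuristic (the heuristic actually used may differ):

```prolog
% A goal node of the (backward) search is a goal set that is
% already true in the initial situation

goal( Goals)  :-
  start( State),
  satisfied( State, Goals).

% A crude (assumed) heuristic: the number of goals in the goal
% set that are not yet true in the initial situation

h( Goals, H)  :-
  start( State),
  findall( G, ( member( G, Goals), \+ member( G, State)), Unsatisfied),
  length( Unsatisfied, H).
```

The design idea is that goal sets closer to the initial situation contain fewer unsatisfied goals, so they tend to be reachable by shorter remaining plans.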
Any goal set that is true in the initial situation of the plan is a goal node of the state
space. The start node for the search is the list of goals to be achieved by the plan.

Although the representation above contains all the essential information, it has a
small deficiency. This is due to the fact that our best-first search program finds a
solution path as a sequence of states and does not include actions between states.
For example, the sequence of states (goal-lists) for achieving on( a, b) in the initial
situation in Figure 17.1 is:

[ [ clear( c), clear( 2), on( c, a), clear( b), on( a, 1)],   % True in initial situation
  [ clear( a), clear( b), on( a, 1)],                         % True after move( c, a, 2)
  [ on( a, b)] ]                                              % True after move( a, 1, b)

Figure 17.9 A state-space definition for means-ends planning based on goal regression.
Relations satisfied, achieves, preserves, regress, addnew and delete_all are as
defined in Figure 17.8.

The state-space definition in Figure 17.9 can now be used by the best-first
programs of Chapter 12 as follows. We have to consult the planning problem
definition in terms of relations adds, deletes and can (Figure 17.2 for the blocks
world). We also have to supply the relation impossible and the relation start, which
describes the initial situation of the plan. For the situation in Figure 17.1, this is:

start( [ on( a, 1), on( b, 3), on( c, a), clear( b), clear( c), clear( 2), clear( 4)]).

To solve the task of Figure 17.1 by our means-ends best-first planner, we can now
call the best-first procedure.

17.8 Modify the planning state-space definition of Figure 17.9 to introduce the cost of
actions:

    s( State1, State2, Cost)
The cost can, for example, depend on the weight of the object moved and the
distance by which it is moved. Use this definition to find minimal cost plans in
the blocks world.

17.7 Uninstantiated actions and partial-order planning
......................................................

The planners developed in this chapter are implemented in programs made as
simple as possible to illustrate the principles. No thought has been given to their
efficiency. By choosing better representations and corresponding data structures,
significant improvements in efficiency are possible. Our planners can also be
enhanced by two other essential mechanisms: allowing uninstantiated variables in
goals and actions, and partial-order planning. They are briefly discussed in this
section.

17.7.1 Uninstantiated actions and goals

move( b, a, 1)
move( b, a, 2)
move( b, a, 3)
move( b, a, 4)
move( b, a, c)
move( c, a, 1)
move( c, a, 2)

A more powerful representation that avoids this inefficiency would allow
uninstantiated variables in goals. For the blocks world, for example, one attempt to
define such an alternative relation can would be simply:

can( move( Block, From, To), [ clear( Block), clear( To), on( Block, From)]).

Now consider the situation in Figure 17.1 and the goal clear( a) again. Relation
achieves again proposes the action:

move( Something, a, Somewhere)

This time, when can is evaluated, the variables remain uninstantiated, and the
precondition list is:

[ clear( Something), clear( Somewhere), on( Something, a)]
Here different( X, Y) means that X and Y do not denote the same object. A condition
Sh clB
like different( X, Y) does not depend on the state of the world. So it cannot be made
true by an action, but it has to be checked by evaluating the corresponding ==> ,
.
predicate. One way to handle such quasi goals is to add the following extra clause f d � �
to the procedure satisfied in our planners: 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
satisfied( State, [Goal I Goals])
holds( Goal), % Goal independent of State Figure 17.10 A planning task consisting of two independent subtasks.
satisfied( Goals).
Accordingly, predicates like different( X, Y) would have to be defined by the Planl = [ move( b, a, c), move( a, l, b)]
procedure holds: Plan2 = [ move( e, ct, t), move( ct, 8, e)]
holds( different( X, Y)) The important point is that these two plans do not interact with each other.
Therefore, only the order of actions within each plan is important; it does not matter
This definition could be along the following lines: in which order they are executed - first Pl and then P2, or first P2 and then Pl, or it
• If X and Y do not match then different( X, Y) holds. is even possible to switch between them and execute a bit of one and then a bit of
• If X and Y are literally the same (X == Y) then this condition is false, and it will
the other. For example, this is an admissible execution sequence:
always be false regardless of further actions in the plan. Such conditions could ( move( b, a, c), move( e, ct, f), move( ct, 8, e), move( a, 1, b)]
then be handled in a similar way as goals declared in the relation impossible.
In spite of this, our planners would, in the planning process, potentially consider all
• Otherwise (X and Y match, but are not literally the same), we cannot tell at the 24 permutations of the four actions, although there are essentially only four
moment. The decision whether they denote the same object or not should be alternatives - two permutations for each of the two constituent plans. This problem
postponed until X and Y become further instantiated. arises from the fact that our planners strictly insist on complete ordering of actions in
As illustrated by this example, the evaluation of conditions like different( X, Y) that a plan. An improvement then would be to allow, in the cases that the order does not
are independent of the state sometimes has to be deferred. Therefore, it would be matter, the precedence between actions to remain unspecified. Thus our plans
practical to maintain such conditions as an extra argument of the procedure plan would be partially ordered sets of actions, instead of totally ordered sequences.
and handle them separately from those goals that are achieved by actions. Planners that allow partial ordering are called partial-order planners (sometimes also
This is not the only complication introduced by variables. For example, consider non-linear planners).
the move: Let us look at the main idea of partial-order planning by considering the example
of Figure 17.1 again. The following is a sketch of how a non-linear planner may solve
move( a, l, X) this problem. Analyzing the goals on( a, b) and on( b, c), the planner concludes that
Does this move delete the relation clear( b)? Yes, if X = b, and not, if different( X, b). the following two actions will necessarily have to be included into the plan:
This means that we have two possibilities, and the corresponding two alternatives Ml = move( a, X, b)
are associated with the value of X: in one alternative, X is equal to b; in the other, an M2 = move( b, Y, c)
extra condition different( X, Y) is added.
There is no other way of achieving the two goals. But the order in which these two
actions are to be executed is not yet specified. Now consider the preconditions of the
two actions. The precondition for move( a, X, b) contains clear( a). This is not
17.7.2 Partial-order planning satisfied in the initial situation; therefore, we need an action of the form:
One deficiency of our planners is that they consider all possible orderings of actions M3 = move( U, a, V)
even when the actions are completely independent. As an example, consider the
This has to precede the move Ml, so we now have a constraint on the order of the
planning task of Figure 17.10 where the goal is to build two stacks of blocks from
actions:
two disjoint sets of blocks that are already well separated. The two stacks can be built
independently of each other by two plans, one for each stack: before( M3, Ml)
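Precedence constraints of this kind prune the space of orderings. As a side note, the pruning can be made concrete by a toy sketch (not from the book) that enumerates only those linearizations of a set of actions that are consistent with the recorded before( X, Y) facts:

```prolog
% linearization( Actions, Order): Order is a permutation of Actions
% that violates no before( X, Y) constraint (a toy sketch)

linearization( Actions, Order)  :-
  permutation( Actions, Order),
  \+ violated( Order).

violated( Order)  :-
  before( X, Y),                   % X must precede Y ...
  append( _, [Y | Rest], Order),
  member( X, Rest).                % ... but X appears after Y
```

With before( m3, m1) asserted and the action list [m1, m2, m3], this admits exactly the three orderings in which m3 precedes m1, out of the six permutations.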
Now we see if M2 and M3 can be the same move that achieves the objectives of both.
This is not the case, so the plan will have to include three different moves. Now the
planner has to answer the question: Is there a permutation of the three moves
[ M1, M2, M3] such that M3 precedes M1, the permutation is executable in the initial
situation and the overall goals are satisfied in the resulting state? Due to the
precedence constraint, only three out of altogether six permutations are taken into
account:

[ M3, M1, M2]
[ M3, M2, M1]
[ M2, M3, M1]

The second of these satisfies the execution constraint by the instantiation: U = c,
V = 2, X = 1, Y = 3. As can be seen from this example, the combinatorial complexity
cannot be completely avoided by partial-order planning, but it can be alleviated.

Projects

Develop a program, using the techniques of this chapter, for planning in a more
interesting variation of the simple blocks world used throughout this chapter. Figure
17.11 shows an example task in this new world. The new world contains blocks of
different sizes and the stability of structures has to be taken into account. To make
the problem easier, assume that blocks can only be placed at whole-numbered
positions, and they always have to be properly supported, so that they are safely
stable. Also assume that they are never rotated by the robot and the move trajectories
are simple: straight up until the block is above any other block, then horizontally,
and then straight down. Design specialized heuristics to be used by this planner.

Figure 17.11 A planning task in another blocks world (blocks of different sizes on
positions 0-6): Achieve on( a, c), on( b, c), on( c, d).

A robot world, more realistic and interesting than the one defined in Figure 17.2,
would also comprise perception actions by a camera or touch sensor. For example,
the action look( Position, Object) would recognize the object seen by the camera at
Position (that is, instantiate variable Object). In such a world it is realistic to assume
that the scene is not completely known to the robot, so it will possibly include, in its
plan, actions whose only purpose is to acquire information. This can be further
complicated by the fact that some measurements of this kind cannot be done
immediately, as some objects cannot be seen (bottom objects cannot be seen by a
top-view camera). Introduce other relevant goal relations and modify our planners if
necessary.

Modify the goal regression planner of Figure 17.8 so that it will correctly handle
variables in goals and actions, according to the discussion in Section 17.7.1.

Write a non-linear planner according to the outline in Section 17.7.2.

Summary

• In planning, available actions are represented in a way that enables explicit
  reasoning about their effects and their preconditions. This can be done by
  stating, for each action, its precondition, its add-list (relationships the action
  establishes) and delete-list (relationships the action destroys).

• Means-ends derivation of plans is based on finding actions that achieve given
  goals and enabling the preconditions for such actions.

• Goal protection is a mechanism that prevents a planner destroying goals already
  achieved.

• Means-ends planning involves search through the space of relevant actions. The
  usual search techniques therefore apply to planning as well: depth-first, breadth-first
  and best-first search.

• To reduce search complexity, domain-specific knowledge can be used at several
  stages of means-ends planning, such as: which goal in the given goal-list should
  be attempted next; which action among the alternative actions should be tried
  first; heuristically estimating the difficulty of a goal-list in best-first search.

• Goal regression is a process that determines what goals have to be true before an
  action, to ensure that given goals are true after the action. Planning with goal
  regression typically involves backward chaining of actions.

• Allowing uninstantiated variables in goals and actions can make planning more
  efficient; however, on the other hand, it significantly complicates the planner.

• Partial-order planning recognizes the fact that actions in plans need not be
  always totally ordered. Leaving the order unspecified whenever possible allows
  economical treatment of sets of equivalent permutations of actions.

• Concepts discussed in this chapter are:
  action precondition, add-list, delete-list
  means-ends principle of planning
  goal protection
  goal regression
  partial-order planning
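For reference, the action representation summarized in the first point above looks, in the blocks world, like the clauses of Figure 17.2; the following is a sketch along those lines (details such as the exact guard conditions may differ from the figure):

```prolog
% Precondition, add-list and delete-list of move( Block, From, To)
% (a sketch after Figure 17.2)

can( move( Block, From, To), [ clear( Block), clear( To), on( Block, From)])  :-
  block( Block),          % Block to be moved
  object( To),            % 'To' is a block or a place
  To \== Block,           % Block cannot be moved onto itself
  object( From),          % 'From' is a block or a place
  From \== To,
  Block \== From.

adds( move( X, From, To), [ on( X, To), clear( From)]).

deletes( move( X, From, To), [ on( X, From), clear( To)]).
```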
References

The early studies of basic ideas of means-ends problem solving and planning in artificial
intelligence were done by Newell, Shaw and Simon (1960). These ideas were implemented in
the celebrated program GPS (General Problem Solver) whose behaviour is studied in detail in
Ernst and Newell (1969). Another historically important planning program is STRIPS (Fikes and
Nilsson 1971; 1993), which can be viewed as an implementation of GPS. STRIPS introduced the
representation of the planning space by the relations adds, deletes and can, which is also used
in this chapter. Two other logic-based representations for planning (not discussed in this
chapter) are situation calculus (Kowalski 1979) and event calculus (Kowalski and Sergot 1986;
Shanahan 1997). The mechanisms of STRIPS, and various related ideas and refinements, are
described in Nilsson (1980), where the elegant formulations of planning in logic by Green and
Kowalski are also presented. Warren's (1974) program WARPLAN is an early interesting planner
written in Prolog. It can be viewed as another implementation of STRIPS, refined in a certain
respect. The WARPLAN program appeared in other places in the literature - for example, in
Coelho and Cotta (1988). Waldinger (1977) studied phenomena of interaction among
conjunctive goals, among them also the principle of goal regression. Goal regression corresponds
to determining the weakest precondition used in proving program correctness. Difficulties
that a planner with goal protection may have with the problem of Figure 17.1 are known in the
literature as the Sussman anomaly (see, for example, Waldinger 1977). Early developments of
partial-order planning are Sacerdoti (1977) and Tate (1977). Chapman (1987) is an attempt to
provide a uniform theoretical framework for describing and investigating various mechanisms
in partial-order planning. Weld (1994) gives an overview of partial-order planning. Guaranteed
completeness of a planner is theoretically desirable, but its price in terms of combinatorial
complexity is high. Therefore, a practically more promising approach is one by Clark (1985)
who studied various ways of introducing domain-specific knowledge into a planner to reduce
combinatorial complexity. Allen, Hendler and Tate (1990) edited a collection of classical papers
on planning. The general books on AI by Russell and Norvig (1995) and by Poole et al. (1998)
include substantial material on planning. High quality papers on planning appear in the
Artificial Intelligence journal: see, for example, AIJ volume 76 (1995), a special issue on planning
and scheduling.

AIJ volume 76 (1995) Artificial Intelligence, Vol. 76. Special issue on planning and scheduling.

Allen, J., Hendler, J. and Tate, A. (eds) (1990) Readings in Planning. San Mateo, CA: Morgan Kaufmann.

Chapman, D. (1987) Planning for conjunctive goals. Artificial Intelligence 32: 333-377.

Clark, P. (1985) Towards an Improved Domain Representation for Planning. Edinburgh University: Department of Artificial Intelligence, MSc Thesis.

Coelho, H. and Cotta, J.C. (1988) Prolog by Example. Berlin: Springer-Verlag.

Ernst, G.W. and Newell, A. (1969) GPS: A Case Study in Generality and Problem Solving. New York: Academic Press.

Fikes, R.E. and Nilsson, N.J. (1971) STRIPS: a new approach to the application of theorem proving to problem solving. Artificial Intelligence 2: 189-208.

Fikes, R.E. and Nilsson, N.J. (1993) STRIPS, a retrospective. Artificial Intelligence 59: 227-232.

Kowalski, R. (1979) Logic for Problem Solving. Amsterdam: North-Holland.

Kowalski, R. and Sergot, M. (1986) A logic-based calculus of events. New Generation Computing 4: 67-95.

Newell, A., Shaw, J.C. and Simon, H.A. (1960) Report on a general problem-solving program for a computer. Information Processing: Proc. Int. Conf. on Information Processing. Paris: UNESCO.

Nilsson, N.J. (1980) Principles of Artificial Intelligence. Palo Alto, CA: Tioga; also Berlin: Springer-Verlag.

Poole, D., Mackworth, A. and Goebel, R. (1998) Computational Intelligence: A Logical Approach. Oxford: Oxford University Press.

Russell, S. and Norvig, P. (1995) Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.

Sacerdoti, E.D. (1977) A Structure for Plans and Behavior. New York: Elsevier.

Shanahan, M. (1997) Solving the Frame Problem: A Mathematical Investigation of the Common Sense Law of Inertia. Cambridge, MA: MIT Press.

Tate, A. (1977) Generating project networks. Proc. IJCAI 77. Cambridge, MA.

Waldinger, R.J. (1977) Achieving several goals simultaneously. In: Machine Intelligence 8 (Elcock, E.W. and Michie, D., eds). Chichester: Ellis Horwood.

Warren, D.H.D. (1974) WARPLAN: A System for Generating Plans. University of Edinburgh: Department of Computational Logic, Memo 76.

Weld, D. (1994) An introduction to least commitment planning. AI Magazine 15: 27-61.
chapter 18

Machine Learning

18.1 Introduction                                            442
18.2 The problem of learning concepts from examples          443
18.3 Learning relational descriptions: a detailed example    448
18.4 Learning simple if-then rules                           454
18.5 Induction of decision trees                             462
18.6 Learning from noisy data and tree pruning               469
18.7 Success of learning                                     476

Of the forms of learning, learning concepts from examples is the most common and
best understood. Learning algorithms depend on the language in which the learned
concepts are represented. In this chapter we develop programs that learn concepts
represented by if-then rules and decision trees. We also look at the pruning of
decision trees as a method for learning from noisy data, when examples possibly
contain errors.

18.1 Introduction
.................

There are several forms of learning, ranging from 'learning by being told' to
'learning by discovery'. In the former case, the learner is explicitly told what is
to be learned. In this sense, programming is a kind of learning by being told. The
main burden in this type of learning is on the teacher although the learner's task can
also be difficult as it may not be easy to understand what the teacher had in mind. So
learning by being told may require intelligent communication including a learner's
model of the teacher. At the other extreme, in learning by discovery, the learner
autonomously discovers new concepts merely from unstructured observations or by
planning and performing experiments in the environment. There is no teacher
involved here and all the burden is on the learner. The learner's environment plays
the role of an oracle.

Between these two extremes lies another form of learning: learning from
examples. Here the initiative is distributed between the teacher and the learner.
The teacher provides examples for learning and the learner is supposed to make
generalizations about the examples - that is, find a kind of theory explaining the
given examples. The teacher can help the learner by selecting good training
examples and by describing the examples in a language that permits formulating
elegant general rules. In a sense, learning from examples exploits the known
empirical observation that experts (that is, 'teachers') find it easier to produce good
examples than to provide explicit and complete general theories. On the other
hand, the task of the learner to generalize from examples can be difficult.

Learning from examples is also called inductive learning. Inductive learning is the
most researched kind of learning in artificial intelligence and this research has
produced many solid results. From examples, several types of task can be learned:
one can learn to diagnose a patient or a plant disease, to predict weather, to predict
the biological activity of a new chemical compound, to determine the biological
degradability of chemicals, to predict mechanical properties of steel on the basis of
its chemical characteristics, to make better financial decisions, to control a dynamic
system, or to improve efficiency in solving symbolic integration problems.

Machine learning techniques have been applied to all these particular tasks and
many others. Practical methods exist that can be effectively used in complex
applications. One application scenario is in association with knowledge acquisition
for expert systems. Knowledge is acquired automatically from examples, thus
helping to alleviate the knowledge-acquisition bottleneck. Another way of applying
machine learning methods is knowledge discovery in databases (KDD), also called data
mining. Data in a database are used as examples for inductive learning to discover
interesting patterns in large amounts of data. For example, it can be found with data
mining that in a supermarket a customer who buys spaghetti is likely also to buy
Parmesan cheese.

In this chapter we will be mostly concerned with learning concepts from
examples. We will first define the problem of learning concepts from examples
more formally. To illustrate key ideas we will then follow a detailed example of
learning concepts represented by semantic networks. Then we will look at induction
of rules and decision trees.

18.2 The problem of learning concepts from examples
...................................................

18.2.1 Concepts as sets

The problem of learning concepts from examples can be formalized as follows. Let U
be a universal set of objects - that is, all of the objects that the learner may
encounter. There is, in principle, no limitation on the size of U. A concept C can be
formalized as a subset of objects in U. To learn concept C means to learn to recognize
objects in C. In other words, once C is learned, the system is able, for any object X in
U, to recognize whether X is in C.

This definition of concept is sufficiently general to enable the formalization of
such diverse concepts as an arch, a certain disease, arithmetic multiplication, or the
concept of poisonous:

• The concept of poisonous: For example, in the world U of mushrooms, the
  concept 'poisonous' is the set of all poisonous mushrooms.

• The concept of an arch in the blocks world: The universal set U is the set of all
  structures made of blocks in a blocks world. Arch is the subset of U containing all
  the arch-like structures and nothing else.

• The concept of multiplication: The universal set U is the set of tuples of
  numbers. Mult is the set of all triples of numbers (a, b, c) such that a * b = c.
  More formally:

      Mult = {(a, b, c) | a * b = c}

• The concept of a certain disease D: U is the set of all possible patient descriptions
  in terms of some chosen repertoire of features. D is the set of all those
  descriptions of patients that suffer from the disease in question.

18.2.2 Examples and hypotheses

To introduce some terminology, consider the following, hypothetical problem of
learning whether a mushroom is edible or poisonous. We have collected a number
of example mushrooms and for each of them we have an expert opinion. Suppose
that each mushroom is (unrealistically simply!) described just by its height and its
width. We say that each of our example objects has two attributes: height and width,
in centimetres. In our case both attributes are numerical. In addition, for each
example mushroom its class is also given: poisonous or edible. From the point of
view of the concept 'edible', the two class values are appropriately abbreviated into
'+' (edible) and '-' (not edible). Accordingly, the given edible mushrooms are
positive examples for the concept 'edible'; the given poisonous mushrooms are
negative examples for the concept 'edible'.

Figure 18.1 Examples for learning about mushrooms. The attributes are W (width) and H
(height) of a mushroom. The pluses indicate examples of edible mushrooms, the
minuses examples of poisonous ones.

Figure 18.1 shows our learning data. To learn about mushrooms, then, means to
be able to classify a new mushroom into one of the two classes '+' or '-'. Suppose
now that we have a new mushroom whose attributes are: W = 3, H = 1. Is it edible
or poisonous? Looking at the examples in Figure 18.1, most people would say
'edible' without much thought. Of course, there is no guarantee that this mushroom
actually is edible, and there may be a surprise. So this classification is still a
hypothesis. However, this hypothesis seems very likely because the attribute values
of this mushroom are similar to those of many known edible mushrooms, and
dissimilar to all the poisonous ones. In general, the main assumption in machine
learning is that objects that look in some way similar also belong to the same class.
Generally, the world appears to be kind in that this similarity assumption is usually
true in real life, and this is what makes machine learning from examples possible.
However, how one determines that two objects are similar and others are not is
another question. What is the measure of similarity, either explicit or implicit?
Learning systems differ significantly in this respect.

For the same reason of similarity, another mushroom with W = 5 and H = 4
would quite obviously seem to be poisonous. However, a mushroom with W = 2
and H = 2 is hard to decide and any classification seems reasonable and risky.

Usually the result of learning is a concept description, or a classifier that will classify
new objects. Such a classifier can be stated in various ways, using various formalisms.
These formalisms are alternatively called concept description languages or hypothesis
languages. The reason for calling them hypothesis languages is that they describe the
learner's hypotheses, on the basis of the learning data, about the target concept.
Usually the learner is never sure that a hypothesis, induced from the data, actually
does correspond to the target concept.

Here are some possible hypotheses that can be induced from the mushroom data:

Hypothesis 1:  If 2 < W and W < 4 and H < 2
               then 'edible' else 'poisonous'

Hypothesis 2:  If H > W then 'poisonous'
               else if H > 6 - W then 'poisonous'
               else 'edible'

Hypothesis 3:  If H < 3 - (W - 3)/2 then 'edible' else 'poisonous'

These hypotheses are illustrated in Figure 18.2. All three hypotheses are stated in the
form of if-then rules. Another popular hypothesis language in machine learning ...
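An if-then hypothesis of this kind can be written directly as a Prolog clause. For example, Hypothesis 1 might be encoded as follows (a sketch only; the rule representation actually used for learning later in this chapter differs):

```prolog
% class( W, H, Class): a sketch encoding Hypothesis 1
% for a mushroom of width W and height H

class( W, H, edible)  :-
  2 < W, W < 4, H < 2, !.     % Inside the 'edible' region
class( _, _, poisonous).      % Everything else
```

For instance, the query ?- class( 3, 1, Class) gives Class = edible, agreeing with the intuitive classification of the new mushroom above.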
18.2.3 Description languages for objects and concepts

For any kind of learning we need a language for describing objects and a language
for describing concepts.

Figure 18.3 Hypothesis 1 represented by a decision tree. Internal nodes of the tree are
labelled by attributes, the leaves by classes, and branches correspond to attribute
values. For example, the left-most branch corresponds to W < 2. The left-most
leaf says that a mushroom that falls into this leaf is poisonous (class '-'). An
object falls into a particular leaf if the object satisfies all the conditions along the
path from the root to the leaf.

18.2.4 Accuracy of hypotheses

The problem of learning from examples is usually formulated as follows. There is
some target concept C that we want to learn about. There is some hypothesis
language L in which we can state hypotheses about C. No definition of C is known
448 Machine Learning Learning relational descriptions: a detailed example 449
and the only source of information for learning about C is a set of classified
examples. Usually, examples are pairs (Object, Class) where Class says what concept
Object belongs to. The aim of learning is to find a formula H in the hypothesis
language L that corresponds well to C.

However, how can we know how well H corresponds to C? The only way to estimate
how well H corresponds to C is to use the example set S. We can evaluate how well H
performs on the example set S. If H usually classifies the examples in S correctly (that
is, into the same classes as given in the examples), then we may hope that H will
classify other, new objects correctly as well. So one sensible policy is to choose,
among possible hypotheses, one that re-classifies all the example objects into the
same class as given in the set S. Such a hypothesis is said to be consistent with
the data. A consistent hypothesis has 100 percent classification accuracy on the
learning data. Of course, we are more interested in a hypothesis' predictive accuracy.
How well does the hypothesis predict the class of new objects, those not given in S?
The prediction accuracy is the probability of correctly classifying a randomly chosen
object in the domain of learning. Possibly surprisingly, it sometimes turns out that
hypotheses that achieve the highest accuracy on the learning data S will not stand
the best chance to also achieve the highest accuracy on new data, outside S. This
observation pertains particularly to the case of learning from noisy data when the
learning data contain errors. This will be discussed in Section 18.6.

The most usual criterion of success in inductive machine learning is the
predictive accuracy of the induced hypothesis. However, there are other criteria
of success, most notably the criterion of comprehensibility or 'understandability' of
induced hypotheses. How meaningful is an induced hypothesis to a human expert?
This will be discussed in more detail in Section 18.7.

18.3 Learning relational descriptions: a detailed example

Learning about structures in the blocks world, and about arches in particular, was
introduced as a study domain by Winston (1975) in his early work in machine
learning. We will use it here to illustrate some important mechanisms involved
in learning. Our treatment will basically, although not entirely, follow Winston's
program called ARCHES.

The program ARCHES can be made to learn the concept of an arch from examples
as shown in Figure 18.4. The given examples are processed sequentially and the
learner gradually updates the current hypothesis about the arch concept. In the case
of Figure 18.4, after all four examples have been processed by the learner, the
hypothesis (that is, the learner's final understanding of an arch) may informally look
like this:

(1) An arch consists of three parts; let us call them post1, post2 and lintel.

(2) post1 and post2 are rectangles; lintel can be a more general figure, a kind
    of polygon, for example (this may be concluded from examples 1 and 4 in
    Figure 18.4).

(3) post1 and post2 must not touch (this can be concluded from the negative
    example 2).

(4) post1 and post2 must support lintel (this can be concluded from the negative
    example 3).

Figure 18.4 A sequence of examples and counter-examples for learning the concept of arch.

In general, when a concept is learned by sequentially processing the learning
examples, the learning process proceeds through a sequence of hypotheses, H1, H2,
etc., about the concept that is being learned. Each hypothesis in this sequence is an
approximation to the target concept and is the result of the examples seen so far.
After the next example is processed, the current hypothesis is updated resulting in
the next hypothesis. This process can be stated as the following algorithm:

To learn a concept C from a given sequence of examples E1, E2, ..., En (where E1
must be a positive example of C) do:

(1) Adopt E1 as the initial hypothesis H1 about C.
(2) Process all the remaining examples: for each Ei (i = 2, 3, ...) do:

    2.1 Match the current hypothesis Hi-1 with Ei; let the result of matching be
        some description D of the differences between Hi-1 and Ei.

    2.2 Act on Hi-1 according to D and according to whether Ei is a positive or
        a negative example of C. The result of this is a refined hypothesis Hi
        about C.
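The loop in steps (1) and (2) can be sketched in Prolog as follows; this is our own schematic rendering, in which match_diff/3 (step 2.1) and refine/4 (step 2.2) are hypothetical predicates that a concrete learner would have to supply:

    % Schematic incremental learner (our sketch; match_diff/3 and refine/4
    % are hypothetical, learner-specific predicates):

    learn_concept( [FirstPositive | Examples], FinalHypothesis) :-
        learn_seq( Examples, FirstPositive, FinalHypothesis).   % H1 = E1

    learn_seq( [], H, H).                                       % No more examples
    learn_seq( [Ei | Examples], H0, H) :-
        match_diff( H0, Ei, D),       % Step 2.1: differences D between H0 and Ei
        refine( H0, D, Ei, H1),       % Step 2.2: refined hypothesis H1
        learn_seq( Examples, H1, H).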
The final result of this procedure is Hn, which represents the system's understanding
of the concept C as learned from the given examples. In an actual implementation,
steps 2.1 and 2.2 need some refinements. These are complicated and vary between
different learning systems. To illustrate some ideas and difficulties, let us consider in
more detail the case of learning about the arch from the examples in Figure 18.4.

First, we have to become more specific about the representation. The ARCHES
program uses semantic networks to represent both learning examples and concept
descriptions. Figure 18.5 shows examples of such semantic networks. These are
graphs in which nodes correspond to entities and links indicate relations between
entities.

Figure 18.5 Evolving hypotheses about the arch. At each stage the current hypothesis Hi is
compared with the next example Ei+1 and the next, refined hypothesis Hi+1 is
produced.

The first example, represented by a semantic network, becomes the current
hypothesis of what an arch is (see H1 in Figure 18.5).

The second example (E2 in Figure 18.5) is a negative example of an arch. It is easy
to match E2 to H1. As both networks are very similar it is easy to establish the
correspondence between the nodes and links in H1 and E2. The result of matching
shows the difference D between H1 and E2. The difference is that there is an
extra relation, touch, in E2. Since this is the only difference, the system concludes
that this must be the reason why E2 is not an arch. The system now updates the
current hypothesis H1 by applying the following general heuristic principle of
learning:

if
    example is negative and
    example contains a relation R which is not in the current hypothesis H
then
    forbid R in H (add must_not_R in H)

The result of applying this rule on H1 will be a new hypothesis H2 (see Figure 18.5).
Notice that the new hypothesis has an extra link must_not_touch, which imposes an
additional constraint on a structure should it be an arch. Therefore we say that this
new hypothesis H2 is more specific than H1.

The next negative example in Figure 18.4 is represented by the semantic network
E3 in Figure 18.5. Matching this to the current hypothesis H2 reveals two differences:
two support links, present in H2, are not present in E3. Now the learner has to make a
guess between three possible explanations:
(1) E3 is not an arch because the left support link is missing, or

(2) E3 is not an arch because the right support link is missing, or

(3) E3 is not an arch because both support links are missing.

Accordingly, the learner has to choose between the three possible ways of updating
the current hypothesis. Let us assume that the learner's mentality is more radical
than conservative, thus favouring explanation 3. The learner will thus assume that
both support links are necessary and will therefore convert both support links in H2
into must_support links in the new hypothesis H3 (see Figure 18.5). The situation of
missing links can be handled by the following condition-action rule, which is
another general heuristic about learning:

if
    example is negative and
    example does not contain a relation R which is present in the
    current hypothesis H
then
    require R in the new hypothesis (add must_R in H)

Figure 18.6 A hierarchy of concepts.

Notice again that, as a result of processing a negative example, the current
hypothesis has become still more specific since further necessary conditions are
introduced: two must_support links. Notice also that the learner could have chosen a
more conservative action; namely, to introduce just one must_support link instead of
two. Obviously, then, the individual learning style can be modelled through the set
of condition-action rules the learner uses to update the current hypothesis. By
varying these rules the learning style can be varied from conservative and
cautious to radical and reckless.

The last example, E4, in our training sequence is again positive. Matching the
corresponding semantic networks E4 to H3 shows the difference: the top part is a
triangle in E4 and a rectangle in H3. The learner might now redirect the corresponding
isa link in the hypothesis from rectangle to a new object class:

[diagram: the isa link redirected to a more general class, such as one from the
hierarchy of Figure 18.6]
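In a Prolog implementation, a semantic network like those of Figure 18.5 can be represented simply as a list of relations. For instance, hypothesis H3 might be encoded along the following lines; this notation is our own illustration, not the book's:

    % Hypothetical encoding of hypothesis H3 as a list of relations
    % (our own illustrative notation, not the book's code):
    hypothesis( h3, [ isa( post1, rectangle),
                      isa( post2, rectangle),
                      isa( lintel, rectangle),
                      must_support( post1, lintel),
                      must_support( post2, lintel),
                      must_not_touch( post1, post2) ]).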
• Given two hypotheses, one of them may be more general than the other, or
  more specific than the other; or they can be incomparable with respect to their
  generality or specificity.

• Modifications of a hypothesis during the learning process enhance either: the
  generality of the hypothesis in order to make the hypothesis match a given
  positive example; or the specificity of the hypothesis in order to prevent the
  hypothesis from matching a negative example.

• Concept modification principles for a given learning system can be represented
  as condition-action rules. By means of such rules, the 'mentality' of the learning
  system can be modelled, ranging from conservative to reckless.
The previous edition of this book (Bratko, Prolog Programming for Artificial Intelligence,
Addison-Wesley 1990) includes a Prolog implementation of learning relational
descriptions as discussed in this section. Here we omit this implementation,
although the ARCHES program is a very good illustration of fundamental concepts
in machine learning. However, there are learning algorithms that have established
themselves as more effective learning tools for practical applications. These include
the learning of if-then rules (Section 18.4) and decision trees (Section 18.5). We
return to the learning of relational descriptions in the next chapter, using the
framework of inductive logic programming.
18.4 Learning simple if-then rules

In this section we look at a kind of learning where examples and hypotheses are
described in terms of a set of attributes. In principle, attributes can be of various
types, depending on their possible values. So an attribute can be numerical or non-
numerical. Further, if the attribute is non-numerical, its set of values can be ordered
or unordered. We will limit ourselves to non-numerical attributes with unordered
value sets. Such a set is typically small, containing a few values only.

An object is described by specifying concrete values of the attributes in the object.
Such a description is then a vector of attribute values.

Figure 18.8 shows some objects that will be used as an illustration in this section.
These objects belong to five classes: nut, screw, key, pen, scissors. Suppose that these
objects have been shown to a vision system. Their silhouettes have been captured by
a camera and further processed by a vision program. This program can then extract
some attribute values of each object from the camera image. In our case, the
attributes are: the size, the shape and the number of holes in an object. Let the
possible values of these attributes be: small and large for size; long, compact and
other for shape; none, 1, 2, 3 and many for holes.

Figure 18.8 Camera image of some objects.

Assume that the vision system has extracted the three attribute values for each
object. Figure 18.9 shows the attribute definitions and these examples represented as
a set of Prolog clauses of the form:

example( Class, [ Attribute1 = Val1, Attribute2 = Val2, ...]).

Now assume that these examples are communicated to a learning program that is
supposed to learn about the five classes. The result of learning will be a description of
the classes in the form of rules that can be used for classifying new objects. The
format of rules is exemplified by the following possible rules for the classes nut and
key:

nut <== [ [size = small, holes = 1] ]

key <== [ [shape = long, holes = 1], [shape = other, holes = 2] ]
attribute( size, [small, large]).
attribute( shape, [long, compact, other]).
attribute( holes, [none, 1, 2, 3, many]).

example( nut, [size = small, shape = compact, holes = 1]).
example( screw, [size = small, shape = long, holes = none]).
example( key, [size = small, shape = long, holes = 1]).
example( nut, [size = small, shape = compact, holes = 1]).
example( key, [size = large, shape = long, holes = 1]).
example( screw, [size = small, shape = compact, holes = none]).
example( nut, [size = small, shape = compact, holes = 1]).
example( pen, [size = large, shape = long, holes = none]).
example( scissors, [size = large, shape = long, holes = 2]).
example( pen, [size = large, shape = long, holes = none]).
example( scissors, [size = large, shape = other, holes = 2]).
example( key, [size = small, shape = other, holes = 2]).

Figure 18.9 Attribute definitions and examples for learning to recognize objects from their
silhouettes (from Figure 18.8).

The meaning of these rules is:

An object is a nut if
    its size is small and
    it has 1 hole.

An object is a key if
    its shape is long and
    it has 1 hole
or
    its shape is 'other' and
    it has 2 holes.

The general form of such rules is:

Class <== [ Conj1, Conj2, ...]

where Conj1, Conj2, etc., are lists of attribute values of the form:

[ Att1 = Val1, Att2 = Val2, ...]

A class description [ Conj1, Conj2, ...] is interpreted as follows:

(1) an object matches the description if the object satisfies at least one of Conj1,
    Conj2, etc.;

(2) an object satisfies a list of attribute values Conj if all the attribute values in Conj
    are as in the object.

For example, an object described by:

[ size = small, shape = long, holes = 1]

matches the rule for key by satisfying the first attribute-value list in the rule. Thus
attribute values in Conj are related conjunctively: none of them may be contradicted
by the object. On the other hand, the lists Conj1, Conj2, etc., are related
disjunctively: at least one of them has to be satisfied.

The matching between an object and a concept description can be stated in
Prolog as:

match( Object, Description) :-
    member( Conjunction, Description),
    satisfy( Object, Conjunction).

satisfy( Object, Conjunction) :-
    not ( member( Att = Val, Conjunction),      % Value in concept
          member( Att = ValX, Object),          % and value in object
          ValX \== Val).                        % are different

Notice that this definition allows for partially specified objects when some attribute
value may be unspecified - that is, not included in the attribute-value list. In such a
case this definition assumes that the unspecified value satisfies the requirement in
Conjunction.

18.4.2 Inducing rules from examples

Now we will consider how rules can be constructed from a set of examples. As
opposed to the arches example of the previous section, a class description will not be
constructed by processing the examples sequentially one-by-one. Instead, all the
examples will be processed 'in one shot'. This is also called batch learning as opposed
to incremental learning.

The main requirement here is that the constructed class description matches
exactly the examples belonging to the class. That is, the description is satisfied by all
the examples of this class and no other example.

When an object matches a description, we say that the description covers the
object. Thus we have to construct a description for the given class that covers all
the examples of this class and no other example. Such a description is said to be
complete and sound: complete because it covers all the positive examples and sound
because it covers no negative example.

A widely used approach to constructing a consistent hypothesis as a set of if-then
rules is the covering algorithm shown in Figure 18.10. It is called the 'covering'
algorithm because it gradually covers all the positive examples of the concept
learned. The covering algorithm starts with the empty set of rules. It then iteratively
induces rule by rule. No rule may cover any negative example, but each rule has to
cover some positive examples. Whenever a new rule is induced, it is added to the
hypothesis, and the positive examples covered by the rule are removed from the
example set.
To induce a list of rules RULELIST for a set S of classified examples, do:

RULELIST := empty;
E := S;
while E contains positive examples do
begin
    RULE := InduceOneRule( E);
    Add RULE to RULELIST;
    Remove from E all the examples covered by RULE
end

Figure 18.10 The covering algorithm. Procedure InduceOneRule(E) generates a rule that
covers some positive examples in E and no negative example.

% Learning of simple if-then rules

:- op( 300, xfx, <==).

% learn( Class): collect learning examples into a list, construct and
% output a description for Class, and assert the corresponding rule about Class

learn( Class) :-
    bagof( example( ClassX, Obj), example( ClassX, Obj), Examples),  % Collect examples
    learn( Examples, Class, Description),                            % Induce rule
    nl, write( Class), write( ' <== '), nl,                          % Output rule
    writelist( Description),
    assert( Class <== Description).                                  % Assert rule

learn_conj( Examples, Class, Conjunction)

Figure 18.11 A program that induces if-then rules.
Figure 18.11 contd

filter( Examples, Cond, Examples1) :-
    findall( example( Class, Obj),
        ( member( example( Class, Obj), Examples), satisfy( Obj, Cond)),
        Examples1).

% remove( Examples, Conj, Examples1):
%   removing from Examples those examples that are covered by Conj gives Examples1

remove( [], _, []).

remove( [example( Class, Obj) | Es], Conj, Es1) :-
    satisfy( Obj, Conj), !,                      % First example matches Conj
    remove( Es, Conj, Es1).                      % Remove it

remove( [E | Es], Conj, [E | Es1]) :-            % Retain first example
    remove( Es, Conj, Es1).

satisfy( Object, Conj) :-
    not ( member( Att = Val, Conj),
          member( Att = ValX, Object),
          ValX \== Val).

score( Examples, Class, AttVal, Score) :-
    candidate( Examples, Class, AttVal),         % A suitable attribute value
    filter( Examples, [ AttVal], Examples1),     % Examples1 satisfy condition AttVal
    length( Examples1, N1),                      % Length of list
    count_pos( Examples1, Class, NPos1),         % Number of positive examples that match AttVal
    NPos1 > 0,                                   % At least one positive example
    Score is 2 * NPos1 - N1.

candidate( Examples, Class, Att = Val) :-
    attribute( Att, Values),                     % An attribute
    member( Val, Values),                        % A value
    suitable( Att = Val, Examples, Class).

suitable( AttVal, Examples, Class) :-            % At least one negative example
    member( example( ClassX, ObjX), Examples),   % must not match AttVal
    ClassX \== Class,                            % Negative example
    not satisfy( ObjX, [ AttVal]), !.            % that does not match

% count_pos( Examples, Class, N):
%   N is the number of positive examples of Class

count_pos( [], _, 0).

count_pos( [example( ClassX, _) | Examples], Class, N) :-
    count_pos( Examples, Class, N1),
    ( ClassX = Class, !, N is N1 + 1; N = N1).

writelist( []).

writelist( [X | L]) :-
    tab( 2), write( X), nl,
    writelist( L).

The attribute-value list Conjunction emerges gradually, starting with the empty list
and adding to this list conditions of the form:

Attribute = Value

Notice that, in this way, the attribute-value list becomes more and more specific (it
covers fewer objects). The attribute-value list is acceptable when it becomes so
specific that it only covers positive examples of Class.

The process of constructing such a conjunction is highly combinatorial. Each
time a new attribute-value condition is added, there are almost as many alternative
candidates to be added as there are attribute-value pairs. It is not easy to immediately
see which of them is the best. In general, we would like to cover all the positive
examples with as few rules as possible, and with as short rules as possible. Thus
learning can be viewed as a search among possible descriptions with the objective of
minimizing the length of the concept description. Because of the high combinatorial
complexity of this search, we normally have to resort to some heuristic. The
program in Figure 18.11 relies on a heuristic scoring function that is used locally. At
each point, only the best-estimated attribute value is added to the list, immediately
disregarding all other candidates. The search is thus reduced to a deterministic
procedure without any backtracking. This is also called greedy search or hill-climbing.
It is 'greedy' because it always chooses the best-looking alternative. However, in such
search there is a risk of missing the shortest concept description.

The heuristic estimate is simple and based on the following intuition: a useful
attribute-value condition should discriminate well between positive and negative
examples. Thus, it should cover as many positive examples as possible and as few
negative examples as possible. Figure 18.12 shows the construction of such a heuristic
scoring function. This function is in our program implemented as the procedure:

score( Examples, Class, AttributeValue, Score)

Figure 18.12 Heuristic scoring of an attribute value. POS is the set of positive examples of the
class to be learned; NEG is the set of negative examples of this class. The shaded
area, ATTVAL, represents the set of objects that satisfy the attribute-value
condition. The heuristic score of the attribute value is the number of positive
examples in ATTVAL minus the number of negative examples in ATTVAL.
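For instance, with the program of Figure 18.11 and the example/2 facts of Figure 18.9 loaded, the score of an attribute value for the class nut can be queried directly:

    % On the data of Figure 18.9, shape = compact covers 4 examples
    % (3 nuts, 1 screw), so its score for nut is 2*3 - 4 = 2;
    % holes = 1 covers 5 examples (3 nuts, 2 keys), scoring 2*3 - 5 = 1.

    ?- findall( example( C, O), example( C, O), Examples),
       score( Examples, nut, shape = compact, Score).
    % Score = 2

So shape = compact is preferred over holes = 1 as the first condition of the rule for nut.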
Score is the difference between the number of covered positive and covered negative
examples.

The program in Figure 18.11 can be run to construct some class descriptions for
the examples in Figure 18.9 with the query:

?- learn( nut), learn( key), learn( scissors).

nut <==
  [shape = compact, holes = 1]
key <==
  [shape = other, size = small]
  [holes = 1, shape = long]
scissors <==
  [holes = 2, size = large]

The procedure learn also asserts rules about the corresponding classes in the
program. These rules can be used to classify new objects. A corresponding
recognition procedure that uses the learned descriptions is:

classify( Object, Class) :-
    Class <== Description,              % Learned rule about Class
    member( Conj, Description),         % A conjunctive condition
    satisfy( Object, Conj).             % Object satisfies Conj

18.5 Induction of decision trees

18.5.1 Basic tree induction algorithm

Induction of decision trees is probably the most widespread approach to machine
learning. In this case, hypotheses are represented by decision trees. Induction of
trees is efficient and easy to program.

Figure 18.13 shows a decision tree that can be induced from the examples of
Figure 18.9 (that is, the objects in Figure 18.8). Internal nodes in the tree are labelled
with attributes. The leaves of the tree are labelled with classes or the symbol 'null'.
'null' indicates that no learning example corresponds to that leaf. Branches in the
tree are labelled with attribute values. In classifying an object, a path is traversed in
the tree starting at the root node and ending at a leaf. At each internal node, we
follow the branch labelled by the attribute value in the object. For example, an
object described by:

[size = small, shape = compact, holes = 1]

would be, according to this tree, classified as a nut (following the path: holes = 1,
shape = compact). Notice that, in this case, the attribute value size = small is not
needed to classify the object.

Figure 18.13 A decision tree induced from examples of Figure 18.9 (shown graphically in
Figure 18.8).

In comparison with if-then rules to describe classes, discussed in the previous
section, trees are a more constrained representation. This has both advantages and
disadvantages. Some concepts are more awkward to represent with trees than with
rules: although every rule-based description can be translated into a corresponding
decision tree, the resulting tree may have to be much lengthier than the rule-based
description. This is an obvious disadvantage of trees.

On the other hand, the fact that trees are more constrained reduces the
combinatorial complexity of the learning process. This may lead to substantial
improvement in the efficiency of learning. Decision tree learning is one of the most
efficient forms of learning. It should be noted, however, that computational
efficiency of learning is only one criterion of success in learning, as will be discussed
later in this chapter.

The basic tree induction algorithm is as shown in Figure 18.14. The algorithm
aims at constructing a smallest tree consistent with the learning data. However,
search among all such trees is prohibitive because of combinatorial complexity.
Therefore the common approach to tree induction is heuristic, without a guarantee
of optimality. The algorithm in Figure 18.14 is greedy in the sense that it always
chooses 'the most informative' attribute, and it never backtracks. There is no
guarantee of finding a smallest tree. On the other hand, the algorithm is fast and
has been found to work well in practical applications.

Even in its simplest implementation, this basic algorithm needs some refinements:

(1) If S is empty then the result is a single-node tree labelled 'null'.

(2) Each time a new attribute is selected, only those attributes that have not yet
    been used in the upper part of the tree are considered.
To construct a decision tree T for a learning set S do:

if all the examples in S belong to the same class, C,
then the result is a single node tree labelled C
otherwise
    select the most 'informative' attribute, A, whose values are v1, ..., vn;
    partition S into S1, ..., Sn according to the values of A;
    construct (recursively) subtrees T1, ..., Tn for S1, ..., Sn;
    final result is the tree whose root is A and whose subtrees are T1, ..., Tn,
    and the links between A and the subtrees are labelled by v1, ..., vn;
    thus the decision tree looks like this:

        [diagram: root node A with branches labelled v1, ..., vn leading to
        subtrees T1, ..., Tn]

Figure 18.14 The basic tree induction algorithm.

(3) If S is not empty, and not all the objects in S belong to the same class, and there
    is no attribute left to choose, then the result is a single-node tree. It is useful to
    label this node with the list of all the classes represented in S, together with
    their relative frequencies in S. Such a tree is sometimes called a class probability
    tree, instead of a decision tree. In such a case the set of available attributes is not
    sufficient to distinguish between class values of some objects (objects that
    belong to different classes have exactly the same attribute values).

(4) We have to specify the criterion for selecting the 'most informative' attribute.
    This will be discussed in the next section.

18.5.2 Selecting 'the best' attribute

Criteria for selecting 'the best' attribute are a much investigated topic in machine
learning. Basically these criteria measure the 'impurity', with respect to class, of a set
of examples. A good attribute should split the examples in subsets as pure as
possible.

One approach to attribute selection exploits ideas from information theory. Such
a criterion can be developed as follows. To classify an object, a certain amount of
information is needed. After we have learned the value of some attribute of the
object, we only need some remaining amount of information to classify the object.
This remaining amount will be called the 'residual information'. Of course, residual
information should be smaller than the initial information. The 'most informative'
attribute is the one that minimizes the residual information. The amount of
information is determined by the well-known entropy formula. For a domain
exemplified by the learning set S, the average amount of information I needed to
classify an object is given by the entropy measure:

    I = - Σ_c p(c) log2 p(c)

where c stands for classes and p(c) is the probability that an object in S belongs to
class c. This formula nicely corresponds to the intuition about impurity. For a
completely pure set, the probability of one of the classes is 1, and 0 for all other
classes. For such a set, information I = 0. On the other hand, information I is
maximal in the case that the probabilities of all the classes are equal.

After applying an attribute A, the set S is partitioned into subsets according to the
values of A. The residual information Ires is then equal to the weighted sum of
the amounts of information for the subsets:

    Ires(A) = - Σ_v p(v) Σ_c p(c|v) log2 p(c|v)

where v stands for the values of A, p(v) is the probability of value v in set S, and
p(c|v) is the conditional probability of class c given that attribute A has value v. The
probabilities p(v) and p(c|v) are usually approximated by statistics on set S.

Further refinements can be made to the information-theoretic criterion given
above. A defect of this criterion is that it tends to favour attributes with many values.
Such an attribute will tend to split the set S into many small subsets. If these subsets
are very small, with just a few examples, they will tend to be pure anyway, regardless
of the genuine correlation between the attribute and the class. The straightforward
Ires will thus give such an attribute undeserved credit. One way of rectifying this is
information gain ratio. This ratio takes into account the amount of information I(A)
needed to determine the value of an attribute A:

    I(A) = - Σ_v p(v) log2 p(v)

I(A) will tend to be higher for attributes with more values. Information gain ratio is
defined as:

    GainRatio(A) = Gain(A) / I(A) = (I - Ires(A)) / I(A)

The attribute to choose is the one that has the highest gain ratio.

Another idea of circumventing the problem with many-valued attributes is the
binarization of attributes. An attribute is binarized by splitting the set of its values
into two subsets. As a result, the splitting results in a new (binary) attribute whose
two values correspond to the two subsets. When such a subset contains more than
one value, it can be split further into two subsets, giving rise to another binary
attribute, etc. When choosing a good split, the criterion generally is to maximize the
information gain of the so obtained binary attribute. After all the attributes have
been made binary, the problem of comparing many-valued attributes with few-valued
attributes disappears. The straightforward residual information criterion
gives a fair comparison of the (binary) attributes.

There exist other sensible measures of impurity, like the Gini index, defined by:

    Gini = Σ_{i≠j} p(i) p(j)

where i and j are classes. After applying attribute A, the resulting Gini index is:

    Gini(A) = Σ_v p(v) Σ_{i≠j} p(i|v) p(j|v)

where v stands for values of A and p(i|v) denotes the conditional probability of class
i given that attribute A has value v.

It should be noted that impurity measures are used here to assess the effect of a
single attribute. Therefore, the criterion 'most informative' is local in the sense that
it does not reliably predict the combined effect of several attributes applied jointly.
The basic tree induction algorithm is based on this local minimization of impurity.
As mentioned earlier, global optimization would be computationally much more
expensive.

Exercises

18.1 Consider the problem of learning from objects' silhouettes (Figure 18.9). Calculate
     the entropy of the whole example set with respect to class, the residual information
     for attributes 'size' and 'holes', the corresponding information gains and gain ratios.
     Estimate the probabilities needed in the calculations simply by the relative
     frequencies, e.g. p(nut) = 3/12 or p(nut | holes = 1) = 3/5.

18.2 Disease D occurs in 25 percent of all the cases. Symptom S is observed in 75 percent
     of the patients suffering from disease D, and only in one sixth of other cases.

First, the examples and available attributes are collected into lists, used as arguments
for our induction procedure:

induce_tree( Tree) :-
    findall( example( Class, Obj), example( Class, Obj), Examples),
    findall( Att, attribute( Att, _), Attributes),
    induce_tree( Attributes, Examples, Tree).

The form of the tree depends on the following three cases:

(1) Tree = null if the example set is empty.

(2) Tree = leaf( Class) if all of the examples are of the same class Class.

(3) Tree = tree( Attribute, [ Val1 : SubTree1, Val2 : SubTree2, ...]) if examples belong
    to more than one class, Attribute is the root of the tree, Val1, Val2, ... are
    Attribute's values, and SubTree1, SubTree2, ... are the corresponding decision
    subtrees.

These three cases are handled by the following three clauses:

% induce_tree( Attributes, Examples, Tree)

induce_tree( _, [], null) :- !.

induce_tree( _, [example( Class, _) | Examples], leaf( Class)) :-
    not ( member( example( ClassX, _), Examples),    % No other example
          ClassX \== Class), !.                      % of different class

induce_tree( Attributes, Examples, tree( Attribute, SubTrees)) :-
    choose_attribute( Attributes, Examples, Attribute),
    del( Attribute, Attributes, RestAtts),           % Delete Attribute
    attribute( Attribute, Values),
    induce_trees( Attribute, Values, RestAtts, Examples, SubTrees).

induce_trees induces decision subtrees SubTrees for the subsets of Examples according
to the Values of Attribute:

% induce_trees( Att, Vals, RestAtts, Examples, SubTrees)

induce_trees( _, [], _, _, []).    % No values, no subtrees
Suppose we are building a decision tree for diagnosing disease D, so the classes are induce_trees( Att, (Vall I Vais], RestAtts, Exs, (Vall : Treel I Trees])
just D (person has D) and ~D (does not have D). S is one of the attributes. What attval_subset( Att= Vall,Exs, ExampleSubset),
are the information gain and gain ratio of attribute S? induce_tree( RestAtts, ExampleSubset, Treel),
induce_trees( Att, Vais, RestAtts, Exs,Trees).
18. 5 .3 Implementing decision tree learning attval_subset( Attribute = Value, Examples, Subset) is true if Subset is the subset of
examples in Examples that satisfy the condition Attribute = Value:
Let us now sketch a procedure to induce decision trees: attval_subset( Attribute Value, Examples,ExampleSubset)
induce_tree( Attributes, Examples, Tree) findall( example( Class, Obj),
( member( example( Class, Obj),Examples),
where Tree is a decision tree induced from Examples using attributes in list Attributes. satisfy( Obj,[ Attribute Value])),
If the examples and attributes are represented as in Figure 18.9,we can collect all the ExampleSubset).
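As a quick sanity check of the Gini definition above, a small numeric sketch (Python, with made-up class probabilities; not part of the book's Prolog programs) confirms that, because the p(i) sum to 1, the double sum Σ_{i≠j} p(i)p(j) equals 1 minus the sum of squared probabilities:

```python
# Gini index of a class distribution: sum over i != j of p(i)*p(j).
def gini(probs):
    return sum(pi * pj
               for i, pi in enumerate(probs)
               for j, pj in enumerate(probs)
               if i != j)

probs = [0.5, 0.3, 0.2]          # hypothetical class probabilities
print(round(gini(probs), 2))     # 0.62, i.e. 1 - (0.25 + 0.09 + 0.04)
```

A pure class distribution (a single class) gives Gini = 0, the minimum impurity.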
The predicate satisfy( Object, Description) is defined as in Figure 18.11. The predicate choose_attribute selects the attribute that discriminates well among the classes. This involves the impurity criterion. The following clause minimizes the chosen impurity measure using setof. setof will order the available attributes according to increasing impurity:

    choose_attribute( Atts, Examples, BestAtt)  :-
        setof( Impurity/Att,
               ( member( Att, Atts), impurity1( Examples, Att, Impurity)),
               [ MinImpurity/BestAtt | _]).

Procedure

    impurity1( Examples, Attribute, Impurity)

implements a chosen impurity measure. Impurity is the combined impurity of the subsets of examples after dividing the list Examples according to the values of Attribute.

Exercises

18.3 Implement a chosen impurity measure by writing the impurity1 predicate. This measure can be, for example, the residual information content or Gini index as discussed previously in this section. For the examples in Figure 18.9 and attribute size, use of the Gini index as impurity measure gives:

    ?- Examples = ...,    % Examples from Figure 18.9
       impurity1( Examples, size, Impurity).

    Impurity = 0.647619

Approximating probabilities by relative frequencies, Impurity is calculated as follows:

    Impurity = Gini( size)
             = p(small)*(p(nut | small)*p(screw | small) + ...) + p(large)*(...)
             = 7/12 * (3/7 * 2/7 + ...) + 5/12 * (...)
             = 7/12 * 0.653061 + 5/12 * 0.64
             = 0.647619

18.4 Complete the tree induction program of this section and test it on some learning problem, for example the one in Figure 18.9. Note that the procedure choose_attribute, using setof, is very inefficient and can be improved. Also add a procedure

    show( DecisionTree)

for displaying decision trees in a readable form. For the tree in Figure 18.13, a suitable form is:

    holes
      none
        size
          small ==> screw
          large ==> pen
      1
        shape
          long ==> key
          compact ==> nut
          other ==> null
      2
        size
          small ==> key
          large ==> scissors
      3 ==> null
      many ==> null

18.6 Learning from noisy data and tree pruning
..........................................................................................................................................

In many applications the data for learning are imperfect. One common problem is errors in attribute values and class values. In such cases we say that the data are noisy. Noise of course makes the learning task more difficult and requires special mechanisms. In the case of noise, we usually abandon the consistency requirement that induced hypotheses correctly reclassify the learning examples. We allow the learned hypothesis to misclassify some of the learning objects. This concession is sensible because of possible errors in the data. We hope the misclassified learning objects are those that contain errors. Misclassifying such objects only indicates that erroneous data have in fact been successfully ignored.

Inducing decision trees from noisy data with the basic tree induction algorithm has two problems: first, induced trees unreliably classify new objects and, second, induced trees tend to be large and thus hard to understand. It can be shown that some of this tree complexity is just the result of noise in the learning data. The learning algorithm, in addition to discovering genuine regularities in the problem domain, also traces noise in the data.

As an example, consider a situation in which we are to construct a subtree of a decision tree, and the current subset of objects for learning is S. Let there be 100 objects in S, 99 of them belonging to class C1 and one of them to class C2. Knowing that there is noise in the learning data and that all these objects have the same values of the attributes already selected up to this point in the decision tree, it seems plausible that the class C2 object is in S only as a result of an error in the data. If so, it is best to ignore this object and simply return a leaf of the decision tree labelled with class C1. Since the basic tree induction algorithm would in this situation further expand the decision tree, we have, by stopping at this point, in effect pruned a subtree of the complete decision tree.
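The worked arithmetic in Exercise 18.3 can be reproduced numerically. The following Python sketch assumes class frequency lists ([3, 2, 2] among the seven small objects, [2, 2, 1] among the five large ones) chosen to match the fractions quoted in the exercise; the exact per-class identities are an assumption, not stated in the text:

```python
# Combined Gini impurity after splitting the 12 examples on attribute 'size',
# approximating probabilities by relative frequencies as in Exercise 18.3.
def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

small = [3, 2, 2]          # assumed class counts among the 7 small objects
large = [2, 2, 1]          # assumed class counts among the 5 large objects
impurity = 7/12 * gini(small) + 5/12 * gini(large)
print(round(impurity, 6))  # 0.647619
```

The two subset impurities come out as 0.653061 and 0.64, matching the intermediate values in the worked calculation.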
Tree pruning is the key to coping with noise in tree induction programs. A program may effectively prune decision trees by using some criterion that indicates whether to stop expanding the tree or not. The stopping criterion would typically take into account the number of examples in the node, the prevalence of the majority class at the node, to what extent an additional attribute selected at this node would reduce the impurity of the example set, etc.

This kind of pruning, accomplished through stopping the tree expansion, is called forward pruning as opposed to another kind of pruning, called post-pruning. Post-pruning is done after the learning program has first constructed the complete decision tree. Then parts of the tree that seem unreliable are pruned away. This is illustrated in Figure 18.15. After the bottom parts of the tree are removed, the accuracy of the tree on new data may increase. This may appear paradoxical because by pruning we in fact throw away some information. How can accuracy increase after that?

This can be explained by the fact that we are actually pruning the unreliable parts of the tree, those that contribute most to the tree's misclassification errors. These are the parts of the tree that mainly trace noise in the data and not the genuine regularities in the learning domain. Intuitively it is easy to see why the bottom parts of the tree are the least reliable. Our top-down tree induction algorithm takes into account all the learning data when building the top of the tree. When moving down the tree, the learning data gets fragmented among the subtrees. So the lower parts of the tree are induced from less data. The smaller the data set, the greater the danger that it is critically affected by noise. This is why the lower parts of the tree are generally less reliable.

Of the two types of pruning, forward pruning and post-pruning, the latter is considered better because it exploits the information provided by the complete tree. Forward pruning, on the other hand, only exploits the information in the top part of the tree.

The big question remains, however, how to determine exactly which subtrees to prune and which not. If we prune too much then we may also throw away healthy information and then the accuracy will decrease. So, how can we know that we have not pruned too little or too much? This is an intricate question and there are several methods that offer different, more or less satisfactory answers. We will now look at one method of post-pruning, known as minimal error pruning.

Figure 18.15 Pruning of decision trees. After pruning the accuracy may increase.

Figure 18.16 Deciding about pruning.

The key decision in post-pruning is whether to prune the subtrees below a given node or not. Figure 18.16 illustrates the situation. T is a decision tree whose root is node s, T1, T2, ... are T's subtrees, and p_i are the probabilities of a random object passing from s to subtree T_i. T can in turn be a subtree of a larger decision tree. The question is to decide whether to prune below s (i.e. remove the subtrees T1, ...), or not to prune. We here formulate a criterion based on the minimization of the expected classification error. We assume that the subtrees T1, ... have already been optimally pruned using the same criterion on their subtrees.

The classification accuracy of a decision tree T is the probability that T will correctly classify a randomly chosen new object. The classification error of T is the opposite, i.e. the probability of incorrect classification. Let us analyze the error for the two cases:

(1) If T is pruned just below s then s becomes a leaf. s is then labelled with the most likely class C at s, and everything in this leaf is classified into class C. The error at s is the probability that a random object that falls into s belongs to a class other than C. This is called the static error at s:

        e(s) = p( Class ≠ C | s)

(2) If the tree is not pruned just below s, then its error is the weighted sum of the errors E(T1), E(T2), ... of the optimally pruned subtrees T1, T2, ...:

        p1*E(T1) + p2*E(T2) + ...

    This is called the backed-up error.

The decision rule about pruning below s then is: if the static error is less than or equal to the backed-up error then prune, otherwise do not prune. Accordingly we can define the error of the optimally pruned tree T as:

        E(T) = min( e(s), p1*E(T1) + p2*E(T2) + ...)

Of course, if T has no subtrees then simply E(T) = e(s).
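The decision rule and the definition of E(T) can be sketched directly (Python; the branch probabilities and subtree errors below are illustrative inputs, however they were estimated):

```python
# Minimal-error pruning at a node s: prune if the static error e(s) does not
# exceed the backed-up error p1*E(T1) + p2*E(T2) + ...
def pruned_error(static_error, branch_probs, subtree_errors):
    backed_up = sum(p * e for p, e in zip(branch_probs, subtree_errors))
    prune = static_error <= backed_up
    # E(T) = min(e(s), backed-up error)
    return min(static_error, backed_up), prune

# Hypothetical node: static error 0.375, two subtrees reached with
# probabilities 5/6 and 1/6, optimally pruned errors 0.429 and 0.333.
error, prune = pruned_error(0.375, [5/6, 1/6], [0.429, 0.333])
print(round(error, 3), prune)   # 0.375 True
```

Here the backed-up error (about 0.413) exceeds the static error, so the subtrees would be pruned and the node's error becomes the static estimate.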
The remaining question is how to estimate the static error e(s), which boils down to estimating the probability of the most likely class C at node s. The evidence we can use for this estimate is the set of examples that fall into node s. Let this set be S, the total number of examples in S be N, and the number of examples of class C be n. Now, the estimate of the probability of C at s is, in fact, an intricate problem. Most people would immediately propose that we just take the proportion n/N (relative frequency) of class C examples at node s. This is reasonable if the number of examples at s is large, but obviously becomes debatable if this number is small. For instance let there be just one example at s. Then the proportion of the most likely class is 1/1 = 100 percent, and the error estimate is 0/1 = 0. But given that we only have one example at s, this estimate is statistically completely unreliable. Suppose we were able to get another learning example at s, and that this example was of another class. This single additional example would then drastically change the estimate to 1/2 = 50 percent!

Another good illustration of the intricacies in estimating probabilities is the outcome of a flip of a particular coin. Suppose that in the first experiment with the coin we get the head. The relative frequency now gives that the probability of the head is 1. This is completely counterintuitive because our a priori expectation is that this probability is 0.5. Even if this coin is not quite 'honest', the probability of the head should still be close to 0.5, and the estimate 1 is obviously inappropriate. This example also indicates that the probability estimate should depend not only on the experiments, but also on the prior expectation about this probability.

Obviously we need a more elaborate estimate than relative frequency. We here present one such estimate, called the m-estimate, that has a good mathematical justification. According to the m-estimate, the expected probability of event C is:

    p = (n + pa*m) / (N + m)

Here pa is the a priori probability of C, and m is a parameter of the estimate. The formula is derived using the Bayesian approach to probability estimation. Roughly, the Bayesian procedure assumes that there is some, possibly very vague, a priori knowledge about the probability of event C. This a priori knowledge is stated as the prior probability distribution for event C. Then experiments are performed giving additional information about the probability of C. The prior probability distribution is updated with this new, experimental information, using Bayes' formula for conditional probability. The m-estimate formula above gives the expected value p of this distribution. So the formula allows us to take into account the prior expectation about the probability of C, which is useful if we have some background knowledge about C in addition to the given examples. This prior expectation is in the m-estimate formula expressed by pa and m, as discussed below.

The m-estimate formula can be rewritten as:

    p = pa * m/(N + m) + (n/N) * N/(N + m)

This provides a nice interpretation of the m-estimate: probability p is simply equal to the a priori probability pa, modified by the evidence coming from N examples. If there are no examples then N = 0 and p = pa. If there are many examples (N very large) then p ≈ n/N. Otherwise p is between these two values. The strength of the prior probability is varied by the value of parameter m (m ≥ 0): the larger m, the greater the relative weight of the prior probability.

Parameter m has a specially handy interpretation in dealing with noisy data. If the domain expert believes that the data are very noisy and the background knowledge is trustworthy, then he or she will set m high (e.g. m = 100, giving much weight to prior probability). If on the other hand the learning data are trustworthy and the prior probability less so then m will be set low (e.g. m = 0.2, thus giving much weight to the data). In practice, to avoid the uncertainty with appropriate setting of parameter m, m can be varied. In this way a sequence of differently pruned trees is obtained, each of them being optimal with respect to a different value of m. Such a sequence of trees can then be studied by the domain expert, who may be able to decide which of the trees make more sense.

There is now one remaining and non-trivial question: How to determine the prior probability pa? If there is expert knowledge available then this should be used in setting pa. If not, the commonly used technique is to determine the prior probabilities by statistics on the complete learning data (not just the fragment of it at node s), using simply the relative frequency estimate on the complete, large set. An alternative (often inferior to this) is to assume that all the classes are a priori equally likely and have uniform prior probability distribution. This assumption leads to a special case of the m-estimate, known as the Laplace estimate. If there are k possible classes altogether, then for this special case we have:

    pa = 1/k,  m = k

So the Laplace probability estimate is:

    p = (n + 1) / (N + k)

This is handy because it does not require parameters pa and m. On the other hand, it is based on the usually incorrect assumption of the classes being a priori equally likely. Also, it does not allow the user to take into account the estimated degree of noise.

Figure 18.17 shows the pruning of a decision tree by minimum-error pruning, using the Laplace estimate. In this figure, the left-most leaf of the unpruned tree has the class frequencies [3, 2], meaning that there are three objects of class C1 and two objects of class C2 falling into this leaf. The static error estimate, using the Laplace formula, for these class frequencies is:

    e(b_left) = 1 - (n + 1)/(N + k) = 1 - (3 + 1)/(5 + 2) = 0.429
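Both estimates are one-liners; a Python sketch (the function names are ours) checks that the Laplace estimate coincides with the m-estimate for pa = 1/k and m = k, and reproduces the 0.429 static error above:

```python
def m_estimate(n, N, pa, m):
    # p = (n + pa*m) / (N + m)
    return (n + pa * m) / (N + m)

def laplace(n, N, k):
    # special case of the m-estimate with pa = 1/k and m = k
    return (n + 1) / (N + k)

# Static error for class frequencies [3, 2], i.e. n = 3, N = 5, k = 2 classes:
print(round(1 - laplace(3, 5, 2), 3))   # 0.429
```

With no examples (N = 0) the m-estimate returns the prior pa, as the rewritten formula predicts.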
Before pruning:

    [figure: the unpruned tree; leaf class frequencies and Laplace error
    estimates include [3, 2] 0.429, [1, 0] 0.333, [1, 0] 0.333,
    [1, 1] 0.5 and [0, 1] 0.333]

After pruning:

    [figure: the pruned tree; node [4, 2] with leaves [1, 2] and [1, 0]]

Figure 18.17 Minimal-error pruning of decision trees. Pairs of numbers in square brackets are the numbers of class C1 and class C2 examples that fall into the corresponding nodes. The other numbers attached to the nodes are error estimates. For internal nodes, the first number is a static error estimate and the second number is a backed-up estimate. The lower of the two numbers (in bold face) is propagated upwards.

For the right-hand child of node b, the error estimate is e(b_right) = 0.333. For node b, the static error estimate is:

    e(b) = 0.375

The backed-up error estimate for b is:

    BackedUpError(b) = 5/6 * 0.429 + 1/6 * 0.333 = 0.413

Because the backed-up estimate is greater than the static estimate, the subtrees of b are pruned and the error at b after pruning is:

    E(b) = 0.375

The use of the m-estimate for estimating the error at the nodes of a decision tree can be criticized on the following grounds. The m-estimate formula assumes that the available data is a random sample. However, this is not quite true for the data subsets at the nodes of a decision tree. During tree construction, the data subsets at the nodes of a tree have been selected by the attribute selection criterion among possible partitions, according to the attributes, of the learning data. In spite of this theoretical reservation, experience has shown that minimal-error pruning with the m-estimate works well in practice. When exploring data with decision trees, experimenting with various values of parameter m for pruning is particularly useful.

The minimum-error pruning strategy can in principle use any approach to estimating errors. In any case, the expected error of the pruned tree will be minimized. Of course, a pruned tree thus obtained is only optimal with respect to the particular error estimates. One alternative approach to estimating errors is to partition the available learning data S into two subsets: 'growing set' and 'pruning set'. The growing set is used to build a complete decision tree consistent with the 'growing' data. The pruning set is used just to measure the error at the nodes of the tree, and then prune so as to minimize this error. Since the pruning set is independent of the growing set, we can simply classify the examples in the pruning set by the decision tree and count the tree's errors. This gives an estimate of predictive accuracy on new data. The approach with a pruning set is sensible when the learning data are ample. However, it is at a disadvantage when the learning data are scarce. In such a case, holding out a pruning set means even less data for growing the tree.

Exercises

18.5 A volcano ejects lava completely randomly. In long-term statistics, the volcano was, on average, active one day in ten and inactive the remaining days. In the most recent 30 days it was active 25 days. This recently increased activity was taken into account by experts in the forecast for the next day. The radio news said that the probability of the volcano being active the next day was at least 30 percent. The prediction on TV quoted the probability between 50 and 60 percent. Both experts, on radio and TV, were known to use the m-estimate method. How can the difference in their estimates be explained?

18.6 Write a procedure

    prunetree( Tree, PrunedTree)
that prunes a decision tree Tree according to the minimum-error pruning method discussed in this section. The leaves of Tree contain class frequency information represented as lists of integers. Use the Laplace estimate in the program. Let the number of classes be stored in the program as a fact: number_of_classes( K).

18.7 Success of learning

We started this chapter by considering the ARCHES style of learning relational descriptions. ARCHES is a good illustration of the main principles and ideas in learning. Also, its sequential style of processing examples is perhaps closer to human learning than the one-shot learning in the other two approaches discussed in this chapter. On the other hand, the learning of attribute-value descriptions, in the form of either rules or trees, is simpler and better understood than learning relational descriptions. Therefore, the simpler approaches have until now enjoyed more attention and success in practical applications.

It has been shown that learning is inherently a combinatorial process. It involves search among possible hypotheses. This search can be heuristically guided. It can also be significantly constrained by the 'linguistic bias'; that is, the effect of the hypothesis language only allowing certain forms of hypotheses to be constructed. Both our algorithms for learning if-then rules and decision trees were very strongly heuristically guided - for example, by the impurity criterion. Therefore, these algorithms are efficient, although in our Prolog implementation (Figure 18.11) the evaluation of such a heuristic itself is not efficiently implemented. These heuristics require the computation of certain statistics for which Prolog is not the most suitable.

In general, the success of learning is measured by several criteria. In the following sections we will look at some usual criteria, and discuss the benefits of tree pruning in the view of these criteria.

18.7.1 Criteria for success of learning

Here are some usual criteria for measuring the success of a learning system:

• Classification accuracy: This is usually defined as the percentage of objects correctly classified by the induced hypothesis H. We distinguish between two types of classification accuracy:

  (1) Accuracy on new objects; that is, those not contained in the training set S.

  (2) Accuracy on the objects in S. (Of course, this is only interesting when inconsistent hypotheses are allowed.)

• Comprehensibility (understandability) of the induced hypothesis H: It is often important for the generated description to be comprehensible in order to tell the user something interesting about the application domain. Such a description can also be used by humans directly, without machine help, as an enhancement to humans' own knowledge. In this sense, machine learning is one approach to computer-aided synthesis of new knowledge. Donald Michie (1986) was an early proponent of this idea. The comprehensibility criterion is also very important when the induced descriptions are used in an expert system whose behaviour has to be easy to understand.

• Computational complexity: What are the required computer resources in terms of time and space? Usually, we distinguish between two types of complexity:

  (1) Generation complexity (resources needed to induce a concept description from examples).

  (2) Execution complexity (complexity of classifying an object using the induced description).

18.7.2 Estimating the accuracy of learned hypotheses

The usual question after learning has been done is: How well will the learned hypothesis predict the class in new data? Of course, when new data become available, this accuracy can simply be measured by classifying the new objects and comparing their true class with the class predicted by the hypothesis. The difficulty is that we would like to estimate the accuracy before any new data become available.

The usual approach to estimating the accuracy on new data is to randomly split the available data into two sets: training set and test set. Then we run the learning program on the training set, and test the induced hypothesis on the test set as if this was new, future data. This approach is simple and unproblematic if a lot of data is available. A common situation is, however, shortage of data. Suppose we are learning to diagnose in a particular area of medicine. We are limited to the data about past patients, and the amount of data cannot be increased. When the amount of learning data is small, this may not be sufficient for successful learning. The shortage of data is then aggravated by holding out part of the data as a test set.

When the number of learning examples is small, the results of learning and testing are susceptible to large statistical fluctuations. They much depend on the particular split into a training set and test set. To alleviate this statistical uncertainty, the learning and testing is repeated a number of times (typically, ten times), each time for a different random split. The accuracy results are then averaged and their variance gives an idea of the stability of the estimate.

An elaboration of this type of repetitive testing is k-fold cross-validation. Here the complete learning set is randomly split into k subsets. Then the learning and testing is repeated for each of the subsets as follows: the ith subset is removed from the data,
the rest of the data is used as a training set, and the ith subset is then used for testing the induced hypothesis. The k accuracy results so obtained are averaged and their variance is computed. There is no particular method to choose k, but the choice k = 10 is the most usual in machine learning experiments.

A particular case of cross-validation arises when the subsets only contain one element. In each iteration, the learning is done on all the data but one example, and the induced hypothesis is tested on the remaining example. This form of cross-validation is called 'leave-one-out'. It is sensible when the amount of available data is particularly small.

18.7.3 How pruning affects accuracy and transparency of decision trees

Pruning of decision trees is of fundamental importance because of its beneficial effects when dealing with noisy data. Pruning affects two measures of success of learning: first, the classification accuracy of a decision tree on new objects and, second, the transparency of a decision tree. Let us consider both of these effects of pruning.

The comprehensibility of a description depends on its structure and size. A well structured decision tree is easier to understand than a completely unstructured tree. On the other hand, if a decision tree is small (consisting of just ten or so nodes) then it is easy to understand regardless of its structure. Since pruning reduces the size of a tree, it contributes to the comprehensibility of a decision tree. As has been shown experimentally in many noisy domains, such as medical diagnosis, the reduction of tree size can be dramatic. The number of nodes is reduced to, say, ten percent of their original number, while retaining at least the same level of classification accuracy.

Tree pruning can also improve the classification accuracy of a tree. This effect of pruning may appear counter-intuitive since, by pruning, we throw away some information, and it would seem that as a result some accuracy should be lost. However, in the case of learning from noisy data, an appropriate amount of pruning normally improves the accuracy. This phenomenon can be explained in statistical terms: statistically, pruning works as a sort of noise suppression mechanism. By pruning, we eliminate errors in learning data that are due to noise, rather than throw away healthy information.

Project

Carry out a typical project in machine learning research. This consists of implementing a learning algorithm, and testing its accuracy on experimental data sets using 10-fold cross-validation. Investigate how tree pruning affects the accuracy on new data. Investigate the effect of minimal error pruning when varying the value of parameter m in the m-estimate. Many real-life learning data sets are electronically available for such experiments from the well-known UCI Repository for Machine Learning (University of California at Irvine; http://www.ics.uci.edu/~mlearn/MLRepository.html).

Summary

• Forms of learning include: learning by being told, learning from examples, learning by discovery. Learning concepts from examples is also called inductive learning. This form of learning has attained most success in practical applications.

• Learning from examples involves:
  objects and concepts as sets;
  positive and negative examples of concept to be learned;
  hypotheses about target concept;
  hypothesis language.

• The goal of learning from examples is to construct a hypothesis that 'explains' sufficiently well the given examples. Hopefully such a hypothesis will accurately classify future examples as well. A hypothesis is consistent with the learning examples if it classifies all the learning data as given in the examples.

• The inductive learning process involves search among possible hypotheses. This is inherently combinatorial. To reduce computational complexity, typically this process is heuristically guided.

• During its construction, a hypothesis may be generalized or specialized. Normally, the final hypothesis is a generalization of the positive examples.

• Programs developed in this chapter are:
  • A program that learns if-then rules from examples defined by attribute-value vectors.
  • A program that learns decision trees from examples defined by attribute-value vectors.

• The pruning of decision trees is a powerful approach to learning from noisy data. The minimal-error pruning method was presented in detail.

• The difficulty of estimating probabilities from small samples was discussed, and the m-estimate was introduced.

• Criteria for assessing the success of a method of learning from examples include:
  • accuracy of induced hypotheses;
  • comprehensibility of learned concept descriptions;
  • computational efficiency both in inducing a hypothesis from data, and in classifying new objects with the induced hypothesis.
480 Machine Learning References 481
• The expected accuracy of learned hypotheses on new data is usually estimated by cross-validation. 10-fold cross-validation is the most common. The leave-one-out method is a special form of cross-validation.
• Concepts discussed in this chapter are:

machine learning
learning concepts from examples, inductive learning
hypothesis languages
relational descriptions
attribute-value descriptions
generality and specificity of hypotheses
generalization and specialization of descriptions
ARCHES-type learning of relational descriptions
learning of if-then rules
top-down induction of decision trees
learning from noisy data
tree pruning, post-pruning, minimal-error pruning
estimating probabilities
cross-validation

References

Mitchell's book (1997) is an excellent general introduction to machine learning (ML). In Michalski, Bratko and Kubat (1998), ML applications are presented in a variety of areas, ranging from engineering to medicine, biology and music. Current research in machine learning is published in the AI literature, most notably in the journals Machine Learning (Boston: Kluwer) and Artificial Intelligence (Amsterdam: North-Holland), and at annual conferences ICML (Int. Conf. on Machine Learning), ECML (European Conf. on Machine Learning) and COLT (Computational Learning Theory). Many classical papers on ML are included in Michalski, Carbonell and Mitchell (1983, 1986) and Kodratoff and Michalski (1990). Gillies (1996) investigates implications of the technical developments in ML and its applications regarding a traditional controversy in the philosophy of science.

Many data sets for experimentation with new ML methods are electronically available from the UCI Repository (University of California at Irvine; http://www.ics.uci.edu/~mlearn/MLRepository.html).

Our example of learning relational descriptions (Section 18.3) follows the ideas of an early learning program ARCHES (Winston 1975). The program in Section 18.4 that constructs attribute-based descriptions is a simple version of the AQ-type learning. The AQ family of learning programs was developed by Michalski and his co-workers (for example, Michalski 1983). CN2 is a well known program for learning if-then rules (Clark and Niblett 1989).

Induction of decision trees is one of the most widely used approaches to learning, also known under the name TDIDT (top-down induction of decision trees, coined by Quinlan, 1986). TDIDT learning was much influenced by Quinlan's early program ID3, Iterative Dichotomizer 3 (Quinlan 1979), and tree induction is therefore often simply referred to as ID3. Tree induction was also studied outside AI by Breiman and his co-workers (1984), who independently discovered several pertinent mechanisms, including pruning of decision trees. The technique of minimal-error tree pruning was introduced by Niblett and Bratko (1986) and improved with the m-estimate in Cestnik and Bratko (1991). The m-estimate formula was derived by Cestnik (1990) using the Bayesian procedure for probability estimation. Esposito et al. (1997) carried out a detailed experimental comparison of various methods of tree pruning. Several other refinements, in addition to pruning, of the basic tree induction algorithm are necessary to make it a practical tool in complex applications with real-life data. Several such refinements, along with early experiments in which machine-learned knowledge outperforms human experts, are presented in Cestnik et al. (1987) and Quinlan (1986). Structuring decision trees to improve their transparency was studied by Shapiro (1987).

Let us mention some of the many interesting developments in and approaches to machine learning that could not be covered in this chapter.

An early attempt at machine learning is Samuel's (1959) learning program that played the game of checkers. This program improved its position-evaluation function through the experience it obtained in the games it played.

Mitchell (1982) developed the idea of version space as an attempt to handle economically the search among alternative concept descriptions.

In reinforcement learning (Sutton and Barto 1998) the learner explores its environment by performing actions and receiving rewards from the environment. In learning to act so as to maximize the reward, the difficulty is that the reward may be delayed, so it is not clear which particular actions are to be blamed for success or failure.

In instance-based learning, related to case-based reasoning (Kolodner 1993), the learner reasons about a given case by comparing it to similar previous cases.

Neural networks (also introduced in Mitchell (1997)) are a large area of learning which aspires to some extent to resemble biological learning. The learning occurs through adjustment of numerical weights associated with artificial neurons in the network. Neural networks have not been at the centre of AI because they lack explicit symbolic representation of what has been learned. Hypotheses resulting from neural learning can be very good predictors, but are hard to understand and interpret by humans.

An interesting approach to learning is explanation-based learning (e.g. Mitchell et al. 1986), also known as analytical learning. Here the learning system uses background knowledge to 'explain' a given example by a deductive process. As a result, background knowledge gets compiled into a more efficiently executable form. One approach to explanation-based generalization is programmed in Chapter 23 of this book as a kind of meta-programming exercise.

Some learning programs learn by autonomous discovery. They discover new concepts through exploration, making their own experiments. A celebrated example of this kind of program is AM, Automatic Mathematician (Lenat 1982). AM, for example, starting with the concepts of set and 'bag', discovered the concepts of number, addition, multiplication, prime number, etc. Shrager and Langley (1990) edited a collection of papers on ML for scientific discovery.

Logic formalisms are used as a hypothesis language in the approach to learning called inductive logic programming (ILP), which is studied in the next chapter of this book.

The mathematical theory of learning is also known as COLT (computational learning theory; e.g. Kearns and Vazirani 1994). It is concerned with questions like: How many examples are needed so that it is likely to attain some specified accuracy of the induced hypothesis? What are various theoretical complexities of learning for different classes of hypothesis languages?
Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1984) Classification and Regression Trees. Belmont, CA: Wadsworth Int. Group.

Clark, P. and Niblett, T. (1989) The CN2 induction algorithm. Machine Learning, 3, 262-284.

Cestnik, B. (1990) Estimating probabilities: a crucial task in machine learning. Proc. ECAI 90, Stockholm.

Cestnik, B. and Bratko, I. (1991) On estimating probabilities in tree pruning. Proc. European Conf. on Machine Learning, Porto, Portugal. Berlin: Springer-Verlag.

Cestnik, B., Kononenko, I. and Bratko, I. (1987) ASSISTANT 86: a knowledge elicitation tool for sophisticated users. In: Progress in Machine Learning (Bratko, I. and Lavrac, N., eds). Wilmslow, England: Sigma Press; distributed by Wiley.

Esposito, F., Malerba, D. and Semeraro, G. (1997) A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Analysis and Machine Intelligence 19: 476-491.

Gillies, D. (1996) Artificial Intelligence and Scientific Method. Oxford University Press.

Kearns, M.J. and Vazirani, U.V. (1994) An Introduction to Computational Learning Theory. Cambridge, MA: MIT Press.

Kodratoff, Y. and Michalski, R.S. (1990) Machine Learning: An Artificial Intelligence Approach, Vol. III. Morgan Kaufmann.

Kolodner, J.L. (1993) Case-Based Reasoning. San Francisco, CA: Morgan Kaufmann.

Lenat, D.B. (1982) AM: discovery in mathematics as heuristic search. In: Knowledge-Based Systems in Artificial Intelligence (Davis, R. and Lenat, D.B., eds). McGraw-Hill.

Michalski, R.S. (1983) A theory and methodology of inductive learning. In: Machine Learning: An Artificial Intelligence Approach (Michalski, R.S., Carbonell, J.G. and Mitchell, T.M., eds). Tioga Publishing Company.

Michalski, R.S., Bratko, I. and Kubat, M. (eds) (1998) Machine Learning and Data Mining: Methods and Applications. Wiley.

Michalski, R.S., Carbonell, J.G. and Mitchell, T.M. (eds) (1983) Machine Learning: An Artificial Intelligence Approach. Palo Alto, CA: Tioga Publishing Company.

Michalski, R.S., Carbonell, J.G. and Mitchell, T.M. (eds) (1986) Machine Learning: An Artificial Intelligence Approach, Volume II. Los Altos, CA: Morgan Kaufmann.

Michie, D. (1986) The superarticulacy phenomenon in the context of software manufacture. In: Proc. of the Royal Society, London, A 405: 189-212. Also reprinted in Expert Systems: Automating Knowledge Acquisition (Michie, D. and Bratko, I.). Harlow, England: Addison-Wesley.

Mitchell, T.M. (1982) Generalization as search. Artificial Intelligence, 18: 203-226.

Mitchell, T.M. (1997) Machine Learning. McGraw-Hill.

Mitchell, T.M., Keller, R.M. and Kedar-Cabelli, S.T. (1986) Explanation-based generalization: a unifying view. Machine Learning, 1: 47-80.

Niblett, T. and Bratko, I. (1986) Learning decision rules in noisy domains. In: Research and Development in Expert Systems III (Bramer, M.A., ed.). Cambridge University Press.

Quinlan, J.R. (1979) Discovering rules by induction from large collections of examples. In: Expert Systems in the Microelectronic Age (Michie, D., ed.). Edinburgh University Press.

Quinlan, J.R. (1986) Induction of decision trees. Machine Learning 1: 81-106.

Samuel, A.L. (1959) Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3: 211-229. Also in Computers and Thought (Feigenbaum, E.A. and Feldman, J., eds). McGraw-Hill, 1963.

Shapiro, A. (1987) Structured Induction in Expert Systems. Glasgow: Turing Institute Press, in association with Addison-Wesley.

Shrager, J. and Langley, P. (1990) Computational Models of Scientific Discovery and Theory Formation. San Mateo, CA: Morgan Kaufmann.

Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Winston, P.H. (1975) Learning structural descriptions from examples. In: The Psychology of Computer Vision (Winston, P.H., ed.). McGraw-Hill.
Introduction 485
(covers all the positive examples) and consistent (does not cover any negative example).

Usually both BK and H are simply sets of Prolog clauses, that is Prolog programs. So from the viewpoint of automatic programming in Prolog this corresponds to the following. Suppose our target predicate is p(X) and we have, among others, a positive example p(a) and a negative example p(b). A possible conversation with program BK would be:

?- p(a).        % A positive example
no              % Cannot be derived from BK

?- p(b).        % A negative example
no              % Cannot be derived from BK

Now suppose an ILP system would be called to automatically induce an additional set of clauses H and add them to the program BK. The conversation with the so extended program BK plus H would now be:

?- p(a).        % A positive example
yes             % Can be derived from BK and H

?- p(b).        % A negative example
no              % Cannot be derived from BK and H

domain can be included in background knowledge. If this involves the treatment of time and space, axioms of reasoning about time and space may be included as background knowledge. In a typical application of ILP, the emphasis is on the development of a good representation of examples together with relevant background knowledge. A general purpose ILP system is then applied to carry out the induction.

The power of hypothesis language and flexibility of background knowledge in ILP do not come without a price. This flexibility adds to the combinatorial complexity of the learning task. Therefore attribute-value learning, such as decision trees, is much more efficient than ILP. So in learning problems where attribute-value representations are adequate, attribute-value learning is recommended for efficiency reasons.

In this chapter we will develop an ILP program called HYPER (Hypothesis refiner), which constructs Prolog programs through gradual refinement of some starting hypotheses. To illustrate the main ideas, we will first develop a simple and inefficient version of it, called MINIHYPER. This will then be elaborated into HYPER.
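To make the scheme concrete, here is a minimal instance (my own illustrative example, not from the book): background knowledge BK about one small family, plus a clause H of the kind an ILP system might induce, after which the positive example succeeds and the negative one fails.

```prolog
% BK: background knowledge, ordinary Prolog clauses
parent( tom, ann).
parent( ann, jim).
female( ann).

% H: an induced clause for the target predicate
has_daughter( X)  :-
    parent( X, Y),
    female( Y).

% ?- has_daughter( tom).     % positive example: yes (ann is tom's daughter)
% ?- has_daughter( ann).     % negative example: no (jim is not female)
```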
19.2 Constructing Prolog programs from examples
% Learning from family relations

% Background knowledge
backliteral( parent(X, Y), [X, Y]).     % A background literal with vars. [X, Y]
backliteral( male(X), [X]).
backliteral( female(X), [X]).

prolog_predicate( parent(_, _)).        % Goal parent(_,_) executed directly by Prolog
prolog_predicate( male(_)).
prolog_predicate( female(_)).

parent( pam, bob).
parent( tom, bob).
parent( tom, liz).
parent( bob, ann).
parent( bob, pat).
parent( pat, jim).
parent( pat, eve).

female( pam).
male( tom).
male( bob).
female( liz).
female( ann).
female( pat).
male( jim).
female( eve).

% Positive examples
ex( has_daughter(tom)).         % Tom has a daughter
ex( has_daughter(bob)).
ex( has_daughter(pat)).

% Negative examples
nex( has_daughter(pam)).        % Pam doesn't have a daughter
nex( has_daughter(jim)).

start_hyp( [ [has_daughter(X)] / [X] ] ).   % Starting hypothesis

Figure 19.1 A definition of the problem of learning predicate has_daughter.

to the body of clauses that are being constructed. Background literals are calls to BK predicates that may be specified directly in Prolog. The predicate prolog_predicate specifies goals that are to be directly executed by the Prolog interpreter. For example,

prolog_predicate( parent(X, Y)).

says that goals that match parent(X,Y) are evaluated by Prolog directly, executing the BK predicate parent. Other goals, such as has_daughter(tom), will be executed by a Prolog-like interpreter with special properties that we will implement specifically for use in ILP.

19.2.2 Refinement graph

Let us now consider how a complete and consistent hypothesis can be generated for the learning problem of Figure 19.1. We may start with some overly general hypothesis that is complete (covers all the positive examples), but inconsistent (also covers negative examples). Such a hypothesis will have to be specialized in a way to retain its completeness and attain consistency. This can be done by searching a space of possible hypotheses and their refinements. Each refinement takes a hypothesis H1 and produces a more specific hypothesis H2, so that H2 covers a subset of the cases covered by H1.

Such a space of hypotheses and their refinements is called a refinement graph. Figure 19.2 shows part of such a refinement graph for the learning problem of Figure 19.1. The nodes of this graph correspond to hypotheses, the arcs between hypotheses correspond to refinements. There is a directed arc between hypotheses H1 and H2 if H2 is a refinement of H1.

Once we have a refinement graph the learning problem is reduced to searching this graph. The start node of search is some over-general hypothesis. A goal node of

Figure 19.2 Part of the refinement graph for the learning problem of Figure 19.1. Many possible refinements are omitted in this diagram. The root hypothesis is has_daughter(X); among its refinements are has_daughter(X) :- male(Y), has_daughter(X) :- female(Y) and has_daughter(X) :- parent(Y, Z); further refinements include has_daughter(X) :- male(X), has_daughter(X) :- female(Y), parent(S, T) and has_daughter(X) :- parent(X, Z); the latter is refined into has_daughter(X) :- parent(X, Z), female(U) and finally into the consistent and complete hypothesis has_daughter(X) :- parent(X, Z), female(Z).
search is a hypothesis that is consistent and complete. In our example of Figure 19.2, it is sufficient that all the hypotheses are just single clauses. In general, hypotheses consist of multiple clauses.

To implement this approach we have to design two things:

(1) A refinement operator that will generate refinements of hypotheses (such an operator defines a refinement graph).
(2) A search procedure to carry out the search.

In the graph of Figure 19.2, there are two types of refinement. A refinement of a clause is obtained by either:

(1) matching two variables in the clause, or
(2) adding a background literal to the body of the clause.

An example of the first type of refinement is:

has_daughter(X) :- parent(Y,Z).

is refined into

has_daughter(X) :- parent(X,Z).

by matching X=Y. An example of the second type of refinement is:

where the variable L1 is refined into the structure [X2 | L2]. For simplicity, the program MINIHYPER that we will develop in this section will not be able to handle problems that require structured terms. We will defer this until the next section when we develop a more sophisticated program HYPER.

Another point obvious from Figure 19.2 is the combinatorial complexity of refinement graphs and the ensuing search complexity. Again, in MINIHYPER we will not worry about this and simply use the uninformed iterative deepening search. This will also later be improved to a best-first search in HYPER.

19.2.3 Program MINIHYPER

We are now ready to start writing our first ILP program. We will choose to represent hypotheses as lists of clauses:

Hypothesis = [Clause1, Clause2, ...]

Each clause will be represented by a list of literals (the head of the clause followed by the body literals) and the list of variables in the clause:

Clause = [Head, BodyLiteral1, BodyLiteral2, ...] / [Var1, Var2, ...]

For example, the hypothesis
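Under the list-of-clauses representation just described, a concrete one-clause hypothesis can be written down mechanically (an illustrative instance of my own, consistent with the stated conventions; the wrapper name example_hyp is not from the book):

```prolog
% The hypothesis   has_daughter(X) :- parent(X,Y), female(Y).
% as a one-clause MINIHYPER hypothesis:
%   clause  = [Head, BodyLiteral1, BodyLiteral2] / ListOfVariables
example_hyp( [ [has_daughter(X), parent(X,Y), female(Y)] / [X, Y] ] ).
```

The head is simply the first element of the literal list, and the variable list makes the clause's variables directly accessible to the refinement operator (for matching two of them, or binding one into a term).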
which stands for p(X) :- p(X). This may lead to an infinite loop. We have to make our prove predicate immune from such loops. An easy way to do this is by limiting the length of proofs. When proving Goal, if the number of predicate calls reaches this limit, then the procedure prove will simply stop without reaching a definitive answer. The argument Answer of prove will therefore have the following possible meanings:

Answer = yes: Goal has been derived from Hypothesis within proof limit
Answer = no: Goal definitively cannot be derived from Hypothesis even if the limit was relaxed
Answer = maybe: proof search was terminated because maximum proof length D was reached

The case 'Answer = maybe' means any one of the following three possibilities if Goal was executed by the standard Prolog interpreter:

(1) The standard Prolog interpreter (with no limit on proof length) would get into an infinite loop.
(2) The standard Prolog interpreter would eventually find a proof of length greater than limit D.
(3) The standard Prolog interpreter would find, at some length greater than D, that this derivation alternative fails. Therefore it would backtrack to another alternative and there possibly find a proof (of length possibly no greater than D), or fail, or get into an infinite loop.

Figure 19.3 gives the program code for prove. The proof-length limit is specified by the predicate max_proof_length.

% Interpreter for hypotheses
% prove( Goal, Hypo, Answ):
%   Answ = yes,   if Goal derivable from Hypo in at most D steps
%   Answ = no,    if Goal not derivable
%   Answ = maybe, if search terminated after D steps inconclusively

prove( Goal, Hypo, Answer)  :-
   max_proof_length( D),
   prove( Goal, Hypo, D, RestD),
   ( RestD >= 0, Answer = yes              % Proved
     ;
     RestD < 0, !, Answer = maybe          % Maybe, but it looks like inf. loop
   ).

prove( Goal, _, no).                       % Otherwise Goal definitely cannot be proved

% prove( Goal, Hyp, MaxD, RestD):
%   MaxD allowed proof length, RestD 'remaining length' after proof;
%   Count only proof steps using Hyp

prove( G, H, D, D)  :-
   D < 0, !.                               % Proof length overstepped

prove( [], _, D, D)  :-  !.

prove( [G1 | Gs], Hypo, D0, D)  :-  !,
   prove( G1, Hypo, D0, D1),
   prove( Gs, Hypo, D1, D).

prove( G, _, D, D)  :-
   prolog_predicate( G),                   % Background predicate in Prolog?

Figure 19.3 A loop-avoiding interpreter for hypotheses.

(1) When proving a positive example, 'maybe' is treated as 'the example not covered'.
(2) When proving a negative example, 'maybe' is treated as not 'not covered', that is as covered.

A rationale behind this cautious interpretation of 'maybe' is that, among the possible complete hypotheses, computationally efficient ones are preferred. The answer 'maybe' at best indicates that the hypothesis is computationally inefficient, and at worst that it is incomplete.

The rest of the MINIHYPER program is given in Figure 19.4. The predicates in this program are as follows.

refine( Clause, Vars, NewClause, NewVars)
Refines a given clause Clause with variables Vars and produces a refined clause NewClause with new variables NewVars. The refined clause is obtained by matching two variables in Vars or by adding a new background literal to Clause. New literals are
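As a hedged sketch of how the interpreter of Figure 19.3 is meant to be called (assuming the remaining clauses of the figure and the required facts are loaded; the concrete hypothesis below is my own illustrative example, not the book's):

```prolog
% Assumed proof-length limit:
max_proof_length( 6).

% Querying the depth-bounded interpreter with a one-clause hypothesis:
% ?- prove( has_daughter(tom),
%           [ [has_daughter(X), parent(X,Y), female(Y)] / [X, Y] ],
%           Answer).
%
% Answer = yes, provided parent/2 and female/2 are background predicates
% declared via prolog_predicate/1 and the proof fits within the limit of 6;
% Answer = maybe would signal that the limit was reached inconclusively.
```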
max_proof_length( 6).     % Max. proof length, counting calls to 'non-Prolog' predicates

max_clause_length( 3).    % Max. number of literals in a clause

% depth_first( Hypo, Hyp, MaxD):
%   refine Hypo into consistent and complete Hyp in at most MaxD steps

in the body will fail unless the argument X is instantiated (tested by atom(X)). This will render many useless hypotheses incomplete and they will immediately be discarded from search. The search task now becomes easier, although it may still take considerable time, that is minutes or tens of minutes, depending on the computer and implementation of Prolog. Eventually the goal induce(H) will result in:

H = [ [predecessor(A,B), atom(A), parent(A,C), atom(C), predecessor(C,B)] / [A,C,B],
      [predecessor(D,E), atom(D), parent(D,E)] / [D,E] ]

This is an expected definition of predecessor.

It is clear that MINIHYPER will soon run out of steam when faced with slightly more difficult learning problems. In the next section we will therefore make several improvements.

Exercise

There is another generality relation frequently used in ILP, called θ-subsumption. Although we will not be directly using θ-subsumption in the programs in this chapter, we will introduce it here for completeness.

First let us define the notion of substitution. A substitution θ = {Var1/Term1, Var2/Term2, ...} is a mapping of variables Var1, Var2, etc. into terms Term1, Term2, etc. A substitution θ is applied to a clause C by substituting the clause's variables by terms as specified in θ. The application of substitution θ to clause C is written as Cθ. For example:

C  = has_daughter(X) :- parent(X,Y), female(Y).
θ  = { X/tom, Y/liz }
Cθ = has_daughter(tom) :- parent(tom,liz), female(liz).

Now we can define θ-subsumption. It is a generality relation between clauses. Clause C1 θ-subsumes clause C2 if there is a substitution θ such that every literal in C1θ occurs in C2. For example, the clause

parent(X, Y).
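As an additional worked instance of this definition (an illustrative example of my own, not from the original text), shown in Prolog clause notation:

```prolog
% C1 θ-subsumes C2 with the substitution θ = {X/tom, Y/liz}:
%
% C1  = has_daughter(X)   :- parent(X, Y).
% C1θ = has_daughter(tom) :- parent(tom, liz).
% C2  = has_daughter(tom) :- parent(tom, liz), female(liz).
%
% Every literal of C1θ (the head has_daughter(tom) and the body literal
% parent(tom, liz)) occurs in C2, so C1 θ-subsumes C2: C1 is the more
% general clause, since C2 imposes the extra condition female(liz).
```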
say that a variable of type list can be refined into [X|L] where X is of type item and L is of type list, or this variable is replaced by the constant [] (with no variables). We assume throughout that ':' has been introduced as an infix operator.

In Figure 19.5, the clause

start_clause( [ member(X,L) ] / [ X:item, L:list] ).

declares the form of clauses in the start hypotheses in the refinement graph. Each start hypothesis is a list of up to some maximum number of (copies of) start clauses. The list of start hypotheses will be generated automatically. The maximum number of clauses in a hypothesis is defined by the predicate max_clauses. This can be set appropriately by the user according to specifics of the learning problem.

Let us now state the refinement operator in HYPER in accordance with the foregoing discussion. To refine a clause, perform one of the following:

(1) Match two variables in the clause, e.g. X1 = X2. Only variables of the same type can be matched.
(2) Refine a variable in the clause into a background term. Only terms defined by the predicate term/3 may be used and the type of the variable and the type of the term have to match.
(3) Add a background literal to the clause. All of the literal's input arguments have to be matched (non-deterministically) with the existing variables (of the same type) in the clause.

Notice the fourth step when a literal is added and its input argument immediately matched with an existing variable of the same type. Figure 19.6 shows a sequence of refinements when learning about member.

member(X1, L1).
member(X2, L2).

  ↓  Refine term L1 = [X3|L3]

member(X1, [X3|L3]).
member(X2, L2).

  ↓  Match X1 = X3

member(X1, [X1|L3]).
member(X2, L2).

  ↓  Refine term L2 = [X4|L4]

member(X1, [X1|L3]).
member(X2, [X4|L4]).

  ↓  Add literal member(X5,L5) and match input L5 = L4

member(X1, [X1|L3]).
member(X2, [X4|L4]) :- member(X5, L4).

  ↓  Match X2 = X5

member(X1, [X1|L3]).
member(X2, [X4|L4]) :- member(X2, L4).

Figure 19.6 The sequence of refinements from a start hypothesis to a target hypothesis.

As in MINIHYPER, to refine a hypothesis H0, choose one of the clauses C0 in H0, refine clause C0 into C, and obtain a new hypothesis H by replacing C0 in H0 with C.
In the HYPER program, we will add to this some useful heuristics that often save complexity. First, if a clause is found in H0 that alone covers a negative example, then only refinements arising from this clause are generated. The reason is that such a clause necessarily has to be refined before a consistent hypothesis is obtained. The second heuristic is that 'redundant' clauses (which contain several copies of the same literal) are discarded. And third, 'unsatisfiable clauses' are discarded. A clause is unsatisfiable if its body cannot be derived by predicate prove from the current hypothesis.

This refinement operator aims at producing least specific specializations (LSS). A specialization H of a hypothesis H0 is said to be least specific if there is no other specialization of H0 more general than H. However, our refinement operator really only approximates LSS. This refinement operator does LSS under the constraint that the number of clauses in a hypothesis after the refinement stays the same. Without this restriction, an LSS operator would have to increase the number of clauses in a hypothesis. This would lead to a rather impractical refinement operator due to complexity. The number of clauses in a refined hypothesis could become very large. The limitation in our program to preserve the number of clauses in the hypothesis after its refinement is, however, not overly restrictive. If a solution requires a hypothesis with more clauses, then such a hypothesis can be generated from another start hypothesis that has a sufficient number of clauses.

19.3.2 Search

Search starts with a set of start hypotheses. This is the set of all possible bags of user-defined start clauses, up to some maximal number of clauses in a hypothesis. Multiple copies of a start clause typically appear in a start hypothesis. A typical start clause is something rather general and neutral, such as: conc( L1, L2, L3). In search, the refinement graph is treated as a tree (if several paths lead to the same hypothesis, several copies of this hypothesis appear in the tree). The search starts with multiple start hypotheses that become the roots of disjoint search trees. Therefore strictly speaking the search space is a forest.

HYPER performs a best-first search using an evaluation function Cost(Hypothesis) that takes into account the size of a hypothesis as well as its accuracy with respect to
best_search( Hyps, Hyp)
Starts with a set of start hypotheses Hyps, generated by predicate start_hyps/1, and performs best-first search of the refinement forest until a consistent and complete hypothesis Hyp is found. It uses the cost of hypotheses as the evaluation function to guide the search. Each candidate hypothesis is combined with its cost into a term of the form Cost:Hypothesis. When the list of such terms is sorted (by merge sort), the hypotheses are sorted according to their increasing costs.

prove( Goal, Hyp, Answer)
Proof-length limited interpreter defined in Figure 19.3.

eval( Hyp, Cost)
Evaluation function for hypotheses. Cost takes into account both the size of Hyp and the number of negative examples covered by Hyp. If Hyp does not cover any negative example then Cost = 0.

start_hyps( Hyps)
Generates a set Hyps of start hypotheses for search. Each start hypothesis is a list of up to MaxClauses start clauses. MaxClauses is defined by the user with the predicate max_clauses. Start clauses are defined by the user with the predicate start_clause.

show_hyp( Hyp)
Displays hypothesis Hyp in the usual Prolog format.

init_counts, show_counts, add1( Counter)
Initializes, displays and updates counters of hypotheses. Three types of hypotheses are counted separately: generated (the number of all generated hypotheses), complete (the number of generated hypotheses that cover all the positive examples), and refined (the number of all refined hypotheses).

start_clause( Clause)
User-defined start clauses, normally something very general like:

start_clause( [ member(X,L) ] / [ X:item, L:list] ).

As an illustration let us execute HYPER on the problem of learning about the predicate member(X,L). The problem definition in Figure 19.5 has to be loaded into Prolog in addition to HYPER. The question is:

?- induce(H), show_hyp(H).

During the execution HYPER keeps displaying the current counts of hypotheses (generated, refined and waiting-to-be-refined), and the hypothesis currently being refined. The final results are:

Hypotheses generated: 105
Hypotheses refined:    26
To be refined:         15

member(A, [A|B]).
member(C, [A|B]) :-
  member(C, B).

The induced hypothesis is as expected. Before this hypothesis was found, 105 hypotheses were generated all together, 26 of them were refined, and 15 of them were still in the list of candidates to be refined. The remaining 105 - 26 - 15 = 64 hypotheses were found to be incomplete and were therefore discarded immediately. The needed refinement depth for this learning problem is 5 (Figure 19.6). The total number of possible hypotheses in the refinement space defined by the (restricted) refinement operator in HYPER is several thousands. HYPER only searched a fraction of this (less than 10 percent). Experiments show that in more complex learning problems (list concatenation, path finding) this fraction is much smaller.

Exercise

19.5 Define the learning problem, according to the conventions in Figure 19.5, for learning predicate conc(L1, L2, L3) (list concatenation) and run HYPER with this definition. Work out the refinement depth of the target hypothesis and estimate the size of the refinement tree to this depth for a two-clause start hypothesis. Compare this size with the number of hypotheses generated and refined by HYPER.
Simultaneously learning two predicates odd(L) and even(L)

HYPER can be applied without modification to multi-predicate learning, that is learning several predicates simultaneously where one predicate may be defined in terms of another one. This may even involve mutual recursion when the predicates learned call each other. We will here illustrate this by learning the predicates odd(List) and even(List) (true for lists of odd or even length respectively). Figure 19.8 shows a definition of this learning problem. The result of learning is:

Hypotheses generated: 85
Hypotheses refined:   16
To be refined:        29

even([]).
even([A,B|C]) :-
  even(C).
odd([A|B]) :-
  even(B).

This corresponds to the target concept. However, HYPER found a definition that is not mutually recursive. By just requesting another solution (by typing a semicolon as usual), HYPER continues the search and next finds a mutually recursive definition:

Hypotheses generated: 115
Hypotheses refined:    26
To be refined:         32

even([]).
odd([A|B]) :-
  even(B).
even([A|B]) :-
  odd(B).

The first, non mutually recursive definition can be prevented by a more restrictive definition of term refinement. Such a more restrictive definition would allow lists to be refined to depth 1 only. This can be achieved by replacing type list with type list(D), and changing the first clause about list refinement into:

term( list(D), [X|L], [ X:item, L:list(1)]) :- var(D).

This definition prevents a variable of type list(1) from being refined further. So terms like [X,Y|L] cannot be generated. Of course, the other clause about term and the start_clause predicate would have to be modified accordingly.

% Inducing odd and even length for lists

backliteral( even( L), [ L:list], [ ]).
backliteral( odd( L), [ L:list], [ ]).

term( list, [X|L], [ X:item, L:list]).
term( list, [ ], [ ]).

prolog_predicate( fail).

start_clause([ odd( L) ] / [ L:list]).
start_clause([ even( L) ] / [ L:list]).

ex( even( [ ])).
ex( even( [a,b])).
ex( odd( [a])).
ex( odd( [b,c,d])).
ex( odd( [a,b,c,d,e])).
ex( even( [a,b,c,d])).

nex( even( [a])).
nex( even( [a,b,c])).
nex( odd( [ ])).
nex( odd( [a,b])).
nex( odd( [a,b,c,d])).

Figure 19.8 Learning about odd-length and even-length lists simultaneously.

Learning predicate path(StartNode, GoalNode, Path)

Figure 19.9 shows a domain definition for learning the predicate path in a directed graph (specified by the background predicate link/2). The learning is accomplished smoothly, resulting in:

Hypotheses generated: 401
Hypotheses refined:    35
To be refined:        109

path(A,A,[A]).
path(A,C,[A,B|E]) :-
  link(A,B),
  path(B,C,[D|E]).

The last line of the induced definition may appear surprising, but it is in this context, in fact, equivalent to the expected path(B,C,[B|E]) and requires one refinement step less. The fact that only 35 hypotheses were refined in this search may appear rather surprising in view of the following facts. The refinement depth of the path hypothesis found above is 12. An estimate shows that the size of the refinement tree to this depth exceeds 10^17 hypotheses! Only a tiny fraction of this is actually searched. This can be explained by the fact that the hypothesis completeness requirement constrains the search in this case particularly effectively.

Learning insertion sort

Figure 19.10 gives a definition for this learning problem. This definition invites debate because background knowledge is very specifically targeted at inducing
514 Inductive Logic Programming Program HYPER 515

[Figure 19.11 drawing omitted: five block configurations built from the rectangles a1-a5, b1-b5, c1-c3 and the triangles c4 and c5]

Figure 19.11 A blocks world with two examples of an arch and three counter examples. The blocks a1, b1 and c1 form one example, the blocks a4, b4 and c4 form the other example. The blocks a2, b2 and c2 form one of the counter examples.

ako( rectangle, X) :-                       % All rectangles
   member( X, [a1,a2,a3,a4,a5,b1,b2,b3,b4,b5,c1,c2,c3]).
ako( triangle, c4).                         % Stable triangle
ako( unstable_triangle, c5).                % Triangle upside down

isa( Figure1, Figure2) :-                   % Figure1 is a Figure2
   ako( Figure2, Figure1).
isa( Fig0, Fig) :-
   ako( Fig1, Fig0),
   isa( Fig1, Fig).

support( a1, c1).   support( b1, c1).
support( a3, c3).   support( b3, c3).   touch( a3, b3).
support( a4, c4).   support( b4, c4).
support( a5, c5).   support( b5, c5).

Figure 19.12 Learning the concept of arch.
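As a quick cross-check of the ako/isa hierarchy above, here is a rough Python mirror of the same three clauses (our own illustration; HYPER itself works directly on the Prolog facts):

```python
# ako(Kind, Object): Object is a direct instance of Kind, as in Figure 19.12
AKO = [('rectangle', x) for x in
       ['a1', 'a2', 'a3', 'a4', 'a5',
        'b1', 'b2', 'b3', 'b4', 'b5', 'c1', 'c2', 'c3']]
AKO += [('triangle', 'c4'), ('unstable_triangle', 'c5')]

def isa(fig0, fig):
    """isa(Fig0, Fig): Fig0 is a Fig, directly or via a chain of ako links."""
    if (fig, fig0) in AKO:                 # isa(F1, F2) :- ako(F2, F1).
        return True
    return any(isa(f1, fig)                # isa(F0, F) :- ako(F1, F0), isa(F1, F).
               for (f1, f0) in AKO if f0 == fig0)

assert isa('a1', 'rectangle')
assert isa('c4', 'triangle')
assert not isa('c4', 'rectangle')
```

The second clause gives transitive closure over ako, although with this particular fact base the chains are only one link long.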
Figure 19.12 contd

start_clause( [ arch(X,Y,Z)] / [ X:object, Y:object, Z:object]).

ex( arch( a1, b1, c1)).
ex( arch( a4, b4, c4)).

nex( arch( a2, b2, c2)).
nex( arch( a3, b3, c3)).
nex( arch( a5, b5, c5)).
nex( arch( a1, b2, c1)).
nex( arch( a2, b1, c1)).

Summary

• Inductive logic programming (ILP) combines logic programming and machine learning.
• ILP is inductive learning using logic as the hypothesis language. ILP is also an approach to automatic programming from examples.
• In comparison with other approaches to machine learning: (1) ILP uses a more expressive hypothesis language that allows recursive definitions of hypotheses; (2) ILP allows a more general form of background knowledge; (3) ILP generally has greater combinatorial complexity than attribute-value learning.
• In a refinement graph over clauses, nodes correspond to logic clauses, and arcs correspond to refinements between clauses.
• In a refinement graph over hypotheses, nodes correspond to sets of logic clauses (Prolog programs), and arcs to refinements between hypotheses.
• A refinement of a clause (a hypothesis) results in a more specific clause (hypothesis).
• A clause can be refined by: (1) matching two variables in the clause, or (2) substituting a variable with a term, or (3) adding a literal to the body of the clause.
• θ-subsumption is a generality relation between clauses that can be determined syntactically, based on substitution of variables.
• Program HYPER developed in this chapter induces Prolog programs from examples by searching a refinement graph over hypotheses.
• Concepts discussed in this chapter are:
  inductive logic programming
  clause refinement
  hypothesis refinement
  refinement graphs over clauses or hypotheses
  θ-subsumption
  automatic programming from examples

References

The term inductive logic programming was introduced by Stephen Muggleton (1991). The early work in this area, before the term was actually introduced, includes Plotkin (1969), Shapiro (1983) and Sammut and Banerji (1986). The HYPER program of this chapter is based on Bratko (1999). The book by Lavrac and Dzeroski (1994) gives a good introduction to ILP. Muggleton (1992) and De Raedt (1996) edited collections of papers on ILP. FOIL (Quinlan 1990) and Progol (Muggleton 1995) are among the best-known ILP systems. Bratko et al. (1998) review a number of applications of ILP.

Bratko, I. (1999) Refining complete hypotheses in ILP. In: Inductive Logic Programming: Proc. ILP-99 (Dzeroski, S. and Flach, P., eds). LNAI 1634, Springer.
Bratko, I., Muggleton, S. and Karalic, A. (1998) Applications of inductive logic programming. In: Machine Learning and Data Mining: Methods and Applications (Michalski, R.S., Bratko, I. and Kubat, M., eds). Chichester: Wiley.
De Raedt, L. (ed.) (1996) Advances in Inductive Logic Programming. Amsterdam: IOS Press.
Lavrac, N. and Dzeroski, S. (1994) Inductive Logic Programming: Techniques and Applications. Chichester: Ellis Horwood.
Muggleton, S. (1991) Inductive logic programming. New Generation Computing 8: 295-318.
Muggleton, S. (ed.) (1992) Inductive Logic Programming. London: Academic Press.
Muggleton, S. (1995) Inverse entailment and Progol. New Generation Computing 13: 245-286.
Plotkin, G. (1969) A note on inductive generalisation. In: Meltzer, B. and Michie, D. (eds) Machine Intelligence 5. Edinburgh University Press.
Quinlan, J.R. (1990) Learning logical definitions from relations. Machine Learning 5: 239-266.
Sammut, C. and Banerji, R. (1986) Learning concepts by asking questions. In: Michalski, R.S., Carbonell, J. and Mitchell, T. (eds) Machine Learning: An Artificial Intelligence Approach, Volume II. San Mateo, CA: Morgan Kaufmann.
Shapiro, E. (1983) Algorithmic Program Debugging. Cambridge, MA: MIT Press.
Common sense, qualitative reasoning and naive physics 521

chapter 20

Qualitative Reasoning

Table 20.1 Examples of quantitative statements and their qualitative abstractions.

Quantitative statement                    Qualitative statement

Level(3.2 s) = 2.6 cm                     Level(t1) = zero..top
Level(3.2 s) = 2.6 cm                     Level(t1) = pos
d/dt Level(3.2 s) = 0.12 m/s              Level(t1) increasing
Amount = Level * (Level + 5.7)            M+(Amount, Level)

Time      Amount
  0.0       0.00
  0.1       0.02
  ...        ...
159.3      62.53                          Amount(start..end) = zero..full/inc

Qualitative abstraction: Level at time t1 is between the bottom and the top of the bath tub. This may be written formally as:

Level(t1) = zero..top

Notice here that 3.2 s has been replaced by a symbolic time point t1. So instead of giving the exact time, this just says that there is a time point, referred to as t1, at which Level has the given qualitative value. Regarding this qualitative value, the whole set of numbers between 0 and 62.53 has been collapsed into a symbolic interval zero..top. A further abstraction would be to disregard the top of the tub as an important value, and simply state: Level at time t1 is positive, written as:

Level(t1) = pos

Abstraction of time derivatives into directions of change

Quantitative statement about the time derivative of Level:

d/dt Level(3.2 s) = 0.12 m/s

Qualitative abstraction: Level at time t1 is increasing.

Abstraction of functions into monotonic relations

Quantitative statement: Amount = Level * (Level + 5.7)

A qualitative abstraction: for Level > 0, Amount is a monotonically increasing function of Level, written formally as M+(Amount, Level). That is, if Level increases then Amount increases as well, and vice versa.

Abstraction of increasing time sequences

A whole table giving the values of Amount at consecutive time points between time 0 and 159.3 s may be abstracted into a single qualitative statement: the value of Amount in the time interval between start and end is between zero and full, and is increasing. This can be formally written as:

Amount(start..end) = zero..full/inc

Qualitative reasoning is related to qualitative modelling. Numerical models are an abstraction of the real world. Qualitative models are often viewed as a further abstraction of numerical models. In this abstraction some quantitative information is abstracted away. For example, a quantitative model of the water flow in a river may state that the flow Flow depends on the level Level of water in the river in some complicated way which also takes into account the shape of the river bed. In a qualitative model this may be abstracted into a monotonically increasing relation:

M+(Level, Flow)

This just says that the greater the level, the greater the flow, without specifying this in any more concrete and detailed way. Obviously, it is much easier to design such coarse qualitative models than precise quantitative models.

20.1 Motivation and uses for qualitative modelling and reasoning

This section discusses advantages and disadvantages of qualitative modelling with respect to the traditional, quantitative modelling. Of course, there are many situations where a qualitative model, due to lack of precise numerical information, is not sufficient. However, there are also many situations in which a qualitative model has advantages.

First, qualitative modelling is easier than quantitative modelling. Precise relations among the variables in the system to be modelled may be hard or impossible to determine, but it is usually still possible to state some qualitative relations among the variables. Also, even if a complete quantitative model is known, such a model still requires the knowledge of all the, possibly many, numerical parameters in the model. For example, a numerical physiological model may require the precise electrical conductance of a neuron, its length and width, etc. These parameters may be hard or impossible to measure. Yet, to run such a numerical model, a numerical simulator will require the values of all these parameters to be specified by the user before the simulation can start. Usually the user will then make some guesses at these parameters and hope that they are not too far off their real values. But then the user will not know how far the simulation results are from the truth. The user will typically not even know whether the obtained results are qualitatively correct. With a qualitative model, much of such guesswork can be avoided, and in the end
the user will at least be sure about the qualitative correctness of the simulations. So, paradoxically, quantitative results, although more precise than qualitative results, are in greater danger of being incorrect and completely useless, because the accumulated error may become too gross. For example, in an ecological model, even without knowing the precise parameters of growth and mortality rates, etc. for the species in the model, a qualitative model may answer the question whether certain species will eventually become extinct, or whether different species will possibly interchange their temporal domination in time cycles. A qualitative simulator may find such an answer by finding all the possible qualitative behaviours that correspond to all possible combinations of the values of the parameters in the model.

Another point is that for many tasks numerical precision is not required. Often it only obscures the essential properties of the system. Generic tasks in which qualitative modelling is often more appropriate include functional reasoning, diagnosis and structural synthesis. We will look at these tasks in the following paragraphs.

Functional reasoning is concerned with questions like: how does a device or a system work? For example:

How does the thermostat work?
How does a lock work?
How does a clock work?
How does the refrigerator attain its cooling function?
How does the heart achieve its blood-pumping function?

In all these cases we are interested in the (qualitative) mechanism of how the system works. If the numerical values of the parameters of the system change a little, usually the basic functional mechanism is still the same. All hearts are a little different, but the basic functional principle is always the same.

In a diagnostic task we are interested in the defects that caused the observed abnormal behaviour of the system. Usually, we are only interested in those deviations from the normal state that caused a behaviour that is qualitatively different from normal.

The problem of structural synthesis is: given some basic building blocks, find a combination of them which achieves a given function. For example, put the available components together to achieve the effect of cooling; in other words, invent the refrigerator from 'first principles'. The basic building blocks can be available technical components, or just the laws of physics, or materials with certain properties. In such design from first principles, the goal is to synthesize a structure capable of achieving some given function through some mechanism. In the early, most innovative stage of design, this mechanism is described qualitatively. Only at a later stage of design, when the structure is already known, does quantitative synthesis also become important.

The use of qualitative models requires qualitative reasoning. In the remainder of this chapter we will discuss and implement some ideas for qualitative modelling and reasoning. First, in Section 20.2, we look at static systems (where the quantities in the system do not change in time). In Section 20.3 we look at qualitative reasoning about dynamic systems, which also requires reasoning about changes in time. The mathematical basis for the approach in the latter section consists of qualitative differential equations (QDEs), an abstraction of ordinary differential equations.

20.2 Qualitative reasoning about static systems

Consider simple electric circuits consisting of switches, bulbs and batteries (Figure 20.2). Switches can be open or closed (off or on); bulbs can be light or dark, blown or intact. We are interested in questions related to prediction, diagnosis or control. A diagnostic question about circuit 1 is: if the switch is on and the bulb is dark, what is the state of the bulb? Simple qualitative reasoning suffices to see that the bulb is blown.

A more interesting diagnostic question about circuit 2 is: seeing that bulb 2 is light and bulb 3 dark, can we reliably conclude that bulb 3 is blown? Qualitative reasoning confirms that bulb 3 is necessarily blown. For bulb 2 to be light, there must be a non-zero current in bulb 2, and switch 2 must be on. If there is a non-zero current in bulb 2, there must be a non-zero voltage on bulb 2. This requires that switch 3 is off. The same non-zero voltage is on bulb 3. So with the same voltage, bulb 2 is light and bulb 3 is dark; therefore, bulb 3 must be blown.

In our qualitative model of these circuits, electric currents and voltages will just have the qualitative values 'pos', 'zero' and 'neg'. The abstraction rule for converting a real number X into a qualitative value is:

if X > 0 then pos
if X = 0 then zero
if X < 0 then neg

[Figure 20.2 drawing omitted: circuit 1 contains a battery, one switch and one bulb; circuit 2 contains a battery, switches S1-S3 and bulbs B1-B3]

Figure 20.2 Simple circuits made of switches, bulbs and batteries.
In standard, numerical models of electric circuits, we use some basic laws such as Kirchhoff's laws and Ohm's law. Kirchhoff's laws state that (1) the sum of all the voltages along any closed loop in a circuit is 0, and (2) the sum of all the currents into any vertex in a circuit is 0. To apply these laws in a qualitative model, we need a qualitative version of arithmetic summation. In our program, the usual arithmetic summation X + Y = Z will be reduced to a qualitative summation, implemented as the predicate:

qsum( X, Y, Z)

The qsum relation can be defined simply by a set of facts. These state, for example, that the sum of two positive numbers is a positive number:

qsum( pos, pos, pos).

The sum of a positive and a negative number can be anything:

qsum( pos, neg, pos).
qsum( pos, neg, zero).
qsum( pos, neg, neg).

This summation is 'non-deterministic'. Due to lack of precise information, lost in the qualitative abstraction, we sometimes cannot tell what the actual result of summation is. This kind of non-determinism is rather typical of qualitative reasoning.

The program in Figure 20.3 specifies a qualitative model of our circuits, and carries out the qualitative reasoning about this model. A model of a circuit specifies the components of the circuit, and takes into account the connections among the components.

% Modelling simple electric circuits
% Qualitative values of voltages and currents are: neg, zero, pos

% Definition of switch
% switch( SwitchPosition, Voltage, Current)

switch( on, zero, AnyCurrent).     % Switch on: zero voltage
switch( off, AnyVoltage, zero).    % Switch off: zero current

% Definition of bulb
% bulb( BulbState, Lightness, Voltage, Current)

bulb( blown, dark, AnyVoltage, zero).
bulb( ok, light, pos, pos).
bulb( ok, light, neg, neg).
bulb( ok, dark, zero, zero).

% A simple circuit consisting of a bulb, switch and battery

circuit1( SwitchPos, BulbState, Lightness) :-
   switch( SwitchPos, SwVolt, Curr),
   bulb( BulbState, Lightness, BulbVolt, Curr),
   qsum( SwVolt, BulbVolt, pos).   % Battery voltage = pos

% A more interesting circuit made of a battery, three bulbs and
% three switches

circuit2( Sw1, Sw2, Sw3, B1, B2, B3, L1, L2, L3) :-
   switch( Sw1, VSw1, C1),
   bulb( B1, L1, VB1, C1),
   switch( Sw2, VSw2, C2),
   bulb( B2, L2, VB2, C2),
   qsum( VSw2, VB2, V3),
   switch( Sw3, V3, CSw3),
   bulb( B3, L3, V3, CB3),
   qsum( VSw1, VB1, V1),
   qsum( V1, V3, pos),
   qsum( CSw3, CB3, C3),
   qsum( C2, C3, C1).

% qsum( Q1, Q2, Q3):
%   Q3 = Q1 + Q2, qualitative sum over domain [pos,zero,neg]

qsum( pos, pos, pos).
qsum( pos, zero, pos).
qsum( pos, neg, pos).
qsum( pos, neg, zero).
qsum( pos, neg, neg).
qsum( zero, pos, pos).
qsum( zero, zero, zero).
qsum( zero, neg, neg).
qsum( neg, pos, pos).
qsum( neg, pos, zero).
qsum( neg, pos, neg).
qsum( neg, zero, neg).
qsum( neg, neg, neg).

Figure 20.3 Qualitative modelling program for simple circuits.

The qualitative behaviour of the two types of component, switches and bulbs, is defined by the predicates:

switch( SwitchPosition, Voltage, Current)
bulb( BulbState, Lightness, Voltage, Current)

Their qualitative behaviours are simple and can be stated by Prolog facts. For example, an open switch has zero current and the voltage can be anything:

switch( off, AnyVoltage, zero).
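The model in Figure 20.3 is small enough to cross-check by brute force. The following Python sketch (our own, not part of the book's program) mirrors the qsum facts numerically with sign-consistent sample magnitudes, enumerates all component states, and reproduces the diagnostic conclusion argued informally above: with switch 3 off, bulb 1 light and bulb 3 dark, bulb 3 must be blown.

```python
from itertools import product

SIGNS = ('pos', 'zero', 'neg')

def qval(x):
    return 'pos' if x > 0 else 'zero' if x == 0 else 'neg'

def qsum_ok(x, y, z):
    # qsum(x, y, z): can x + y qualitatively come out as z?
    # Two magnitudes per non-zero sign are enough to expose every case.
    reps = {'pos': (1, 2), 'zero': (0,), 'neg': (-1, -2)}
    return any(qval(a + b) == z for a in reps[x] for b in reps[y])

# Component behaviour, mirroring the Prolog facts of Figure 20.3
SWITCHES = [('on', 'zero', c) for c in SIGNS] + [('off', v, 'zero') for v in SIGNS]
BULBS = ([('blown', 'dark', v, 'zero') for v in SIGNS]
         + [('ok', 'light', 'pos', 'pos'),
            ('ok', 'light', 'neg', 'neg'),
            ('ok', 'dark', 'zero', 'zero')])

def diagnose(sw3_pos, l1_obs, l3_obs):
    """All bulb-state triples consistent with circuit2's constraints."""
    sols = set()
    for (sp1, vs1, c1), (sp2, vs2, c2), (sp3, v3, cs3) in product(SWITCHES, repeat=3):
        if sp3 != sw3_pos:
            continue
        for ((b1, l1, vb1, cb1),
             (b2, l2, vb2, cb2),
             (b3, l3, vb3, cb3)) in product(BULBS, repeat=3):
            if l1 != l1_obs or l3 != l3_obs:
                continue
            # shared currents, and the shared voltage V3 of switch 3 and bulb 3:
            if cb1 != c1 or cb2 != c2 or vb3 != v3:
                continue
            if not qsum_ok(vs2, vb2, v3):              # qsum(VSw2, VB2, V3)
                continue
            ok_v = any(qsum_ok(vs1, vb1, v1) and       # qsum(VSw1, VB1, V1),
                       qsum_ok(v1, v3, 'pos')          # qsum(V1, V3, pos)
                       for v1 in SIGNS)
            ok_c = any(qsum_ok(cs3, cb3, c3) and       # qsum(CSw3, CB3, C3),
                       qsum_ok(c2, c3, c1)             # qsum(C2, C3, C1)
                       for c3 in SIGNS)
            if ok_v and ok_c:
                sols.add((b1, b2, b3))
    return sols

# The diagnostic query ?- circuit2(_, _, off, B1, B2, B3, light, _, dark).
assert diagnose('off', 'light', 'dark') == {('ok', 'ok', 'blown')}
```

The unknown voltages V1 and C3 are existentially quantified, exactly as the anonymous intermediate variables are in the Prolog clause.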
A blown bulb is dark, has no current and any voltage:

bulb( blown, dark, AnyVoltage, zero).

An intact bulb is light unless both the voltage and the current in the bulb are zero. Here we are assuming that any non-zero current is sufficiently large to make a bulb light. The voltage and the current are either both zero, both positive or both negative. Notice that this is a qualitative abstraction of Ohm's law:

Voltage = Resistance * Current

Since Resistance is positive, Voltage and Current must have the same sign and therefore the same qualitative value.

Once our components have been defined, it is easy to define a whole circuit. A particular circuit is defined by a predicate, such as:

circuit1( SwPos, BulbState, Lightness)

Here the switch position, the state of the bulb and the lightness have been assumed to be the important properties of the circuit, hence they were made the arguments of circuit1. Other quantities in the circuit, such as the current in the bulb, are not visible from the outside of predicate circuit1. The model of the circuit consists of stating that these arguments have to obey:

(1) the laws of the bulb;
(2) the laws of the switch;
(3) the (qualitative) Kirchhoff's law: switch voltage + bulb voltage = battery voltage.

The physical connections between the components are also reflected in that the switch current is equal to the bulb current.

The model of circuit 2, although more complex, is constructed in a similar way. Here are some usual types of questions that the program of Figure 20.3 answers easily.

Prediction-type question

What will be the observable result of some 'input' to the system (switch positions), given some functional state of the system (bulbs OK or blown)? For example, what happens if we turn on all the switches, and all the bulbs are OK?

?- circuit2( on, on, on, ok, ok, ok, L1, L2, L3).
L1 = light
L2 = dark
L3 = dark

Diagnostic-type question

Given the inputs to the system and some observed manifestations, what is the system's functional state (normal or malfunctioning; what is the failure)? For example, if bulb 1 is light, bulb 3 is dark, and switch 3 is off, what are the states of the bulbs?

?- circuit2( _, _, off, B1, B2, B3, light, _, dark).
B1 = ok
B2 = ok
B3 = blown

Control-type question

What should be the control input to achieve the desired output? For example, what should be the positions of the switches to make bulb 3 light, assuming all the bulbs intact?

?- circuit2( SwPos1, SwPos2, SwPos3, ok, ok, ok, _, _, light).
SwPos1 = on
SwPos2 = on
SwPos3 = off;

SwPos1 = on
SwPos2 = off
SwPos3 = off

Exercises

20.1 Define the qualitative multiplication relation over signs:

qmult( A, B, C)

where C = A*B, and A, B and C can be the qualitative values pos, zero or neg.

20.2 Define qualitative models of a resistor and a diode:

resistor( Voltage, Current)
diode( Voltage, Current)

The diode only allows current in one direction. In a resistor, the signs of Voltage and Current are the same. Define qualitative models of some circuits with resistors, diodes and batteries.

20.3 Qualitative reasoning about dynamic systems

In this section we consider an approach to qualitative reasoning about dynamic systems. The approach considered here is based on so-called qualitative differential equations (QDEs). QDEs can be viewed as a qualitative abstraction of ordinary differential equations. To develop the intuition and basic ideas of this approach, let us consider an example of filling the bath tub with an open drain
[Figure 20.4 drawing omitted: a bath tub with a constant inflow and an open drain]

Figure 20.4 Bath tub with open drain and constant input flow.

(Figure 20.4). To begin with, we will carry out some qualitative reasoning about this system informally. The variables we observe are: the flow into the tub, the flow out, the amount of water in the tub, and the level of water.

Let the process start with an empty bath tub. The outflow at the drain depends on the level of water: the higher the level, the greater the outflow. The inflow is constant. Net flow is the difference between the inflow and the outflow. Initially the level is low, the inflow is greater than the outflow, and therefore the amount of water in the tub is increasing. Therefore the level is also increasing, which causes the outflow to increase. So at some time the outflow may become equal to the inflow. According to a precise quantitative analysis, this only happens after a 'very long' (infinite) time. When this happens, both flows are in equilibrium and the level of water becomes steady. The (quantitative) time behaviour of the water level looks like the one in Figure 20.5.

The quantitative behaviour of the level in Figure 20.5 can be simplified into a qualitative behaviour as follows. Initially, the level is zero and increasing. We choose to represent this as:

Level = zero/inc

In the subsequent time interval, the level is between zero and top, and it is increasing. We do not qualitatively distinguish between the exact numerical values between zero and top. We take it that these values are all sufficiently similar and therefore qualitatively the same. So we write:

Level = zero..top/inc

The next qualitative change occurs when the level stops increasing and becomes steady:

Level = zero..top/std

This is the final qualitative state of the water level.

[Figure 20.5 drawing omitted: the water level rising from zero and levelling off below top as time increases, with the increasing and steady phases marked]

Figure 20.5 The behaviour of water level in time.

We will now formalize in more detail the qualitative reasoning indicated above. First we define a qualitative model of the bath tub system. The variables in the system are:

Level   = level of water
Amount  = amount of water
Inflow  = input flow
Outflow = output flow
Netflow = net flow (Netflow = Inflow - Outflow)

For each variable we specify its distinguished values, called landmarks. Typically we include minus infinity (minf), zero and infinity (inf) among the landmarks. For Level, the top of the bath tub is also an important value, so we choose to include it among the landmarks as well. On the other hand, as the level is always non-negative, there is no need to include minf among the landmarks for Level. The landmarks are always ordered. So for Level we have the following ordered set of landmarks:

zero < top < inf

For Amount we may choose these landmarks:

zero < full < inf

Now we define the dependences among the variables in the model. These dependences are called constraints because they constrain the values of the variables. We will use some types of constraints typical of qualitative reasoning. One such constraint states the dependence between Amount and Level: the greater the amount of water, the greater the level. We write:

M+0( Amount, Level)

In general, the notation M+(X,Y) means that Y is a monotonically increasing function of X: whenever X increases, Y also increases, and vice versa. M+0(X,Y) means that Y is a monotonically increasing function of X such that Y(0) = 0. We say that (0,0) is a pair of corresponding values for this M+0 relationship. Another pair of corresponding values for this relationship is (full,top). Notice that M+(X,Y) is equivalent to M+(Y,X).
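Corresponding values make a monotonic constraint directly computable on qualitative magnitudes. A minimal Python sketch of this idea (the representation and names are ours; treating (inf,inf) as an additional corresponding value is our assumption):

```python
# M+0(Amount, Level) with corresponding value pairs (zero, zero), (full, top)
# and, by assumption, (inf, inf): the constraint maps each landmark or open
# interval of Amount to the matching landmark or interval of Level.
CORRESPONDING = [('zero', 'zero'), ('full', 'top'), ('inf', 'inf')]

def mplus0_level(amount_qmag):
    """Qualitative magnitude of Level implied by Amount under M+0(Amount, Level).

    amount_qmag is either a landmark name or a pair (lo, hi) for an open interval.
    """
    m = dict(CORRESPONDING)
    if isinstance(amount_qmag, tuple):
        lo, hi = amount_qmag
        return (m[lo], m[hi])
    return m[amount_qmag]

assert mplus0_level('zero') == 'zero'
assert mplus0_level(('zero', 'full')) == ('zero', 'top')
```

Because a monotonically increasing function maps intervals between corresponding values onto each other, this lookup is all that the magnitude part of the constraint needs.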
The monotonically increasing constraint is very convenient and often greatly alleviates the definition of models. By stating M+0( Amount, Level), we just say that the level will rise whenever the amount increases, and the level will drop whenever the amount decreases. Notice that this is true for every container of any shape. If instead we wanted to state the precise quantitative functional relation

Amount = f( Level)

this would depend on the shape of the container, as illustrated in Figure 20.6. Qualitatively, however, the relation between the level and the amount is always monotonically increasing, regardless of the shape of the container. So to define a qualitative model of the bath tub, we do not have to study the intricacies of the shape. This often greatly simplifies the modelling task. Our simplified, qualitative model still suffices for reliably deriving some important properties of the modelled system. For example, if there is a flow into a container with no outflow, the amount will be increasing and therefore the level will be increasing as well. So there will be some time point when the level reaches the top and water starts overflowing. All the (possibly complicated) containers share this qualitative behaviour.

[Figure 20.6 drawing omitted: containers of different shapes and the corresponding Amount-Level curves]

Figure 20.6 The precise relation between the amount and the level depends on the shape of a container. However, the amount is always a monotonically increasing function of the level.

Similarly, the precise relation between Outflow and Level may be complicated. Qualitatively, we may simply state that it is monotonically increasing.

The types of constraints we will be using in our qualitative models are shown in Table 20.2. In the bath tub model we have the following constraints:

M+0( Amount, Level)
M+0( Level, Outflow)
sum( Outflow, Netflow, Inflow)
deriv( Amount, Netflow)
Inflow = constant = inflow/std

As usual, variable names start with capital letters and constants start with lower-case letters.

Table 20.2 Types of qualitative constraints.

M+(X,Y)       Y is a monotonically increasing function of X
M-(X,Y)       Y is a monotonically decreasing function of X
sum(X,Y,Z)    Z = X + Y
minus(X,Y)    Y = -X
mult(X,Y,Z)   Z = X * Y
deriv(X,Y)    Y = dX/dt (Y is the time derivative of X)

Sometimes it helps to illustrate the constraints by a graph. The nodes of the graph correspond to the variables in the model; the connections among the nodes correspond to the constraints. Figure 20.7 shows our bath tub model represented by such a graph.

Now let us carry out some qualitative simulation reasoning using the model in Figure 20.7. Without causing ambiguity, we will use a somewhat liberal notation. Writing Amount = zero will mean: the qualitative value of Amount is zero. Writing

[Figure 20.7 drawing omitted: Amount linked to Netflow by deriv and to Level by M+0; Level linked to Outflow by M+0; Outflow, Netflow and the constant Inflow linked by sum]

Figure 20.7 A graphical representation of the bath tub model.
Amount = zero/inc will mean: the qualitative value of Amount is zero and it is increasing. We start with the initial condition:

Amount = zero

Due to the M+0 constraint between Amount and Level (Figure 20.7), we infer:

Level = zero

This is propagated through the other M+0 constraint to infer:

Outflow = zero

Now the constraint Outflow + Netflow = Inflow instantiates to:

zero + Netflow = inflow

This yields:

Netflow = inflow

Now consider the deriv constraint between Amount and Netflow, that is, Netflow is equal to the time derivative of Amount. Since Netflow = inflow > zero, we infer that Amount must be increasing:

Amount = zero/inc

Propagating this through the M+0 constraints we have:

Level = zero/inc
Outflow = zero/inc

Now the constraint Outflow + Netflow = Inflow instantiates to:

zero/inc + Netflow = inflow/std

To satisfy this constraint, Netflow has to be:

Netflow = inflow/dec

Thus we have the complete initial qualitative state of the bath tub:

Amount = zero/inc
Level = zero/inc
Outflow = zero/inc
Netflow = inflow/dec

Now let us consider the possible transitions to a next qualitative state of the system. We assume that all the variables behave smoothly: their values change continuously in time, and their time derivatives are continuous. Consequently, a variable that is negative cannot become positive without first becoming zero. So a negative quantity can at the next time point either stay negative or become zero. Similarly, a variable that is increasing can either stay increasing or become steady. But it cannot instantaneously become decreasing; it has to become steady first. In other words, if a variable's direction of change is 'inc' then it can either stay 'inc' or become 'std', but not 'dec'. Another constraint on possible transitions is that a changing variable cannot spend more than an instant at a landmark value. Therefore a transition from 'zero/inc' to 'zero/inc' is not possible.

The smoothness assumption is obviously reasonable in our bath tub system, at least while the level is between zero and top, so that there is no water overflow. Given the smoothness constraint, the next qualitative state of Level is:

Level = zero..top/inc

This value and the constraints in the model of Figure 20.7 determine the qualitative states of the other variables. So the next qualitative state of the system is:

Level = zero..top/inc
Amount = zero..full/inc
Outflow = zero..inflow/inc
Netflow = zero..inflow/dec

What are the next possible qualitative states of Level? There are now four possibilities:

Level = zero..top/inc
Level = zero..top/std
Level = top/std
Level = top/inc

In the first case Level's qualitative value is the same as in the previous state. Then our model determines that the other variables also stay unchanged. So the previous qualitative state description still holds, and there is no need to introduce a new qualitative state in this case. Notice that, as in this case, a qualitative state may last for a whole time interval.

The remaining three possible transitions correspond to three alternative behaviours of the system:

(1) Level stops increasing and becomes steady before it has reached the top. The constraints in Figure 20.7 dictate that the other variables also become steady and that there is no change from then on. So the final state of the simulation in this case is:

Level = zero..top/std
Amount = zero..full/std
Outflow = inflow/std
Netflow = zero/std

Strictly, this steady state is only reached after infinite time, but that makes no difference in our qualitative description, because it does not take into account the durations of time intervals.

(2) Level becomes steady exactly at the moment when it reaches the top. Although this is theoretically possible, in reality it is an unlikely coincidence. Then all the other variables become steady, and this is again a final state of the simulation, similar to case 1.
536 Quaiitative Reasoning A qualitative simulation program 537
(3) The level reaches the top and is at this moment still increasing. Then water correspond( sum( DomI:zero, Dom2:L, Dorn2:L))
starts flowing over the top and our model of Figure 20.7 no longer holds. At qrnag( Dom2:L), L \==zero, not (L =_ .._). % Lis nonzero landmark in Dom2
this point there is a discontinuous change into a new 'operating region'. A new correspond( sum( VI, VZ, V3))
model would now be needed for this new operating region. Discontinuous correspond( VI, V2, V3). ,1, User-defined corr. values
0
Figure 20.8 shows a qualitative simulation program that executes QDE models along relative_qmag( Domain:MI, Domain:M2, Sign)
the lines of the previous section. The following paragraphs describe details of this landmarks( Domain, Lands),
compare_lands( Ml, M2, Lands, Sign), !.
implementation.
% qdir( Qdir, Sign):
% Qdir is qualitative direction of change with sign Sign
% An interpreter for Qualitative Differential Equations
qdir( dee, neg).
op( 100, xfx, .. ). qdir( std, zero).
op( 500, xfx, :). qciir( inc, pos).
QMag is a qualitative magnitude, which can be a landmark or the interval between two
Figure 20.8 contd
adjacent landmarks, written as Landl .. Land2. Dir is a direction of change whose
11/o simulate( SystemStates, l"vlaxLength): possible values are: inc, std, dee. Two example qualitative states of Outflow are:
% SystemStates is a sequence of states of simulated system
01<
flow: inflow/dee
, not longer than MaxLength flow: zero .. inflow/dee
simulate( [State], MaxLength)
( MaxLength = 1 A qualitative state of a system is the list of qualitative states of the system's variables.
% Max length reached
For example, the initial state of the bath tub system consists of the values of the four
not legal_trans( State, _) % No legal next state variables Level, Amount, Outflow and Netflow:
) , !.
( level:zero/inc, amount:zero/inc, flow:zero/inc, flow:inflow/dec)
simulate( [Statel,State2 I Rest], MaxLength)
MaxLength > 1, NewMaxL is MaxLength - 1, A qualitative behaviour is the list of consecutive qualitative states.
legal_trans( State 1, State2),
simulate( [State2 I Rest], NewMaxL).
'1/r, simulate( [nitialState, QualBehaviour, MaxLength) 20.4.2 Constraints
simulate( InitialState, [lnitialState I Rest], MaxLength)
legalstate( lnitialState), The program of Figure 20.8 implements three types of QDE constraints as the
% Satisfy system's model
simulate( (Initia!State I Rest], MaxLength). predicates: deriv( X, Y), sum( X, Y, Z), mplus( X, Y). The arguments X, Y and Z are all
qualitative states of variables. We look at each of these constraints in turn.
% compare_lands( X 1, X2, List, Sign):
Constraint deriv( X, Y): Y is qualitatively the time derivative of X. This is very
0;,
, if X 1 before X2 in List then Sign = neg
% if X2 before XI then Sign = pos else Sign = zero simple to check: the direction of change of X has to agree with the sign of Y.
Constraint mplus( X, Y): Y is a monotonically increasing function of X. Here X
compare__lands( Xl, X2, [First I Rest], Sign) and Y have the form Dx:QmagX/DirX and Dy:QmagY /DirY. First, the directions of
Xl = X2, !, Sign = zero
change have to be consistent: DirX = DirY. Second, the given corresponding values
Xl = First, !, Sign= neg have to be respected. The technique of checking this is based on 'relative qualitative
magnitudes' of X and Y (relative with respect to the pairs of corresponding values).
X2 = First, !, Sign= pos For example, the relative qualitative magnitude of level:zero .. top with respect to top
is neg. For each pair of corresponding values, the qualitative magnitudes of X and Y
compare_lands( Xl, X2, Rest, Sign). are transformed into the relative qualitative magnitudes. The resulting relative
qualitative magnitude of X has to be equal to that of Y.
Constraint sum( X, Y, Z): X -'- Y = Z, where all X, Y and Z are qualitative states of
variables of the form Domain:Qmag/Dir. Both the directions of change and the
20.4.1 Representation of qualitative states qualitative magnitudes have to be consistent with the summation constraint. First,
the consistency of directions of change is checked. For example,
The variables in the model can take qualitative values from domains. For example, inc + std = inc
Outflow and Netflow can have a value in terms of the landmarks from the domain
'flow', defined by the bath tub model. A domain is defined by its name and its is true, and
landmarks, for example: inc + std = std
landmarks( flow, ( minf, zero, inflow, inf]). is false. Second, the qualitative magnitudes must be consistent with summation. In
A qualitative state of a variable has the form: particular, they have to be consistent with respect to all the given corresponding
values among X, Y and Z. The following are three examples of qualitative
Domain: QMag/Dir magnitudes satisfying the sum constraint:
j
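The direction-of-change rules used in these checks can be sketched outside Prolog. The Python fragment below is an illustration of ours, not part of the book's program (the names NEXT_DIR, possible_next_dirs and qsum_dirs are assumptions): it encodes which directions of change a smoothly changing variable may take next, and which direction combinations are consistent with qualitative summation.

```python
# Smooth transitions: an increasing variable may stay increasing or become
# steady, but cannot jump directly to decreasing (symmetrically for 'dec').
NEXT_DIR = {
    'inc': {'inc', 'std'},
    'std': {'inc', 'std', 'dec'},   # a steady variable may start moving either way
    'dec': {'dec', 'std'},
}

def possible_next_dirs(d):
    """Directions of change reachable from d in the next qualitative state."""
    return NEXT_DIR[d]

def qsum_dirs(d1, d2):
    """All directions d3 such that d1 + d2 = d3 is qualitatively consistent.
    The result may be ambiguous: inc + dec can be inc, std or dec."""
    if d1 == 'std':
        return {d2}
    if d2 == 'std':
        return {d1}
    if d1 == d2:                     # inc + inc = inc, dec + dec = dec
        return {d1}
    return {'inc', 'std', 'dec'}    # opposite directions: sign of sum unknown
```

For example, qsum_dirs('inc', 'std') gives only {'inc'}, so inc + std = inc is accepted while inc + std = std is rejected, matching the examples above.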
The sum constraint then means:

    x0 + ΔX + y0 + ΔY = z0 + ΔZ

Since x0 + y0 = z0, it follows that ΔX + ΔY = ΔZ. The signs of ΔX, ΔY and ΔZ are
the relative qualitative magnitudes of X, Y and Z with respect to x0, y0 and z0. These
relative qualitative magnitudes have to satisfy the relation qsum.

    % A bath tub model

    landmarks( amount, [ zero, full, inf]).
    landmarks( level, [ zero, top, inf]).
    landmarks( flow, [ minf, zero, inflow, inf]).

    correspond( amount:zero, level:zero).
    correspond( amount:full, level:top).

    legalstate( [ Level, Amount, Outflow, Netflow])  :-
      mplus( Amount, Level),
      mplus( Level, Outflow),
      Inflow = flow:inflow/std,              % Constant inflow
      sum( Outflow, Netflow, Inflow),        % Netflow = Inflow - Outflow
      deriv( Amount, Netflow),
      not overflowing( Level).               % Water not over the top

    overflowing( level:top..inf/_).          % Over the top

    initial( [ level: zero/inc,              % Initial level
               amount: zero/inc,             % Initial amount
               flow: zero/inc,               % Initial outflow
               flow: inflow/dec]).           % Initial inflow

Figure 20.9 A qualitative model of bath tub.

Our simulator of Figure 20.8 can easily be used for running other models. Figure
20.10 shows an electric circuit with two capacitors and a resistor. Figure 20.11 shows
a qualitative model of this dynamic circuit and the corresponding initial state. In the
initial state, the left capacitor has some initial voltage whereas the right capacitor is
empty. The query to start the simulation and the simulator's answer (slightly edited)
are:

    ?- initial( S), simulate( S, Behaviour, 10).

    Behaviour =
      [ [volt:v0/dec, volt:zero/inc, ...],
        [volt:zero..v0/dec, volt:zero..v0/inc, ...],
        [volt:zero..v0/std, volt:zero..v0/std, ...] ]

Basically this says that the voltage on capacitor C1 will be decreasing and the voltage
on C2 will be increasing until both voltages become equal (the current in and
voltage on the resistor become zero).

[Circuit diagram: resistor R (voltage UR) connecting capacitor C1 (voltage UC1) and capacitor C2 (voltage UC2)]

Figure 20.10 An electric circuit with two capacitors and a resistor.

    % Qualitative model of electric circuit with resistor and capacitors

    landmarks( volt, [minf, zero, v0, inf]).     % Voltage on capacitors
    landmarks( voltR, [minf, zero, v0, inf]).    % Voltage on resistor
    landmarks( current, [minf, zero, inf]).

    correspond( voltR:zero, current:zero).

    legalstate( [ UC1, UC2, UR, CurrR])  :-
      sum( UR, UC2, UC1),
      mplus( UR, CurrR),                                % Ohm's law for resistor
      deriv( UC2, CurrR),
      sum( CurrR, current:CurrC1, current:zero/std),    % CurrC1 = -CurrR
      deriv( UC1, current:CurrC1).                      % CurrC1 = d/dt UC1

    initial( [ volt:v0/dec, volt:zero/inc, voltR:v0/dec, current:zero..inf/dec]).

Figure 20.11 A qualitative model of the circuit in Figure 20.10.

Exercises

20.3 There are two variables in the system: X and Y. Their (quantitative) time behaviours
     have the form:

         X(t) = a1*sin(k1*t),   Y(t) = a2*sin(k2*t)

     a1, a2, k1 and k2 are constant parameters of the system, all of them greater than 0.
     The initial time point is t0 = 0, so the initial qualitative state of the system is:
     X(t0) = Y(t0) = zero/inc.
     (a) Give all the possible sequences of the first three qualitative states of this system.
     (b) Now suppose that there is a qualitative constraint between X and Y: M+(X,Y).
         Give all the possible sequences of the first three qualitative states of the system
         consistent with this constraint.

20.4 A qualitative model of a system contains variables X, Y and Z, and the constraints:

         M+(X,Y)
         sum(X,Y,Z)

     The landmarks for the three variables are:

         X, Y: minf, zero, inf
         Z: minf, zero, landz, inf

     At time t0, the qualitative value of X is X(t0) = zero/inc. What are the qualitative
     values Y(t0) and Z(t0)? What are the possible qualitative values of X, Y and Z
     in the next qualitative state of the system, which holds over time interval t0..t1,
     until the next qualitative change? After the next qualitative change, at time t1, what
     are the possible new qualitative values X(t1), Y(t1) and Z(t1)?
546 Qualitative Reasoning    Discussion of the qualitative simulation program 547
arbitrarily close to x1 in the initial state. No matter how close the initial X is to x1,
there is always another real number between the initial X and x1, and X will
first have to move to this number before reaching x1. How can the program in
Figure 20.8 be modified to fix this deficiency? Hint: modify procedure legal_trans.

[Diagram: Container A and Container B]

In the bath tub case, non-determinism was due to the lack of information in the
model. All the three generated behaviours were consistent with the model. Real bath
tubs are possible whose actual behaviours correspond to the three qualitative
behaviours found by the simulator. So the simulator's results quite reasonably
branch three ways. However, a more problematic kind of combinatorial branching
in QSIM-type simulation is also possible. This kind of simulation may sometimes
generate behaviours that do not correspond to any concrete quantitative instance of
the qualitative model. Such behaviours are simply incorrect; they are inconsistent
with the given qualitative model. Technically they are called spurious behaviours.

As an example consider a simple oscillating system consisting of a sliding block
and a spring (Figure 20.13). We assume zero friction between the block and the
surface. Assume that initially at time t0 the spring is at its 'rest length' (X = zero)
and the block has some initial velocity v0 to the right. Then X will be increasing and
the spring will be pulling the block back, causing negative acceleration of the block,
until the block stops and starts moving backwards. It will then cross zero with some
negative velocity, reach the extreme position on the left, and return to X = zero. As
there is no friction, we may expect that the block's velocity will be at that time v0
again. Now the whole cycle will be repeated. The resulting behaviour is steady
oscillation.

Figure 20.13 Sliding block and spring, no friction between block and support surface.

Let us try to model this with our simulator of Figure 20.8. A quantitative model is:

    d²X/dt² = A
    A = -kX/m

X is the position of the block, A is its acceleration, m is its mass, and k is the
coefficient of the spring. An appropriate qualitative model is given in Figure 20.14.

    % Model of block on spring

    landmarks( x, [ minf, zero, inf]).          % Position of block
    landmarks( v, [ minf, zero, v0, inf]).      % Velocity of block
    landmarks( a, [ minf, zero, inf]).          % Acceleration of block

    correspond( x:zero, a:zero).

    legalstate( [ X, V, A])  :-
      deriv( X, V),
      deriv( V, A),
      MinusA = a:_,
      sum( A, MinusA, a:zero/std),              % MinusA = -A
      mplus( X, MinusA).                        % Spring pulling mass back

    initial( [ x:zero/inc, v:v0/std, a:zero/dec]).

Figure 20.14 A qualitative model of the block and spring system.

Let us execute this model from the initial state with:

    ?- initial( S), simulate( S, Beh, 11).

The generated behaviour Beh is as expected up to state 8:

    [ x:minf..zero/inc, v:zero..v0/inc, a:zero..inf/dec]

Here the behaviour branches three ways. In the first branch the behaviour continues
as follows:

    [ x:minf..zero/inc, v:v0/inc, a:zero..inf/dec]
    [ x:minf..zero/inc, v:v0..inf/inc, a:zero..inf/dec]
    [ x:zero/inc, v:v0..inf/std, a:zero/dec]

Here the velocity has reached the initial velocity v0 already before X became equal to
zero. At the time when X = zero, velocity is already greater than v0. This looks like a
physically impossible case: the total energy in the system has increased, and the
behaviour looks like an increasing oscillation. In the second branch, state 8 is
followed by this:

    [ x:zero/inc, v:zero..v0/std, a:zero/dec]
    [ x:zero..inf/inc, v:zero..v0/dec, a:minf..zero/dec]
    [ x:zero..inf/std, v:zero/dec, a:minf..zero/std]

Here the block has reached X = zero with velocity lower than v0. The total energy in
the system here is less than in the initial state, so this appears to be a decreasing
oscillation. In the third branch, state 8 is followed by:

    [ x:zero/inc, v:v0/std, a:zero/dec]
    etc.

This corresponds to the expected case of steady oscillation.

The question is: Are the two unexpected behaviours only a consequence of lack of
information in the qualitative model, or is there a problem in the simulation
algorithm? It can be shown that, in fact, the model, although qualitative, contains
enough information to allow the steady oscillation only. Therefore the other two
behaviours, increasing and decreasing oscillations, are mathematically inconsistent
with the model. They are said to be spurious. The weakness is in the simulation
algorithm. The immediate question is then: Why not quickly fix the bug in
the algorithm? The difficulty is that this is not a simple bug, but a complicated
computational problem: how to check qualitative behaviours against all the
constraints imposed by the model? A QSIM-type simulation algorithm checks
the consistency of individual states, but not also their sequences as a whole.
Although improvements have been found that eliminate many spurious behaviours,
a complete solution has not been discovered.

Given this drawback of QSIM-type simulation, do we have any guarantees
regarding the results of simulation? There is a theorem (Kuipers 1986) that QSIM is
guaranteed to generate all qualitative behaviours consistent with the model. So
incorrectness is limited to the opposite cases: QSIM may generate incorrect behaviours,
those not consistent with the model (spurious behaviours). Figure 20.15
illustrates the relations between the abstraction levels involved: differential equations
and their solutions at the quantitative level, and QDEs and qualitative
behaviours generated by QSIM at the qualitative level. Those generated behaviours
that are not abstractions of any quantitative solutions are spurious.

The practical significance of the 'one-sided' correctness of QSIM is the following.
Suppose that a QSIM-type simulator has generated some qualitative behaviours from
our model. Now we know that this set is complete in the sense that there exists no
other behaviour of our modelled system that is not included among the results. We
know that nothing else can happen. However, we have no guarantee that all the
generated behaviours are in fact possible.

One way to eliminate spurious behaviours is to add more constraints to the
model, whereby the correctness of the model should be preserved. This is not easy in
general, but it is straightforward in the case of the block and spring system. We
know that the energy in this system must be constant, because there is no loss of
energy through friction, and there is no input of energy into the system. The total
energy is the sum of kinetic and potential energy. So the energy conservation
constraint is:

    ½mV² + ½kX² = const.

m is the mass of the block, and k is the coefficient of the spring. A weaker version of
this constraint is sufficient for our purpose. Namely, a consequence of the above
energy conservation constraint is: whenever X = 0, V² = v0². An equivalent
constraint is: if V = v0 then X = 0, and if X = 0 then V = v0 or V = -v0. The
following modification of the block and spring model of Figure 20.14 does not
generate any spurious solutions:

    legalstate( [ X, V, A])  :-
      deriv( X, V),
      deriv( V, A),
      MinusA = a:_,
      sum( A, MinusA, a:zero/std),   % MinusA = -A
      mplus( X, MinusA),             % Spring pulling block back
      energy( X, V).                 % Weak energy conservation constraint

    energy( X, V)  :-
      V = v:v0/_, !, X = x:zero/_    % If V = v0, X must be zero
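The energy argument can also be cross-checked numerically. The Python sketch below is ours, not part of the book's program; the function name spring_energies and the values of k, m, v0, the step size and the step count are arbitrary assumptions. It integrates dX/dt = V, dV/dt = A = -(k/m)X with a symplectic Euler step and verifies that the total energy ½mV² + ½kX² stays essentially constant, consistent with the claim that only the steady oscillation is a real behaviour.

```python
# Numerical cross-check: integrate the block-and-spring equations and
# record the total energy 0.5*m*V^2 + 0.5*k*X^2 at every step.
def spring_energies(v0=1.0, k=1.0, m=1.0, dt=1e-3, steps=20000):
    x, v = 0.0, v0                       # initial state: X = zero, V = v0
    energies = [0.5 * m * v * v + 0.5 * k * x * x]
    for _ in range(steps):
        v += -(k / m) * x * dt           # update velocity from acceleration
        x += v * dt                      # then position from the new velocity
        energies.append(0.5 * m * v * v + 0.5 * k * x * x)
    return energies
```

With these test values the spread max - min of the recorded energies stays far below one per cent of the initial energy, i.e. the increasing and decreasing oscillations never show up in the quantitative system.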
Figure 20.15 Qualitative abstractions of differential equations and their solutions. QDE is the
qualitative abstraction of three differential equations DE1, DE2 and DE3. S1, S2
and S3 are the respective solutions of the three differential equations. Qualitative
behaviours Q1, Q2 and Q3 are generated as solutions of the QDE. Q1 is the
qualitative abstraction of S1, Q2 is the abstraction of both S2 and S3. Q3 is
spurious because it is not the abstraction of the solution of any corresponding
differential equation.

Summary

• 'Proper physics' solutions often comprise more numerical details than needed
  for everyday purposes. Common sense, qualitative reasoning and 'naive physics'
  are therefore more appropriate in such cases.

• Qualitative modelling and reasoning is usually viewed as an abstraction of
  quantitative modelling and reasoning. Numbers are reduced to their signs,
  symbolic values (sometimes called landmarks) or intervals. Time may be
  simplified into symbolic time points and intervals. Time derivatives may
  be simplified into directions of change (increasing, decreasing or steady).
  Concrete functional relationships may be simplified into more vague ones, such
  as monotonic relationships.
• Qualitative models are easier to construct than quantitative models. Due to lack
  of numerical precision, qualitative models are not always sufficient, but they are
  generally appropriate for the tasks of diagnosis, functional reasoning, and design
  from 'first principles'.

• In qualitative reasoning, arithmetic operations are reduced to qualitative
  arithmetic. An example is qualitative summation over the signs pos, zero and
  neg. Typically, qualitative arithmetic is non-deterministic.

• Qualitative differential equations (QDE) are an abstraction of differential
  equations. QSIM is a qualitative simulation algorithm for models defined by
  QDEs. The main underlying assumption in QSIM-type simulation is that of
  'smoothness': within the same 'operating region', the values of variables can
  only change smoothly.

• The difficulty of ensuring correctness of qualitative simulation is asymmetrical
  in the following sense. It is relatively easy to ensure that all the behaviours
  consistent with the given QDEs are, in fact, generated by the simulator. On the
  other hand, it is hard to ensure that only the behaviours consistent with
  the given QDEs are generated. This problem is known as the problem of spurious
  behaviours.

• Concepts discussed in this chapter are:

    qualitative reasoning
    qualitative modelling
    common sense and naive physics
    qualitative abstractions
    landmarks
    qualitative arithmetic
    qualitative summation
    qualitative differential equations (QDEs)
    monotonic functional constraints
    qualitative value, qualitative state of a variable
    qualitative simulation
    the QSIM algorithm
    smoothness assumption in qualitative simulation of dynamic systems
    smooth qualitative transitions
    spurious behaviours

References

Bobrow (1984) edited a special issue of the Artificial Intelligence journal entitled Qualitative
Reasoning about Physical Systems. Some papers in that volume became classical references in
the qualitative reasoning field, such as the papers by de Kleer and Brown (1984) on confluences,
and by Forbus on qualitative process theory. Although these papers are more interested in
descriptions at higher levels than qualitative differential equations (QDE), they are strongly
related to QDEs. QDEs can be viewed as the underlying low-level formalism to which these
higher-level descriptions can be compiled. de Kleer and Williams (1991) edited another special
issue on qualitative reasoning of the Artificial Intelligence journal. A useful selection of
important papers published prior to 1990 was edited by Weld and de Kleer (1990). Faltings
and Struss (1992) is another collection of papers on qualitative reasoning. A specialized forum
for rapid publication of ongoing research in qualitative reasoning is the annual Workshop on
Qualitative Reasoning.

The qualitative simulation program for dynamic systems in this chapter is based on the
QSIM algorithm (Kuipers 1986) for QDE models. In the interest of simplicity, our program does
not implement the full repertoire of qualitative constraints in QSIM, it does not introduce new
landmarks during simulation, and it does not make an explicit difference between time points and
time intervals. Makarovic (1991), and Sacks and Doyle (1991) analyze the difficulties of QSIM-style
simulation. Improvements that alleviate the problem of spurious behaviours, and some
other developments of QSIM, are described in Kuipers (1994). Further contributions to the
treatment of spurious behaviours are Say (1998a, 1998b) and Say and Kuru (1993). Exercise 20.7
was suggested by Cem Say (personal communication).

Of course, qualitative reasoning about physical systems does not have to be based on
differential equations or their direct abstraction. An approach that does not assume any
underlying connection to differential equations was applied by Bratko, Mozetic and Lavrac
(1989) in the modelling of a complex physiological system. A model of the heart explaining the
relations between cardiac arrhythmias and ECG signals was defined in terms of logic-based
qualitative descriptions.

Forbus and Falkenhainer (1992) explored an interesting idea of combining qualitative and
numerical simulation.

An important practical question is how to construct qualitative models, and whether this can be
automated. Automated construction of QDE models of dynamic systems from observed
behaviours of the modelled systems was studied by Coiera (1989), Bratko, Muggleton and
Varsek (1991), Kraan, Richards and Kuipers (1991), Varsek (1991), Say and Kuru (1996),
and Hau and Coiera (1997). Small-scale QDE-type models were synthesized from given
behaviours in all these works. Mozetic (1987a; 1987b; also described in Bratko et al. 1989)
synthesized by means of machine learning a substantial non-QDE-type qualitative model of the
electrical behaviour of the heart.

Bobrow, D.G. (ed.) (1984) Artificial Intelligence Journal, Vol. 24 (Special Volume on Qualitative
Reasoning about Physical Systems). Also appeared as: Qualitative Reasoning about Physical
Systems. Cambridge, MA: MIT Press, 1985.

Bratko, I., Mozetic, I. and Lavrac, N. (1989) KARDIO: a Study in Deep and Qualitative Knowledge
for Expert Systems. Cambridge, MA: MIT Press.

Bratko, I., Muggleton, S. and Varsek, A. (1991) Learning qualitative models of dynamic systems.
Proc. Inductive Logic Programming ILP-91 (Brazdil, P., ed.), Viana do Castelo, Portugal. Also in:
Inductive Logic Programming (Muggleton, S., ed.). London: Academic Press, 1992.

Coiera, E. (1989) Generating qualitative models from example behaviours. DCS Report No. 8901,
School of Computer Sc. and Eng., Univ. of New South Wales, Sydney, Australia.

de Kleer, J. and Brown, J.S. (1984) Qualitative physics based on confluences. Artificial Intelligence
Journal, 24: 7-83.
de Kleer, J. and Williams, B.C. (eds) (1991) Artificial Intelligence Journal, Vol. 51 (Special Issue on
Qualitative Reasoning about Physical Systems II).

Faltings, B. and Struss, P. (eds) (1992) Recent Advances in Qualitative Physics. Cambridge, MA:
MIT Press.

Hau, D.T. and Coiera, E.W. (1997) Learning qualitative models of dynamic systems. Machine
Learning Journal, 26: 177-211.

Kraan, I.C., Richards, B.L. and Kuipers, B.J. (1991) Automatic abduction of qualitative models.
Proc. 5th Int. Workshop on Qualitative Reasoning about Physical Systems.

Kuipers, B.J. (1986) Qualitative simulation. Artificial Intelligence Journal, 29: 289-338 (also in
Weld and de Kleer (1990)).

Kuipers, B.J. (1994) Qualitative Reasoning: Modeling and Simulation with Incomplete Knowledge.
Cambridge, MA: MIT Press.

Makarovic, A. (1991) Parsimony in Model-Based Reasoning. PhD Thesis, Twente University,
Enschede. ISBN 90-9004255-5.

Mozetic, I. (1987a) Learning of qualitative models. In: Progress in Machine Learning (Bratko, I.
and Lavrac, N., eds). Wilmslow, UK: Sigma Press.

Mozetic, I. (1987b) The role of abstractions in learning qualitative models. Proc. Fourth Int.
Workshop on Machine Learning, Irvine, CA: Morgan Kaufmann.

Sacks, E.P. and Doyle, J. (1991) Prolegomena to any future qualitative physics. Report
CS-TR-314-91, Princeton University. Also appeared in Computational Intelligence Journal.

Say, A.C.C. (1998a) L'Hopital's filter for QSIM. IEEE Trans. Pattern Analysis and Machine
Intelligence, 20: 1-8.

Say, A.C.C. (1998b) Improved infinity filtering in qualitative simulation. Proc. Qualitative
Reasoning Workshop 98, Menlo Park, CA: AAAI Press.

Say, A.C.C. and Kuru, S. (1993) Improved filtering for the QSIM algorithm. IEEE Trans. Pattern
Analysis and Machine Intelligence, 15: 967-971.

Say, A.C.C. and Kuru, S. (1996) Qualitative system identification: deriving structure from
behavior. Artificial Intelligence, 83: 75-141.

Varsek, A. (1991) Qualitative model evolution. Proc. IJCAI-91, Sydney.

Weld, D.S. and de Kleer, J. (1990) Readings in Qualitative Reasoning about Physical Systems. San
Mateo, CA: Morgan Kaufmann.

Grammar Rules

21.1 Grammar rules in Prolog 555
21.2 Handling meaning 563
21.3 Defining the meaning of natural language 568

Many Prolog implementations provide a notational extension called DCG (definite
clause grammars). This makes it very easy to implement formal grammars in Prolog.
A grammar stated in DCG is directly executable by Prolog as a syntax analyzer.
DCG also facilitates the handling of the semantics of a language so that the meaning
of a sentence can be interleaved with the syntax. This chapter shows how DCG
enables elegant definitions of the syntax and meaning of non-trivial natural
language sentences, such as: 'Every woman that admires a man that paints likes
Monet'.

21.1 Grammar rules in Prolog
Grammar is a formal device for defining sets of sequences of symbols. Such a
sequence of symbols can be abstract, without any practical meaning, or, more
interestingly, it can be a statement in a programming language, or a whole program;
it can be a sentence in a natural language such as English.

One popular grammar notation is BNF (Backus-Naur form), which is commonly
used in the definition of programming languages. We will start our discussion by
considering BNF. A grammar comprises production rules. Here is a simple BNF
grammar of two rules:

    ⟨s⟩ ::= a b
    ⟨s⟩ ::= a ⟨s⟩ b
556 Language Processing with Grammar Rules Grammar rules in Prolog 557
The first rule says: whenever the symbols appears in a string, it can be rewritten with Such a sequence triggers the corresponding sequence of steps performed by the
the sequence ab. The second rule says that s can be rewritten with the sequence a, robot. We will call a sequence of steps a 'move'. A move, then, consists of one step,
followed bys, followed by b. In this grammar, sis always enclosed by brackets ' ( ) '. or a step followed by a move. This is captured by the following grammar:
This indicates that sis a no n-terminal symbol of the grammar. On the other hand, a (move)::= (step)
and bare terminal symbols. Terminal symbols can never be rewritten. In BNF, the (move)::= (step)(move)
two production rules above are normally written together as one rule: (step)::= up
(s)::=a bja(s)b (step)::= down
But for the purpose of this chapter we will be using the expanded, longer form. We will use this grammar later to illustrate how the meaning can be handled within
A grammar can be used to generate a string of symbols, called a sen tence. The Prolog's grammar notation.
generation process always starts with some starting non-terminal symbol, sin our As shown earlier, a grammar generates sentences. In the opposite direction,
example. Then symbols in the current sequence are replaced by other strings a grammar can be used to recog nize a given sentence. A recognizer decides whether a
according to the grammar rules. The generation process terminates when the current sequence does not contain any non-terminal symbol. In our example grammar, the generation process can proceed as follows. Start with:

s

Now by the second rule, s is replaced by:

a s b

The second rule can be used again, giving:

a a s b b

Applying the first rule, a sentence is finally produced:

a a a b b b

Obviously, our grammar can generate other sentences - for example, ab, aabb, etc. In general, this grammar generates strings of the form a^n b^n for n = 1, 2, 3, .... The set of sentences generated by a grammar is called the language defined by the grammar.

Our example grammar is simple and very abstract. However, we can use grammars to define much more interesting languages. Formal grammars are used for defining programming languages and also subsets of natural languages. Our next grammar is still very simple, but slightly less abstract. Suppose a robot arm can be sent sequences of commands: steps up and steps down.

A recognizer determines whether the given sentence belongs to some language; that is, it recognizes whether the sentence can be generated by the corresponding grammar. The recognition process is essentially the inverse of generation. In recognition, the process starts with the given string of symbols, to which grammar rules are applied in the opposite direction to generation: if the current string contains a substring equal to the right-hand side of some rule in the grammar, then this substring is rewritten with the left-hand side of this rule. The recognition process terminates successfully when the complete given sentence has been reduced to the starting non-terminal symbol of the grammar. If there is no way of reducing the given sentence to the starting non-terminal symbol, then the recognizer rejects the sentence.

In such a recognition process, the given sentence is effectively disassembled into its constituents; therefore, this process is often also called parsing. To implement a grammar normally means to write a parsing program for the grammar. We will see that in Prolog such parsing programs can be written very easily. What makes this particularly elegant in Prolog is a special grammar rule notation, called DCG (definite clause grammar). Many Prolog implementations support this special notation for grammars. A grammar written in DCG is already a parsing program for this grammar. To transform a BNF grammar into DCG we only have to change some notational conventions. Our example BNF grammars can be written in DCG as follows:

s --> [a], [b].
s --> [a], s, [b].

move --> step.
move --> step, move.
In Prolog implementations that accept the DCG notation, our transformed grammars can be immediately used as recognizers of sentences. Such a recognizer expects sentences to be essentially represented as difference lists of terminal symbols. (Difference-list representation was introduced in Chapter 8.) So each sentence is represented by two lists: the sentence represented is the difference between both lists. The two lists are not unique; for example:

aabb can be represented by lists [a, a, b, b] and []
or by lists [a, a, b, b, c] and [c]
or by lists [a, a, b, b, 1, 0, 1] and [1, 0, 1]

Taking into account this representation of sentences, our example DCG can be asked to recognize some sentences by questions:

?- s([a, a, b, b], []).     % Recognize string aabb
yes

?- s([a, a, b], []).
no

?- move([up, up, down], []).
yes

?- move([up, up, left], []).
no

?- move([up, X, up], []).
X = up;
X = down;
no

Let us now explain how Prolog uses the given DCG to answer such questions. When Prolog consults grammar rules, it automatically converts them into normal Prolog clauses. In this way, Prolog converts the given grammar rules into a program for recognizing sentences generated by the grammar. The following example illustrates this conversion. Our four DCG rules about robot moves are converted into four clauses:

move(List, Rest) :-
  step(List, Rest).

move(List1, Rest) :-
  step(List1, List2),
  move(List2, Rest).

step([up | Rest], Rest).
step([down | Rest], Rest).

What is actually achieved by this conversion? Let us look at the move procedure. The relation move has two arguments - two lists:

move(List, Rest)

is true if the difference of the lists List and Rest is an acceptable move. Example relationships are:

move([up, down, up], [])

or:

move([up, down, up, a, b, c], [a, b, c])

or:

move([up, down, up], [down, up])

Figure 21.1 illustrates what is meant by the clause:

move(List1, Rest) :-
  step(List1, List2),
  move(List2, Rest).

The clause can be read as:

The difference of lists List1 and Rest is a move if
the difference between List1 and List2 is a step and
the difference between List2 and Rest is a move.

This also explains why the difference-list representation is used: the pair (List1, Rest) represents the concatenation of the lists represented by the pairs (List1, List2) and (List2, Rest). As shown in Chapter 8, concatenating lists in this way is much more efficient than with conc.

[Figure 21.1 Relations between sequences of symbols: the difference between List1 and List2 is a step, and the difference between List2 and Rest is a move.]
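Instead of calling the translated two-argument predicates directly, most modern Prolog systems also provide the standard wrappers phrase/2 and phrase/3 for running a grammar; a brief sketch (the queries are ours):

```prolog
% The robot-move grammar from the text.
move --> step.
move --> step, move.

step --> [up].
step --> [down].

% phrase(NonTerminal, List) calls the translated predicate with Rest = [];
% phrase/3 exposes the Rest argument of the difference list.
%   ?- phrase(move, [up, down, up]).           % succeeds
%   ?- phrase(move, [up, left]).               % fails
%   ?- phrase(move, [up, down, left], Rest).   % first solution: Rest = [down, left]
```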
560 Language Processing with Grammar Rules Grammar rules in Prolog 561
Now we are ready to formulate more generally the translation between DCG and standard Prolog. Each DCG rule is translated into a Prolog clause according to the following basic scheme. Let the DCG rule be:

n --> n1, n2, ..., nn.

If all n1, n2, ..., nn are non-terminals then the rule is translated into the clause:

n(List1, Rest) :-
  n1(List1, List2),
  n2(List2, List3),
  ...
  nn(Listn, Rest).

If any of n1, n2, ..., nn is a terminal (written in square brackets in the DCG rule) then it is handled differently. It does not appear as a goal in the clause, but is directly inserted into the corresponding list. As an example, consider the DCG rule:

n --> n1, [t2], n3, [t4].

where n1 and n3 are non-terminals, and t2 and t4 are terminals. This is translated into the clause:

n(List1, Rest) :-
  n1(List1, [t2 | List3]),
  n3(List3, [t4 | Rest]).

More interesting examples of grammars come from programming languages and natural languages. In both cases they can be elegantly implemented using DCG. Here is an example grammar for a simple subset of English:

sentence --> noun_phrase, verb_phrase.
verb_phrase --> verb, noun_phrase.
noun_phrase --> determiner, noun.

determiner --> [a].
determiner --> [the].

noun --> [cat].
noun --> [mouse].

verb --> [scares].
verb --> [hates].

This grammar can generate sentences such as:

[the, cat, scares, a, mouse]
[the, mouse, hates, the, cat]
[the, mouse, scares, the, mouse]

Let us add nouns and verbs in plural to enable the generation of sentences like [the, mice, hate, the, cats]:

noun --> [cats].
noun --> [mice].

verb --> [scare].
verb --> [hate].

The grammar thus extended will generate the intended sentence. However, in addition, it will unfortunately also generate some unintended, incorrect English sentences, such as:

[the, mouse, hate, the, cat]

The problem lies in the rule:

sentence --> noun_phrase, verb_phrase.

This states that any noun phrase and verb phrase can be put together to form a sentence. But in English and many other languages, the noun phrase and verb phrase in a sentence are not independent: they have to agree in number. Both have to be either singular or plural. This phenomenon is called context dependence. A phrase depends on the context in which it occurs. Context dependencies cannot be directly handled by BNF grammars, but they can be easily handled by DCG grammars, using an extension that DCG provides with respect to BNF - namely, arguments that can be added to non-terminal symbols of the grammar. For example, we may add 'number' as an argument of noun phrase and verb phrase:

noun_phrase(Number)
verb_phrase(Number)

With this argument added we can easily modify our example grammar to force number agreement between the noun phrase and verb phrase:

sentence(Number) --> noun_phrase(Number), verb_phrase(Number).
verb_phrase(Number) --> verb(Number), noun_phrase(Number1).
noun_phrase(Number) --> determiner(Number), noun(Number).

noun(singular) --> [mouse].

For example, the rule

sentence(Number) --> noun_phrase(Number), verb_phrase(Number).

is converted into:

sentence(Number, List1, Rest) :-
  noun_phrase(Number, List1, List2),
  verb_phrase(Number, List2, Rest).
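Collecting the scattered fragments, a complete runnable version of the agreement grammar might look as follows; the full lexicon and the treatment of the determiners ('the' combining with both numbers, 'a' only with singular) are our own completion of the rules shown above:

```prolog
sentence(Number)     --> noun_phrase(Number), verb_phrase(Number).
verb_phrase(Number)  --> verb(Number), noun_phrase(_Number1).
noun_phrase(Number)  --> determiner(Number), noun(Number).

% 'the' combines with both numbers; 'a' only with singular (our assumption).
determiner(_)        --> [the].
determiner(singular) --> [a].

noun(singular) --> [mouse].     noun(plural) --> [mice].
noun(singular) --> [cat].       noun(plural) --> [cats].

verb(singular) --> [scares].    verb(plural) --> [scare].
verb(singular) --> [hates].     verb(plural) --> [hate].

% ?- sentence(N, [the, mice, hate, the, cat], []).    % N = plural
% ?- sentence(_, [the, mouse, hate, the, cat], []).   % fails: no agreement
```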
?- sentence(plural, [the, mouse, hates, the, cat], []).
no

?- sentence(Number, [the, mouse, hates, the, cat], []).
Number = singular

?- sentence(singular, [the, What, hates, the, cat], []).
What = cat;
What = mouse;
no

Exercises

21.1 Translate into standard Prolog the DCG rule:

s --> [a], s, [b].

21.2 Write a Prolog procedure that translates a given DCG rule into the corresponding Prolog clause.

21.3 One DCG rule in our grammar about robot moves is:

move --> step, move.

If this rule is replaced by

move --> move, step.

the language defined by the so-modified grammar is the same. However, the corresponding recognition procedure in Prolog is different. Analyze the difference. What is the advantage of the original grammar? How would the two grammars handle the question:

?- move([up, left], []).

The sentence

[the, cat, scares, the, mice]

is parsed as shown by the parse tree of Figure 21.2. Some parts of the sentence are called phrases - those parts that correspond to non-terminals in the parse tree. In our example, [the, mice] is a phrase corresponding to the non-terminal noun_phrase; [scares, the, mice] is a phrase corresponding to verb_phrase. Figure 21.2 shows how the parse tree of a sentence contains, as subtrees, parse trees of phrases.

[Figure 21.2 The parse tree of the sentence 'The cat scares the mice': the root sentence has subtrees noun_phrase (determiner 'the', noun 'cat') and verb_phrase (verb 'scares', noun_phrase with determiner 'the' and noun 'mice').]

Here now is a definition of parse tree. The parse tree of a phrase is a tree with the following properties:

(1) All the leaves of the tree are labelled by terminal symbols of the grammar.

(2) All the internal nodes of the tree are labelled by non-terminal symbols; the root of the tree is labelled by the non-terminal that corresponds to the phrase.

(3) The parent-children relation in the tree is as specified by the rules of the grammar. For example, if the grammar contains the rule

s --> p, q, r.
then the tree may contain the node s whose children are p, q and r:

      s
    / | \
   p  q  r

Sometimes it is useful to have the parse tree explicitly represented in the program to perform some computation on it - for example, to extract the meaning of a sentence. The parse tree can be easily constructed using arguments of non-terminals in the DCG notation. We can conveniently represent a parse tree by a Prolog term whose functor is the root of the tree and whose arguments are the subtrees of the tree. For example, the parse tree for the noun phrase 'the cat' would be represented as:

noun_phrase(determiner(the), noun(cat))

To generate a parse tree, a DCG grammar can be modified by adding to each non-terminal its parse tree as an argument. For example, the parse tree of a noun phrase in our grammar has the form:

noun_phrase(DetTree, NounTree)

Here DetTree and NounTree are the parse trees of a determiner and a noun respectively. Adding these parse trees as arguments into our noun phrase grammar rule results in the modified rule:

noun_phrase(noun_phrase(DetTree, NounTree)) -->
  determiner(DetTree),
  noun(NounTree).

This rule can be read as:

A noun phrase whose parse tree is noun_phrase(DetTree, NounTree) consists of:
• a determiner whose parse tree is DetTree, and
• a noun whose parse tree is NounTree.

We can now modify our whole grammar accordingly. To ensure number agreement, we can retain the number as the first argument and add the parse tree as the second argument. Here is part of the modified grammar:

sentence(Number, sentence(NP, VP)) -->
  noun_phrase(Number, NP),
  verb_phrase(Number, VP).

verb_phrase(Number, verb_phrase(Verb, NP)) -->
  verb(Number, Verb),
  noun_phrase(Number1, NP).

noun_phrase(Number, noun_phrase(Det, Noun)) -->
  determiner(Det),
  noun(Number, Noun).

determiner(determiner(the)) --> [the].

noun(singular, noun(cat)) --> [cat].
noun(plural, noun(cats)) --> [cats].

When this grammar is read by Prolog it is automatically translated into a standard Prolog program. The first grammar rule above is translated into the clause:

sentence(Number, sentence(NP, VP), List, Rest) :-
  noun_phrase(Number, NP, List, Rest0),
  verb_phrase(Number, VP, Rest0, Rest).

Accordingly, a question to Prolog to parse a sentence has to be stated in this format; for example:

?- sentence(Number, ParseTree, [the, mice, hate, the, cat], []).
Number = plural
ParseTree = sentence(noun_phrase(determiner(the), noun(mice)),
                     verb_phrase(verb(hate), noun_phrase(determiner(the), noun(cat))))

21.2.2 From the parse tree to the meaning

Prolog grammars are particularly well suited for the treatment of the meaning of a language, in particular of natural languages. Arguments that are attached to non-terminal symbols of a grammar can be used to handle the meaning of sentences. One approach to extracting the meaning involves two stages:

(1) Generate the parse tree of the given sentence.
(2) Process the parse tree to compute the meaning.

Of course this is only practical if the syntactic structure, represented by the parse tree, also reflects the semantic structure; that is, both the syntactic and semantic decomposition have similar structures. In such a case, the meaning of a sentence can be composed from the meanings of the syntactic phrases into which the sentence has been parsed.

A simple example will illustrate this two-stage method. For simplicity we will consider robot moves again. A grammar about robot moves that generates the parse tree is:

move(move(Step)) --> step(Step).
move(move(Step, Move)) --> step(Step), move(Move).

step(step(up)) --> [up].
step(step(down)) --> [down].

Let us define the meaning of a move as the distance between the robot's position before the move and after it. Let each step be 1 mm in either the positive or negative
direction. So the meaning of the move 'up up down up' is 1 + 1 - 1 + 1 = 2. The distance of a move can be computed from the move's parse tree as illustrated in Figure 21.3. The corresponding arithmetic is:

distance('up up down up')
  = distance('up') + distance('up down up')
  = 1 + 1 = 2

[Figure 21.3 Extracting the meaning of a move as its distance: in the parse tree of 'up up down up', each step contributes 1 or -1, and each move node adds the distances of its subtrees.]

The following procedure computes the meaning of a move (as the corresponding distance) from the move's parse tree:

% meaning(ParseTree, Value)

meaning(move(Step, Move), Dist) :-
  meaning(Step, D1),
  meaning(Move, D2),
  Dist is D1 + D2.

meaning(move(Step), Dist) :-
  meaning(Step, Dist).

meaning(step(up), 1).
meaning(step(down), -1).

This can be used to compute the meaning of the move 'up up down up' by:

?- move(Tree, [up, up, down, up], []), meaning(Tree, Dist).
Dist = 2
Tree = move(step(up), move(step(up), move(step(down), move(step(up)))))

Is it possible to use this grammar and the meaning procedure for the inverse task; that is, to find a move with a given distance - for example:

?- move(Tree, Move, []), meaning(Tree, 5).

Discuss this application.

21.2.3 Interleaving syntax and semantics in DCG

The DCG notation actually makes it possible to incorporate the meaning directly into a grammar, thereby avoiding the intermediate construction of the parse tree. There is another notational extension, provided by DCG, that is useful in this respect. This extension allows us to insert normal Prolog goals into grammar rules. Such goals have to be enclosed in curly brackets to make them distinguishable from other symbols of the grammar. Everything enclosed in curly brackets is, when encountered, executed as normal Prolog goals. This execution may, for example, involve arithmetic computation.

This feature can be used to rewrite our robot move grammar so that the meaning extraction is interleaved directly with the syntax. To do that, we have to add the meaning, that is, the move's distance, as an argument of the non-terminals move and step. For example,

move(Dist)

now denotes a move phrase whose meaning is Dist. A corresponding grammar is:

move(D) --> step(D).
move(D) --> step(D1), move(D2), {D is D1 + D2}.

step(1) --> [up].
step(-1) --> [down].

The second rule here exemplifies the curly bracket notation. This rule can be read as follows:

A move whose meaning is D consists of:
• a step whose meaning is D1, and
• a move whose meaning is D2, where the relation D is D1 + D2 must also be satisfied.

In fact, handling semantics by incorporating the meaning formation rules directly into a DCG grammar is so convenient that the intermediate stage of constructing the parse tree is often avoided altogether. Avoiding such an intermediate stage results in a 'collapsed' program. The usual advantages of this are a shorter and more
efficient program, but there are also disadvantages: the collapsed program may be less transparent, less flexible and harder to modify.

As a further illustration of this technique of interleaving the syntax and meaning, let us make our robot example a little more interesting. Suppose that the robot can be in one of two gears: g1 or g2. When a step command is received in gear g1, the robot will move by 1 mm up or down; in gear g2 it will move by 2 mm. Let the whole program for the robot consist of the gear commands (to switch to gear g1 or g2) and step commands, finally ending with stop. Example programs are:

stop
g1 up up stop
g1 up up g2 down up stop
g1 g1 g2 up up g1 up down up g2 stop

The meaning (that is, the distance) of the last program is:

Dist = 2 * (1 + 1) + 1 * (1 - 1 + 1) = 5

To handle such robot programs, our existing robot move grammar has to be extended with the following rules:

prog(0) --> [stop].
prog(Dist) --> gear(_), prog(Dist).
prog(Dist) --> gear(G), move(D), prog(Dist1), {Dist is G * D + Dist1}.

gear(1) --> [g1].
gear(2) --> [g2].

21.3 Defining the meaning of natural language

21.3.1 Meaning of simple sentences in logic

Defining the meaning of natural language is an extremely difficult problem that is the subject of on-going research. An ultimate solution to the problem of formalizing the complete syntax and meaning of a language like English is far away. But (relatively) simple subsets of natural languages have been successfully formalized and consequently implemented as working programs.

In defining the meaning of a language, the first question is: How will the meaning be represented? There are of course many alternatives, and a good choice will depend on the particular application. The important question therefore is: What will the meaning extracted from natural language text be used for? A typical application is natural language access to a database. This involves answering natural language questions regarding information in the database and updating the database by new information extracted from natural language input. In such a case, the target representation of the meaning extraction process would be a language for querying and updating the database.

Logic has been accepted as a good candidate for representing the meaning of natural language sentences. In comparison with database formalisms, logic is in general more powerful and, while essentially subsuming database formalisms, also allows more subtle semantic issues to be dealt with. In this section we will show how interpretations of simple natural language sentences in logic can be constructed using the DCG notation. The logical interpretations will be encoded as Prolog terms. We will only look at some interesting ideas, so many details necessary for a more general coverage will be omitted. A more complete treatment would be far beyond the scope of this book.

To start with, it is best to look at some natural language sentences and phrases and try to express in logic what they mean. Let us consider first the sentence 'John paints'. The natural way to express the meaning of this sentence in logic, as a Prolog term, is:

paints(john)

Notice that 'paints' here is an intransitive verb and therefore the corresponding predicate paints only has one argument.

Our next example sentence is 'John likes Annie'. The formalized meaning of this can be:

likes(john, annie)

The verb 'likes' is transitive and accordingly the predicate likes has two arguments.

Let us now try to define, by DCG rules, the meaning of such simple sentences. We will start with the bare syntax and then gradually incorporate the meaning into these rules. Here is a grammar that comfortably covers the syntax of our example sentences:

sentence --> noun_phrase, verb_phrase.
noun_phrase --> proper_noun.
verb_phrase --> intrans_verb.                % Intransitive verb
verb_phrase --> trans_verb, noun_phrase.     % Transitive verb

intrans_verb --> [paints].
trans_verb --> [likes].

proper_noun --> [john].
proper_noun --> [annie].

Now let us incorporate meaning into these rules. We will start with the simpler categories - noun and verb - and then proceed to the more complicated ones. Our foregoing examples suggest the following definitions. The meaning of the proper noun john is simply john:

proper_noun(john) --> [john].
The meaning of an intransitive verb like 'paints' is slightly more complicated. It can be stated as

paints(X)

where X is a variable whose value only becomes known from the context; that is, from the noun phrase. Correspondingly, the DCG rule for paints is:

intrans_verb(paints(X)) --> [paints].

Let us now look at the question: How can we construct from the two meanings, john and paints(X), the intended meaning of the whole sentence: paints(john)? We have to force the argument X of paints to become equal to john.

It may be helpful at this point to consider Figure 21.4. This shows how the meanings of phrases accumulate into the meaning of the whole sentence. To achieve the effects of the propagation of the meanings of phrases, we can first simply define that noun_phrase and verb_phrase receive their meanings from proper_noun and intrans_verb respectively:

noun_phrase(NP) --> proper_noun(NP).
verb_phrase(VP) --> intrans_verb(VP).

[Figure 21.4 The parse tree of the sentence 'John paints' with meaning attached to the nodes. The logical meaning of each phrase is attached to the corresponding non-terminal in the tree. The arrows indicate how the meanings of phrases accumulate.]

It remains to define the meaning of the whole sentence. Here is a first attempt:

sentence(S) --> noun_phrase(NP), verb_phrase(VP), {compose(NP, VP, S)}.

The goal compose(NP, VP, S) has to assemble the meanings of the noun phrase john and the verb phrase paints(X). Let us say that X is the actor in paints(X). Let us define the relation

actor(VP, Actor)

so that Actor is the actor in the meaning VP of a verb phrase. One clause of the procedure actor then is:

actor(paints(X), X).

Once the actor relation is available, the composition of the noun phrase's and verb phrase's meanings can be defined as:

compose(NP, VP, VP) :-    % Meaning of sentence is VP, where
  actor(VP, NP).          % the actor in VP is NP

This works, but there is a shorter method. We can avoid the extra predicates compose and actor if we make the argument X in the term paints(X) 'visible' from the outside of the term, so that it becomes accessible for instantiation. This can be achieved by redefining the meaning of verb_phrase and intrans_verb so that X becomes an extra argument:

intrans_verb(Actor, paints(Actor)) --> [paints].
verb_phrase(Actor, VP) --> intrans_verb(Actor, VP).

This makes the argument Actor easily accessible and facilitates a simpler definition of the meaning of a sentence:

sentence(VP) --> noun_phrase(Actor), verb_phrase(Actor, VP).

This forces the 'actor' argument of the verb's meaning to become equal to the meaning of the noun phrase.

This technique of making parts of meanings visible is a rather common trick in incorporating meaning into DCG rules. The technique essentially works as follows. The meaning of a phrase is defined in a 'skeleton' form - for example, paints(SomeActor). This defines the general form of the meaning, but leaves some of it uninstantiated (here, the variable SomeActor). Such an uninstantiated variable serves as a slot that can be filled later depending on the meaning of other phrases in the context. This filling of slots can be accomplished by Prolog matching. To facilitate this, however, slots are made visible by being added as extra arguments to non-terminals. We will adopt the following convention regarding the order of these arguments: first will come the 'visible slots' of the phrase's meaning, followed by the meaning itself - for example, verb_phrase(Actor, VPMeaning).
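Collecting the pieces above into one runnable program, and extending it to the transitive verb 'likes' in the same style (the transitive-verb rules with a second visible slot for the object are our own extrapolation, not given in the text):

```prolog
sentence(VP)           --> noun_phrase(Actor), verb_phrase(Actor, VP).
noun_phrase(NP)        --> proper_noun(NP).
verb_phrase(Actor, VP) --> intrans_verb(Actor, VP).
% Our extrapolation: a transitive verb exposes both actor and object slots.
verb_phrase(Actor, VP) --> trans_verb(Actor, Object, VP), noun_phrase(Object).

intrans_verb(Actor, paints(Actor))              --> [paints].
trans_verb(Actor, Object, likes(Actor, Object)) --> [likes].

proper_noun(john)  --> [john].
proper_noun(annie) --> [annie].

% ?- sentence(S, [john, paints], []).         % S = paints(john)
% ?- sentence(S, [john, likes, annie], []).   % S = likes(john, annie)
```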
Sentences that contain determiners such as 'a' are much more difficult to handle than those in the previous section. Let us consider an example sentence: 'A man paints'. It would now be a gross mistake to think that the meaning of this sentence is paints(man). The sentence really says: there exists some man that paints. In logic this is phrased as:

There exists an X such that X is a man and X paints.

In logic, the variable X here is said to be existentially quantified ('there exists'). We will choose to represent this by the Prolog term:

exists(X, man(X) and paints(X))

The first argument in this term is a variable, X - the variable that is meant to be existentially quantified. and is assumed to be declared as an infix operator:

:- op(100, xfy, and).

Both variables Property and Assertion in the skeleton meaning exists(X, Property and Assertion) are slots for meanings from the context to be plugged in. To facilitate the importation of the meanings from other phrases in context, we can, as explained in the previous section, make parts of the meaning of the determiner 'a' visible. A suitable DCG rule for determiner 'a' is:

determiner(X, Prop, Assn, exists(X, Prop and Assn)) --> [a].

The logical meaning of the tiny determiner 'a' may seem surprisingly complicated. Another determiner, 'every', can be handled in a similar way. Consider the sentence: 'Every woman dances'. The logic interpretation of this is:

For all X,
if X is a woman then X dances.

We will represent this by the following Prolog term:

all(X, woman(X) => dances(X))

where '=>' is an infix operator denoting logical implication. Determiner 'every' thus indicates a meaning whose skeleton structure is:

all(X, Property => Assertion)

We can now proceed finally to the meaning of the whole sentence. We can get a first idea by looking again at the sentence 'A man paints', whose meaning is:

exists(X, man(X) and paints(X))

We have already defined the meaning of 'a' as:

determiner(X, Prop, Assn, exists(X, Prop and Assn)) --> [a].

Comparing these two meanings it is immediately obvious that the main structure of the meaning of the sentence is dictated by the determiner. The meaning of the sentence can be assembled as illustrated by Figure 21.5: start with the skeleton meaning enforced by 'a':

exists(X, Prop and Assn)

Prop then becomes instantiated by the noun and Assn by the verb phrase. The main structure of the meaning of the sentence is received from the noun phrase. Notice that this is different from the simpler grammar in the previous section where it was the verb phrase that provided the sentence's meaning structure. Again, applying the technique of making some parts of the meaning visible, the relations between meanings indicated in Figure 21.5 can be stated by the following DCG rules:

sentence(S) --> noun_phrase(X, Assn, S), verb_phrase(X, Assn).
noun_phrase(X, Assn, S) --> determiner(X, Prop, Assn, S), noun(X, Prop).
verb_phrase(X, Assn) --> intrans_verb(X, Assn).

intrans_verb(X, paints(X)) --> [paints].
determiner(X, Prop, Assn, exists(X, Prop and Assn)) --> [a].
noun(X, man(X)) --> [man].

[Figure 21.5 Accumulation of meaning for the sentence 'A man paints'. Determiner 'a' dictates the overall form of the meaning of the sentence. Arrows indicate the importation of meaning between phrases.]

This grammar can be asked to construct the meaning of 'A man paints':

?- sentence(S, [a, man, paints], []).
S = exists(X, man(X) and paints(X))

Prolog's answer was polished by replacing a Prolog-generated variable name like _123 with X.

The grammar of the previous section handles sentences like 'John paints'. Now that we have modified our grammar we need to ensure that this new grammar, which can handle 'A man paints', can also handle the simpler 'John paints'. To do this, the meaning of proper nouns has to be incorporated into the new noun phrase. The following rules accomplish this:

proper_noun(john) --> [john].

noun_phrase(X, Assn, Assn) --> proper_noun(X).

This last rule simply says that the whole meaning of this kind of noun phrase is the same as the value in the second 'visible slot' - that is, Assn. This slot's value is obtained from the context (from the verb phrase) as in the question:

?- sentence(S, [john, paints], []).
S = paints(john)

Study how our modified grammar constructs the meaning of 'John paints'. Essentially the following should happen: the meaning received from the verb phrase becomes the noun phrase's meaning, which eventually becomes the sentence's meaning.

Nouns can be qualified by relative clauses - for example, 'Every man that paints admires Monet'. The phrase 'every man that paints' is a noun phrase in which 'that
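Before moving on, the quantifier grammar just developed can be collected into a runnable sketch that also completes the 'every' skeleton described above; the operator priority for '=>' and the lexicon entries for 'woman' and 'dances' are our own assumptions, not taken from the text:

```prolog
:- op(100, xfy, and).   % as declared in the text
:- op(150, xfy, =>).    % priority chosen by us; the text only says
                        % that '=>' is an infix operator

sentence(S)                --> noun_phrase(X, Assn, S), verb_phrase(X, Assn).
noun_phrase(X, Assn, S)    --> determiner(X, Prop, Assn, S), noun(X, Prop).
noun_phrase(X, Assn, Assn) --> proper_noun(X).
verb_phrase(X, Assn)       --> intrans_verb(X, Assn).

determiner(X, Prop, Assn, exists(X, Prop and Assn)) --> [a].
% Completion of the 'every' skeleton described in the text:
determiner(X, Prop, Assn, all(X, Prop => Assn))     --> [every].

intrans_verb(X, paints(X)) --> [paints].
intrans_verb(X, dances(X)) --> [dances].   % lexicon entry added by us

noun(X, man(X))   --> [man].
noun(X, woman(X)) --> [woman].             % lexicon entry added by us

proper_noun(john) --> [john].

% ?- sentence(S, [every, woman, dances], []).
% S = all(X, woman(X) => dances(X))
```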
like: 'Does Annie admire anybody who admires Monet?' The answer to this logically follows from the sentences above, and we just have to make our program do some necessary reasoning. In general, we need a theorem prover to deduce answers from the meanings represented in logic. Of course, it is most practical to simply use Prolog itself as such a theorem prover. To do that we would have to translate the logical meanings into equivalent Prolog clauses. This exercise in general requires some work, but in some cases such a translation is trivial. Here are some easily translatable meanings written as Prolog clauses:

paints(john).

admires(X, monet) :-
  man(X),
  paints(X).

admires(annie, X) :-
  man(X),
  paints(X).

The example query 'Does Annie admire anybody who admires Monet?' would have to be translated into the Prolog query:

?- admires(annie, X), admires(X, monet).
X = john

Exercises

21.6 State in logic the meaning of the sentences:

(a) Mary knows all important artists.
(b) Every teacher who teaches French and studies music understands Chopin.
(c) A charming lady from Florida runs a beauty shop in Sydney.

S = [monet, likes, every, woman, that, admires, a, man, that, paints]

Explain how this was obtained, and suggest a modification of the grammar to prevent this. Hint: The problem is that the grammar allows a quantified variable (e.g. X in all(X, ...)) to match a proper noun (monet); this can easily be prevented.

21.8 Extend the grammar of Figure 21.6 to handle composite sentences with connectives 'if', 'then', 'and', 'or', 'neither', 'nor', etc. For example:

John paints and Annie sings.
If Annie sings then every teacher listens.

Project

Modify the grammar of Figure 21.6 to represent the meaning of sentences directly as executable Prolog clauses. Write a program that reads natural language sentences in normal text format (not as lists; see a relevant program in Chapter 6) and asserts their meaning as Prolog clauses. Extend the grammar to handle simple questions in natural language, which would result in a complete conversation system for a small subset of natural language. You may also consider use of the grammar to generate sentences with the given meaning as natural language answers to the user.

Summary

• Standard grammar notations, such as BNF, can be trivially translated into the DCG notation (definite clause grammars). A grammar in DCG can be read and executed directly by Prolog as a recognizer for the language defined by the grammar.
..
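To make the summary point concrete, here is a minimal DCG sketch. It is not the book's Figure 21.6; the grammar and vocabulary are invented for illustration. Loaded into Prolog, the rules below become ordinary clauses and can be run directly as a recognizer for sentences represented as lists of words:

```prolog
% A minimal DCG (illustrative only; not the grammar of Figure 21.6)
sentence    --> noun_phrase, verb_phrase.
noun_phrase --> determiner, noun.
noun_phrase --> proper_noun.
verb_phrase --> verb, noun_phrase.
verb_phrase --> verb.
determiner  --> [a].
determiner  --> [every].
noun        --> [man].
noun        --> [woman].
proper_noun --> [monet].
verb        --> [paints].
verb        --> [admires].

% ?- sentence( [a, man, paints], []).                 % succeeds
% ?- sentence( [every, woman, admires, monet], []).   % succeeds
```

Prolog translates each grammar rule into a clause with two extra list arguments, which is why the queries above can be posed directly.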
Game Playing
chess. The rules also determine what is the outcome of the game that has ended in
this terminal position.

Such a game can be represented by a game tree. The nodes in such a tree
correspond to situations, and the arcs correspond to moves. The initial situation of
the game is the root node; the leaves of the tree correspond to terminal positions.
In most games of this type the outcome of the game can be win, loss or draw. We
will now consider games with just two outcomes: win and loss. Games where a draw
is a possible outcome can be reduced to two outcomes: win, not-win. The two players
will be called 'us' and 'them'. 'Us' can win in a non-terminal 'us-to-move' position if
there is a legal move that leads to a won position. On the other hand, a non-terminal
'them-to-move' position is won for 'us' if all the legal moves from this position

[Figure: the game tree of chess - the initial position has about 30 successor
positions (1 ply); after a move by each side (2 ply) there are about 30 x 30,
roughly 1000 positions; a game of 40 moves (80 ply) thus generates on the order
of 1000^40 positions.]
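The win/loss characterization above can be written directly as Prolog clauses. The sketch below is illustrative, not the book's code; move/2, us_to_move/1, them_to_move/1 and terminal_won/1 are hypothetical game-definition predicates:

```prolog
% won( Pos): position Pos is won for 'us' (illustrative sketch).
% Assumes terminal positions are covered by terminal_won/1.
won( Pos)  :-
   terminal_won( Pos).                 % a terminal position won for 'us'
won( Pos)  :-
   us_to_move( Pos),
   move( Pos, Pos1),                   % some legal move leads to
   won( Pos1), !.                      % a won position
won( Pos)  :-
   them_to_move( Pos),
   move( Pos, _),                      % non-terminal, and
   \+ ( move( Pos, Pos1),              % every 'them' move leads to
        \+ won( Pos1) ).               % a position won for 'us'
```

The double negation in the last clause is the usual Prolog encoding of 'all moves lead to won positions'.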
Much depends on the evaluation function, which, in most games of interest, has
to be a heuristic estimator that estimates the winning chances from the point of
view of one of the players. The higher the value, the higher the player's chances are
to win; the lower the value, the higher the opponent's chances are to win. As one
of the players will tend to achieve a high position value, and the other a low value,
the two players will be called MAX and MIN respectively. Whenever MAX is to
move, he or she will choose a move that maximizes the value; on the contrary, MIN
will choose a move that minimizes the value. Given the values of the bottom-level
positions in a search tree, this principle (called minimax) will determine the values of
all the other positions in the search tree. Figure 22.2 illustrates. In the figure, levels
of positions with MAX to move alternate with those with MIN to move. The
bottom-level position values are determined by the evaluation function. The values
of the internal nodes can be computed in a bottom-up fashion, level by level, until
the root.

Denote the static value of a position P by v(P), and the backed-up value by V(P).
Let P1, ..., Pn be the legal successor positions of P. Then the relation between static
values and backed-up values can be defined as:

   V(P) = v(P)           if P is a terminal position in a search tree (n = 0)
   V(P) = max_i V(Pi)    if P is a MAX-to-move position
   V(P) = min_i V(Pi)    if P is a MIN-to-move position

Figure 22.2 Static values (bottom level) and minimax backed-up values in a search tree. The
indicated moves constitute the main variation - that is, the minimax optimal play
for both sides.

best( PosList, BestPos, BestVal)

selects the 'best' position BestPos from a list of candidate positions PosList. BestVal is
the value of BestPos, and hence also of Pos. 'Best' is here either maximum or
minimum, depending on the side to move.
% minimax( Pos, BestSucc, Val):
%   Pos is a position, Val is its minimax value;
%   best move from Pos leads to position BestSucc

minimax( Pos, BestSucc, Val)  :-
   moves( Pos, PosList), !,              % Legal moves in Pos produce PosList
   best( PosList, BestSucc, Val)
   ;
   staticval( Pos, Val).                 % Pos has no successors: evaluate statically

best( [Pos], Pos, Val)  :-
   minimax( Pos, _, Val), !.

best( [Pos1 | PosList], BestPos, BestVal)  :-
   minimax( Pos1, _, Val1),
   best( PosList, Pos2, Val2),
   betterof( Pos1, Val1, Pos2, Val2, BestPos, BestVal).

betterof( Pos0, Val0, Pos1, Val1, Pos0, Val0)  :-   % Pos0 better than Pos1
   min_to_move( Pos0),                   % MIN to move in Pos0
   Val0 > Val1, !                        % MAX prefers the greater value
   ;
   max_to_move( Pos0),                   % MAX to move in Pos0
   Val0 < Val1, !.                       % MIN prefers the lesser value

betterof( Pos0, Val0, Pos1, Val1, Pos1, Val1).      % Otherwise Pos1 better than Pos0

Figure 22.3 A straightforward implementation of the minimax principle.

22.3 The alpha-beta algorithm: an efficient implementation of minimax

The amount of search needed can be reduced by the following idea: suppose that
there are two alternative moves; once one of them has been shown to be clearly
inferior to the other, it is not necessary to know exactly how much inferior it is
for making the correct decision. For example, we can use this principle to reduce
the search in the tree of Figure 22.2. The search process here proceeds as follows:

(1) Start with position a.
(2) Move down to b.
(3) Move down to d.
(4) Take the maximum of d's successors, yielding V(d) = 4.
(5) Backtrack to b and move down to e.
(6) Consider the first successor of e, whose value is 5. At this point MAX (who is
    to move in e) is guaranteed at least the value 5 in position e, regardless of
    other (possibly better) alternatives from e. This is sufficient for MIN to
    realize that, at node b, the alternative e is inferior to d, even without knowing
    the exact value of e.

On these grounds we can neglect the second successor of e and simply assign to e an
approximate value 5. This approximation will, however, have no effect on the
computed value of b and, hence, of a.

The celebrated alpha-beta algorithm for efficient minimaxing is based on this idea.
Figure 22.4 illustrates the action of the alpha-beta algorithm on our example tree
of Figure 22.2. As Figure 22.4 shows, some of the backed-up values are approximate.
However, these approximations are sufficient to determine the root value precisely.
In the example of Figure 22.4, the alpha-beta principle reduces the search complexity
from eight static evaluations (as originally in Figure 22.2) to five static evaluations.

Figure 22.4 The tree of Figure 22.2 searched by the alpha-beta algorithm. The search prunes
the nodes shown by dotted lines, thus economizing the search. As a result, some
of the backed-up values are inexact (nodes c, e, f; compare with Figure 22.2).
However, these approximations suffice for determining the root value and the
main variation correctly.

As said before, the key idea of the alpha-beta pruning is to find a 'good enough'
move, not necessarily the best, that is sufficiently good to make the correct decision.
This idea can be formalized by introducing two bounds, usually denoted Alpha and
Beta, on the backed-up value of a position. The meaning of these bounds is: Alpha is
the minimal value that MAX is already guaranteed to achieve, and Beta is the
maximal value that MAX can hope to achieve. From MIN's point of view, Beta is
the worst value for MIN that MIN is guaranteed to achieve. Thus, the actual value
(that is to be found) lies between Alpha and Beta. If a position has been shown to
have a value that lies outside the Alpha-Beta interval then this is sufficient to know
that this position is not in the main variation, without knowing the exact value of
this position. We only have to know the exact value of this position if this value is
between Alpha and Beta. Formally, we can define a 'good enough' backed-up value
V(P, Alpha, Beta) of a position P, with respect to Alpha and Beta, as any value that
satisfies the following requirements:

   V(P, Alpha, Beta) < Alpha   if V(P) < Alpha

% The alpha-beta algorithm

alphabeta( Pos, Alpha, Beta, GoodPos, Val)  :-
   moves( Pos, PosList), !,
   boundedbest( PosList, Alpha, Beta, GoodPos, Val)
   ;
   staticval( Pos, Val).                 % Static value of Pos

boundedbest( [Pos | PosList], Alpha, Beta, GoodPos, GoodVal)  :-
   alphabeta( Pos, Alpha, Beta, _, Val),
   goodenough( PosList, Alpha, Beta, Pos, Val, GoodPos, GoodVal).

goodenough( [], _, _, Pos, Val, Pos, Val)  :-  !.   % No other candidate
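The page break cuts the alpha-beta figure short here; the remaining clauses handle pruning and bound refinement. The following is a hedged reconstruction in the style of the fragment above (the predicate names goodenough/7, newbounds/6 and betterof/6 follow the fragment's conventions; details may differ from the original figure):

```prolog
% Hedged reconstruction of the rest of the alpha-beta program (not verbatim).

goodenough( _, Alpha, Beta, Pos, Val, Pos, Val)  :-
   min_to_move( Pos), Val > Beta, !        % Maximum reached; prune
   ;
   max_to_move( Pos), Val < Alpha, !.      % Minimum reached; prune

goodenough( PosList, Alpha, Beta, Pos, Val, GoodPos, GoodVal)  :-
   newbounds( Alpha, Beta, Pos, Val, NewAlpha, NewBeta),   % Refine bounds
   boundedbest( PosList, NewAlpha, NewBeta, Pos1, Val1),
   betterof( Pos, Val, Pos1, Val1, GoodPos, GoodVal).

newbounds( Alpha, Beta, Pos, Val, Val, Beta)  :-
   min_to_move( Pos), Val > Alpha, !.      % Maximum increased

newbounds( Alpha, Beta, Pos, Val, Alpha, Val)  :-
   max_to_move( Pos), Val < Beta, !.       % Minimum decreased

newbounds( Alpha, Beta, _, _, Alpha, Beta).

betterof( Pos, Val, _, Val1, Pos, Val)  :-  % Pos better than Pos1
   min_to_move( Pos), Val > Val1, !
   ;
   max_to_move( Pos), Val < Val1, !.

betterof( _, _, Pos1, Val1, Pos1, Val1).    % Otherwise Pos1 better
```

Here betterof/6 plays the same role as in the minimax program of Figure 22.3.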
The economization effect of the alpha-beta algorithm can also be expressed in
terms of the effective branching factor (the number of branches stemming from each
internal node) of the search tree. Assume that the game tree has a uniform
branching factor b. Due to the pruning effect, alpha-beta will only search some of
the branches, thus effectively reducing the branching factor. The reduction is, in the
best case, from b to about the square root of b. In chess-playing programs the
effective branching factor due to the alpha-beta pruning becomes about 6, compared
to the total of about 30 legal moves. A less optimistic view of this result is that in
chess, even with alpha-beta, deepening the search by 1 ply (one half-move) increases
the number of terminal search positions by a factor of about 6.

Project

Consider a two-person game (for example, some non-trivial version of tic-tac-toe).
Write game-definition relations (legal moves and terminal game positions) and
propose a static evaluation function to be used for playing the game with the
alpha-beta procedure.

22.4 Minimax-based programs: refinements and limitations

The minimax principle, together with the alpha-beta algorithm, is the basis of many
successful game-playing programs, most notably chess programs. The general
scheme of such a program is: perform the alpha-beta search on the current position
in the game, up to some fixed depth limit (dictated by the time constraints imposed
by tournament rules), using a game-specific evaluation function for evaluating the
terminal positions of the search. Then execute the best move (according to
alpha-beta) on the play board, accept the opponent's reply, and start the same cycle
again.

The two basic ingredients, then, are the alpha-beta algorithm and a heuristic
evaluation function. To build a good program for a complicated game like chess,
many refinements to this basic scheme are needed. We will briefly review some
standard techniques.

Much depends on the evaluation function. If we had a perfect evaluation
function we would only have to consider the immediate successors of the current
position, thus practically eliminating search. But for games like chess, any evaluation
function of practically acceptable computational complexity will necessarily be
just a heuristic estimate. This estimate is based on 'static' features of the position (for
example, the number of pieces on the board) and will therefore be more reliable in
some positions than in others. Consider, for example, such a material-based
evaluation function for chess and imagine a position in which White is a knight up.
This function will, of course, assess the position in White's favour. This is fine if the
position is quiescent, Black having no violent threat at his disposal. On the other
hand, if Black can capture the White queen on the next move, such an evaluation
can result in a disastrous blunder, as it will not be able to perceive the position
dynamically. Clearly, we can better trust the static evaluation in quiescent positions
than in turbulent positions in which each side has direct threats of capturing the
opponent's pieces. Obviously, we should use the static evaluation only in quiescent
positions. Therefore a standard trick is to extend the search in turbulent positions
beyond the depth limit until a quiescent position is reached. In particular, this
extension includes sequences of piece captures in chess.

Another refinement is heuristic pruning. This aims at achieving a greater depth
limit by disregarding some less promising continuations. This technique will prune
branches in addition to those that are pruned by the alpha-beta technique itself.
Therefore it entails the risk of overlooking some good continuation and incorrectly
computing the minimax value.

Yet another technique is progressive deepening. The program repeatedly executes
the alpha-beta search, first to some shallow depth, and then increases the depth
limit on each iteration. The process stops when the time limit has been reached; the
best move according to the deepest search is then played. This technique has the
following advantages:

• it enables time control: when the time limit is reached there will always be
  some best move found so far;
• the minimax values of the previous iteration can be used for preliminary
  ordering of positions on the next iteration, thus helping the alpha-beta
  algorithm to search strong moves first.

Progressive deepening entails some overhead (re-searching the upper parts of the
game tree), but this is relatively small compared with the total effort.

A known problem with programs that belong to this general scheme is the
'horizon effect'. Imagine a chess position in which the program's side inevitably
loses a knight. But the loss of the knight can be delayed at the cost of a lesser
sacrifice, a pawn say. This intermediate sacrifice may push the actual loss of the
knight beyond the search limit (beyond the program's 'horizon'). Not seeing the
eventual loss of the knight, the program will then prefer this variation to the quick
death of the knight. So the program will eventually lose both the pawn
(unnecessarily) and the knight. The extension of search up to a quiescent position
can alleviate the horizon effect.

There is, however, a more fundamental limitation of the minimax-based
programs, which lies in the limited form of the domain-specific knowledge they
use. This becomes very conspicuous when we compare the best chess programs with
human chess masters. Strong programs often search millions (and more) of positions
before deciding on the move to play. It is known from psychological studies that
human masters typically search just a few tens of positions, at most a few hundred.
Despite this apparent inferiority, a chess master may still beat a program. The
masters' advantage lies in their knowledge, which far exceeds that contained in
the programs. Games between machines and strong human players show that the
enormous advantage in calculating power cannot always completely compensate
for the lack of knowledge.

Knowledge in minimax-based programs takes three main forms:

• evaluation function,
• tree-pruning heuristics,
• quiescence heuristics.

The evaluation function reduces many aspects of a game situation into a single
number, and this reduction can have a detrimental effect. A good player's
understanding of a game position, on the contrary, spans over many dimensions. Let
us consider an example from chess: an evaluation function will evaluate a position as
equal simply by stating that its value is 0. A master's assessment of the same position
can be much more informative and indicative of a further course of the game. For
example: Black is a pawn up, but White has a good attacking initiative that
compensates for the material, so chances are equal.

In chess, minimax-based programs usually play best in sharp tactical struggles
when precise calculation of forced variations is decisive. Their weakness is more
likely to show in quiet positions where their play falls short of the long-range plans
that prevail in such slow, strategic games. Lack of a plan makes an impression that
the program keeps wandering during the game from one idea to another.

In the rest of this chapter we will consider another approach to game playing,
based on introducing pattern knowledge into a program by means of 'advice'. This
enables the programming of goal-oriented, plan-based behaviour of a game-playing
program.

22.5 Pattern knowledge and the mechanism of 'advice'

Generally speaking, advice is expressed in terms of goals to be achieved, and means of
achieving these goals. The two sides are called 'us' and 'them'; advice always refers to
the 'us' point of view. Each piece-of-advice has four ingredients:

• better-goal: a goal to be achieved;
• holding-goal: a goal to be maintained during play toward the better-goal;
• us-move-constraints: a predicate on moves that selects a subset of all legal
  us-moves (moves that should be considered of interest with respect to the goals
  specified);
• them-move-constraints: a predicate to select moves to be considered by 'them'
  (moves that may undermine the goals specified).

As a simple example from the chess endgame king and pawn vs king, consider the
straightforward idea of queening the pawn by simply pushing the pawn forward.
This can be expressed in the form of advice as:

• better-goal: pawn queened;
• holding-goal: pawn is not lost;
• us-move-constraints: pawn move;
• them-move-constraints: approach the pawn with the king.

22.5.2 Satisfiability of advice

We say that a given piece-of-advice is satisfiable in a given position if 'us' can force
the achievement of the better-goal specified in the advice under the conditions that:
• there is exactly one us-move from each internal us-to-move position in T; and
  that move must satisfy the us-move-constraints;
• T contains all them-moves (that satisfy the them-move-constraints) from each
  non-terminal them-to-move position in T.

Each piece-of-advice can be viewed as a definition of a small special game with
the following rules. Each opponent is allowed to make moves that satisfy his or her
move-constraints; a position that does not satisfy the holding-goal is won for 'them';
a position that satisfies the holding-goal and the better-goal is won for 'us'. A
non-terminal position is won for 'us' if the piece-of-advice is satisfiable in this
position. Then 'us' will win by executing a corresponding forcing-tree in the play.

22.5.3 Integrating pieces-of-advice into rules and advice-tables

In Advice Languages, individual pieces-of-advice are integrated in the complete
knowledge representation schema through the following hierarchy. A piece-of-advice
is part of an if-then rule. A collection of if-then rules is an advice-table. A set
of advice-tables is structured into a hierarchical network. Each advice-table has the
role of a specialized expert to deal with some specific subproblem of the whole
domain. An example of such a specialized expert is an advice-table that knows how
to mate in the king and rook vs king ending in chess. This table is summoned when
such an ending occurs in a game.

For simplicity, we will consider a simplified version of an Advice Language in
which we will only allow for one advice-table. We shall call this version Advice
Language 0, or ALO for short. Here is the structure of ALO, already syntactically
tailored toward an easy implementation in Prolog.

A program in ALO is called an advice-table. An advice-table is an ordered collection
of if-then rules. Each rule has the form:

RuleName :: if Condition then AdviceList

Condition is a logical expression that consists of predicate names connected by the
logical connectives and, or, not. AdviceList is a list of names of pieces-of-advice. An
example of a rule, called 'edge_rule', from the king and rook vs king ending can be:

edge_rule ::
   if their_king_on_edge and our_king_close
   then [ mate_in_2, squeeze, approach, keeproom, divide].

This rule says: if in the current position their king is on the edge and our king is close
to their king (or more precisely, kings are less than four squares apart), then try to
satisfy, in the order of preference as stated, the pieces-of-advice: 'mate_in_2',
'squeeze', 'approach', 'keeproom', 'divide'. This advice-list specifies pieces-of-advice
in the decreasing order of ambition: first try to mate in two moves; if that is not
possible then try to 'squeeze' the opponent's king toward a corner, etc. Notice that
with an appropriate definition of operators, the rule above is a syntactically correct
Prolog clause.

Each piece-of-advice will be specified by a Prolog clause of the form:

advice( AdviceName,
        BetterGoal :
        HoldingGoal :
        Us_Move_Constraints :
        Them_Move_Constraints).

The goals are expressions that consist of predicate names and the logical connectives
and, or, not. Move-constraints are, again, expressions that consist of predicate names
and the connectives and and then: and has the usual logical meaning, then prescribes
the ordering. For example, a move-constraint of the form

MC1 then MC2

says: first consider those moves that satisfy MC1, and then those that satisfy MC2.
For example, a piece-of-advice to mate in two moves in the king and rook vs king
ending, written in this syntax, is:

advice( mate_in_2,
        mate :
        not rooklost :
        (depth = 0) and legal then (depth = 2) and checkmove :
        (depth = 1) and legal).

Here the better-goal is mate, the holding-goal is not rooklost (rook is not lost). The
us-move-constraints say: at depth 0 (the current board position) try any legal move,
then at depth 2 (our second move) try checking moves only. The depth is measured
in plies. Them-move-constraints are: any legal move at depth 1.

In playing, an advice-table is then used by repeating, until the end of the game,
the following main cycle: build a forcing-tree, then play according to this tree until
the play exits the tree; build another forcing-tree, etc. A forcing-tree is generated
each time as follows: take the current board position Pos and scan the rules in the
advice-table one by one; for each rule, match Pos with the precondition of the rule,
and stop when a rule is found such that Pos satisfies its precondition. Now consider
the advice-list of this rule: process the pieces-of-advice in this list one by one until a
piece-of-advice is found that is satisfiable in Pos. This results in a forcing-tree that is
the detailed strategy to be executed across the board.

Notice the importance of the ordering of rules and pieces-of-advice. The rule used
is the first rule whose precondition matches the current position. There must be, for
any possible position, at least one rule in the advice-table whose precondition will
match the position. Thus an advice-list is selected. The first satisfiable piece-of-advice
in this list is applied.
An advice-table is thus largely a non-procedural program. An ALO interpreter
accepts a position and, by executing an advice-table, produces a forcing-tree which
determines the play in that position.

22.6 A chess endgame program in Advice Language 0

Implementation of an ALO-based game-playing program can be conveniently
divided into three modules:

(1) an ALO interpreter,
(2) an advice-table in ALO,
(3) a library of predicates (including rules of the game) used in the advice-table.

This structure corresponds to the usual structure of knowledge-based systems as
follows:

• The ALO interpreter is an inference engine.
• The advice-table and the predicate library constitute a knowledge base.

22.6.1 A miniature ALO interpreter

A miniature, game-independent ALO interpreter is implemented in Prolog in
Figure 22.6. This program also performs the user interaction during play. The central
function of the program is the use of knowledge in an ALO advice-table; that is,
interpreting an ALO advice-program for the generation of forcing-trees and their
execution in a game. The basic forcing-tree generation algorithm is similar to the
depth-first search in AND/OR graphs of Chapter 13; a forcing-tree corresponds to an
AND/OR solution tree. On the other hand, it also resembles the generation of a
proof tree in an expert system (Chapter 15).

For simplicity, in the program of Figure 22.6 'us' is supposed to be White, and
'them' is Black. The program is started through the procedure

playgame( Pos)

where Pos is a chosen initial position of a game to be played. If it is 'them' to move in
Pos then the program reads a move from the user; otherwise the program consults
the advice-table that is attached to the program, generates a forcing-tree and plays
its move according to the tree. This continues until the end of the game is reached as
specified by the predicate end_of_game (mate, for example).

A forcing-tree is a tree of moves, represented in the program by the following
structure:

Move .. [ Reply1 .. Ftree1, Reply2 .. Ftree2, ...]

where '..' is an infix operator; Move is the first move for 'us'; Reply1, Reply2, etc. are
the possible 'them' replies; and Ftree1, Ftree2, etc. are the forcing-subtrees that
correspond to each of the 'them' replies respectively.

% A miniature implementation of Advice Language 0
%
% This program plays a game from a given starting position using knowledge
% represented in Advice Language 0

:- op( 200, xfy, [ :, :: ]).
:- op( 220, xfy, ..).
:- op( 185, fx, if).
:- op( 190, xfx, then).
:- op( 180, xfy, or).
:- op( 160, xfy, and).
:- op( 140, fx, not).

playgame( Pos)  :-                        % Play a game starting in Pos
   playgame( Pos, nil).                   % Start with empty forcing-tree

playgame( Pos, ForcingTree)  :-
   show( Pos),
   ( end_of_game( Pos),                   % End of game?
     write( 'End of game'), nl, !
     ;
     playmove( Pos, ForcingTree, Pos1, ForcingTree1), !,
     playgame( Pos1, ForcingTree1)
   ).

% Play 'us' move according to forcing-tree

playmove( Pos, Move .. FTree1, Pos1, FTree1)  :-
   side( Pos, w),                         % White = 'us'
   legalmove( Pos, Move, Pos1),
   showmove( Move).

% Read 'them' move

playmove( Pos, FTree, Pos1, FTree1)  :-
   side( Pos, b),
   write( 'Your move: '),
   read( Move),
   ( legalmove( Pos, Move, Pos1),
     subtree( FTree, Move, FTree1), !     % Move down forcing-tree
     ;
     write( 'Illegal move'), nl,
     playmove( Pos, FTree, Pos1, FTree1)
   ).

% If current forcing-tree is empty generate a new one

playmove( Pos, nil, Pos1, FTree1)  :-
   side( Pos, w),
   resetdepth( Pos, Pos0),                % Pos0 = Pos with depth 0
   strategy( Pos0, FTree), !,             % Generate new forcing-tree
   playmove( Pos0, FTree, Pos1, FTree1).

   ...
   holds( Condition, Pos, _), !,          % Match Pos against precondition
   member( AdviceName, AdviceList),       % Try pieces-of-advice in turn
   nl, write( 'Trying '), write( AdviceName),
   satisfiable( AdviceName, Pos, ForcingTree), !.   % Satisfy AdviceName in Pos

satisfiable( AdviceName, Pos, FTree)  :-
   advice( AdviceName, Advice),           % Retrieve piece-of-advice
   sat( Advice, Pos, Pos, FTree).         % 'sat' needs two positions
                                          % for comparison predicates

sat( Advice, Pos, RootPos, FTree)  :-
   holdinggoal( Advice, HG),
   holds( HG, Pos, RootPos),              % Holding-goal satisfied
   sat1( Advice, Pos, RootPos, FTree).

sat1( Advice, Pos, RootPos, nil)  :-
   bettergoal( Advice, BG),
   holds( BG, Pos, RootPos), !.           % Better-goal satisfied

sat1( Advice, Pos, RootPos, Move .. FTrees)  :-
   side( Pos, w), !,                      % White = 'us'
   usmoveconstr( Advice, UMC),
   move( UMC, Pos, Move, Pos1),           % A move satisfying move-constr.
   sat( Advice, Pos1, RootPos, FTrees).

sat1( Advice, Pos, RootPos, FTrees)  :-
   side( Pos, b), !,                      % Black = 'them'
   themmoveconstr( Advice, TMC),
   bagof( Move .. Pos1, move( TMC, Pos, Move, Pos1), MPlist),
   satall( Advice, MPlist, RootPos, FTrees).   % Satisfiable in all successors

satall( _, [], _, [] ).

satall( Advice, [Move .. Pos | MPlist], RootPos, [Move .. FT | MFTs] )  :-
   sat( Advice, Pos, RootPos, FT),
   satall( Advice, MPlist, RootPos, MFTs).

   ...
   Cond =.. [ Pred, Pos, RootPos] ),
   call( Cond).

% Interpreting move-constraints

move( MC1 and MC2, Pos, Move, Pos1)  :-  !,
   move( MC1, Pos, Move, Pos1),
   move( MC2, Pos, Move, Pos1).

move( MC1 then MC2, Pos, Move, Pos1)  :-  !,
   ( move( MC1, Pos, Move, Pos1)
     ;
     move( MC2, Pos, Move, Pos1)
   ).

% Selectors for components of piece-of-advice

bettergoal( BG : _, BG).

holdinggoal( BG : HG : _, HG).

usmoveconstr( BG : HG : UMC : _, UMC).

themmoveconstr( BG : HG : UMC : TMC, TMC).

member( X, [X | L] ).

member( X, [Y | L] )  :-
   member( X, L).

Figure 22.6 A miniature implementation of Advice Language 0.
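Two pieces of the interpreter fall outside the visible excerpt: subtree/3, used by playmove to descend the forcing-tree after a 'them' reply, and the clauses of holds/3 that interpret the goal connectives (only the final clause, which calls a primitive predicate, is visible above). A hedged sketch consistent with the surrounding code:

```prolog
% Hedged reconstruction (not verbatim from the book's Figure 22.6).

% Select the forcing-subtree corresponding to the 'them' move just played
subtree( FTrees, Move, FTree)  :-
   member( Move .. FTree, FTrees).

% Interpreting goal expressions built with and, or, not
holds( Goal1 and Goal2, Pos, RootPos)  :-  !,
   holds( Goal1, Pos, RootPos),
   holds( Goal2, Pos, RootPos).

holds( Goal1 or Goal2, Pos, RootPos)  :-  !,
   ( holds( Goal1, Pos, RootPos)
     ;
     holds( Goal2, Pos, RootPos)
   ).

holds( not Goal, Pos, RootPos)  :-  !,
   \+ holds( Goal, Pos, RootPos).

holds( Pred, Pos, RootPos)  :-             % A primitive goal predicate
   ( Cond =.. [ Pred, Pos]                 % Ordinary predicate: current position
     ;
     Cond =.. [ Pred, Pos, RootPos] ),     % Comparison predicate: also root
   call( Cond).
```

The final clause matches the fragment visible above (Cond =.. [ Pred, Pos, RootPos] followed by call( Cond)); the connective clauses are a plausible completion in the same style.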
While making sure that stalemate is never created or the rook left undefended
under attack, repeat until mate:

(1) Look for a way to mate the opponent's king in two moves.
(2) If the above is not possible, then look for a way to constrain further the area
    on the chessboard to which the opponent's king is confined by our rook.
(3) If the above is not possible, then look for a way to move our king closer to
    the opponent's king.
(4) If none of the above pieces-of-advice 1, 2 or 3 works, then look for a way of
    maintaining the present achievements in the sense of 2 and 3 (that is, make
    a waiting move).
(5) If none of 1, 2, 3 or 4 is attainable, then look for a way of obtaining a
    position in which our rook divides the two kings either vertically or
    horizontally.

These principles are implemented in detail as an ALO advice-table in Figure 22.7.
This table can be run by the ALO interpreter of Figure 22.6. Figure 22.8 illustrates the
meaning of some of the predicates used in the table and the way the table works.
The predicates used in the table are:

Goal predicates
mate                  their king mated
stalemate             their king stalemated
rooklost              their king can capture our rook
rookexposed           their king can attack our rook before our king can get to
                      defend the rook
newroomsmaller        area to which their king is restricted by our rook has shrunk
rookdivides           rook divides both kings either vertically or horizontally
okapproachedcsquare   our king approached 'critical square' (see Figure 22.9); here
                      this means that the Manhattan distance has decreased
lpatt                 'L-pattern' (Figure 22.9)
roomgt2               the 'room' for their king is greater than two squares

Move-constraints predicates
depth = N             move occurring at depth N in the search tree
legal                 any legal move
checkmove             checking move
rookmove              a rook move
nomove                fails for any move
kingdiagfirst         a king move, with preference for diagonal king moves

% King and rook vs king in Advice Language 0

% Rules

edge_rule :: if their_king_edge and kings_close
             then [ mate_in_2, squeeze, approach, keeproom,
                    divide_in_2, divide_in_3 ].

else_rule :: if true
             then [ squeeze, approach, keeproom, divide_in_2, divide_in_3 ].

% Pieces-of-advice

advice( mate_in_2,
        mate :
        not rooklost and their_king_edge :
        (depth = 0) and legal then (depth = 2) and checkmove :
        (depth = 1) and legal).

advice( squeeze,
        newroomsmaller and not rookexposed and rookdivides and not stalemate :
        not rooklost :
        (depth = 0) and rookmove :
        nomove).

advice( approach,
        okapproachedcsquare and not rookexposed and not stalemate and
        (rookdivides or lpatt) and (roomgt2 or not our_king_edge) :
        not rooklost :
        (depth = 0) and kingdiagfirst :
        nomove).

advice( keeproom,
        themtomove and not rookexposed and rookdivides and okorndle and
        (roomgt2 or not okedge) :
        not rooklost :
        (depth = 0) and kingdiagfirst :
        nomove).

advice( divide_in_2,
        themtomove and rookdivides and not rookexposed :
        not rooklost :
        (depth < 3) and legal :
        (depth < 2) and legal).

advice( divide_in_3,
        themtomove and rookdivides and not rookexposed :
        not rooklost :
        (depth < 5) and legal :
        (depth < 4) and legal).

Figure 22.7 An ALO advice-table for king and rook vs king. The table consists of two rules
and six pieces-of-advice.
602 Game Playing A chess endgame program in Advice Language 0 603
Figure 22.8 A game fragment played by the advice-table of Figure 22.7, illustrating
the method of squeezing their king toward a corner. Pieces-of-advice used in this
sequence are keeproom (waiting move preserving 'room') and squeeze ('room' has
shrunk). The area to which their king is confined by our rook ('room') is shadowed.
After the last squeeze, 'room' shrinks from eight to six squares.

Figure 22.9 (a) Illustration of the 'critical square' (a crucial square in the squeezing
manoeuvres, indicated by a cross); the White king approaches the critical square by
moving as indicated. (b) The three pieces form an L-shaped pattern.

The arguments of these predicates are either positions (goal predicates) or moves
(move-constraints predicates). Goal predicates can have one or two arguments. One
argument is always the current search node; the second argument (if it exists) is the
root node of the search tree. The second argument is needed in the so-called
comparison predicates, which compare in some respect the root position and the
current search position. An example is the predicate newroomsmaller, which tests
whether the 'room' for their king has shrunk (Figure 22.8). These predicates,
together with chess rules for king and rook vs king, and a board displaying procedure
( show( Pos) ), are programmed in Figure 22.10.

An example of how this advice-program plays is shown in Figure 22.8. The game
would continue from the last position of Figure 22.8 as in the following variation
(assuming 'them' moves as given in the variation). The algebraic chess notation is
used, where the files of the chessboard are labelled 'a', 'b', 'c', etc., and the ranks are
numbered 1, 2, 3, etc. For example, the move 'BKb7' means: move the Black king to
the square in file 'b' and rank 7.

    WKd5   BKc7
    WKc5   BKb7
    WRc6   BKa7
    WRb6   BKa8
    WKb5   BKa7
    WKc6   BKa8
    WKc7   BKa7
    WRc6   BKa8
    WRa6   mate

Some questions can now be asked. First, is this advice-program correct in the sense
that it mates against any defence if the game starts from any king and rook vs king
position?
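The algebraic naming of squares used in the variation can be related to the program's numeric X:Y squares by a small helper. The predicate file_number/2 below is purely illustrative and not part of the book's endgame program; it uses the common library predicate nth1/3.

```prolog
% file_number( File, N): File is the letter of the N-th file (a..h).
% Illustrative helper only - not part of the book's endgame program.
file_number( File, N) :-
    nth1( N, [ a, b, c, d, e, f, g, h], File).
```

For example, file_number( b, N) gives N = 2, so the move 'BKb7' places the Black king on the square 2:7.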
vertngb( X:Y, X:Y1) :-            % Vertical neighbour squares
    n( Y, Y1).

horngb( X:Y, X1:Y) :-             % Horizontal neighbour squares
    n( X, X1).

legalmove( Pos, Move, Pos1) :-
    move( legal, Pos, Move, Pos1).

check( _..W..Rx:Ry..Bx:By.._) :-
    ngb( W, Bx:By);               % King's too close
    ...
Summary
..............................................................................................

• Two-person games fit the formalism of AND/OR graphs. AND/OR search
  procedures can therefore be used to search game trees.

• The straightforward depth-first search of game trees is easy to program, but is too
  inefficient for playing interesting games. In such cases, the minimax principle,
  in association with an evaluation function and depth-limited search, offers a
  more feasible approach.

References
..............................................................................................

implemented the method or at least part of it. This interesting history is described by Knuth
and Moore (1975), who also present a more compact formulation of the alpha-beta algorithm
using the 'neg-max' principle instead of minimax, and give a mathematical analysis of its
performance. A comprehensive treatment of several minimax-based algorithms and their
analyses is Pearl (1984). Kaindl (1990) also reviews search algorithms. Plaat et al. (1996)
introduce a more recent variation of alpha-beta search. There is another interesting question
regarding the minimax principle: knowing that the static evaluation is only reliable to some
degree, will the minimax backed-up values be more reliable than the static values themselves?
Pearl (1984) has also collected results of mathematical analyses that pertain to this question.
Results on error propagation in minimax trees explain when and why the minimax look-ahead
is beneficial.

Bramer (1983), Frey (1983), and Marsland and Schaeffer (1990) edited collections of papers
on computer game playing, and chess in particular. On-going research on computer chess is
published in the Advances in Computer Chess series and in the ICCA Journal.

The Advice Language approach to using pattern knowledge in chess was introduced by
Michie, and further developed in Bratko and Michie (1980), and Bratko (1982, 1984, 1985). The
king and rook vs king advice-program of this chapter is a slight modification of the advice-table
that was mathematically proved correct in Bratko (1978).

Other interesting experiments in the knowledge-intensive approach to chess (as opposed to
search-intensive approaches) include Berliner (1977), Pitrat (1977) and Wilkins (1980). Interest
in knowledge-intensive chess programming seems to have declined over time, probably
due to the competitive success of the search-intensive, 'brute force' approach to chess program-
ming. Due to the increasing power of computer hardware, including special-purpose chess
hardware, up to hundreds of millions of positions can be searched per second, so the sheer
power more than compensates for the lack of more subtle knowledge. The brute-force approach
culminated in 1997 in the eventual defeat of the world's leading chess player, Garry Kasparov, by
the program Deep Blue (Hsu et al. 1990), an extreme example of brute force. However, this
competitive success does not remove the known shortcoming of brute-force programs: they
cannot explain their play in conceptual terms. So from the points of view of explanation,
commentary and teaching, the knowledge-intensive approach remains necessary. The game of
go, on the other hand, seems to require a knowledge-based approach for competitive reasons
as well. Brute force has not worked as well in go as in chess because go has a much greater
combinatorial complexity.

Advances in Computer Chess Series: Vols 1-2 (M.R.B. Clarke, ed.) Edinburgh University Press;
Vol. 3 (M.R.B. Clarke, ed.) Pergamon Press; Vol. 4 (D.F. Beal, ed.) Pergamon Press; Vol. 5
(D.F. Beal, ed.) North-Holland; Vol. 6 (D.F. Beal, ed.) Ellis Horwood.

Berliner, H.J. (1977) A representation and some mechanisms for a problem solving chess
program. In: Advances in Computer Chess 1 (M.R.B. Clarke, ed.). Edinburgh University Press.

Bramer, M.A. (ed.) (1983) Computer Game Playing: Theory and Practice. Chichester: Ellis Horwood
and John Wiley.

Bratko, I. (1978) Proving correctness of strategies in the AL1 assertional language. Information
Processing Letters 7: 223-230.

Bratko, I. (1982) Knowledge-based problem solving in AL3. In: Machine Intelligence 10 (Hayes, J.,
Michie, D. and Pao, J.H., eds). Ellis Horwood (an abbreviated version also appears in Bramer
1983).

Bratko, I. (1984) Advice and planning in chess end-games. In: Artificial and Human Intelligence
(Amarel, S., Elithorn, A. and Banerji, R., eds). North-Holland.

Bratko, I. (1985) Symbolic derivation of chess patterns. In: Progress in Artificial Intelligence
(Steels, L. and Campbell, J.A., eds). Chichester: Ellis Horwood and John Wiley.

Bratko, I. and Michie, D. (1980) An advice program for a complex chess programming task.
Computer Journal 23: 353-359.

Frey, P.W. (ed.) (1983) Chess Skill in Man and Machine (second edition). Berlin: Springer-Verlag.

Hsu, F.-H., Anantharaman, T.S., Campbell, M.S. and Nowatzyk, A. (1990) A grandmaster chess
machine. Scientific American 263: 44-50.

Kaindl, H. (1990) Tree searching algorithms. In: Marsland and Schaeffer (1990).

Knuth, D.E. and Moore, R.W. (1975) An analysis of alpha-beta pruning. Artificial Intelligence
6: 293-326.

Marsland, A.T. and Schaeffer, J. (eds) (1990) Computers, Chess and Cognition. Berlin: Springer-
Verlag.

Pearl, J. (1984) Heuristics: Intelligent Search Strategies for Computer Problem Solving. Reading, MA:
Addison-Wesley.

Pitrat, J. (1977) A chess combination program which uses plans. Artificial Intelligence 8:
275-321.

Plaat, A., Schaeffer, J., Pijls, W. and de Bruin, A. (1996) Best-first fixed-depth minimax
algorithms. Artificial Intelligence 87: 255-293.

Shannon, C.E. (1950) Programming a computer for playing chess. Philosophical Magazine 41:
256-275.

Wilkins, D.E. (1980) Using patterns and plans in chess. Artificial Intelligence 14: 165-203.
chapter 23
Meta-Programming

programming paradigm or program architecture. New ideas are rapidly implemented
and experimented with. In prototyping the emphasis is on bringing new ideas to life
quickly and cheaply, so that they can be immediately tested. On the other hand,
there is not much emphasis on efficiency of implementation. Once the ideas are
course of no practical value because it does not provide any additional feature. To
enable features such as the generation of proof trees, we first have to reduce the
'grain size' of the interpreter. This reduction in granularity of the meta-interpreter is
made possible by a built-in predicate, provided in many Prolog implementations:

    clause( Head, Body)

This 'retrieves' a clause from the consulted program. Head is the head of the
retrieved clause and Body is its body. For a unit clause (a fact), Body = true. In a
non-unit clause (a rule), the body can contain one or several goals. If it contains one
goal, then Body is this goal. If the body contains several goals then they are retrieved
as a pair:

    Body = ( FirstGoal, OtherGoals)

The comma in this term is a built-in infix operator. In the standard Prolog notation,
this pair is equivalently written as:

    ,( FirstGoal, OtherGoals)

Here OtherGoals may again be a pair consisting of another goal and remaining goals.
In a call clause( Head, Body), the first argument Head must not be a variable. Suppose
the consulted program contains the usual member procedure. Then the clauses of
member can be retrieved by:

    ?- clause( member( X, L), Body).
    X = _14
    L = [ _14 | _15]
    Body = true;

    X = _14
    L = [ _15 | _16]
    Body = member( _14, _16)

Figure 23.1 shows a basic meta-interpreter for Prolog at the level of granularity that
has proved to be useful for most purposes. It should be noted, however, that this is a
meta-interpreter for pure Prolog only. It does not handle built-in predicates, in
particular the cut. The usefulness of this basic meta-interpreter lies in the fact that it
provides a scheme that can be easily modified to obtain interesting effects. One such
well-known extension results in a trace facility for Prolog. Another possibility is to
prevent the Prolog interpreter from getting into infinite loops, by limiting the depth
of subgoal calls (Exercise 23.2).

    % The basic Prolog meta-interpreter

    prove( true).

    prove( ( Goal1, Goal2)) :-
        prove( Goal1),
        prove( Goal2).

    prove( Goal) :-
        clause( Goal, Body),
        prove( Body).

Figure 23.1 The basic Prolog meta-interpreter.

Exercises

23.1  What happens if we try to execute the meta-interpreter of Figure 23.1 with itself - for
      example, by:

          ?- prove( prove( member( X, [ a, b, c]))).

      There is a problem because our meta-interpreter cannot execute built-in predicates
      such as clause. How could the meta-interpreter be easily modified to be able to
      execute itself, as in the query above?

23.2  Modify the meta-interpreter of Figure 23.1 by limiting the depth of Prolog's search for
      proof. Let the modified meta-interpreter be the predicate prove( Goal, DepthLimit),
      which only succeeds if DepthLimit >= 0. Each recursive call reduces the limit.

23.2.2  A tracing meta-interpreter

A first attempt to extend the basic meta-interpreter of Figure 23.1 to a tracing
interpreter is:

    prove( true) :- !.

    prove( ( Goal1, Goal2)) :- !,
        prove( Goal1),
        prove( Goal2).

    prove( Goal) :-
        write( 'Call: '), write( Goal), nl,
        clause( Goal, Body),
        prove( Body),
        write( 'Exit: '), write( Goal), nl.

The cuts are needed here to prevent the display of 'true' and composite goals of the
form ( Goal1, Goal2). This tracer has several defects: there is no trace of failed goals
and no indication of backtracking when the same goal is re-done. The tracer
in Figure 23.2 is an improvement in these respects. To aid readability, it also indents
the displayed goals proportionally to the depth of inference at which they are called.
It is, however, still restricted to pure Prolog only. An example call of this tracer is:
    ?- trace( ( member( X, [ a, b]), member( X, [ b, c]))).
    Call: member( _0085, [ a, b])
    Exit: member( a, [ a, b])
    Call: member( a, [ b, c])
     Call: member( a, [ c])
      Call: member( a, [])
      Fail: member( a, [])
     Fail: member( a, [ c])
    Fail: member( a, [ b, c])
    Redo: member( a, [ a, b])
     Call: member( _0085, [ b])
     Exit: member( b, [ b])
    Exit: member( b, [ a, b])
    Call: member( b, [ b, c])
    Exit: member( b, [ b, c])

    % trace( Goal): execute Prolog goal Goal displaying trace information

    trace( Goal) :-
        trace( Goal, 0).

    trace( true, Depth) :- !.               % Red cut; Depth = depth of call

    trace( ( Goal1, Goal2), Depth) :- !,    % Red cut
        trace( Goal1, Depth),
        trace( Goal2, Depth).

    trace( Goal, Depth) :-
        display( 'Call: ', Goal, Depth),
        clause( Goal, Body),
        Depth1 is Depth + 1,
        trace( Body, Depth1),
        display( 'Exit: ', Goal, Depth),
        display_redo( Goal, Depth).

    trace( Goal, Depth) :-                  % All alternatives exhausted
        display( 'Fail: ', Goal, Depth),
        fail.

    display( Message, Goal, Depth) :-
        tab( Depth), write( Message),
        write( Goal), nl.

This tracer outputs the following information for each goal executed:

(1)  The goal to be executed (Call: Goal).

(2)  Trace of the subgoals (indented).

(3)  If the goal is satisfied then its final instantiation is displayed (Exit:
     InstantiatedGoal); if the goal is not satisfied then Fail: Goal is displayed.

(4)  In the case of backtracking to a previously satisfied goal, the message is:
     Redo: InstantiatedGoal (instantiation in the previous solution of this goal).

Of course, it is possible to further shape the tracing interpreter according to specific
users' requirements.

23.2.3  Generating proof trees

Another well-known extension of the basic interpreter of Figure 23.1 is the
generation of proof trees. So after a goal is satisfied, its proof tree is available for
further processing. In Chapters 15 and 16, the generation of proof trees was
implemented for rule-based expert systems. Although the syntax of rules there
was different from Prolog, the principles of generating a proof tree are the same.

These principles are easily introduced into the meta-interpreter of Figure 23.1. For
example, we may choose to represent a proof tree depending on the case as follows:

(1)  For a goal true, the proof tree is true.

(2)  For a pair of goals ( Goal1, Goal2), the proof tree is the pair ( Proof1, Proof2) of
     the proof trees of the two goals.

(3)  For a goal Goal that matches the head of a clause whose body is Body, the proof
     tree is Goal <== Proof, where Proof is the proof tree of Body.

This can be incorporated into the basic meta-interpreter of Figure 23.1 as follows:

    :- op( 500, xfy, <==).

    prove( true, true).

    prove( ( Goal1, Goal2), ( Proof1, Proof2)) :-
        prove( Goal1, Proof1),
        prove( Goal2, Proof2).
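Case (3) corresponds to one more clause; the following is a sketch consistent with the two clauses above.

```prolog
% Case (3): Goal matches the head of a clause whose body is Body;
% the proof tree pairs the goal with the proof of the body.
prove( Goal, Goal <== Proof) :-
    clause( Goal, Body),
    prove( Body, Proof).
```

With the usual member/2 in the consulted program, the query ?- prove( member( b, [ a, b]), Proof) then gives Proof = member( b, [ a, b]) <== (member( b, [ b]) <== true).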
23.3  Explanation-based generalization
.........................................................................................

The idea of explanation-based generalization comes from machine learning, where
the objective is to generalize given examples into general descriptions of concepts.
Explanation-based generalization (EBG) is a way of building such descriptions from
typically one example only. The lack of examples is compensated for by the system's
'background knowledge', usually called domain theory.

EBG rests on the following idea for building generalized descriptions: given an
instance of the target concept, use the domain theory to explain how this instance in
fact satisfies the concept. Then analyze the explanation and try to generalize it so
that it applies not only to the given instance, but also to a set of 'similar' instances.
This generalized explanation then becomes part of the concept description and can
be subsequently used in recognizing instances of this concept. It is also required that
the constructed concept description must be 'operational'; that is, it must be stated
in terms of concepts declared by the user as operational. Intuitively, a concept
description is operational if it is (relatively) easy to use. It is entirely up to the user to
specify what is operational.

In an implementation of EBG, these abstract ideas have to be made more
concrete. One way of realizing them in the logic of Prolog is:

• A concept is realized as a predicate.

• A concept description is a predicate definition.

• An explanation is a proof tree that demonstrates how the given instance satisfies
  the target concept.

• A domain theory is represented as a set of available predicates defined as a Prolog
  program.

The task of explanation-based generalization can then be stated as:

Given:
    A domain theory: A set of predicates available to the explanation-based
    generalizer, including the target predicate whose operational definition is to be
    constructed.
    Operationality criteria: These specify the predicates that may be used in the
    target predicate definition.
    Training example: A set of facts describing a particular situation and an instance
    of the target concept, so that this instance can be derived from the given set of
    facts and the domain theory.

Find:
    A generalization of the training instance and an operational definition of the
    target concept; this definition consists of a sufficient condition (in terms of
    the operational predicates) for this generalized instance to satisfy the target
    concept.

Thus stated, explanation-based generalization can be viewed as a kind of program
compilation from one form into another. The original program defines the target
concept in terms of domain theory predicates. The compiled program defines the
same target concept (or subconcept) in terms of the 'target language' - that is,
operational predicates only. The compilation mechanism provided by EBG is rather
unusual. Execute the original program on the given example, which results in a
proof tree. Then generalize this proof tree so that the structure of the proof tree is
retained, but the constants are replaced by variables whenever possible. In the
generalized proof tree thus obtained, some nodes mention operational predicates.
The tree is then reduced so that only these 'operational nodes' and the root are
retained. The result constitutes an operational definition of the target concept.

All this is best understood by an example of EBG at work. Figure 23.3 defines
two domains for EBG. The first domain theory is about giving a gift, while the
second is about lift movements. Let us consider the first domain. Let the training
instance be:

    gives( john, john, chocolate)

Our proof-generating meta-interpreter finds this proof:

    gives( john, john, chocolate) <==
        ( feels_sorry_for( john, john) <== sad( john),
          would_comfort( chocolate, john) <== likes( john, chocolate)
        )

% A domain theory: about gifts

gives( Person1, Person2, Gift) :-
    likes( Person1, Person2),
    would_please( Gift, Person2).

gives( Person1, Person2, Gift) :-
    feels_sorry_for( Person1, Person2),
    would_comfort( Gift, Person2).

would_please( Gift, Person) :-
    needs( Person, Gift).

would_comfort( Gift, Person) :-
    likes( Person, Gift).

feels_sorry_for( Person1, Person2) :-
    likes( Person1, Person2),
    sad( Person2).

feels_sorry_for( Person, Person) :-
    sad( Person).

Figure 23.3 Two problem definitions for explanation-based generalization.
Figure 23.3 contd

% Operational predicates

operational( likes( _, _)).
operational( needs( _, _)).
operational( sad( _)).

% An example situation

likes( john, annie).
likes( annie, john).
likes( john, chocolate).
needs( annie, tennis_racket).
sad( john).

% Another domain theory: about lift movement

% go( Level, GoalLevel, Moves) if
%   list of moves Moves brings lift from Level to GoalLevel

go( Level, GoalLevel, Moves) :-
    move_list( Moves, Distance),        % A move list and distance travelled
    Distance =:= GoalLevel - Level.

move_list( [], 0).

move_list( [ Move1 | Moves], Distance + Distance1) :-
    move_list( Moves, Distance),
    move( Move1, Distance1).

move( up, 1).
move( down, -1).

operational( A =:= B).

This proof can be generalized by replacing constants john and chocolate by
variables:

    gives( Person, Person, Thing) <==
        ( feels_sorry_for( Person, Person) <== sad( Person),
          would_comfort( Thing, Person) <== likes( Person, Thing)
        )

Predicates sad and likes are specified as operational in Figure 23.3. An operational
definition of the predicate gives is now obtained by eliminating all the nodes from
the proof tree, apart from the 'operational' ones and the root. This results in:

    gives( Person, Person, Thing) <==
        ( sad( Person),
          likes( Person, Thing)
        )

Thus a sufficient condition Condition for gives( Person, Person, Thing) is:

    Condition = ( sad( Person), likes( Person, Thing))

This new definition can now be added to our original program with:

    asserta( ( gives( Person, Person, Thing) :- Condition))

As a result, we have the following new clause about gives that only requires the
evaluation of operational predicates:

    gives( Person, Person, Thing) :-
        sad( Person),
        likes( Person, Thing).

Through the generalization of the given instance

    gives( john, john, chocolate)

a definition (in operational terms) of giving as self-consolation was derived as one
general case of gives. Another case of this concept would result from the example:

    gives( john, annie, tennis_racket).

The explanation-based generalization would in this case produce the clause:

    gives( Person1, Person2, Thing) :-
        likes( Person1, Person2),
        needs( Person2, Thing).

The lift domain in Figure 23.3 is slightly more complicated and we will experiment
with it when we have EBG implemented in Prolog.

EBG can be programmed as a two-stage process: first, generate a proof tree for the
given example, and, second, generalize this proof tree and extract the 'operational
nodes' from it. Our proof-generating meta-interpreter could be used for this. The
two stages are, however, not necessary. A more direct way is to modify the basic
meta-interpreter of Figure 23.1 so that the generalization is intertwined with the
process of proving the given instance. The so-modified meta-interpreter, which
carries out EBG, will be called ebg and will now have three arguments:

    ebg( Goal, GenGoal, Condition)

where Goal is the given example to be proved, GenGoal is the generalized goal and
Condition is the derived sufficient condition for GenGoal, stated in terms of
operational predicates. Figure 23.4 shows such a generalizing meta-interpreter. For
our gift domain of Figure 23.3, ebg can be called as:

    ?- ebg( gives( john, john, chocolate), gives( X, Y, Z), Condition).
    X = Y
    Condition = ( sad( X), likes( X, Z))
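A generalizing meta-interpreter along these lines can be sketched as follows. This is an approximation built from the description above, using copy_term/2 to rename clause variables; the book's Figure 23.4 may differ in detail.

```prolog
% ebg( Goal, GenGoal, Condition):
%   prove the concrete Goal while building, in lockstep, a proof of the
%   generalized goal GenGoal; Condition collects the operational leaves
%   of the generalized proof. A sketch, not the book's Figure 23.4.

ebg( true, true, true) :- !.

ebg( ( Goal1, Goal2), ( Gen1, Gen2), ( Cond1, Cond2)) :- !,
    ebg( Goal1, Gen1, Cond1),
    ebg( Goal2, Gen2, Cond2).

ebg( Goal, GenGoal, GenGoal) :-
    operational( GenGoal), !,
    call( Goal).                       % the concrete instance must actually hold

ebg( Goal, GenGoal, Condition) :-
    clause( GenGoal, GenBody),         % clause head unifies with the generalized goal
    copy_term( ( GenGoal :- GenBody),
               ( Goal :- Body)),       % a renamed copy proves the concrete goal
    ebg( Body, GenBody, Condition).
```

Run on the gift domain, this sketch reproduces the answer shown above: X = Y with Condition = ( sad( X), likes( X, Z)).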
program, so that failed branches do not appear in the compiled definition at all.
When a program is compiled by the EBG technique, care is needed during
compilation so that new clauses do not interfere in an uncontrolled way with the
original clauses.

Exercise

23.3  It may appear that the same compilation effect achieved in EBG can be obtained
      without an example, simply by substituting goals in the original concept definition
      by their subgoals, taken from corresponding clauses of the domain theory, until all
      the goals are reduced to operational subgoals. This procedure is called unfolding
      (a goal 'unfolds' into subgoals). Discuss this idea with respect to EBG and show that
      the guidance by an example in EBG is essential. Also show that, on the other hand,
      EBG-generated new concept definitions are only a generalization of given examples
      and are therefore not necessarily equivalent to the original program (new concept
      definitions may be incomplete).

have the form of Prolog clauses - that is, Prolog facts and rules (except that they do
not end with a period). In our Prolog implementation, an object definition will
possibly specify a whole class of objects, such as the class of all rectangles referred to
as rectangle( Length, Width). A particular rectangle with sides 4 and 3 is then referred
to as rectangle( 4, 3). In general, then, the object rectangle( Length, Width) with two
methods area and describe can be defined by:

    object( rectangle( Length, Width),
      [ ( area( A) :-
            A is Length * Width),
        ( describe :-
            write( 'Rectangle of size '),
            write( Length * Width)) ] ).

In our implementation, the sending of a message to an object will be simulated by
the procedure:

    send( Object, Message)

We can make the rectangle with sides 4 and 3 describe itself and compute its area by
sending it the corresponding messages:

                  polygon( [ Side1, Side2, ...])
                   /                       \
      rectangle( Length, Width)      reg_polygon( Side, N)
                   \                  /              \
                  square( Side)                pentagon( Side)

    ?- send( square( 5), area( Area)).
    Area = 25

The message area( Area) is processed as follows: first, the object square( 5) searches for
area( Area) among its methods and cannot find it. Then through the isa relation it
finds its 'super-object' rectangle( 5, 5). The super-object has the relevant method area,
which is executed.

An interpreter for object-oriented programs along the lines discussed here is given
in Figure 23.5. Figure 23.6 provides a completed object-oriented program about
geometric figures.

Until now we have not mentioned the problem of multiple inheritance. This
arises when the isa relation defines a lattice so that an object has more than one
parent object, as is the case for square in Figure 23.6. So more than one parent
object may potentially supply a method to be inherited by the object. In such a case
the question is: which one of the several potentially inherited methods should be
used? The program of Figure 23.5 searches for an applicable method among the
objects in the graph defined by the isa relation. The search strategy in Figure 23.5
is simply depth first, although some other strategy may be more appropriate.
Breadth first would, for example, ensure that the 'closest inheritable' method
is used.

    % An interpreter for object-oriented programs
    % send( Object, Message):
    %   find Object's methods and execute the method that corresponds to Message

Figure 23.5 A simple interpreter for object-oriented programs.

    object( polygon( Sides),                 % Polygon with sides Sides
      [ ( perimeter( P) :-
            sum( Sides, P))]).               % Perimeter is the sum of sides

    object( reg_polygon( Side, N),           % Regular polygon with N sides
      [ ( perimeter( P) :-
            P is Side * N),
        ( describe :- write( 'Regular polygon'))]).

    object( square( Side),
      [ ( describe :-
            write( 'Square with side '),
            write( Side))]).

    object( rectangle( Length, Width),
      [ ( area( A) :-
            A is Length * Width),
        ( describe :-
            write( 'Rectangle of size '),
            write( Length * Width))]).

    object( pentagon( Side),
      [ ( describe :- write( 'Pentagon'))]).

Figure 23.6 An object-oriented program about geometric figures.
Figure 23.6 contd

    % sum( ListOfNumbers, Sum):
    %   Sum is the sum of numbers in ListOfNumbers

    sum( [], 0).

    sum( [ Number | Numbers], Sum) :-
        sum( Numbers, Sum1),
        Sum is Sum1 + Number.

To further illustrate the style of object-oriented programming enabled by our
interpreter in Figure 23.5, let us consider the situation in Figure 23.7. The figure
shows a robot world: a number of blocks arranged in stacks on the table. There is a
camera on the ceiling that can capture the top view of the objects on the table. For
simplicity we assume that all of the blocks are cubes with side equal 1. The blocks
have names a, b, ..., and the camera can recognize by name those blocks that are
not obstructed from the top, and locate them with respect to the x and y
coordinates. Suppose that we are interested in computing the locations of the
objects - that is, their x, y and z coordinates. Each block has some local information,
so it knows what block (if any) is immediately above it or underneath it. Thus the
xy coordinates of a block B can be obtained in two ways:

(1)  If the block has a clear top then it can be seen by the camera, which will
     determine the xy coordinates of the block. This is accomplished by sending the
     message look( B, X, Y) to the camera.

(2)  If block B is underneath some block B1 then it is helpful to observe that
     all of the blocks in the same stack share the same xy coordinates. So to
     determine the xy coordinates of block B, send the message xy_coord( X, Y)
     to block B1.

Figure 23.7 A robot world.

    object( camera,
      [ look( a, 1, 1),                % Find xy coord. of a visible block
        look( d, 3, 2),
        xy_coord( 2, 2),               % xy coordinates of camera
        z_coord( 20)]).                % z coordinate of camera

    object( block( Block),
      [ ( xy_coord( X, Y) :-
            send( camera, look( Block, X, Y))),
        ( xy_coord( X, Y) :-
            send( Block, under( Block1)),
            send( Block1, xy_coord( X, Y))),
        ( z_coord( 0) :-
            send( Block, on( table))),
        ( z_coord( Z) :-
            send( Block, on( Block1)),
            send( Block1, z_coord( Z1)),
            Z is Z1 + 1)]).

    object( physical_object( Name),
      [ ( coord( X, Y, Z) :-
            send( Name, xy_coord( X, Y)),
            send( Name, z_coord( Z)))]).

    object( a, [ on( b)]).
    object( b, [ under( a), on( c)]).
    object( c, [ under( b), on( table)]).
    object( d, [ on( table)]).

    isa( d, block( d)).
    isa( block( Name), physical_object( Name)).
    isa( camera, physical_object( camera)).

Figure 23.8 An object-oriented program about a robot world.
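The interpreter of Figure 23.5 itself can be reconstructed from the surrounding description: look for a method among the object's own methods and, failing that, among those of its ancestors via the isa relation, depth first. The following is a sketch under those assumptions; the book's actual figure may differ in detail.

```prolog
% send( Object, Message):
%   find a method matching Message among Object's methods, or among
%   the methods of Object's ancestors in the isa hierarchy (depth first),
%   and execute it. A sketch of Figure 23.5, not its exact text.
send( Object, Message) :-
    get_methods( Object, Methods),          % methods of Object or an ancestor
    process( Message, Methods).

get_methods( Object, Methods) :-
    object( Object, Methods).               % Object's own methods
get_methods( Object, Methods) :-
    isa( Object, SuperObject),              % inherited methods, depth first
    get_methods( SuperObject, Methods).

process( Message, Methods) :-
    member( Message, Methods).              % a fact-like method
process( Message, Methods) :-
    member( ( Message :- Body), Methods),   % a rule-like method
    call( Body).
```

With the robot-world program of Figure 23.8 consulted, this interpreter should answer, for example, ?- send( d, coord( X, Y, Z)) with X = 3, Y = 2, Z = 0: the camera locates d at 3:2, and d sits on the table.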
Figure 23.9 A pattern-directed system. Rectangles represent pattern-directed
modules. Arrows indicate modules' triggering patterns occurring in data.

The same is true for the addition of new modules and for modifications of the
existing modules. If similar modifications are carried out in systems with conven-
tional organization, at least the calls between modules have to be properly modified.
The high degree of modularity is especially desirable in systems with complex
knowledge bases because it is difficult to predict in advance all the interactions
between individual pieces of knowledge in the base. The pattern-directed architec-
ture offers a natural solution to this: each piece of knowledge, represented by an
if-then rule, can be regarded as a pattern-directed module.

Let us further elaborate the basic scheme of pattern-directed systems with a
view to an implementation. Figure 23.9 suggests that a parallel implementation
would be most natural. However, let us assume the system is to be implemented on a
traditional sequential processor. Then, in the case that the triggering patterns of several
modules simultaneously occur in the database, there is a conflict: which of all these
potentially active modules will actually be executed? The set of potentially active
modules is called a conflict set. In an actual implementation of the scheme of
Figure 23.9 on a sequential processor, we need an additional program module, called
the control module. The control module resolves the conflict by choosing and
activating one of the modules in the conflict set. One simple rule for resolving
conflicts can be based on a predefined, fixed ordering of modules.

The basic life cycle of pattern-directed systems, then, consists of three steps:

(1)  Pattern matching: find in the database all the occurrences of the condition
     patterns of the program modules. This results in a conflict set.

(2)  Conflict resolution: choose one of the modules in the conflict set.

(3)  Execution: execute the module that was chosen in step 2.

This implementational scheme is illustrated in Figure 23.10.

Figure 23.10 The basic life cycle of pattern-directed systems. In this example the
database satisfies the condition pattern of modules 1, 3 and 4; module 3 is chosen
for execution.

23.5.2  Prolog programs as pattern-directed systems

Prolog programs themselves can be viewed as pattern-directed systems. Without
much elaboration, the correspondence between Prolog and pattern-directed systems
is along the following lines:

• Each Prolog clause in the program can be viewed as a pattern-directed module.
  The module's condition part is the head of the clause, the action part is specified
  by the clause's body.

• The system's database is the current list of goals that Prolog is trying to satisfy.

• A clause is fired if its head matches the first goal in the database.

• To execute a module's action (body of a clause) means: replace the first goal in
  the database with the list of goals in the body of the clause (with the proper
  instantiation of variables).
Pattern-directed programming 635
634 Meta-Programming
• The process of module invocation is non-deterministic in the sense that several
clauses' heads may match the first goal in the database, and any one of them
can, in principle, be executed. This non-determinism is actually implemented in
Prolog through backtracking.
Whenever the condition of Module 1 is satisfied, so is the condition of Module 2, and we have a conflict. This will be resolved by a simple control rule: Module 1 is always preferred to Module 2. Initially the database contains the two numbers A and B.

As a pleasant surprise, our pattern-directed program in fact solves a more general problem: computing the greatest common divisor of any number of integers. If several integers are stored in the database, the system will output the greatest common divisor of all of them. Figure 23.11 shows a possible sequence of changes in the database before the result is obtained, when the initial database contains four numbers: 25, 10, 15, 30. Notice that a module's precondition can be satisfied at several places in the database.

In this chapter we will implement an interpreter for a simple language for specifying pattern-directed systems, and illustrate the flavour of pattern-directed programming by programming exercises.

The action part of a module is written as a list:

[ Action1, Action2, ... ]

Each action is, again, simply a Prolog goal. To execute an action list, all the actions in the list have to be executed; that is, all the corresponding goals have to be satisfied. Among the available actions there will be actions that manipulate the database: add, delete or replace objects in the database. The action 'stop' stops further execution.

Figure 23.12 shows our pattern-directed program for computing the greatest common divisor written in this syntax.

The simplest way to implement this pattern-directed language is to use Prolog's own built-in database mechanism. Adding an object into the database and deleting an object can be accomplished simply by the built-in procedures:

assertz( Object)        retract( Object)
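To see these database operations in action, here is a hypothetical interactive session (a sketch; it assumes the operator declaration :- op( 300, fx, number) from Figure 23.12 is in effect):

```prolog
?- assertz( number 25), assertz( number 10).   % add two objects
yes
?- retract( number 25).                        % delete one of them again
yes
?- number X.                                   % what is left in the database?
X = 10
```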
% Production rules for finding greatest common divisor (Euclid algorithm)

:- op( 300, fx, number).

[ number X, number Y, X > Y] --->
     [ NewX is X - Y, replace( number X, number NewX) ].

[ number X] ---> [ write( X), stop].

% An initial database

number 25.
number 10.
number 15.
number 30.

Figure 23.12 A pattern-directed program to find the greatest common divisor of a set of numbers.

Replacing an object with another object is also easy:

replace( Object1, Object2) :-
     retract( Object1), !,
     assertz( Object2).

The cut in this clause is used to prevent retract from deleting (through backtracking) more than one object from the database.

A small interpreter for pattern-directed programs along these lines is shown in Figure 23.13. This interpreter is perhaps an oversimplification in some respects. In particular, the conflict resolution rule in the interpreter is extremely simple and rigid: always execute the first potentially active pattern-directed module (in the order in which they are written). So the programmer's control is reduced just to the ordering of modules. The initial state of the database for this interpreter has to be asserted as Prolog facts, possibly by consulting a file. Then the execution is triggered by the goal:

?- run.

% A small interpreter for pattern-directed programs
% The system's database is manipulated through assert/retract

:- op( 800, xfx, --->).

% run: execute pattern-directed modules until action 'stop' is triggered

run :-
     Condition ---> Action,        % A production rule
     test( Condition),             % Precondition satisfied?
     execute( Action).

% test( [ Condition1, Condition2, ...]) if all conditions true

test( []).                         % Empty condition

test( [First | Rest]) :-           % Test conjunctive condition
     call( First),
     test( Rest).

% execute( [ Action1, Action2, ...]): execute list of actions

execute( [ stop]) :- !.            % Stop execution

execute( []) :-                    % Empty action (execution cycle completed)
     !, run.                       % Continue with next execution cycle

execute( [First | Rest]) :-
     call( First),
     execute( Rest).

replace( A, B) :-                  % Replace A with B in database
     retract( A), !,               % Retract once only
     assertz( B).

Figure 23.13 A small interpreter for pattern-directed programs.

23.5.5 Possible improvements

Our simple interpreter for pattern-directed programs is sufficient for illustrating some ideas of pattern-directed programming. For more complex applications it should be elaborated in several respects. Here are some critical comments and indications for improvements.

In our interpreter, the conflict resolution is reduced to a fixed, predefined order. Much more flexible schemes are often desired. To enable more sophisticated control, all the potentially active modules should be found and fed into a special user-programmable control module.

When the database is large and there are many pattern-directed modules in the program, pattern matching can become extremely inefficient. The efficiency in this respect can be improved by a more sophisticated organization of the database. This may involve the indexing of the information in the database, or partitioning of the information into sub-bases, or partitioning of the set of pattern-directed modules into subsets. The idea of partitioning is to make only a subset of the database or of the modules accessible at any given time, thus reducing the pattern matching to such a subset only. Of course, in such a case we would need a more sophisticated control mechanism that would control the transitions between these subsets, in the sense of activating and de-activating a subset. A kind of meta-rules could be used for that.

Unfortunately our interpreter, as programmed, precludes any backtracking, due to the way that the database is manipulated through assertz and retract. So we cannot study alternative execution paths. This can be improved by using a different representation of the database, for example, by passing the database around as a procedure argument.
Project

Implement an interpreter for pattern-directed programs that does not maintain its database as Prolog's own internal database (with assertz and retract), but as a procedure argument, according to the foregoing remark. Such a new interpreter would allow for automatic backtracking. Try to design a representation of the database that would facilitate efficient pattern matching.

23.6 A simple theorem prover as a pattern-directed program

Let us implement a simple theorem prover as a pattern-directed system. The prover will be based on the resolution principle, a popular method for mechanical theorem proving. We will limit our discussion to proving theorems in the simple propositional logic just to illustrate the principle, although our resolution mechanism will be easily extendable to handle the first-order predicate calculus (logic formulas that contain variables). Basic Prolog itself is a special case of a resolution-based theorem prover.

The theorem-proving task can be defined as: given a formula, show that the formula is a theorem; that is, the formula is always true regardless of the interpretation of the symbols that occur in the formula. For example, the formula

p v ~p

is always true. As a working example, we will prove the formula

(a => b) & (b => c) => (a => c)

This formula is read as: if b follows from a, and c follows from b, then c follows from a.

Before the resolution process can start we have to get our negated, conjectured theorem into a form that suits the resolution process. The suitable form is the conjunctive normal form, which looks like this:

(p1 v p2 v ...) & (q1 v q2 v ...) & (r1 v r2 v ...) & ...

Here all p's, q's and r's are simple propositions or their negations. This form is also called the clause form. Each conjunct is called a clause. So (p1 v p2 v ...) is a clause. We can transform any propositional formula into this form. For our example theorem, this transformation can proceed as follows. The theorem is

(a => b) & (b => c) => (a => c)

The negated theorem is:

~( (a => b) & (b => c) => (a => c))

The following known equivalence rules will be useful when transforming this formula into the normal conjunctive form:

(1) x => y is equivalent to ~x v y
(2) ~(x v y) is equivalent to ~x & ~y
(3) ~(x & y) is equivalent to ~x v ~y
Using rule 1 at several places we get:

(~a v b) & (~b v c) & ~(~a v c)

By rule 2 we finally get the clause form we need:

(~a v b) & (~b v c) & a & ~c

This consists of four clauses: (~a v b), (~b v c), a, ~c. Now the resolution process can start.

The basic resolution step can occur any time that there are two clauses such that some proposition p occurs in one of them, and ~p occurs in the other. Let two such clauses be:

p v Y    and    ~p v Z

where p is a proposition, and Y and Z are propositional formulas. Then the resolution step on these two clauses produces a third clause:

Y v Z

It can be shown that this clause logically follows from the two initial clauses. So by adding the expression (Y v Z) to our formula we do not alter the validity of the formula. The resolution process thus generates new clauses. If the 'empty clause' (usually denoted by 'nil') occurs then this will signal that a contradiction has been found. The empty clause nil is generated from two clauses of the forms:

x    and    ~x

which is obviously a contradiction.

Figure 23.14 shows the resolution process that starts with our negated conjectured theorem and ends with the empty clause.

[Figure 23.14 is a resolution tree over the clauses ~a v b, ~b v c, a, ~c: resolving ~a v b with ~b v c gives ~a v c; resolving ~a v c with a gives c; resolving c with ~c gives nil.]

Figure 23.14 Proving the theorem (a => b) & (b => c) => (a => c) by the resolution method. The top line is the negated theorem in the clause form. The empty clause at the bottom signals that the negated theorem is a contradiction.

Figure 23.15 shows how this resolution process can be formulated as a pattern-directed program. This program operates on clauses asserted into the database. The resolution principle can be formulated as a pattern-driven activity:

if
    there are two clauses C1 and C2, such that P is a (disjunctive) subexpression of C1, and ~P is a subexpression of C2
then
    remove P from C1 (giving CA), remove ~P from C2 (giving CB), and add into the database a new clause: CA v CB.

Written in our pattern-directed language this becomes:

[ clause( C1), delete( P, C1, CA),
  clause( C2), delete( ~P, C2, CB)] --->
     [ assertz( clause( CA v CB)) ].

This rule needs a little elaboration to prevent repeated actions on the same clauses, which would merely produce new copies of already existing clauses. The program in Figure 23.15 records into the database what has already been done, by asserting:

done( C1, C2, P)

The condition parts of rules will then recognize and prevent such repeated actions.

The rules in Figure 23.15 also deal with some special cases that would otherwise require the explicit representation of the empty clause. Also, there are two rules that just simplify clauses when possible. One of these rules recognizes true clauses, such as

a v b v ~a

and removes them from the database since they are useless for detecting a contradiction. The other rule removes redundant subexpressions. For example, this rule would simplify the clause

a v b v a

into a v b.

A remaining question is how to translate a given propositional formula into the clause form. This is not difficult, and the program of Figure 23.16 does it. The procedure

translate( Formula)

translates a formula into a set of clauses C1, C2, etc., and asserts these clauses into the database as:

clause( C1).
clause( C2).
...
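For instance, the first transformation rule of Figure 23.16 (elimination of implication) behaves as follows; this sketch assumes the operator declarations at the top of that figure are loaded:

```prolog
?- transform( a => b, F).
F = ~a v b
```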
% Production rules for resolution theorem proving

% Contradicting clauses

[ clause( X), clause( ~X)] --->
     [ write( 'Contradiction found'), stop].

% Remove a true clause

[ clause( C), delete( P, C, C1), in( P, C1)] --->
     [ replace( clause( C), clause( C1))].

% Resolution step, a special case

[ clause( P), clause( C), delete( ~P, C, C1), not done( P, C, P)] --->
     [ assertz( clause( C1)), assertz( done( P, C, P))].

% Resolution step, a special case

[ clause( ~P), clause( C), delete( P, C, C1), not done( ~P, C, P)] --->
     [ assertz( clause( C1)), assertz( done( ~P, C, P))].

% Resolution step, general case

[ clause( C1), delete( P, C1, CA),
  clause( C2), delete( ~P, C2, CB), not done( C1, C2, P)] --->
     [ assertz( clause( CA v CB)), assertz( done( C1, C2, P))].

% Last rule: resolution process stuck

% delete( P, E, E1) if deleting a disjunctive subexpression P from E gives E1

delete( X, X v Y, Y).

delete( X, Y v X, Y).

delete( X, Y v Z, Y v Z1) :-
     delete( X, Z, Z1).

delete( X, Y v Z, Y1 v Z) :-
     delete( X, Y, Y1).

% in( P, E) if P is a disjunctive subexpression in E

in( X, X).

in( X, Y) :-
     delete( X, Y, _).

Figure 23.15 A pattern-directed program for simple resolution theorem proving.

% Translating a propositional formula into (asserted) clauses

:- op( 100, fy, ~).              % Negation
:- op( 110, xfy, &).             % Conjunction
:- op( 120, xfy, v).             % Disjunction
:- op( 130, xfy, =>).            % Implication

translate( F & G) :-             % Translate conjunctive formula
     !,                          % Red cut
     translate( F),
     translate( G).

translate( Formula) :-
     transform( Formula, NewFormula),   % Transformation step on Formula
     !,                          % Red cut
     translate( NewFormula).

translate( Formula) :-           % No more transformation possible
     assert( clause( Formula)).

% Transformation rules for propositional formulas
% transform( Formula1, Formula2):
%   Formula2 is equivalent to Formula1, but closer to clause form

transform( X => Y, ~X v Y).                  % Eliminate implication
transform( ~(X & Y), ~X v ~Y).               % De Morgan's law
transform( ~(X v Y), ~X & ~Y).               % De Morgan's law
transform( X & Y v Z, (X v Z) & (Y v Z)).    % Distribution
transform( X v Y & Z, (X v Y) & (X v Z)).    % Distribution
transform( X v Y, X1 v Y) :-
     transform( X, X1).                      % Transform subexpression
transform( X v Y, X v Y1) :-
     transform( Y, Y1).                      % Transform subexpression
transform( ~X, ~X1) :-
     transform( X, X1).                      % Transform subexpression

Figure 23.16 Translating a propositional calculus formula into a set of (asserted) clauses.
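The auxiliary relations delete/3 and in/2 of Figure 23.15 can also be tried directly (a sketch, assuming the operator declarations of Figure 23.16 are loaded):

```prolog
?- delete( b, a v b v c, C).    % remove the disjunct b
C = a v c
?- in( ~a, ~a v b).             % is ~a a disjunctive subexpression?
yes
```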
Now the pattern-directed theorem prover can be triggered by the goal run. So, to prove a conjectured theorem using these programs, we translate the negated theorem into the clause form and start the resolution process. For our example theorem, this is done by the question:

?- translate( ~( (a => b) & (b => c) => (a => c))), run.

The program will respond with 'Contradiction found', meaning that the original formula is a theorem.

Summary

...tion. It can be viewed as symbolic execution of a program, guided by a specific example. Explanation-based generalization was invented as an approach to machine learning.

• An object-oriented program consists of objects that send messages between themselves. Objects respond to messages by executing their methods. Methods can also be inherited from other objects.

• A pattern-directed program is a collection of pattern-directed modules whose execution is triggered by patterns in the 'database'.

• Prolog programs themselves can be viewed as pattern-directed systems.

• The parallel implementation of pattern-directed systems would be most natural. The sequential implementation requires conflict resolution among the modules in the conflict set.

• A simple interpreter for pattern-directed programs was implemented in this chapter and applied to resolution-based theorem proving in propositional logic.

• Concepts discussed in this chapter are:
  meta-programs, meta-interpreters
  explanation-based generalization
  object-oriented programming
  objects, methods, messages
  inheritance of methods
  pattern-directed systems, pattern-directed architecture
  pattern-directed programming
  pattern-directed module
  conflict set, conflict resolution
  resolution-based theorem proving, resolution principle

References

Writing Prolog meta-interpreters is part of the traditional Prolog programming culture. Le (1993), Shoham (1994) and Sterling and Shapiro (1994) give some other interesting examples of Prolog meta-interpreters.

The idea of explanation-based generalization was developed in the area of machine learning. The formulation used in this chapter is as in the paper by Mitchell, Keller and Kedar-Cabelli (1986). Our EBG program is similar to that in Kedar-Cabelli and McCarty (1987). This program is a delightful illustration of how elegantly a complicated symbolic method can be implemented in Prolog. With this program, Kedar-Cabelli and McCarty transformed pages of previous vague descriptions of EBG into a succinct, crystal clear and immediately executable ...

... systems. A useful collection of papers on blackboard systems is Engelmore and Morgan (1988). Our illustrative application of pattern-directed programming is a basic example of mechanical theorem proving. Fundamentals of mechanical theorem proving in predicate logic are covered in many general books on artificial intelligence, such as those by Genesereth and Nilsson (1987), Ginsberg (1993), Poole et al. (1998), and Russell and Norvig (1995).

Engelmore, R. and Morgan, T. (eds) (1988) Blackboard Systems. Reading, MA: Addison-Wesley.
Genesereth, M.R. and Nilsson, N.J. (1987) Logical Foundations of Artificial Intelligence. Palo Alto, CA: Morgan Kaufmann.
Ginsberg, M. (1993) Essentials of Artificial Intelligence. San Francisco, CA: Morgan Kaufmann.
Kedar-Cabelli, S.T. and McCarty, L.T. (1987) Explanation-based generalization as resolution theorem proving. In: Proc. 4th Int. Machine Learning Workshop, Irvine, CA: Morgan Kaufmann.
Le, T.V. (1993) Techniques for Prolog Programming. John Wiley.
Mitchell, T.M., Keller, R.M. and Kedar-Cabelli, S.T. (1986) Explanation-based generalization: a unifying view. Machine Learning 1: 47-80.
Moss, C. (1994) Prolog++: The Power of Object-Oriented and Logic Programming. Harlow: Addison-Wesley.
Poole, D., Mackworth, A. and Goebel, R. (1998) Computational Intelligence: A Logical Approach. Oxford University Press.
Russell, S. and Norvig, P. (1995) Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall.
Shoham, Y. (1994) Artificial Intelligence Techniques in Prolog. San Francisco, CA: Morgan Kaufmann.
Stabler, E.P. (1986) Object-oriented programming in Prolog. AI Expert (October 1986): 46-57.
Sterling, L. and Shapiro, E. (1994) The Art of Prolog, second edition. Cambridge, MA: MIT Press.
Waterman, D.A. and Hayes-Roth, F. (eds) (1978) Pattern-Directed Inference Systems. London: Academic Press.
appendix A
The syntax for Prolog in this book follows the tradition of the Edinburgh Prolog,
which has been adopted by the majority of Prolog implementations and also the ISO
standard for Prolog. Typically implementations of Prolog offer many additional
features. Generally, the programs in the book use a subset of what is provided in a
typical implementation, and is included in the ISO standard. However, there are still
some differences between various Prologs that may require small changes when
executing the programs in the book with a particular Prolog. This appendix draws
attention to some of the more likely differences.
Undefined predicates

In some Prologs, a call to a predicate not defined in the program at all simply fails. Other Prologs in such cases complain with an error message. In such Prologs, undefined predicates can be made to fail (without error messages) by a built-in predicate like:

unknown( _, fail).

Negation as failure: not and '\+'

In this book we use not Goal for negation as failure. Many Prologs (and the standard) use the (somewhat less pretty) notation:

\+ Goal

to emphasize that this is not the proper logical negation, but negation defined through failure. For compatibility with these Prologs, 'not' should be replaced by '\+', or (with less work) not introduced as a user-defined predicate (see Appendix B).

Predicate name( Atom, CodeList)

This predicate is provided by most implementations, but not included in the standard (instead: atom_codes/2). There are small differences between Prologs in the behaviour of name in special cases, e.g. when the first argument is a number.

Loading programs with consult and reconsult

Loading programs with consult and reconsult varies between implementations. Differences occur when programs are loaded from multiple files and the same predicate is defined in more than one file (the new clauses about the same predicate may simply be added to the old clauses; alternatively, just the clauses in the most recent file are loaded, abandoning the previous clauses about the same predicate).

Modules

In some Prologs, the program can be divided into modules so that predicate names are local to a module unless they are specifically made visible from other modules. This is useful when writing large programs, when predicates with the same name and arity may mean different things in different modules.

appendix B

Some Frequently Used Predicates

Some basic predicates such as member/2 and conc/3 are used in many programs throughout the book. To avoid repetition, the definition of such predicates is usually not included in the program's listing. To run a program, these frequently used predicates also have to be loaded into Prolog. This is done most easily by consulting (or compiling) a file, such as the one given in this appendix, that defines these predicates. The listing below includes some predicates that may already be included among the built-in predicates, depending on the implementation of Prolog. For example, negation as failure written as not Goal is also included below for compatibility with Prologs that use the notation \+ Goal instead. When loading into Prolog the definition of a predicate that is already built-in, Prolog will typically just issue a warning message and ignore the new definition.

% File frequent.pl: Library of frequently used predicates

% Negation as failure
% This is normally available as a built-in predicate,
% often written with the prefix operator '\+', e.g. \+ likes( mary, snakes)
% The definition below is only given for compatibility among Prolog implementations

:- op( 900, fy, not).

not Goal :-
     Goal, !, fail
     ;
     true.

% once( Goal): produce one solution of Goal only (the first solution only)
% This may already be provided as a built-in predicate

once( Goal) :-
     Goal, !.

% member( X, List): X is a member of List

member( X, [X | _]).             % X is head of list

member( X, [_ | Rest]) :-
     member( X, Rest).           % X is in body of list
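For example, once this file has been consulted, queries like the following should behave as shown (the exact form of the answers varies between Prologs):

```prolog
?- member( b, [a, b, c]).
yes
?- not member( d, [a, b, c]).
yes
```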
% conc( L1, L2, L3): list L3 is the concatenation of lists L1 and L2

conc( [], L, L).

conc( [X | L1], L2, [X | L3]) :-
     conc( L1, L2, L3).

% del( X, L0, L): list L is equal to list L0 with X deleted
% Note: only one occurrence of X is deleted
% Fail if X is not in L0

del( X, [X | L], L).

del( X, [Y | L0], [Y | L]) :-
     del( X, L0, L).

% max( X, Y, Max): Max = max( X, Y)

max( X, Y, Max) :-
     X >= Y, !, Max = X
     ;
     Max = Y.

% min( X, Y, Min): Min = min( X, Y)

min( X, Y, Min) :-
     X =< Y, !, Min = X
     ;
     Min = Y.

% length( List, N): N is the length of list List

length( L, N) :-
     length( L, 0, N).

length( [], N, N).

length( [_ | L], N0, N) :-
     N1 is N0 + 1,
     length( L, N1, N).
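A few illustrative queries for these predicates (a sketch; the answer formatting depends on the Prolog used):

```prolog
?- conc( [a, b], [c, d], L).
L = [a, b, c, d]
?- max( 3, 7, M), min( 3, 7, N).
M = 7
N = 3
?- length( [a, b, c], N).
N = 3
```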
Solutions to Selected Exercises

Chapter 1

1.1 (a) no
    (b) X = pat
    (c) X = bob
    (d) X = bob, Y = pat

1.2 (a) ?- parent( X, pat).
    (b) ?- parent( liz, X).
    (c) ?- parent( Y, pat), parent( X, Y).

(b) hastwochildren( X) :-
        parent( X, Y),
        sister( Z, Y).

1.4 grandchild( X, Z) :-
        parent( Y, X),
        parent( Z, Y).

1.5 aunt( X, Y) :-
        parent( Z, Y),
        sister( X, Z).

Chapter 2

2.1 (a) variable
    (b) atom
    (c) atom
    (d) variable
    (e) atom

2.3 (a) A = 1, B = 2
    (b) no
    (c) no
    (d) D = 2, E = 2
    (e) P1 = point(-1,0)
        P2 = point(1,0)
        P3 = point(0,Y)
        This can represent the family of triangles with two vertices on the x-axis at 1 and -1 respectively, and the third vertex anywhere on the y-axis.

2.4 seg( point(5,Y1), point(5,Y2))
    % This assumes that the first point is the left bottom vertex.

2.6 (a) A = two
    (b) no
    (c) C = one
    (d) D = s(s(1));
        D = s(s(s(s(s(1)))))

2.7 relatives( X, Y) :-
        predecessor( X, Y)
        ;
        ...

2.9 In the case of Figure 2.10 Prolog does slightly more work.

2.10 According to the definition of matching of Section 2.2, this succeeds. X becomes a sort of circular structure in which X itself occurs as one of the arguments.
Chapter 7

7.2 add_to_tail( Item, List) :-
        var( List), !,                    % List represents empty list
        List = [Item | Tail].

    add_to_tail( Item, [_ | Tail]) :-
        add_to_tail( Item, Tail).

    member( X, List) :-
        var( List), !,                    % List represents empty list
        fail.                             % so X cannot be a member

    member( X, [X | Tail]).

    member( X, [_ | Tail]) :-
        member( X, Tail).

7.5 % subsumes( Term1, Term2):
    % Term1 subsumes Term2, e.g. subsumes( t(X,a,f(Y)), t(A,a,f(g(B))))
    % Assume Term1 and Term2 do not contain the same variable
    % In the following procedure, subsuming variables get instantiated
    % to terms of the form literally( SubsumedTerm)

    subsumes( Atom1, Atom2) :-
        atomic( Atom1), !,
        Atom1 == Atom2.

    subsumes( Var, Term) :-
        var( Var), !,                     % Variable subsumes anything
        Var = literally( Term).           % To handle other occurrences of Var

    subsumes( literally( Term1), Term2) :- !,   % Another occurrence of Term2
        Term1 == Term2.

    subsumes( Term1, Term2) :-            % Term1 not a variable
        nonvar( Term2),
        Term1 =.. [Fun | Args1],
        Term2 =.. [Fun | Args2],
        subsumes_list( Args1, Args2).

    subsumes_list( [], []).

    subsumes_list( [First1 | Rest1], [First2 | Rest2]) :-
        subsumes( First1, First2),
        subsumes_list( Rest1, Rest2).

7.6 (a) ?- retract( product( X, Y, Z)), fail.
    (b) ?- retract( product( X, Y, 0)), fail.

Chapter 8

8.2 add_at_end( L1 - [Item | Z2], Item, L1 - Z2).

8.3 reverse( A - Z, L - L) :-              % Result is empty list if
        A == Z, !.                         % A - Z represents empty list

    reverse( [X | L] - Z, RL - RZ) :-      % Non-empty list
        reverse( L - Z, RL - [X | RZ]).

8.6 % Eight queens program

    sol( Ylist) :-
        functor( Du, u, 15),               % Set of upward diagonals
        functor( Dv, v, 15),               % Set of downward diagonals
        sol( Ylist,
             [1,2,3,4,5,6,7,8],            % Set of X-coordinates
             [1,2,3,4,5,6,7,8],            % Set of Y-coordinates
             Du, Dv).

    sol( [], [], [], _, _).

    sol( [Y | Ys], [X | XL], YL0, Du, Dv) :-
        del( Y, YL0, YL),                  % Choose a Y-coordinate
        U is X + Y - 1,
        arg( U, Du, X),                    % Upward diagonal free
        V is X - Y + 8,
        arg( V, Dv, X),                    % Downward diagonal free
        sol( Ys, XL, YL, Du, Dv).

    del( X, [X | L], L).

    del( X, [Y | L0], [Y | L]) :-
        del( X, L0, L).

Chapter 9

9.4 % mergesort( List, SortedList): use the merge-sort algorithm

    mergesort( [], []).

    mergesort( [X], [X]).

    mergesort( List, SortedList) :-
        divide( List, List1, List2),       % Divide into approx. equal lists
        mergesort( List1, Sorted1),
        mergesort( List2, Sorted2),
        merge( Sorted1, Sorted2, SortedList).   % Merge sorted lists
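The difference-list version of reverse in solution 8.3 above can be queried like this (a sketch):

```prolog
?- reverse( [a, b, c | Z] - Z, R - []).
R = [c, b, a]
```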
    height( t( Left, Root, Right), H) :-
        height( Left, LH),
        height( Right, RH),
        max( LH, RH, MH),
        H is 1 + MH.

    max( A, B, A) :-
        A >= B, !.

    max( A, B, B).

9.7 linearize( nil, []).

    linearize( t( Left, Root, Right), List) :-
        linearize( Left, List1),
        linearize( Right, List2),
        conc( List1, [Root | List2], List).

9.8 maxelement( t( _, Root, nil), Root) :- !.   % Root is right-most element

    maxelement( t( _, _, Right), Max) :-        % Right subtree non-empty
        maxelement( Right, Max).

9.9 in( Item, t( _, Item, _), [Item]).

Chapter 10

10.1 in( Item, l( Item)).                       % Item found in leaf

     in( Item, n2( T1, M, T2)) :-               % Node has two subtrees
         gt( M, Item), !,                       % Item not in second subtree
         in( Item, T1)                          % Search first subtree
         ;
         in( Item, T2).                         % Otherwise search the second

     in( Item, n3( T1, M2, T2, M3, T3)) :-      % Node has three subtrees
         gt( M2, Item), !,                      % Item not in second or third
         in( Item, T1)                          % Search first subtree
         ;
         gt( M3, Item), !,                      % Item not in third subtree
         in( Item, T2)                          % Search second subtree
         ;
         in( Item, T3).                         % Search third subtree

10.3 avl( Tree) :-
         avl( Tree, Height).                    % Tree is AVL-tree with height Height
Chapter 11

11.3 % Iterative deepening search that stops increasing depth
     % when there is no path to current depth

     iterative_deepening( Start, Solution) :-
         id_path( Start, Node, [], Solution),
         goal( Node).

     % path( First, Last, Path): Path is a list of nodes between First and Last

     path( First, First, [First]).

     path( First, Last, [First, Second | Rest]) :-
         s( First, Second),
         path( Second, Last, [Second | Rest]).

     % Iterative deepening path generator
     % id_path( First, Last, Template, Path): Path is a path between First and
     % Last not longer than template list Template. Alternative paths are
     % generated in the order of increasing length

     id_path( First, Last, Template, Path) :-
         Path = Template,
         path( First, Last, Path)
         ;
         ...

     s( Node1, Node2) :-
         origs( Node2, Node1).

11.10 % States for bidirectional search are pairs of nodes StartNode-EndNode
      % of the original state space, denoting start and goal of search

      s( Start - End, NewStart - NewEnd) :-
          origs( Start, NewStart),        % A step forward
          origs( NewEnd, End).            % A step backward

      % goal( Start - End) for bidirectional search

      goal( Start - Start).               % Start equal end

      goal( Start - End) :-
          origs( Start, End).             % Single step to end

11.11 find1: depth-first search; find2: iterative deepening (conc( Path, _, _) generates list templates of increasing length, forcing find1 into an iterative deepening regime); find3: backward search.

11.12 Bidirectional search with iterative deepening from both ends.

Chapter 12

12.2 Not correct. h =< h* is sufficient for admissibility, but not necessary.

12.3 h(n) = max{ h1(n), h2(n), h3(n)}

12.7 % Specification of eight puzzle for IDA*

     s( Depth:State, NewDepth:NewState) :-
         s( State, NewState, _),
         NewDepth is Depth + 1.

     f( Depth:[Empty | Tiles], F) :-
         goal( [Empty0 | Tiles0]),
         totdist( Tiles, Tiles0, Dist),   % Use total dist. as heuristic function
         F is Depth + Dist.

Chapter 14

14.1 We get the same result regardless of the order.

14.6 A diode in series with R5, in direction from left to right, should not affect the voltage at TS1. A diode in the opposite direction affects this voltage.

Chapter 15

15.2 No error when p(A|B) = 1 or p(B|A) = 1. Greatest error (0.5) when p(A) = p(B) = 0.5 and p(A|B) = 0.

15.4 A = 6, B = 5, C = 25.
672 Index Index 673
binary relation 8, 10
binary tree 202
    deletion 208
    insertion at root 210
    insertion 208
blackboard systems 645
blocks world 414
body of clause 9, 13
branch-and-bound 335
breadth-first search 250
bubble sort 198
built-in comparison predicates @<, @=<, @>, @>= 161
built-in equality predicates =, is, =:=, =\=, ==, \== 160
built-in predicate 85, 147
    abolish/1 646
    arg/3 158
    asserta/1 164
    assertz/1 164
    assert/1 162
    atomic/1 148
    atom/1 148
    bagof/3 167
    call/1 158
    clause/2 614
    compile/1 145
    compound/1 148
    consult/1 144, 647
    copy_term/2 623
    dynamic/1 163
    fail 128
    findall/3 168
    float/1 148
    functor/3 158
    get0/1 140
    get/1 140
    integer/1 148
    is/2 85
    name/2 141
    nl/0 135
    nonvar/1 148
    nospy/1 180
    notrace/0 180
    not/1 125
    number/1 148
    once/1 166
    put/1 140
    read/1 135
    reconsult/1 647
    repeat/0 166
    retractall/1 646
    retract/1 162
    setof/3 168
    see/1 133
    seen/0 134
    tab/1 135
    told/0 134
    trace/0 180
    ttyflush/0 138
    unknown/2 647
    var/1 148
    write/1 135, 155
    \+ 647

caching of answers 164
caching results 192
call/1, built-in predicate 158
categorical knowledge 350
certainty factor 360
chaining of rules 352
    backward 353
    data-driven 357
    forward 355
    goal-driven 357
chess programming 590
    king and rook vs king 599
    pattern knowledge 592
    quiescence heuristics 592
class probability tree 464
clauses in Prolog, order of 50, 52, 53
clause form 57
clause in Prolog 4, 7, 13
    body of 9, 13
    head of 9, 13
    instance of 39
    variant of 39
clause/2, built-in predicate 614
closed world assumption 129
CLP 319
CLP(FD) 341
CLP(R), CLP(Z), CLP(Q), CLP(FD), CLP(B) 324
combinatorial complexity 256
comments in Prolog 18
common sense reasoning 521
comparison operator 82
compiled code 145
compile/1, built-in predicate 145
composed query 6
compound/1, built-in predicate 148
concatenation of lists 65
concept description language 445, 447
concept description 445
concept, formalization of 443
conc/3 65, 649
conflict set in pattern-directed programming 632
conjunction of goals 7, 39
consistency algorithm 321
constant in Prolog 7
constraint logic programming 319
    CLP(R), CLP(Z), CLP(Q), CLP(FD), CLP(B) 324
    over finite domains 341
    over rational numbers 329
    over real numbers 324
    simulation with 336
constraint network 321
constraint programming 319
constraint satisfaction 319
consulting programs 144
consult/1, built-in predicate 144, 647
copy_term/2, built-in predicate 623, 650
correctness, declarative 52
correctness, procedural 52
covering algorithm 457
cross validation 477
cryptarithmetic puzzle 150
    with CLP 343
current input/output stream 133, 145
cut 114, 116
    green 128
    improving efficiency with 116
    meaning of 118
    mutually exclusive rules 127
    problems with 127, 130
    red 128
    use of 119, 177
cut-fail combination 128

database manipulation in Prolog 161
data abstraction 92, 113
data driven reasoning 357
data mining 443
data object in Prolog 26
DCG grammar 555, 557
debugging in Prolog 179
decision tree 446
    induction of 462
declarative correctness 52
declarative meaning 23, 24, 38, 39, 57, 59
declarative semantics of Prolog 59
definite clause grammar 555, 557
deleting item from list 69
del/3 69
depth-first AND/OR search 301
depth-first search 244, 256
diagnostic reasoning 365
difference lists 185
directive op/3 76
disjunction of goals 39
dynamic predicate 163, 646
dynamic/1, directive 163, 646

EBG 618
eight puzzle 240, 270
eight queens problem 103
eight queens problem, with CLP 343
end_of_file 134
entropy formula 465
execution trace 22
expert system shell 349
expert system 347
    AL/X 351
    explanation 358
    how question 350, 391
    MYCIN 351
    structure of 349
    uncertainty 360
    why question 350, 391
explanation-based generalization 618
    program compilation as 623

fact in Prolog 13
fail, built-in predicate 128
files in Prolog 132
file user 133
final state of automaton 95
findall/3, built-in predicate 168
finite automaton 94
finite automaton, non-deterministic 94
first-order predicate logic 57
float/1, built-in predicate 148
forward chaining 352, 355
frame representation 374
frequently used predicates 648
functional reasoning 524
functor 29, 58
    arity of 30, 58
    principal 30
functor/3, built-in predicate 158
notrace/0, built-in predicate 180
not/1 125
numbers in Prolog 27
number/1, built-in predicate 148

object in Prolog, structured 29
object, type of 26
object-oriented programming in Prolog 624
occurs check 58
once/1 166, 648
operator definition 76
operator notation 74, 78, 87
operator precedence 75, 76, 78
operator types fx, fy, xfx, xfy, yfx 76
operator 74, 76
    comparison 82
    infix 74, 76
    postfix 76
    predefined 77, 78
    prefix 76
    standard 77
    type of 79
op/3, directive 76
order of clauses 50, 52
order of goals 50, 52
output in Prolog 132
output stream 133

parse tree 563
partial-order planning 436
path finding 216
pattern-directed module 531
pattern-directed programming 631, 634
pattern-directed program, interpreter for 635
pattern-directed system 631
permutation of list 71
permutation/2 71
planning 413
    best-first heuristic 430
    completeness of 426
    goal protection 422
    goal regression 427
    means-ends analysis 418
    means-ends 419
    non-linear 437
    partial-order 436
    uninstantiated actions and goals 434
postfix operator 76
precedence of argument 76
precedence of operator 75, 76, 78
predefined operators 77, 78
predefined operator ';' 39
predicate logic 57
    first-order 57
predictive reasoning 365
prefix operator 76
preventing backtracking 114
principal functor 30
probabilistic knowledge 350
probabilistic reasoning 381
probability estimation 472
procedural correctness 52
procedural meaning 23, 38, 41, 57, 59
procedural semantics of Prolog 59
procedure in Prolog 18, 24
production rule 555
programming style 176
program comments 18, 178
program layout 177
progressive deepening in game programs 591
Prolog 3
    atom 26
    clause 4, 13
    compiler 145
    data object 26
    declarative meaning of 59
    declarative semantics of 59
    fact 13
    interpreter 145
    meaning of 24
    meta-interpreter 614
    procedural meaning of 59
    procedural semantics of 59
    procedure 18, 24
    rule 9
    simple objects 58
    term 29
    variable 7, 13, 28
proof sequence 20
proof tree 617
pruning set 475
pure Prolog 58
put/1, built-in predicate 140

QDE 525, 529
QSIM algorithm 552, 553
qualitative abstraction 522
qualitative behaviour 541
qualitative differential equation 525, 529
qualitative modelling 520
qualitative reasoning 520
qualitative simulation program 536
qualitative state of system 541
qualitative state transition 543
querying a database in Prolog 88
query in Prolog 4, 6
question in Prolog 4, 7, 13, 24
quicksort in Prolog 199
quicksort 199

RBFS algorithm 282
    properties of 289
RBFS program 287
reading programs 144
read/1, built-in predicate 135
reasoning
    common sense 521
    diagnostic 365
    functional 524
    in belief networks 367
    predictive 365
    qualitative 521
    with uncertainty 380
reconsult/1, built-in predicate 647
recursion 16
recursion, tail recursion 186
recursion, use of 174
recursive definition 16
recursive programming 16
recursive rule 14, 16
red cut 128, 177
refinement graph over clauses 518
refinement graph over hypotheses 518
refinement graph 489
refinement operator 490
regressing goals through action 427
relational descriptions 447
relationship 4
relation 7
relation, arguments of 10
relation, binary 8, 10
relation, unary 8, 10
repeat/0, built-in predicate 166
resolution principle 57
resolution step 639
resolution-based theorem proving 57, 638
retractall/1, built-in predicate 646
retract/1, built-in predicate 162
rule in Prolog 8, 9, 13
rule, recursive 14, 16
rule-based systems 358

scheduling 274
scheduling, with CLP 329
scheduling, with constraints 320
search in graphs 255
search
    alpha-beta 588
    AND/OR depth-first 301
    AO* algorithm 305
    A* algorithm 261
    backward 253
    best-first AND/OR 305
    best-first 260
    bidirectional 254, 256
    breadth-first 250
    complexity of basic techniques 255, 256
    depth-first 244, 256
    greedy 461
    heuristic 260
    hill-climbing 461
    IDA* algorithm 280
    iterative deepening 248, 256
    RBFS algorithm 282
seen/0, built-in predicate 134
see/1, built-in predicate 133
selector relations 93
semantics of Prolog 59
semantic network 372
semantic structure 565
setof/3, built-in predicate 168
set_difference/3 649
simple objects in Prolog 58
simulation with CLP(R) 336
SLD resolution 57
SLD theorem proving strategy 57
sorting lists 197
spanning tree 219
spurious behaviour in qualitative simulation 548, 550
spy/1, built-in predicate 180
standard functions in Prolog 82
standard operator 77
state of automaton 94
state space representation in Prolog 242
state space 240
state transition 94
static predicate 646
structured object 29
structures in Prolog 58