Rule-Based Expert Systems
Edited by
Bruce G. Buchanan
Department of Computer Science
Stanford University
Edward H. Shortliffe
Department of Medicine
Stanford University School of Medicine
Bibliography: p.
Includes index.
1. Expert systems (Computer science)  2. MYCIN (Computer system)  I. Buchanan, Bruce G.  II. Shortliffe, Edward Hance.
QA76.9.E96R84 1984 001.535 83-15822
ISBN 0-201-10172-6
BCDEFGHIJ-MA-8987654
For Sally and Linda
Contents
Contributors ix
Epilog 703
Appendix 705
References 717
Name Index 739
Subject Index 742
Foreword
The last seven years have seen the field of artificial intelligence (AI) trans-
formed. This transformation is not simple, nor has it yet run its course.
The transformation has been generated by the emergence of expert systems.
Whatever exactly these are or turn out to be, they first arose during the
1970s, with a triple claim: to be AI systems that used large bodies of heu-
ristic knowledge, to be AI systems that could be applied, and to be the
wave of the future. The exact status of these claims (or even whether my
statement of them is anywhere close to the mark) is not important. The
thrust of these systems was strong enough and the surface evidence im-
pressive enough to initiate the transformation. This transformation has at
least two components. One comes from the resulting societal interest in
AI, expressed in the widespread entrepreneurial efforts to capitalize on
AI research and in the Japanese Fifth-Generation plans with their subse-
quent worldwide ripples. The other component comes from the need to
redraw the intellectual map of AI to assimilate this new class of systems--
to declare it a coherent subarea, or to fragment it into intellectual subparts
that fit the existing map, or whatever.
A side note is important. Even if the evidence from politics is not
persuasive, science has surely taught us that more than one revolution can
go on simultaneously. Taken as a whole, science is currently running at
least a score of revolutions--not a small number. AI is being transformed
by more than expert systems. In particular, robotics, under the press of
industrial productivity, is producing a revolution in AI in its own right.
Although progressing somewhat more slowly than expert systems at the
moment, robotics in the end will produce an effect at least as large, not
just on the applied side, but on the intellectual structure of the field as
well. Even more, both AI and robotics are to some degree parts of an
overarching revolution in microelectronics. In any event, to focus on one
revolution, namely expert systems, as I will do here for good reason, is not
to deny the importance of the others.
The book at whose threshold this foreword stands has (also) a triple
claim on the attention of someone interested in expert systems and AI.
First, it provides a detailed look at a particular expert system, MYCIN.
Second, it is of historical interest, for this is not just any old expert system,
but the granddaddy of them all--the one that launched the field. Third,
it is an attempt to advance the science of AI, not just to report on a system
or project. Each of these deserves a moment's comment, for those readers
who will tarry at a foreword before getting on with the real story.
MYCIN as Example   It is sometimes noted that the term expert system is a pun. It designates a system that is expert in some existing human art, and thus that operates at human scale--not on some trifling, though perhaps illustrative task, not on some toy task, to use the somewhat pejorative term popular in the field. But it also designates a system that plays the role of a consultant, i.e., an expert who gives advice to someone who has a task. Such a dual picture cannot last long. The population of so-called expert systems is rapidly becoming mongrelized to include any system that is applied, has some vague connection with AI systems and has pretensions of success. Such is the fate of terms that attain (if only briefly) a positive halo, when advantage lies in shoehorning a system under its protective and productive cover.
MYCIN provides a pure case of the original pun. It is expert in an existing art of human scale (diagnosing bacterial infections and prescribing
treatment for them) and it operates as a consultant (a physician describes
a patient to MYCIN and the latter then returns advice to the physician).
The considerations that came to the fore because of the consultant mode--
in particular, explanation to the user--play a strong role throughout all of
the work. Indeed, MYCIN makes explicit most of the issues with which
any group who would engineer an expert system must deal. It also lays
out some of the solutions, making clear their adequacies and inadequacies.
Because the MYCIN story is essentially complete by now and the book tells
it all, the record of initial work and response gives a perspective on the
development of a system over time. This adds substantially to the time-
sliced picture that constitutes the typical system description. It is a good
case to study, even though, if we learn our lessons from it and the other
early expert systems, we will not have to recapitulate exactly this history
again.
One striking feature of the MYCIN story, as told in this book, is its
eclecticism. Those outside a systems project tend to build brief, trenchant descriptions of a system: MYCIN is an example of approach X leading to
a system of type Y. Designers themselves often characterize their own sys-
tems in such abbreviated terms, seeking to make particular properties
stand out. And, of course, critics do also, although the properties they
choose to highlight are not usually the same ones. Indeed, I myself use
such simplified views in this very foreword. But if this book makes anything
clear, it is that the MYCIN gang (as they called themselves) continually
explored, often with experimental variants, the full range of ideas in the
AI armamentarium. We would undoubtedly see that this is true of many
projects if we were to follow their histories carefully. However, it seems to
have been particularly true of the effort described here.
MYCIN as History   MYCIN comes out of the Stanford Heuristic Pro-
gramming Project (HPP), the laboratory that without doubt has had the
most impact in setting the expert-system transformation in motion and
determining its initial character. I said that MYCIN is the granddaddy of
expert systems. I do not think it is so viewed in HPP. They prefer to talk
about DENDRAL, the system for identifying chemical structures from
mass spectrograms (Lindsay, Buchanan, Feigenbaum, and Lederberg,
1We use the name EMYCIN for the system that evolved from MYCIN as a framework for building and running new expert systems. The name stands for "essential MYCIN," that is, MYCIN's framework without its medical knowledge base. We have been reminded that E-MYCIN is the name of a drug that Upjohn Corp. has trademarked. The two names should not be confused: EMYCIN should not be ingested, nor should E-MYCIN be loaded into a computer.
Acknowledgments
L. Wittgenstein, Philosophical
Investigations, para. 255 (trans.
G. E. M. Anscombe). New York:
Macmillan, 1953.
Every one then who hears these words of mine and does them will be like a
wise man who built his house upon the rock; and the rain fell, and the floods
came, and the winds blew and beat upon that house, but it did not fall, because
it had been founded on the rock. And every one who hears these words of mine
and does not do them will be like a foolish man who built his house upon the
sand; and the rain fell, and the floods came, and the winds blew and beat against
that house, and it fell; and great was the fall of it.
Matthew 7:24-27
(Revised Standard Version)
PART ONE
Background
1
The Context of the MYCIN Experiments
FIGURE 1-1 [Diagram: the user supplies a description of a new case through an interface and receives advice and explanations; the expert system itself comprises an inference engine and a knowledge base.]
If A and B, then C
A & B → C

If A, then B    (Rule 1)
If B, then C    (Rule 2)
A               (Data)
∴ C             (Conclusion)
Since there are many rule chains and many pieces of data about which the
system needs to inquire, we sometimes say that MYCIN is an evidence-gathering program.
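To make the evidence-gathering behavior concrete, the following sketch (in Python, purely illustrative and not MYCIN's Interlisp implementation; the rule list and the ask function are invented) chains backward through the two rules above: asked to establish C, it works back through Rule 2 and Rule 1 until it reaches a datum, A, that no rule concludes, and requests that datum from the user.

    # A minimal backward-chaining sketch (hypothetical; not MYCIN's code).
    # Each rule pairs a list of premises with a conclusion; facts that no
    # rule can conclude are requested from the user, which is what makes
    # the program "evidence-gathering."

    RULES = [
        (["A"], "B"),   # Rule 1: If A, then B
        (["B"], "C"),   # Rule 2: If B, then C
    ]

    def established(goal, known, ask):
        if goal in known:                    # already gathered or inferred
            return True
        for premises, conclusion in RULES:   # try every rule concluding the goal
            if conclusion == goal and all(established(p, known, ask) for p in premises):
                known.add(goal)
                return True
        if ask(goal):                        # no rule applies: ask for the datum
            known.add(goal)
            return True
        return False

    known = set()
    ask = lambda goal: goal == "A"           # pretend the user can supply only datum A
    print(established("C", known, ask))      # True: datum A yields conclusion C

MYCIN's actual control structure is far richer, but the pattern of chaining from goals back to askable data is the idea the toy captures.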
The whole expert system is used to perform a task, in MYCIN's case to provide diagnostic and therapeutic advice about a patient with an infection as described in Section 1.2. We sometimes refer to the whole system,
shown in Figure 1-1, as the performance system to contrast it with other
subsystems not so directly related to giving advice. MYCIN contains an
explanation subsystem, for example, which explains the reasoning of the
performance system (see Part Six).
Several of the chapters in this book deal with the problems of con-
structing a performance system in the first place. We have experimented
with different kinds of software tools that aid in the construction of a new
system, mostly by helping with the formulation and understanding of a
new knowledge base. We refer to the process of mapping an expert's knowledge into a program's knowledge base as knowledge engineering.1 The in-
tended users of these kinds of tools are either (a) the so-called knowledge
engineers who help an expert formulate and represent domain-specific
knowledge for the performance system or (b) the experts themselves. Although either group might also run the performance system to test it,
neither overlaps with the intended routine users of the performance sys-
tem. Our model is that engineers help experts build a system that others
later use to get advice. Elaborating on the previous diagrams, we show this
model in Figure 1-2.
LISP has been the programming language of choice for AI programs for
nearly two decades (McCarthy et al., 1962). It is a symbol manipulation
language of extreme flexibility based on a small number of simple con-
structs.2 We are often asked why we chose LISP for work on MYCIN, so
a brief answer is included here. Above all, we needed a language and
programming environment that would allow rapid modification and test-
ing and in which it was easy and natural to separate medical rules in the
knowledge base from the inference procedures that use the rules. LISP is
an interpretive language and thus does not require that programs be re-
compiled after they have been modified in order to test them. Moreover,
LISP removes the distinction between programs and data and thus allows
us to use rules as parts of the program and to examine and edit them as data
structures. The editing and debugging facilities of Interlisp also aided our
research greatly.
Successful AI programs have been written in many languages. Until
recently LISP was considered to be too slow and too large for important
applications. Thus there were reasons to consider other languages. But for
a research effort, such as this one, we were much more concerned with
saving days during program development than with saving seconds at run
time. We needed the flexibility that LISP offered. When Interlisp became
available, we began using it because it promised still more convenience
than other versions. Now that additional tools, such as EMYCIN, have been
built on top of Interlisp, more savings can be realized by building new
systems using those tools (when appropriate) than by building from the
base-level LISP system. At the time we began work on MYCIN,however,
we had no choice.
As best as we can tell, production rules were brought into artificial intel-
ligence (AI) by Allen Newell, who had seen their power and simplicity
demonstrated in Robert Floyd's work on formal languages and compilers
3Even more specifically, the data about the unknown compound were data from a mass spectrometer, an instrument that bombards a small sample of a compound with high-energy electrons and produces data on the resulting fragments.
Shortliffe in his efforts to obtain formal training in the field. Thus the
scene was set for a collaborative effort involving Cohen, Buchanan, and
Shortliffe--an effort that ultimately grew into Shortliffe's dissertation.
After six months of collaborative effort on MEDIPHOR, our discus-
sions began to focus on a computer program that would monitor physicians' prescriptions for antibiotics and generate warnings on inappropriate
prescriptions in the same way that MEDIPHOR produced warnings re-
garding potential drug-drug interactions. Such a program would have
needed to access data bases on three Stanford computers: the pharmacy,
clinical laboratory, and bacteriology systems. It would also have required
considerable knowledge about the general and specific conditions that
make one antibiotic, or combination of antibiotics, a better choice than
another. Cohen interested Thomas Merigan, Chief of the Infectious Dis-
ease Division at Stanford, in lending both his expertise and that of Stanton
Axline, a physician in his division. In discussing this new kind of monitor-
ing system, however, we quickly realized that it would require much more
medical knowledge than had been the case for MEDIPHOR. Before a
system could monitor for inappropriate therapeutic decisions, it would
need to be an "expert" in the field of antimicrobial selection. Thus, with
minor modifications for direct data entry from a terminal rather than from
patient data bases, a monitoring system could be modified to provide con-
sultations to physicians. Another appeal of focusing on an interactive sys-
tem was that it provided us with a short-term means to avoid the difficulty
of linking three computers together to provide data to a monitoring sys-
tem. Thus our concept of a computer-based consultant was born, and we
began to model MYCIN after infectious disease consultants. This model also conformed with Cohen's strong belief that a computer-based aid for
medical decision making should suggest therapy as well as diagnosis.
Shortliffe synthesized medical knowledge from Cohen and Axline and
AI ideas from Buchanan and Cordell Green. Green suggested using In-
terlisp (then known as BBN-LISP), which was running at SRI International
(then Stanford Research Institute) but was not yet available at the univer-
sity. Conversations with him also led to the idea of using Carbonell's program, SCHOLAR (Carbonell, 1970a), as a model for MYCIN. SCHOLAR
represented facts about the geography of South America in a large se-
mantic network and answered questions by making inferences over the
net. However, this model was not well enough developed for us to see how
a long dialogue with a physician could be focused on one line of reasoning
at a time. We also found it difficult to construct semantic networks for the ill-structured knowledge of infectious disease. We turned instead to a rule-
based approach that Cohen and Axline found easier to understand, par-
ticularly because chained rules led to lines of reasoning that they could
understand and critique.
One important reason for the success of our early efforts was Short-
liffe's ability to provide quickly a working prototype program that would
show Cohen and Axline the consequences of the rules they had stated at
For the past year and a half the Divisions of Clinical Pharmacology and Infectious Disease plus members of the Department of Computer Science have collaborated on initial development of a computer-based system (termed MYCIN) that will be capable of using both clinical data and judgmental decisions regarding infectious disease therapy. The proposed research involves development and acceptable implementation of the following:

A. CONSULTATION PROGRAM. The central component of the MYCIN system is an interactive computer program to provide physicians with consultative advice regarding an appropriate choice of antimicrobial therapy as determined from data available from the microbiology and clinical chemistry laboratories and from direct clinical observations entered by the physician in response to computer-generated questions;

B. INTERACTIVE EXPLANATION CAPABILITIES. Another important component of the system permits the consultation program to explain its knowledge of infectious disease therapy and to justify specific therapeutic recommendations;

C. COMPUTER ACQUISITION OF JUDGMENTAL KNOWLEDGE. The third aspect of this work seeks to permit experts in the field of infectious disease therapy to teach the MYCIN system the therapeutic decision rules that they find useful in their clinical practice.
FIGURE 1-3 HPP programs relating to MYCIN. (Program names in boxes were Ph.D. dissertation research programs.) [The original diagram traces a lineage from 1960s and 1970s programs (CONGEN, Meta-DENDRAL, and SU/X†, with links labeled QA, Inference, and Evaluation) through TEIRESIAS, EMYCIN, BAOBAB, GUIDON, SACON, CENTAUR, and GRAVIDA, to 1980s programs including NEOMYCIN, ONCOCIN, WHEEZE, CLOT, and DART.]
†Later renamed HASP/SIAP (Nii and Feigenbaum, 1978; Nii et al., 1982).
Ancient History
"If a horse enters a mans house and bites either an ass or a man,
the owner of the house will die and his household will be scattered."
"If a manunwittingly treads on a lizard and kills it,
he will prevail over his adversary."
Thus we see that large collections of simple rules were used for medical diagnosis long before MYCIN and that some thought had been given to the organization of the knowledge base.
what drugs are apt to be beneficial for the patient. Initially, MYCIN did
not consider infections caused by viruses or pathogenic fungi, but since
these other kinds of organisms are particularly significant as causes of
meningitis, they were later added when we began to work with that do-
main.
Selection of therapy is a four-part decision process. First, the physician
must decide whether or not the patient has a significant infection requiring
treatment. If there is significant disease, the organism must be identified
or the range of possible identities must be inferred. The third step is to
select a set of drugs that may be appropriate. Finally, the most appropriate
drug or combination of drugs must be selected from the list of possibilities.
Each step in this decision process is described below.
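Before the individual steps are taken up, the four of them can be pictured, as a rough outline only (the function names, data values, and predicates in the sketch below are invented placeholders, not MYCIN's logic or terminology), as a pipeline in which each stage narrows the output of the previous one:

    # Hypothetical outline of the four-part decision process; every function
    # body is a stand-in, not medical knowledge.

    def significant_infection(patient):           # step 1: does the patient need treatment?
        return patient["evidence_of_infection"]

    def possible_identities(patient):             # step 2: organism, or range of identities
        return patient["culture_hints"] or ["unknown organism"]

    def candidate_drugs(identities):              # step 3: drugs that may be appropriate
        coverage = {"E. coli": ["gentamicin", "ampicillin"],
                    "unknown organism": ["ampicillin"]}
        return {drug for ident in identities for drug in coverage.get(ident, [])}

    def best_regimen(drugs, patient):             # step 4: screen by clinical factors
        return sorted(drug for drug in drugs if drug not in patient["allergies"])

    patient = {"evidence_of_infection": True,
               "culture_hints": ["E. coli"],
               "allergies": {"ampicillin"}}

    if significant_infection(patient):
        print(best_regimen(candidate_drugs(possible_identities(patient)), patient))
    # ['gentamicin']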
There the technicians first attempt to grow organisms from the sample on
an appropriate nutritional medium. Early evidence of growth may allow
them to report the morphological and staining characteristics of the or-
ganism. However, complete testing of the organism to determine a definite
identity usually requires 24-48 hours or more.
The problem with this identification process is that the patient may be
so ill at the time when the culture is first obtained that the physician cannot
wait two days before beginning antimicrobial therapy. Early data regarding
the organism's staining characteristics, morphology, growth conformation,
and ability to grow with or without oxygen may therefore become crucially
important for narrowing down the range of possible identities. Further-
more, historical information about the patient and details regarding his or her clinical status may provide additional useful clues as to the organism's
identity.
Even once the identity of an organism is known with certainty, its range
of antimicrobial sensitivities may be unknown. For example, although a
Pseudomonas is usually sensitive to gentamicin, an increasing number of
gentamicin-resistant Pseudomonae are being isolated. For this reason the
microbiology technicians will often run in vitro sensitivity tests on an or-
ganism they are growing, exposing the bacterium to several commonly
used antimicrobial agents. This sensitivity information is reported to the
physician so that he or she will know those drugs that are likely to be
effective in vivo (i.e., in the patient).
Sensitivity data do not become available until one or two days after
the culture is obtained, however. The physician must therefore often select
a drug on the basis of the list of possible identities plus the antimicrobial
agents that are statistically likely to be effective against each of the ident-
ities. These statistical data are available from many hospital laboratories (e.g., 82% of E. coli isolated at Stanford Hospital are sensitive in vitro to
gentamicin), although, in practice, physicians seldom use the probabilistic
information except in a rather intuitive sense (e.g., "Most of the E. coli
infections I have treated recently have responded to gentamicin.").
Once a list of drugs that may be useful has been considered, the best
regimen is selected on the basis of a variety of factors. These include the
likelihood that the drug will be effective against the organism, as well as a
number of clinical considerations. For example, it is important to know
whether or not the patient has any drug allergies and whether or not the
drug is contraindicated because of age, sex, or kidney status. If the patient
has meningitis or brain involvement, whether or not the drug crosses the
blood-brain barrier is an important question. Since some drugs can be
given only orally, intravenously (IV), or intramuscularly (IM), the desired
route of administration may become an important consideration. The se-
verity of the patient's disease may also be important, particularly for those
drugs whose use is restricted on ecological grounds or which are particu-
larly likely to cause toxic complications. Furthermore, as the patients clin-
ical status varies over time and more definitive information becomes avail-
able from the microbiology laboratory, it may be wise to change the drug
of choice or to modify the recommended dosage.
1. Has the wide use of antibiotics led to the emergence of new resistant
bacterial strains?
2. Has the ecology of "natural" or "hospital" bacterial flora been shifted
because of antibiotic use?
3. Have nosocomial (i.e., hospital-acquired) infections changed in inci-
dence or severity due to antibiotic use?
4. What are the trends of antibiotic use?
5. Are antibiotics properly used in practice?
   Is there evidence that prophylactic use of antibiotics is harmful, and how common is it?
   Are antibiotics often prescribed without prior bacterial culture?
   When cultures are taken, is the appropriate antibiotic usually prescribed and correctly used?
6. Is the increasingly more frequent use of antibiotics presenting the medical community and the public with a new set of hazards that should be
approached by some new administrative or educational measures?
Having stated the issues, these authors proceed to cite evidence that in-
dicates that each of these questions has frightening answers--that the ef-
fects of antibiotic misuse are so far-reaching that the consequences may
often be worse than the disease (real or imagined) being treated!
Our principal concern has been with the fifth question: are physicians
rational in their prescribing habits and, if not, why not? Roberts and Vis-
conti examined these issues in 1,035 patients consecutively admitted to a
500-bed community hospital (Roberts and Visconti, 1972). Of 340 patients
receiving systemic antimicrobials, only 35% were treated for infection. The
rest received either prophylactic therapy (55%) or treatment for symptoms
without verified infection (10%). A panel of expert physicians and phar-
macists evaluated these therapeutic decisions, and only 13% were judged
to be rational, while 66% were assessed as clearly irrational. The remainder
were said to be questionable.
Of particular interest were the reasons why therapy was judged to be
irrational in those patients for whom some kind of antimicrobial therapy
was warranted. This group consisted of 112 patients, or 50.2% of the 223
patients who were treated irrationally. It is instructive to list the reasons
that were cited, along with the percentages indicating how many of the
112 patients were involved:
The percentages add up to more than 100% because a given therapy may
have been judged inappropriate for more than one reason. Thus 62.5%
of the 112 patients who required antimicrobial therapy but were treated
irrationally were given a drug that was inappropriate for their clinical con-
dition. This observation reflects the need for improved therapy selection
for patients requiring therapy--precisely the decision task that MYCIN
was designed to assist.
Once a need for improved continuing medical education in antimi-
crobial selection was recognized, there were several valid ways to respond.
One was to offer appropriate post-graduate courses for physicians. An-
other was to introduce surveillance systems for the monitoring and ap-
proval of antibiotic prescriptions within hospitals (Edwards, 1968; Kunin,
1973). In addition, physicians were encouraged to seek consultations with
infectious disease experts when they were uncertain how best to proceed
with the treatment of a bacterial infection. Finally, we concluded that an
automated consultation system that could substitute for infectious disease
experts when they are unavailable or inaccessible could provide a valuable
partial solution to the therapy selection problem. MYCIN was conceived
and developed in an attempt to fill that need.
This volume is organized into twelve parts of two to four chapters, each
highlighting a fundamental theme in the development and evolution of
MYCIN. This introductory part closes with a classic review paper that
outlines the production rule methodology.
The design and implementation of MYCINare discussed in Part Two.
Shortliffe's thesis was the beginning, but the original system he developed
was modified as required.
In Part Three we focus on the problems of building a knowledge base
and on knowledge acquisition in general. TEIRESIAS, the program resulting from Randy Davis' dissertation research, is described.
In Part Four we address the problems of reasoning under uncertainty.
The certainty factor model, one answer to the question of how to propagate
uncertainty in an inference mechanism, forms the basis of this part.
Part Five discusses the generality of the MYCIN formalism. The EMY-
CIN system, written largely by William van Melle as part of his dissertation
This chapter is based on an article taken with permission from Machine Intelligence 8: Machine
Representations of Knowledge, edited by E. W. Elcock and D. Michie, published in 1977 by Ellis Horwood Ltd., Chichester, England.
"Pure"ProductionSystems 21
2.1.1 Rules
More generally, one side of a rule is evaluated with reference to the data
base, and if this succeeds (i.e., evaluates to TRUE in some sense), the action
specified by the other side is performed. Note that evaluate is typically taken
S → ABA
A → A1
A → 1
B → B0
B → 0
matching the LHS on a data base that consists of the start symbol S gives a generator for strings in the language. Matching on the RHS of the same
set of rules gives a recognizer for the language. We can also vary the
methodology slightly to obtain a top-down recognizer by interpreting ele-
ments of the LHS as goals to be obtained by the successful matching of
elements from the RHS. In this case the rules "unwind." Thus we can use
the same set of rules in several ways. Note, however, that in doing so we
obtain quite different systems, with characteristically different control
structures and behavior.
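A hypothetical Python sketch of this dual use of one rule set follows (the grammar is the one above, written as (left-hand side, right-hand side) pairs). Rewriting left to right generates a sentence of the language; rewriting right to left reduces a string back toward S and thereby recognizes it. The recognizer below simply takes the first reduction it finds; in general, the choice of which rule to apply is exactly the conflict-resolution issue discussed next.

    import random

    # The grammar above, as (LHS, RHS) pairs.
    RULES = [("S", "ABA"), ("A", "A1"), ("A", "1"), ("B", "B0"), ("B", "0")]

    def generate(s="S"):
        """Forward (generator) use: replace a left-hand side by a right-hand side."""
        while any(lhs in s for lhs, _ in RULES):
            lhs, rhs = random.choice([r for r in RULES if r[0] in s])
            s = s.replace(lhs, rhs, 1)
        return s

    def recognize(s):
        """Backward (recognizer) use: replace a right-hand side by its left-hand side."""
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in RULES:
                if rhs in s:
                    s = s.replace(rhs, lhs, 1)
                    reduced = True
                    break
        return s == "S"

    sentence = generate()
    print(sentence, recognize(sentence))    # e.g. 110011 True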
The organization and accessing of the rule set is also an important
issue. The simplest scheme is the fixed, total ordering already mentioned,
but elaborations quickly grow more complex. The term conflict resolution
has been used to describe the process of selecting a rule. These issues of
rule evaluation and organization are explored in more detail below.
[Figure: a small production set (rules such as ACF → WZ and ACD → WY) shown alongside the equivalent decision tree, which branches on successive characters (e.g., on the second character).]
2.1.3 Interpreter
The interpreter is the source of much of the variation found among dif-
ferent systems, but it may be seen in the simplest terms as a select-execute
loop in which one rule applicable to the current state of the data base is
chosen and then executed. Its action results in a modified data base, and
the select phase begins again. Given that the selection is often a process of
choosing the first rule that matches the current data base, it is clear why
this cycle is often referred to as a recognize-act, or situation-action, loop. The
range of variations on this theme is explored in Section 2.5.3 on control
cycle architecture.
This alternation between selection and execution is an essential ele-
ment of PS architecture, which is responsible for one of its most funda-
mental characteristics. By choosing each new rule for execution on the
basis of the total contents of the data base, we are effectively performing
a complete reevaluation of the control state of the system at every cycle.
This is distinctly different from procedurally oriented approaches in which
control flow is typically the decision of the process currently executing and
is commonly dependent on only a small fraction of the total number of
state variables. PSs are thus sensitive to any change in the entire environ-
ment, and potentially responsive to such changes within the scope of a
single execution cycle. The price of such responsiveness is, of course, the
computation time required for the reevaluation.
An example of one execution of the recognize-act loop for a greatly simplified system is described below.
PD1 says that if the symbol DD and some expression beginning with EE, i.e., (EE . . .), is found in STM, then insert the symbol BB at the front of STM. PD2 says that if the symbol XX is found in STM, then first insert the symbol CC, then the symbol DD, at the front of STM.
The initial contents of STM are

STM: (QQ (EE FF) RR SS)
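The sketch below is a hypothetical Python rendering of such a cycle, not the PSG program itself: PD1 and PD2 are encoded as (condition, action) pairs, the interpreter repeatedly executes the first rule whose condition matches the current STM, a crude once-only restriction stands in for PSG's refractoriness so that the loop halts, and the symbol XX has been placed at the front of STM (an assumption for this sketch) to give the rules something to respond to.

    # Hypothetical recognize-act interpreter; not the original PSG program.
    # STM is a list of symbols and expressions; rules are (condition, action) pairs.

    def pd1_cond(stm):
        return "DD" in stm and any(isinstance(e, tuple) and e[:1] == ("EE",) for e in stm)

    def pd1_act(stm):
        stm.insert(0, "BB")                 # PD1: insert BB at the front of STM

    def pd2_cond(stm):
        return "XX" in stm

    def pd2_act(stm):
        stm.insert(0, "CC")                 # PD2: insert CC, then DD, at the front
        stm.insert(0, "DD")

    RULES = [(pd1_cond, pd1_act), (pd2_cond, pd2_act)]

    def cycle(stm, already_fired):
        """One recognize-act cycle: select the first applicable rule, execute it."""
        for i, (cond, act) in enumerate(RULES):
            if i not in already_fired and cond(stm):    # recognize
                act(stm)                                # act: modify the data base
                already_fired.add(i)                    # crude stand-in for refractoriness
                return True
        return False                                    # no rule applicable: halt

    stm = ["XX", "QQ", ("EE", "FF"), "RR", "SS"]        # assumed: XX added as a stimulus
    fired = set()
    while cycle(stm, fired):
        print(stm)
    # ['DD', 'CC', 'XX', 'QQ', ('EE', 'FF'), 'RR', 'SS']
    # ['BB', 'DD', 'CC', 'XX', 'QQ', ('EE', 'FF'), 'RR', 'SS']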
Prior work has suggested that there are two major views of PSs, charac-
terized on one hand by psychological modeling efforts (PSG, PAS II, VIS,
etc.) and on the other by performance-oriented, knowledge-based expert
systems (e.g., MYCIN, DENDRAL). These distinct efforts have arrived at similar methodologies while pursuing differing goals.
The psychological modeling efforts are aimed at creating a program
that embodies a theory of human performance of simple tasks. From the
performance record of experimental human subjects, the modeler for-
mulates the minimally competent set of production rules that is able to
reproduce the behavior. Note that "behavior" here is meant to include all
aspects of human performance (mistakes, the effects of forgetting, etc.),
including all shortcomings or successes that may arise out of (and hence may be clues to) the "architecture" of human cognitive systems.
An example of this approach is the PSG system, from which we con-
structed the example above. This system has been used to test a number
of theories to explain the results of the Sternberg memory-scanning tasks
(Newell, 1973), with each set of productions representing a different theory
of how the human subject retains and recalls the information given to him
or her during the psychological task. Here the subject first memorizes a
small subset of a class of familiar symbols (e.g., digits) and then attempts
to respond to a symbol flashed on a screen by indicating whether or not it
was in the initial set. His or her response times are noted.
The task was first simulated with a simple production system that per-
formed correctly but did not account for timing variations (which were
due to list length and other factors). Refinements were then developed to
incorporate new hypotheses about how the symbols were brought into
memory, and eventually a good simulation was built around a small num-
ber of productions. Newell has reported (Newell, 1973) that the use of this methodology led in this case to the novel hypothesis that certain timing
effects are caused by a decoding process rather than by a search process.
The experiment also clearly illustrated the possible tradeoffs in speed and
accuracy between differing processing strategies. Thus the PS model was
an effective vehicle for the expression and evaluation of theories of be-
havior.
The performance-oriented expert systems, on the other hand, start
with productions as a representation of knowledge about a task or domain
and attempt to build a program that displays competent behavior in that
domain. These efforts are not concerned with similarities between the re-
sulting systems and human performance (except insofar as the latter may
provide a possible hint about ways to structure the domain or to approach
the problem or may act as a yardstick for success, since few AI programs
approach human levels of competence). They are intended simply to per-
form the task without errors of any sort, humanlike or otherwise. This
approach is characterized by the DENDRAL system, in which much of the
development has involved embedding a chemist's knowledge about mass
spectrometry into rules usable by the program, without attempting to
model the chemist's thinking. The program's knowledge is extended by
adding rules that apply to new classes of chemical compounds. Similarly,
much of the work on the MYCIN system has involved crystallizing informal
knowledge of clinical medicine in a set of production rules.
Despite the difference in emphasis, researchers in both fields have
Observations such as this have led to speculation that the interest in pro-
Program designers have found that PSs easily model problems in some
domains but are awkward for others. Let us briefly investigate why this
may be so, and relate it to the basic structure and function of a PS.
We can imagine two very different classes of problems--the first is best
viewed and understood as consisting of many independent states, while
the second seems best understood via a concise, unified theory, perhaps
embodied in a single law. Examples of the former include some views of
perceptual psychology or clinical medicine, in which there are many states
relative to the number of actions (this may be due either to our lack of
cohesive theory or to the basic complexity of the system being modeled).
Examples of the latter include well-established areas of physics and math-
ematics, in which a few basic tenets serve to embody much of the required
knowledge, and in which the discovery of unifying principles has empha-
sized the similarities in seemingly different states. This first distinction
appears to be one important factor in distinguishing appropriate from
inappropriate domains.
A second distinction concerns the complexity of control flow. At two
extremes, we can imagine two processes, one of which is a set of indepen-
dent actions and the other of which is a complex collection of multiple,
parallel processes involving several dependent subprocesses.
A third distinction concerns the extent to which the knowledge to be
embedded in a system can be separated from the manner in which it is to
be used [also known as the controversy between declarative and procedural
representations; see Winograd (1975) for an extensive discussion]. As one
example, we can imagine simply stating facts, perhaps in a language like
predicate calculus, without assuming how those facts will be employed.
Alternatively, we could write procedural descriptions of how to accomplish
a stated goal. Here the use of the knowledge is for the most part predetermined during the process of embodying it in this representation.
In all three of these distinctions, a PS is well-suited to the first descrip-
tion and ill-suited to the latter. The existence of multiple, nontrivially dif-
ferent, independent states is an indication of the feasibility of writing mul-
tiple, nontrivial, modular rules. A process composed of a set of
independent actions requires only limited communication between the ac-
tions, and, as we shall see, this is an important characteristic of PSs. The
ability to state what knowledge ought to be in the system without also
describing its use greatly improves the ease with which a PS can be written
(see Section 2.4.9).
For the second class of problems (unified theory, complex control flow,
predetermined use for the knowledge), the economy of the relevant basic
theory makes for either trivial rules or multiple, almost redundant, rules.
In addition, a complex looping and branching process requires explicit
communication between actions, in which one action explicitly invokes the
next, while interacting subgoals require a similarly advanced communica-
tion process to avoid conflict. Such communication is not easily supplied
in a PS-based system. The same difficulty also makes it hard to specify in
advance exactly how a given fact should be used.
It seems also to be the nature of production systems to focus upon the
variations within a domain rather than upon the common threads that link
different facts or operations. Thus, for example, the process of addition
is naturally expressed via productions as n² rewrite operations involving
two symbols (the digits being added). The fact that addition is commuta-
tive, or rather that there is a property of "commutativity" shared by all
operations that we consider to be addition, is a rather awkward one to
express in production system terms. This same characteristic may, con-
versely, be viewed as a capability for focusing on and handling significant
amounts of detail. Thus, where the emphasis of a task is on recognition of
large numbers of distinct states, PSs provide a significant advantage. In a
procedurally oriented approach, it is both difficult to organize and trou-
blesome to update the repeated checking of large numbers of state vari-
ables and the corresponding transfers of control. The task is far easier in
PS terms, where each rule can be viewed as a "demon" awaiting the oc-
currence of a specific state.
The potential sensitivity and responsiveness of PSs, which arise from
their continual reevaluation of the control state, has also been referred to
as the openness of rule-based systems. It is characterized by the principle
that "any rule can fire at any time," which emphasizes the fact that at any
point in the computation any rule could be the next to be selected, de-
pending only on the state of the data base at the end of the current cycle.
Compare this to the normal situation in a procedurally oriented language,
[Figure: relationships among production system characteristics, with nodes labeled INTERACTION, CONSISTENCY CHECKING, EXTENSIBILITY, SELECTION OF BEHAVIOR, ALGORITHM, and EXPLANATIONS OF PRIMITIVE ACTIONS.]
While this characterization is clearly true for a pure PS, with its limitations
on the size of STM, we can generalize on it slightly to deal with a broader class of systems. First, in the more general case, the channel is not so much narrow as indirect and unique. Second, the kludgery4 arises not from arbi-
trarily complex messages but from specially crafted messages, which force
highly specific, carefully chosen interactions.
With reference to the first point, one of the most fundamental char-
acteristics of the pure PS organization is that rules must interact indirectly
through a single channel. Indirection implies that all interaction must oc-
cur by the effect of modifications written in the data base; uniqueness of
the channel implies that these modifications are accessible to every one of
the rules. Thus, to produce a system with a specified behavior, one must
not think in the usual terms of having one section of code call another
explicitly, but rather use an indirect approach in which each piece of code
(i.e., each rule) leaves behind the proper traces to trigger the next relevant
piece. The uniform access to the channel, along with the openness of PSs,
implies that those traces must be constructed in the light of a potential
response from any rule in the system.
With reference to Winograd's second point, in many systems the action
of a single rule may, quite legitimately, result in the addition of very com-
plex structures to the data base (e.g., DENDRAL; see Section 2.5). Yet
another rule in the same system may deposit just one carefully selected
symbol, chosen solely because it will serve as an unmistakable symbol for
precisely one other (carefully preselected) rule. Choosing the symbol care-
fully provides a way of sending what becomes a private message through
a public channel; the continual reevaluation of the control state assures
that the message can take immediate effect. The result is that one rule has
effectively called another, procedure style, and this is the variety of kludg-
ery that is contrary to the style of knowledge organization typically asso-
ciated with a PS. It is the premeditated nature of such message passing
(typically in an attempt to "produce a system with specified behavior") that
is the primary violation of the "spirit" of PS methodology.
The primary effect of this indirect, limited interaction is the devel-
opment of a system that is strongly modular, since no rule is ever called
directly. The indirect, limited interaction is also, however, the most signif-
icant factor that makes the behavior of a PS more difficult to analyze. This
results because, even for very simple tasks, overall behavior of a PS may
not be at all evident from a simple review of its rules.
To illustrate many of these issues, consider the algorithm for addition of positive, single-digit integers used by Waterman (1974) with his PAS
production system interpreter. First, the procedural version of the algo-
rithm, in which transfer of control is direct and simple:
add(m, n) ::=
  A] count ← 0; nn ← n;
  B] L1: if count = m then return(nn);
  C] count ← successor(count);
  D] nn ← successor(nn);
  E] go(L1);

4Kludge is a term drawn from the vernacular of computer programmers. It refers to a "patch" or "trick" in a program or system that deals with a potential problem, usually in an inelegant or nongeneralized way. Thus kludgery refers to the use of kludges.
Compare this with the set of productions for the same task in Figure 2-3.
The S in Rules 2, 3, and 5 indicates the successor function. After initiali-
zation (Rules 1 and 2), the system loops around Rules 4 and 5 producing
the successor rules it needs (Rule 5) and then incrementing NN by 1 for M iterations. In this loop, intermediate calculations (the results of successor function computations) are saved via (PROD) in Rule 5, and the final answer is saved by (PROD) in Rule 3. Thus, as shown in Figure 2-4, after
computing 4 + 2 the rule set will contain seven additional rules; it is
recording its intermediate and final results by writing new productions and
in the future will have these answers available in a single step. Note that
the set of productions therefore is memory (and in fact long-term memory,
or LTM, since productions are never lost from the set). The two are not
precisely analogous, since the procedural version does simple addition,
while the production set both adds and "learns." As noted by Waterman
(1974), the production rule version does not assume the existence of a successor function. Instead Rule 5 writes new productions that give the
successor for specific integers. Rule 3 builds what amounts to an addition
table, writing a new production for each example that the system is given.
Placing these new rules at the front of the rule set (i.e., before Rule 1)
means that the addition table and successor function table will always be
consulted before a computation is attempted, and the answer obtained in
one step if possible. Without these extra steps, and with a successor func-
tion, the production rule set could be smaller and hence slightly less com-
plex.
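The flavor of this behavior can be conveyed by a loose, hypothetical Python analogue (not Waterman's PAS rules): counting proceeds by successor steps, and the "rules" the system writes for itself are recorded here as table entries for successors and for completed sums. Consistent with the text, computing 4 + 2 leaves seven new entries behind, and a repeated request is then answered in a single step.

    # Hypothetical analogue of the addition productions: the program "learns"
    # by recording new successor rules and a new addition rule as it computes.

    ORDER = "0123456789"
    successor_rules = {}            # learned rules of the form (S n) -> n+1
    addition_rules = {}             # learned rules of the form m + n -> sum

    def successor(n):
        if n not in successor_rules:                        # analogous to Rule 5:
            successor_rules[n] = ORDER[ORDER.index(n) + 1]  # write a new successor rule
        return successor_rules[n]

    def add(m, n):
        if (m, n) in addition_rules:                        # answer available in one step
            return addition_rules[(m, n)]
        count, nn = "0", n
        while count != m:                                   # the Rule 4 / Rule 5 loop
            count, nn = successor(count), successor(nn)
        addition_rules[(m, n)] = nn                         # analogous to Rule 3:
        return nn                                           # record the sum as a new rule

    print(add("4", "2"))                                    # 6, computed by counting
    print(len(successor_rules) + len(addition_rules))       # 7 new "rules" retained
    print(add("4", "2"))                                    # 6 again, now in one step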
Waterman also points out some direct correspondences between the
production rules in Figure 2-3 and the statements in the procedure above.
For example, Rules 1 and 2 accomplish the initialization of line A, Rule 3
corresponds to line B, and Rule 4 to lines C and D. There is no production
equivalent to the "goto" of line E because the production system execution
cycle takes care of that implicitly. On the other hand, note that in the
procedure there is no question whatsoever that the initialization step
nn *-- n is the second statement of "add" and that it is to be executed just
once, at the beginning of the procedure. In the productions, the same
action is predicated on an unintuitive condition of the STM (essentially it says that if the value of N is known, but NN has never been referenced or incremented, then initialize NN to the value that N has at that time). This
degree of explicitness is necessary because the production system has no
notion that the initialization step has already been performed in the given
ordering of statements, so the system must check the conditions each time
it goes through a new cycle.
Thus procedural languages are oriented toward the explicit handling
of control flow and stress the importance of its influence on the funda-
mental organization of the program (as, for example, in recent develop-
Production Rules:
Initial STM:
(READY) (ORDER 0 1 2 3 4 5 6 7 8 9)

Notation:

The Xi's in the condition are variables in the pattern match; all other symbols are literals. An Xi appearing only in the action is also taken as a literal. Thus if Rule 5 is matched with X1 = 4 and X2 = 5, as its second action it would deposit (COND (S X1 4)) in STM. These variables are local to each rule; that is, their previous bindings are disregarded.

All elements of the LHS must be matched for a match to succeed.

A hyphen indicates the AND NOT operation.

An expression enclosed in parentheses and starting with a literal [e.g., (COUNT) in Rule 4] will match any expression in STM that starts with the same literal [e.g., (COUNT 2)]. The expression (ORDER X1 X2) will match (ORDER 0 1 2 3 . . . 9) and bind X1 = 0 and X2 = 1.

REP stands for REPlace, so that, for example, the RHS of Rule 1 will replace the expression (READY) in the data base with the expression (COUNT X1) [where the variable X1 stands for the element matched by the X1 in (ORDER X1 X2)].

DEP stands for DEPosit symbols at the front of STM.

ATTEND means wait for input from the computer terminal. For this example, typing (M 4) (N 2) will have the system add 4 and 2.

SAY means output to the terminal.
FIGURE 2-3 continued
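A hypothetical Python sketch of the matching conventions just described follows (only the matcher is sketched; the rule interpreter and the REP/DEP actions are omitted): a pattern beginning with a literal matches any STM expression that begins with the same literal, and the X variables bind positionally to the elements that follow.

    # Hypothetical sketch of the pattern match used in the figure's notation.

    def is_variable(symbol):
        return isinstance(symbol, str) and symbol.startswith("X")

    def match(pattern, expression, bindings=None):
        """Return the variable bindings if pattern matches expression, else None."""
        bindings = dict(bindings or {})
        if len(expression) < len(pattern) or pattern[0] != expression[0]:
            return None                             # leading literals must agree
        for p, e in zip(pattern[1:], expression[1:]):
            if is_variable(p):
                if p in bindings and bindings[p] != e:
                    return None                     # variable already bound differently
                bindings[p] = e                     # bind the variable positionally
            elif p != e:
                return None                         # literal mismatch
        return bindings

    stm = [("COUNT", 2), ("ORDER", 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
    print(match(("COUNT",), stm[0]))                # {}   (COUNT) matches (COUNT 2)
    print(match(("ORDER", "X1", "X2"), stm[1]))     # {'X1': 0, 'X2': 1}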
While there are wide variations in the format permitted by various PSs, in
any given system the syntax is traditionally quite restrictive and generally
follows the conventions accepted for PSs.5 Most commonly this means, first, that the side of the rule to be matched should be a simple predicate built out of a Boolean combination of computationally primitive opera-
tions; these involve (as noted above) only matching and detection. Second,
it means the side of the rule to be executed should perform conceptually
simple operations on the data base. In many of the systems oriented toward
psychological modeling, the side to be matched consists of a set of literals
or simple patterns, with the understanding that the set is to be taken as a
conjunction, so that the predicate is an implicit one regarding the success
or failure of matching all of the elements. Similarly, the side to be executed
performs a simple symbol replacement or rearrangement.
Whatever the format, though, the conventions noted lead to clear re-
strictions for a pure production system. First, as a predicate, the matching
side of a rule should return only some indication of the success or failure
of the match.6 Second, as a simple expression, the matching operation is
CYCLE #1
  (READY) (ORDER 0 1 2 3 4 5 6 7 8 9)                                      initial state
  Rule 1 Succeeds   (COUNT 0) (ORDER 0 1 2 3 4 5 6 7 8 9)                  awaits input (M 4) (N 2)
                    (N 2) (M 4) (COUNT 0) (ORDER 0 1 2 3 4 5 6 7 8 9)      after input
  Rule 2 Succeeds   (NN 2) (N 2) (M 4) (COUNT 0) (ORDER 0 1 2 3 4 5 6 7 8 9)      X1 bound to 2
  Rule 3 Fails
  Rule 4 Succeeds   (S NN 2) (N 2) (M 4) (S COUNT 0) (ORDER 0 1 2 3 4 5 6 7 8 9)
  Rule 5 Succeeds   (S NN 2) (N 2) (M 4) (S COUNT 0) (ORDER 1 2 3 4 5 6 7 8 9)    X1 bound to 0
                    new Rule 6:  (S X3 0) → (REP (S X3 0) (X3 1))

CYCLE #2
  Rule 6 Succeeds   (S NN 2) (N 2) (M 4) (COUNT 1) (ORDER 1 2 3 4 5 6 7 8 9)      X3 bound to the literal COUNT
  Rule 1 Fails
  Rule 2 Fails
  Rule 3 Fails
  Rule 4 Fails
  Rule 5 Succeeds   (S NN 2) (N 2) (M 4) (COUNT 1) (ORDER 2 3 4 5 6 7 8 9)
                    new Rule 7:  (S X3 1) → (REP (S X3 1) (X3 2))

CYCLE #3
  Rule 7 Fails
  Rule 6 Fails
  Rule 1 Fails
  Rule 2 Fails
  Rule 3 Fails
  Rule 4 Fails
  Rule 5 Succeeds   (S NN 2) (N 2) (M 4) (COUNT 1) (ORDER 3 4 5 6 7 8 9)
                    new Rule 8:  (S X3 2) → (REP (S X3 2) (X3 3))

CYCLE #4
  Rule 8 Succeeds   (NN 3) (N 2) (M 4) (COUNT 1) (ORDER 3 4 5 6 7 8 9)            X3 bound to NN
  Rule 7 Fails
  Rule 6 Fails
  Rule 1 Fails
  Rule 2 Fails
  Rule 3 Fails
  Rule 4 Succeeds   (S NN 3) (N 2) (M 4) (S COUNT 1) (ORDER 3 4 5 6 7 8 9)
  Rule 5 Succeeds   (S NN 3) (N 2) (M 4) (S COUNT 1) (ORDER 4 5 6 7 8 9)
                    new Rule 9:  (S X3 3) → (REP (S X3 3) (X3 4))

CYCLE #5
  Rule 9 Succeeds   (NN 4) (N 2) (M 4) (S COUNT 1) (ORDER 4 5 6 7 8 9)
and since clauses that are unknown cause subproblems that may involve
long computations to be set up, it makes sense to check to see if, based on
what is currently known, the entire premise is sure to fail (e.g., if any clause of a conjunction is known to be false). We cannot simply EVAL each clause, since this will trigger a search if the value is still unknown. But if the clause can be "unpacked" into its proper constituents, it is possible to determine whether or not the value is known as yet, and if so, what it is. This is done via a template associated with each predicate function. For example, the template for SAME is

(SAME CNTXT PARM VALUE)

and it gives the generic type and order of arguments for the function (much like a simplified procedure declaration). By using this as a guide to unpack and extract the needed items, we can safely do a partial evaluation of the rule premise. A similar technique is used to separate the known and unknown clauses of a rule for the user's benefit when the system is explaining itself (see Chapter 18 for several examples).
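A hypothetical sketch of the idea in Python (the clause contents and the table of established values are invented; only SAME and its CNTXT/PARM/VALUE template come from the text): the template lets the system read a premise clause as data and report its value when that value is already known, while returning nothing, rather than setting up a subproblem, when it is not.

    # Hypothetical sketch of template-guided partial evaluation of a premise.

    TEMPLATES = {"SAME": ("CNTXT", "PARM", "VALUE")}    # argument roles per predicate

    known_values = {"SITE": "BLOOD"}                    # parameter values already established

    def preview(clause):
        """True/False if the clause's truth is already determined; None otherwise."""
        predicate, args = clause[0], clause[1:]
        roles = dict(zip(TEMPLATES[predicate], args))   # unpack the clause via its template
        parm, value = roles["PARM"], roles["VALUE"]
        if parm not in known_values:
            return None                                 # unknown: would require a search
        return known_values[parm] == value

    premise = [("SAME", "CNTXT", "SITE", "BLOOD"),
               ("SAME", "CNTXT", "IDENT", "E.COLI")]

    for clause in premise:
        print(clause, "->", preview(clause))
    # ('SAME', 'CNTXT', 'SITE', 'BLOOD') -> True
    # ('SAME', 'CNTXT', 'IDENT', 'E.COLI') -> None
    # A clause that previews to False would let the whole conjunction be
    # rejected at once; None means the clause must be pursued as a subgoal.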
Note that part of the system is reading the code being executed by the
other part. Furthermore, note that this reading is guided by information
carried in the rule components themselves. This latter characteristic as-
sures that the capability is unaffected by the addition of new rules or
predicate functions to the system.
This kind of technique limits expressibility, however, since the limited
syntax may not be sufficiently powerful to make expressing each piece of
knowledge an easy task. This in turn both restricts extensibility (adding
something is difficult if it is hard to express it) and makes modification of
the system's behavior more difficult (e.g., it might not be particularly at-
tractive to implement a desired iteration if doing so requires several rules
rather than a line or two of code).
2.4.4 Modularity
7The number of rules that could be removed without performance degradation (short of redundancies) is an interesting characteristic that would appear to be correlated with which of the two common approaches to PSs is taken. The psychological modeling systems would apparently degenerate fastest, since they are designed to be minimally competent sets of rules. Knowledge-based expert systems, on the other hand, tend to embody numerous independent subproblems in rules and often contain overlapping or even purposefully redundant representations of knowledge. Hence, while losing their competence on selected problems, it appears they would often function reasonably well, even with several rules removed.
8One specific example of the importance of rule order can be seen in our earlier example of addition (Figure 2-3). Here Rule 5 assumes that an ordering of the digits exists in STM in the form (ORDER 0 1 2 ...) and from this can be created the successor function for each digit. If Rule 5 were placed before Rule 1, the system wouldn't add at all. In addition, acquiring the notion of successor in subsequent runs depends entirely on the placement of the new successor productions before Rule 3, or the effect of this new knowledge would be masked.
Visibility of behavior flow is the ease with which the overall behavior of a
PS can be understood, either by observing the system or by reviewing its
rule base. Even for conceptually simple tasks, the stepwise behavior of a
PS is often rather opaque. The poor visibility of PS behavior compared to
that of the procedural formalism is illustrated by the Waterman integer
addition example outlined in Section 2.4.1. The procedural version of the
iterative loop there is reasonably clear (lines B, C, and E), and an ALGOL-
type
FOR I := 1 UNTIL N DO . . .
would be completely obvious. Yet the PS formalism for the same thing
requires nonintuitive productions (like 1 and 2) and symbols like NN whose
only purpose is to "mask" the condition portion of a rule so it will not be
invoked later [such symbols are termed control elements (Anderson, 1976)].
The requirement for control elements, and much of the opacity of PS
behavior, is a direct result of two factors noted above: the unity of control
and data store, and the reevaluation of the data base at every cycle. Any
attempt to "read" a PS requires keeping in mind the entire contents of the
data base and scanning the entire rule set at every cycle. Control is much
more explicit and localized in procedural languages, so that reading ALGOL code is a far easier task.
The perspective on knowledge representation implied by PSs also con-
tributes to this opacity. As suggested above, PSs are appropriate when it is
possible to specify the content of required knowledge without also speci-
fying the way in which it is to be used. Thus, reading a PS does not gen-
erally make clear how it works so much as what it may know, and the
behavior is consequently obscured. The situation is often reversed in pro-
cedural languages: program behavior may be reasonably clear, but the
domain knowledge used is often opaquely embedded in the procedures.
The two methodologies thus emphasize different aspects of knowledge and
program organization.
Several interesting capabilities arise from making it possible for the system
to examine its own rules. As one example, it becomes possible to implement
automatic consistency checking. This can proceed at several levels. In the
simplest approach we can search for straightforward syntactic problems
such as contradiction (e.g., two rules of the form A & B → C and A & B → ¬C) or subsumption (e.g., two rules of the form D & E & F → G and
One of the motivations for the interest in structured programming is the attempt to emphasize still further the degree of explicitness and localization of control.
10These are known as self-referencing rules; see Chapter 5.
respect to newly added rules. While all these are conceivable in a system
using a standard procedural approach, the heavily stylized format of rules,
and the typically simple control structure of the interpreters, makes them
all realizable prospects in a PS.
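A hypothetical sketch of such syntactic checks follows (Python; the rules are invented examples, with premises written as sets of clause names): rules with identical premises and opposed conclusions are flagged as contradictory, and a rule whose premises form a proper subset of another's, with the same conclusion, is flagged as subsuming it.

    # Hypothetical sketch of syntactic consistency checking over a rule base.

    rules = [
        ({"A", "B"}, "C"),
        ({"A", "B"}, "not C"),      # contradicts the rule above
        ({"D", "E", "F"}, "G"),
        ({"D", "E"}, "G"),          # subsumes the rule above
    ]

    def negated(c1, c2):
        return c1 == "not " + c2 or c2 == "not " + c1

    def check(rules):
        for i, (prem1, concl1) in enumerate(rules):
            for prem2, concl2 in rules[i + 1:]:
                if prem1 == prem2 and negated(concl1, concl2):
                    print("contradiction:", concl1, "vs", concl2, "from", sorted(prem1))
                if concl1 == concl2 and (prem1 < prem2 or prem2 < prem1):
                    general, specific = sorted([prem1, prem2], key=len)
                    print("subsumption:", sorted(general), "->", concl1,
                          "subsumes", sorted(specific), "->", concl2)

    check(rules)
    # contradiction: C vs not C from ['A', 'B']
    # subsumption: ['D', 'E'] -> G subsumes ['D', 'E', 'F'] -> G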
Finally, the relative complexity of the rule selection mechanism will
have varying effects on the ability to automate consistency checks, or be-
havior modification and extension. An RHS scan with backward chaining
(i.e., a goal-directed system; see Section 2.5.3) seems to be the easiest to follow since it mimics part of human reasoning behavior, while an LHS
scan with a complex conflict resolution strategy makes the system generally
more difficult to understand. As a result, predicting and controlling the
effects of changes in, or additions to, the rule base are directly influenced
in either direction by the choice of rule selection mechanism.
2.4.9 Programmability
and the behavior produced by the entire set of rules noted. As a second
approach, the programmer starts out with a specific behavior that he or
she wants to recreate. The entire rule set is written as a group with this in
mind, and, where necessary, one rule might deposit a symbol like A00124
in STM solely to trigger a second specific rule on the next cycle.
In the first case the control elements would correspond to recognizable
states of the system. As such, they function as indicators of those states
and serve to trigger what is generally a large class of potentially applicable
rules.ll In the second case there is no such correspondence, and often only
a single rule recognizes a given control element. The idea here is to insure
the execution of a specific sequence of rules, often because a desired effect
could not be accomplished in a single rule invocation. Such idiosyncratic
use of control elements is formally equivalent to allowing one rule to call
a second, specific rule and hence is very much out of character for a PS.
To the extent that such use takes place, it appears to us to be suggestive
of a failure of the methodology--perhaps because a PS was ill-suited to
the task to begin with or because the particular decomposition used for
the task was not well chosen. 12 Since one fundamental assumption of the
PS methodologyas a psychological modelingtool is that states of" the system
correspond to what are at least plausible (if not immediately recognizable)
individual "states of mind," the relative abundance of the two uses of con-
trol elements mentioned above can conceivably be taken as an indication
of how successfully the methodology has been applied.
A second approach to dealing with the difficulty of programming in
PSs is the use of increasingly complex forms within a single rule. Where
a pure PS might have a single action in its RHS, several psychological
modeling systems (PAS II, VIS) have explored the use of more complex
sequences of actions, including the use of conditional exits from the
sequence.
Finally, one effort (Rychener, 1975) has investigated the use of PSs
that are unconstrained by prior restrictions on rule format, use of tags,
etc. The aim here is to employ the methodology as a formalism for expli-
cating knowledge sources, understanding control structures, and examin-
ing the effectiveness of PSs for attacking the large problems typical of
artificial intelligence. The productions in this system often turn out to have
a relatively simple format, but complex control structures are built via
carefully orchestrated interaction of rules. This is done with several tech-
niques, including explicit reliance on both control elements and certain
characteristics of the data base architecture. For example, iterative loops
are manufactured via explicit use of control elements, and data are (re-
dundantly) reasserted in order to make use of the "recency" ordering on
rules (the rule that mentions the most recently asserted data item is chosen
first; see Section 2.5.3). These techniques have supported the reincarnation
as PSs of a number of sizable AI programs [e.g., STUDENT (Bobrow,
1968)], but, as Bobrow notes, "control tends to be rather inflexible, failing to
take advantage of the openness that seems to be inherent in PSs."
This reflects something of a new perspective on the use of PSs. Pre-
vious efforts have used them as tools for analyzing both the core of knowl-
edge essential to a given task and the manner in which such knowledge is
used. Such efforts relied in part on the austerity of the available control
structure to keep all of the knowledge explicit. The expectation is that each
production will embody a single chunk of knowledge. Even in the work of
Newell (1973), which used PSs as a medium for expressing different the-
ories in the Sternberg task, an important emphasis is placed on productions
as a model of the detailed control structure of humans. In fact, every aspect
of the system is assumed to have a psychological correlate.
The work reported by Rychener (1975), however, after explicitly de-
tailing the chunks of knowledge required in the word problem domain of
STUDENT, notes a many-to-many mapping between its knowledge chunks
and productions. That work also focuses on complex control regimes that
can be built using PSs. While still concerned with knowledge extraction
and explication, it views PSs more as an abstract programming language
and uses them as a vehicle for exploring control structures. While this
approach does offer an interesting perspective on such issues, it should
also be noted that as productions and their interactions grow more com-
plex, many of the advantages associated with traditional PS architecture
may be lost (for example, the loss of openness noted above). The benefits
to be gained are roughly analogous to those of using a higher-level pro-
gramming language: while the finer grain of the process being examined
may become less obvious, the power of the language permits large-scale
tasks to be undertaken and makes it easier to examine phenomena like the
interaction of entire categories of knowledge.
The use of PSs has thus grown to encompass several different forms,
many of which are far more complex than the pure PS model described
initially.
The LHS here is the name of the graph structure that describes the estro-
gen class of molecules, while the RHS indicates the likely locations for bond
breakages and hydrogen transfers when such molecules are subjected to
mass spectral bombardment. Note that while both sides of the rule are
relatively complex, they are written in terms that are conceptual primitives
in the domain.
A related issue is illustrated by the rules used by MYCIN, where the
LHS consists of a Boolean combination of standardized predicate func-
tions. Here the testing of a rule for relevance consists of having the stan-
dard LISP evaluator assess the LHS, and all matching and detection are
controlled by the functions themselves. While using functions in LHSs
provides power that is missing from a simple pattern match, it also
creates the temptation to write one function to do what should be ex-
pressed by several rules. For example, one small task in MYCIN is to de-
duce that certain organisms are present, even though they have not been
recovered from any culture. This is a conceptually complex, multistep op-
eration, which is currently (1975) handled by invocation of a single func-
tion. If one succumbs often to the temptation to write one function rather
than several rules, the result can be a system that may perform the initial
task but that loses a great many of the other advantages of the PS approach.
The problem is that the knowledge embodied in these functions is un-
available to anything else in the system. Whereas rules can be accessed and
their knowledge examined (because of their constrained format), chunks
of ALGOL-like code are not nearly as informative. The availability of a
standardized, well-structured set of operational primitives can help to
avoid the temptation to create new functions unnecessarily.
If the patient has had a bowel tumor, then in concluding about or-
ganism identity, rules that mention the gastrointestinal tract are more
likely to be useful.
The basic control cycle can be broken down into two phases called recog-
nition and action. The recognition phase involves selecting a single rule for
execution and can be further subdivided into selection and conflict resolu-
tion.14 In the selection process, one or more potentially applicable rules are
chosen from the set and passed to the conflict resolution algorithm, which
chooses one of them. There are several approaches to selection, which can
be categorized by their rule scan method. Most systems (e.g., PSG, PAS II)
use some variation of an LHS scan, in which each LHS is evaluated in
turn. Many stop scanning at the first successful evaluation (e.g., PSG), and
hence conflict resolution becomes a trivial step (although the question then
remains of where to start the scan on the next cycle: to start over at the
first rule or to continue from the current rule).
Some systems, however, collect all rules whose LHSs evaluate success-
fully. Conflict resolution then requires some criterion for choosing a single
rule from this set (called the conflict set). Several have been suggested,
including:
For example, the LISP70 interpreter uses (iii), while DENDRAL uses (iv).
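Although the enumerated criteria themselves are not reproduced here, the general shape of an LHS scan followed by conflict resolution can be pictured roughly as follows. The sketch is in Python, the working-memory encoding is invented for the example, and the recency-based criterion shown is only one illustrative choice, not necessarily any of the numbered options.

def lhs_scan(rules, memory):
    """Return the conflict set: all rules whose conditions all hold."""
    return [r for r in rules if all(c in memory for c in r["conditions"])]

def resolve_by_recency(conflict_set, memory):
    """Choose the rule whose conditions mention the most recent datum."""
    return max(conflict_set,
               key=lambda r: max(memory[c] for c in r["conditions"]))

memory = {"A": 1, "B": 3, "D": 2}          # symbol -> time of assertion
rules = [{"name": "R1", "conditions": ["A", "B"], "action": "assert C"},
         {"name": "R2", "conditions": ["A", "D"], "action": "assert E"}]

conflict_set = lhs_scan(rules, memory)
chosen = resolve_by_recency(conflict_set, memory)
print(chosen["name"])                      # R1, since B is the most recent datum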
A different approach to the selection process is used in the MYCIN
system. The approach is goal-oriented and uses an RHS scan. The process
is quite similar to the unwinding of consequent theorems in PLANNER
(Hewitt, 1972): given a required subgoal, the system retrieves the (unor-
dered) set of rules whose actions conclude something about that subgoal.
The evaluation of the first LHS is begun, and if any clause in it refers to
a fact not yet in the data base, a generalized version of this fact becomes
the new subgoal, and the process recurs. However, because MYCIN is
designed to work with judgmental knowledge in a domain where collecting
all relevant data and considering all possibilities are very important, in
general, it executes all rules from the conflict set rather than stopping after
the first success.
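A minimal sketch of this goal-directed, RHS-driven control follows. It is written in Python rather than the LISP actually used; the rules and facts are loosely modeled on examples appearing later in this book, and certainty factors and question asking are omitted.

RULES = [
    {"premise": {"gram": "gramneg", "morph": "rod", "air": "anaerobic"},
     "conclude": ("identity", "bacteroides")},
    {"premise": {"site": "blood"},
     "conclude": ("sterility", "sterile")},
    {"premise": {"sterility": "sterile", "identity": "bacteroides"},
     "conclude": ("significance", "significant")},
]
FACTS = {"gram": "gramneg", "morph": "rod", "air": "anaerobic", "site": "blood"}

def find_out(param, traced=None):
    """Deduce a value for PARAM by trying every rule that concludes about it."""
    traced = set() if traced is None else traced
    if param in FACTS or param in traced:
        return FACTS.get(param)
    traced.add(param)
    for rule in RULES:
        goal, value = rule["conclude"]
        if goal != param:
            continue
        # any premise parameter not yet known becomes a new subgoal
        if all(find_out(p, traced) == v for p, v in rule["premise"].items()):
            FACTS[param] = value
    return FACTS.get(param)

print(find_out("significance"))            # -> significant

Note that every rule concluding about the goal is tried, mirroring the exhaustive evidence gathering described above, rather than stopping at the first success.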
The meta-rules mentioned above may also be seen as a way of selecting
a subset of the conflict set for execution. There are several advantages to
this. First, the conflict resolution algorithm is stated explicitly in the meta-
rules (rather than implicitly in the system's interpreter) and in the same
representation as the rest of the rule-based knowledge. Second, since there
can be a set of meta-rules for each subgoal type, MYCIN can specify dis-
tinct, and hence potentially more customized, conflict resolution strategies
for each individual subgoal. Since the backward chaining of rules may also
be viewed as a depth-first search of an AND/OR goal tree,15 we may view
the meta-rules as a means of reordering or pruning the branches of that tree.
These are questions that will be important and useful to confront in de-
signing any system intended to do knowledge acquisition, especially any
built around production rules as the underlying knowledge representation.
2.6 Conclusions
In artificial intelligence research, production systems were first used to
embody primitive chunks of information-processing behavior in simulation
programs. Their adaptation to other uses, along with increased experience
with them, has focused attention on their possible utility as a general pro-
gramming mechanism. Production systems permit the representation of
knowledge in a highly uniform and modular way. This may pay off hand-
somely in two areas of investigation: development of programs that can
manipulate their own representations and development of a theory of
loosely coupled systems, both computational and psychological. Production
systems are potentially useful as a flexible modeling tool for many types
of systems; current research efforts are sufficiently diverse to discover the
extent to which this potential may be realized.
Information-processing psychologists continue to be interested in pro-
duction systems. PSs can be used to study a wide range of tasks (Newell
and Simon, 1972). They constitute a general programming system with
the full power of a Turing machine, but use a homogeneous encoding of
knowledge. To the extent that the methodology is that of a pure production
system, the knowledge embedded is completely explicit and thus aids
experimental verification or falsifiability of theories that use PSs as a me-
dium of expression. Productions may correspond to verifiable bits of psy-
chological behavior (Moran, 1973a), reflecting the role of postulated hu-
man information-processing structures such as short-term memory. PSs
are flexible enough to permit a wide range of variation based on reaction
times, adaptation, or other commonly tested psychological variables. Fi-
nally, they provide a method for studying learning and adaptive behavior
(Waterman, 1974).
For those wishing to build knowledge-based expert systems, the homo-
geneous encoding of knowledge offers the possibility of automating parts
of the task of dealing with the growing complexity of such systems. Knowl-
edge in production rules is both accessible and relatively easy to modify. It
can be executed by one part of the system as procedural code and exam-
ined by another part as if it were a declarative expression. Despite the
difficulties of programming PSs, and their occasionally restrictive syntax,
the fundamental methodology suggests a convenient and appropriate
framework for the task of structuring and specifying large amounts of
knowledge. (See Hayes-Roth et al., 1983, for recent uses of production
systems.) It may thus prove to be of great utility in dealing with the prob-
lems of complexity encountered in the construction of large knowledge
bases.
PART TWO
Using Rules
3
The Evolution of MYCIN's
Rule Form
There is little doubt that the decision to use rules to encode infectious
disease knowledge in the nascent MYCIN system was largely influenced by
our experience using similar techniques in DENDRAL. However, as men-
tioned in Chapter 1, we did experiment with a semantic network repre-
sentation before turning to the production rule model. The impressive
published examples of Carbonell's SCHOLAR system (Carbonell, 1970a;
1970b), with its ability to carry on a mixed-initiative dialogue regarding
the geography of South America, seemed to us a useful model of the kind
of rich interactive environment that would be needed for a system to advise
physicians.
Our disenchantment with a pure semantic network representation of
the domain knowledge arose for several reasons as we began to work with
Cohen and Axline, our collaborating experts. First, the knowledge of in-
fectious disease therapy selection was ill-structured and, we found, difficult
to represent using labeled arcs between nodes. Unlike South American
geography, our domain did not have a clear-cut hierarchical organization,
and we found it challenging to transfer a page or two from a medical
textbook into a network of sufficient richness for our purposes. Of partic-
ular importance was our need for a strong inferential mechanism that
would allow our system to reason about complex relationships among di-
verse concepts; there was no precedent for inferences on a semantic net
that went beyond the direct, labeled relationships between nodes.1
Perhaps the greatest problem with a network representation, and the
greatest appeal of production rules, was our gradually recognized need to
deal with small chunks of domain knowledge in interacting with our expert
collaborators. Because they were not used to dissecting their clinical rea-
soning processes, it was totally useless to ask them to "tell us all that you
know." However, by discussing specific difficult patients, and by encour-
2The arbitrary order of MYCIN's rules did lead to some suboptimal performance character-
istics, however. In particular, the ordering of questions to the user often seemed unfocused.
It was for this reason that the MAINPROPS (later known as INITIALDATA) feature was
devised (see Chapter 5), and the concept of meta-rules was developed to allow rule selection
and ordering based on strategic knowledge of the domain (see Chapter 28). The development
of prototypes in CENTAUR (Chapter 23) was similarly motivated.
test these hypotheses and to select the best ones. Thus DENDRAL's control
scheme involved forward invocation of rules for the last phase of the plan-
generate-and-test paradigm. On the other hand, it was unrealistic for MY-
CIN to start by generating hypotheses regarding likely organisms or com-
binations of pathogens; there were no reasonable heuristics for pruning
the search space, and there was no single piece of orienting information
similar to the mass spectrum, which provided the planning information to
constrain DENDRAL's hypothesis generator. Thus MYCIN was dependent
on a reasoning model based on evidence gathering, and its rules were used
to guide the process of input data collection. Because we wanted to avoid
problems of natural language understanding, and also did not want to
teach our physician users a specialized input language, we felt it was un-
reasonable to ask the physician to enter some subset of the relevant patient
descriptors and then to have the rules fire in a data-driven fashion. Instead,
we chose a goal-directed control structure that allowed MYCIN to ask the
relevant questions and therefore permitted the physician to respond, in
general, with simple one-word answers. Thus domain characteristics led
to forward-directed use of the generate-and-test paradigm in DENDRAL
and to goal-directed use of the evidence-gathering paradigm in MYCIN.
We were not entirely successful in putting all of the requisite medical
knowledge into rules. Chapter 5 describes the problems encountered in
trying to represent MYCIN's therapy selection algorithm as rules. Because
therapy selection was initially implemented as LISP code rather than in
rules, MYCIN's explanation system was at that time unable to justify spe-
cific therapy decisions in the same way it justified its diagnostic decisions.
This situation reflects the inherent tension between procedural and pro-
duction-based representation of this kind of algorithmic knowledge. The
need for further work on the problem was clear. A few years later Clancey
assumed the challenge of rewriting the therapy selection part of MYCIN
so that appropriate explanations could be generated for the user. We were
unable to encode the entire algorithm in rules, however, and instead settled
on a solution reminiscent of the generate-and-test approach used in DEN-
DRAL: rules were used to evaluate therapeutic hypotheses after they had
been proposed (generated) by an algorithm that was designed to support
explanations of its operation. This clever solution, described in Chapter 6,
seemed to provide an optimal mix of procedural and rule-based knowl-
edge.
This list of design considerations played a major role in guiding our early
work on MYCIN, and, as we suggested earlier in this chapter, they largely
account for our decision to implement MYCIN as a rule-based system. In
Chapters 4 through 6, and in subsequent discussions of knowledge acqui-
sition (Part Three) and explanation (Part Six), it will become clear how
production system formalism provided a powerful foundation for an evolv-
ing system intended to satisfy the design goals we have outlined here.
One of the lessons of the MYCIN research has been the way in which the
pure theory of production systems, as described in Chapter 2, has required
adaptation in response to issues that arose during system development.
Many of these deviations from a pure production system approach with
backward chaining will become clear in the ensuing chapters. For reference
we summarize here some of those deviations, citing the reasons for changes
that were introduced, even though this anticipates more complete discus-
sions in later chapters.
1. The context tree: We realized the need to allow our rules to make
conclusions about multiple objects and to keep track of the hierarchical
relationships among them. The context tree (described in Chapter 5) was
created to provide a mechanism for representing hierarchical relationships
and for quantifying over multiple objects. For instance, ORGANISM-1 and
ORGANISM-2 are contexts of the same type that are related to cultures
in which they are observed to be growing and that need to be compared,
collected, and reasoned with together at times.
Thus, reasoning with defaults is done in the rules and can be explained
in the same way as any other conclusions. The control structure had to be
changed, however, to delay executing these rules until all other relevant
rules had been tried.
b. Screening: For purposes of human engineering, we needed a screen-
6. Mapping rules: We soon recognized the need for rules that could
be applied iteratively to a set of contexts (e.g., a rule comparing a current
organism to each bacterium in the set of all previous organisms in the
context tree). Special predicate functions (e.g., THERE-IS, FOR-EACH,
ONE-OF) were therefore written so that a condition in a rule premise could
map iteratively over a set of contexts. This was a partial solution to the
general representation problem of expressing universal and existential
quantification. Only by considering all contexts of a type could we deter-
mine if all or some of them had specified properties. The context tree
allowed easy comparisons within any parent context (e.g., all the organisms
growing in CULTURE-2) but did not allow easy comparison across contexts
(e.g., all organisms growing in all cultures).
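The idea behind such mapping predicates can be pictured roughly as follows (a Python sketch; the context contents and the predicate names there_is and for_each are illustrative stand-ins, not MYCIN's functions):

CONTEXTS = {
    "ORGANISM": [{"name": "ORGANISM-1", "gram": "gramneg"},
                 {"name": "ORGANISM-2", "gram": "grampos"}],
}

def there_is(ctx_type, condition):
    """Existential quantification over all contexts of CTX_TYPE."""
    return any(condition(c) for c in CONTEXTS[ctx_type])

def for_each(ctx_type, condition):
    """Universal quantification over all contexts of CTX_TYPE."""
    return all(condition(c) for c in CONTEXTS[ctx_type])

print(there_is("ORGANISM", lambda c: c["gram"] == "gramneg"))   # True
print(for_each("ORGANISM", lambda c: c["gram"] == "gramneg"))   # False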
rules gave the latter more the character of frames than of pure produc-
tions.
10. Management of uncertainty: Previous PSs had not encoded the un-
certainty in rules. Thus MYCIN's certainty factor model (see Part Four)
was an augmentation mandated by the nature of decision making in this
complex medical domain.
involving any number of rules.) Special changes to the rule monitor were
required to prevent this undesirable occurrence (see Chapter 5).
16. The ASKFIRST concept: Pure production systems have not gen-
erally distinguished between attributes that the user may already know with
certainty (such as values of laboratory tests) and those that inherently re-
quire inference. In MYCIN this became an important distinction, which
required that each parameter be labeled as an ASKFIRST attribute (orig-
inally named LABDATA as discussed in Chapter 5) or as a parameter that
should first be determined by using rules rather than by asking the user.
The remainder of this part consists of three papers that summarize MY-
CIN and its use of production rules. In order to orient the reader to
MYCIN's overall motivation and design, we first include as Chapter 4 an
introductory paper that provides an overview of the system as of 1978
(approximately the time when development of the medical knowledge base
stopped). Chapter 5 is the original detailed description of MYCIN from
1975. It provides technical information on the system's representation and
control mechanisms. Chapter 6 is a brief paper from 1977 that discusses
the way in which production rules were adapted to deal with the algo-
rithmic knowledge regarding therapy selection.
4
The Structure of the
MYCIN System
[Figure (system overview diagram): the physician user interacts with the system; its components include an explanation program, dynamic patient data, static factual and judgmental knowledge, and a knowledge acquisition program linking the knowledge base to the infectious disease expert.]
Information about the patient and conclusions drawn during the consul-
tation are represented as associative (object-attribute-value) triples.
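The triple representation can be pictured roughly as follows. This is a Python sketch, not MYCIN's LISP structures; the attached certainty factors anticipate the discussion in Chapter 5, and the particular values are taken from the sample interaction shown later in this chapter.

triples = {
    ("ORGANISM-1", "GRAM"):  [("GRAMNEG", 1.0)],
    ("ORGANISM-1", "MORPH"): [("ROD", 1.0)],
    ("ORGANISM-1", "IDENT"): [("E.COLI", 0.74), ("BACTEROIDES", 0.7)],
}

def values_of(obj, attr):
    """Current hypotheses, as (value, certainty factor) pairs."""
    return triples.get((obj, attr), [])

print(values_of("ORGANISM-1", "IDENT"))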
1) Patient's name: (first-last)
** FRED BRAUN
2) Sex:                                      (Questions 1-3 request
** M                                          background patient data.)
3) Age:
** 55
4) Are there any cultures for Fred Braun which may be related to
   the present illness, and from which organisms have been grown
   successfully in the microbiology laboratory?
** y
-------- CULTURE-1 --------
5) From what site was the specimen for CULTURE-1 taken?
** BLOD
= BLOOD                                      (System corrects typing error.)
6) Please give the date and time when CULTURE-1 was obtained.
   (mo/da/yr time)
** JUN 20, 1977
The first organism isolated from the blood culture of 20-JUN-77
(CULTURE-1) will be referred to as:
-------- ORGANISM-1 --------
7) Enter the laboratory-reported identity of ORGANISM-1:
** UNKNOWN                                   (Typically identity is not yet known.)
8) The stain (gram or Ziehl-Neelsen acid-fast) of ORGANISM-1:
** NEG                                       (Preliminary lab results give some clues.)
9) Is ORGANISM-1 rod or coccus (etc.):
** ROD
10) What is the form of the individual organisms (e.g. lancet-
    shaped for cocci, fusiform for rods, etc.)?
** FUSIFORM
(... more questions follow in order to gather sufficient information to infer
the identity and significance of the infecting organisms ...)
23) What is the suspected portal of entry of the gramneg rod from the
    blood culture of 20-JUN-77 (ORGANISM-1) into this sterile site?
** GI (6)                                    (User indicates the uncertainty of his response.)
24) Is Fred Braun a burn patient?
** N
25) Is Fred Braun a compromised host (e.g. alcoholic,
    sickle-cell-disease, immunosuppressed)?
** N
26) Is Fred Braun febrile?
** y
FIGURE 4-2 continued
RULE036
PREMISE: ($AND (SAME CNTXT GRAM GRAMNEG)
               (SAME CNTXT MORPH ROD)
               (SAME CNTXT AIR ANAEROBIC))
ACTION:  (CONCLUDE CNTXT IDENTITY BACTEROIDES TALLY .6)

IF: 1) The gram stain of the organism is gramneg, and
    2) The morphology of the organism is rod, and
    3) The aerobicity of the organism is anaerobic
THEN: There is suggestive evidence (.6) that the identity of the organism
      is bacteroides
The modularity of rules simplifies the task of updating the knowledge base.
Individual rules can be added, deleted, or modified without drastically
affecting the overall performance of the system. And because each rule is
a coherent chunk of knowledge, it is a convenient unit for explanation
purposes. For example, to explain why the system is asking a question
during the consultation, a first approximation is simply to display the rule
currently under consideration.
The stylized nature of the rules is useful for many operations. While
the syntax of the rules permits the use of any LISP function, there is a
small set of standard predicates that make up the vast majority of the rules.
The system contains information about the use of these predicates in the
form of function templates.
The system can use these templates to "read" its own rules. The tem-
plate for the predicate SAME, for example, contains the standard tokens
CNTXT, PARM, and VALUE (for context, parameter, and corresponding
value), indicating the components of the associative triple that SAME tests.
If a clause such as (SAME CNTXT SITE BLOOD) appears in the premise
of a given rule, the system can determine that the rule needs to know the
site of the culture, and that the rule can only succeed if that site is, in fact,
blood. When asked to display rules that are relevant to blood cultures,
MYCIN will be able to choose that rule.
An important function of the templates is to permit MYCIN to pre-
compute automatically (at system generation time) the set of rules that
conclude about a particular parameter; it is this set that the rule monitor
retrieves when the system needs to deduce the value of that parameter.
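In outline, template-driven "reading" of rules might look like the following sketch (Python stand-ins for the LISP structures; the templates and the premise shown are simplified, and the helper names are invented for the example):

TEMPLATES = {"SAME": ["CNTXT", "PARM", "VALUE"]}

RULE156_PREMISE = [("SAME", "CNTXT", "SITE", "BLOOD"),
                   ("SAME", "CNTXT", "GRAM", "GRAMNEG")]

def read_clause(clause):
    """Return {role: argument} for one premise clause, using its template."""
    func, *args = clause
    return dict(zip(TEMPLATES[func], args))

def parameters_used(premise):
    return [read_clause(c)["PARM"] for c in premise]

def relevant_to(premise, parm, value):
    """Could this rule be relevant when PARM has the value VALUE?"""
    return any(read_clause(c)["PARM"] == parm and
               read_clause(c)["VALUE"] == value for c in premise)

print(parameters_used(RULE156_PREMISE))               # ['SITE', 'GRAM']
print(relevant_to(RULE156_PREMISE, "SITE", "BLOOD"))  # True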
The system can also read rules to eliminate obviously inappropriate
ones. It is often the case that, of a large set of rules under consideration,
several are provably false by information already known. That is, the in-
formation needed to evaluate one of the clauses in the premise has already
been determined, and that clause is false, thereby making the entire prem-
ise false. By reading the rules before actually invoking them, many can be
immediately discarded, thereby avoiding the deductive work necessary in
evaluating the premise clauses that precede the false one (this is called the
preview mechanism). In some cases this means the system avoids the useless
search of one or more subgoal trees, when the information thereby de-
duced would simply be overridden by the demonstrably false premise.
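A sketch of the preview idea, under the same hypothetical clause encoding (Python; the table of known facts is invented for the example):

KNOWN = {("SITE", "BLOOD"): False, ("GRAM", "GRAMNEG"): True}   # illustrative

RULES = {
    "RULE156": [("SAME", "CNTXT", "SITE", "BLOOD"),
                ("SAME", "CNTXT", "GRAM", "GRAMNEG")],
    "RULE037": [("SAME", "CNTXT", "GRAM", "GRAMNEG"),
                ("SAME", "CNTXT", "MORPH", "ROD")],
}

def provably_false(premise):
    """True if some clause contradicts information already recorded."""
    return any(KNOWN.get((parm, value)) is False
               for _, _, parm, value in premise)

survivors = [name for name, prem in RULES.items()
             if not provably_false(prem)]
print(survivors)          # ['RULE037'] -- RULE156 is discarded by preview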
Another more dramatic case occurs when it is possible, on the basis of
information currently available, to deduce with certainty the value of some
parameter that is needed by a rule. This is the case when there exists a
chain of one or more rules whose premises are known (or provable, as
above) with certainty and that ultimately conclude the desired value with
certainty. Since each rule in this chain must have a certainty factor of 1.0,
we term such a chain a unity path; and since a value known with certainty
excludes all other potential values, no other rules need be tried. MYCIN
always seeks a unity path before trying a set of rules or asking a question;
typically, this means "commonsense" deductions are made directly, without
asking the user "silly" questions or blindly invoking all the rules pertaining
to the goal. Since there are usually few rules on any potential unity path,
the search tends to be small.
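The unity-path search might be sketched like this (Python; the rule encoding is hypothetical, and only chains of rules with certainty factor 1.0 are followed):

RULES = [
    {"needs": {"SITE": "BLOOD"}, "gives": ("STERILE", "YES"), "cf": 1.0},
    {"needs": {"STERILE": "YES"}, "gives": ("SIGNIFICANT", "YES"), "cf": 1.0},
    {"needs": {"GRAM": "GRAMNEG"}, "gives": ("CLASS", "ENTERO"), "cf": 0.8},
]
FACTS = {"SITE": "BLOOD"}

def unity_value(parm, seen=()):
    """Return a value for PARM deducible with certainty, else None."""
    for r in RULES:
        goal, value = r["gives"]
        if goal != parm or r["cf"] < 1.0 or parm in seen:
            continue
        if all(FACTS.get(p) == v or unity_value(p, seen + (parm,)) == v
               for p, v in r["needs"].items()):
            return value
    return None

print(unity_value("SIGNIFICANT"))   # YES, via a chain of two CF = 1.0 rules
print(unity_value("CLASS"))         # None -- the only rule has CF 0.8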
The ability to read rules opens the way to the writing of rules that
manipulate other rules. We term such rules meta-rules (see Part Nine); they
are used to make deductions not about the medical entities of the domain
but about strategies to be used by the system. Whenever the rule inter-
preter is about to invoke a list of rules to establish some goal, it first applies
any meta-rules associated with that goal. These meta-rules can reorder or
prune the rule list to make the search more suitable for the given case.
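One way to picture a meta-rule operating on the rule list is the following sketch (Python). It paraphrases the bowel-tumor example quoted in Chapter 2; the rule names R1-R3 and their tags are invented.

RULELIST = {"IDENTITY": ["R1", "R2", "R3"]}         # rules concluding identity
TAGS = {"R2": {"mentions": "GI"}}                   # R2 mentions the GI tract
PATIENT = {"bowel-tumor": True}

def apply_metarule(rules, patient):
    """If the patient has had a bowel tumor, try GI-related rules first."""
    if not patient.get("bowel-tumor"):
        return rules
    gi = [r for r in rules if TAGS.get(r, {}).get("mentions") == "GI"]
    return gi + [r for r in rules if r not in gi]

print(apply_metarule(RULELIST["IDENTITY"], PATIENT))   # ['R2', 'R1', 'R3']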
4.2 Explanation Capability
** IS ORGANISM-1 A STREPTOCOCCUS?
No. The identity of ORGANISM-1 is e.coli (.74) bacteroides (.7)
klebsiella-pneumoniae (.4) pseudomonas-aeruginosa (.22) enterobacter (.21)
proteus-non-mirabilis (.19) serratia (.03) salmonella
** WHY DID YOU RULE OUT STREPTOCOCCUS AS A POSSIBILITY FOR ORGANISM-1?
RULE033 could have been used to determine that the identity of
ORGANISM-1 is streptococcus. However, clause 2 of RULE033 ["the
morphology of the organism is coccus"] was already known to be false
for ORGANISM-1, so the rule was never tried.
** HOW DO YOU KNOW THAT CULTURE-1 WAS FROM A STERILE SOURCE?
I used RULE189 to conclude that this blood culture was taken from a
sterile source. This gave a cumulative CF of (1.0). The last
question asked before the conclusion was made was 7.
** WHAT DOES GU MANIPULATION TELL YOU ABOUT THE IDENTITY OF AN ORGANISM?
The rules listed below use:
    whether the patient has had a genito-urinary manipulative procedure
to conclude about:
    the identity of the organism
156, 163, 190
Which do you wish to see?
** 156
RULE156
IF: 1) The site of the culture is blood, and
    2) The gram stain of the organism is gramneg, and
    3) The morphology of the organism is rod, and
    4) The portal of entry of the organism is urine, and
    5) The patient has not had a genito-urinary manipulative procedure, and
    6) Cystitis is not a problem for which the patient has been treated
THEN: There is suggestive evidence (.6) that the identity of the
      organism is e.coli
** IS BLOOD A STERILE SITE?
Yes. Blood is one of those sites that is normally sterile.
** WHAT ARE THE NONSTERILE SITES?
Those sites that are normally nonsterile are: CERVIX, CUTANEOUS-ULCER,
LOCHIA, NOSE, SKIN, STOOL, THROAT, URETHRA, VAGINA.
** IS BACTEROIDES AEROBIC?
No, the aerobicity of bacteroides is anaerobic.
questions (see Figure 4-4) seek the value of a certain parameter, how this
value was concluded, and how this parameter was used. Questions about
the value of a given parameter of an object are answered by simply dis-
playing the current hypothesis regarding it. To explain how the value of a
parameter was concluded, MYCIN retrieves the list of rules that were suc-
cessfully applied and prints them, along with the conclusions drawn. Al-
ternatively, if the user supplied the value by answering a previous question,
this is noted. More general questions about how information is used or
concluded are answered by retrieving the relevant rules from the rule base.
the user is asked a question, he or she can delay answering it and instead
ask why the question was asked. Since questions are asked in order to
establish the truth of the premise of some rule, a simple answer to WHY
is "because I'm trying to apply the following rule." Successive WHY ques-
tions unwind the chain of subgoals, citing the rules that led to the current
rule being tried.
Besides examining the current line of reasoning, the user can also ask
about previous decisions, or about how future decisions might be made,
by giving the HOW command. Explaining how the truth of a certain clause
was established is accomplished as described above for the general QA
Module. To explain how a presently unknown clause might be established,
MYCIN retrieves the set of rules that the rule interpreter would select to
establish that clause and selects the relevant rules from among them by
"reading" the premises for applicability and the conclusions for relevance
to the goal.
4.3 Knowledge Acquisition
simple is occasionally at odds with the need to encode the many aspects of
medical decision making. The backward chaining of rules by the deductive
system is also often a stumbling block for experts who are new to the
system. However, they soon learn to structure their knowledge appropri-
ately. In fact, some experts have felt that encoding their knowledge into
rules has helped them formalize their own view of the domain, leading to
greater consistency in their decisions.
5
Details of the Consultation
System
Edward H. Shortliffe
which is the foundation for both the system's advice and its explanation
capabilities (to be described in Part Six).
Section 5.3 is devoted to an explanation of the program's context tree,
i.e., the network of interrelated organisms, drugs, and cultures that char-
acterize the patient and his or her current clinical condition. The need for
such a data structure is clarified, and the method for propagation (growth)
of the tree is described.
The final tasks in MYCIN's clinical problem area are the identification
of potentially useful drugs and the selection of the best drug or drugs
from that list. MYCIN's early mechanism for making these decisions is
discussed in Section 5.4 of this chapter. Later refinements are the subject
of Chapter 6.
Section 5.5 discusses MYCIN's mechanisms for storing patient data
and for permitting a user to change the answer to a question. As will be
described, these two capabilities are closely interrelated.
In Section 5.6 we briefly mention extensions to the system that were
contemplated when this material was written in 1975. Several of these
capabilities were eventually implemented.
P(h|e) = X means
IF:   e is known to be true
THEN: conclude that h is true with probability X
Legal:
[1] A & B & C → D
[2] A & (B or C) → D
[3] (A or B or C) & (D or E) → F

Illegal:
[4] A or B or C → D
[5] A & (B or (C & D)) → E

[6] A → D
[7] B → D
[8] C → D
[9] A & C & D → E
[10] A & B → E
RULE037
IF: 1) The identity of the organism is not known with certainty, and
    2) The stain of the organism is gramneg, and
    3) The morphology of the organism is rod, and
    4) The aerobicity of the organism is aerobic
THEN: There is strongly suggestive evidence (.8) that the class of the
      organism is enterobacteriaceae

RULE145
IF: 1) The therapy under consideration is one of: cephalothin clindamycin
       erythromycin lincomycin vancomycin, and
    2) Meningitis is an infectious disease diagnosis for the patient
THEN: It is definite (1) that the therapy under consideration is not a
      potential therapy for use against the organism

RULE060
IF: The identity of the organism is bacteroides
THEN: I recommend therapy chosen from among the following drugs:
      1 - clindamycin (.99)
      2 - chloramphenicol (.99)
      3 - erythromycin (.57)
      4 - tetracycline (.28)
      5 - carbenicillin (.27)
Before we can explain how rules such as these are invoked and eval-
uated, it is necessary to describe further MYCIN's internal organization.
We shall therefore temporarily digress in order to lay some groundwork
for the description of the evaluation functions in Section 5.1.5.
The use of the word context should not be confused with its meaning in high-level languages
that permit temporary saving of all information regarding a program's current status--a
common mechanism for backtracking and parallel-processing implementations.
The 200 rules currently used by MYCIN 2 are not explicitly linked in a
decision tree or reasoning network. This feature is in keeping with our
desire to keep system knowledge modular and manipulable. However, rules
are subject to categorization in accordance with the context-types for which
they are most appropriately invoked. For example, some rules deal with
organisms, some with cultures, and still others deal solely with the patient.
MYCIN's current rule categories are as follows (context-types to which they
may be applied are enclosed in parentheses):
Every rule in the MYCINsystem belongs to one, and only one, of these
categories. Furthermore, selecting the proper category for a newly ac-
quired rule does not present a problem. In fact, category selection can be
automated to a large extent.
Consider a rule such as this:
RULE124
IF: 1) The site of the culture is throat, and
    2) The identity of the organism is streptococcus
THEN: There is strongly suggestive evidence (.8) that
      the subtype of the organism is not group-D
This is one of MYCIN's ORGRULES and may thus be applied to either a
CURORGS context or a PRIORORGS context. Referring back to Figure
5-1, suppose RULE124 were applied to ORGANISM-2. The first condition
in the premise refers to the site of the culture from which ORGANISM-2
was isolated (i.e., CULTURE-2) and not to the organism itself (i.e., orga-
nisms do not have sites, but cultures do). The context tree is therefore
important for determining the proper context when a rule refers to an
attribute of a node in the tree other than the context to which the rule is
being explicitly applied. Note that this means that a single rule may refer
to nodes at several levels in the context tree. The rule is categorized simply
on the basis of the lowest context-type (in the tree) that it may reference.
Thus RULE124 is an ORGRULErather than a CULRULE.
Note that the last two examples are different from the others in that they
represent a rather different kind of relationship. In fact, several authors
would classify the first six as "relations" and the last two as "predicates,"
using the simpler notation:
MAN(BOB)
¬WOMAN(BOB)
Yes-No Parameter

FEBRILE: <FEBRILE is an attribute of a patient and is therefore a member of
          the list PROP-PT>
  EXPECT:    (YN)
  LOOKAHEAD: (RULE149 RULE109 RULE045)
  PROMPT:    (Is * febrile?)
  TRANS:     (* IS FEBRILE)

Single-Valued Parameter

IDENT: <IDENT is an attribute of an organism and is therefore a member of
        the list PROP-ORG>
  CONTAINED-IN: (RULE030)
  EXPECT:       (ONEOF (ORGANISMS))
  LABDATA:      T
  LOOKAHEAD:    (RULE004 RULE054 ... RULE168)
  PROMPT:       (Enter the identity (genus) of *:)
  TRANS:        (THE IDENTITY OF *)
  UPDATED-BY:   (RULE021 RULE003 ... RULE166)

Multi-Valued Parameter

INFECT: <INFECT is an attribute of a patient and is therefore a member of
         the list PROP-PT>
  EXPECT:     (ONEOF (PERITONITIS BRAIN-ABCESS MENINGITIS
               BACTEREMIA UPPER-URINARY-TRACT-INFECTION ...
               ENDOCARDITIS))
  LOOKAHEAD:  (RULE115 RULE149 ... RULE045)
  PROMPT1:    (Is there evidence that the patient has a (VALU)?)
  TRANS:      (AN INFECTIOUS DISEASE DIAGNOSIS FOR *)
  UPDATED-BY: (RULE157 RULE022 ... RULE105)
Our solution has been to suppress the YES altogether and simply to say:
PATIENT-1 IS FEBRILE
Certainty factors are used in two ways. First, as noted, the value of
every clinical parameter is stored with its associated certainty factor. In this
case the evidence E stands for all information currently available to MY-
This rule may also be represented as CF[h1,e] = .7, where h1 is the hy-
pothesis that the organism (context of the rule) is Streptococcus and e is
the evidence that it is a gram-positive coccus growing in chains.
Since diagnosis is, in effect, the problem of selecting a disease from a
list of competing hypotheses, it should be clear that MYCIN may simul-
taneously be considering several hypotheses regarding the value of a clin-
ical parameter. These hypotheses are stored together, along with their CFs,
for each node in the context tree. We use the notation Val[C,P] to signify
the set of all hypotheses regarding the value of the clinical parameter P
for the context C. Thus, if MYCIN has reason to believe that ORGANISM-
1 may be either a Streptococcus or a Staphylococcus, but Pneumococcus has
been ruled out, its dynamic data base might well show:
Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS .6) (STAPHYLOCOCCUS .4)
                         (DIPLOCOCCUS-PNEUMONIAE -1))
It can be shown that the sum of the CFs for supported hypotheses
regarding a single-valued parameter (i.e., those parameters for which the
hypotheses are mutually exclusive) cannot exceed 1 (Shortliffe and Buch-
anan, 1975). Multi-valued parameters, on the other hand, may have several
hypotheses that are all known to be true, for example:
Val[PATIENT-1,ALLERGY] = ((PENICILLIN 1) (AMPICILLIN 1)
                          (CARBENICILLIN 1) (METHICILLIN 1))

Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS 1) (STAPHYLOCOCCUS -1)
                         (DIPLOCOCCUS-PNEUMONIAE -1))
Figure 5-4 shows the relationship among these functions for yes-no param-
eters.
There are nine predicates in the category <func2>. Unlike the
<func1> predicates, these functions control conditional statements re-
garding specific values of the clinical parameter in question. For example,
SAME[ORGANISM-1,IDENT,E.COLI] is an invocation of the <func2>
[Figure 5-4 (diagram): the ranges of CF values, from -1 to +1, over which the <func1> predicates KNOWN, NOTKNOWN, DEFINITE, and NOTDEFINITE hold true when applied to yes-no clinical parameters.]
FIGURE 5-4 Diagram indicating the range of CF values over
which the <func1> predicates hold true when applied to yes-
no clinical parameters.
The names of the functions have been selected to reflect their semantics.
Figure 5-5 shows a graphic representation of each function and also ex-
plicitly states the interrelationships among them.
Note that SAME and THOUGHTNOT are different from all the
other functions in that they return a number (CF) rather than T if the
defining condition holds. This feature permits MYCIN to record the de-
gree to which premise conditions are satisfied. In order to explain this
[Figure 5-5 (diagram): graphic representation of the <func2> predicates SAME, NOTSAME, MIGHTBE, THOUGHTNOT, VNOTKNOWN, DEFIS, NOTDEFIS, DEFNOT, and NOTDEFNOT over the CF range from -1 to +1, together with their interrelationships:]
SAME or NOTSAME = THOUGHTNOT or MIGHTBE = T
NOTSAME = VNOTKNOWN or THOUGHTNOT
THOUGHTNOT = NOTDEFNOT or DEFNOT
MIGHTBE = VNOTKNOWN or SAME
SAME = NOTDEFIS or DEFIS
SAME[ORGANISM-1,IDENT,STREPTOCOCCUS]
SAME[ORGANISM-1,IDENT,STAPHYLOCOCCUS]

SAME[ORGANISM-1,IDENT,STREPTOCOCCUS] = .7
SAME[ORGANISM-1,IDENT,STAPHYLOCOCCUS] = .3
whereas KNOWN[ORGANISM-1,IDENT] = T
and NOTDEFIS[ORGANISM-1,IDENT,STREPTOCOCCUS] = T
PREMISE: ($AND (SAME CNTXT GRAM GRAMNEG)
               (SAME CNTXT MORPH ROD)
               (SAME CNTXT AIR AEROBIC))
ACTION:  (CONCLUDE CNTXT CLASS ENTEROBACTERIACEAE TALLY .8)
ORGANISMS is the name of a linear list containing the names of all bac-
(ONEOF ENTEROBACTERIACEAE (ORGANISMS) G COCCI C-COCCI)
RULE030
IF: The identity of the organism is known with certainty
THEN: It is definite (1) that these parameters - GRAM MORPH AIR -
      should be transferred from the identity of the organism to this
      organism
Specialized Functions
The efficient use of knowledge tables requires the existence of four spe-
cialized functions (the category <special-func> from Section 5.1.1). As
explained below, each function attempts to add members to a list named
GRIDVAL and returns T if at least one element has been found to be
placed in GRIDVAL.
Then:
If you know the portal of entry of the current organism and also
know the pathogenic bacteria normally associated with that site, you
have evidence that the current organism is one of those pathogens
so long as there is no disagreement on the basis of gram stain,
morphology, or aerobicity.
Note that GRID sets up the initial value of GRIDVAL for use by SAME2,
which then redefines GRIDVAL for use in the action clause. This rule is
translated (into somewhat stilted English) as follows:
IF: 1) The list of likely pathogens associated with the
       portal of entry of the organism is known, and
    2) This current organism and the members you are
       considering agree with respect to the following
       properties: GRAM MORPH AIR
THEN: There is strongly suggestive evidence (.8) that
      each of them is the identity of this current organism
Rules are translated into a subset of English using a set of recursive func-
tions that piece together bits of text. We shall demonstrate the process
using the premise condition (GRID (VAL CNTXT PORTAL) PATH-
FLORA), which is taken from the rule in the preceding section.
The reader will recall that every clinical parameter has a property
named TRANS that is used for translation (Section 5.1.3). In addition,
every function, simple list, or knowledge table that is used by MYCIN's
rules also has a TRANS property. For our example the following TRANS
properties are relevant:

GRID:       (THE (2) ASSOCIATED WITH (1) IS KNOWN)
VAL:        ((2) (1))
PORTAL:     (THE PORTAL OF ENTRY OF *)
PATH-FLORA: (LIST OF LIKELY PATHOGENS)
All other portions of rules use essentially this same procedure for
translation. An additional complexity arises, however, if it is necessary to
negate the verbs in action or else clauses when the associated CF is negative.
The translator program must therefore recognize verbs and know how to
negate them when evidence in a premise supports the negation of the
hypothesis that is referenced in the action of the rule.
The discussion in Section 5.1 was limited to the various data structures
used to represent MYCIN's knowledge. The present section proceeds to
an explanation of how MYCIN uses that knowledge in order to give advice.
MYCIN's task involves a four-stage decision problem:
This rule is one of MYCIN's PATRULES (i.e., its context is the patient)
and is known as the goal rule for the system. A consultation session with
MYCIN results from a simple two-step procedure:
1. Create the patient context as the top node in the context tree (see Sec-
tion 5.3 for an explanation of how nodes are added to the tree).
2. Attempt to apply the goal rule to the newly created patient context.
After the second step, the consultation is over. Thus we must explain how
the simple attempt to apply the goal rule to the patient causes a lengthy
consultation with an individualized reasoning chain.
When MYCIN first tries to evaluate the premise of the goal rule, the
first condition requires that it know whether there is an organism that
requires therapy. MYCIN then reasons backwards in a manner that may
be informally paraphrased as follows:
[Figure 5-6 (flow chart): for each condition in the premise of a rule, the MONITOR asks whether the necessary information has been gathered to decide if the condition is true; if not, it gathers the information using the FINDOUT mechanism; if the condition is false (or unknown), the rule is rejected; if all conditions hold, the conclusion of the rule is added to the ongoing record of the current consultation; otherwise the next condition in the premise is considered.]
[Figure 5-7 (flow chart): the FINDOUT mechanism, whose steps include "ask user for the value of the parameter," "retrieve Y = list of rules which may aid in deducing the value of the parameter," "apply MONITOR to each rule in the list Y," and "return."]
its decision process (Figure 5-7). Thus IDENT is marked as being LABDATA
in Figure 5-2.
Recall that the UPDATED-BY property is a list of all rules in the system
that permit an inference to be made regarding the value of the indicated
parameter. Thus UPDATED-BY is precisely the list called Y in Figure
5-7. Every time a new rule is added to MYCIN's knowledge base, the name
of the rule is added to the UPDATED-BY property of the clinical param-
eter referenced in its action or else clause. Thus the new rule immediately
becomes available to FINDOUT at times when it may be useful. It is not
necessary to specify explicitly its interrelationships with other rules in the
system.
Note that FINDOUT is accessed from the MONITOR, but the MON-
ITOR may also be accessed from FINDOUT. This recursion allows self-
propagation of a reasoning network appropriate for the patient under
consideration and selects only the necessary questions and rules. The first
rule passed to the MONITOR is always the goal rule. Since the first con-
dition in the premise of this rule references a clinical parameter named
TREATFOR, and since the value of TREATFOR is of course unknown
before any data have been gathered, the MONITOR asks FINDOUT to
trace the value of TREATFOR. This clinical parameter is not LABDATA,
so FINDOUT takes the left-hand pathway in Figure 5-7 and sets Y to the
UPDATED-BY property of TREATFOR, the two-element list (RULE090
RULE149). The MONITOR is then called again with RULE090 as the rule
for consideration, and FINDOUT is used to trace the values of clinical
parameters referenced in the premise of RULE090. Note that this process
parallels the informal paraphrase of MYCIN's reasoning given above.
It is important to recognize that FINDOUT does not check to see
whether the premise condition is true. Instead, the FINDOUT mechanism
traces the clinical parameter exhaustively and returns its value to the MON-
ITOR, where the conditional expression may then be evaluated.4 Hence
FINDOUT is called one time at most for a clinical parameter (in a given
context--see Section 5.3). When FINDOUT returns a value to the MON-
ITOR, it marks the clinical parameter as having been traced. Thus when
the MONITOR reaches the question "HAS ALL NECESSARY INFOR-
MATION BEEN GATHERED TO DECIDE IF THE CONDITION IS
TRUE?" (Figure 5-6), the parameter is immediately passed to FINDOUT
unless it has been previously marked as traced.
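The mutual recursion can be caricatured as follows (Python; TREATFOR and its UPDATED-BY list come from the text above, but the premise contents, the parameter COVERFOR, and RULE027 are invented, and certainty handling, premise evaluation, and the ASK step are omitted):

UPDATED_BY = {"TREATFOR": ["RULE090", "RULE149"], "COVERFOR": ["RULE027"]}
PREMISES   = {"RULE090": ["COVERFOR"], "RULE149": [], "RULE027": []}
TRACED = set()

def monitor(rule):
    for parm in PREMISES[rule]:
        if parm not in TRACED:
            findout(parm)
    # ...premise evaluation and any CONCLUDE would go here...

def findout(parm):
    TRACED.add(parm)        # mark as traced (simplified; MYCIN defers this
                            # for self-referencing rules, as described below)
    for rule in UPDATED_BY.get(parm, []):
        monitor(rule)       # the MONITOR may in turn call findout again
    # ...if the value is still unknown, the user would be asked here...

findout("TREATFOR")
print(sorted(TRACED))       # ['COVERFOR', 'TREATFOR']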
Figure 5-8 is a portion of MYCIN's initial reasoning chain. In Figure
5-8 the clinical parameters being traced are underlined. Thus REGIMEN
is the top goal of the system (i.e., it is the clinical parameter in the action
clause of the goal rule). Below each parameter are the rules (from the
UPDATED-BY property) that may be used for inferring the parameter's
value. Clinical parameters referenced in the premise of each of these rules
are then listed at the next level in the reasoning network. Rules with mul-
tiple premise conditions have their links numbered in accordance with the
order in which the parameters are traced (by FINDOUT). ASK1 indicates
that a parameter is LABDATA, so its value is automatically asked of the
user when it is needed. ASK2 refers to parameters that are not LABDATA
but for which no inference rules currently exist, e.g., if the dose of a drug
is adequate. One of the goals in the future development of MYCIN's knowl-
[Figure 5-8: a portion of MYCIN's initial reasoning network, with the clinical parameters being traced underlined and, beneath each parameter, the rules that may be used to infer its value.]
dition being evaluated in the MONITOR. Suppose, for example, the MON-
ITOR were evaluating the condition (SAME CNTXT INFECT MENIN-
GITIS), i.e., "Meningitis is an infectious disease diagnosis for the patient."
If FINDOUT were to ask the question using the regular PROMPT strategy,
it would request:
What is the infectious disease diagnosis for PATIENT-1?
The problem is that the patient may have several diagnoses, each of which
can be expressed in a variety of ways. If the physician were to respond:
A meningeal inflammation that is probably of infectious origin
MYCIN would be forced to try to recognize that this answer implies men-
ingitis. Our solution has been to customize questions for multi-valued pa-
rameters to reflect the value being checked in the current premise condi-
tion. The PROMPT1 property is used, and questions always expect a yes
or no response:
Is there evidence that the patient has a meningitis?
The advantages of this approach are the resulting ability to avoid natural
language processing during the consultation itself and the posing of ques-
tions that are specific to the patient under consideration.
In addition to the automatic spelling-correction capability described
above, there are a number of options that may be utilized whenever MY-
CIN asks the user a question:
Except for questions related to propagation of the context tree, all queries
from MYCIN to the physician request the value of a specific clinical pa-
rameter for a specific node in the context tree. The FINDOUT mechanism
screens the user's response, stores it in MYCIN's dynamic data base, and
returns the value to the MONITOR for evaluation of the conditional state-
ment that generated the question in the first place. The physician's re-
sponse is stored, of course, so that future rules containing conditions ref-
erencing the same clinical parameter will not cause the question to be asked
a second time.
As has been noted, however, the values of clinical parameters are al-
ways stored along with their associated certainty factors. A physician's re-
sponse must therefore have a CF associated with it. MYCIN's convention
is to assume CF = 1 for the response unless the physician explicitly states
otherwise. Thus the following exchange:
7) Staining characteristics of ORGANISM-1 (gram):
** GRAMNEG
If, on the other hand, the user is fairly sure of the answer to a question
but wants to indicate uncertainty, he or she may enter a certainty factor in
parentheses after the response. MYCIN expects the number to be an in-
teger between -10 and +10; the program divides the number by 10 to
obtain a CF. Using integers simplifies the user's response and also discourages overly precise estimates of certainty.
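Parsing such a response might look like the following sketch (Python; the input format, a value followed by an optional parenthesized integer, follows the transcript examples such as "GI (6)", but the parsing details are assumptions):

import re

def parse_response(text):
    """Split a reply like 'GI (6)' into (value, CF); CF defaults to 1.0."""
    m = re.match(r"\s*(\S+)\s*(?:\((-?\d+)\))?\s*$", text)
    value, digits = m.group(1), m.group(2)
    cf = int(digits) / 10 if digits is not None else 1.0   # integer -10..10
    return value.upper(), cf

print(parse_response("GI (6)"))     # ('GI', 0.6)
print(parse_response("GRAMNEG"))    # ('GRAMNEG', 1.0)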
This example also shows how the dictionary is used to put synonyms into
standardized form for the patient's data base (i.e., Enterococcus is another
name for a group-D Streptococcus).
A variant of this last example is the user's option to enter multiple
responses to a question, as long as each is modified by a CF. For example:
The CFs associated with the parameter values are then used for evaluation
of premise conditions as described earlier. Note that the users freedom to
modify answers increases the flexibility of MYCIN's reasoning. Without the
CF option, the user might well have responded UNKNOWN to question
13 above. The demonstrated answer, although uncertain, gives MYCIN
much more information than would have been provided by a response of
UNKNOWN.
This subsection explains the <conclusion> item from the BNF rule
description, i.e., the functions that are used in action or else clauses when
a premise has shown that an indicated conclusion may be drawn. There
are only three such functions, two of which (CONCLIST and TRANS-
LIST) reference knowledge tables (Section 5.1.6) but are otherwise depen-
dent on the third, a function called CONCLUDE. CONCLUDE takes five
arguments:
CNTXT The node in the context tree about which the conclusion is
being made
PARAM The clinical parameter whose value is being added to the
dynamic data base
VALUE The inferred value of the clinical parameter
TALLY The certainty tally for the premise of the rule (see Section
5.1.5)
translates as:
There is suggestive evidence (.7) that the identity of the organism
is streptococcus
If, for example, the rule with this action clause were successfully applied
to ORGANISM-1, an organism for which no previous inferences had been
made regarding identity, the result would be:
Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS X))
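The value X recorded above is computed by CONCLUDE from its arguments. A rough sketch of that bookkeeping follows (Python); the fifth argument, the rule's own certainty factor, appears as the final .6 or .7 in the internal rule forms shown earlier in this chapter, and the convention assumed here is that the conclusion's certainty is the premise tally times the rule's CF.

VAL = {}   # (context, parameter) -> {value: certainty factor}

def conclude(cntxt, param, value, tally, cf):
    """Record a conclusion whose certainty is tally * cf; positive evidence
    for the same value is combined as in the CF model (see Part Four)."""
    new = tally * cf
    hyps = VAL.setdefault((cntxt, param), {})
    old = hyps.get(value, 0.0)
    if old >= 0 and new >= 0:
        hyps[value] = old + new * (1 - old)   # combine positive evidence
    else:
        hyps[value] = new                     # simplification for this sketch
    return hyps[value]

print(conclude("ORGANISM-1", "IDENT", "STREPTOCOCCUS", 1.0, 0.7))   # 0.7
print(conclude("ORGANISM-1", "IDENT", "STREPTOCOCCUS", 0.5, 0.7))   # 0.805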
As new rules were acquired from the collaborating experts, it became ap-
parent that MYCIN would need a small number of rules that departed
from the strict modularity to which we had otherwise been able to adhere.
For example, one expert indicated that he would tend to ask about the
typical Pseudomonas-type skin lesions only if he already had reason to be-
lieve that the organism was a Pseudomonas. If the lesions were then said to
be evident, however, his belief that the organism was a Pseudomonas would
be increased even more. A rule reflecting this fact must somehow imply
an orderedness of rule invocation; i.e., "Don't try this rule until you have
already traced the identity of the organism by using other rules in the
system." Our solution has been to reference the clinical parameter early in
the premise of the rule as well as in the action, for example:
RULE040
IF: 1) The site of the culture is blood, and
    2) The identity of the organism may be pseudomonas, and
    3) The patient has ecthyma gangrenosum skin lesions
THEN: There is strongly suggestive evidence (.8) that the
      identity of the organism is pseudomonas
Rules having the same parameter in both premise and action are termed self-
referencing rules. The ordered invocation of such rules is accomplished by
a generalized procedure described below.
As discussed in Section 5.2.1, a rule such as RULE040 is originally
invoked because MYCIN is trying to infer the identity of an organism; i.e.,
FINDOUT is asked to trace the parameter IDENT and recursively sends
the UPDATED-BY list for that parameter to the MONITOR. When the
MONITOR reaches RULE040, however, the second premise condition ref-
erences the same clinical parameter currently being traced by FINDOUT.
If the MONITOR merely passed IDENT to FINDOUT again (as called
for by the simplified flow chart in Figure 5-6), FINDOUT would begin
tracing IDENT for a second time, RULE040 would be passed to the MON-
ITOR yet again, and an infinite loop would occur.
The solution to this problem is to let FINDOUT screen the list called
Y in Figure 5-7, i.e., the UPDATED-BY property for the parameter it is
about to trace. Y is partitioned by FINDOUT into regular rules and self-
referencing rules (where the latter category is defined as those rules that
also occur on the LOOKAHEAD list for the clinical parameter). FIND-
OUT passes the first group of rules to the MONITOR in the normal
fashion. After all these rules have been tried, FINDOUT marks the pa-
rameter as having been traced and then passes the self-referencing rules
to the MONITOR. In this way, when the MONITOR considers the second
condition in the premise of RULE040, the condition is evaluated without
a call to FINDOUT because the parameter has already been marked as
traced. Thus the truth of the premise of a self-referencing rule is deter-
mined on the basis of the set of non-self-referencing rules, which were
evaluated first. If one of the regular rules permitted MYCIN to conclude
that an organism might be a Pseudomonas, RULE040 might well succeed
when passed to the MONITOR. This mechanism for handling self-refer-
encing rules satisfies the intention of an expert when he or she gives us
decision criteria in self-referencing form.
It should be noted that this approach minimizes the potential for self-referencing rules to destroy certainty factor commutativity. By holding these rules until last, we insure that the certainty tally for any of their premises (see Section 5.1.5) is the same regardless of the order in which the non-self-referencing rules were executed. If there is more than one self-referencing rule successfully executed for a given context and parameter, however, the order of their invocation may affect the final CF. The approach we have implemented thus seeks merely to minimize the potential undesirable effects of self-referencing rules.
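For reference, a sketch of the combining function for two pieces of positive evidence referred to here (see Section 5.1.5); the function name is ours, and only the positive-evidence case is shown:

(defun combine-positive-cfs (x y)
  "Combine two certainty factors in (0,1] that support the same value.
Because the function is commutative and associative, the tally does not
depend on the order in which the (non-self-referencing) rules fired."
  (+ x (* y (- 1 x))))

;; (combine-positive-cfs 0.4 0.6) and (combine-positive-cfs 0.6 0.4)
;; both evaluate to 0.76.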
Suppose that
[q] X ::> Y
means that decision rule [q] uses clinical parameter X to reach a conclusion regarding the value of clinical parameter Y. Thus a self-referencing rule may be represented by:
[a] E ::> E
whereas a set of rules that could lead to circular reasoning may be represented by:
[1] A ::> B
[2] B ::> C
[3] C ::> D
[4] D ::> A
Rule [1], for example, says that under certain unspecified conditions, the value of A can be used to infer the value of B. Now suppose that the MONITOR asks FINDOUT to trace the clinical parameter D. Then MYCIN's recursive mechanism would create the following reasoning chain:
The mechanism by which the context tree is customized for a given patient has not yet been discussed. As described in Section 5.2.2, the consultation system begins simply by creating the patient context and then attempting to execute the goal rule. All additional nodes in the context tree are thus added automatically during the unwinding of MYCIN's reasoning regarding the premise of the goal rule. This section first explains the data structures used for creating new nodes. Mechanisms for deciding when new nodes should be added are then discussed.
PROMPT1   A sentence used to ask the user whether the first node of this
          type should be added to the context tree; expects a yes-no answer
PROMPT2   A sentence used to ask the user whether subsequent nodes of this
          type should be added to the context tree
PROMPT3   Replaces PROMPT1 when it is used. This is a message to be printed
          out if MYCIN assumes that there is at least one node of this type
          in the tree.
PROPTYPE  Indicates the category of clinical parameters (see Section 5.1.3)
          that may be used to characterize a context of this type
Two sample context-types are shown in Figure 5-9. The following observations may help clarify the information given in that figure:
Thus the user is familiar with MYCIN's internal names for the cultures, organisms, and drugs under discussion. The node names may then be used in MYCIN's questions at times when there may be ambiguity regarding which node is the current context, e.g.:
PRIORCULS
ASSOCWITH: PERSON
MAINPROPS: (SITE WHENCUL)
PROMPT1: (Were any organisms that were significant (but no longer
          require therapeutic attention) isolated within the last
          approximately 30 days?)
PROMPT2: (Any other significant earlier cultures from which pathogens
          were isolated?)
PROPTYPE: PROP-CUL
SUBJECT: (PRCULRULES CULRULES)
SYN: (SITE (this * culture))
TRANS: (PRIOR CULTURES OF *)
TYPE: CULTURE-

CURORG
ASSOCWITH: CURCUL
MAINPROPS: (IDENT GRAM MORPH SENSITIVS)
PROMPT2: (Any other organisms isolated from * for which you would like
          a therapeutic recommendation?)
PROMPT3: (I will refer to the first offending organism from * as:)
PROPTYPE: PROP-ORG
SUBJECT: (ORGRULES CURORGRULES)
SYN: (IDENT (the *))
TRANS: (CURRENT ORGANISMS OF *)
TYPE: ORGANISM-
There are two situations under which MYCIN attempts to add new nodes to the context tree. The simpler case occurs when rules explicitly reference contexts that have not yet been created. Suppose, for example, MYCIN is trying to determine the identity of a current organism and therefore invokes the following CURORGRULE:
IF:   1) The identity of the organism is not known with certainty, and
      2) This current organism and prior organisms of the patient agree
         with respect to the following properties: GRAM MORPH
THEN: There is weakly suggestive evidence that each of them is a prior
      organism with the same identity as this current organism
The second condition in the premise of this rule references other nodes in the tree, namely nodes of the type PRIORORGS. If no such nodes exist, the MONITOR asks FINDOUT to trace PRIORORGS in the normal fashion. The difference is that PRIORORGS is not a clinical parameter but a context-type. FINDOUT therefore uses PROMPT1 of PRIORORGS to ask the user if there is at least one organism. If so, an instantiation of PRIORORGS is added to the context tree, and its MAINPROPS are traced. PROMPT2 is then used to see if there are any additional prior organisms, and the procedure continues until the user indicates there are no more PRIORORGS that merit discussion. Finally, FINDOUT returns the list of prior organisms to the MONITOR so that the second condition in the rule above can be evaluated.
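A minimal sketch of this instantiation loop, assuming the context-type is given as a property list with :PROMPT1, :PROMPT2, and :MAINPROPS keys and that the caller supplies the helpers ASK-YES-NO, MAKE-NODE, and TRACE-PARAMETER (all of these names are illustrative, not MYCIN's):

;; Ask PROMPT1 for the first node of this context-type; for each node
;; created, trace its MAINPROPS immediately; then use PROMPT2 to ask
;; about further nodes until the user declines.
(defun instantiate-context-type (context-type ask-yes-no make-node trace-parameter)
  (let ((nodes '()))
    (when (funcall ask-yes-no (getf context-type :prompt1))
      (loop
        (let ((node (funcall make-node context-type)))
          (push node nodes)
          (dolist (parameter (getf context-type :mainprops))
            (funcall trace-parameter parameter node)))
        (unless (funcall ask-yes-no (getf context-type :prompt2))
          (return))))
    (nreverse nodes)))                 ; handed back so the calling rule's
                                       ; premise condition can be evaluated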
There is a class of decision rules, the THERULES, that are never invoked by MYCIN's regular control structure because they do not occur on the UPDATED-BY list of any clinical parameter. These rules contain sensitivity information for the various organisms known to the system, for example:
IF:   The identity of the organism is pseudomonas
THEN: I recommend therapy chosen from among the following drugs:
      1 - colistin (.96)
      2 - polymyxin (.96)
      3 - gentamicin (.96)
      4 - carbenicillin (.65)
      5 - sulfisoxazole (.64)
The numbers associated with each drug are the probabilities that a Pseudomonas isolated at Stanford Hospital will be sensitive (in vitro) to the indicated drug. The sensitivity data were acquired from Stanford's microbiology laboratory (and could easily be adjusted to reflect changing resistance patterns at Stanford or the data for some other hospital desiring a version of MYCIN with local sensitivity information). Rules such as the one shown here provide the basis for creating a list of potential therapies. There is one such rule for every kind of organism known to the system.
MYCIN selects drugs only on the basis of the identity of offending organisms. Thus the program's first task is to decide, for each current organism deemed to be significant, which hypotheses regarding the organism's identity (IDENT) are sufficiently likely that they must be considered in choosing therapy. MYCIN uses the CFs of the various hypotheses in order to select the most likely identities. Each identity is then given an item number (see below) and the process is repeated for each significant current organism. The Set of Indications for therapy is then printed out, e.g.:
MYCIN's strategy is to select the best drug on the basis of sensitivity information but then to consider contraindications for that drug. Only if a drug survives this second screening step is it actually recommended. Furthermore, MYCIN also looks for ways to minimize the number of drugs recommended and thus seeks therapies that cover for more than one of the items in the Set of Indications. The selection/screening process is described in the following two subsections.
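Before turning to those subsections, here is a minimal sketch of the item-selection step just described: for each significant organism, the identity hypotheses with sufficiently high CFs become numbered items. The data layout and the 0.2 cutoff are illustrative assumptions only; MYCIN's actual selection criterion is not reproduced here.

;; ORGANISMS is a list of (organism-name . ((identity . cf) ...)).
;; Returns a list of (item-number organism-name identity cf).
(defun set-of-indications (organisms &key (cutoff 0.2))
  (let ((item 0)
        (indications '()))
    (dolist (entry organisms (nreverse indications))
      (destructuring-bind (organism . hypotheses) entry
        (dolist (hypothesis (sort (copy-list hypotheses) #'> :key #'cdr))
          (when (>= (cdr hypothesis) cutoff)
            (push (list (incf item) organism (car hypothesis) (cdr hypothesis))
                  indications)))))))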
The procedure used for selecting the apparent first-choice drug is a complex algorithm that is somewhat arbitrary and is thus currently (1974) under revision. This section describes the procedure in somewhat general terms since the actual LISP functions and data structures are not particularly enlightening.
There are three initial considerations used in selecting the best therapy for a given item:
5Ed. note: Amikacin and tobramycin were not yet available in 1974 when this rule was written. The knowledge base was later updated with the new drug information.
If the first-choice drug for Item 1 is the second-choice drug for Item 2 and if the second-choice drug for Item 2 is almost as strongly supported as the first-choice drug, Item 1's first-choice drug also becomes Item 2's first-choice drug. This strategy permits MYCIN to attempt to minimize the number of drugs to be recommended.
A similar strategy is used to avoid giving two drugs of the same drug class. For example, MYCIN knows that if the first choice for one item is penicillin and the first choice for another is ampicillin, then the ampicillin may be given for both indications (because ampicillin covers essentially all organisms sensitive to penicillin).
In the ideal case MYCIN will find a single drug that effectively covers for all the items in the Set of Indications. But even if each item remains associated with a different drug, a screening stage to look for contraindications is required. This rule-based process is described in the next subsection. It should be stressed, however, that the manipulation of drug lists described above is algorithmic; i.e., it is coded in LISP functions that are called from the action clause of the goal rule. There is considerable "knowledge" in this process. Since rule-based knowledge provides the foundation of MYCIN's ability to explain its decisions, it would be desirable eventually to remove this therapy selection method from functions and place it in decision rules.6
6Ed. note: See the next chapter for a discussion of how this was later accomplished.
The user may also ask for second, third, and subsequent therapy recommendations until MYCIN is able to suggest no reasonable alternatives. The mechanism for these iterations is merely a repeat of the processes described above but with recommended drugs removed from consideration.
7Ed. note: This rule ignores any statement of the mechanism whereby its conclusion follows from its premise. The lack of underlying "support" knowledge accounts for changes introduced in GUIDON when MYCIN's rules were used for education. See Part Eight for further discussion of this point.
1. Before asking the question, check to see if the answer is already stored (in the Patient Data Table--see Step 3 below); if the answer is there, use that value rather than asking the user; otherwise go to Step 2.
2. Ask the question using PROMPT or PROMPT1 as usual.
3. Store the user's response in the dynamic record of facts about the patient, called the Patient Data Table, under the appropriate clinical parameter and context.
The Patient Data Table, then, is a growing record of the user's responses to questions from MYCIN. It is entirely separate from the dynamic data record that is explicitly associated with the nodes in the context tree. Note that the Patient Data Table contains only the text responses of the user--there is no CF information (unless included in the user's response), nor are there data derived from MYCIN's rule-based inferences.
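A minimal sketch of this ask-or-reuse behavior, with the Patient Data Table kept as a hash table keyed by parameter and context (the names and the ASK argument are illustrative):

;; Return the stored response for (PARAMETER CONTEXT) if one exists;
;; otherwise call ASK, store the raw text answer, and return it.
(defparameter *patient-data-table* (make-hash-table :test #'equal))

(defun get-response (parameter context ask)
  (let ((key (list parameter context)))
    (or (gethash key *patient-data-table*)
        (setf (gethash key *patient-data-table*) (funcall ask)))))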
The Patient Data Table and the FINDOUT algorithm make the task of changing answers much simpler. The technique MYCIN uses is the following:
When a consultation is complete, the Patient Data Table contains all responses necessary for generating a complete consultation for that patient. It is therefore straightforward to store the Patient Data Table (on disk or tape) so that it may be reloaded in the future. FINDOUT will automatically read responses from the table, rather than ask the user, so a consultation may be run several times on the basis of only a single interactive session.
There are two reasons for storing Patient Data Tables for future reference. One is their usefulness in evaluating changes to MYCIN's knowledge base. The other is the resulting ability to reevaluate patients once new clinical information becomes available.
New rules may have a large effect on the way a given patient case is handled by MYCIN. For example, a single rule may reference a clinical parameter not previously sought or may lead to an entirely new chain in the reasoning network. It is therefore useful to reload Patient Data Tables and run a new version of MYCIN on old patient cases. A few new questions may be asked (because their responses are not stored in the Patient Data Table). Conclusions regarding organism identities may then be observed, as may the program's therapeutic recommendations. Any changes from the decisions reached during the original run (i.e., when the Patient Data Table was created) must be explained. When a new version of MYCIN evaluates several old Patient Data Tables in this manner, aberrant side effects of new rules may be found. Thus a library of stored patient cases provides a useful mechanism for screening new rules before they become an integral part of MYCIN's knowledge base.
The second use for stored Patient Data Tables is the reevaluation of patient data once additional laboratory or clinical information becomes available. If a user answers several questions with UNKNOWN during the initial consultation session, MYCIN's advice will of course be based on less than complete information. After storing the Patient Data Table, however, the physician may return for another consultation in a day or so once he or she has more specific information. MYCIN can use the previous Patient Data Table for responses to questions whose answers are still up to date. The user therefore needs to answer only those questions that reference new information. A mechanism for the physician to indicate directly what new data are available has not yet been automated, however.
A related capability to be implemented before MYCIN becomes available in the clinical setting is a SAVE command.9 If a physician must leave the computer terminal midway through a consultation, this option will save the current Patient Data Table on the disk. When the physician returns to complete the consultation, he or she will reload the patient record and the session will continue from the point at which the SAVE command was entered.
It should be stressed that saving the current Patient Data Table is not the same as saving the current state of MYCIN's reasoning. Thus, as we have stated above, changes to MYCIN's rule corpus may result in different advice from the same Patient Data Table.
The order in which rules are invoked by the MONITOR is currently controlled solely by their order on the UPDATED-BY property of the clinical parameter being traced.10 The order of rules on the UPDATED-BY property is also arbitrary, tending to reflect nothing more than the order in which rules were acquired. Since FINDOUT sends all rules on such lists to the MONITOR and since our certainty factor combining function is commutative, the order of rules is unimportant.
Some rules are much more useful than others in tracing the value of a clinical parameter. For example, a rule with a six-condition premise that infers the value of a parameter with a low CF requires a great deal of work (as many as six calls to FINDOUT) with very little gain. On the other hand, a rule with a large CF and only one or two premise conditions may easily provide strong evidence regarding the value of the parameter in question. It may therefore be wise for FINDOUT to order the rules in the UPDATED-BY list on the basis of both information content (CF) and the work necessary to evaluate the premise. Then if the first few rules are successfully executed by the MONITOR, the CF associated with one of the values of the clinical parameter may be so large that invocation of subsequent rules will require more computational effort than they are worth. If FINDOUT therefore ignores such rules (i.e., does not bother to pass them to the MONITOR), considerable time savings may result. Furthermore, entire reasoning chains will in some cases be avoided, and the number of questions asked the user could accordingly be decreased.11
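A minimal sketch of such an ordering and cutoff, in which rules are scored by their concluding CF divided by the number of premise conditions and tracing stops once the strongest value's tally is high enough (the scoring function, the 0.9 cutoff, and the helper arguments are illustrative assumptions, not the scheme MYCIN actually used):

;; Order rules so that cheap, strong rules come first.
(defun order-rules (rules rule-cf premise-length)
  (sort (copy-list rules) #'>
        :key (lambda (rule)
               (/ (funcall rule-cf rule)
                  (max 1 (funcall premise-length rule))))))

;; Apply MONITOR to the rules in order, stopping early when CURRENT-TALLY
;; (a function returning the best value's accumulated CF) exceeds CUTOFF.
(defun trace-with-cutoff (rules monitor current-tally &key (cutoff 0.9))
  (dolist (rule rules)
    (when (>= (funcall current-tally) cutoff)
      (return))
    (funcall monitor rule)))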
The MONITOR diagram in Figure 5-6 reveals that conditions are evaluated strictly in the order in which they occur within the premise of the rule. The order of conditions is therefore important, and the most com-
10An exception to this point is the self-referencing rules--see Section 5.2.3.
11Ed. note: Many of these ideas were later implemented and are briefly mentioned in Chapter
12Ed. note: The preview mechanism in MYCIN was eventually implemented to deal with this issue.
13Ed. note: It was for this reason that the idea outlined here was never implemented.
The context tree used by MYCIN is the source of one of the system's primary problems in attempting to simulate the consultation process. Every node in the context tree leads to the uppermost patient node by a single pathway. In reality, however, drugs, patients, organisms, and cultures are not interrelated in this highly structured fashion. For example, drugs are often given to cover for more than one organism. The context tree does not permit a single CURDRUG or PRIORDRUG to be associated with more than a single organism. What we need, therefore, is a network of contexts in the form of a graph rather than a pure tree. The reasons why MYCIN currently needs a tree-structured context network are explained in Section 5.1.2. We have come to recognize that a context graph capability is an important extension of the current system, however, and this will be the subject of future design modifications.15 When implemented, for example, it will permit a physician to discuss a prior drug only once, even though it may have been given to cover for several prior organisms.
6 Details of the Revised Therapy Algorithm
William J. Clancey
This chapter is an expanded version of a paper originally appearing in Proceedings of the IJCAI 1977. Used by permission of International Joint Conferences on Artificial Intelligence, Inc.; copies of the Proceedings are available from William Kaufmann, Inc., 95 First Street, Los Altos, CA 94022.
during the optimization process and why the output was not different.
While the maintenance of records for explanation purposes is not new
(e.g., see Winograd, 1972; Bobrow and Brown, 1975; Scragg, 1975a;
1975b), the means that we use to retrieve them are novel, namely a state
transition representation of the algorithm. Our work demonstrates that a
cleanly structured algorithm can provide both sophisticated performance
and a simple, useful explanation capability.
6.1 The Problem
The main problem of the therapy selector is to prescribe the best drug for
each organism thought to be a likely cause of the infection, while mini-
mizing the total number of drugs. These two constraints often conflict: the
best prescription for, say, four items may require four different drugs,
although for any patient usually no more than two drugs need to be given
(or should be, for reasons of drug interaction, toxic side effects, cost, etc.).
The original therapy program lacked a general scheme for relating
the local constraints (best drug for each item) to the global constraint (few-
est possible number of drugs). As we began to investigate the complexities
of therapy selection, it became necessary to patch the program to deal with
the special cases we encountered. Before long we were losing track of how any given change would affect the program's output. We found it increasingly difficult to keep records during the program execution for later use in the explanation system; indeed, the logic of the program was too confusing to explain easily. We decided to start over, aiming for a more structured algorithm that would provide sophisticated therapy, and by its very organization would provide simple explanations for a naive user. The question was this: what organization could balance these two, sometimes contradictory, goals?
Because we wanted to formulate judgments that could be provided by physicians and would appear familiar to them, we decided not to use mathematical methods such as evaluation polynomials or Bayesian analysis. On the other hand, MYCIN's inferential rule representation seemed to be inadequate because of the general algorithmic nature of the problem (i.e., iteration and complex data structures). We turned our attention to separating out the optimization criteria of therapy selection from control information (specifications for iteratively applying the heuristics). As is discussed below, the key improvement was to encode canonically the optimization performed by the inner loop of the algorithm.
6.2 Our Solution
FIGURE 6-1 Therapy selection viewed as a plan-generate-and-test process.
1Here we realized that we could group the items into those that should definitely be treated ("most likely") and those that could be left out when three or more drugs would be necessary.

                 Number of drugs of each rank:
Instruction    first    second    third
     1           1         0        0
     2           2         0        0
     3           1         1        0
     4           1         0        1

FIGURE 6-2 Instructions for the therapy proposer.
2. propose a recommendation and test it, thus dealing with the global
factors; and
3. make a final recommendation.
6.2.2 Plan
6.2.3 Generate
The second step of the algorithm is to take the ordered drug lists and
generate possible recommendations. This is done by a proposer that selects
subsets of drugs (a recommendation) from the collection of drugs for all
of the organisms to be treated. Selection is directed by a fixed, ordered set
of instructions that specify how many drugs to select from each preference
group. The first few instructions are listed in Figure 6-2. For example, the
third instruction tells the proposer to select a drug from each of the first
and second ranks. Instructions for one- and two-drug recommendations
are taken from a static list; those for recommendations containing three
or more drugs are generated from a simple pattern.
It should be clear that the ordering of the instructions ensures that two of the global criteria will be satisfied: prescribing one or two drugs if possible, and selecting the best possible drug(s) for each organism. An instruction therefore serves as a canonical description of a recommendation. Consequently, we can "reduce" alternate subsets of drugs to this form (the number of drugs of each rank) and compare them.
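A minimal sketch of the proposer, assuming the drugs for all items to be treated have already been pooled into one list per preference rank (the representation and names are illustrative assumptions):

;; RANKED-DRUGS is a list of drug lists, one per rank (best rank first);
;; INSTRUCTION is a list of counts, one per rank, as in Figure 6-2.
;; Returns a proposed recommendation, or NIL if some rank does not
;; contain enough drugs.
(defparameter *instructions*
  '((1 0 0) (2 0 0) (1 1 0) (1 0 1)))

(defun propose (ranked-drugs instruction)
  (let ((proposal '()))
    (loop for count in instruction
          for rank  in ranked-drugs
          do (when (> count (length rank))
               (return-from propose nil))
             (setf proposal (append proposal (subseq rank 0 count))))
    proposal))

;; Example: (propose '((gentamicin carbenicillin) (tetracycline) (colistin))
;;                   '(1 1 0))  =>  (GENTAMICIN TETRACYCLINE)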
6.2.4 Test
Since all of the drugs for all of the organisms were grouped together for
use by the proposer, it is quite possible that a proposed recommendation
will not cover all of the most likely organisms. For example, the proposal
might have two drugs that are in the first rank for one item but are second
or third for other items, or are not even on their lists. Thus the first step
of testing is to make sure that all of the most likely items are covered.
The second test ensures that each drug is in a unique drug class. For
example, a proposal having both gentamicin and streptomycin would be
rejected because these two drugs are aminoglycosides and therefore cause
a "redundant" effect.
The last test is for patient-specific contraindications. These rules take
into account allergies, age of the patient, pregnancy, etc. These rules are
relatively expensive to apply, so they are done last, rather than applying
them to each possible drug in the plan step. With this test we have dealt
with the last global criterion of therapy selection. The first proposal that
satisfies these three tests becomes the therapy advice. The details of drug
prescription will not be considered further here; it consists primarily of
algorithmic dosage calculation and adjustment in the case of renal failure.
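The three tests can be sketched as a single predicate over a proposed recommendation; the helper functions COVERS-P, DRUG-CLASS, and CONTRAINDICATED-P are hypothetical arguments supplied by the caller, and the contraindication check is placed last because it is the expensive one:

;; A proposal is acceptable if it covers every most likely item, uses at
;; most one drug per drug class, and contains no contraindicated drug.
(defun acceptable-proposal-p (proposal most-likely-items
                              covers-p drug-class contraindicated-p)
  (and (every (lambda (item)
                (some (lambda (drug) (funcall covers-p drug item)) proposal))
              most-likely-items)
       (let ((classes (mapcar drug-class proposal)))
         (= (length classes) (length (remove-duplicates classes))))
       (notany contraindicated-p proposal)))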
6.2.5 Performance
FIGURE 6-4 Organization of the Explanation System. [The figure shows the MYCIN program leaving records behind as it runs; the Event History (dynamic), the state TRANSITION DIAGRAM (static), and the Event Structure are used to produce an answer to a question.]
(therefore)
<generate and output criteria>
On the other hand, if a drug is not prescribed, there must be a negative criterion to explain why it dropped out of contention if it was on the initial list. Failure to prescribe can be caused by either failure to consider the
FIGURE 6-5 The state transition diagram. [The figure shows STATE-1 through STATE-N linked by positive criteria, with goal and final states.]
** WHY DIDN'T YOU SUGGEST PENICILLIN IN REC-1 FOR STAPH-COAG+?
PENICILLIN was not prescribed for ITEM-1 in RECOMMENDATION-1:
PENICILLIN was discounted for ITEM-1 because it is NOT DEFINITE that the item is sensitive to this drug. There are other potential therapies under consideration which are much more desirable, viz., current therapies or drugs to which the item is definitely sensitive.
Would you like to see some details?
** YES
The drugs to which the staphylococcus-coag-pos is sensitive are: cephalothin (1.0) vancomycin (1.0) gentamycin (1.0) tobramycin (1.0) erythromycin-and-tetracycline (1.0) chloramphenicol-and-erythromycin (1.0)
[RULE098 RULE445]
Would you like to know about the history of PENICILLIN in the decision process up to this point?
** YES
-- PENICILLIN is a treatment of choice for staphylococcus-coag-pos in meningitis. But as explained above, PENICILLIN was discounted.
drug (plan) or failure of a test. A third possibility is that the drug wasn't part of an acceptable recommendation, but was otherwise a plausible choice (when considered alone). In this case, the drug needs to be considered in the context of a full recommendation for the patient.3 (See Figure 6-9 for an example.)
Figure 6-6 shows an example of a question concerning why a drug was not prescribed. In response to a question of this type, the negative criterion is printed and the user is offered an opportunity to see the positive decisions accrued up to this point. In this example we see that penicillin was not prescribed because it is not definite that the item is sensitive to this drug. That is the negative criterion. The fact that penicillin was a potential treatment of choice permitted its transition to the reranking step.4 This is shown in Figure 6-7. When MYCIN's rules (as opposed to Interlisp code) are used to make a transition decision, we can provide further details, as shown in Figure 6-6.
For questions involving two drugs, e.g., "Why did you prescribe chloramphenicol instead of penicillin for Item-1?", CHRONICLER is invoked to explain why the rejected drug was not given. Then the user is offered the opportunity to see why the other drug was given.
To summarize, MYCIN leaves behind traces that record the application
3Events are recorded as properties of the drugs they involve. The trace includes other contexts such as the item being considered. To deal with iteration, events are of two types: enduring and pass-specific. Enduring events represent decisions that, once made, are never reconsidered, e.g., the initial ranking of drugs for each organism. Pass-specific events may not figure in the final result; they may indicate computation that failed to produce a solution, e.g., proposing a drug as part of a specific recommendation. Thus traces are accessed by drug name and the context of the computation, including which pass of the generate-and-test process produced the final solution.
4Penicillin is given for staph-coag+ only if the organism is known to be sensitive to that agent.
FIGURE 6-7 Trace history for the question shown in Figure 6-6.
of the positive and negative criteria. The Explanation System uses a state
transition diagram that represents the steps of the algorithm to retrieve
the relevant traces in a logical order.
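A minimal sketch of this arrangement: each decision step records an event on the drug it concerns, and the explanation routine walks the algorithm's states in order, printing whatever was recorded for the drug in question. The state names and event format are illustrative, not CHRONICLER's own.

;; Events are stored per drug as (state . criterion) pairs; EXPLAIN-DRUG
;; retrieves them in the order given by the algorithm's state sequence.
(defparameter *states* '(initial-ranking propose test final))
(defparameter *events* (make-hash-table))   ; drug -> list of (state . criterion)

(defun record-event (drug state criterion)
  (push (cons state criterion) (gethash drug *events*)))

(defun explain-drug (drug)
  (dolist (state *states*)
    (dolist (event (reverse (gethash drug *events*)))
      (when (eq (car event) state)
        (format t "~&~A: ~A~%" state (cdr event))))))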
It is interesting to note that CHRONICLER is described well by Bobrow and Brown's synthesis, contingent knowledge, and analysis (SCA) paradigm for understanding systems (Bobrow and Brown, 1975). Contingent knowledge is a record of program-synthesized observations for later use by an analysis program to answer questions or comment on the observed system. In CHRONICLER the traces and transition diagram constitute the contingent knowledge structure. Synthesis (abstraction of results) is performed by the therapy selector as it classifies the drugs in the various decision steps and records its "observations" in traces. Analysis is performed by CHRONICLER as it "reads" the traces, interpreting them in terms of the state transition diagram. The meteorology question-answering system described by Brown et al. (1973) uses a similar knowledge representation.
will be a "close call," because one of the recommendations might use better
drugs for the most likely organisms but cover for fewer of the less likely
organisms. Again, it is the ability to encode output canonically that gives
us the ability to make such a direct comparison of alternatives.
Once the user has supplied a set of drugs to cover for all of the most
likely organisms, his or her proposal is tested for the criteria of drug class
uniqueness and patient-specific factors (described in Section 6.2.4). If the
proposal is approved, this recommendation is compared to the program's
choice of therapy, just as the program compares its alternatives to its own
first-choice recommendation. 5 It is also possible to directly invoke the ther-
apy comparison routine.
We now provide text annotations that include references and comments about shortcomings and intent.
Finally, we could further develop the tutorial aspects of the Explanation System. Rather than passively answering questions, the Explanation System might endeavor to teach the user about the overall structure and philosophy of the program (upon request!). For example, a user might appreciate the optimality of the results better if he or she understood the separation of factors into local and global considerations. Besides explaining the results of a particular run, an Explanation System might characterize individual decisions in the context of the program's overall design. Parts Six and Eight discuss the issues of explanation and education in more detail.
6.7 Conclusions
We have developed a system that prescribes optimal therapy and is able to provide simple, useful explanations. The system is based on a number of design ideas that are summarized as follows:
1. There are relatively few traces (fewer than 50 drugs to keep track of and fewer than 25 strategies that might be applied).
2. There is a single basic question: Why was (or was not) a particular drug prescribed for a particular organism?
Building a Knowledge Base
7 Knowledge Engineering
Section 7.1 is largely taken from material originally written for Chapter 5 of Building Expert Systems (eds., F. Hayes-Roth, D. Waterman, and D. Lenat). Reading, Mass.: Addison-Wesley, 1983.
[Figure: DATA, TEXTS -> KNOWLEDGE BASE]
All involve transferring, in one way or another, the expertise needed for
high-performance problem solving in a domain from a source to a program.
The source is generally a human expert, but could also be the primary
sources from which the expert has learned the material: journal articles
(and textbooks) or experimental data. A knowledge engineer translates
statements about the domain from the source to the program with more
or less assistance from intelligent programs. And there is variability in the
extent to which the knowledge base is distinct from the rest of the system.
Handcrafting
Knowledge Engineering
The process of working with an expert to map what he or she knows into a form suitable for an expert system to use has come to be known as knowledge engineering (Feigenbaum, 1978; Michie, 1973).
As DENDRAL matured, we began to see patterns in the interactions between the person responsible for the code and the expert responsible for the knowledge. There is a dialogue which, at first, is much like a systems
analysis dialogue between analyst and specialist. The relevant concepts are
named, and the relations among them made explicit. The knowledge engi-
neer has to become familiar enough with the terminology and structure
of the subject area that his or her questions are meaningful and relevant.
As the knowledge engineer learns more about the subject matter, and as
the specialist learns more about the structure of the knowledge base and
the consequences of expressing knowledge in different forms, the process
speeds up.
After the initial period of conceptualization, in which most of the
framework for talking about the subject matter is laid out, the knowledge
structures can be filled in rather rapidly. This period of rapid growth of
the knowledge base is then followed by meticulous testing and refinement.
Knowledge-engineering tools can speed up this process. For example, in-
telligent editing programs that help keep track of changes and help find
inconsistencies can be useful to both the knowledge engineer and the ex-
pert. At times, an expert can use the tools independently of the knowledge engineer, thus approaching McCarthy's idea of a program accepting advice from a specialist (McCarthy, 1958). The ARL editor incorporated in EMYCIN (see Chapters 14-16) is a simple tool; the TEIRESIAS debugging system (discussed in Chapter 9) is a more complex tool. Politakis (1982) has recently developed a tool for examining a knowledge base for the EXPERT system (Kulikowski and Weiss, 1982) and suggesting changes, much like the tool for ONCOCIN discussed in Chapter 8.
A recent experiment in knowledge engineering is the ROGET pro-
gram (Bennett, 1983), a knowledge-based system that aids in the concep-
tualization of knowledge bases for EMYCIN systems. Its knowledge is part
of what a knowledge engineer knows about helping an expert with the
initial process of laying out the structure of a new body of knowledge. It
carries on a dialogue about the relationships among objects in the new
domain, about the goal of the new system, about the evidence available,
and about the inferences from evidence to conclusions. Although it knows
nothing (initially) about a new knowledge base, it knows something about
the structure of other knowledge bases. For example, it knows that evi-
dence can often be divided into "hard" evidence from instruments and
laboratory analysis and "soft" evidence from subjective reports and that
both are different from identifying features such as gender and race. Much
more remains to be done, but ROGET is an important step in codifying
the art of knowledge engineering.
** NR                                   [The knowledge engineer starts the
The new rule will be called RULE200.     rule acquisition routine by typing
                                         NR for New Rule.]
If: 1-** THE ORGANISM IS A GRAM NEGATIVE ROD
and 2-** IT IS ANAEROBIC
and 3-** IT WAS ISOLATED FROM THE BLOOD
and 4-** YOU THINK THE PORTAL WAS THE GI TRACT
and 5-**                                [user: carriage return with no entry]
Then: 1-** IT IS PROBABLY A BACTEROIDES
On a scale of 1 to 10, how much certainty would you affix to this conclusion?
** 9
and 2-**                                [user: carriage return with no entry]
This is my understanding of your rule:
RULE200
Okay? (YES or NO)
** YES
following the format of other rules in the system. MYCIN translates the rule into its internal LISP representation and then translates it back into English to print out a version of the rule as it has understood the meaning. The user is then asked to approve the rule or modify it. The original system also allowed simple changes to rules in a quick and easy interaction, much as is shown in Figure 7-2 for acquiring a new rule.
This simple model of knowledge acquisition was subsequently expanded, most notably in the work on TEIRESIAS (Chapter 9). Many of the ideas (and lines of LISP code) from TEIRESIAS were incorporated into EMYCIN (Part Five). Contrast Figure 7-2 with the TEIRESIAS example in Section 9.2 and the EMYCIN example in Chapter 14 for snapshots of our ideas on knowledge acquisition. Research on this problem continues.
Two of our initial working hypotheses about knowledge acquisition have had to be qualified. We had assumed that the rules were sufficiently independent of one another that an expert could always write new rules without examining the rest of the knowledge base. Such modularity is desirable because the less interaction there is among rules, the easier and safer it is to modify the rule set. However, we found that some experts are
helped if they see the existing rules that are similar to a new rule under consideration, where similar means either that the conclusion mentions the same parameter (but perhaps different values) or that the premise clauses mention the same parameters. The desire to compare a proposed rule with similar rules stems largely from the difficulty of assigning CFs to new rules. Comparing other evidence and other conclusions puts the strength of the proposed rule into a partial ordering. For example, evidence e1 for conclusion C could be seen to be stronger than e2 but weaker than e3 for the same conclusion. We also assumed, incorrectly, that the control structure and CF propagation method were details that the expert could avoid learning. That is, an expert writing a new rule sometimes needs to understand how the rule will be used and what its effect will be in the overall solution to a problem. These two problems are illustrated in the transcripts of several electronic mail messages reprinted at the end of Chapter 10. The transcripts also reveal much about the vigorous questioning of assumptions that was taking place as rules were being written.
Throughout the development of MYCIN's knowledge base about infectious diseases (once a satisfactory conceptualization for the problem was found), the primary mode of interaction between the knowledge engineer and expert was a recurring cycle as shown in Figure 7-3. Much of the actual time, particularly in the early years, was spent on changes to the code, outside of this loop, in order to get the system to work efficiently (or sometimes to work at all) with new kinds of knowledge suggested by experts. Considerable time was spent with the experts trying to understand their larger perspective on diagnosis and therapy in infectious disease. And some time was spent trying to reconceptualize the program's problem-solving framework. We believed that the time-consuming nature of the six-step loop shown in Figure 7-3 was one of the key problems in building an expert system, although the framework itself was simple and effective. Thus we looked at several ways to improve the expert's and knowledge engineer's efficiency in the loop.
For Step 1 of the loop we created facilities for experts (or other users)
examine the subsets of rules in which the rule checker identifies problems.
Because this analysis is more systematic than the empirical testing in Steps 3-5 of the six-step loop, it can catch potential problems long before they would manifest themselves in test cases.
Some checking of rules is also done in EMYCIN, as described in the EMYCIN manual (van Melle et al., 1981). As each rule is entered or edited, it is checked for syntactic validity to catch common input errors. By syntactic, we mean issues of rule form--viz., that terms are spelled correctly, values are legal for the parameters with which they are associated, etc.--rather than the actual information (semantic) content (i.e., whether or not the rule "makes sense"). Performing the syntactic checks at acquisition time reduces the likelihood that the consultation program will later fail due to "obvious" errors. This permits the expert to concentrate on debugging logical errors and omissions.
The purely syntactic checks are made by comparing each rule clause with the internal function template corresponding to the predicate or action function used in the clause. Using this template, EMYCIN determines whether the argument slots for these functions are correctly filled. For example, each argument requiring a parameter must be assigned a valid parameter (of some context), and any argument requiring a value must be assigned a legal value for the associated parameter. If an unknown parameter is found, the checker tries to correct it with the Interlisp spelling corrector, using a spelling list of all parameters in the system. If that fails, it asks if this is a new (previously unmentioned) parameter. If so, it defines the new parameter and, in a brief diversion, prompts the system builder to describe it. Similar action is also taken if an unrecognized value for a parameter is found.
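A minimal sketch of such a template-driven check for a single clause of the form (predicate parameter value); the tables and names here are illustrative stand-ins, not EMYCIN's internal structures:

;; A clause passes the syntactic check if its predicate has a known
;; template, its parameter is known, and its value is legal for that
;; parameter.  (Spelling correction and the new-parameter dialogue are
;; omitted from this sketch.)
(defparameter *templates*    '((same . (parameter value))))
(defparameter *legal-values* '((site . (blood csf urine))))

(defun clause-ok-p (clause)
  (destructuring-bind (predicate parameter value) clause
    (and (equal (cdr (assoc predicate *templates*)) '(parameter value))
         (assoc parameter *legal-values*)
         (member value (cdr (assoc parameter *legal-values*))))))

;; (clause-ok-p '(same site blood)) is true;
;; (clause-ok-p '(same site skin)) is NIL.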
A limited semantic check is also performed: each new or changed rule is compared with any existing rules that conclude about the same parameter to make sure it does not directly contradict or subsume any of them. A contradiction occurs when two rules with the same set of premise clauses make conflicting conclusions (contradictory values of CFs for the same parameter); subsumption occurs when one rule's premise is a subset of the other's, so that the first rule succeeds whenever the second one does (i.e., the second rule is more specific), and both conclude about the same values. In either case, the interaction is reported to the expert, who may then examine or edit any of the conflicting or redundant rules.
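The comparison itself can be sketched as follows, with a rule represented as a property list holding its premise clauses, concluded value, and CF (an illustrative representation only):

;; RULE2 subsumes RULE1 if RULE2's premise clauses are a subset of RULE1's
;; and both conclude the same value; two rules contradict if their premises
;; are identical but their conclusions (value or CF) differ.
(defun premise-subset-p (rule1 rule2)
  (subsetp (getf rule1 :premise) (getf rule2 :premise) :test #'equal))

(defun subsumption-p (rule1 rule2)
  (and (premise-subset-p rule2 rule1)
       (eq (getf rule1 :value) (getf rule2 :value))))

(defun contradiction-p (rule1 rule2)
  (and (premise-subset-p rule1 rule2)
       (premise-subset-p rule2 rule1)
       (not (and (eq (getf rule1 :value) (getf rule2 :value))
                 (= (getf rule1 :cf) (getf rule2 :cf))))))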
Another experimental system we incorporated into MYCIN was a small body of code that kept statistics on the use of rules and presented the statistical results to the knowledge base builders.1 It provided another way of analyzing the contents of a knowledge base so potential problems could be examined. It revealed, for example, that some rules never succeeded, even though they were called many times. Even though their conclusions were relevant (mentioned a subgoal that was traced), their premise conditions never matched the specific facts of the cases. Sometimes this happens because a rule is covering a very unusual set of circumstances not instantiated in the test cases. Since much expertise resides in such rules, we did not modify them if they were in the knowledge base for that reason. Sometimes, though, the lack of successful invocation of rules indicated a problem. The premises might be too specific, perhaps because of transcription errors in premise clauses, and these did need attention. This experimental system also revealed that some rules always succeeded when called, occasionally on cases where they were not supposed to. Although it was a small experiment, it was successful: empirically derived statistics on rule use can provide valuable information to the persons building the knowledge base.
One of the most important questions we have been asking in our work
on knowledge acquisition is
8 Completeness and Consistency in a Rule-Based System
This chapter is based on an article originally appearing in The AI Magazine 3:16-21 (Autumn 1982). Copyright 1982 by AAAI. All rights reserved. Used with permission.
8.1 Earlier Work
One goal of the TEIRESIAS program, described in the next chapter, was to provide aids for knowledge base debugging. TEIRESIAS allows an expert to judge whether or not MYCIN's diagnosis is correct, to track down the errors in the knowledge base that led to incorrect conclusions, and to alter, delete, or add rules in order to fix these errors. TEIRESIAS makes no formal assessment of rules at the time they are initially entered into the knowledge base.
In the EMYCIN system for building knowledge-based consultants (Chapter 15), the knowledge acquisition program fixes spelling errors, checks that rules are semantically and syntactically correct, and points out potentially erroneous interactions among rules. In addition, EMYCIN's knowledge base debugging facility includes the following options:
Conflict: two rules succeed in the same situation but with conflicting results.
Redundancy: two rules succeed in the same situation and have the same results.
Subsumption: two rules have the same results, but one contains additional restrictions on the situations in which it will succeed. Whenever the more restrictive rule succeeds, the less restrictive rule also succeeds, resulting in redundancy.
posely written the rules so that the more restrictive one adds a little more
weight to the conclusion made by the less restrictive one.
An exhaustive syntactic approach for identifying missing rules would
assume that there should be a rule that applies in each situation defined
by all possible combinations of domain variables. Some of these combina-
tions, however, are not meaningful. For example, there are no males who
are pregnant (by definition) and no infants who are alcoholics (by reason
of circumstances). Like checking for consistency, checking for complete-
ness generally requires some knowledge of the problem domain.
Because of these pragmatic considerations, an automated rule checker
should display potential errors and allow an expert to indicate which ones
represent real problems. It should prompt the expert for domain-specific
information to explain why apparent errors are, in fact, acceptable. This
information should be represented so that it can be used to make future
checking more accurate.
Certain rules for determining the value of a parameter serve special functions. Some give a "definitional" value in the specified context. These are called initial rules and are tried first. Other rules provide a (possibly context-dependent) "default" or "usual" value in the event that no other rule succeeds. These are called default rules and are applied last. Rules that do not serve either of these special functions are called normal rules. Concluding a parameter's value consists of trying, in order, three groups of rules: initial, normal, then default. A rule's classification tells which of these three groups it belongs to.1
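A minimal sketch of this three-group control scheme, with each rule represented as a property list and APPLY-RULE and VALUE-CONCLUDED-P supplied by the caller (all names illustrative):

;; Try the initial (definitional) rules, then the normal rules; apply the
;; default rules only if no value has been concluded by then.
(defun conclude-parameter (rules apply-rule value-concluded-p)
  (flet ((run (class)
           (dolist (rule rules)
             (when (eq (getf rule :classification) class)
               (funcall apply-rule rule)))))
    (run 'initial)
    (run 'normal)
    (unless (funcall value-concluded-p)
      (run 'default))))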
1Internally in LISP, the context, condition, action, and classification are properties of an atom naming the rule. The internal form of Rule 75 is
RULE075
CONTEXT: ((MOPP DRUG) (PAVE DRUG))
CONDITION: (AND ($IS POST ABORT) ($IS NORMALCOUNTS YES))
ACTION: (CONCLUDEVALUE ATTENDOSE (PERCENTOF 75 PREVIOUSDOSE))
CLASSIFICATION: NORMAL
As in MYCIN, the LISP functions that are used in conditions or actions in ONCOCIN have templates indicating what role their arguments play. For example, both $IS and CONCLUDEVALUE take a parameter as their first argument and a value of that parameter as their second argument. Each function also has a descriptor representing its meaning. For example, the descriptor of $IS shows that the function will succeed when the parameter value of its first argument is equal to its second argument.
2Because a parameter's value is always known with certainty and the possible values are mutually exclusive, the different combinations of condition parameter values are disjoint. If a rule corresponding to one combination succeeds, rules corresponding to other combinations in the same table will fail. This would not be true in an EMYCIN consultation system in which the values of some parameters can be concluded with less than complete certainty. In such cases, the combinations in a given table would not necessarily be disjoint.
3We plan to add a mechanism to acquire information about the meanings of parameters and the relationships among them and to use this information to omit semantically impossible combinations from subsequent tables.
Missing rule corresponding to combination C4:
To determine the current attenuated dose for Cytoxan in CVP
IF:   1) The blood counts do warrant dose attenuation,
      2) The current chemotherapy cycle number is 1, and
      3) This is not the start of the first cycle after significant radiation
THEN: Conclude that the current attenuated dose is ...
with the rule model for its action parameter. TEIRESIAS proposed missing
clauses if some condition parameters in the model did not appear in the
new rule.
8.3.3 An Example
Rule set: 33 24
Summary of Comparison
No problems were found.
Notes
Asterisks appear beneath values included by the rule.
Zeros appear beneath upper and lower bounds that are not included.
(e.g., Rule 33 applies when 1.5 ≤ WBC < 2.0)
9 Interactive Transfer of Expertise
Randall Davis
This chapter originally appeared in Artificial Intelligence 12:121-157 (1979). It has been
shortened and edited. Copyright 1979 by Artificial Intelligence. All rights reserved. Used
with permission.
1The program is named for the blind seer in Oedipus the King, since the program, like the
prophet, has a form of "higher-order" knowledge.
[Figure: TEIRESIAS links the DOMAIN EXPERT and the PERFORMANCE PROGRAM; explanation flows from the program to the expert, and knowledge transfer flows from the expert to the program.]
Information flow from right to left is labeled explanation; it is the means by which the expert learns what the performance program already knows and how it used that knowledge. Information flow from left to right is labeled knowledge transfer. This is the process by which the expert adds to or modifies the store of domain-specific knowledge in the performance program.
Work on TEIRESIAS has had two general goals. We have attempted
first to develop a set of tools for knowledge base construction and main-
tenance and to abstract from them a methodology applicable to a range
of systems. The second, more general goal has been the development of
an intelligent assistant. This task involves confronting manyof the tradi-
tional problems of AI and has resulted in the exploration of a number of
solutions, reviewed below.
This chapter describes a number of tile key ideas in the development
of TEIRESIASand discusses their implementation in the context of a
specific task (acquisition of new inference rules 2) for a specific rule-based
performance program. While the discussion deals with a specific task, sys-
tem, and knowledge representation, several of the main ideas are appli-
cable to more general issues concerning the creation of intelligent pro-
grams.
9.1 Meta-Level Knowledge
A central theme that runs through this chapter (and is discussed more fully in Part Nine) is the concept of meta-level knowledge, or knowledge about knowledge. This takes several different forms, but can be summed up generally by saying that a program can "know what it knows." That is, not only can a program use its knowledge directly, but it may also be able to examine it, abstract it, reason about it, and direct its application.
To see in general terms how this might be accomplished, recall that
We view the interaction between the domain expert and the performance program as interactive transfer of expertise. We see it in terms of a teacher who continually challenges a student with new problems to solve and carefully observes the student's performance. The teacher may interrupt to request a justification of some particular step the student has taken in solving the problem or may challenge the final result. This process may uncover a fault in the student's knowledge of the subject (the debugging phase) and result in the transfer of information to correct it (the knowledge acquisition phase). Other approaches to knowledge acquisition can be compared to this by considering their relative positions along two dimensions: (i) the sophistication of their debugging facilities, and (ii) the independence of their knowledge acquisition mechanism.
The simplest sort of debugging tool is characterized by programs like
DDT, used to debug assembly language programs. The tool is totally pas-
sive (in the sense that it operates only in response to user commands),
low-level (since it operates at the level of machine or assembly language),
and knows nothing about the application domain of the program. Debug-
gers like BAIL (Reiser, 1975) and Interlisp's break package (Teitelman, 1974) are a step up from this since they function at the level of programming languages such as SAIL and Interlisp. The explanation capabilities in TEIRESIAS, in particular the HOW and WHY commands (see Part Six
for examples), represent another step, since they function at the level of
the control structure of the application program. The guided debugging
that TEIRESIAS can also provide (illustrated in Section 9.5) represents yet
another step, since here the debugger is taking the initiative and has
enough built-in knowledge about the control structure that it can track
down the error. Finally, at the most sophisticated level are knowledge-rich
debuggers like the one described by Brown and Burton (1978). Here the
program is active, high-level, informed about the application domain, and
capable of independently localizing and characterizing bugs.
By independence of the knowledge acquisition mechanism, we mean the
degree of human cooperation necessary. Much work on knowledge acqui-
Figure 9-2 shows the major elements of the performance program that
TEIRESIASis designed to help construct. Although the performance pro-
gram described here is MYCIN,the context within which TEIRESIASwas
FIGURE 9-2 Architecture of the performance program. [The figure shows the performance program as an inference engine coupled to a knowledge base.]
Natural language has not been a major focus of this work, and for the
most part we have used the simplest techniques that would support the
level of performance required. For instance, all questions and responses
from TEIRESIAS are either preformed or manufactured by filling in tem-
plates with appropriate words (as evidenced by the occasional appearance
of ungrammatical phrases like "a organism"). Where answers are free text
(rather than multiple choice), the system relies primarily on the keyword-
and template-oriented techniques described in Chapter 18. There is no
parser in the system, in part to minimize processing time and in part be-
cause users often give ungrammatical responses.
rule. The deleted rule was chosen in order to display a large number of
TEIRESIAS's capabilities in a single example. It is special in no other sense,
however, and offers a real trace of the system in action.
In the trace, computer output is in mixed upper and lower case, user
responses follow a double asterisk and are in bold upper case, and com-
mentary is indented, italicized, and enclosed in parentheses. The example
is also divided into five phases for later reference.
(MYCIN has been provided with details of a case and has just concluded the likely causes of the infection.)
My therapy recommendation will be based on the following probable infection(s) and potential causative organism(s):
INFECTION-1 is PRIMARY-BACTEREMIA
<Item 1> E.COLI [ORGANISM-1]
<Item 2> PSEUDOMONAS-AERUGINOSA [ORGANISM-1]
Are all these identities correct?
++** N
(PHASE I: TRACKING DOWN THE BUG)
Which one(s)?
++** KLEBSIELLA-PNEUMONIAE PROTEUS-NON-MIRABILIS
Are there any identities that shouldn't have appeared?
++** PSEUDOMONAS-AERUGINOSA
Do you want to
1] track down the problem yourself, or
2] allow me to guide the debugging, or
3] ignore it for now and fix it another time?
Type 1, 2 or 3
++** 2
We'll start with the PSEUDOMONAS-AERUGINOSA first.
(This time, TEIRESIAS will guide the process. It applies the heuristic of starting with an identity that appeared incorrectly.)
RULE184
Since [1.1] the category of ORGANISM-1 is not known
      [1.2] the gram stain of ORGANISM-1 is gramneg
      [1.3] the morphology of ORGANISM-1 is rod
FIGURE 9-3 TEIRESIAS example.
Which one?
++** ENTEROBACTERIACEAE
RULE116
IF: 1) the identity of ORGANISM-1 is not known
 ** 2) the gram stain of ORGANISM-1 is not known **
    3) the morphology of ORGANISM-1 is not known
    4) the site of CULTURE-1 is csf
    5) the infection is meningitis
    6) the age (in years) of the patient is less than or equal to .17
RULE050
IF: 1) the morphology of ORGANISM-1 is rod
    2) the gram stain of ORGANISM-1 is gramneg
    3) the aerobicity of ORGANISM-1 is facultative
 ** 4) the infection with ORGANISM-1 was acquired while the patient was hospitalized **
THEN: There is evidence that the category of ORGANISM-1 is enterobacteriaceae
Should any of them have been successfully invoked?
++** N
Good...
(To review briefly: The expert is tracking down the reason why MYCIN incorrectly concluded that the identity was pseudomonas-aeruginosa. Pseudomonas-aeruginosa appeared because Rule 184 executed successfully. This occurred in part because category was unknown. Category was unknown because all rules to conclude it (50 and 116) failed. Since the expert believes all existing rules are correct, the knowledge base must lack a rule that will conclude about category; in particular, that the category is enterobacteriaceae. This will cause category to be known, inhibiting Rule 184, fixing at least this first problem, the incorrect conclusion about pseudomonas-aeruginosa.
Note how sharply focused the issue has become, for both the expert and TEIRESIAS, as a result of the large amount of contextual information available. The doctor is not being asked to "discuss bacteremia"; instead, the system can at this point say: "I need a rule that would allow me to deduce that the category of ORGANISM-1 is enterobacteriaceae." And it must be a rule which is invocable in the context of this patient. This focus makes it easier for the expert to specify a rule that may never have been explicitly stated or recognized as such. This can be an important advantage in any domain where the knowledge is not yet fully formalized.
The contextual information is useful for TEIRESIAS as well. It provides a number of expectations about the content of the rule that are useful both in understanding what the expert types in and in checking the new rule to make sure that it does indeed solve the problem at hand. Details of this process are explained below.
The trace continues with the expert typing in the rule...)
(PHASE II: INTERPRETING THE RULE)
This is my understanding of your rule:
RULE383
IF:   1) Primary-bacteremia is an associated infection for which therapy would cover, and
      2) The infection with the organism was acquired while the patient was hospitalized, and
      3) The culture was taken from a sterile source, and
      4) The organism is not normally found at this site
THEN: There is strongly suggestive evidence (.8) that the category of the organism is enterobacteriaceae
(PHASE III: MODIFYING THE INTERPRETATION)
(TEIRESIAS's initial interpretation of the natural language is incorrect (recall that the example was chosen
in part for this reason). In the system's first attempt to understand the rule, each line of the original text
has been interpreted as two clauses. This results primarily from the word-by-word approach to meaning.
For instance, despite the obvious grammatical simplicity of the first line of text, the system used two
connotations of the word "infection" to generate the first two clauses. In the second line of text the word
"sterile" was responsible for clause 3, and the word "sites" was used as the basis for clause 4. The details of
this process are explained below, where it will become clear that while the translation is wrong, it is not
unreasonable given the simple natural language facilities.
Nor is it unreasonable that TEIRESIAS has turned each line of English into more than one premise
clause. The expert is not restricted to typing the English equivalent of a single premise clause on each
line. If he were, rather than typing "the organism is a gram-negative aerobic rod," he would have to type
three lines.
We will see later that while TEIRESIAS's interpretations are incorrect, they are all "reasonable" and
"consistent," in light of the system's somewhat primitive approach to natural language. The rest of the
example shows that even when the initial guesses are incorrect, straightening things out may be very
easy.)
Okay? (Yes or No)
++** N
Change premise?
++** y
Correct?
++** y
Anything else?
++** D3
clause(s) 3 deleted.
Anything else?
++** ?
     1 - the infection is primary-bacteremia
and  2 - <deleted>
and  3 - <deleted>
and  4 - the organism is not normally found at this site
Anything else?
++** U3
clause(s) 3 undeleted.
Anything else?
++** D4
clause(s) 4 deleted.
Anything else?
++** C3
Suppose (3) were changed to read-
     3 - The site of the culture is one of: those sites that are
         normally sterile
Would that be ok?
++** y
Anything else?
++** ?
     1 - the infection is primary-bacteremia
and  2 - <deleted>
and  3 - the site of the culture is one of: those sites that are
         normally sterile
and  4 - <deleted>
++** OK
Change action?
++** N
This is my understanding of your rule:
RULE383
IF: 1) The infection is primary-bacteremia, and
    2) The site of the culture is one of: those sites that are normally sterile
THEN: There is strongly suggestive evidence (.8) that the category of the organism is
      enterobacteriaceae
Okay? (Yes or No)
++** y
(PHASE IV: SECOND-GUESSING)
How about-
[A] The portal of entry of the organism is GI
Ok?
++** y
++** N
(The system noticed another unfulfilled expectation concerning the action part of the rule, but the expert
always has the option of overriding the system's suggestions, and does so.)
The rule now reads:
RULE383
IF: 1) The infection is primary-bacteremia, and
    2) The site of the culture is one of: those sites
       that are normally sterile, and
    3) The portal of entry of the organism is GI
THEN: There is strongly suggestive evidence (.8) that the
      category of the organism is enterobacteriaceae
(Now that both the expert and TEIRESIAS are satisfied with the rule, there is a bit of bookkeeping to do.
First, the new rule is checked by the system (behind the scenes) to be sure that there is nothing
obviously wrong with it--for instance, one of the tests ensures that premise clauses are not contradictory.
Had this been the case, the system would have noticed it at this point and the user would have been
given the option of returning to the rule editor to fix the problem. There are several other things the
system checks at this point, but this rule passes all the tests, and hence there is no indication of any
problem.)
Please describe briefly why you added (or changed) this rule. Type empty line when done.
RULE383 has now been added to the knowledge base.
I will now rerun the consultation to check the effectiveness of your new
rule. Get comfortable, since this may take a bit.
Also, new questions may be asked as a result of this new rule. If a
question is printed, please answer it with reference to this current
patient.
18-DEC-75 16:30:46
.......... PATIENT-281 ..........
.......... INFECTION-1 ..........
........... CULTURE-1 ...........
.......... ORGANISM-1 ..........
INFECTION-1 is PRIMARY-BACTEREMIA
<Item 1> E. COLI [ORGANISM-1]
<Item 2> KLEBSIELLA-PNEUMONIAE [ORGANISM-1]
<Item 3> PROTEUS-NON-MIRABILIS [ORGANISM-1]
Well, it appears that things are straightened out.
(As always, TEIRESIAS gives the expert the last word, and waits for approval before proceeding.
All of the bookkeeping that was done is accessible via the INFOrmation command added to the
question-answering facility of the performance program. This gives the expert the background for any
rule in the system.)
** INFO 383
RULE383 was written by Dr. Davis on December 18, 1975
[who was described as follows:
CLASSIC CASE OF GRAM NEGATIVE ROD INFECTION FOR A
PATIENT WITH A NON-NOSOCOMIAL DISEASE]
Before reviewing the trace in more detail, we describe the ideas that make
possible the capabilities displayed. This subsection serves primarily to name
and briefly sketch each in turn; the details are supplied in subsequent
subsections reviewing the example. [See Davis (1976) for more details.]
what is it that you know and the system doesn't that allows you to avoid
making that same mistake?
Note how much more focused the second question is and how much easier
it is to answer.
Building Expectations
Model-Based Understanding
Learning by Experience
3The debugging process does allow the expert to indicate that the performance program's
results are incorrect, but he or she cannot find an error in the reasoning. This choice is
offered only as a last resort and is intended to deal with situations where there may be a bug
in the underlying control structure of the performance program (contrary to our assumption
in Section 9.2).
of the performance program in action, has available all of the facts of the
case, and has seen how the relevant knowledge has been applied. This
makes it much easier for him or her to specify the particular chunk of
knowledge that may be missing. This contextual information will prove
very useful for TEIRESIAS as well. It is clear, for instance, what the effect
of invoking the new rule must be (as TEIRESIAS indicates, it must be a
rule that will deduce that the category should be Enterobacteriaceae), and it
is also clear what the circumstances of its invocation must be (the rule must
be invocable for the case under consideration, or it won't repair the bug).
Both of these pieces of information are especially useful in Phase II and
Phase V.
To set the stage for reviewing the details of the interpretation process, we
digress for a moment to consider the idea of models and model-based
understanding, and then to explore their application in TEIRESIAS. In
the most general terms, a model can be seen as a compact, high-level description
of structure, organization, or content that may be used both to provide a frame-
work for lower-level processing and to express expectations about the world. One
early, particularly graphic example of this idea can be found in the work
on computer vision by Falk (1970). The task there was understanding
block-world scenes; the goal was to determine the identity, location, and
orientation of each block in a scene containing one or more blocks selected
from a known set of possibilities. The key element of this work of interest
to us here is the use of a set of prototypes for the blocks, prototypes that
resembled wire frame models. Although such a description oversimplifies,
part of the operation of Falk's system can be described in terms of two
phases. The system first performed a preliminary pass to detect possible
edge points in the scene and attempted to fit a block model to each col-
lection of edges. The model chosen was then used in the second phase as
a guide to further processing. If, for instance, the model accounted for all
but one of the lines in a region, this suggested that the extra line might be
spurious. If the model fit well except for some line missing from the scene,
that was a good hint that a line had been overlooked and indicated as well
where to go looking for it.
We can imagine one further refinement in the interpretation process,
though it was not a part of Falk's system, and explain it in these same
terms. Imagine that the system had available some a priori hints about what
blocks might be found in the next scene. One way to express those hints
would be to bias the matching process. That is, in the attempt to match a
model against the data, the system might (depending on the strength of
the hint) try the indicated models first, make a greater attempt to effect a
match with one of them, or even restrict the set of possibilities to just those
contained in the hint.
Note that in this system (i) the models supply a compact, high-level
description of structure (the structure of each block), (ii) the description is
used to guide lower-level processing (processing of the array of digitized
intensity values), (iii) expectations can be expressed by a biasing or restric-
tion on the set of models used, and (iv) "understanding" is viewed in terms
of a matching and selection process (matching models against the data and
selecting one that fits).
Rule Models
EXAMPLES--the subset of rules this model describes
DESCRIPTION--characterization of a typical member of this subset
    characterization of the premise
    characterization of the action
MORE-GENERAL--pointers to models describing more general subsets of rules
MORE-SPECIFIC--pointers to models describing more specific subsets of rules
FIGURE 9-4 Rule model structure.
in the knowledge base that might supply what we need. Not surprisingly,
rules about a single topic tend to have characteristics in common--there
are ways of reasoning about a given topic. From these regularities we have
constructed rule models. These are abstract descriptions of subsets of rules,
built from empirical generalizations about those rules and used to char-
acterize a typical member of the subset.
Rule models are composed of four parts as shown in Figure 9-4. They
contain, first, a list of EXAMPLES, the subset of rules from which this
model was constructed. Next, a DESCRIPTION characterizes a typical
member of the subset. Since we are dealing in this case with rules composed
of premise-action pairs, the DESCRIPTION currently implemented con-
tains individual characterizations of a typical premise and a typical action.
Then, since the current representation scheme used in those rules is based
on associative triples, we have chosen to implement those characterizations
by indicating (a) which attributes typically appear in the premise (or action)
of a rule in this subset and (b) correlations of attributes appearing in the
premise (or action).4 Note that the central idea is the concept of character-
izing a typical member of the subset. Naturally, that characterization will look
different for subsets of rules, procedures, theorems, or any other repre-
sentation. But the main idea of characterization is widely applicable and
not restricted to any particular representational formalism.
The two remaining parts of the rule model are pointers to models
describing more general and more specific subsets of rules. The set of
models is organized into a number of tree structures, each of the general
form shown in Figure 9-5. At the root of each tree is the model made from
all the rules that conclude about the attribute (i.e., the CATEGORY model);
below this are two models dealing with all affirmative and all negative rules
(e.g., the CATEGORY-IS model). Below this are models dealing with rules
that affirm or deny specific values of the attribute. These models are not
handcrafted by the expert. They are instead assembled by TEIRESIAS on
the basis of the current contents of the knowledge base, in what amounts
to a simple statistical form of concept formation. The combination of TEI-
RESIAS and the performance program thus presents a system that has a
model of its own knowledge, one it forms itself.
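To make the four-part structure concrete, here is a minimal sketch in modern Common Lisp rather than the Interlisp of the original system; the slot names follow Figure 9-4, while the constructor call and the abridged sample contents (taken from the CATEGORY-IS model discussed below) are illustrative assumptions, not TEIRESIAS's actual code.

(defstruct rule-model
  examples        ; subset of rules this model describes, with strengths
  description     ; characterization of a typical premise and action
  more-general    ; pointers to models of more general subsets of rules
  more-specific)  ; pointers to models of more specific subsets of rules

(defparameter *category-is*
  (make-rule-model
   :examples '((rule116 0.33) (rule050 0.78) (rule037 0.80))
   :description '(:premise ((gram same notsame 3.83)
                            (morph same notsame 3.83))
                  :action  ((category conclude 4.73)))
   :more-general '(category-mod)
   :more-specific nil))

Because such a structure is computed from the rules themselves, it can be rebuilt whenever the rule set changes, which is what keeps the system's self-model current.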
<attribute>
<attribute>-is        <attribute>-isnt
FIGURE 9-5 Organization of the rule models.
CATEGORY-IS
EXAMPLES    ((RULE116 .33) (RULE050 .78) (RULE037 .80)
             (RULE095 .90) (RULE152 1.0) (RULE140 1.0))
PREMISE     ((GRAM SAME NOTSAME 3.83)
             (MORPH SAME NOTSAME 3.83)
             ((GRAM SAME) (MORPH SAME) ...)
             ((MORPH SAME) (GRAM SAME) ...)
             ((AIR SAME) (NOSOCOMIAL NOTSAME SAME) (MORPH SAME)
              (GRAM SAME) 1.50)
             ((NOSOCOMIAL NOTSAME SAME) (AIR SAME) (MORPH SAME)
              (GRAM SAME) 1.50)
             ((INFECTION SAME) (SITE MEMBF SAME) ...)
             ((SITE MEMBF SAME) (INFECTION SAME) (PORTAL ...)
              1.23))
ACTION      ((CATEGORY CONCLUDE 4.73)
             (IDENT CONCLUDE 4.05)
             ((CATEGORY CONCLUDE) (IDENT CONCLUDE) 4.73))
MORE-GENL   (CATEGORY-MOD)
MORE-SPEC   NIL
FIGURE 9-6 The CATEGORY-IS rule model.
split into its two parts, one concerning the presence of individual attributes
and the other describing correlations. The first item in the premise de-
scription, for instance, indicates that most rules reaching conclusions about
the category mention the attribute GRAM (for gram stain) in their prem-
ises; when they do mention it, they typically use the predicate functions
SAME and NOTSAME; and the "strength," or reliability, of this piece of
advice is 3.83 [see Davis (1976) for precise definitions of the quoted terms].
Correlations are shown as several lists of attribute-predicate pairs. The
fourth item in the premise description, for example, indicates that when
the attribute gram stain (GRAM) appears in the premise of a rule in this
subset, the attribute morphology (MORPH) typically appears as well. As
before, the predicate functions are those frequently associated with the
attributes, and the number is an indication of reliability.
Choosing a Model
It was noted earlier that tracking down the bug in the knowledge base
provides useful context and, among other things, serves to set up TEI-
RESIAS's expectations about the sort of rule it is about to receive. As sug-
gested, these expectations are expressed by restricting the set of models
that will be considered for use in guiding the interpretation. At this point
TEIRESIAS chooses a model that expresses what it knows thus far about
the kind of rule to expect, and in the current example it expects a rule
that will deduce that the category should be Enterobacteriaceae.
Since there is not necessarily a rule model for every characterization,
the system chooses the closest one. This is done by starting at the top of
the tree of models and descending until either reaching a model of the
desired type or encountering a leaf of the tree. In this case the process
descends to the second level (the CATEGORY-IS model), notices that there
is no model for CATEGORY-IS-ENTEROBACTERIACEAE at the next
level,5 and settles for the former.
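A rough Common Lisp sketch of this descent, again purely illustrative: the tree representation (a node is a model name followed by its subtrees) and the name-matching test are our assumptions, but the control structure follows the description above--descend while a more specific model exists, otherwise settle for the deepest one reached.

(defun choose-model (tree wanted)
  "TREE is (model-name . subtrees); WANTED lists the desired model names
from the most general to the most specific characterization of the rule."
  (let ((next (find (first wanted) (rest tree) :key #'first)))
    (if (and wanted next)
        (choose-model next (rest wanted))
        (first tree))))   ; no more specific model exists; settle for this one

;; (choose-model '(category (category-is) (category-isnt))
;;               '(category-is category-is-enterobacteriaceae))
;; => CATEGORY-IS   ; no CATEGORY-IS-ENTEROBACTERIACEAE model exists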
(a) Connotations found in the new rule (each word is tagged with the rule
    primitives--PREDICATE FUNCTION, ATTRIBUTE, OBJ, or VALUE--that it may
    refer to).

Function    Template
SAME        (OBJ ATTRIBUTE VALUE)

(b) Template for the predicate function SAME.
1) (SAME CNTXT TREAT-ALSO PRIMARY-BACTEREMIA)
   "Primary bacteremia is an associated infection for which
    therapy should cover."
2) (SAME CNTXT INFECTION PRIMARY-BACTEREMIA)
   "The infection is primary bacteremia."
RESIAS to see whether the new rule "fits into" its current model of the
knowledge base in Phase IV.
To see how the rule models are used to guide the interpretation of the
text of the new rule in the example, consider the first line of text typed by
the expert in the new rule, Rule 383 (THE PATIENT'S INFECTION IS
PRIMARY-BACTEREMIA). Each word is first reduced to a canonical form
by a process that can recognize plural endings and that has access to a
dictionary of synonyms (see Chapter 18). We then consider the possible
connotations that each word may have (Figure 9-7a). Here connotation
means the word might be referring to one or more of the conceptual
primitives from which rules are built (i.e., it might refer to a predicate
function, attribute, object, or value).6 One set of connotations is shown.
Code generation is accomplished via a fill-in-the-blank mechanism.
Associated with each predicate function is a template (see Chapter 5), a list
structure that resembles a simplified procedure declaration and gives the
6The connotations of a word are determined by a number of pointers associated with it,
which are in turn derived from the English phrases associated with each of the primitives.
order and generic type of each argument to a call of that function (Figure
9-7b). Associated with each of the primitives that make up a template (e.g.,
ATTRIBUTE, VALUE) is a procedure capable of scanning the list of con-
notations to find an item of the appropriate type to fill in that blank. The
whole process is begun by checking the list of connotations for the predi-
cate function implicated most strongly (in this case, SAME), retrieving the
template for that function, and allowing it to scan the connotations and
"fill itself in" using the procedures associated with the primitives. The set
of connotations in Figure 9-7a produces the LISP code in Figure 9-7c. The
ATTRIBUTE routine finds two choices for the attribute name, TREAT-
ALSO and INFECTION, based on associations of the word infection with
the phrases used to mention those attributes. The VALUE routine finds
an appropriate value (PRIMARY-BACTEREMIA), and the OBJect routine
finds the corresponding object type (PATIENT) (but following the con-
vention noted earlier, returns the variable name CNTXT to be used in the
actual code).
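The following hedged sketch illustrates the fill-in-the-blank step; the tagging of connotations by primitive type is an assumed representation rather than TEIRESIAS's own, but the result corresponds to the second interpretation of the example line shown in Figure 9-7c.

(defparameter *same-template* '(same object attribute value))

(defun fill-template (template connotations)
  "CONNOTATIONS is an alist of (primitive-type . item); each blank in the
template is filled by a connotation of the matching type."
  (cons (first template)
        (mapcar (lambda (blank) (cdr (assoc blank connotations)))
                (rest template))))

;; (fill-template *same-template*
;;                '((object . cntxt) (attribute . infection)
;;                  (value . primary-bacteremia)))
;; => (SAME CNTXT INFECTION PRIMARY-BACTEREMIA)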
There are several points to note here. First, the first interpretation in
Figure 9-7c is incorrect (the system has been misled by the use of the word
infection in the English phrase associated with TREAT-ALSO); we'll see
in a moment how it is corrected. Second, several plausible (syntactically
valid) interpretations are usually available from each line of text, and TEI-
RESIAS generates all of them. Each is assigned a score (the text score)
indicating how likely it is, based on how strongly it was implicated by the
text. Finally, we have not yet used the rule models, and it is at this point
that they are employed.
We can view the DESCRIPTION part of the rule model selected ear-
lier as a set of predictions about the likely content of the new rule. In these
terms the next step is to see how well each interpretation fulfills those
predictions. Note, for example, that the last line of the premise description
in Figure 9-6 "predicts" that a rule about category of organism will contain
the attribute PORTAL, and the third clause of Rule 383 fulfills this pre-
diction. Each interpretation is scored (employing the "strength of advice"
number in the rule model) according to how many predictions it fulfills,
yielding the prediction satisfaction score. This score is then combined with the
text score to indicate the most likely interpretation. Because more weight
is given to the prediction satisfaction score, the system tends to "hear what
it expects to hear."
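As one way of picturing the scoring step, the sketch below combines a text score with the strengths of the fulfilled predictions; the additive form and the weighting constant are assumptions introduced only to show how prediction satisfaction can dominate, not the actual scoring rule.

(defun interpretation-score (text-score fulfilled-advice-strengths
                             &key (prediction-weight 2.0))
  (+ text-score
     (* prediction-weight
        (reduce #'+ fulfilled-advice-strengths :initial-value 0))))

;; An interpretation whose clauses fulfill predictions of strength 3.83 and
;; 1.23 outranks one with a slightly better text score but none fulfilled:
;; (interpretation-score 1.0 '(3.83 1.23)) => 11.12
;; (interpretation-score 1.5 '())          => 1.5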
While our approach to natural language is very simple, the overall perfor-
mance of the interpretation process is adequate. The problem is made
easier, of course, by the fact that we are dealing with a small amount of
text in a restricted context and written in a semiformal technical language,
rather than with large amounts of text in unrestricted dialogue written in
unconstrained English. Even so, the problem of interpretation is substantial.
TEIRESIAS has a simple rule editor that allows the expert to modify ex-
isting rules or (as in our example) to indicate changes to the system's at-
tempt at interpreting a new rule.
This indicates that when the culture SITE for the patient appears in the
premise of a rule of this sort, then INFECTION type and organism POR-
TAL of entry typically appear as well. Note that the new rule in the ex-
ample has the first two of these, but is missing the last, and the system
points this out.
If the expert agrees to the inclusion of a new clause, TEIRESIAS
attempts to create it. Since in this case the agreed-on topic for the clause
was the portal of entry of the organism, this must be the attribute to use.
The rule model suggests which predicate function to use (SAME, since
that is the one paired with PORTAL in the relevant line of the rule model),
and the template for this function is retrieved. It is filled out in the usual
way, except that TEIRESIAS checks the record of the consultation when
seeking items to fill in the template blanks. In this case only a value is still
missing. Note that since the expert indicated that the portal of entry was
7Much of the editor has subsequently been incorporated into EMYCIN--see Chapter 15.
Two other uses have been developed for the rule models, which demon-
strate capabilities made possible by meta-level knowledge.
As described in Part Six, MYCIN has the ability to answer simple natural
language questions about the knowledge base. In response to a question
such as "How do you determine the identity of an organism causing an
infection?" MYCIN would originally have printed the relevant rules. But
a rule model, as a generalization of an entire class of rules, answers the
question, too. Figure 9-8 shows one example of MYCIN's capabilities after
rule models had been added. By simply "reading" the rule model to the
user, TEIRESIAS can supply an overview of the knowledge in the relevant
rules. This suggests the structure of global trends in the knowledge of the
expert who assembled the knowledge base, and thus helps to make clear
the overall approach of the system to a given topic.
8Note that these tests require the ability to dissect and partially evaluate the rule. The same
function template that is used as a pattern for constructing rules is also used as a guide in
this dissection and partial evaluation process.
** HOW DO YOU DECIDE THAT AN ORGANISM IS PSEUDOMONAS AERUGINOSA?
Rules which conclude that the identity of the organism is
pseudomonas-aeruginosa generally use one or more of the following pieces
of information:
    the site of the culture
    the gram stain of the organism
    the morphology of the organism
Furthermore, the following relationships hold:
    The gram stain of the organism and the morphology of the
    organism tend to appear together in these rules.
RULE184, RULE116, RULE047, RULE085, and RULE040 conclude that the
identity of the organism is pseudomonas-aeruginosa.
Which of these do you wish to see?
The work reported here can be evaluated with respect to both the utility
of its approach to knowledge acquisition and its success in implementing
that approach.
9Where the autonomous induction technique can be used, it offers the interesting advantage
that the knowledge we expect the system to acquire need not be specified ahead of time,
indeed not even known. Induction programs are in theory capable of inducing new infor-
mation (i.e., information unknown to their author) from their set of examples. Clearly, the
interactive transfer of expertise approach requires that the expert know and be able to specify
precisely what it is the program is to learn.
9.9 Conclusions
Each of the ideas reviewed above offers some contribution toward achiev-
ing the two goals set out at the beginning of this chapter: the development
of a methodology of knowledge base construction via transfer of expertise,
and the creation of an intelligent assistant to aid in knowledge acquisition.
These ideas provide a set of tools and ideas to aid in the construction of
knowledge-based programs and represent some new empirical techniques
of knowledge engineering. Their contribution here may arise from their
produces a novel sort of feedback loop (Figure 9-10). Rule acquisition relies
on the set of rule models to effect the model-based understanding process.
This results in the addition of a new rule to the knowledge base, which in
turn prompts the recomputation of the relevant rule model(s).
This loop has a number of interesting implications. First, performance
on the acquisition of the next rule may be better because the system's
"picture" of its knowledge base has improved--the rule models are now
computed from a larger set of instances, and their generalizations are more
likely to be valid. Second, since the relevant rule models are recomputed
each time a change is made to the knowledge base, the picture they supply
is kept constantly up to date, and they will at all times be an accurate
reflection of the shifting patterns in the knowledge base. This is true as
well for the trees into which the rule models are organized: they too grow
(and shrink) to reflect the changes in the knowledge base.
Finally, and perhaps most interesting, the models are not handcrafted
by the system architect or specified by the expert. They are instead formed
by the system itself, and formed as a result of its experience in acquiring
rules from the expert. Thus, despite its reliance on a set of models as a
basis for understanding, TEIRESIAS's abilities are not restricted by the
existing set of models. As its store of knowledge grows, old models can
become more accurate, new models will be formed, and the system's stock
of knowledge about its knowledge will continue to expand. This appears
to be a novel capability for a model-based system.
PART FOUR
Reasoning Under Uncertainty

10
Uncertainty and Evidential Support
As we began developing the first few rules for MYCIN, it became clear
that the rules we were obtaining from our collaborating experts differed
from DENDRAL's situation-action rules in an important way--the infer-
ences described were often uncertain. Cohen and Axline used words such
as "suggests" or "lends credence to" in describing the effect of a set of
observations on the corresponding conclusion. It seemed clear that we
needed to handle probabilistic statements in our rules and to develop a
mechanism for gathering evidence for and against a hypothesis when two
or more relevant rules were successfully executed.
It is interesting to speculate on why this problem did not arise in the
DENDRAL domain. In retrospect, we suspect it is related to the inherent
complexity of biological as opposed to artificial systems. In the case of
DENDRAL we viewed our task as hypothesis generation guided by rule-
based constraints. The rules were uniformly categorical (nonprobabilistic)
and were nested in such a way as to assure that contradictory evidence was
never an issue.1 In MYCIN, however, an overall strategy for nesting cate-
gorical rules never emerged; the problem was simply too ill-structured. It
was possible to tease out individual inference rules from the experts work-
ing with us, but the program was expected to select relevant rules during
a consultation and to accumulate probabilistic evidence regarding the com-
peting hypotheses.
In response to these observations we changed the evolving system in
two ways. First, we modified the rule structure to permit a conclusion to
be drawn with varying degrees of certainty or belief. Our initial intent was
to represent uncertainty with probabilistic weights on a 0-to-1 scale. Sec-
ond, we modified the data structures for storing information. Rather than
simply recording attribute-object-value triples, we added a fourth element
to represent the extent to which a specific value was believed to be true.
This meant that the attribute of an object could be associated with multiple
competing values, each associated with its own certainty weight.
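A small sketch of what that change amounts to (the representation below is illustrative, not MYCIN's internal one): each stored fact carries a fourth, certainty element, so several competing values can coexist for the same attribute of the same object.

(defparameter *facts*
  '((organism-1 ident e.coli       0.8)
    (organism-1 ident pseudomonas  0.3)
    (organism-1 gram  gramneg      1.0)))

(defun values-of (object attribute facts)
  "Return the competing (value . cf) pairs recorded for OBJECT's ATTRIBUTE."
  (loop for (obj attr val cf) in facts
        when (and (eq obj object) (eq attr attribute))
          collect (cons val cf)))

;; (values-of 'organism-1 'ident *facts*)
;; => ((E.COLI . 0.8) (PSEUDOMONAS . 0.3))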
Although the motives behind the CF model were largely pragmatic and
we justified the underlying assumptions by emphasizing the system's ex-
cellent performance (see, for example, Chapter 31), several theoretical ob-
jections to the model were subsequently raised. Professor Suppes had been
particularly influential in urging Shortliffe to relate CFs to the rules of
conventional probability theory,3 and the resulting definitions of MBs and
MDs did help us develop an intuitive sense of what our certainty measures
might mean. However, the probabilistic definitions also permitted formal
analyses of the underlying assumptions in the combining functions and of
limitations in the applicability of the definitions themselves.
For example, as we note in Chapter 11, the source of confusion be-
tween CF(h,e) and P(h|e) becomes clear when one sees that, for small values
of the prior probabilities P(h), CF(h,e) ≈ P(h|e). Our effort to ignore prior
probabilities was largely defended by observing that, in the absence of all
information, priors for a large number of competing hypotheses are uni-
formly small. For parameters such as organism identity, which is the major
diagnostic decision that MYCIN must address, the assumption of small
priors is reasonable. The same model is used, however, to deal with all
uncertain parameters in the system, including yes-no parameters for which
the prior probability of one of the values is necessarily greater than or
equal to 0.5.
The significance of the 0.2 threshold used by many of MYCIN's pred-
icates (see Chapter 5) was also a source of puzzlement to many observers
of the CF model. This discontinuity in the evaluation function is not an
intrinsic part of the CF theory (and is ignored in Chapter 11) but was
added as a heuristic for pruning the reasoning network.4 If any small
positive CF were accepted in evaluating the premise of a rule, without a
threshold, two undesirable results would occur:

1. Very weak evidence favoring a condition early in the rule premise would
be "accepted" and would lead to consideration of subsequent conditions,
possibly with resulting backward-chained reasoning. It is wasteful to
pursue these conditions, possibly with generation of additional ques-
tions to the user, if the evidence favoring the rule's premise cannot
exceed 0.2 (recall that $AND uses min in calculating the TALLY--see
Chapters 5 and 11 for further details).

2. Even if low-yield backward chaining did not occur, the rule would still
have limited impact on the value of the current subgoal since the
TALLY for the rule premise would be less than 0.2.
3Suppes pressed us early on to state whether we were trying to model how expert physicians
do think or how they ought to think. We argued that we were doing neither. Although we were
of course influenced by information regarding the relevant cognitive processes of experts
[see, for example, the recent books by Elstein et al. (1978) and Kahneman et al. (1982)], our
goals were oriented much more toward the development of a high-performance computer
program. Thus we sought to show that the CF model allowed MYCIN to reach good decisions
comparable to those of experts and intelligible both to experts and to the intended user
community of practicing physicians.
4Duda et al. (1976) have examined this discontinuity and the relationship of CFs to their
Bayesian updating model used in the PROSPECTOR system.
Thus the 0.2 threshold was added for pragmatic reasons and should not
be viewed as central to the CF model itself. In later years questions arose
as to whether the value of the threshold should be controlled dynamically
by the individual rules or by meta-rules (rather than being permanently
bound to 0.2), but this feature was never implemented.
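A minimal sketch of the pruning heuristic just described, assuming the conjunctive TALLY is simply the minimum of the clause CFs as stated for $AND; the function names are illustrative, not MYCIN's.

(defparameter *cf-threshold* 0.2)

(defun premise-tally (clause-cfs)
  "TALLY of a conjunctive premise, following $AND's use of min."
  (reduce #'min clause-cfs))

(defun premise-succeeds-p (clause-cfs)
  "A premise whose TALLY cannot exceed the threshold is treated as failing,
so no further clauses or backward-chained questions are pursued for it."
  (> (premise-tally clause-cfs) *cf-threshold*))

;; (premise-succeeds-p '(0.9 0.7 0.15)) => NIL  ; one weak condition prunes the rule
;; (premise-succeeds-p '(0.9 0.7 0.4))  => T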
Another important limitation of MYCIN's control scheme was noted
in the mid-1970s but was never changed (although it would have been easy
to do so). The problem results from the requirement that the premise of
a rule be a conjunction of conditionals with disjunctions handled by mul-
tiple rules. As described in Chapter 5, A ∨ B ∨ C → D was handled by
defining three rules: A → D, B → D, and C → D. If all rules permitted
conclusions with certainty, the three rules would indeed be equivalent to a
single disjunctive rule with certain inference (CF = 1). However, with CFs
less than unity, all three rules might succeed for a given case, and then
each rule would contribute incremental evidence in favor of D. This evi-
dence would be accumulated using the CF combining function, that is,
CFCOMBINE, and might be very different from the CF that the expert
would have given if asked to assign a weight to the single disjunctive rule.
This problem could have been handled by changing the rule monitor to
allow disjunctions in a rule premise, but the change was never implemented
because a clear need never arose.
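As a rough numerical illustration (our arithmetic, using the combining form for two positive CFs, x + y - xy, given in Chapter 11): if each of the three split rules were assigned CF 0.6 and all three succeeded for a case, the accumulated evidence would be 0.6 combined with 0.6, or 0.84, and then 0.84 combined with 0.6, or about 0.94--quite possibly much stronger than the single weight the expert would have attached to the disjunctive rule A ∨ B ∨ C → D.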
The rule interpreter does not allow rules to be written whose primary
connective is disjunction ($OR). We have encouraged splitting primary
disjunctions into separate rules for this reason. Thus
[1] ($OR A B C) →
5It is possible to force them to give the same result by adjusting the CFs either on [5] or on
[2], [3] and [4]. We would not expect a rule writer to do this, however, nor would we think
the difference would matter much in practice.
FIGURE 10-1 Cumulative CF versus the number of rules with the same positive CF (one curve per individual rule CF, e.g., CF = 0.1).
Another limitation for some problems is the rapidity with which CFs
converge on the asymptote 1. This is easily seen by plotting the family of
curves relating the number of rules with a given CF, all providing evidence
for a hypothesis, to the resulting CF associated with the hypothesis.6 The
result of plotting these curves (Figure 10-1) is that CFCOMBINE is seen to
converge rapidly on 1 no matter how small the CFs of the individual rules
are. For some problem areas, therefore, the combining function needs to
be revised. For example, damping factors of various sorts could be devised
6This was first pointed out to us by Mitch Model, who was investigating the use of the CF
model in the context of the HASP/SIAP program (Nii et al., 1982).
(but were not) that would remedy this problem in ways that are meaningful
for various domains. In MYCIN's domain of infectious diseases, however,
this potential problem never became serious. In PROSPECTOR this prob-
lem does not arise because there is no finite upper limit to the likelihood
ratios used.
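The convergence can be made concrete with a small sketch (our illustration, assuming the positive-evidence combining form x + y - xy from Chapter 11): n rules that each contribute the same positive CF c accumulate to 1 - (1 - c)^n, which approaches 1 for any c > 0.

(defun cumulative-cf (c n)
  "Cumulative CF from N rules that each add the same positive CF C."
  (- 1 (expt (- 1 c) n)))

;; (cumulative-cf 0.1 10) => ~0.65
;; (cumulative-cf 0.3 10) => ~0.97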
As we were continuing to learn about the CF model and its implica-
tions, other investigators, faced with similar problems in building medical
consultation systems, were analyzing the general issues of inexact inference
(Szolovits and Pauker, 1978) and were in some cases examining shortcom-
ings and strengths of CFs. Later, Schefe analyzed CFs and fuzzy set theory
(Schefe, 1980). Dr. Barclay Adams, a member of the research staff at the
Laboratory of Computer Science, Massachusetts General Hospital, re-
sponded to our description of the MYCIN model with a formal analysis
of its assumptions and limitations (Adams, 1976), included in this book as
Chapter 12. The observations there nicely specify the assumptions that are
necessary if the CFs in MYCIN's rules are interpreted in accordance with
the probabilistic definitions from Chapter 11. Adams correctly notes that
there may be domains where the limitations of the CF model, despite their
minimal impact on MYCIN's performance, would seriously constrain the
model's applicability and success. For example, if MYCIN had required a
single best diagnosis, rather than a clustering of leading hypotheses, there
would be reason to doubt the model's ability to select the best hypothesis
on the basis of a maximal CF.
Even before the Adams paper appeared in print, many of the same
limitations were being noted within the MYCIN project. For example, in
January of 1976 Shortliffe prepared an extensive internal memo that made
several of the same observations cited by Adams.7 He was aided in these
analyses by Dana Ludwig, a medical student who studied the CF model in
detail as a summer research project. The Shortliffe memo outlined five
alternate CF models and argued for careful consideration of one that
would require the use of a priori probabilities of hypotheses in addition to
the conventional CFs on rules. The proposed model was never imple-
mented, however, partly due to time constraints but largely because
MYCIN's decision-making performance was proving to be excellent despite
the theoretical limitations of CFs. Some of us felt that a one-number cal-
culus was preferable in this domain to a more theoretically sound calculus
that requires experts to supply estimates of two or more quantities per
rule. It is interesting to note, however, that the proposals developed bore
several similarities to the subjective Bayesian model developed at about the
same time for SRI's PROSPECTOR system (Duda et al., 1976). The
model has been used successfully in several EMYCIN systems (see Part
Five) and in the IRIS system (Trigoboff, 1978) developed at Rutgers Uni-
versity for diagnosing glaucomas.
7This is the file CFMEMO referred to by Clancey in the exchange of electronic messages at
the end of this chapter.
The second of these points is discussed briefly in Chapter 11, but the first
may require clarification. Consider, for example, eight or nine rules all
supporting a single hypothesis with CFs in the range 0.4 to 0.8. Then the
asymptotic behavior of the cumulative MB would result in a value of about
0.999. Suppose now that a single disconfirming rule were to succeed with
CF = 0.8. Then the net support for the hypothesis would be
CF = (MB - MD) / (1 - min(MB, MD))

CFCOMBINE(X, Y) = (X + Y) / (1 - min(|X|, |Y|))    when one of X, Y < 0
                = -CFCOMBINE(-X, -Y)               when X, Y are both < 0
Note that the definition of CF is unchanged for any single piece of evidence
(where either MD or MB is zero by definition) and that the combining
function is unchanged when both CFs are the same sign. It is only when
combining two CFs of opposite sign that any change occurs. The reader
will note, for example, that under the original definition the combination
of 0.55 and -0.5 would have yielded simply 0.55 - 0.5 = 0.05, whereas

CFCOMBINE(0.55, -0.5) = 0.05/0.5 = 0.1
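A minimal Common Lisp sketch of the revised combining function as just defined; the both-positive case is filled in from the standard positive-evidence form (x + y - xy) of Chapter 11, and the CFs are assumed to lie in [-1, 1].

(defun cf-combine (x y)
  (cond ((and (>= x 0) (>= y 0))            ; both nonnegative
         (- (+ x y) (* x y)))
        ((and (< x 0) (< y 0))              ; both negative: mirror image
         (- (cf-combine (- x) (- y))))
        (t                                  ; opposite signs
         (/ (+ x y)
            (- 1 (min (abs x) (abs y)))))))

;; (cf-combine 0.55 -0.5) => 0.1  ; the example in the text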
Even before the change in the combining function was effected, we had
observed generally excellent decision-making performance by the program
and therefore questioned just how sensitive MYCIN's decisions were to the
CFs on rules or to the model for evidence accumulation. Bill Clancey (then
a student on the project) undertook an analysis of the CFs and the sensi-
tivity of MYCIN's behavior to those values. The following discussion is
based in large part on his analysis and the resulting data.
The CFs in rules reflect two kinds of knowledge. In some cases, such
as a rule that correlates the cause of meningitis with the age of the patient,
the CFs are statistical and are derived from published studies on the in-
cidence of disease. However, most CFs represent a mixture of probabilistic
and cost/benefit reasoning. One criticism of MYCIN's rules has been that
utility considerations (in the decision analytic sense) are never made explicit
but are "buried" in a rule's CF. For example, the rule that suggests treating
for Pseudomonas in a burned patient is leaving out several other organisms
that can also cause infection in that situation. However, Pseudomonas is a
particularly aggressive organism that often causes fatal infections and yet
is resistant to most common antibiotics. Thus its "weight" is enhanced by
rules to ensure that it is adequately considered when reaching therapy
decisions.8 Szolovits and Pauker (1978) have also provided an excellent
discussion of the issues complicating the combination of decision analytic
concepts and categorical reasoning in medical problems.
Figure 10-2 is a bar graph showing how frequently various CF values
occur in MYCIN's rules. All but about 60 of the 500 rules in the most
recent version of the system have CFs.9 The cross-hatched portion of each
bar shows the frequency of CFs in the 1975 version of MYCIN, when
there were only 200 rules dealing with bacteremia. The open portion of
each bar refers to the CFs of incremental rules since that time, most of
which deal with meningitis. The overall pattern is about the same, although
the more recent system has proportionally more small positive CFs. This
makes sense because the newer rules often deal with softer data (clinical
evidence) in contrast to the rules for bacteremia, which generally interpret
8Self-referencing rules, described in Chapter 5, were often used to deal with such utility
considerations. As mentioned in Chapter 3, they allowed dangerous organisms, initially sug-
gested with only minimal certainty, to be reconsidered and further confirmed by special
evidence. For example: if you are already considering Pseudomonas and the patient has ecthyma
gangrenosum skin lesions, then there is even greater importance to the conclusion that the
pathogen is Pseudomonas.
9The rules without CFs do not associate evidence with hypotheses but make numerical com-
putations or save a text string to be printed later. Note also that some rules, particularly
tabular rules, make many conclusions and thus account for the fact that there are more CFs
than rules.
FIGURE 10-2 Number of conclusions made at each CF value in MYCIN's rules.
                        Number of cases (out of 10)

   Number of     Same organisms     Different      Different organisms
   intervals     and therapy        organisms      and therapy
      10               9                1                  0
       5               7                3                  0
       4               8                2                  1
       3               5                5                  1
       2               1                9                  3

FIGURE 10-3 Results of CF sensitivity experiment.
In each run, rules were modified by mapping the existing rule CFs onto a
new, coarser scale. The original CF scale has 1000 intervals from 0 to
1000.10 Trials were run using ten, five, four, three, and two intervals. Thus,
when there are five intervals, all rule CFs are mapped onto 0, 200, 400,
600, 800, and 1000, rounding as necessary. When there are two intervals,
only the numbers 0, 500, and 1000 are used.
CFs were combined using the usual combining function (the revised
version that was in use by 1979). Thus intermediate conclusions mapped
onto arbitrary numbers from 0 to 1000. Clustering the final organism list
was done in the normal way (cutting off at the largest gap). Finally, negative
CFs were treated analogously, for example, mapping onto 0, -333, -666,
and -1000 when there were three intervals.
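A sketch of the coarsening step, under the assumption that "rounding as necessary" means rounding to the nearest of the n + 1 evenly spaced values on the 0-to-1000 scale; the function name is ours, not the experiment's.

(defun coarsen-cf (cf n-intervals)
  "Round CF (on the 0-to-1000 scale) to the nearest multiple of 1000/N-INTERVALS."
  (let ((step (/ 1000.0 n-intervals)))
    (* (round cf step) step)))

;; (coarsen-cf 740 5)  => 800.0
;; (coarsen-cf 740 2)  => 500.0
;; (coarsen-cf -740 3) => roughly -666, as in the text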
In examining results, we are interested primarily in three possible
outcomes: (1) no change to the item list (and hence no change in therapy);
(2) different organisms, but the same therapy; and (3) new therapy (and
therefore different organisms). Figure 10-3 summarizes the data from the
ten cases run with five different CF scales.
Degradation of performance was only pronounced when the number
of intervals was changed to three (all rule CFs mapped onto 0, 333, 666,
and 1000). But even here five of the ten cases had the same organism list
and therapy. It wasn't until CFs were changed to 0, 500, and 1000 that a
dramatic change occurred; and even with nine new organism lists, we find
that seven of the ten cases had the same therapy. The fact that the organism
list did not change radically indicates that MYCIN's rule set is not "fine-
tuned" and does not need to be. The rules use CFs that can be modified
by ±0.2, showing that there are few deliberate (or necessary) interactions
in the choice of CFs. The observed stability of therapy despite changing
organism lists probably results because a single drug will cover for many
organisms, a property of the domain.
By the early 1980s, when much of our research was focusing on issues
other than EMYCIN systems, we still often found CFs to be useful com-
putational devices. One such example was the work of Jerry Wallis, de-
scribed in detail in Chapter 20. His research modeled causal chains with
rules and used CFs to represent the uncertainty in the causal links. Because
his system reasoned both from effects to causes and from causes to effects,
techniques were needed to prevent fruitless searching of an entire con-
nected subgraph of the network. To provide a method for search termi-
nation, the concept of a subthreshold path was defined, i.e., a path of
reasoning whose product of CFs can be shown to be below the threshold
used to reject a hypothesis as unknown. For example, if there is a linear
reasoning path of four rules (R1, R2, R3, and R4) where A can be asked
of the user and E is the goal that initiated a line of backward-chained
reasoning:
      R1        R2        R3        R4
A ------> B ------> C ------> D ------> E
    .8        .4        .7        .7
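A small sketch of the test this definition suggests (our illustration; the 0.2 default is the threshold for regarding a hypothesis as known, discussed earlier in this chapter): for the chain shown, 0.8 * 0.4 * 0.7 * 0.7 is about 0.16, so even a definite answer about A could not raise E above threshold, and the path can be abandoned.

(defun subthreshold-path-p (path-cfs &optional (threshold 0.2))
  "True if the product of the CFs along a reasoning path falls below THRESHOLD."
  (< (reduce #'* path-cfs) threshold))

;; (subthreshold-path-p '(0.8 0.4 0.7 0.7)) => T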
and I would all be happy with the results. I think points (3) and (4) above
sum up other people's objections that might remain. If this is so, what are
suggestions from people who still aren't happy with the model? Is everyone
satisfied with everything now? Are there more objections that I missed? Have
I completely misunderstood something? Have I completely misunderstood
everything? Please let me know what you think so we can start to work out
problems that might remain.
Carli
Carli,
Thanks for your summary--it appears to be correct in almost every de-
tail. I would like you to try separating COVERFOR and IDENT as soon as
possible since that is needed for bacteremia anyway and is a help in clarifying
the conceptual basis on which the program makes a recommendation. I also
think that everyone will be happy with the results, especially me if it brings
the knowledge bases into a common framework.
My concern is I would also like you to begin working on the rerepresen-
tation of the context tree to help us with time relations and the infection-
organism link. As Ted described it, you and he have pretty well worked things
out. Because it is necessary for the FOREACH11 mechanism and is desirable
for many other reasons, I would like us not to delay it. Do you see problems
with this?
As I tried to say yesterday, my reservations with the meningitis system
stem from my uneasiness with the CF model, which we all know needs im-
proving (which Pacquerette [a visiting student from France] was starting, but
won't finish). I don't want Victor to become dependent on a particular mech-
anism for combining CFs--because we hope the mechanism will be improved
soon. I have no doubt that the rules work well now, and I don't disagree at
all with the need for firm reference points for the CFs.
As soon as COVERFOR and IDENT are separate, could you try the
meningitis patients again, enlisting whatever help you need? Then we'll be
able to decide whether that meets all our specs. After that we can be working
on the context tree and time problems while Victor continues development
on the medical side. I foresee no difficulty in mapping the CFs from existing
rules (meningitis as well as bacteremia) into whatever numbers are appro-
priate for a new CF model when we have one--with firm reference points if
at all possible.
Bruce
PS: I think a reference point for defining how strongly suggestive some
evidence is for a conclusion is easier when almost all conclusions are about
identities of organisms that should be treated for. In bacteremia the rules
conclude about so many different things that it is harder--but no less desirable.
11FOREACH is a quantification primitive in rules.
XX? This doesn't mean FUNGUS to me; no, I want to know why that pre-
scription was made." This same criticism does not apply with the same force
to many rules with CF >0.2 because they bring together a "more significant
set of facts." They do this by capturing (often disjoint) pictures of the world
that in themselves MAKE SENSE. I do not at all understand how a rule can
be written that can at once stand on its own and yet NOT be significant truth
(i.e., believable observation, tangible conclusion). It is my suspicion that Vic-
tor has not built a system in which EVIDENCE combines plausibly, but rather
a system in which independent rules SUCCEED TOGETHER to make a
conclusion that could be expressed as a single rule, and WOULD have to be
expressed that way to have a CF > 0.2.
Now, Victor has said that he could have combined these rules to give a
body of rules in which these same small observations appear together, thus
yielding larger CFs. However, he believes that this would result in far more
rules (to allow for the cross product of occurrences), and he would not be
sure that he had covered all of the possible cases. Well, certainly, with respect
to the latter, we can tell him if the larger set covers all of the various com-
binations. The question of having far more rules is, I suppose, a valid con-
cern. But at least then we could feel sure that only the PLAUSIBLE obser-
vations had been combined.
To summarize, we talk about accumulating "lots of small bits of clinical
evidence," but I do not understand how a bit of EVIDENCE could be NOT-
KNOWN (the definition of CF <= 0.2). To me, evidence gathered by a rule
should be an all-or-nothing thing--if something more is needed to make the
parameter KNOWN [i.e., CF > 0.2], then I expect that there is something
to be made explicit in the rule. This is the only way in which I can interpret
the notion of a discrete cutoff at 0.2. Above that point I know something;
below it I know nothing (NOTKNOWN). The only plausible explanation I
have for Victor's small CFs is that they are like tags that record an observa-
tion. It would make me much happier to see each of these CFs changed to
NOTICEDE with definite (= 1) CFs. Then these parameters could be com-
bined with evidence garnered from lab rules.
I would be happy to hear other opinions about the 0.2 cutoff and its
meaning for rule CFs.
Bill
There are three things that I feel we should consider in our discussions
that have not yet been mentioned. The first is a concern about knowledge
acquisition. I feel that whatever we decide, the MYCIN acquisition module
should be designed so that a recognized medical expert could, without too
much difficulty, add a new rule or other piece of knowledge to the MYCIN
data base. I wonder if a doctor in Boston would be able to add a meningitis
rule to MYCIN without hurting the performance of Victor's system. I got
the impression that Victor's system was somewhat fragile in this regard. I
doubt that he would want to give up the ability to easily add medical knowl-
edge to MYCIN. I fear that we would be doing just that. (This problem
includes the question of maintaining rule modularity.)
My second concern is that even if we can define fairly well what we mean
by 0.7, 0.5, anything above 0.2, 0.2, etc., it seems that the next problem will
be to define 0.25, 0.225, 0.175, 0.5, etc. We could continue this defining of
CFs in smaller and smaller intervals forever. However, I doubt that medical
science is exact enough for us to be able to do this.
This brings us to my third concern. In my recent meeting with Dr. Ken
Vosti [a professor in Stanford's Division of Infectious Diseases], he stated a
problem, already familiar to most of us, that even if we could reach agree-
ment among the infectious disease experts at Stanford as to the "right" CFs
to put on our rules, the infectious disease experts on the East Coast and other
places would probably not agree with us. Now let's take this one step further.
Say we are able to assign fairly straightforward meanings to our CFs. Now
we have the problem of a doctor in some other part of the country who
doesn't want to use MYCIN because our CFs don't agree with what he would
use. In other words, by defining our CFs at all rigorously, we're inviting
disagreement. So, concerns two and three are saying that we can never define
each number on the 0 to 1.0 scale, and if we could, that might not be such
a good idea anyway.
I have no solutions to offer at this time, but I hope everyone will keep
these concerns in mind. I feel that CFs are designed to give doctors who
read and write the rules a certain "commonsense referent" as to how valid
the rule might be. If CFs become more important than that, I fear we will
use too much of our medical expertise in deciding on the "right" CF for each
rule, time that could be used to add more medical knowledge to the MYCIN
data base.
Jan
Bill,
1. Why is the system insensitive to CF? Certainly, this is not true for the
meningitis rules.
2. Your point about plausible situations is a good one, and deserves fur-
ther amplification and discussion. The reason I have "separated" the number
of premises that in the bacteremia rules would have been combined is that I
believe they are independent premises. I don't believe I ever said the reason
for separating them is to avoid having too many rules; the reason for sepa-
rating them is to cover a number of subtle clinical situations that would
otherwise not have been considered. More on this later.
3. Finally, I should add that the 0.2 cutoff was selected because it is the
one being used for SIGNIFICANCE and I thought it would best mesh with
the current system. I must admit that I am surprised at the furor it has
evoked; if you wish to use some other cutoff, that's fine with me--the CFs
could be easily adjusted.
4. I didn't understand a few of the points you raised, so I look forward
to the next meeting.
Finally, I should say that the system that I have proposed is not meant
in any way to replace the current bacteremia rules; it was merely a simple,
practical way to handle meningitis. I did not feel the approach used in bac-
teremia was precise enough to handle meningitis.
Victor
Date: 29 Feb 1976
From: Yu
Subject: On Wed. meeting and Aikins
To: Aikins, Scott
cc: MYCINgang
Jan,
1. You state that we are giving up the ability to "easily" add rules to
MYCIN. Certainly, it is currently "easy" to add new rules to MYCIN;
however, it is not so "easy" to rationalize, justify, and analyze these
new rules. Furthermore, it becomes "difficult" when the system starts
giving incorrect therapy after these new rules have been added.
2. I believe a doctor in Boston would have an "easier" task of adding
new meningitis rules, as compared to bacteremia rules. He now has
some reference points and definite guidelines on how a rule should
be written. Again, the rule is more likely to be compatible with the
existing system, since the new rule is written along the same guidelines
and same philosophy. This is not the case with the bacteremia rules
where it is likely and even probable that any new rule written by a
non-MYCIN person could cause the system to malfunction.
3. I have not attempted to specifically define every increment between
CFs.
4. I need not remind all of us that we are dealing directly with human
lives. If another M.D. on the East Coast disagrees with our CFs and
has data (be it strong or weak) as the basis for his disagreement, then
we had better know about it. I claim that one of the advantages of
specific criteria for CFs is that this "invites disagreement" (or to put
it another way--critical analysis of the rules by non-MYCIN experts
is possible).
5. What is this mystical "commonsense referent" that you have men-
tioned? (Likewise, Ted has stated that physicians would PROBABLY
agree fairly closely on the CFs currently in MYCIN. If this is true,
then my arguments for preciseness are invalid and unnecessary.)
6. Your last point concerning using too much time and effort on the CF
question, when we could be adding more medical knowledge--I will
merely refer you to Matthew: Chapter 7, verses 24-27.
Cheers,
Victor
Bill
3 March 1976
From: Clancey
Subject: Modularity of rules
To: Yu
cc: MYCINgang
I have completed a write-up of my understanding of what we
mean by rule independence. I consider this useful as a tutorial to those who
perhaps have not fully appreciated the significance of the constraint
P(e1 & e2 | h) = P(e1 | h) * P(e2 | h), which is discussed in several of Ted's
write-ups on the relation of CFs to probabilities.
For those of you for whom this is old hat by now, I would appreciate it
if you would peruse my memo and let me know if I've got it straight.
I've expanded the discussion of plausibility of rule interaction here also.
This appears to be an issue worth pursuing.
The memo is CFMODULAR on my directory. It is about 3 pages long.
Bill Clancey
An Electronic ExchangeRegarding CFs 229
<CLANCEY>CFMODULAR.1
I. Introduction
This memo arose from my desire to understand rule CFs of less than
the 0.2 threshold. How could such a rule be evidence of something? Does a
rule having a CF less than 0.2 pose any problems to the process of combining
certainty factors? What does it mean to say that a rule is modular? Must a
rule satisfy some property relating to its certainty factor to be considered
modular?
After thinking out all of these problems for myself, I re-examined our
publications in the light of my new understanding. Alas! The ideas discussed
below have long been known and were simply overlooked or undervalued by
me. Indeed, I suspect that most of us have to some degree failed to appreciate
Ted's thesis, from which I will be quoting below.
Some of the consistency checks Ted discusses are subsumption and rule con-
tradictions.
rule, other than one that simply adds the evidence together incrementally
according to the combining function. A new argument that is built from the
evidence mentioned in the other rules is proof that the individual rules are
not modular. (Subsumption is an explicit form of this.) Thus, Victor's claim
that he wants to allow for all combinations MUST rest on the inherent in-
dependence of his premise sets. Again, no conclusion whatsoever should be
drawn from the coincidence of any combination of premise sets, other than
that arrived at by the CF combining function. Moreover, every conclusion
collected incrementally by the combining function must be one Victor would
reach with the same strength, given that union of premise clauses (cf. B and
C above). In fact, I am willing to believe now that a rule having a CF < 0.2 is
perhaps MORE likely to be independent because it wouldn't have been given
such a small CF unless the author saw it as minimally useful. That is, it stands
on its own as a very weak observation having no other inferential value (I
am still wary of calling it "evidence"). If it had a higher CF, it would almost
certainly be useful in combination with other observations. Based on Victor's
decision to separate meningitis clinical and lab rules, I conclude that doctors
do not have the ability to relate the two. Is this correct? I believe that Ted
has also questioned Victors rules in this respect.
V. Plausibility
The problem of plausible combination of rules is difficult to anticipate
because it is precisely the unanticipated coincidence of rule success that we
are most likely to find objectionable. Suppose that we do find two rules D
and E that we can't imagine ever succeeding at the same time, yet there is no
logical reason for this not to occur (i.e., the rules are not mutually exclusive;
not always easy to determine since all rules that cause these rules to be in-
voked must be examined). In this case we should try to define a new param-
eter that explains the connection between these two parameters, which we
do not as yet understand. (A method of theory formation: ask yourself "What
would I think if these two pieces of evidence were true?" Perhaps the actions
are in conflict--why? Perhaps the premises never appear together (usually
arent both true)--why not? Do this for the power set of all evidence under
consideration.)
that because a rule looks like a discrete object it is necessarily modular. I have
assumed that it is sufficient to have a CF combining function that models
adequately the process of incrementally collecting evidence, forgetting that
this evidence MUST be discrete for the function to be valid. Otherwise, a
FUNCTION is replacing a logical argument, which a rule unifying the prem-
ises would represent.
VII. Making Rules Modular
It remains to detect if MYCIN's rules are modular. We must look for
premises that are still "charged" with inference potential, as measured relative
to clauses in other rules. Victor has said that his rules are modular (at least
the ones having CF<0.2). If so, there is no problem, though we should be
wary about the 0.05/0.15 distinctions. (How is it that "evidence" that is too
weak to yield an acceptable conclusion nevertheless is definite enough to be
put in one of three CF categories: 0.05, 0.10, and 0.15?)
One method for detecting rule modularity is as follows. Given, for ex-
ample, three rules A, B, and C, where B and C have the same CF (all three
mention VALUE P), then if A & B and A & C are determined to have different
certainty factors (where & denotes the process of combining the rules into a
single rule), then the rules A, B, and C aren't modular.
On the other hand, given two rules A and B known to be modular (our
knowledge of the domain cannot yield an argument that combines the prem-
ises), then A & B must have a CF given by the combining function (obviously
true for disjoint rules). This gives us a way of evaluating a combining func-
tion.
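As an aside for readers following the mathematics, the test sketched above can be written out in a few lines of Python. This is only an illustration, not anything from the memo: it assumes MYCIN's usual rule for combining two positive CFs, x + y(1 - x), and the numbers are invented.

    def cf_combine(x, y):
        """Incrementally combine two nonnegative certainty factors."""
        return x + y * (1.0 - x)

    def looks_modular(cf_a, cf_b, cf_merged, tolerance=0.05):
        """Compare the CF an expert gives the merged rule A & B with the value
        the combining function would produce from the separate rules."""
        return abs(cf_merged - cf_combine(cf_a, cf_b)) <= tolerance

    # Hypothetical numbers: if the expert's CF for the merged premises departs
    # from the incrementally combined value, the rules are suspect.
    print(looks_modular(0.4, 0.3, cf_combine(0.4, 0.3)))   # True
    print(looks_modular(0.4, 0.3, 0.9))                    # False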
11
A Model of Inexact
Reasoning in Medicine
had never been explicitly stated. It is both fascinating and educational for
experts to reflect on the inference rules that they use when providing
clinical consultations.
Several programs have successfully modeled the diagnostic process.
Many of these have relied on statistical decision theory as reflected in the
use of Bayes Theorem for manipulation of conditional probabilities. Use
of the theorem, however, requires either large amounts of valid back-
ground data or numerous approximations and assumptions. The success
of Gorry and Barnett's early work (Gorry and Barnett, 1968) and of a
similar study by Warner and coworkers using the same data (Warner et al.,
1964) depended to a large extent on the availability of good data regarding
several hundred individuals with congenital heart disease.
Although conditional probability provides useful results in areas of
medical decision making such as those we have mentioned, vast portions
of medical experience suffer from having so few data and so much im-
perfect knowledge that a rigorous probabilistic analysis, the ideal standard
by which to judge the rationality of a physician's decisions, is not possible.
It is nevertheless instructive to examine models for the less formal aspects
of decision making. Physicians seem to use an ill-defined mechanism for
reaching decisions despite a lack of formal knowledge regarding the in-
terrelationships of all the variables that they are considering. This mech-
anism is often adequate, in well-trained or experienced individuals, to lead
to sound conclusions on the basis of a limited set of observations.1
The purpose of this chapter is to examine the nature of such non-
probabilistic and unformalized reasoning processes and to propose a model
by means of which such incomplete "artistic" knowledge might be quan-
tified. We have developed this model in response to the needs of a com-
puter program that will permit the opinions of experts to become more
generally available to nonexperts. The model is, in effect, an approxima-
tion to conditional probability. Although conceived with medical decision
making in mind, it is potentially applicable to any problem area in which
real-world knowledge must be combined with expertise before an informed
opinion can be obtained to explain observations or to suggest a course of
action.
We begin with a brief discussion of Bayes Theorem as it has been
utilized by other workers in this field. The theorem will serve as a focus
for discussion of the clinical problems that we would like to solve by using
computer models. The potential applicability of the proposed decision
model is then introduced in the context of the MYCIN system. Once the
problem has been defined in this fashion, the criteria and numerical char-
acteristics of a quantification scheme will be proposed. We conclude with
a discussion of how the model is used by MYCIN when it offers opinions
to physicians regarding antimicrobial therapy selection.
1Intuition may also lead to unsound conclusions, as noted by Schwartz et al. (1973).
P(di|e) = P(di) P(e|di) / Σj P(dj) P(e|dj)
The successful programs that use Bayes Theorem in this form require
huge amounts of statistical data, not only P(sk|dj) for each of the pieces of
data, sk, in e, but also the interrelationships of the sk within each disease
dj. 3 The congenital heart disease programs (Gorry and Barnett, 1968; War-
ner et al., 1964) were able to acquire all the necessary conditional proba-
bilities from a survey of several hundred patients with confirmed diagnoses
and thus had nonjudgmental data on which to base their Bayesian analyses.
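To make the data requirement concrete, here is a small Python illustration of the computation the formula above calls for; the priors and conditional probabilities P(e|di) are invented for the example, whereas the programs just cited estimated such numbers from several hundred confirmed cases.

    # Hypothetical priors P(di) and conditionals P(e | di) for one observed evidence e.
    priors = {"d1": 0.6, "d2": 0.3, "d3": 0.1}
    likelihoods = {"d1": 0.20, "d2": 0.50, "d3": 0.90}

    normalizer = sum(priors[d] * likelihoods[d] for d in priors)
    posterior = {d: priors[d] * likelihoods[d] / normalizer for d in priors}

    for d, p in sorted(posterior.items(), key=lambda item: -item[1]):
        print(f"P({d} | e) = {p:.3f}")   # diseases ranked by posterior probability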
Edwards (1972, pp. 139-140) has summarized the kinds of problems
that can arise when an attempt is made to gather the kinds of data needed
for rigorous analysis:
My friends who are expert about medical records tell me that to attempt
to dig out from even the most sophisticated hospital's records the frequency
of association between any particular symptom and any particular diagnosis
is next to impossible--and when I raise the question of complexes of symp-
toms, they stop speaking to me. For another thing, doctors keep telling me
that diseases change, that this year's flu is different from last year's flu, so
that symptom-disease records extending far back in time are of very limited
usefulness. Moreover, the observation of symptoms is well-supplied with er-
ror, and the diagnosis of diseases is even more so; both kinds of errors will
ordinarily be frozen permanently into symptom-disease statistics. Finally,
even if diseases didn't change, doctors would. The usefulness of disease cat-
egories is so much a function of available treatments that these categories
themselves change as treatments change--a fact hard to incorporate into
symptom-disease statistics.
All these arguments against symptom-disease statistics are perhaps some-
what overstated. Where such statistics can be obtained and believed, obviously
they should be used. But I argue that usually they cannot be obtained, and
even in those instances where they have been obtained, they may not deserve
belief.
3For example, although s1 and s2 are independent over all diseases, it may be true that s1 and
s2 are closely linked for patients with disease di. Thus relationships must be known within
each of the dj; overall relationships are not sufficient.
11.2 MYCIN's Rule-Based Approach
alms the results available using Bayes Theorem. We do not argue against
the use of Bayes Theorem in those medical environments in which suffi-
cient data are available to permit its adequate use.
The advantages of rule-based systems for diagnostic consultations in-
clude:
We shall use the following rule for illustrative purposes throughout this
chapter:
IF: 1) The stain of the organism is gram positive, and
    2) The morphology of the organism is coccus, and
    3) The growth conformation of the organism is chains
THEN: There is suggestive evidence (.7) that the identity of the organism is streptococcus
This rule reflects our collaborating expert's belief that gram-positive cocci
growing in chains are apt to be streptococci. When asked to weight his
belief in this conclusion, 4 he indicated a 70% belief that the conclusion was
valid. Translated to the notation of conditional probability, this rule seems
4In the English-language version of the rules, the program uses phrases such as "suggestive
evidence," as in the above example. However, the numbers following these terms, indicating
degrees of certainty, are all that is used in the model. The English phrases are not given by
the expert and then quantified; they are, in effect, "canned phrases" used only for translating
rules into English representations. The prompt used for acquiring the certainty measure
from the expert is as follows: "On a scale of 1 to 10, how much certainty do you affix to this
conclusion?"
to say P(h1|s1 & s2 & s3) = 0.7, where h1 is the hypothesis that the organism
is a Streptococcus, s1 is the observation that the organism is gram-positive,
s2 that it is a coccus, and s3 that it grows in chains. Questioning of the
expert gradually reveals, however, that despite the apparent similarity to
a statement regarding a conditional probability, the number 0.7 differs
significantly from a probability. The expert may well agree that
P(hl]sl & s2 & s:0 = 0.7, but he becomes uneasy when he attempts to follow
the logical conclusion that therefore P(~hllS 1 & s2 & s~) = 0.3. He claims
that the three observations are evidence (to degree 0.7) in favor of the
conclusion that the organism is a Streptococcus and should not be construed
as evidence (to degree 0.3) against Streptococcus. Weshall refer to this prob-
lem as Paradox 1 and return to it later in the exposition, after the inter-
pretation of the 0.7 in the rule above has been introduced.
It is tempting to conclude that the expert is irrational if he is unwilling
to follow the implications of his probabilistic statements to their logical
conclusions. Another interpretation, however, is that the numbers he has
given should not be construed as probabilities at all, that they are judg-
mental measures that reflect a level of belief. The nature of such numbers
and the very existence of such concepts have interested philosophers of
science for the last half-century. We shall therefore digress temporarily to
examine some of these theoretical issues. We then proceed to a detailed
presentation of the quantitative model we propose. In the last section of
this chapter, we shall show how the model has been implemented for on-
going use by the MYCIN program.
11.3 Philosophical Background
5The P-function may be defined in a variety of ways. Emanuel Parzen (1960) suggests a set-
theoretical definition: Given a random situation, which is described by a sample description
space s, probability is a function P that to every event e assigns a nonnegative real number,
denoted by P(e) and called the probability of the event e. The probability function must satisfy
three axioms:
during the last 30 years. One difficulty with these analyses is that they are,
in general, more theoretical than practical in orientation. They have char-
acterized the problem well but have offered few quantitative or theoretical
techniques that lend themselves to computer simulation of related reason-
ing processes. It is useful to examine these writings, however, in order to
avoid recognized pitfalls.
This section therefore summarizes some of the theory that should be
considered when analyzing the decision problem that we have described.
We discuss several interpretations of probability itself, the theory on which
Bayes Theorem relies. The difficulties met when trying to use the P-func-
tion during the modeling of medical decision making are reiterated. Then
we discuss the theory of confirmation, an approach to the interpretation
of evidence. Our discussion argues that confirmation provides a natural
environment in which to model certain aspects of medical reasoning. We
then briefly summarize some other approaches to the problem, each of
which has arisen in response to the inadequacies of applied probability.
Although each of these alternate approaches is potentially useful in the
problem area that concerns us, we have chosen to develop a quantification
scheme based on the concept of confirmation.
11.3.1 Probability
The simplest [way] is to ask the geologist .... The geologist looks at the
evidence, thinks, and then gives a figure such as 1 in 5 or 50-50. Admittedly
this is difficult .... Thus, several ways have been proposed to help the ge-
ologist make his probability estimate explicit .... The leading proponent of
personal [i.e., subjective] probabilities, Savage, proposes what seems to be the
most workable method. One can, namely, ask the person not how he feels
11.3.2 Confirmation
was willing to admit the subjective nature of such concepts some years later
when, in discussing the nature of inductive reasoning, he wrote (Carnap,
1962, p. 317):
There are additional approaches to this problem area that bear mention-
ing, even though they are peripheral to confirmation and probability as
we have described them. One is the theory of fuzzy sets first proposed by
Zadeh (1965) and further developed by Goguen (1968). The theory at-
tempts to analyze and explain an ancient paradox paraphrased by Goguen
as follows:
If you add one stone to a small heap, it remains small. A heap containing
one stone is small. Therefore (by induction) every heap is small.
The term fuzzy set refers to the analogy with set theory whereby, for
example, the set of tall people contains all 7-foot individuals but may or
may not contain a man who is 5 feet 10 inches tall. The "tallness" of a man
in that height range is subject to interpretation; i.e., the edge of the set is
fuzzy. Thus, membership in a set is not binary-valued (true or false) but is
expressed along a continuum from 0 to 1, where 0 means "not in the set,"
1 means "in the set," and 0.5 means "equally likely to be in or out of the
set." These numbers hint of statistical probability in much the same way
that degrees of confirmation do. However, like confirmation, the theory of
fuzzy sets leads to results that defy numerical manipulation in accordance
with the axioms of the P-function. Although an analogy between our di-
agnostic problem and fuzzy set theory can be made, the statement of di-
agnostic decision criteria in terms of set membership does not appear to
be a natural concept for the experts who must formulate our rules. Fur-
(P(h|e) - P(h)) / (1 - P(h))

This ratio is called the measure of increased belief in h resulting from the
observation of e, i.e., MB[h,e].
Suppose, on the other hand, that P(h{e) were less than P(h). Then the
observation of e would decrease the expert's belief in h while increasing his
or her disbelief regarding the truth of h. The proportionate decrease in
belief in this case is given by the following ratio:
(P(h) - P(h|e)) / P(h)

We call this ratio the measure of increased disbelief in h resulting from the
observation of e, i.e., MD[h,e].
To summarize these results in words, we consider the measure of
increased belief, MB[h,e], to be the proportionate decrease in disbelief
regarding the hypothesis h that results from the observation e. Similarly,
the measure of increased disbelief, MD[h,e], is the proportionate decrease
in belief regarding the hypothesis h that results from the observation e,
where belief is estimated by P(h) at any given time and disbelief is estimated
by 1 - P(h). These definitions correspond closely to the intuitive concepts
of confirmation and disconfirmation that we have discussed above. Note
that since one piece of evidence cannot both favor and disfavor a single
hypothesis, when MB[h,e] > 0, MD[h,e] = 0, and when MD[h,e] > 0,
MB[h,e] = 0. Furthermore, when P(h|e) = P(h), the evidence is independent
of the hypothesis (neither confirms nor disconfirms) and MB[h,e] =
MD[h,e] = 0.
The above definitions may now be specified formally in terms of con-
ditional and a priori probabilities:
MB[h,e] = 1                                                  if P(h) = 1
        = (max[P(h|e), P(h)] - P(h)) / (max[1,0] - P(h))     otherwise

MD[h,e] = 1                                                  if P(h) = 0
        = (min[P(h|e), P(h)] - P(h)) / (min[1,0] - P(h))     otherwise
Examination of these expressions will reveal that they are identical to the
definitions introduced above. The formal definition is introduced, how-
ever, to demonstrate the symmetry between the two measures. In addition,
we define a third measure, termed a certainty factor (CF), that combines the
MB and MD in accordance with the following definition:

CF[h,e] = MB[h,e] - MD[h,e]
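Since the three definitions are easy to misread in prose, a direct transcription into Python may help; it is offered purely as a reading aid, because the probabilities it takes as input are exactly what MYCIN does not have.

    def mb(p_h, p_h_given_e):
        """Measure of increased belief in h resulting from e."""
        if p_h == 1.0:
            return 1.0
        return (max(p_h_given_e, p_h) - p_h) / (max(1.0, 0.0) - p_h)

    def md(p_h, p_h_given_e):
        """Measure of increased disbelief in h resulting from e."""
        if p_h == 0.0:
            return 1.0
        return (min(p_h_given_e, p_h) - p_h) / (min(1.0, 0.0) - p_h)

    def cf(p_h, p_h_given_e):
        """Certainty factor: CF[h,e] = MB[h,e] - MD[h,e]."""
        return mb(p_h, p_h_given_e) - md(p_h, p_h_given_e)

    print(cf(0.01, 0.7))                 # close to P(h|e) when the prior is small
    print(cf(0.2, 0.6), cf(0.8, 0.4))    # 0.5 and -0.5: confirming h disconfirms not-h equally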
7There is a special case of Characteristic 2 that should be mentioned. This is the case of
logical truth or falsity where P(h|e) = 1 or P(h|e) = 0, regardless of e. Popper has also
suggested a quantification scheme for confirmation (Popper, 1959) in which he uses -1 ≤
C[h,e] ≤ +1, defining his limits as:
-1 = C[¬h,h] ≤ C[h,e] ≤ C[h,h] = +1
This proposal led one observer (Harré, 1970) to assert that Popper's numbering scheme
"obliges one to identify the truth of a self-contradiction with the falsity of a disconfirmed
general hypothesis and the truth of a tautology with the confirmation of a confirmed exis-
tential hypothesis, both of which are not only question begging but absurd." As we shall
demonstrate, we avoid Popper's problem by introducing mechanisms for approaching cer-
tainty asymptotically as items of confirmatory evidence are discovered.
3. Lack of evidence:
a. MB[h,e] = 0 if h is not confirmed by e (i.e., e and h are independent
or e disconfirms h)
b. MD[h,e] = 0 if h is not disconfirmed by e (i.e., e and h are indepen-
dent or e confirms h)
c. CF[h,e] = 0 if e neither confirms nor disconfirms h (i.e., e and h are
independent)
CF[h,e] = MB[h,e] - MD[h,e]
        = (P(h|e) - P(h)) / (1 - P(h)) - 0

CF[¬h,e] = MB[¬h,e] - MD[¬h,e]
         = 0 - (P(¬h) - P(¬h|e)) / P(¬h)
         = ([1 - P(h|e)] - [1 - P(h)]) / (1 - P(h))
         = (P(h) - P(h|e)) / (1 - P(h))

Thus CF[¬h,e] = -CF[h,e].
Clearly, this result occurs because (for any h and any e) MB[h,e] =
MD[¬h,e]. This conclusion is intuitively appealing since it states that evi-
dence that supports a hypothesis disfavors the negation of the hypothesis
to an equal extent.
We noted earlier that experts are often willing to state degrees of belief
in terms of conditional probabilities but they refuse to follow the assertions
to their logical conclusions (e.g., Paradox 1 above). It is perhaps revealing
to note, therefore, that when the a priori belief in a hypothesis is small (i.e.,
P(h) ≈ 0),

CF[h,e] = MB[h,e] - MD[h,e] = (P(h|e) - P(h)) / (1 - P(h)) ≈ P(h|e)
8We assert that behavior is irrational if actions taken or decisions made contradict the result
that would be obtained under a probabilistic analysis of the behavior.
Certainty factors provide a useful way to think about confirmation and the
quantification of degrees of belief. However, we have not yet described
how the CF model can be usefully applied to the medical diagnosis prob-
lem. The remainder of this chapter will explain conventions that we have
introduced in order to use the certainty factor model. Our starting as-
sumption is that the numbers given us by experts who are asked to quantify
their degree of belief in decision criteria are adequate approximations to
the numbers that would be calculated in accordance with the definitions
of MB and MD if the requisite probabilities were known.
When we discussed Bayes Theorem earlier, we explained that we
would like to devise a method that allows us to approximate the value for
P(di|e) solely from the P(di|sk), where di is the ith possible diagnosis, sk is the
kth clinical observation, and e is the composite of all the observed sk. This
goal can be rephrased in terms of certainty factors as follows:
Defining Criteria
1. Limits:
   a. MB[h,e+] increases toward 1 as confirming evidence is found,
      equaling 1 if and only if a piece of evidence logically implies h with
      certainty
2. Commutativity:
3. Missing information:
Combining Functions
2. Conjunctions of hypotheses:
   MB[h1 & h2,e] = min(MB[h1,e], MB[h2,e])
   MD[h1 & h2,e] = max(MD[h1,e], MD[h2,e])
4. Strength of evidence:
   If the truth or falsity of a piece of evidence s1 is not known with cer-
   tainty, but a CF (based on prior evidence e) is known reflecting the
   degree of belief in s1, then if MB[h,s1] and MD[h,s1] are the degrees
= min(MB[s1,e], MB[s2,e], max(MB[s3,e], MB[s4,e]))
[Figure 11-1: CF[h,e] plotted against the true CF*[h,e] for the simulated cases; both axes run from -1.0 to 1.0, and the 45-degree line marks perfect agreement between the approximation and the true value.]
The program was run on sample data simulating several hundred patients.
The question to be asked was whether CF[h,e] is a good approximation to
CF*[h,e]. Figure 11-1 is a graph summarizing our results. For the vast
majority of cases, the approximation does not produce a CF[h,e] radically
different from the true CF*[h,e]. In general, the discrepancy is greatest
when Combining Function 1 has been applied several times (i.e., several
pieces of evidence have been combined). The most aberrant points, how-
ever, are those that represent cases in which pieces of evidence were
strongly interrelated for the hypothesis under consideration (termed con-
CF[s1,e] = 1        CF[s2,e] = 1
Thus it is no longer appropriate to use the rule in question with its full
confirmatory strength of 0.7. That CF was assigned by the expert on the
assumption that all three conditions in the premise would be true with
certainty. The modified CF is calculated using Combining Function 4:
CF[hl,sa & Sz & s3] = MB[hl,sl & s 2 & s3] - MD[hl,Sl & s2 & s3]
= 0.7 max(0, CF[sl & s2 & s3,e]) -
Thus the strength of the rule is reduced to reflect the uncertainty re-
garding s3. Combining Function 1 is now used to combine 0.42 (i.e.,
MB[hl,sI & s,~ & s3]) with the previous MBfor the hypothesis that the
organism is a Streptococcus.
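For concreteness, the following Python fragment strings the steps just described together. It is our reconstruction, not MYCIN code: the premise CF of a conjunction is taken as the minimum of the clause CFs, the rule CF of 0.7 is attenuated by max(0, premise CF) as in Combining Function 4, and the result is merged with an earlier MB by the incremental combination x + y(1 - x); the clause value 0.6 for s3 and the prior MB of 0.3 are assumptions implied by, or invented for, the example.

    def premise_cf(clause_cfs):
        """CF of a conjoined premise: the weakest clause governs."""
        return min(clause_cfs)

    def attenuate(rule_cf, prem_cf):
        """Scale the rule's CF by the certainty of its premise (Combining Function 4)."""
        return rule_cf * max(0.0, prem_cf)

    def combine(old_mb, new_mb):
        """Incrementally merge two positive measures of belief (Combining Function 1)."""
        return old_mb + new_mb * (1.0 - old_mb)

    contribution = attenuate(0.7, premise_cf([1.0, 1.0, 0.6]))
    print(contribution)                  # 0.42, as in the text
    print(combine(0.3, contribution))    # merged with a hypothetical earlier MB of 0.3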
We have shown that the numbers thus calculated are approximations
at best. Hence it is not justifiable simply to accept as correct the hypothesis
with the highest CF after all relevant rules have been tried. Therapy is
therefore chosen to cover for all identities of organisms that account for a
sufficiently high proportion of the possible hypotheses on the basis of their
12
Probabilistic Reasoning and Certainty Factors
J. Barclay Adams
P(h|e) / P(h) = P(e|h) / P(e)

and

P(e|h) / P(e) = 1 - MD[h,e]    (6)

P(e|¬h) / P(e) = 1 - MB[h,e]    (7)
Now, to continue the parallel, we write Bayes Theorem for two pieces of
evidence favoring a hypothesis:
with
one then has a computationally simple way of serially adjusting the prob-
ability of a hypothesis with new evidence against the hypothesis:
where ei is the new evidence, e' is the total evidence after the introduction
of ei, and e is the evidence before the new evidence is introduced [note
that P(h|e) = P(h) before any evidence is introduced]. Alternatively, one
could combine all elements of evidence against a hypothesis simply by
using independence as in Equation (5) and separately combine all elements
of evidence favoring a hypothesis by using Equation (9), and then use
Equations (12) and (13) once.
The attractive computational simplicity of this scheme is vitiated by
the restrictive nature of the independence assumptions made in deriving
it. The MBs and MDs for different pieces of evidence cannot be chosen
arbitrarily and independently. This can be clearly seen in the following
simple theorem. If e1 and e2 are independent both in the whole population
and in the subpopulation with property h, then

P(h|e1 & e2) = P(h|e1) P(h|e2) / P(h)

This follows from dividing Equation (2) by Equation (1). The nature of the
restrictions placed on the probabilities can be seen from the limiting case
in which all members of e1 are in h. In that case, P(h|e1) = P(h|e1 & e2) =
1, so P(h|e2) = P(h); that is, if some piece of evidence is absolutely diagnostic
of an illness, then any evidence that is independent can have no diagnostic
value. This special case of the theorem was noted in a paper of Warner et
al. (1961). Restrictions this forces on the MBs can be further demonstrated
by the following example. We write Bayes Theorem with the independence
assumption as follows:
Consider the case of two pieces of evidence that favor the hypothesis. Using
Equations (6), (10), and (11), one can express P(e|h)/P(e) in terms of MB
as follows:
Using this form and the fact that P(h|e1 & e2) ≤ 1, we get from Equation
(15)
This is not satisfied for all values of the MBs; e.g., if P(h) = 1/11 and
MB[h,e1] = 0.7, then we must choose the narrow range MB[h,e2] ≤ 0.035
to satisfy the inequality. Most workers in this field assume that elements of
evidence are statistically independent only within each of a complete set
of mutually exclusive subpopulations and not in the population as a whole;
thus the properties of (14) and (15) do not hold. Occasionally, writers
implicitly made the stronger assumption of independence in the whole
space (Slovic et al., 1971).
bining MB[h,e1] with MB[h,e2] to yield MB[h,e1 & e2] and similar rules for
MD. With one exception discussed below, these rules need not be postu-
lated because they are equivalent to, and can be derived from, the method
of combining probability ratios under the assumption of independence
used in the previous section. For example, the rule for MDs is derived as
follows by using Equation (5):
or
The certainty factor is used in two ways. One is to rank hypotheses to select
those for further action. The other is as a weighting factor for the credi-
bility of a hypothesis h, which is supported by an intermediate hypothesis
i, which in turn is supported by evidence e. The appropriateness of CF for
each of these roles will be examined.
One of the uses of CF is to rank hypotheses. Because CF[h,e] does not
correspond to the probability of h given e, it is not difficult to give examples
in which, of two hypotheses, the one with the lower probability would have
the higher certainty factor, or CF. For example, consider two hypotheses
h1 and h2 and some body of evidence e that tends to confirm both
hypotheses. Suppose that the a priori probabilities were such that P(h1) >
P(h2) and P(h1|e) > P(h2|e); it is possible that CF[h1,e] < CF[h2,e]. For exam-
ple, if P(h1) = 0.8, P(h2) = 0.2, P(h1|e) = 0.9, P(h2|e) = 0.8, then
CF[h1,e] = 0.5 and CF[h2,e] = 0.75. This failure to rank according to
probabilities is an undesirable feature of CF. It would be possible to avoid
it if it were assumed that all a priori probabilities were equal.
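The arithmetic behind the counterexample is quickly checked; the following few lines of Python (ours) evaluate the certainty factors from the probabilities Adams gives.

    def cf(p_h, p_h_given_e):
        """CF[h,e], written out separately for confirming and disconfirming evidence."""
        if p_h_given_e >= p_h:
            return (p_h_given_e - p_h) / (1.0 - p_h)
        return -(p_h - p_h_given_e) / p_h

    print(cf(0.8, 0.9))   # 0.5  for h1, the more probable hypothesis
    print(cf(0.2, 0.8))   # 0.75 for h2, which nevertheless receives the larger CF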
The weighting role for CF is suggested by the intuitive notion that in
a chain of reasoning, if e implies i with probability P(ile), and i, if true,
implies h with probability P(h|i), then

P(h & i) = P(h)        P(i & e) = P(i)        P(h & e) = P(h)        (27)
is not true in general under the assumptions of (27) or any other natural
set, as may be demonstrated by substitution into these relationships of the
definitions of MB, MD, and CF.
12.3 Conclusions
The simple model of Section 12.1 is attractive because it is computationally
simple and apparently lends itself to convenient estimation of parameters
by experts. The weakness of the system is the inobvious interdependence
restriction placed on the estimation of parameters by the assumptions of
independence. The MYCIN model is equivalent in part to the simple prob-
ability model presented and suffers from the same subtle restrictions on
parameter estimation if it is to remain internally consistent.
The ultimate measure of success in models of medical reasoning of
this sort, which attempt to mimic physicians, is the closeness of their ap-
proach to perfect imitation of experts in the field. The empirical success
of MYCIN using the model of Shortliffe and Buchanan stands in spite of
theoretical objections of the types discussed in the preceding sections. It is
probable that the model does not flounder on the difficulties pointed out
because in actual use the chains of reasoning are short and the hypotheses
simple. However, there are many fields in which, because of its shortcom-
ings, this model could not enjoy comparable success.
The fact that in trying to create an alternative to probability theory for
reasoning Shortliffe and Buchanan duplicated the use of standard theory
13
The Dempster-Shafer Theory of Evidence
[Figure 13-1: The subsets of the set of causes of cholestasis (cholestatic jaundice). The lattice runs from the full set {hep, cirr, gall, pan} down through the three-element subsets {hep, cirr, gall}, {hep, cirr, pan}, {hep, gall, pan}, and {cirr, gall, pan}, the two-element subsets {hep, cirr}, {hep, gall}, {cirr, gall}, {hep, pan}, {cirr, pan}, and {gall, pan}, to the four singletons.]

Examples
Thus, Bel(A) is a measure of the total amount of belief in A and not of the
amount committed precisely to A by the evidence giving rise to m.
Referring to Figure 13-1, Bel and m are equal for singletons, but
Bel(A), where A is any other subset of O, is the sum of the values of m for
every subset in the subtree formed by using A as the root. Bel(O) is always
equal to 1 since Bel(O) is the sum of the values of mfor every subset of
This sum must be 1 by definition of a bpa. Clearly, the total amount of
belief in O should be equal to the total amount of belief, 1, since the
singletons are exhaustive.
To illustrate, the belief function corresponding to the bpa of Example
2 is given by Bel(Θ) = 1, Bel(A) = 0.6, where A is any proper subset
containing {hep, cirr}, and the value of Bel for every other subset of Θ is
0.
                           m2
                           {cirr, gall, pan} (0.7)      Θ (0.3)
m1   {hep, cirr} (0.6)     {cirr} (0.42)                {hep, cirr} (0.18)
     Θ (0.4)               {cirr, gall, pan} (0.28)     Θ (0.12)

In this example, a subset appears only once in the tableau and m1⊕m2 is
easily computed:
m1⊕m2({cirr}) = 0.42
m1⊕m2({hep, cirr}) = 0.18
m1⊕m2({cirr, gall, pan}) = 0.28
m1⊕m2(Θ) = 0.12
m1⊕m2 is 0 for all other subsets of Θ
since
m1⊕m2({hep, cirr, pan}) = m1⊕m2({hep, pan}) = m1⊕m2({cirr, pan}) = 0.
In this example, the reader should note that m1⊕m2 satisfies the def-
inition of a bpa: Σ m1⊕m2(X) = 1, where X runs over all subsets of Θ, and
m1⊕m2(∅) = 0. Equation (1) shows that the first condition in the definition
is always fulfilled. However, the second condition is problematic in cases
where the "intersection tableau" contains null entries. This situation did
not occur in Example 5 because every two sets with nonzero bpa values
always had at least one element in common. In general, nonzero products
In this example, there are two null entries in the tableau, one assigned
the value 0.336 and the other 0.224. Thus
After all bpas with the same frame of discernment have been combined
and the belief function Bel defined by this new bpa has been computed,
how should the information given by Bel be used? Bel(A) gives the total
1Note that the revised values will still sum to 1 and hence satisfy that condition in the defi-
nition of a bpa. If a + b + c = 1, then (a + b)/(1 - c) = 1 and a/(1 - c) + b/(1 - c) = 1.
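Readers who want to experiment with the rule of combination may find a small Python sketch useful; it is an illustration written for this chapter's examples, not production code, and it represents subsets of Θ as Python frozensets. Mass falling on an empty intersection is discarded and the remainder renormalized, exactly as in the footnote.

    def dempster_combine(m1, m2):
        """Combine two basic probability assignments given as {frozenset: mass} dicts."""
        combined, conflict = {}, 0.0
        for a, x in m1.items():
            for b, y in m2.items():
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + x * y
                else:
                    conflict += x * y            # product assigned to the empty set
        return {s: v / (1.0 - conflict) for s, v in combined.items()}

    theta = frozenset({"hep", "cirr", "gall", "pan"})
    m1 = {frozenset({"hep", "cirr"}): 0.6, theta: 0.4}
    m2 = {frozenset({"cirr", "gall", "pan"}): 0.7, theta: 0.3}
    for subset, mass in dempster_combine(m1, m2).items():
        print(sorted(subset), round(mass, 2))    # reproduces the values of Example 5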
[Bel(A), 1 - Bel(Aᶜ)]
It is not difficult to see that the right endpoint is always greater than the
left: 1 - Bel(Aᶜ) ≥ Bel(A) or, equivalently, Bel(A) + Bel(Aᶜ) ≤ 1. Since
Bel(A) and Bel(Aᶜ) are the sum of all values of m for subsets of A and Aᶜ,
respectively, and since A and Aᶜ have no subsets in common, Bel(A) +
Bel(Aᶜ) ≤ Σ m(X) = 1, where X ranges over all subsets of Θ.
In the Bayesian situation, in which Bel(A) + Bel(Aᶜ) = 1, the two
endpoints of the belief interval are equal and the width of the interval
1 - Bel(Aᶜ) - Bel(A) is 0. In the Dempster-Shafer model, however, the
width is usually not 0 and is a measure of the belief that, although not
committed to A, is also not committed to Aᶜ. It is easily seen that the width
is the sum of belief committed exactly to subsets of Θ that intersect A but
that are not subsets of A. If A is a singleton, all such subsets are supersets
of A, but this is not true for a nonsingleton A. To illustrate, let A = {hep}:
However, since the evidence may actually focus on a small subset of 2^Θ,
the computations need not be intractable. A second, more reasonable al-
ternative would be to apply the Dempster-Shafer theory after partitioning
the set of diseases into groups of mutually exclusive diseases and consid-
ering each group as a separate frame of discernment. The latter approach
would be similar to that used in INTERNIST-1 (Miller et al., 1982), where
scoring and comparison of hypotheses are undertaken only after a special
partitioning algorithm has separated evoked hypotheses into subsets of
mutually exclusive diagnoses.
2The objection may be raised that in some cases all triples with the same object and attribute
are not mutually exclusive. For example, both (Patient-1 Allergy Penicillin) and (Patient-1
Allergy Ampicillin) may be true. In MYCIN, however, these triples tend not to have partial
degrees of belief associated with them; they are usually true-false propositions ascertained
by simple questioning of the user by the system. Thus it is seldom necessary to combine
evidence regarding these multi-valued parameters (see Chapter 5), and these hypotheses need
not be treated by the Dempster-Shafer theory.
m2:    {Pseu} (0.7)    Θ (0.3)
Note that m1⊕m2 is a bpa that, like m1 and m2, assigns some belief to a
certain subset of Θ, {Pseu}, and the remaining belief to Θ. For two con-
firming rules, the subset is a singleton; for disconfirming rules, the subset
is a set of size n - 1, where n is the size of Θ.
m3:    {Pseu}ᶜ (0.8)    Θ (0.2)
m4:    {Strep}ᶜ (0.7)    Θ (0.3)
Before combination, the belief intervals for {Pseu} and {Strep}ᶜ are
[0.4, 1] and [0.7, 1], respectively. After combination, they are [0.4, 1] and
[0.82, 1], respectively. Note that evidence confirming {Pseu} has also con-
firmed {Strep}ᶜ, a superset of {Pseu}, but that evidence confirming {Strep}ᶜ
has had no effect on belief in {Pseu}, a subset of {Strep}ᶜ.
Step 1. For each triple (i.e., singleton hypothesis), combine all bpas
representing rules confirming that value of the parameter. If s1, s2, ..., sk
represent different degrees of support derived from the triggering of k
rules confirming a given singleton, then the combined support is
1 - (1 - s1)(1 - s2) ... (1 - sk).
Step 2. For each triple, combine the two bpas computed in Step 1.
Such a computation is a Category 2 combination and has been illustrated.
We now have n bpas, which are denoted Evi1, Evi2, ..., Evin.
Evii({i}) = pi
Evii({i}ᶜ) = ci
Evii(Θ) = ri
Bel(A) = K([∏(all j) dj] [Σ(j∈A) pj/dj] + [∏(j∉A) cj] [∏(j∈A) dj] - ∏(all j) cj)

where

K⁻¹ = [∏(all j) dj] [1 + Σ(all j) pj/dj] - ∏(all j) cj
Example 7. Consider, for example, the net effect of the following set
of rules regarding the diagnosis of the infecting organism. Assume that all
other rules failed and that the final conclusion about the beliefs in com-
peting hypotheses will be based on the following successful rules:
Note, here, that Θ = {Staph, Strep, Pseu} and that for this example
we are making the implicit assumption that the patient has an infection
with one of these organisms.
Evi1({Pseu}) = 0.3(1 - 0.84) / [1 - (0.3)(0.84)] = 0.064 = p1
Evi1({Pseu}ᶜ) = 0.84(1 - 0.3) / [1 - (0.3)(0.84)] = 0.786 = c1
Evi2({Staph}) = 0.7(1 - 0.8) / [1 - (0.7)(0.8)] = 0.318 = p2
Evi2({Staph}ᶜ) = 0.8(1 - 0.7) / [1 - (0.7)(0.8)] = 0.545 = c2
Evi3({Strep}) = 0.58 = p3
Evi3({Strep}ᶜ) = 0 = c3
K⁻¹ = d1d2d3(1 + p1/d1 + p2/d2 + p3/d3) - c1c2c3
    = (0.936)(0.682)(0.42)(1 + 0.064/0.936 + 0.318/0.682 + 0.58/0.42) - (0.786)(0.545)(0)
    = 0.268(1 + 0.068 + 0.466 + 1.38)
    = 0.781
K = 1.28
Bel({Pseu}) = K(p1d2d3 + r1c2c3)
            = 1.28((0.064)(0.682)(0.42) + (0.15)(0.545)(0))
            = 0.023
Bel({Staph}) = K(p2d1d3 + r2c1c3)
             = 1.28((0.318)(0.936)(0.42) + (0.137)(0.786)(0))
             = 0.160
Bel({Strep}) = K(p3d1d2 + r3c1c2)
             = 1.28((0.58)(0.936)(0.682) + (0.42)(0.786)(0.545))
             = 0.70
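The same figures can be obtained without the closed-form expression by combining the three Evi bpas directly with Dempster's rule; the short Python check below (ours, with subsets of Θ represented as frozensets) reproduces them to within rounding.

    def dempster(m1, m2):
        """Dempster's rule of combination for {frozenset: mass} dicts."""
        out, conflict = {}, 0.0
        for a, x in m1.items():
            for b, y in m2.items():
                inter = a & b
                if inter:
                    out[inter] = out.get(inter, 0.0) + x * y
                else:
                    conflict += x * y
        return {s: v / (1.0 - conflict) for s, v in out.items()}

    theta = frozenset({"Staph", "Strep", "Pseu"})
    evi1 = {frozenset({"Pseu"}): 0.064, frozenset({"Staph", "Strep"}): 0.786, theta: 0.15}
    evi2 = {frozenset({"Staph"}): 0.318, frozenset({"Strep", "Pseu"}): 0.545, theta: 0.137}
    evi3 = {frozenset({"Strep"}): 0.58, theta: 0.42}

    m = dempster(dempster(evi1, evi2), evi3)
    for organism in ("Pseu", "Staph", "Strep"):
        belief = m.get(frozenset({organism}), 0.0)    # Bel of a singleton is just its own mass
        print(organism, round(belief, 3))             # roughly 0.023, 0.160, and 0.704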
13.3 Conclusion
The Dempster-Shafer theory is particularly appealing in its potential for
handling evidence bearing on categories of diseases as well as on specific
disease entities. It facilitates the aggregation of evidence gathered at vary-
ing levels of detail or specificity. Thus collaborating experts could specify
rules that refer to semantic concepts at whatever level in the domain hi-
erarchy is most natural and appropriate. They would not be limited to the
most specific level--the singleton hypotheses of their frame of discern-
ment--but would be free to use more unifying concepts.
In a system in which all evidence either confirms or disconfirms sin-
Generalizing MYCIN
14
Use of the MYCIN
Inference Engine
After some consideration, van Melle decided that the problem re-
quired only a degenerate context tree, with "the horn" as the only context,
and that all relevant rules in the Pontiac manual could be written as defi-
nitional rules with no uncertainty. Two rules of his fifteen-rule system are
shown in Figure 14-1.
Much of MYCIN's elaborate mechanism for gathering and weighing
evidence was unnecessary for this simple problem. Nevertheless, the proj-
ect provided support for our belief that MYCIN's diagnostic procedures
RULE002
IF: 1) The horn is inoperative is a symptom of the horn, and
    2) The relay does click when the horn button is depressed, and
    3) The test lamp does not light when one end is grounded and the other connected to the green wire terminal of the relay while the horn button is depressed
THEN: It is definite (1.0) that a diagnosis of the horn is replace the relay
[HORNRULES]

RULE003
IF: 1) The horn is inoperative is a symptom of the horn, and
    2) The relay does not click when the horn button is depressed, and
    3) The test lamp does light when one end is grounded and the other is touched to the black wire terminal of the relay
THEN: It is definite (1.0) that there is an open between the black wire terminal of the relay and ground
[HORNRULES]
1It also revealed several places in the code where shortcuts had been taken in keeping medical
knowledge separate. For example, the term organism was used in the code occasionally as
being synonymous with cause.
2We are indebted to Joshua Lederberg for suggesting the phrase Essential MYCIN, i.e.,
MYCIN stripped of its domain knowledge. EMYCIN is written in Interlisp, a programming
environment for a particular dialect of the LISP language, and runs on a DEC PDP-10 or
-20 under the TENEX or TOPS20 operating systems. The current implementation of EMY-
CIN uses about 45K words of resident memory and an additional 80K of swapped code
space. The version of Interlisp in which it is embedded occupies about 130K of resident
memory, leaving approximately 80K free for the domain knowledge base and the dynamic
data structures built up during a consultation. A manual detailing the operation of the system
for the prospective system designer is available (van Melle et al., 1981).
While these concepts were generalized and access to them made simpler,
they are much the same in EMYCIN as they were in the original system.
The major conceptual shift in generalizing MYCIN to EMYCIN was
to focus primarily on the persons who build new systems rather than on
the persons who use them. Much of the interface to users remains un-
changed. The interface to system builders, however, became easier and
more transparent. We were attempting to reduce the time it takes to create
an expert system by reducing the effort of a knowledge engineer in helping
an expert. As discussed in Chapter 16, we believe the experiment was
successful in this respect.
Much of the TEIRESIAS system (discussed in Chapter 9) has been
incorporated in EMYCIN. Thus the debugging facilities are very similar.
In addition, EMYCIN allows rules to be entered in the Abbreviated Rule
Language, called ARL, that simplifies the expression of new relations. For
example, the rule premise
($AND (SAME CNTXT SITE BLOOD)
      (GREATERP* (VAL1 CNTXT SICKDEGREE) 2)
      ($OR (NOTSAME CNTXT BURNED)
           (LESSERQ* (PLUS (VAL1 CNTXT NUMCULS) (VAL1 CNTXT NUMPOS)) 3)))

or

(SITE = BLOOD, SICKDEGREE > 2, -BURNED OR NUMCULS + NUMPOS LE 3)
PROBLEM
PROMPT: (IS THE PROBLEM WITH PAYROLL OR INVENTORY?)
...Autosave...
Please give a one-word identifier for your knowledge base files:
** BUSINESS
<EMYCIN>CHANGES.BUSINESS;1
Are there any descendants of COMPANY in the context tree? No
RULE001
20-Oct-79 14:16:48
........ COMPANY-1 ........
1) What company is having a problem?
** IBM
2) Is the problem with payroll or inventory?
** PAYROLL
3) What is the number of employees of ibm?
** 10000000
Conclusions: the tools to use in solving the problem are as follows: a large computer.
Enter Debug/review, Rules, Parms, Go, etc.? Parameters
Parameter name: <cr> [Finished entering parameters.]
Rules, Parms, Go, etc.? Quit
@
[Some time later ...]
@<EMYCIN>EMYCIN
EMYCIN 12-DEC-80 ...
Hi.
(<EMYCIN>CHANGES.BUSINESS;3)
23-Feb-81 10:28:37
........ COMPANY-1 ........
1) What company is having a problem?
** STANFORD
2) Is the problem with payroll or inventory?
** INVENTORY
3) How many people does Stanford employ?
** 10000
Enter Debug/review phase, or other option (? for help)? Quit
This chapter is a shortened and edited version of a paper appearing in Pergamon-Infotech state
of the art report on machine intelligence, pp. 249-263. Maidenhead, Berkshire, U.K.: Infotech
Ltd., 1981.
[Figure 15-1: The relation between EMYCIN and its two kinds of users. The system designer supplies expertise to, and receives debugging feedback from, EMYCIN's knowledge base construction aids, which build the domain knowledge base; the EMYCIN consultation driver then interprets that knowledge base during a consultation.]
questions. It then applies its knowledge to the specific facts of the case and
informs the user of its conclusions. The user is free to ask the program
questions about its reasoning in order to better understand or validate the
advice given.
There are really two "users" of EMYCIN, as depicted in Figure 15-1.
The system designer, or expert, interacts with EMYCIN to produce a knowledge
base for the domain. EMYCIN then interprets this knowledge base to pro-
vide advice to the client, or consultation user. Thus the combination of EMY-
CIN and a specific knowledge base of domain expertise is a new consultation
program. Some instances of such consultation programs are described be-
low.
15.2 Background
Some of the earliest work in artificial intelligence attempted to create gen-
eralized problem solvers. Programs such as GPS (Newell and Simon, 1972)
and theorem provers (Nilsson, 1971), for instance, were inspired by the
apparent generality of human intelligence and motivated by the desire to
develop a single program applicable to many problems. While this early
work demonstrated the utility of many general-purpose techniques (such
as problem decomposition into subgoals and heuristic search in its many
forms), these techniques alone did not offer sufficient power for high per-
formance in complex domains.
Recent work has instead focused on the incorporation of large
amounts of task-specific knowledge in what have been called knowledge-
based systems. Such systems have emphasized high performance based on
the accumulation of large amounts of knowledge about a single domain
rather than on nonspecific problem-solving power. Some examples to date
include efforts at symbolic manipulation of algebraic expressions (Moses,
1971), chemical inference (Lindsay et al., 1980), and medical consultations
(Pople, 1977; Shortliffe, 1976). Although these systems display an expert
level of performance, each is powerful in only a very narrow domain. In
addition, assembling the knowledge base and constructing a working pro-
gram for such domains is a difficult, continuous task that has often ex-
tended over several years. However, because MYCIN included in its design
the goal of keeping the domain knowledge well separated from the pro-
gram that manipulates the knowledge, the basic rule methodology pro-
vided a foundation for a more general rule-based system.
With the development of EMYCIN we have now come full circle to
GPS's philosophy of separating the deductive mechanism from the prob-
lem-specific knowledge; however, EMYCIN's extensive user facilities make
it a much more accessible environment for producing expert systems than
were the earlier programs.1 Like MYCIN's, EMYCIN's representation of
facts is in attribute-object-value triples, with an associated certainty factor.
Facts are associated in production rules. Rules of the same form are shown
throughout this book. Figures 16-2 and 16-5 in the next chapter show rules
from two different consultation systems constructed in EMYCIN.
in the list, EMYCIN evaluates the premise; if true, it makes the conclusion
indicated in the action. The order of the rules in the list is assumed to be
arbitrary, and all the rules are applied unless one of them succeeds and
concludes the value of the parameter with certainty (in which case the
remaining rules are superfluous).
This control structure was also designed to be able to deal gracefully
with incomplete information. If the user is unable to supply some piece of
data, the rules that need the data will fail and make no conclusions. The
system will thus make conclusions, if possible, based on less information.
Similarly, if the system has inadequate rules (or none at all) for concluding
some parameter, it may ask the user for the value. When too many items
of information are missing, of course, the system will be unable to offer
sound advice.
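The control regime just outlined can be caricatured in a few lines of Python. The sketch below is ours and deliberately ignores certainty factors and the context tree; it shows only the order of events: to establish a parameter the interpreter tries every rule that concludes it, backward-chains on the parameters mentioned in each premise, and falls back to asking the user when no rule supplies a value.

    # Toy rules: (premise as parameter/value pairs, concluded parameter, concluded value).
    RULES = [
        ({"site": "blood", "burned": "no"}, "significance", "high"),
        ({"significance": "high"}, "action", "treat"),
    ]
    known = {}

    def find_out(parameter, ask_user):
        """Backward-chain on `parameter`; ask the user only if no rule concludes it."""
        if parameter in known:
            return known[parameter]
        for premise, concluded, value in RULES:
            if concluded == parameter and all(
                    find_out(p, ask_user) == v for p, v in premise.items()):
                known[parameter] = value
                return value
        known[parameter] = ask_user(parameter)
        return known[parameter]

    answers = {"site": "blood", "burned": "no"}
    print(find_out("action", lambda p: answers.get(p, "unknown")))   # prints "treat"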
The system can use these templates to "read" its own rules. For example,
the template shown here contains the standard symbols CNTXT, PARM,
and VALUE, indicating the components of the associative triple that SAME
tests. If clause (b) above appears in the premise of a given rule, the system
can determine that the rule needs to know the site of the culture and, in
particular, that it tests whether the culture site is (i.e., is the same as) blood.
When asked to display rules that are relevant to blood cultures, the system
will know that this rule should be selected. The most common matching
predicates and conclusion functions are those used in MYCIN (see Chapter
5): SAME, NOTSAME, KNOWN, NOTKNOWN, DEFINITE, NOT-
DEFINITE, etc.
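The way a template lets the program inspect its own rules can be suggested with a rough Python analogue; the rule names and clauses below are invented, and the real system of course stores LISP forms rather than tuples.

    # Premise clauses stored as data, mirroring the LISP form (SAME CNTXT SITE BLOOD).
    RULES = {
        "RULE035": [("SAME", "CNTXT", "SITE", "BLOOD"), ("SAME", "CNTXT", "GRAM", "GRAMNEG")],
        "RULE052": [("SAME", "CNTXT", "SITE", "CSF")],
    }
    SAME_TEMPLATE = ("SAME", "CNTXT", "PARM", "VALUE")   # positions of the triple's components

    def rules_testing(parm, value):
        """Use the template to find rules whose premise tests a given parameter and value."""
        parm_at = SAME_TEMPLATE.index("PARM")
        value_at = SAME_TEMPLATE.index("VALUE")
        for name, clauses in RULES.items():
            for clause in clauses:
                if clause[0] == "SAME" and clause[parm_at] == parm and clause[value_at] == value:
                    yield name

    print(list(rules_testing("SITE", "BLOOD")))   # ['RULE035']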
15.3 The System-Building Environment
The system designer's principal task is entering and debugging a knowl-
edge base, viz., the rules and the object-attribute structures on which they
operate. The level at which the dialogue between system and expert takes
place is an important consideration for speed and efficiency of acquisition.
The knowledge base must eventually reside in the internal LISP format
that the system manipulates to run the consultation and to answer ques-
tions. At the very basic level, one could imagine a programmer using the
LISP editor to create the necessary data structures totally by hand; 2 here
the entire translation from the expert's conceptual rule to LISP data struc-
tures is performed by the programmer. At the other extreme, the expert
would enter rules in English, with the entire burden of understanding
placed on the program.
The actual means used in EMYCIN is at a point between these ex-
tremes. Entering rules at the base LISP level is too error-prone, and re-
quires greater facility with LISP on the part of the system designer than
is desirable. On the other hand, understanding English rules is far too
difficult for a program, especially in a new domain where the vocabulary
has not even been identified and organized for the program's use. (Just
recognizing new parameters in free English text is a major obstacle.3) EMY-
CIN instead provides a terse, stylized, but easily understood, language for
writing rules and a high-level knowledge base editor for the knowledge
structures in the system. The knowledge base editor performs extensive
checks to catch common input errors, such as misspellings, and handles all
necessary bookkeeping chores. This allows the system builder to try out
new ideas quickly and thereby to get some idea of the feasibility of any
particular formulation of the domain knowledge into rules.
2This is the way the extensive knowledge base for the initial MYCIN system was originally
created.
3The task of building an assistant for designers of new EMYCIN systems is the subject of
current research by James Bennett (Bennett, 1983). The name of the program is ROGET.
The parameter names are simply the labels that the expert uses in defining
the parameters of the domain. Thus they are familiar to the expert. The
conciseness of ARL makes it much easier to enter than English or LISP,
which is an important consideration when entering a large body of rules.
Rule Checking
While the system designer builds up the domain knowledge base as de-
scribed above, EMYCIN automatically keeps track of the changes that have
been made (new or changed rules, parameters, etc.). The accumulated
changes can be saved on a file by the system builder either explicitly with
a simple command or automatically by the system every n changes (the
frequency of automatic saving can be set by the system builder). When
EMYCIN is started in a subsequent session, the system looks for this file
of changes and loads it in to restore the knowledge base to its previous
state.
values for the parameter being asked about, as supplied by the system
designer.
In most places where EMYCIN prompts for input, the client may type
a question mark to obtain help concerning the options available. When the
program asks for the value of a parameter, EMYCIN can provide simple
help by listing the legal answers to the question. The system designer can
also include more substantial help by giving rephrasings of or elaborations
on the original question; these are simply entered via the data base editor
as an additional property of the parameter in question. This capability
provides for both streamlined questions for experienced clients and more
detailed explanations of what is being requested for those who are new to
the consultation program.
There is more to building a knowledge base than just entering rules and
associated data structures. Any errors or omissions in the initial knowledge
base must be corrected in the debugging process. In EMYCIN the principal
method of debugging is to run sample consultations; i.e., the expert plays
the role of a client seeking advice from the system and checks that the
correct conclusions are made. As the expert discovers errors, he or she
uses the knowledge acquisition facilities described above to modify existing
rules or add new ones.
Although the explanation program was designed to allow the consul-
tation user to view the program's reasoning, it is also a helpful high-level
debugging aid for the system designer. Without having to resort to LISP-
level manipulations, it is possible to examine any inferences that were
made, find out why others failed, and thereby locate errors or omissions
in the knowledge base. The TEIRESIAS program developed the WHY/
HOW capability used in EMYCIN for this very task (see Chapter 9).
EMYCIN provides a debugger based on a portion of the TEIRESIAS
program. The debugger actively guides the expert through the program's
reasoning chain and locates faulty (or missing) rules. It starts with a con-
clusion that the expert has indicated is incorrect and follows the inference
chain back to locate the error.
The rule interpreter also has a debugging mode, in which it prints out
assorted information about what it is doing: which rules it tries, which ones
succeed (and what conclusions they make), which ones fail (and for what
reason), etc. If the printout indicates that a rule succeeded that should
have failed, or vice versa, the expert can interrupt immediately, rather than
waiting for the end of the consultation to do the more formal TEIRESIAS-
style review.
In either case, once the problem is corrected, the expert can then
restart and try again, with the consultation automatically replayed using
the new or modified rules.
Case Library
EMYCIN has facilities for maintaining a library of sample cases. These can
be used for testing a complete system, or for debugging a growing one.
The answers given by the consultation user to all the questions asked dur-
ing the consultation are simply stored away, indexed by their context and
parameter. When a library case is rerun, answers to questions that were
previously asked are looked up and automatically supplied; any new ques-
tions resulting from changes in the rule base are asked in the normal
fashion. This makes it easy to check the performance of a new set of rules
on a "standard" case. It is especially useful during an intensive debugging
session, since the expert can make changes to the knowledge base and,
with a minimum of extra typing, test those changes--effectively reducing
the "turnaround time" between modifying a rule and receiving consulta-
tion feedback.
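The bookkeeping this requires is modest, as the following Python sketch (again ours, not EMYCIN's) suggests: stored answers are keyed by context and parameter, a rerun consults the store first, and only questions with no stored answer reach the user.

    class CaseLibrary:
        """Minimal sketch of replaying a stored consultation."""
        def __init__(self):
            self.answers = {}                      # (context, parameter) -> stored answer

        def record(self, context, parameter, answer):
            self.answers[(context, parameter)] = answer

        def ask(self, context, parameter, ask_user):
            key = (context, parameter)
            if key in self.answers:                # previously asked: replay silently
                return self.answers[key]
            value = ask_user(context, parameter)   # a new question, e.g. after a rule change
            self.answers[key] = value
            return value

    library = CaseLibrary()
    library.record("ORGANISM-1", "GRAM", "GRAMNEG")
    print(library.ask("ORGANISM-1", "GRAM", lambda c, p: "never called"))
    print(library.ask("ORGANISM-1", "MORPHOLOGY", lambda c, p: "ROD"))   # only the new question is asked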
15.4 Applications
EMYCIN is designed to help build and run programs that provide con-
sultative advice. The resulting consultation system takes as input a body of
measurements or other information pertinent to a case and produces as
output some form of recommendation or analysis of the case. The frame-
work seems well suited for many diagnostic or analytic problems, notably
some classes of fault diagnosis, where several input measurements (symp-
toms, laboratory tests) are available and the solution space of possible di-
agnoses can be enumerated. It is less well suited for "formation" problems,
where the task is to piece together existing structures according to specified
constraints to generate a solution.
EMYCIN was not designed to be a general-purpose representation
language. It is thus wholly unsuited for some problems. The limitations
derive largely from the fact that EMYCIN has chosen one basic, readily
understood representation for the knowledge in a domain: production
rules that are applied by a backward-chaining control structure and that
operate on data in the form of associative triples. The representation, at
least as implemented in EMYCIN, is unsuitable for problems of constraint
satisfaction, or those requiring iterative techniques.5 Among other classes
of problems that EMYCIN does not attempt to handle are simulation tasks
and tasks involving planning with stepwise refinement. One useful heuris-
tic in thinking about the suitability of EMYCIN for a problem is that the
consultation system should work with a "snapshot" of information about a
case. Good advice should not depend on analyzing a continued stream of
data over a time interval.
Even those domains that have been successfully implemented have
demonstrated some of the inadequacies of EMYCIN. In addition to rep-
resentational difficulties, other problems noted have been the lack of user
control over the consultation dialogue (e.g., the order of questions) and
the amount of time a user must spend supplying information. These lim-
itations are discussed further in subsequent chapters.
This chapter is a shortened and edited version of a paper appearing in Pergamon-Infotech state
of the art report on machine intelligence. Maidenhead, Berkshire, U.K.: Infotech Ltd., 1981.
16 SACON: A Consultant for Structural Analysis
RULE050
IF:   1) The material composing the sub-structure is one of: the metals, and
      2) The analysis error (in percent) that is tolerable is between 5 and 30, and
      3) The non-dimensional stress of the sub-structure is greater than .9, and
      4) The number of cycles the loading is to be applied is between 1000 and 10000
THEN: It is definite (1.0) that fatigue is one of the stress behavior phenomena in the sub-structure

PREMISE: ($AND (SAME CNTXT MATERIAL (LISTOF METALS))
               (BETWEEN* CNTXT ERROR 5 30)
               (GREATER* CNTXT ND-STRESS .9)
               (BETWEEN* CNTXT CYCLES 1000 10000))
ACTION:  (CONCLUDE CNTXT SS-STRESS FATIGUE TALLY 1.0)
Hello Dr. Melosh,

This consultation concerns the 747 wing. The purpose of the consultation is to demonstrate the general features of a structural analysis consultation on a simple problem. Thus the consultation involves only one substructure: the total wing.

26-Jul-78 10:54:44
[consultation of 13-Jul-78 1:55 PM]

--------STRUCTURE-28--------
1) What is the name of STRUCTURE-28?
** BOEING 747 WING
2) Assuming that your characterization of the boeing 747 wing in terms of its geometry, material characteristics, and boundary conditions are accurate, how much error (in percent) is tolerable for the analysis?
** 10
3) Do you want to examine the responses of the boeing 747 wing, to evaluate its instability, or both?
** EXAMINE-RESPONSES
4) Does the boeing 747 wing have any time dependent terms in its equations of equilibrium?
** NO
5) Is heating or cooling of interest in determining structure responses?
** NO
6) Have you decomposed the boeing 747 wing into a number of potential sub-structures which are relevant to this analysis?
** YES
--------SUB-STRUCTURE-1--------
7) What is the name of SUB-STRUCTURE-1?
** TOTAL WING
8) What is the material composing most of the total wing (SUB-STRUCTURE-1)?
** HIGH-STRENGTH-ALUMINUM
9) Enter the average longest dimension of the total wing (SUB-STRUCTURE-1) in inches (or <number> CMS):
** 1420
10) Enter the weight of the total wing (SUB-STRUCTURE-1) in pounds (or <number> KGS):
** 46000
11) Is the total wing (SUB-STRUCTURE-1) thin-walled or solid?
** THIN-WALLED
12) What is the configuration of support for the total wing (SUB-STRUCTURE-1)?
** ONE-SIDE
13) Is the total wing (SUB-STRUCTURE-1) to be modelled as a 1, 2, or 3 dimensional system?
** 2
14) Would you describe the construction of the total wing (SUB-STRUCTURE-1) to be truss and/or frame elements (NETWORK), sheets and/or blocks of material (CONTINUUM), or a combination of both?
** CONTINUUM
15) Is the overall geometry of the total wing (SUB-STRUCTURE-1) curved or planar?
** PLANAR
RULE025
IF:   1) Bleeding-history is one of the reasons for this consultation,
      2) There is an episode of significant bleeding in the patient,
      3) Coagulation-defect is one of the bleeding disorders in the patient,
      4) The defective coagulation pathway of the patient is intrinsic, and
      5) There are not factors which interfere with the patient's normal bleeding
THEN: It is definite (1.0) that the following is one of the bleeding diagnoses of the patient: The patient has one or more of the following conditions: Hemophilia A, von Willebrand's syndrome, an IX, XI, or XII deficiency, or a high molecular weight Kallikrein defect.

PREMISE: ($AND (SAME CNTXT REASON BLEEDING-HISTORY)
               (SAME CNTXT SIGBLD)
               (SAME CNTXT FINALDEF COAGULATION-DEFECT)
               (SAME CNTXT DEFPATH INTRINSIC)
               (NOTSAME CNTXT INTERFERENCE))
ACTION:  (CONCLUDETEXT CNTXT DX (TEXT DXHEMOPHILIA) TALLY 1000)
[Figure: the inference structure of the CLOT consultant. An initial estimation of the bleeding defect type based on clinical evidence leads to an estimation of enzymatic and platelet defects on the basis of the coagulation battery, then to a final, consistent estimation of the bleeding defect, and finally to the final diagnosis.]
--------PATIENT-110--------
1) Name:
** PT110
2) Age:
** 90 YEARS
3) Sex:
** FEMALE
4) Race:
** CAUCASIAN
5) Please indicate your reason(s) for requesting this consultation:
** BLEEDING-HISTORY
6) What type of bleeding describes Pt110's most recent episode of bleeding?
** HEMARTHROSIS
7) Is there a history of a genetic bleeding disorder in Pt110's family?
** YES
8) Was the onset of the bleed immediate or delayed?
** DELAYED
9) BT:
** 5 MINUTES
10) PT:
** 13
11) PTT:
** 50
12) TT:
** 15
13) FSF:
** NORMAL
14) Has Pt110 recently exercised?
** NO
15) Is Pt110 currently receiving any of the following drugs: ASA, Heparin, Coumarin, oral-contraceptives, Ephedrine, Epinephrine, ADH?
** NO
16) Is Pt110 diagnosed as having cirrhosis, collagen disease, cancer, or any chronic disease?
** NO
Conclusions: the blood disorders of Pt110 are as follows:
COAGULATION-DEFECT (.97)
Conclusions: the statements about the consistency of the case data and CLOT's interpretation are as follows:
Both clinical and lab data are internally consistent and there is an overall, consistent interpretation of the blood disorder.
Conclusions: the bleeding diagnoses of Pt110 are as follows:
The patient has one or more of the following conditions: Hemophilia A, von Willebrand's syndrome, an IX, XI, or XII deficiency, or a high molecular weight Kallikrein defect. (.97)
16.4 EMYCIN as a Knowledge Representation Vehicle
Enter Parms, Rules, Save changes, or Go? Rules
Rule number or NEW: NEW
RULE025
PREMISE: (REASON = BLEEDING, SIGBLD, FINALDEF = COAGULATION, DEFPATH = INTRINSIC - INTERFERENCE)
RULE025
ACTION: (DX = DXHEMOPHILIA)
BLEEDING -> BLEEDING-HISTORY? Yes
COAGULATION -> COAGULATION-DEFECT? Yes
Translate, No further changes, or prop name:
For the larger rule sets, the checker detected these inconsistencies, due to either typing mistakes or actual errors in the rule base logic, and provided a graceful method for dealing with them. Together these facilities contributed to the ease and remarkable rapidity of construction of this consultant. For further details on the design and operation of these aids, see van Melle (1980).
1These estimates represent a simple average that held during the initial construction of these projects. They do not reflect the wide variation in the amount of effort spent defining rules versus the other knowledge base development tasks that occurred over that time period.
two days were spent detailing aspects of the parameters and rules that the EMYCIN system required (i.e., specifying expected values, allowable ranges on numeric parameters, question formats, etc.) and entering these details into the system itself. We may approximate the average cost of formulating and implementing a rule in such a system based on the number of person-hours spent in construction versus the number of rules specified. CLOT required about 60 person-hours to specify 60 rules, yielding a rate of 1 person-hour per rule. The marginal cost for a new rule is expected to be similar.
Our experience explicating these rule bases provided an opportunity to make some observations about the process of knowledge acquisition for consultation systems. Although these observations were made with respect to the development of SACON and CLOT, other knowledge-based consultation systems have demonstrated similar processes and interactions. Our principal observation is that the knowledge acquisition process is composed of three major phases. These phases are characterized strongly by the types of interaction that occur between expert and knowledge engineer and by the type of knowledge that is being explicated and transferred between the participants during these interactions. At present only a small fraction of these interactions can be held directly with the knowledge-based system itself (Davis, 1976; 1977), and research continues to expand the knowledge acquisition expertise of these systems.
of advice that is to be tendered, the team next identifies the major factors (parameters) and reasoning steps (rules) that will be used to characterize the object of the consultation (be it patient or airplane wing) and to recommend any advice. This forms the inference structure of the consultant. Finally, when the knowledge base is substantially complete, the system designers concentrate on debugging the existing rule base. This process typically involves the addition of single rules to handle obscure cases and might involve the introduction of new parameters. However, the major structure of the knowledge base remains intact (at least for this subdomain), and interactions with the expert involve relatively small changes. (Chapters and 9 describe debugging and refining a knowledge base that is nearly complete.)
The initial development of the knowledge base is greatly facilitated when the knowledge engineering team elicits a well-specified consultation goal for the system as well as an inference structure such as that depicted in Figure 16-1. Without these conceptual structures to give direction to the knowledge explication process, a confused and unusable web of facts typically issues from the expert. We speculate that the value of these organizational structures is not restricted to the production system methodology. They seem to be employed whenever human experts attempt to formalize
Explaining the Reasoning

17 Explanation as a Topic of AI Research
property to all clinical parameters, predicate functions, and other key data structures used in rules. Thus, when a user typed "RULE" in response to a question from MYCIN, a translation of the current rule was displayed as an explanation. This was the extent of MYCIN's explanation capability when the 1973 paper was prepared.
At approximately the same time as that first article appeared, Gorry published a paper that influenced us greatly (Gorry, 1973). In retrospect, we believe that this is a landmark essay in the evolution of medical AI. In it he reviewed the experience of the M.I.T. group in developing a program that used decision analysis techniques to give advice regarding the diagnosis of acute renal failure (Gorry et al., 1973). Despite the successful decision-making performance of that program, he was concerned by its obvious limitations (p. 50):

We will not dwell here on his discussion of the first two items, but regarding the third (p. 51):
2Almost ten years later we undertook a formal study (described in Chapter 34) that confirmed this early intuition. A survey of 200 physicians revealed that high-quality explanation capabilities were the most important requirement for an acceptable clinical consultation system.
3This simple model of explanations still has considerable appeal. See Clark and McCabe (1982) for a discussion of implementing WHY and HOW in PROLOG, for example.
By late 1976 the explanation features of the system had become highly polished, and Scott, Clancey, Davis, and Shortliffe collaborated on a paper that appeared in the American Journal of Computational Linguistics in 1977. That paper is included here as Chapter 18. It describes MYCIN's explanation capabilities in some detail. Although most of the early work described in that chapter stressed the need to provide explanations to users, we have also seen the value such capabilities have for system builders. As mentioned in Chapters 9 and 20, system builders--both experts and knowledge engineers--find explanations to be valuable debugging aids. The features described in Chapter 18 were incorporated into EMYCIN and exist there relatively unchanged to the present.
By the mid-1970s much of the project time was being spent on knowledge base refinement and enhancement. Because we needed assistance from someone with a good knowledge of the antimicrobial agents in use, we sought the involvement of a clinical pharmacist. Sharon Bennett, a recent pharmacy graduate who had taken a clinical internship at the Palo Alto Veterans Administration Hospital affiliated with Stanford, joined the project and played a key role in knowledge base development during the mid- to late-1970s. Among the innovations she brought to the group was an eagerness to heighten MYCIN's utility by making it an expert at dosage adjustment as well as drug selection. She and Carli Scott worked together closely to identify the aspects of pharmacokinetic modeling that could be captured in rules and to identify the elements that were so mathematical in nature that they required encoding in special-purpose functions. By this time, however, the need for explanation capabilities had become so obvious to the project's members that even this specialized code was adapted so that explanations could be provided. A paper describing the features, including a brief discussion of explanation of dosing, was prepared for the American Journal of Hospital Pharmacy and is included here as Chapter 19. We include the paper here not only because it demonstrates the special-purpose explanation features that were developed, but also because it shows the way in which mathematical modeling techniques were integrated into a large system that was otherwise dependent on AI representation methods.
have dealt with the issue. This level of interest developed out of the MYCIN experience and a small group seminar series held in 1979 and 1980. Several examples of inadequate responses by MYCIN (to questions asked by users) were examined in an effort to define the reasons for suboptimal performance. One large area of problems related to MYCIN's lack of support knowledge, the underlying mechanistic or associational links that explain why the action portion of a rule follows logically from its premise. This limitation is particularly severe in a teaching setting where it is incorrect to assume that the system user will already know most rules in the system and merely needs to be reminded of their content. Articulation of these points was largely due to Bill Clancey's work, and they are a central element of his analysis of MYCIN's knowledge base in Chapter 29.

Other sources of MYCIN's explanation errors were its failure to deal with the context in which a question was asked (i.e., it had no sense of dialogue, so each question required full specification of the points of interest without reference to earlier exchanges) and a misinterpretation of the user's intent in asking a question. We were able to identify examples of simple questions that could mean four or five different things depending on what the user knows, the information currently available about the patient under consideration, or the content of earlier discussions. These issues are inevitably intertwined with problems of natural language understanding, and they reflect back on the second of Gorry's three concerns (language development) mentioned earlier in this chapter.
Partly as a result of work on the problem of student modeling by Bill Clancey and Bob London in the context of GUIDON, we were especially interested in how modeling the user's knowledge might be used to guide the generation of explanations. Jerry Wallis began working on this problem in 1980 and developed a prototype system that emphasized causal reasoning chains. The system associated measures of complexity with both rules and concepts and measures of importance with concepts. These reasoning chains then guided the generation of explanations in accordance with a user's level of expertise and the reasoning details that were desired. Chapter 20 describes that experimental system and defines additional research topics of ongoing interest.
Our research group continues to explore solutions to the problems of explanation in expert systems. John Kunz has developed a program called AI/MM (Kunz, 1983), which combines simple mathematical models, physiologic principles, and AI representation techniques to analyze abnormalities in fluid and electrolyte balance. The resulting system can use causal links and general laws of nature to explain physiologic observations by reasoning from first principles. The program generates English text to explain these observations.
Greg Cooper has developed a system, known as NESTOR, that critiques diagnostic hypotheses in the area of calcium metabolism. In order to critique a user's hypotheses, his system utilizes powerful explanation capabilities. Similarly, the work of Curt Langlotz, who has adapted ONCOCIN to critique a physician's therapy plan (see Chapter 32), requires
the program to explain the basis for any disagreements that occur. Langlotz has developed a technique known as hierarchical plan analysis (Langlotz and Shortliffe, 1983), which controls the comparison of two therapy plans and guides the resulting explanatory interaction. Langlotz is also pursuing a new line of investigation that we did not consider feasible during the MYCIN era: the use of graphics capabilities to facilitate explanations and to minimize the need for either typing or natural language understanding. Professional workstations and graphics languages have recently reduced the cost of high-resolution graphics systems (and the cost of programming them) enough that we expect considerably more work in this area.
Bill Clancey's NEOMYCIN research (Clancey and Letsinger, 1981), mentioned briefly in Chapter 21 and developed partially in response to his analysis of MYCIN in Chapter 29, also has provided a fertile arena for explanation research. Diane Warner Hasling has worked with Clancey to develop an explanation feature for NEOMYCIN (Hasling et al., 1983) similar to the HOWs and WHYs of MYCIN (Chapter 18). Because NEOMYCIN is largely guided by domain-independent meta-rules, however, useful explanations cannot be generated simply by translating rules into English. NEOMYCIN is raising provocative questions about how strategic knowledge should be capsulized and instantiated in the domain for explanation purposes.
Finally, we should mention the work of Randy Teach, an educational psychologist who became fascinated by the problem of explanation, in part because of the dearth of published information on the subject. Teach joined the project in 1980, discovered the issue while working on the survey of physicians' attitudes toward computer-based consultants reported in Chapter 34, and undertook a rather complex psychological experiment in an attempt to understand how physicians explain their reasoning to one another (Teach, 1984). We mention the work because it reflects the way in which the legacy of MYCIN has broadened to involve a diverse group of investigators from several disciplines. We believe that explanation continues to provide a particularly challenging set of issues for researchers from computer science, education, psychology, linguistics, philosophy, and the domains of potential application.
17.3 Current Perspective

[Figure: understanding; debugging; education; acceptance; persuasion]
18 Methods for Generating Explanations
[Figure: schematic of an explanation capability. The consultation system's data base holds static knowledge (general factual knowledge and judgmental knowledge about the domain) and dynamic knowledge (facts about the problem entered by the user and deductions made by the system); the explanation capability draws on this knowledge to produce explanations alongside the consultative advice.]
We will distinguish two functions for an EC: the reasoning status checker (RSC) to be used during the consultation, and the general question answerer (GQA) to be used during the consultation or after the system has printed its results. An RSC answers questions asked during a consultation about the status of the system's reasoning process. A few simple commands often suffice to handle the questions that the RSC is expected to answer. A GQA answers questions about the current state of the system's knowledge base, including both static domain knowledge and facts accumulated during the consultation. It must recognize a wide range of question types about many aspects of the system's knowledge. For this reason, a few simple commands that are easy to learn but still cover all the possible questions that might be asked may be difficult to define. Consequently, natural language processing may be important for a useful GQA.
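To make the RSC/GQA division concrete, the following Python sketch shows one plausible way an explanation capability might route user input: fixed commands (here WHY and HOW, as in MYCIN) go to a reasoning status checker, while anything else is treated as free text for a general question answerer. The function name and the rsc/gqa objects are invented for illustration; this is not MYCIN's implementation.

# Illustrative routing of explanation requests to the reasoning status
# checker (RSC) or the general question answerer (GQA).

RSC_COMMANDS = {"WHY", "HOW"}          # a few simple commands suffice for the RSC

def handle_explanation_request(text, rsc, gqa):
    tokens = text.strip().split()
    if tokens and tokens[0].upper() in RSC_COMMANDS:
        # Reasoning status: why is the current question being asked,
        # how was (or will be) a given goal achieved?
        return rsc.answer(tokens[0].upper(), tokens[1:])
    # Anything else needs broader knowledge of the static and dynamic
    # knowledge bases, and some natural language analysis.
    return gqa.answer(text)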
In an interactive consultation, the system periodically requests infor-
mation about the problem. This offers the user an opportunity to request
explanations while the consultation is in progress. In noninteractive con-
sultations, the user has no opportunity to interact with the system until
after it has printed its conclusions. Unless there is a mechanism for inter-
rupting the reasoning process and asking questions, the EC for such a
[Figure: knowledge used by the EC: the knowledge base of the consultation system; historical knowledge of the consultation; procedural knowledge about the consultation system (knowledge of the production rules and knowledge of the rule interpreter); and miscellaneous domain-independent knowledge.]
that the content of a rule may explain why it was necessary to use that rule or may affect which rules will be tried in the future.

A GQA will need more information about the system since the scope of its explanations is much broader. It must know how the system stores knowledge about its area of expertise (the static knowledge with which it starts each consultation), how it stores facts gathered during a particular consultation (its dynamic knowledge), and how the dynamic knowledge was obtained or inferred. Thus the GQA must have access to all the information that the RSC uses: a detailed record of the consultation, an understanding of the rule interpreter, and the ability to understand rules.
18.1 Design Considerations
18.2 An Example--MYCIN
tional. The conventions for storing both dynamic and static knowledge, including attribute-object-value triples, tables, lists, and rules themselves, are described in detail in Chapter 5.
type of question. Each specialist knows how the relevant part of the control structure works and what pieces of knowledge it uses.

To understand how a specialist might use a template such as that shown above, consider an explanation that involves finding all rules that can conclude that the identity of an organism is Neisseria. The appropriate specialist would start with those rules used by the system to conclude values for the parameter IDENTITY. Using templates of the various action functions that appear in each of these rules, the specialist picks out only those (like Rule 009) that have NEISSERIA in their VALU slot.

This also illustrates the sort of knowledge that can be built into a specialist. The specialist knows that the control structure uses stored lists telling which rules can be used to determine the value of each parameter. Furthermore, it knows that it is necessary to look only at the rules' actions since it is the action that concludes facts, while the premise uses facts.
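As a rough illustration of that kind of specialist, the Python sketch below matches the action of each candidate rule against a template for its action function and keeps only the rules whose VALU slot holds the requested value. The data structures are simplified stand-ins, not MYCIN's actual LISP structures, and RULE084 and its certainty are invented for the example.

# Simplified sketch of a rule-retrieval specialist: find rules that can
# conclude a given value for a given parameter by inspecting rule actions.

ACTION_TEMPLATES = {
    # function name -> position of each slot in the action form
    "CONCLUDE": ("FUNCTION", "CNTXT", "PARM", "VALU", "TALLY", "CF"),
}

RULES_FOR_PARAMETER = {
    # parameter -> rules the control structure would try in order to conclude it
    "IDENTITY": [
        ("RULE009", ("CONCLUDE", "CNTXT", "IDENTITY", "NEISSERIA", "TALLY", 0.8)),
        ("RULE084", ("CONCLUDE", "CNTXT", "IDENTITY", "E.COLI", "TALLY", 0.6)),
    ],
}

def rules_concluding(parameter, value):
    """Return names of rules whose action can conclude `value` for `parameter`."""
    hits = []
    for name, action in RULES_FOR_PARAMETER.get(parameter, []):
        template = ACTION_TEMPLATES.get(action[0])
        if template and "VALU" in template:
            if action[template.index("VALU")] == value:
                hits.append(name)
    return hits

# e.g. rules_concluding("IDENTITY", "NEISSERIA") -> ["RULE009"]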
Many of the EC's specialists need a record of the interaction with the user. This record is built during the consultation and is organized into a tree structure called the history tree, which reflects MYCIN's goal-directed approach. Each node in the tree represents a goal and contains information about how the system tried to accomplish this goal (by asking the user or by trying rules). Associated with each rule is a record of whether or not the rule succeeded, and if not, why it failed. If evaluating the premise of a rule causes the system to trace a new parameter, thereby setting up a new subgoal, the node for this subgoal is the offspring of the node containing the rule that caused the tracing. Figure 18-3 shows part of a representative history tree. In this example, Rule 003 caused the tracing of the parameter CATEGORY, which is used in the premise of this rule.
[Figure 18-3: part of a representative history tree, with goal nodes for GRAM OF ORGANISM-1 and CATEGORY OF ORGANISM-1 recording how each goal was pursued (e.g., ask: question 11 [no rules]; rules: RULE037 (succeeded) ...; ask: question 15 [no rules]).]
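The history tree lends itself to a simple recursive record structure. The Python sketch below uses invented names (GoalNode, RuleRecord, how_known) rather than MYCIN's Interlisp structures; it shows one way the nodes just described could be represented and how a specialist could consult the tree to answer "How do you know the value of <parm> of <cntxt>?" by checking whether the value was asked of the user, concluded by rules, or both.

# Illustrative history-tree node and a specialist query over it.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RuleRecord:
    name: str
    succeeded: bool
    failure_reason: Optional[str] = None    # recorded when the rule failed

@dataclass
class GoalNode:
    parameter: str                           # e.g. "CATEGORY"
    context: str                             # e.g. "ORGANISM-1"
    asked_question: Optional[int] = None     # question number, if the user was asked
    rules_tried: List[RuleRecord] = field(default_factory=list)
    children: List["GoalNode"] = field(default_factory=list)   # subgoals traced by rules

def how_known(node: GoalNode) -> str:
    """Explain how the value of this goal's parameter became known."""
    sources = []
    if node.asked_question is not None:
        sources.append(f"you told me in answer to question {node.asked_question}")
    for rec in node.rules_tried:
        if rec.succeeded:
            sources.append(f"{rec.name} concluded it")
    if not sources:
        return f"The {node.parameter} of {node.context} is not known."
    return (f"The {node.parameter} of {node.context} is known because "
            + " and ".join(sources) + ".")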
Because we wish to allow the user to see how MYCIN makes all its decisions, we have tried to anticipate all types of questions a user might ask and to make every part of the system's knowledge base and reasoning process accessible. The EC consists of several specialists, each capable of giving one type of explanation. These specialists are grouped into three sets: one for explaining what the system is doing at a given time, one for answering questions about the system's static knowledge base, and one for answering questions about the dynamic knowledge base. The first set forms MYCIN's reasoning status checker; the second and third together make up the system's general question answerer.
[preceded by the first 14 questions in the consultation]
of these words, and words used elsewhere by the system in describing the parameter (e.g., when translating a rule into English or requesting the value of the parameter).
We now briefly describe how MYCIN achieves each of the five tasks outlined in Figure 18-7. An example analysis is shown in Figure 18-8. Each word in the dictionary has a synonym pointer to its terminal word (terminal words point to themselves). For the purpose of analyzing the question, a nonterminal word is considered to be equivalent to its (terminal) synonym. Terminal words have associated with them a set of properties or descriptors (Table 18-1) that are useful in determining the meaning of a question that uses a terminal word or one of its synonyms. A given word may be modified by more than one of these properties.
The first three properties of terminal words are actually inverse pointers, generated automatically from attributes of the clinical parameters. Specifically, a word receives the "acceptable value" pointer to a clinical parameter (Property 1 in Table 18-1) if it appears in the parameter's list of acceptable values--a list that is used during the consultation to check the user's response to a request for the parameter's value (see EXPECT attribute, Chapter 5).
Also, each clinical parameter, list, and table has an associated list of keywords that are commonly used when talking about that parameter, list, or table. These words are divided according to how sure we can be that a doctor is referring to this parameter, list, or table when the particular word
RULE006
IF:   1) The culture was taken from a sterile source, and
      2) It is definite that the identity of the organism is one of: staphylococcus-coag-neg bacillus-subtilis corynebacterium-non-diphtheriae
THEN: There is strongly suggestive evidence (.8) that the organism is a contaminant
The next step is to classify the question so that the program can tell which specialist should answer it. Since all questions about the consultation must be about some specific context, the system requires that the name of the context (e.g., ORGANISM-1) be stated explicitly. This provides an easy mechanism to separate general questions about the knowledge base from questions about a particular consultation.

Further classification is done through a pattern-matching approach similar to that used by Colby et al. (1974). The list of words created by the first phase is tested against a number of patterns (about 50 at present). Each pattern has a list of actions to be taken if the pattern is matched. These actions set flags that indicate what type of question was asked. In the case of questions about judgmental knowledge (called rule-retrieval questions), pattern matching also divides the question into the part referring to the rule's premise and the part referring to its action. For example, in "How do you decide that an organism is streptococcus?" there is no premise part, and the action part is "an organism is streptococcus"; in "Do you ever use the site of the culture to determine an organism's identity?" the premise part is "the site of the culture" and the action part is "an organism's identity."
<value>  One of the terminal words in the question has a dictionary property indicating that it is a legal value for the parameter (Property 1, Table 18-1), e.g., THROAT is a legal value for the parameter SITE.

<parm>   All of the words in the list are examined to see if they implicate any clinical parameters. Strong implications come from words with properties showing that the word is an acceptable value of the parameter, or that the word always implicates that parameter (Properties 1 and 2, Table 18-1). Weak implications come from words with properties showing that they might implicate the parameter (Property 3, Table 18-1). The system uses an empirical scoring mechanism for picking out only the most likely parameters.

<table>  Tables are indicated in a manner similar to that for lists except that an entry in the table must also be present in the question. For example, the word "organism" may indicate two tables: one containing a classification of organisms, and the other containing normal flora of various portals. The question "What organisms are considered to be subtypes of Pseudomonas?" will correctly implicate the former table, and "What are the organisms likely to be found in the throat?" will implicate the latter, because PSEUDOMONAS is in the first table and THROAT is in the second.
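The text does not spell out the empirical scoring mechanism, so the Python fragment below is only a plausible reading of it: strong implications (Properties 1 and 2) contribute more to a parameter's score than weak ones (Property 3), and only the top-scoring parameters are kept. The weights, the cutoff, and the example word-property map are all invented.

# Hypothetical scoring of clinical-parameter implications from word properties.

STRONG_WEIGHT = 5    # Properties 1 and 2: acceptable value / always implicates
WEAK_WEIGHT = 1      # Property 3: might implicate

def implicated_parameters(words, word_properties, keep=2):
    """Score the parameters implicated by the question's terminal words and
    return the `keep` most likely ones."""
    scores = {}
    for word in words:
        for parm, strength in word_properties.get(word, []):
            weight = STRONG_WEIGHT if strength == "strong" else WEAK_WEIGHT
            scores[parm] = scores.get(parm, 0) + weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [parm for parm, _ in ranked[:keep]]

# word_properties might map "throat" -> [("SITE", "strong")], and so on.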
answerer can tell which specialist should answer it. Filling in all blanks in the template gives the specialist all the information needed to find the answer.

The specialist for answering questions of a given type has to check the history tree or the system's knowledge base in order to determine which of the answer templates is appropriate for a particular question. Some blanks in the answer template are filled in by the same items as are used to fill blanks in the question template. The remainder are filled by the specialist with the information that answers the question. In the above example, the slots <parm>, <cntxt>, and possibly <value> would be filled in from the question, and the other slots would be filled from the history tree.
FIGURE 18-9 Questions about factual knowledge.
FIGURE 18-10 Rule-retrieval questions.
Some of the simplest questions about a specific consultation are those that inquire about the value of a particular parameter (Figure 18-12). The specialist that answers these questions must know how to retrieve this in-
FIGURE 18-11 Response that uses rule models.
** TO WHAT CLASS DOES ORGANISM-1 BELONG?
The category of ORGANISM-1 is enterobacteriaceae (.9)
** IS ORGANISM-1 A STREPTOCOCCUS?
No. The identity of ORGANISM-1 is e.coli (.74) bacteroides (.7) klebsiella-pneumoniae (.4) pseudomonas-aeruginosa (.22) enterobacter (.21) proteus-non-mirabilis (.19) serratia (.03) salmonella (.02)
The specialist for answering questions like "How do you know the value of <parm> of <cntxt>?" knows that the value of a parameter can come from two sources: it can be deduced by rules, or the user can give it in response to a question. The history tree will show which (possibly both) of these sources provided the particular information mentioned in the question (Figure 18-13).

If the question is phrased in the negative, it is necessary first to find all the ways the conclusion could have been made (this is a simple task of rule retrieval), then to explain why it wasn't made in this consultation (Figure 18-14). The specialist for answering these questions must know what situations can prevent conclusions from being made. The second question in Figure 18-14 illustrates how the answer to one question might cause another question to be asked.
The specialist for answering questions of the form "How did you use <parm> of <cntxt>?" needs to know not only how to find the specific rules that might use a parameter, but also how a parameter can cause a rule to fail and how one parameter can prevent another from being used. The history tree can be checked to see which of the relevant rules used the parameter, which failed because of the parameter, and which failed for some other reason, preventing the parameter from being used (Figure 18-15).
For questions of the form "Why didn't you find out about <parm> of <cntxt>?" general knowledge of MYCIN's control structure tells the conditions under which it would have been necessary to find out some piece of information. The record of the consultation can be used to de-
termine why these conditions never arose for the particular parameter in question (Figure 18-16). Figure 18-16 also illustrates that MYCIN's general question answerer allows a user to get as much information as is desired. The first answer given was not really complete in itself, but it led the user to ask another question to get more information. Then another question was asked to determine why clause 1 of Rule 159 was false. The answers to the first two questions both mentioned rules, which could be printed if the user wanted to examine them. The special command PR (Print Rule) is for the user's convenience. It requires no natural language processing and thus can be understood and answered immediately ("What is Rule 109?" or "Print Rule 109" also would be understood).
In questions that ask about the application of a rule to a context there
are three possibilities: the rule told us something about the context; the
rule failed when applied to that context; or the rule was never tried in that
context. The history tree tells which of these is the case. Furthermore, if a
rule succeeded, there is a record of all the conclusions it made, and if it
failed, the reason for failure is recorded. As Figure 18-17 illustrates, rules
are only applicable in certain contexts. If the rule could not apply to the
context mentioned in the question but might be relevant when applied to
some other context (which is an ancestor or descendent in the context tree),
then the question is answered as if it had mentioned the relevant context.
There are interactions among the different types of questions, but they weren't mentioned earlier for the sake of clarity. Sometimes a question may indicate an incorrect assumption about the state of MYCIN's knowledge. For example, a question could be "Why don't you think that ORGANISM-1 is E. coli?" when, in fact, the system has concluded that the organism is E. coli. To answer this question, the explanation system would explain how it did make such a decision, despite the phrasing of the question. Similarly, when asked how the system used (or concluded about) some parameter that was never traced, the system explains why it didn't need to find out about the parameter, regardless of how the question was phrased.
18.3 Conclusions
Consultation systems form a class of artificial intelligence programs that can provide useful solutions to real-world problems. The utility of such systems, however, depends on their acceptability to human users. One feature that can increase their acceptability is a mechanism whereby they can explain or justify their advice.

The development of an explanation mechanism for a consultation system is very much related to the problems of representing knowledge and of making use of different sources of knowledge. Since the use of inference rules provides a unified way to represent modular pieces of knowledge,
** IS ORGANISM-1 CORYNEBACTERIUM-NON-DIPHTHERIAE?
No. The identity of ORGANISM-1 is e.coli (.74) bacteroides (.7) klebsiella-pneumoniae (.4) pseudomonas-aeruginosa (.22) enterobacter (.21) proteus-non-mirabilis (.19) serratia (.03) salmonella (.02)
** PR 109
RULE109
IF:   1) The culture was taken from a sterile source, and
      2) It is not known whether the organism is a contaminant, and
      3) The patient is not a compromised host, and
      4) The patient was febrile due to the infection, and
      5) The results of the CBC from the time period of the culture are available, and
      6) A - The white count from the CBC is greater than 10.5, or
         B - The percent of PMNs in the CBC at the time the culture was obtained is greater than 78, or
         C - The percent of WBCs which were bands in the CBC at the time the culture was obtained is greater than 10
THEN: There is strongly suggestive evidence (.8) that the organism is not a contaminant
This chapter is an abridged version of a paper, some of which was originally presented by Sharon Wraith Bennett at the 12th Annual Midyear Clinical Meeting of the American Society of Hospital Pharmacists, Atlanta, Georgia, December 8, 1977, and which appeared in American Journal of Hospital Pharmacy 37:523-529 (1980). Copyright 1980 by American Journal of Hospital Pharmacy. All rights reserved. Used with permission.
19 Specialized Explanations for Dosage Selection
brain barrier and may lead to the development of resistance (Fisher et al., 1975). One antimicrobial may be selected over another, similar drug because it causes fewer or less severe side effects. For example, nafcillin is generally preferred over methicillin for treatment of staphylococcal infections because of the reported interstitial nephritis associated with methicillin (Ditlove et al., 1977). MYCIN's knowledge base therefore requires continual updating with new indications or adverse reactions as they are reported in the medical literature.

Several patient-specific factors may further limit the list of acceptable antimicrobials. Tetracycline, for example, is not recommended for children (Conchie et al., 1970) or pregnant (Anthony, 1970) or breast-feeding (O'Brien, 1974) women. Also, prior adverse reactions to antimicrobials must be considered by the program.
19.1 Customizing Doses
renal impairment in infants1 between one day and one week old (Edelmann and Barnett, 1971). Because of the passage of maternal creatinine into the infant serum at birth, no estimate of renal function is attempted if the newborn is less than one day old. For infants younger than six months, MYCIN warns the user of the large degree of possible error in the estimated doses because of the changing renal function and a poor relationship between glomerular filtration rate and body surface area (Rubin et al., 1949).

When the creatinine clearance of an adult patient is not known, it is estimated from the age, sex, weight, and serum creatinine (Jelliffe and Jelliffe, 1972) (Figure 19-1). For children less than 12 years of age, height and serum creatinine are used to estimate the creatinine clearance (Schwartz et al., 1976). If two consecutive serum creatinines indicate rap-
19.2 Selection of Dosage Regimen
19.3 Explanation of Recommendations
At the conclusion of the consultation, the physician can ask MYCIN simple questions to obtain assurance that the diagnosis and treatment are reasonable. These questions may refer to the current consultation or they may be general, regarding any of the system's knowledge. The program provides a justification for the therapy selection, which includes the reasons for selecting one antimicrobial instead of another. Also available is an explanation of the calculations used to decide on a dose (Figure 19-4). The physician can also ask to see relevant parts of MYCIN's knowledge base, including justification and literature references (Figure 19-5).
[Plot: predicted blood levels of GENTAMICIN (mcg/ml, horizontal axis from 2.0 to 7.0) against time in hours (vertical axis from 2.50 to 20.00), with the MIC marked.
KEY: D - drug level for modified dose [70.0 mg Q8h]
     I - drug level for modified interval [119 mg Q14h]
     B - drug level for modified interval and dose [105 mg Q12h]
     MIC - minimum inhibitory concentration [mcg/ml]
     * - indicates overlapping curves]
** HOW DID YOU CALCULATE THE DOSE FOR GENTAMICIN?
When GENTAMICIN is given for MENINGITIS, the recommended dosage is:
if age is > 2 yrs then 1.7 mg/kg q8h IV plus consider giving 5 mg q24h IT,
else 2.3 mg/kg q8h IV plus consider giving 2.5-4 mg/day IT.
The normal dose for John Jones is: 119 mg (3.0 ml, 80mg/2ml ampule) q8h [calculated on the basis of 1.7 mg/kg] plus consider giving 5 mg q24h IT
GENTAMICIN is excreted by the kidneys, so its dosage must be modified in renal failure.
The following table shows how the patient's renal function was determined:

Identifier   Value   Definition
SCR1         1.9     the most recent serum creatinine (mg/100ml)
SCR2         1.8     the previous serum creatinine (mg/100ml)
CCr(f)       42.7    estimated creatinine clearance, adjusted for normal body surface area (ml/min/1.73 sq. meters)
CCr(n)       100     the average normal creatinine clearance for John Jones's age group
CCr(min)     80      the minimum normal creatinine clearance for John Jones's age group

Since CCr(f) < CCr(min), John Jones has renal impairment.
To modify the dosage of GENTAMICIN, the following pharmacokinetic parameters were used:

Identifier   Value   Definition
a            .007    the fraction hourly loss of a drug due to non-renal processes
b            .0022   the slope of the plot of the elimination rate constant of a drug versus the patient's creatinine clearance
Q            .57     the drug's elimination rate fraction: elimination rate of the drug as a fraction of the drug's "normal" elimination rate

Q was calculated using the formula:
Q = [a + b * CCr(f)]/[a + b * CCr(n)]
My first dosage recommendation decreased the amount of GENTAMICIN and left the dosing interval unchanged. The new amount was calculated by multiplying the old amount by Q.
The second recommendation increased the dosing interval and left the amount of GENTAMICIN unchanged. The new interval was calculated by dividing the old interval by Q.
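The two adjustment strategies in that explanation reduce to a few lines of arithmetic. The Python sketch below simply re-expresses the calculation as described above; it is not MYCIN's code, and the example numbers are hypothetical rather than taken from John Jones's case.

def elimination_fraction(a, b, ccr_f, ccr_n):
    """Q, the drug's elimination rate as a fraction of its normal rate,
    from Q = [a + b * CCr(f)] / [a + b * CCr(n)]."""
    return (a + b * ccr_f) / (a + b * ccr_n)

def modified_dose(amount, interval_h, q):
    """Return the two recommendations described in the text:
    (1) reduce the amount, keep the interval; (2) keep the amount, lengthen the interval."""
    return (amount * q, interval_h), (amount, interval_h / q)

# Hypothetical example (values invented for illustration):
q = elimination_fraction(a=0.007, b=0.0022, ccr_f=50.0, ccr_n=100.0)
same_interval, same_amount = modified_dose(amount=119.0, interval_h=8.0, q=q)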
** HOW DO YOU DETERMINE BODY SURFACE AREA?
RULE104
[This rule applies to any patient, and is tried in order to find out about the patient's estimated body surface area]
IF:   1) The weight of the patient is known, and
      2) The height of the patient is known
THEN: Use Boyd's algorithm to calculate body surface area from weight and height [Boyd, E. The Growth of the Surface Area of the Human Body, 1935, pp 1-133.]
Author: Wraith
Justification: A comparative evaluation with statistical analysis of the Boyd method with the DuBois formula demonstrated that the Boyd formula is more accurate for abnormal body types. [Sendray J. et al., Determination of human body surface area from height and weight, Applied Physiology, 7(1):1-12, July 1954.]
20 Customized Explanations Using Causal Knowledge
** DO YOU TREAT PSEUDOMONAS-AERUGINOSA BACTEREMIA WITH CARBENICILLIN?
Yes, for treatment of PSEUDOMONAS-AERUGINOSA in BACTEREMIA, the drugs of choice (ranked by relative efficacy, assuming sensitivity) are:
1st choice:
   CARBENICILLIN-AND-TOBRAMYCIN
   CARBENICILLIN-AND-GENTAMICIN
   AMIKACIN-AND-CARBENICILLIN
2nd choice:
   TOBRAMYCIN
   GENTAMICIN
   AMIKACIN
20.2 Knowledge Representation
[Figure: a rule in the network and the value nodes that are preconditions of it:
IF:   PARAMETER-1 of OBJECT-1 is VALUE-1, and
      PARAMETER-2 of OBJECT-1 is VALUE-4
THEN: Conclude that PARAMETER-4 of OBJECT-3 is VALUE-7]
nent parts but wishes to provide a brief summary of the knowledge underlying that rule. Complexity, importance, and rule type are described in more detail below.

In the network (Figure 20-2) rules connect value nodes with other value nodes. This contrasts with the MYCIN system in which rules are functionally associated with an object-parameter pair and succeed or fail only after
TABLE 20-1

Type of Node   Static Information (associated with node)   Dynamic Information (consultation-specific)
value node     parameter-node link                          contexts for which this value is true
               precondition-rule list                       certainty factor
               conclusion-rule list                         explanation data
               importance                                   ask state
               complexity
               ask first/last
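Table 20-1 maps naturally onto a small record structure. The Python sketch below is one hypothetical rendering of a value node, with static slots fixed when the knowledge base is built and dynamic slots filled during a consultation; the field names follow the table, everything else (including the example at the end) is invented.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ValueNode:
    # Static information (associated with the node when the KB is built)
    parameter: str                     # parameter-node link
    value: str
    precondition_rules: List[str] = field(default_factory=list)
    conclusion_rules: List[str] = field(default_factory=list)
    importance: int = 0
    complexity: int = 0
    ask_first: bool = False            # "ask first/last"

    # Dynamic information (consultation-specific)
    true_contexts: List[str] = field(default_factory=list)
    certainty: Dict[str, float] = field(default_factory=dict)      # per-context CF
    explanation_data: Dict[str, object] = field(default_factory=dict)
    ask_state: Optional[str] = None

# e.g. ValueNode(parameter="SERUM-CALCIUM", value="INCREASED",
#                importance=8, complexity=3)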
CF = 1) have been termed unity paths (see Chapter 3). In cases of mutually exclusive values of parameters, complete investigation of one value requires consideration of any other value that could be reached by a unity path. Thus the representation must allow quick access to such paths.

When reasoning by elimination, similar problems arise if a system focuses on a single value. One needs the ability to conclude a value by ruling out all other possible values for that parameter; this entails a slight modification of the organizational and reasoning scheme. One strategy is to use this elimination method in cases of mutually exclusive options only after the normal backward-chaining process fails (provided that the possibilities represented in the knowledge base are known to span all potential values).
20.3 Explanation Capabilities
Sample Interaction 1:
** EXPERTISE 3   [the user indicates his or her level of expertise on a scale of 1 to 10 where 10 = expert]
** HOW DOES HYPERPARATHYROIDISM CAUSE RENAL STONES?
Renal stones are caused by hypercalcemia
Hypercalcemia is caused by hyperparathyroidism
** MORE DETAIL
Renal stones are caused by increased urinary calcium
Increased urinary calcium is caused by hypercalcemia
Hypercalcemia is caused by increased bone breakdown
Increased bone breakdown is caused by hyperparathyroidism
** EXPERTISE 7   [the user indicates that he or she is more expert here than above]
This sample dialogue demonstrates: (1) the user's ability to specify his or her level of expertise, (2) the program's ability to employ the user's expertise to adjust the amount of detail it offers, and (3) the user's option to request more detailed information about the topic under discussion.
Two user-specific variables are used to guide the generation of explanations:4
3Our program functions as shown except that the user input requires a constrained format rather than free text. We have simplified that interaction here for illustrative purposes. The program actually has no English interface.
4Another variable we have discussed but not implemented is a focusing parameter that would put a ceiling on the number of steps in the chain to trace when formulating an explanation. A highly focused explanation would result in a discussion of only a small part of the reasoning tree. In such cases, it would be appropriate to increase the detail level as well.
[Figure 20-3: a fragment of the network. Value nodes carry complexity (Comp) and importance (Imp) measures and are linked by cause-effect rules with CFs: bone breakdown (Comp 6, Imp 3) leads by a .6 cause-effect rule to hypercalcemia (Comp 3, Imp 8), which leads by a .9 cause-effect rule to renal stones (Comp 1, Imp 6).]
[Figure 20-4: a five-rule reasoning sequence (r1 through r5) linking six concepts (A through F), plotted by concept complexity on a scale of 1 to 10 against the user's expertise and detail levels.]
the reasoning chain are selected for exposition on the basis of their complexity; those concepts with complexity lying between the user's expertise level and the calculated detail level are used.5 Consider, for example, the five-rule reasoning chain linking six concepts shown in Figure 20-4. When intermediate concepts lie outside the desired range (concepts B and E in this case), broader inference statements are generated to bridge the nodes that are appropriate for the discussion (e.g., the statement that A leads to C would be generated in Figure 20-4). Terminal concepts in a chain are always mentioned, even if their complexity lies outside the desired range (as is true for concept F in the example). This approach preserves the
5The default value for DETAIL in our system is the EXPERTISE value incremented by 2. When the user requests more detail, the detail measure is incremented by 2 once again. Thus, for the three interchanges in Sample Interaction 1, the expertise-detail ranges are 3-5, 3-7, and 7-9 respectively. Sample Interaction 2 demonstrates how this scheme is modified by the importance measure for a concept.
[Figure 20-5: the same reasoning sequence (r1 through r5) plotted by rule complexity against the user's expertise and detail levels.]
6The opposite situation does not occur; rules of low complexity do not link concepts of higher complexity.
7Patil has dealt with this problem by explicitly representing causal relationships concerning acid-base disorders at a variety of different levels of detail (Patil et al., 1981).
the reasoning chain represented in Figure 20-4. Although rule r3 links two concepts (C and D) that are within the complexity-detail range for the user, the relationship mentioned in rule r3 is itself considered to be outside this range. When generating the explanation for this reasoning chain, the program mentions concepts C and D, and therefore mentions rule r3 despite its complexity measure. Since the rule is considered too complex for the user, however, the additional explanatory text associated with the rule is needed in this case. If the rule had fallen within the complexity-detail range of the user, on the other hand, the text justification for the rule would not have been required.

Further modulation of rule and concept selection is accomplished using the importance measure associated with parameters. A high importance forces the inclusion of a reasoning step in an explanation, thereby overriding the complexity considerations that were shown in Figures 20-4 and 20-5. When the importance level of a concept is two or more points above the expertise of the user, the item is included in the explanation. Consider, for example, the following dialogue, which demonstrates the way in which the importance measure is used:
Sample Interaction 2:
** EXPERTISE 6
** HOW DOES HYPERPARATHYROIDISM CAUSE RENAL STONES?
Renal stones are caused by increased urinary calcium
Increased urinary calcium is caused by hypercalcemia
Hypercalcemia is caused by increased bone breakdown
Bone breakdown is caused by increased osteoclast activity
Increased osteoclast activity is caused by hyperparathyroidism
Note that this example shows a response to the same question asked in Sample Interaction 1. This time, however, the expertise level is 6 rather than 3. Hypercalcemia is therefore mentioned only because its importance level of 8 (see Figure 20-3) is 2 points higher than the expertise of the user; the complexity level of hypercalcemia does not fall within the expertise-detail range of the user and thus would not have been included in the explanation if it were not for its high importance. The other items mentioned are either terminal concepts in the chain (renal stones and hyperparathyroidism) or have a complexity measure lying within the user's expertise-detail range of 6-8.
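Pulling the selection rules of this section together, the Python sketch below chooses which concepts in a linear reasoning chain to mention: terminal concepts always, concepts whose complexity falls in the expertise-to-detail range, and concepts whose importance is at least two points above the user's expertise. It is a simplified reading of the mechanism (linear chains only, invented names), not the actual program; the complexity and importance values not shown in Figure 20-3 are hypothetical.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Concept:
    name: str
    complexity: int
    importance: int

def concepts_to_mention(chain: List[Concept], expertise: int,
                        detail: Optional[int] = None) -> List[str]:
    """Select concepts for an explanation of a linear reasoning chain."""
    if detail is None:
        detail = expertise + 2            # default DETAIL = EXPERTISE + 2
    selected = []
    for i, c in enumerate(chain):
        terminal = i == 0 or i == len(chain) - 1
        in_range = expertise <= c.complexity <= detail
        important = c.importance >= expertise + 2
        if terminal or in_range or important:
            selected.append(c.name)
    return selected

# Example using the concepts of Figure 20-3 (the other values are hypothetical):
chain = [Concept("hyperparathyroidism", 7, 6),
         Concept("increased bone breakdown", 6, 3),
         Concept("hypercalcemia", 3, 8),
         Concept("increased urinary calcium", 7, 4),
         Concept("renal stones", 1, 6)]
# concepts_to_mention(chain, expertise=6) keeps the terminal concepts, the
# in-range concepts, and hypercalcemia (importance 8 >= 6 + 2) despite its
# low complexity -- matching the behavior shown in Sample Interaction 2.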
Many reasoning chains are not as simple as those shown in Figures 20-3, 20-4, and 20-5. When explaining a branched reasoning chain, for example, the explanation system can set aside the branches of the chain and mention them only when it is appropriate to the level of detail required by the user. This feature provides users with an overview of the reasoning process to help them decide whether it is necessary to examine the more detailed steps. This capability is illustrated in the following dialogue:
Sample Interaction 3:
** WHY DOES THE PATIENT HAVE INCREASED SERUM CALCIUM?
Increased serum calcium is suggested by immobilization and malignancy
** MORE DETAIL
Increased serum calcium is implied by increased bone breakdown
Increased bone breakdown is suggested by 2 paths of reasoning:
Increased bone breakdown is implied by increased osteoclast activity
Increased osteoclast activity is implied by prolonged immobilization
Increased bone breakdown is also implied by malignant bone invasion
Our refinement of the rule types presented by Clancey (Chapter 29) yields five types of rules9 that are relevant to explanation strategies:

9Rules considered here deal with domain knowledge, to be distinguished from strategic or meta-level rules (Davis and Buchanan, 1977).
10In many cases self-referencing rules can be replaced by strategy rules (e.g., "If you have tried to conclude a value for this parameter and have failed to do so, then use the default value for the parameter").
[Figure 20-6: a fragment of the causal network relating RH INCOMPATIBILITY, FETAL HEMOLYSIS, INCREASED BILIRUBIN IN AMNIOTIC FLUID (which has other causes as well), IMPAIRED FETAL OXYGEN TRANSPORT, and FETAL PROBLEM through cause-effect, effect-cause, and definitional links.]
Sample Interaction 4:
** WHY DO INCREASED BILIRUBIN COMPOUNDS IN THE AMNIOTIC FLUID IMPLY FETAL HEMOLYSIS?
Fetal hemolysis leads to bilirubin compounds in the fetal circulation;
P(cause_i | effect) = [P(effect | cause_i) * P(cause_i)] / [SUM over j = 1 to k of P(effect | cause_j) * P(cause_j)]
This computation of the probability that the ith possible cause is present given that the specific effect is observed, P(cause_i | effect), requires knowledge of the a priori frequencies P(cause_i) for each of the possible causes (cause_1, cause_2, ..., cause_k) of the effect. These data are not usually available for medical problems and are dependent on locale and prescreening of the patient population (Shortliffe et al., 1979; Szolovits and Pauker, 1978). The formula also requires the value of P(effect | cause_i) for all cause-effect rules leading to the effect, not just the one for the rule leading from cause_i to the effect. In Figure 20-6, for example, the effect-cause rule leading from "increased bilirubin in amniotic fluid" to "fetal hemolysis" could be derived from the cause-effect rule leading in the opposite direction only if all additional cause-effect rules leading to "increased bilirubin in amniotic fluid" were known (the "other causes" indicated in the figure) and if the relative frequencies of the various possible causes of "increased bilirubin in amniotic fluid" were also available. A more realistic approach is to obtain the inference weighting for the effect-cause rule directly from the expert who is building the knowledge base. Although such subjective estimates are fraught with danger in a purely Bayesian model (Leaper et al., 1972), they appear to be adequate (see Chapter 31) when the numerical weights are supported by a rich semantic structure (Shortliffe et al., 1979).
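To make the data requirement concrete, the short Python fragment below inverts a set of cause-effect weights into P(cause_i | effect) using the formula above. The priors and conditional probabilities are entirely hypothetical; the point is simply that every cause of the effect, together with its prior, must be known before the inversion can be carried out.

def invert(cause_effect, priors):
    """Compute P(cause_i | effect) from P(effect | cause_i) and priors P(cause_i)."""
    denom = sum(cause_effect[c] * priors[c] for c in cause_effect)
    return {c: cause_effect[c] * priors[c] / denom for c in cause_effect}

# Hypothetical numbers only: P(effect | cause) for every cause of the effect...
cause_effect = {"fetal hemolysis": 0.8, "other cause 1": 0.3, "other cause 2": 0.1}
# ...and the a priori frequency of each cause, which is rarely available in practice.
priors = {"fetal hemolysis": 0.05, "other cause 1": 0.10, "other cause 2": 0.85}
posterior = invert(cause_effect, priors)   # e.g. P("fetal hemolysis" | effect) ~ 0.26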
Similarly, problems are encountered in attempting to produce the in-
verse of rules that have Boolean preconditions. For example, consider the
following rule:
(or lower) weighting than the sum of the separate manifestations,11 nor did it provide a way to explain the inference paths involved (Miller et al., 1982).
PIP (Pauker et al., 1976; Szolovits and Pauker, 1978) handles the implication of diseases by manifestations by using "triggers" for particular disease frames. No weighting is assigned at the time of frame invocation; instead PIP uses a scoring criterion that does not distinguish between cause-effect and effect-cause relationships in assigning a numerical value for a disease frame. While the information needed to explain the program's reasoning is present, the underlying causal information is not.12
In our experimental system, the inclusion of both cause-effect rules and effect-cause rules with explicit certainties, along with the ability to group manifestations into rules, allows flexibility in constructing the network. Although causal information taken alone is insufficient for the construction of a comprehensive knowledge base, the causal knowledge can be used to propose effect-cause relationships for modification by the system-builder. It can similarly be used to help generate explanations for such relationships when effect-cause rules are entered.
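One way to read that last point is as a simple knowledge acquisition aid: for every cause-effect rule the expert enters, the system drafts the corresponding effect-cause rule and leaves its certainty for the expert to confirm or edit. The Python sketch below is a hypothetical illustration of that idea, not the experimental system's code.

from dataclasses import dataclass
from typing import Optional

@dataclass
class CausalRule:
    antecedent: str
    consequent: str
    kind: str                    # "cause-effect" or "effect-cause"
    cf: Optional[float] = None   # certainty; None means "to be supplied by the expert"

def propose_inverse(rule: CausalRule) -> CausalRule:
    """Draft an effect-cause rule from a cause-effect rule, leaving the
    inference weight for the system builder to fill in."""
    return CausalRule(antecedent=rule.consequent,
                      consequent=rule.antecedent,
                      kind="effect-cause",
                      cf=None)

# e.g. propose_inverse(CausalRule("fetal hemolysis",
#                                 "increased bilirubin in amniotic fluid",
#                                 "cause-effect", cf=0.8))
# drafts: increased bilirubin in amniotic fluid -> fetal hemolysis (CF to be set).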
20.5 Conclusion
We have argued that a need exists for better explanations in medical consultation systems and that this need can be partially met by incorporating a user model and an augmented causal representation of the domain knowledge. The causal network can function as an integral part of the reasoning system and may be used to guide the generation of tailored explanations and the acquisition of new domain knowledge. Causal information is useful but not sufficient for problem solving in most medical domains. However, when it is linked with information regarding the complexity and importance of the concepts and causal links, a powerful tool for explanation emerges.

Our prototype system has been a useful vehicle for studying the techniques we have discussed. Topics for future research include: (1) the development of methods for dynamically determining complexity and importance (based on the semantics of the network rather than on numbers provided by the system builder); (2) the discovery of improved techniques for using the context of a dialogue to guide the formation of an expla-
Using Other Representations

21 Other Representation Frameworks
with which we started. Our choice of rules and fact triples, with CFs, has been explained in Part Two. As summarized at the end of Chapter 3, we were under no illusion that we were creating a "pure" production system. We had taken many liberties with the formalism in order to make it more flexible and understandable. However, we still felt that the stylized condition-action form of knowledge brought many advantages because of its simplicity. For example, creating English translations from the LISP rules and translating stylized English rules into LISP were both somewhat simplified because of the restricted syntax. Similarly, creating explanations of a line of reasoning was simplified as well, because of the simple backward-chaining control structure that links rules together dynamically.

Representing knowledge in procedures was one alternative we were trying hard to avoid. Our experience with DENDRAL and with the therapy algorithm in MYCIN (Chapter 6) showed how inflexible and opaque a set of procedures could be for an expert maintaining a knowledge base. And, as mentioned in previous chapters, we saw that production rules offered some opportunity for making a knowledge base easier to understand and modify.

We were aware of predicate calculus as a possibility for representing MYCIN's knowledge. We were working in a period in AI research when logic and resolution-based theorem provers were being recommended for many problems. We did not seriously entertain the idea of using logic, however, largely because we felt that inexact reasoning was undeveloped in theorem-proving systems.
We had initially experimented with a semantic network representation,
as mentioned in Chapter 3. Although we felt we could store medical knowl-
edge in that form, we felt it was difficult to focus a dialogue in which gaps
in the knowledge were filled both by inference and by the user's answers
to questions. Minsky's paper on frames (Minsky, 1975) did not appear until
after this work was well underway. Even so, we were looking for a more
structured representation, specifically rules, to build editors and parsers
for, to modify and explain, and to reason with in an understandable line
of reasoning.
In this part we describe three experiments with alternative represen-
tations and control structures in programs called VM, CENTAUR, and
WHEEZE. The first two programs were written for Ph.D. requirements,
the last as a class project. All are programs that work on medical problems,
although in areas outside of infectious diseases. Another experiment with
representations is described in Chapter 20 in the context of explanation.
There MYCIN's rules are rewritten in an inference net (cf. Duda et al.,
1978b) in order to facilitate explaining the inferences at different levels of
detail.
The VM program discussed in Chapter 22 was selected by Professor
E. Feigenbaum, H. Penny Nii, and Dr. John Osborn and worked on pri-
marily by Larry Fagan for his Ph.D. dissertation. Feigenbaum and Nii had
been developing the SU/X program (Nii and Feigenbaum, 1978) for in-
terpretation of multisensor data. Feigenbaum was a friend of Osborn's,
knew of Osborn's pioneering work on computer monitoring in intensive
care, and saw this as a possible domain in which to explore further the
problems in multisensor signal understanding involving signals for which
the time course is important to the interpretation. Osborn agreed to be
the expert collaborator. Fagan had been working on MYCIN and had con-
tributed to the code as well as to the knowledge base of meningitis rules.
(In Feigenbaum's words, Fagan had become "MYCINized.") So it was nat-
ural that his initial thinking about the ICU data interpretation problem
was in MYCIN's terms. Fagan quickly found, however, that the MYCIN
model was not appropriate for a problem of monitoring data continuously
over time. MYCIN was much too oriented toward a "snapshot" of data
about a patient at a fixed time (although some elements of data in the
"snapshot" name historical parameters, such as dates of prior infections).
The only obvious mechanism for making MYCIN work with a stream of
data in the ICU was to restart the program at frequent time intervals to
reason about each new "snapshot" of data gathered during each 2-5 min-
ute time period. This is inelegant and completely misses any sense of con-
tinuity or the changing context in which data are being gathered. Thus
VM was designed to remedy this deficiency.
The other two programs in Part Seven were designed as alternatives
to a rule-based representation, varying the representation of one program,
called PUFF. Although desirable, it is difficult in AI to experiment with
programs by varying one parameter at a time while holding everything
else fixed. Of course, not everything else could remain fixed for such a
gross experiment. Both CENTAUR and WHEEZE, discussed in Chapters
23 and 24, were deliberate attempts to alter the representation and control
of the PUFF program (while leaving the knowledge base unchanged) in
order to examine advantages and disadvantages of alternatives.
PUFF is a program that diagnoses pulmonary (lung) diseases. The
problem was suggested to Feigenbaum and Nii by Osborn at the time VM
was being formulated, and appeared to be appropriate for a MYCIN-like
approach. It was initially programmed using EMYCIN (see Part Five), in
collaboration with Drs. R. Fallat and J. Osborn at Pacific Medical Center
in San Francisco (Aikins et al., 1983). About 50-60 rules were added to
EMYCIN [in a much shorter time than expected (Feigenbaum, 1978)] to
interpret the type and severity of pulmonary disorders. The primary data
are mostly from an instrument known as a spirometer that measures flows
and volumes of a patient's inhalation and exhalation. The conclusions are
diagnoses that account for the spirometer data, the patient history data,
and the physician's observations.
FIGURE 21-1 Five implementations of PUFF.
If A & B & C, then A
B & C --> A
This is much more natural to explain than trying to say why, or in what
sense, A can be evidence for itself. CENTAUR was demonstrated using the
same knowledge as in the EMYCIN version of PUFF (Aikins, 1983).
David Smith and Jan Clayton developed WHEEZE as a further ex-
periment with frames. They asked, in effect, if all the knowledge in PUFF
could be represented in frames and what benefits would follow from doing
so. In a short time (as a one-term class project) they reimplemented PUFF
with a frame-based representation. Chapter 24 is a summary of their re-
sults.
The version of PUFF written in BASIC (BASIC-PUFF) is a simplified
version of the EMYCIN rule interpreter with the medical knowledge built
into the code (Aikins et al., 1983). It was redesigned to run efficiently
similar approach is via strategy rules, as described in Chapter 29. The unity
path mechanism (Chapter 3) also affects the order of rule invocation.
ONCOCIN (discussed in Chapters 32 and 35) incorporates many of
the ideas from these experiments, most notably the framelike representa-
tion of control knowledge and the description of changing contexts over
time. It builds on other results presented in this book as well, so its design
is described later. ONCOCIN clearly shows the influence of the evolution
of our thinking presented in this section.
One piece of recent research not included in this volume is the rerepre-
sentation of MYCIN's knowledge along the lines described in Chapter 29.
The new program, called NEOMYCIN (Clancey and Letsinger, 1981), car-
ries much of its medical knowledge in rules. But it also represents (a) the
taxonomy of diseases as a separate hierarchy, (b) strategy knowledge as
meta-rules, (c) causal knowledge as links in a network, and (d) knowledge
about disease processes in the form of frames characterizing location and
temporal properties. One main motivation for the reconceptualization was
to provide improved underpinnings for the tutorial program described in
Chapter 26. Because of the richer knowledge structures in NEOMYCIN,
informative explanations can be given regarding the program's diagnostic
strategies, as well as the medical rules.
NEOMYCIN, along with other recent work, emphasizes the desirabil-
ity of augmenting MYCIN's homogeneous set of rules with a classification
of types of knowledge and additional knowledge of each type. In MYCIN's
rule set, the causal mechanisms, the taxonomic structure of the domain,
and the problem-solving strategies are all lumped together. An augmented
knowledge base should separate these different types of knowledge to fa-
cilitate explanation and maintenance of the knowledge base, and perhaps
to enhance performance as well. Causal mechanisms have been repre-
sented and used in several domains, including medicine (Patil et al., 1981)
and electronics debugging (Davis, 1983). Mathematical models have been
merged with symbolic causal models in AI/MM (Kunz, 1983). As a result
of this recent work, considerably richer alternatives than MYCIN's ho-
mogeneous rule set can be found.
Finally, it should be noted that the chapters in this part describe rather
fundamental viewpoints on representation. Within a rule-based or frame-
based (or mixed) framework there are still numerous details of represent-
ing uncertainty, quantified variables, strategies, temporal sequences, book-
keeping information, and other concepts mentioned throughout the book.
22
Extensions to the
Rule-Based Formalism
for a Monitoring Task
This chapter is a longer and extensively revised version of a paper originally appearing in
Proceedings of the Sixth IJCAI (1979, pp. 260-262). Used by permission of International Joint
Conferences on Artificial Intelligence, Inc.; copies of the Proceedings are available from Wil-
liam Kaufmann, Inc., 95 First Street, Los Altos, CA 94022.
¹VM was developed as a collaborative research project between Stanford University and
Pacific Medical Center (PMC) in San Francisco. It was tested with patient information acquired
from a physiologic monitoring system implemented in the cardiac surgery ICU at PMC and
developed by Dr. John Osborn and his colleagues (Osborn et al., 1969).
22.1 The Application
The intensive care unit monitoring system at Pacific Medical Center (Os-
born et al., 1969) was designed to aid in the care of patients in the period
immediately following cardiac surgery. The basic monitoring system has
proven to be useful in caring for patients with marked cardiovascular in-
stability or severe respiratory malfunction (Hilberman et al., 1975). Most
of these patients are given breathing assistance with a mechanical ventilator
until the immediate effects of anesthesia, surgery, and heart-lung bypass
have subsided. The ventilator is essential to survival for many of these
patients. Electrocardiogram leads are always attached, and patients usually
have indwelling arterial catheters to assure adequate monitoring of blood
pressure and to provide for the collection of arterial blood for gas analysis.
The ventilator-to-patient airway is monitored to collect respiratory flows,
rates, and pressures. Oxygen and carbon dioxide concentrations are also
22.2 Overview of the Ventilator Manager Program
FIGURE 22-1 VM system configuration. Physiological measurements are
gathered automatically by the monitoring system and provided to the
interpretation program. The summary information and therapeutic
suggestions are sent back to the ICU for consideration by clinicians.
[Figure: sample VM summary display, generated at time 15:40. Conclusions
plotted over hours 12-15 include BRADYCARDIA [PRESENT], HEMODYNAMICS
[STABLE], HYPERVENTILATION [PRESENT], and HYPOTENSION [PRESENT], together
with the goal and actual patient therapy locations. At 1640 the program
suggests: CONSIDER PLACING PATIENT ON T-PIECE IF PaO2 > 70 ON FIO2 <= .4,
PATIENT AWAKE AND TRIGGERING VENTILATOR, ECG IS STABLE. At 1810 the
system assumes the patient is starting the T-piece; HYPOVENTILATION is
noted at 1819-1822.]
should be within the specified ranges at some point in the future. Thus a
rule examines the current and historical data to interpret what is happen-
ing at the present and to predict events in the future.
Additional information associated with each rule includes the symbolic
name (e.g., STABLE-HEMODYNAMICS), the rule group (e.g., rules about
instrument faults), the main concept (definition) of the rule, and all of the
therapeutic states in which it makes sense to apply the rule. The list of
states is used to focus the program on the set of rules that are applicable
at a particular point in time. Figure 22-5 shows a sample rule for deter-
mining hemodynamic stability (i.e., a measure of the overall status of the
cardiovascular system).
STATUS RULE: STABLE-HEMODYNAMICS
DEFINITION: Defines stable hemodynamics based on blood pressures and heart rate
APPLIES TO: patients on VOLUME, CMV, ASSIST, T-PIECE
COMMENT: Look at mean arterial pressure for changes in blood pressure and
    systolic blood pressure for maximum pressures.
IF
    HEART RATE is ACCEPTABLE
    PULSE RATE does NOT CHANGE by 20 beats/minute in 15 minutes
    MEAN ARTERIAL PRESSURE is ACCEPTABLE
    MEAN ARTERIAL PRESSURE does NOT CHANGE by 15 torr in 15 minutes
    SYSTOLIC BLOOD PRESSURE is ACCEPTABLE
THEN
    The HEMODYNAMICS are STABLE
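As a rough illustration of the bookkeeping just described, the following is a minimal
Python sketch (not the original Interlisp implementation) of a rule record that carries
its name, group, definition, and applicable therapy states, together with a focusing
function; the class and function names are invented for this sketch, and the premise
shown is drastically simplified.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class VMRule:
    name: str                      # e.g., "STATUS.STABLE-HEMODYNAMICS"
    group: str                     # e.g., "STATUS-RULE"
    definition: str                # one-line description of the rule's main concept
    applies_to: List[str]          # therapy states in which the rule is relevant
    premise: Callable[[Dict], bool]     # examines current and historical data
    action: Callable[[Dict], None]      # asserts conclusions, suggestions, expectations

def applicable_rules(rules: List[VMRule], therapy_state: str) -> List[VMRule]:
    """Focus the interpreter on the rules that make sense in the current state."""
    return [r for r in rules if therapy_state in r.applies_to]

# Illustrative instance patterned on the STABLE-HEMODYNAMICS rule shown above.
stable_hemodynamics = VMRule(
    name="STATUS.STABLE-HEMODYNAMICS",
    group="STATUS-RULE",
    definition="Defines stable hemodynamics based on blood pressures and heart rate",
    applies_to=["VOLUME", "CMV", "ASSIST", "T-PIECE"],
    premise=lambda data: data.get("HEART RATE") == "ACCEPTABLE",   # simplified premise
    action=lambda data: data.update({"HEMODYNAMICS": "STABLE"}),
)
print(applicable_rules([stable_hemodynamics], "ASSIST")[0].name)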
Each reasoning step is associated with a collection of rules, and each rule
is classified by the type of conclusions made in its action portion; e.g., all
rules that determine the validity of the data are classed together.
Most of the rules represent the measurement values symbolically, using the
terms ACCEPTABLE or IDEAL to characterize the appropriate ranges.
The actual meaning of ACCEPTABLE changes as the patient moves from
state to state, but the statement of the relation between physiological mea-
surements remains constant. For example, the rule shown in Figure 22-5
checks to see if the patient's heart rate is ACCEPTABLE. In the different
clinical states, or stages of mechanical assistance, the definition of AC-
CEPTABLE changes. Immediately after cardiac surgery a patients heart
rate is not expected to be in the same range as it is when he or she is moved
out of the ICU. Mentioning the symbolic value ACCEPTABLE in a rule,
rather than the state-dependent numerical range, thus reduces the number
of rules needed to describe the diagnostic situation.
The meaning of the symbolic range is determined by other rules that
establish expectations about the values of measured data. For example,
when a patient is taken off the ventilator, the upper limit of acceptability
for the expired carbon dioxide measurement is raised. (Physiologically, the
patient will not be able to exhale all the CO2 produced by his or her system,
and so CO2 will accumulate.) The actual numeric calculation of EXPIRED
pCO2 HIGH in the premise of any rule will change when the context
switches (removal from ventilatory support), but the statement of the rules
remains the same. A sample rule that creates these expectations is shown
in Figure 22-6.
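Below is a hedged sketch, in Python rather than the original system, of this
state-dependent interpretation of a symbolic range: the rule text keeps the word
ACCEPTABLE, while the numeric bounds behind it are reset when the therapy context
changes. The numbers and table structure are placeholders, not VM's clinical limits.

# Illustrative state-dependent bounds: therapy state -> {measurement: (low, high)}
ACCEPTABLE_BOUNDS = {
    "CMV":     {"EXPIRED pCO2": (28.0, 42.0), "HEART RATE": (60, 110)},
    "T-PIECE": {"EXPIRED pCO2": (30.0, 50.0), "HEART RATE": (60, 120)},
}

def is_acceptable(state: str, measurement: str, value: float) -> bool:
    """Interpret the symbolic range ACCEPTABLE relative to the current context."""
    low, high = ACCEPTABLE_BOUNDS[state][measurement]
    return low <= value <= high

# The premise wording stays constant, but its numeric meaning changes when the
# patient moves from CMV to the T-piece:
print(is_acceptable("CMV", "EXPIRED pCO2", 45.0))      # False under the CMV limits
print(is_acceptable("T-PIECE", "EXPIRED pCO2", 45.0))  # True under the T-piece limits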
Therapy rules can be divided into two classes: the long-term therapy as-
sessment (e.g., when to put the patient on the T-piece), and the determi-
nation of response to a clinical problem, such as hyperventilation or hy-
pertension. The two rules shown in Figure 22-7, for selecting T-piece
therapy and for responding to a hyperventilation problem, demonstrate
several key factors in the design of the rule base:
INITIALIZING RULE: INITIALIZE-CMV
DEFINITION: Initialize expectations for patients on controlled mandatory
    ventilation (CMV) therapy
APPLIES TO: all patients on CMV
IF ONE OF:
    PATIENT TRANSITIONED FROM VOLUME TO CMV
    PATIENT TRANSITIONED FROM ASSIST TO CMV
THEN EXPECT THE FOLLOWING
THERAPY-RULE: THERAPY.A-T
DEFINITION: Defines readiness to transition from ASSIST mode to T-PIECE
COMMENT: If patient has stable hemodynamics, ventilation is acceptable, and
    patient has been awake and alert enough to interact with the ventilator
    for a period of time, then transition to T-piece is indicated.
APPLIES TO: ASSIST
IF
    HEMODYNAMICS ARE STABLE
    HYPOVENTILATION NOT PRESENT
    RESPIRATION RATE ACCEPTABLE
    PATIENT IN ASSIST FOR > 30 MINUTES
THEN
    THE GOAL IS FOR THE PATIENT TO BE ON THE T-PIECE
    SUGGEST CONSIDER PLACING PATIENT ON T-PIECE IF
        PaO2 > 70 on FIO2 <= 0.4
        PATIENT AWAKE AND TRIGGERING VENTILATOR
        ECG IS STABLE
THERAPY-RULE: THERAPY.VENTILATOR-ADJUSTMENT-FOR-HYPERVENTILATION
DEFINITION: Manage hyperventilation
APPLIES TO: VOLUME, ASSIST, CMV
IF
    HYPERVENTILATION PRESENT for > 8 minutes
        [COMMENT: wait a short while to see if hyperventilation persists]
    VO2 not low
THEN
    SUGGEST PATIENT HYPERVENTILATING.
    SUGGEST REDUCING EFFECTIVE ALVEOLAR VENTILATION.
    TO REDUCE ALVEOLAR VENTILATION, REDUCE TV BY 15%, REDUCE RR, OR
        INCREASE DISTAL DEAD SPACE TUBING VOLUME
FIGURE 22-8 Therapy state graph.
³Hysteresis is "a lag of effect when the forces acting on a body are changed" (Webster's New
World Dictionary, 1976).
The second model for representing therapeutic goals requires that the
appropriate goal be asserted each time the rule set is evaluated. If no
therapy rules succeed in setting a new goal, the goal is asserted to be the
current therapy. This scheme ignores the apparent practices of clinicians,
but represents a more "conservative" approach that is consistent with the
rule-writing strategy used by our experts. This model is potentially sensi-
tive to minor perturbations in the patient measurements, but such sensi-
tivity implies that a borderline therapy decision was originally made.
22.3 Details of VM
22.3.1 Parameters
Constant
    Examples: surgery type, sex
    Input: once
    Reliability: value is good until replaced
Continuous
    Examples: heart rate, blood pressure
    Input: at regular intervals (6-20 times/hour)
    Reliability: presumed good unless input data are missing or artifactual
Volunteered
    Examples: temperature, blood gases
    Input: at irregular intervals (2-10 times/day)
    Reliability: good for a period of time, possibly a function of the current
        situation
Deduced
    Examples: hyperventilation, hemodynamic status
    Input: calculated whenever new data are available
    Reliability: a function of the reliability of each of the component
        parameters
RR
    DEFINITION: (RESPIRATION RATE)
    USED-IN: (TRANSITION.V-CMV TRANSITION.V-A TRANSITION.A-CMV
        TRANSITION.CMV-A STATUS.BREATHING-EFFORT/T THERAPY.A-CMV
        THERAPY.A-T THERAPY.T-V ABNORMAL-ECO2)
    EXPECTED-IN: (INITIALIZE.V INITIALIZE.CMV INITIALIZE.V-RETURN
        INITIALIZE.A INITIALIZE.T-PIECE)
    GOOD-FOR: 15 [information is good for 15 minutes]
    UPDATED-AT: 82 [last updated at 82 minutes after start]
    LOW: ((72 . 82) (52 . 59)) [concluded to be LOW from 52-59 minutes and
        72-82 minutes after start]

HEMODYNAMICS
    DEFINITION: (HEMODYNAMICS)
    CONCLUDED-IN: (STATUS.STABLE-HEMODYN/V,A,CMV)
    USED-IN: (THERAPY.CMV-A THERAPY.A-T THERAPY.T-PIECE-TO-EXTUBATE)
    GOOD-FOR: NIL [this is a derived parameter so reliability is based on
        other parameters]
    UPDATED-AT: 110 [last updated at 110 minutes after start]
    STABLE: ((99 . 110) (82 . 82) (2 . 8))
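The sketch below (illustrative Python, with invented field names patterned on the
listing above) shows one way such parameter records and their reliability check
might be represented; it is a hedged approximation, not VM's actual data structure.

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class VMParameter:
    name: str
    used_in: List[str] = field(default_factory=list)        # rules that read this parameter
    good_for: Optional[int] = None                            # minutes a value stays reliable; None => derived
    updated_at: Optional[int] = None                          # minutes after start of the run
    history: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)  # symbolic value -> intervals

    def is_reliable(self, now: int) -> bool:
        """Derived parameters defer to their components; measured ones expire."""
        if self.good_for is None or self.updated_at is None:
            return True
        return now - self.updated_at <= self.good_for

rr = VMParameter(
    name="RR",
    used_in=["THERAPY.A-T", "STATUS.BREATHING-EFFORT/T"],
    good_for=15,
    updated_at=82,
    history={"LOW": [(52, 59), (72, 82)]},    # concluded LOW over these intervals
)
print(rr.is_reliable(now=90))    # True: 90 - 82 is within the 15-minute window
print(rr.is_reliable(now=100))   # False: the reliability window has lapsed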
22.3.2 Measurements
⁴Clinicians can select the default sample rate: fast (2 minutes) or slow (10 minutes). An extra
data sample can be taken immediately on request.
Throwing away old measurements does not limit the ability of the program
to utilize historical data. The conclusions based on the original data, which
are stored much more compactly, are maintained throughout the patient
run. Thus the numerical measurement values are replaced by symbolic
abstractions over time.
One current limitation is the program's inability to reevaluate past
conclusions, especially when measurements are taken but are not reported
until some time later. One example of this is the interpretation of blood
gas measurements. It takes about 20-30 minutes for the laboratory to
process blood gas samples, but by that time the context may have changed.
The program cannot then back up to the time that the blood gases were
taken and proceed forward in time, reevaluating the intervening measure-
ments in light of the new data. The resolution of conflicts between expec-
tations and actual events may also require modification of old conclusions.
This is especially true when forthcoming events are used to imply that an
alternative cause provides a better explanation of observed phenomena.
22.3.3 Rules
When a rule succeeds, the action part of the rule is activated. The
action portion of each rule is divided into three sections: conclusions (or
interpretations), suggestions, and expectations. The only requirement is
that at least one statement (of any of the three types) is made in the action
part of the rule. The first section of the action of the rule is composed of
the conclusions that can be drawn from the premise of the rule. These
conclusions (in the form of a parameter assuming a value) are asserted by
the program to exist at the current time and are stored away for producing
summaries and to provide new facts for use by other rules. When the same
conclusion is also asserted in the most recent time when data are available
to the program, then the new conclusion is considered a continuation of the
old one. The time interval associated with the conclusion is then extended
to include the current time. This extension presumes that the time period
between successive conclusions is short enough that continuity can be
asserted.
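A minimal sketch of this continuation bookkeeping, under the assumption of a fixed
gap tolerance standing in for VM's sampling interval, might look as follows; the
function name and numbers are illustrative only.

from typing import List, Tuple

def assert_conclusion(intervals: List[Tuple[int, int]], now: int,
                      max_gap: int = 10) -> List[Tuple[int, int]]:
    """Extend the latest interval if the conclusion is continuous, else start a new one."""
    if intervals and now - intervals[-1][1] <= max_gap:
        start, _ = intervals[-1]
        intervals[-1] = (start, now)          # continuation of the old conclusion
    else:
        intervals.append((now, now))          # a fresh conclusion interval
    return intervals

stable = [(2, 8)]
for t in (12, 16, 40):                        # samples at 12 and 16, then a long gap
    stable = assert_conclusion(stable, t)
print(stable)   # [(2, 16), (40, 40)] -- the gap before t=40 starts a new interval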
The second section of the action is a list of suggestions that are printed
for the clinician. Each suggestion is a text string to be printed that sum-
marizes the conclusions made by the rule.⁵ Often this list of suggestions
includes additional factors to check that cannot be verified by the pro-
gram--e.g., the alertness of the patient. By presenting the suggestions as
conditional statements, the need to interact with the user to determine the
current situation is minimized. The disadvantage of this method is that the
program maintains a more nebulous view of the patient's situation, unless
it can be ascertained later that one of the suggestions was carried out.
The last section of the action part of the rule is the generation of new
expectations about the ranges of measurements for the future. Expecta-
tions are created to help the program interpret future data. For example,
when a patient is first moved from assist mode to the T-piece, many pa-
rameters can be expected to change drastically because of the stress as well
as the altered mode of breathing. When the measurements are taken, then,
the program is able to interpret them correctly. New upper and lower
bounds are defined for the acceptable range of values for heart rate, for
example, for the duration of time specified. The duration might be spec-
ified in minutes or in terms of a context (e.g., "while the patient is on the
T-piece").
MYCIN does not place any constraints on the types of conclusions
made in the action part of the rule, although most rules use the CON-
CLUDE function in their right-hand sides. For example, MYCIN calls a
program to compute the optimal therapy as an action part of a rule (Chap-
ter 6). The basic motivation behind imposing some structure on rules was
to act as a mnemonic device during rule acquisition. The same advantage
is found in framelike systems with explicit component names--e.g.,
CAUSED-BY, MUST-HAVE, and TRIGGERS in the Present Illness Pro-
gram (Szolovits and Pauker, 1978).
STATUS RULE: STATUS.STABLE-HEMODYNAMICS
DEFINITION: Defines stable hemodynamics based on blood pressures and heart rate
APPLIES TO: patients on VOLUME, CMV, ASSIST, T-PIECE
COMMENT: Look at mean arterial pressure for changes in blood pressure and
    systolic blood pressure for maximum pressures.
IF
    HEART RATE is ACCEPTABLE
    PULSE RATE does NOT CHANGE by 20 beats/minute in 15 minutes
    MEAN ARTERIAL PRESSURE is ACCEPTABLE
    MEAN ARTERIAL PRESSURE does NOT CHANGE by 15 torr in 15 minutes
    SYSTOLIC BLOOD PRESSURE is ACCEPTABLE
THEN
    The HEMODYNAMICS are STABLE

RULEGROUP: STATUS-RULE
DEFINITION: ((DEFINES STABLE HEMODYNAMICS BASED)
    (ON BLOOD PRESSURES AND HEART RATE))
COMMENT: ((LOOK AT MEAN ARTERIAL PRESSURE FOR)
    (CHANGES IN BLOOD PRESSURE AND SYSTOLIC)
    (BLOOD PRESSURE FOR MAXIMUM PRESSURES))
NODE: (VOLUME CMV ASSIST T-PIECE)
EVAL: (ALL OF)
ORIGLHS: ((HEART RATE IS ACCEPTABLE)
    (PULSE RATE DOES NOT CHANGE BY 20 BEATS/MINUTE IN 15 MINUTES)
    (MEAN ARTERIAL PRESSURE IS ACCEPTABLE)
    (MEAN ARTERIAL PRESSURE DOES NOT CHANGE BY 15 TORR IN 15 MINUTES)
    (SYSTOLIC BLOOD PRESSURE IS ACCEPTABLE))
FILE LOCATION: (<puffNM>VM.RULES;18 12538 13143)
M: ((MSIMP HR ACCEPTABLE NIL)
    (FLUCT PR CHANGE 20 (0.0 15) NOT)
    (MSIMP MAP ACCEPTABLE NIL)
    (FLUCT MAP CHANGE 15 (0.0 15) NOT)
    (MSIMP SYS ACCEPTABLE NIL))
I: ((INTERP HEMODYNAMICS = STABLE NIL))
22.3.6 Uncertainty in VM
related strongly with existing premise clauses--e.g., using both mean and
systolic blood pressures. The choice of measurement ranges in several ther-
apy rules also took into account the element of uncertainty. Although the
experts wanted four or five parameters within the IDEAL limit prior to
suggesting the transition to the next optimal therapy state, they often used
the ACCEPTABLE limits. In fact, it would be unlikely that all measure-
ments would simultaneously fall into the IDEAL range. Therefore, incorpo-
rating these "grey areas" into the definition of the symbolic ranges was
appropriate. There are at least two possible explanations for the lack of
certainty factors in the VM rule base: (1) on the wards, it is only worthwhile
to make an inference if one strongly believes and can support the conclu-
sion; and (2) the measurements available from the monitoring system were
chosen because of their high correlation with patients' conditions.
As mentioned elsewhere, the PUFF and SACON systems also did not
use the certainty factor mechanism. The main goal of these systems was
to classify or categorize a small number of conclusions as opposed to mak-
ing fine distinctions between competing hypotheses. This view of uncer-
tainty is consistent with the intuitions of other researchers in the field
(Szolovits and Pauker, 1978, p. 142):
Symbolic value        Interpretation
IDEAL                 The desired level or range of a measurement
ACCEPTABLE            The limits of acceptable values beyond which corrective
                      action is necessary--bounds are high and low (similar for rate)
VERY UNACCEPTABLE     Limit at which data are extremely out of range--e.g., on
                      which the definition of severe hypotension is based
IMPOSSIBLE            Outside the limits that are physiologically possible
setting specific limits are minimized by the practice of using multiple mea-
surements in coming to specific conclusions. One alternative to using sym-
bolic ranges would be to express values as a percentage of some predefined
norm. This has the same problems as discrete numeric values, however,
when the percentage is used to draw conclusions. When it was important
clinically to differentiate how much an expectation was exceeded, the no-
tion of alternate ranges (e.g., VERY HIGH) was utilized. For the physio-
logical parameters, several types of bounds on expectations have been es-
tablished, as shown in Figure 22-13. In VM these limits are not static; they
are adapted to the patient situation. Currently, the majority of the expec-
tation changes are associated with changes in ventilator support. These
expectations are established on recognition of the changes in therapy and
remain in effect until another therapy context is recognized. A more global
type of expectation can be specified that persists for the entire time patient
data are collected. A third type of expectation corresponds to a per-
turbation, or local disturbance in the patient's situation. An example of
this is the suctioning maneuver where a vacuum device is put in the pa-
tient's airway. This disturbance has a characteristic effect on almost every
measurement but only persists for a short period of time, usually 10-15
minutes. After this time, the patient's state is similar to what it was in the
period just preceding the suction maneuver. It is possible to build a hier-
archy out of these expectation types based on their expected duration; i.e.,
assume the global expectation unless a specific contextual expectation is
set, provided a local perturbation has not recently taken place.
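One way to picture this resolution order is the hedged sketch below; the bounds,
durations, and function name are assumptions for illustration, not the limits VM
actually used.

from typing import Optional, Tuple

Bounds = Tuple[float, float]

def current_expectation(global_bounds: Bounds,
                        context_bounds: Optional[Bounds],
                        perturbation_bounds: Optional[Bounds],
                        perturbation_started: Optional[int],
                        now: int,
                        perturbation_lasts: int = 15) -> Bounds:
    """Resolve which expectation applies at time `now` (minutes after start)."""
    if (perturbation_bounds is not None and perturbation_started is not None
            and now - perturbation_started <= perturbation_lasts):
        return perturbation_bounds            # local disturbance still in effect
    if context_bounds is not None:
        return context_bounds                 # e.g., limits set on moving to the T-piece
    return global_bounds                      # patient-wide default

# Heart-rate bounds shortly after a suctioning maneuver at t = 100 minutes:
print(current_expectation((60, 110), (60, 120), (60, 140), 100, now=108))  # (60, 140)
print(current_expectation((60, 110), (60, 120), (60, 140), 100, now=130))  # (60, 120)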
Knowledge about the patient could be used to "customize" the expec-
tation limits for the individual patient. The first possibility is the use of
historical information to establish a priori expectations based on type of
surgery, age, length of time on the heart/lung machine, and presurgical
pulmonary and hemodynamic status. The second type of customization
could be based on the observation that patient measurements tend to run
within tighter bands than the a priori expectations. The third type of ex-
pectation based on transient events can be used to adjust for the effects of
in such situations based on the length of time a patient has been in a given
state and on the patients previous therapy or therapies.
The VM program has been used as a test-bed to investigate methods
for increasing the capabilities of symbolic processing approaches by ex-
tending the production rule methodology. The main area of investigation
has been in the representation of knowledge about dynamic clinical set-
tings. There are two components of representing a situation that changes
over time: (1) providing the mechanism for accessing and evaluating data
in each new time frame, and (2) building a symbolic model to represent
the ongoing processes and transitions in the medical environment.
23
A Representation Scheme
Using Both Frames and
Rules
Janice S. Aikins
FIGURE 23-1 A portion of the prototype network (prototypes include PUFF;
NORMAL, RESTRICTIVE LUNG DISEASE, OBSTRUCTIVE AIRWAYS DISEASE, DIFFUSION
DEFECT, and NEUROMUSCULAR DISEASE; the MILD, MODERATE, MODERATELY SEVERE,
and SEVERE degrees of OAD; and the subtypes ASTHMA, BRONCHITIS, and
EMPHYSEMA).
totype. Associated with each component are rules used to deduce a value
for the component. The prototypes focus the search for new information
by guiding the invocation of the rules and eliciting the most relevant in-
formation from the user. These prototypes are linked together in a net-
work in which the links specify the relationships between the prototypes.
For example, the obstructive airways disease prototype is linked to the
asthma prototype with a SUBTYPE link, because asthma is a subtype of
obstructive airways disease (see Figure 23-1).
This chapter discusses the problems of a purely rule-based system and
the advantages afforded by using a combination of rules and frames in
the prototype-directed system. A complementary piece of research (Aikins,
1979), not discussed here, deals with the problems of a frame-based system.
Previous research efforts have discussed systems using frames [see, for
example, Minsky (1975) and Pauker and Szolovits (1977)] and systems us-
ing a pure rule-based approach to representation (Chapter 2). Still other
systems have used alternate knowledge representations to perform large
knowledge-based problem-solving tasks. For example, INTERNIST (Po-
ple, 1977) represents its knowledge using a framelike association of diseases
with manifestations. Each manifestation, in turn, is associated with the list
of diseases in which the manifestation is known to occur. In PROSPECTOR
(Duda et al., 1978a), the framelike data structures have been replaced by
a semantic network. Few researchers, however, have used both frames and
a semantic network. Few researchers, however, have used both frames and
production rules or have attempted to draw comparisons between these
knowledge representation methodologies. CENTAUR offers an appropri-
ate mechanism with which to experiment with these representation issues.
This paper presents an example of the CENTAUR system performing
an interpretation of a set of pulmonary function test results and focuses
on CENTAUR's knowledge representation and control structure. In ad-
dition, some advantages of the prototype-directed system over the rule-
based approach for this problem are suggested.
23.1 The CENTAUR System
totype, and the process repeats. The system moves through the prototype
network confirming or disproving disease prototypes. The attempt to
match data and prototypes continues until each datum has been explained
by some confirmed prototype or until the system has concluded that it
cannot account for any more of the data. A portion of the prototype net-
work for the pulmonary function application is given in Figure 23-1. De-
tails of the knowledge representation and control structure for the CEN-
TAUR system are given in Section 23.2 and Section 23.3.
Figure 23-2 is an example of an interpretation of a set of pulmonary
function test results for one patient. Comments are in italics. Many addi-
tional lines of trace are printed to show what CENTAUR is doing between
questions.

CENTAUR        14-Jan-79 13:54:07

CURRENT PROTOTYPE: PUFF
The current hypothesis is that an interpretation of the pulmonary function tests is desired.
[Control slot of PUFF prototype being executed ...]
-------- PATIENT-7446 --------
(The initial data given by the user.)
1) Patient's identifying number:
** 9007
2) referral diagnosis:
** ASTHMA
[Trigger for ASTHMA and CM 900]
(Prototype ASTHMA is triggered by the value ASTHMA for the referral diagnosis. The certainty
measure (CM) indicates on a numerical scale the degree of certainty with which the prototype
is indicated by the data.)
3) RV/RV-predicted:
** 261
4) TLC (body box) observed/predicted:
** 139
5) FVC/FVC-predicted:
** 81
[Trigger for NORMAL and CM 500]
(The questioning continues and other prototypes are triggered by the data values.)
6) FEV1/FVC ratio:
** 40
7) the DLCO/DLCO-predicted:
** 117
[Trigger for NORMAL and CM 700]
8) Change in FEV1 post-dilation - pre-dilation:
** 31
9) MMF/MMF-predicted:
** 12
[Trigger for OAD and CM 900]
10) The slope (F50-obs - F25-obs)/FVC-obs:
** 9
[Trigger for OAD and CM 900]
TRIGGERED PROTOTYPES
PROTOTYPE: ASTHMA, CM: 900, REASON: RDX was ASTHMA
PROTOTYPE: NORMAL, CM: 500, REASON: FVC was 81
PROTOTYPE: NORMAL, CM: 700, REASON: DLCO was 117
PROTOTYPE: OAD, CM: 900, REASON: MMF was 12
PROTOTYPE: OAD, CM: 900, REASON: F5025 was 9
(A list of the prototypes that have been triggered is given. The CM and the value that caused
the trigger are also listed.)
-------------------------------------
Confirmed List: PUFF
It is confirmed that an interpretation of the pulmonary function tests is desired.
CURRENT PROTOTYPE: OAD
The current hypothesis is that there is an interpretation of Obstructive Airways Disease.
Components of OAD chosen to trace: F25D-RV/TLC
Confirmed List: OAD PUFF
It is confirmed that there is an interpretation of Obstructive Airways Disease.
(The OAD prototype is confirmed. Control information associated with the prototype specifies
that the degree of OAD should be determined next, followed by the subtype of OAD.)
CURRENT PROTOTYPE: SEVERE-OAD
Components of SEVERE-OAD chosen to trace: FEV1
13) FEV1
** 42
(The consultation continues with new components being asked and classified.)
Confirmed List: SEVERE-OAD OAD PUFF
It is confirmed that there is an interpretation of Severe Obstructive Airways Disease.
More Specific Prototypes chosen: ASTHMA
Hypothesis List: (ASTHMA 900)
CURRENT PROTOTYPE: ASTHMA
The current hypothesis is that there is an interpretation of Asthma.
Components of ASTHMA chosen to trace: DEG-REV
14) The change in resistance pre-dilation - post-dilation:
** 20
It is confirmed that there is an interpretation of Asthma.
OAD Action Clause
Display the findings about the diagnosis of obstructive airways disease
Conclusions: the findings about the diagnosis of obstructive airways disease are as follows:
    Elevated lung volumes indicate overinflation.
    The RV/TLC ratio is increased, suggesting a SEVERE degree of air trapping.
    Forced Vital Capacity is normal but the FEV1/FVC ratio is reduced, suggesting airway
        obstruction of a SEVERE degree.
    Low mid-expiratory flow is consistent with severe airway obstruction.
    Obstruction is indicated by curvature of the flow-volume loop which is of SEVERE degree.
    Reversibility of airway obstruction is confirmed by improvement in airway resistance
        following bronchodilation.
PUFF Action Clause
Display the conclusion statements about this interpretation
Conclusions: the conclusion statements about this interpretation are as follows:
    Smoking probably exacerbates the severity of the patient's airway obstruction.
    Discontinuation of smoking should help relieve the symptoms.
    Good response to bronchodilators is consistent with an asthmatic condition, and their
        continued use is indicated.
PUFF Action Clause
Display the summary statements about this interpretation
Conclusions: the summary statements about this interpretation are as follows:
    SEVERE Obstructive Airways Disease, Subtype ASTHMA
Do you wish advice on another patient?
** NO
[Figure: schematic of a prototype -- slots holding domain-independent
information, plus components, each with slots holding domain-specific
information.]
¹As in MYCIN, the rule is stored internally in the Interlisp form shown; the English trans-
lation is generated from that.
RULE013
[This rule applies to any patient, and is tried in order to find out about the degree of
obstructive airways disease as indicated by the MMF or the findings about the diagnosis of
obstructive airways disease.]
OAD prototype, there are components for many of the pulmonary func-
tion tests that are useful in characterizing a patient with OAD; two of these
are shown in the figure. For example, the total lung capacity of a patient
with OAD is typically higher than that of a person with normal pulmonary
function. Thus there is a component, TOTAL LUNG CAPACITY, with a
range of plausible values that are characteristic of a person with OAD.
In addition to a set of plausible values, that is, values consistent with
the hypothesis represented by the prototype, the components may have
additional information associated with them. (The ways in which this in-
formation is used are discussed in Section 23.3.) There may be one or
more possible error values, that is, values that are inconsistent with the pro-
totype or that might have been specified by the expert to check what he
or she considers to be a measurement error. Generally, both a reason for
the error and a possible fix for the error are specified. For example, the
expert may specify that one of the pulmonary function tests be repeated
to ensure accuracy. A component may also have a default value. Thus all of
the components in a disease prototype, with their default values, form a
picture of the typical patient with the disease. Finally, each component has
an importance measure (from 0 to 5) that indicates the relative importance
of a particular component in characterizing the disease.
--Bookkeeping information          Author: Aikins
                                   Date: 27-OCT-78
                                   Source: Dr. Fallat
--Pointers to other prototypes     Pointers: (degree MILD-OAD)
  (link prototype)                           (degree MODERATE-OAD) ...
                                             (subtype ASTHMA) ...
--English phrases                  Hypothesis: "There is an interpretation of OAD."

COMPONENTS                         TOTAL LUNG CAPACITY
  Plausible Values                   Plausible Values: >100
  Default Value                      Importance: 4
  Possible Error Values
  Rules                            REVERSIBILITY
  Importance of value                Rules: 19, 21, 22, 25
    to this prototype                Importance: 0 (value not considered)

In addition to the domain-specific components, each prototype con-
tains slots for general information associated with it. This includes book-
keeping information (name of the prototype, its author, date on which the
prototype was created, and source for the information contained there)
and English phrases used in communicating with the user. There are also
pointers to other prototypes in the prototype network, which are useful,
for example, when either more general disease categories or more specific
subtypes of disease are indicated. Some control information is represented
explicitly in slots associated with the prototype (Section 23.3). This infor-
mation includes what to do in order to confirm the prototype and what to
do when the prototype has been confirmed or disproved. Each prototype
also has associated with it a certainty measure (from -1000 to 1000) that
indicates how certain the system is that the prototype matches the data in
each case.
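As an illustration only (Python rather than Aikins' Interlisp, with invented class
and field names), a prototype and its components might be rendered along the
following lines; this is a sketch of the structure just described, not CENTAUR's
implementation.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Component:
    name: str
    plausible: Callable[[float], bool]       # e.g., lambda v: v > 100
    importance: int = 0                      # 0 to 5
    rules: List[int] = field(default_factory=list)

@dataclass
class Prototype:
    name: str
    author: str
    source: str
    hypothesis: str                          # English phrase used with the user
    components: List[Component] = field(default_factory=list)
    pointers: List[Tuple[str, str]] = field(default_factory=list)   # (link type, prototype)
    certainty: int = 0                       # certainty measure, -1000 to 1000

oad = Prototype(
    name="OAD",
    author="Aikins",
    source="Dr. Fallat",
    hypothesis="There is an interpretation of OAD.",
    components=[Component("TOTAL LUNG CAPACITY", lambda v: v > 100, importance=4)],
    pointers=[("degree", "SEVERE-OAD"), ("subtype", "ASTHMA")],
)
print(oad.components[0].plausible(126))   # True: 126 is a plausible TLC value for OAD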
23.2.2 Rules
The CENTAUR knowledge base also includes rules, which are grouped
into four sets according to their functions. They refer to values for com-
ponents in their premise clauses and make conclusions about values of
23.2.3 Facts
The fourth field associated with the fact indicates where it was ob-
tained: from the user (this includes the initial pulmonary function test
results), from the rules, or as a default value associated with a prototype
component. Thus, in the fact about total lung capacity, the fourth field
would have the value USER.
The fifth field of each fact becomes instantiated once fact values are
classified as being plausible values, possible error values, or surprise values
fi)r a given prototype. Surprise values are all of those values that are neither
plausible values nor possible error values. They indicate facts that cannot
be accounted for by the hypothesis represented by the prototype. In the
fact about total lung capacity, the fifth field might contain the classification
(PV OAD) and (SV NORMAL), meaning that the value of 126 for the total
lung capacity of a patient would be a plausible value if the patient had
obstructive airways disease, but would be a surprise value if the patient
were considered to have normal pulmonary function.
The last field associated with a fact indicates which confirmed proto-
types can account for the given value. When a prototype is confirmed, all
of the facts that correspond to components in the prototype and whose
values are plausible values for the component are said to be "accounted
for" by that prototype. When the OAD prototype is confirmed, for a patient
with total lung capacity of 126, for example, the last field of the sample
fact for total lung capacity would be filled in with the prototype name
OAD.
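A hedged sketch of this classification, modeling only the fields discussed here (the
source of the value, its per-prototype classification, and the prototypes that account
for it), might look like this; the remaining fields of a CENTAUR fact are not shown,
and the names are illustrative.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Fact:
    component: str
    value: float
    source: str                                   # USER, RULES, or DEFAULT
    classification: Dict[str, str] = field(default_factory=dict)  # prototype -> PV/EV/SV
    accounted_for_by: List[str] = field(default_factory=list)

def classify(fact: Fact, prototype: str, plausible: bool, possible_error: bool) -> None:
    """Mark the fact as a plausible (PV), possible error (EV), or surprise (SV) value."""
    fact.classification[prototype] = "PV" if plausible else ("EV" if possible_error else "SV")

tlc = Fact(component="TOTAL LUNG CAPACITY", value=126, source="USER")
classify(tlc, "OAD", plausible=True, possible_error=False)       # consistent with OAD
classify(tlc, "NORMAL", plausible=False, possible_error=False)   # a surprise for NORMAL
tlc.accounted_for_by.append("OAD")    # filled in once the OAD prototype is confirmed
print(tlc.classification)             # {'OAD': 'PV', 'NORMAL': 'SV'}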
⁷This statement oversimplifies the actual matching criteria used by the system. Some tolerance
for a mismatch between known fact values and plausible values in the prototype is allowed.
this stage. For example, further lab tests may be suggested or additional
test results may be required before a final diagnosis is given.
The result of executing the refinement rules is a final set of confirmed
prototypes and a list of all facts with an indication of which prototypes
account for which facts. The system then executes the clauses specified in
the action slot of each confirmed prototype. Typically, these clauses express
a clean-up chore such as executing summary rules associated with the
prototype⁸ or printing interpretation statements. The action slot of the
PUFF prototype itself causes the final interpretation and pulmonary di-
agnosis to be printed.
Four of the slots associated with a prototype contain clauses that are exe-
cuted by the system at specific times to control the consultation. Each clause
expresses some action to be taken by the system at different stages: (a) in
order to instantiate the prototype (CONTROL slot), (b) upon confirmation
of the prototype (IF-CONFIRMED slot), (c) in the event that a prototype
is disproved (IF-DISPROVED slot), and (d) in a clean-up phase after
system processing has been completed (ACTION slot).
When a prototype is first selected as the current prototype, the system
executes the clauses in the CONTROL slot of that prototype. The infor-
mation in this slot indicates how to proceed in order to instantiate the
prototype, usually specifying what data should be acquired and in what
order they should be acquired. Therefore, executing these clauses will
cause values to be obtained for the prototype components. The CONTROL
slot can be thought of as a rule whose implicit premise is "if this prototype
is selected as the current prototype" and whose action is the given set of
clauses. If no CONTROL slot is associated with a prototype, the interpreter
will attempt to fill in values for the prototype components in order ac-
cording to their importance measures.
When all of the clauses in the CONTROL slot have been executed and
the prototype has been instantiated, a decision is made as to whether the
prototype should be confirmed as matching the facts of the case.⁹ The
system then checks either the IF-CONFIRMED slot or the IF-DISPROVED
slot to determine what should be done next. These slots can be viewed as
rules whose implicit premise is either "if this prototype is confirmed as
matching the data" or "if this prototype is proved not to match the data."
The appropriate actions are then indicated in the set of clauses contained
in the slot.
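The control cycle just described might be sketched as follows; this is an
assumption-laden simplification in Python, and the confirmation test passed in
stands in for CENTAUR's general component-checking algorithm.

from typing import Callable, Dict, List

Clause = Callable[[Dict], None]

def run_prototype(proto: Dict[str, List[Clause]],
                  case: Dict,
                  confirmed_test: Callable[[Dict], bool]) -> bool:
    for clause in proto.get("CONTROL", []):         # acquire values for the components
        clause(case)
    confirmed = confirmed_test(case)
    followups = proto["IF-CONFIRMED"] if confirmed else proto["IF-DISPROVED"]
    for clause in followups:                         # e.g., pursue degree, then subtype
        clause(case)
    return confirmed

# Illustrative prototype whose CONTROL slot just fills in one datum.
oad_proto = {
    "CONTROL": [lambda case: case.setdefault("MMF", 12)],
    "IF-CONFIRMED": [lambda case: case.setdefault("next", ["DEGREE", "SUBTYPE"])],
    "IF-DISPROVED": [lambda case: None],
}
case: Dict = {}
print(run_prototype(oad_proto, case, confirmed_test=lambda c: c["MMF"] < 80))  # True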
⁸Recall that the premise of a summary rule typically checks the values for one or more
parameters and that the action generates an appropriate summarizing statement.
⁹It would be possible to associate such a confirmation criterion with each individual prototype,
but this has not been found to be necessary for the pulmonary diagnosis problem. Instead,
the system uses a general algorithm, applicable to all of the prototypes, that checks the values
of the components and their importance measures to determine if the prototype should be
marked as confirmed.
23.4 Advantages of the Prototype-Directed Approach
One question addressed by this research is this: in what ways are both
frames and rules superior to either alone? Comparisons can be drawn
between purely rule-based systems, such as PUFF, at one end of the spec-
trum and purely frame-based systems at the other. This section states some
of the advantages of the prototype-directed approach used in CENTAUR
for the pulmonary function interpretation task, as compared to the purely
rule-based approach used in PUFF. The next chapter discusses a purely
frame-based approach to the same problem. These advantages can be
grouped into two broad categories: those dealing with knowledge base
representation, and those dealing with reasoning and performance.
This rule expresses some of the control structure of the system, namely,
that when there is an interpretation of OAD, then the degree, subtype,
and findings associated with the OAD should be determined. The rule is
confusing because it implies that finding out the degree, subtype, and find-
ings leads to an interpretation of OAD--which might be misinterpreted as
medical expertise. In fact, this rule is executed for every case and causes
all of the other OAD rules to be invoked, even when no OAD is present.
In CENTAUR, rules that guide computation have been removed from
the rule base, leaving a less confusing, more uniform rule base, where each
rule represents some "chunk" of medical expertise. Computation is now
guided by the prototypes. For example, the CONTROL slot represents
information dealing with how to instantiate the prototype. For the OAD
prototype, this CONTROL slot specifies that deducing the degree, subtype,
and findings of obstructive airways disease are the steps to take in instan-
tiating that prototype.
A second category of advantages deals with the way the system reasons
about the problem. This is evident in part by watching the performance
of the system, that is, the questions that are asked and the order in which
information is acquired. Some of the advantages of a prototype-directed
system are the following:
23.5 Summary
CENTAUR was designed in response to problems that occurred while us-
ing a purely rule-based system. The CENTAUR system offers an appro-
priate environment in which to experiment with knowledge representation
issues such as determining what knowledge is most easily represented in
rules and what is most easily represented in frames. In summary, much
research remains to be done on this and associated knowledge represen-
tation issues. This present research is one attempt to make explicit the art
of choosing the knowledge representation in AI by drawing comparisons
between various approaches and by identifying the reasons for selecting
one fundamental approach over another.
24
Another Look at Frames

The success of MYCIN-like systems has demonstrated that for many di-
agnostic tasks expert behavior can be successfully captured in simple goal-
directed production systems. However, even for this class of problems,
difficulties have arisen with both the representation and control mecha-
nisms. One such system, PUFF (Kunz et al., 1978), has established a cred-
itable record in the domain of pulmonary function diagnosis. The repre-
sentation problems in PUFF are manifest in a number of rules that have
awkward premises and conclusions. The control problems are somewhat
more severe. Physicians have criticized PUFF on the grounds that it asks
questions that do not follow a logical line of reasoning and that it does not
notice data that are atypical or erroneous for the determined diagnosis.
In the CENTAUR system, described in Chapter 23, an attempt was
made to correct representational deficiencies by using prototypes (frames)
to characterize some of the system's knowledge. A more complex control
scheme was also introduced. It made use of triggering rules for suggesting
and ordering system goals, and included an additional attention-focusing
mechanism by using frames as an index into the set of relevant rules.
In an attempt to carry the work of Aikins one step further, we have
constructed an experimental system for pulmonary function diagnosis,
called WHEEZE. Our objectives were to provide a uniform declarative
representation for the domain knowledge and to permit additional control
flexibility beyond that offered by PUFF or CENTAUR. To achieve the first
of these objectives, all of PUFF's rules have been translated into a frame
representation (discussed in Section 24.1). The second objective, control
flexibility, is achieved by using an agenda-based control scheme (discussed
in Section 24.2). New goals for the agenda are suggested by the success or
failure of other goals on the agenda. In the final section, results and the
possibilities of generalization are discussed.

This chapter is an expanded version of a paper originally appearing in Proceedings of the First
National Conference on Artificial Intelligence, Stanford, Calif., August 1980, pp. 154-156. Used
with permission of the American Association for Artificial Intelligence.
24.1 Representation
24.1.1 The Language
24.1.2 Vocabulary
In our knowledge base, there are three different kinds of frames that
contain domain-specific diagnostic knowledge and knowledge about the
case: assertion frames, patient frames, and patient datum frames.
Assertion Frames

Isa              Assertion
Description      <commentary>
Certainty        <a number between -1000 and 1000 that indicates to what
                 degree the assertion is believed, if its manifestations are
                 believed>
DegreeOfBelief   <a number between -1000 and 1000 that indicates to what
                 degree the assertion is believed>
of the Manifestation slot; i.e., it contains a list of the assertions that have
that assertion as a manifestation.
The Certainty slot, in WHEEZE, is an indicator of how likely an as-
sertion is, given that its manifestations are believed. If the manifestations
are strong indicators of the assertion, the Certainty slot will have a high
value. The Certainty slot is a property of the knowledge rather than a
statement about a particular consultation.
When an assertion is directly related to a patient datum, it is termed
a categorization of that patient datum. This relationship is specified by the
CategorizationOf and CategoryCriterion slots of the assertion.
CategorizationOf indicates which patient datum the assertion depends on,
while CategoryCriterion specifies the range in which the value must be for
the assertion to be verified. For example, the assertion "the patient's TLC
is greater than 110" (TLC stands for total lung capacity) would be a cate-
gorization of the TLC value with the category criterion being value > 110.
1. Sum the products of the DegreeOfBelief slots and the importance fac-
tors for each manifestation, then use a thresholding mechanism.
2. Sum the products of the DegreeOfBelief slots and the importance fac-
tors for each manifestation, then multiply this by the certainty factor.
3. Threshold the minimum of the DegreeOfBelief/importance ratios for
the manifestations.
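As an illustration of the first scheme above (sum the products of each
manifestation's DegreeOfBelief and its importance factor, then threshold), a hedged
Python sketch could look like this; the threshold and weights are placeholders, not
values from WHEEZE.

from typing import List, Tuple

def degree_of_belief(manifestations: List[Tuple[int, float]],
                     threshold: float = 500.0) -> int:
    """manifestations: (DegreeOfBelief in -1000..1000, importance factor) pairs."""
    score = sum(dob * importance for dob, importance in manifestations)
    return 1000 if score >= threshold else 0     # believed outright, or not at all

# Two strong manifestations with moderate importance exceed the threshold;
# a single weak one does not.
print(degree_of_belief([(800, 0.5), (600, 0.4)]))   # 1000  (score = 640)
print(degree_of_belief([(300, 0.5)]))               # 0     (score = 150)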
There are two assertion slots that indicate related assertions worth
pursuing when an assertion is confirmed or denied. The SuggestiveOf slot
contains a list of assertions to investigate if the current assertion is con-
firmed. Conversely, the ComplementaryTo slot is a list of assertions that
should be pursued if the current assertion is denied. These slots function
like the "triggering" rules in CENTAUR since they suggest goals to inves-
tigate.
The Findings slot of an assertion contains text that should be printed
out if the assertion is confirmed. In PUFF, this text was contained in the
conclusion portions of rules.
Isa Patient
TLC <the value of the total lung capacity for the patient>
Patient Frames
Information about the patient is kept in a frame named after that patient.
In general, it contains slots for all of the patient data and for the state of
the consultation. As shown in Figure 24-2, the majority of the slots in the
patient frame contain the values of test data, derived data, or more general
facts about the patient. Most of these values are entered directly by the
physician; however, there are data that are derived or calculated from other
values. The slots in the patient frame do not contain any information about
obtaining the value for that slot. Instead, that information is kept in the
corresponding patient datum frame (discussed below). The Confirmed-
Assertions and DeniedAssertions slots keep track of the assertions that have
already been tested. The Agenda slot contains a pointer to the agenda
frame for the patient. It is important to note that the patient frame does
not contain any heuristic knowledge about the system. Its only purpose is
to hold current information about the patient.
In addition to patient and assertion frames, there are frames in the knowl-
edge base for each type of patient datum (as shown in Figure 24-3). These
frames indicate how a datum is obtained (whether it is requested from the
physician or derived from other data), what a typical value for the datum
might be, and what categories the value may be placed in. When the value
of a patient datum is requested and not yet known, the frame for that
patient datum is consulted and the information about how to obtain that
datum is applied. This information takes the form of a procedure in the
ToGetValue slot of the frame.

Isa    PatientDatum

FIGURE 24-3 Organization of a patient datum frame.
For a given patient datum, there may be many low-level assertions that
are categorizations of the datum. These are specified by the Categorization
slot. For example, the Categorization slot of TLC (total lung capacity) might
contain the assertions TLC=80to100, TLC=100to120, TLC<80, and
TLC>120, indicating that there are four major categories of the values.
Thus the patient datum contains heuristic knowledge about how the datum
is derived and how it relates to assertions in the network.
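A minimal sketch of such a frame, in Python rather than the original
representation and with placeholder range boundaries, is shown below; the class and
slot names follow the text above but are otherwise illustrative.

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class PatientDatum:
    name: str
    to_get_value: Callable[[], float]                 # ask the physician or derive the value
    categorization: Dict[str, Callable[[float], bool]] = field(default_factory=dict)

    def categorize(self, value: float) -> List[str]:
        return [assertion for assertion, test in self.categorization.items() if test(value)]

tlc = PatientDatum(
    name="TLC",
    to_get_value=lambda: float(input("Total lung capacity (% predicted): ")),
    categorization={
        "TLC<80":       lambda v: v < 80,
        "TLC=80to100":  lambda v: 80 <= v <= 100,
        "TLC=100to120": lambda v: 100 < v <= 120,
        "TLC>120":      lambda v: v > 120,
    },
)
print(tlc.categorize(126))   # ['TLC>120']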
24.1.3 Translation
PUFF Rule 42

If:   1) There are postbronchodilation test results, and
      2) The degree of reversibility of airway obstruction of the patient is
         less than or equal to slight, and
      3) Asthma is one of the referral diagnoses of the patient
Then: It is definite (1000) that the following is one of the conclusion
      statements about this interpretation: The poor response to
      bronchodilators is an indication of an asthmatic condition in a
      refractory state.

REFRACTORY-ASTHMA
Isa              PhysiologicalState
Manifestation    (OADBronchodilationTestResults
                  RDX-Asthma
                  (*OneOf OADReversibility-None
                          OADReversibility-Slight))
Certainty        1000
DegreeOfBelief
24.2 Control Structure
[Figure: fragment of the assertion network showing ComplementaryTo links
among assertions such as ALS, RLD, RV<80, RDX-ALS, and FEV1/FVC >= 80.]
ing or ordering the initial set of goals. Consequently, the system may ex-
plore many "red herrings" and ask irrelevant questions before encounter-
ing a good hypothesis. In addition, a startling piece of evidence (strongly
suggesting a different hypothesis) cannot cause suspension of the current
investigation and pursuit of the alternative.
For the assertion network in Figure 24-5, a depth-first, goal-directed
system like PUFF would start with the goals Asthma, Bronchitis, and ALS
(amyotrophic lateral sclerosis) and work backwards in a goal-directed fash-
ion toward OAD (obstructive airways disease) and RLD (restrictive lung
disease) and then toward FEV1/FVC<80, MMF>=14, etc. In contrast, the
CENTAUR system would make use of triggering rules to allow primitive
data (e.g., RDX-ALS and FEV1/FVC<80) to suggest whether ALS and
OAD were worth investigating and the order in which to investigate them.
It would then proceed in a goal-directed fashion to try to verify those goals.
Expert diagnosticians use more than simple goal-directed reasoning.
They seem to work by alternately constructing and verifying hypotheses,
corresponding to a mix of data- and goal-directed search. They expect
expert systems to reason in an analogous manner. It is therefore necessary
that the system designer have some control over the reasoning behavior of
order in which questions are asked and results are printed out. (In the
example, FEV1/FVC was asked for before RV.)
2. Surprise values (data contrary to the hypothesis currently being inves-
tigated) may suggest goals to the agenda that are high enough to cause
suspension of the current investigation. (The surprise FEV1/FVC value
caused suspension of the RLD investigation in favor of the OAD inves-
tigation. If the Suggestivity of the link from FEV1/FVC<80 to OAD
were not as high, this would not have occurred.)
3. Low-level data assertions cause the suggestion of high-level goals, thus
selecting and ordering goals to avoid irrelevant questions. (In the ex-
ample, RLD and ALS were suggested and ordered by the low-level
assertion RDX-ALS.)
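The behavior summarized in these three points can be sketched as an agenda-based control loop mixing data-directed suggestion with goal-directed verification. The sketch below is a simplified Python illustration; the names (Agenda, the suggestivity links, verify_goal) are assumptions for the example and do not reproduce WHEEZE's actual mechanism.

    import heapq

    class Agenda:
        """Priority queue of goals; the most strongly suggested goal is taken first."""
        def __init__(self):
            self._heap = []

        def suggest(self, goal, score):
            # Data-directed step: a low-level assertion suggests a goal with a
            # strength derived from the Suggestivity of the link.
            heapq.heappush(self._heap, (-score, goal))

        def next_goal(self):
            return heapq.heappop(self._heap)[1] if self._heap else None

    def diagnose(initial_data, links, verify_goal):
        """links maps a low-level assertion to [(goal, suggestivity), ...];
        verify_goal(goal) does goal-directed verification and returns
        (verified?, newly observed data)."""
        agenda, seen = Agenda(), set(initial_data)
        for datum in initial_data:
            for goal, suggestivity in links.get(datum, []):
                agenda.suggest(goal, suggestivity)
        conclusions = []
        while (goal := agenda.next_goal()) is not None:
            if goal in seen:
                continue
            seen.add(goal)
            verified, new_data = verify_goal(goal)   # backward-chaining step
            if verified:
                conclusions.append(goal)
            # Surprise values seen during verification may suggest other goals
            # strongly enough to take precedence over pending ones.
            for datum in new_data:
                for g, s in links.get(datum, []):
                    agenda.suggest(g, s)
        return conclusions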
24.3 Conclusions
It is no surprise that WHEEZE exhibits the same diagnostic behavior as its
predecessors, PUFF and CENTAUR, on a standard set of ten patient test
cases. The three systems are also roughly comparable in efficiency.
WHEEZE and CENTAUR are somewhat slower than PUFF, but this may
be misleading, since little effort has been expended on optimizing either
of these systems.
The frame representation described in Section 24.1 has proved en-
tirely adequate for capturing the domain knowledge of both PUFF and
CENTAUR. In some cases, several rules were collapsed into a single as-
sertion frame. In other cases, intermediate assertions, corresponding to
common groups of clauses in rule premises, were added to the knowledge
base. This had the effect of simplifying other assertion frames. The com-
bination of representation and control structure also eliminated the need
for many awkward interdependent rules and eliminated the need for
screening clauses in others.
There are several less tangible effects of using a frame representation.
Our purely subjective view is that a uniform, declarative representation is
often more perspicuous. As an example, all of the interconnections be-
tween assertions about disease states are made explicit by the Manifestation
and ManifestationOf slots. As a result, it is easier to find all other assertions
related to a given assertion. This in turn makes it somewhat easier to
understand and predict the control flow of the system.
Since the agenda-based control mechanism includes backward-chain-
ing and goal-triggering capabilities, it has also proved adequate for cap-
turing the control flow of PUFF and CENTAUR. In addition, the flexibility
of agenda-based control was used to advantage. Suggestiveness and im-
portance factors were used to change the order in which questions were
asked and conclusions printed out. They were also used to eliminate the
need to order carefully sets of antecedent assertions.
There is evidence that mixed goal-directed and data-directed control
models human diagnostic behavior much more closely than either pure
goal-directed or data-directed search (Elstein et al., 1978). The diagnostic
process is one of looking at available symptoms, allowing them to suggest
higher-level hypotheses, and then setting out to prove or disprove those
hypotheses, all the while recognizing hypotheses that might be suggested
by symptoms appearing in the verification process. Pauker and Szolovits
(1977) have noted that a physician will go to great lengths to explain data
inconsistent with a partially verified hypothesis before abandoning it. This
type of behavior is not altogether inconsistent with the strategy we have
employed, albeit for a different reason. The combination of a partially
verified hypothesis and data inconsistent with it may be enough to boost
an assertion that would explain the inconsistent data "above" an alternative
hypothesis on the agenda. Oddly enough, some of this behavior seems to
be a natural consequence of the control structure we have employed.
24.3.1 Generalizing
In the discussion above, claims were made about the perspicuity of the
frame representation and about the flexibility of the agenda-based control
mechanism. Of course, the acid test would be to see how well domain
experts could adapt to the representation and to see whether or not they
would become facile at tailoring control flow.
A second question that we pondered is this: how would WHEEZE be
different if we had started with a basic frame system and the agenda-based
control mechanism and worked with an expert to help build up the system
from scratch? It is entirely possible that the backward-chaining production
system paradigm had a significant effect on the vocabulary and knowledge
that make up both PUFF and CENTAUR. In other words, the medium
may have influenced the "message."
To a large extent, we have only paraphrased PUFF's rules in a different
representational medium. This paraphrase may not be the most natural
way to do diagnosis in the new architecture. Unfortunately, we do not have
sufficient expertise in pulmonary function diagnosis to consider radical
reformulations of the domain knowledge. For this reason, it would be in-
teresting to see a new diagnostic system developed using the basic archi-
tecture we have proposed.
PART EIGHT
Tutoring
25
Intelligent Computer-Aided
Instruction
The idea of directly teaching students "how to think" goes back at least to
Polya (1957), if not to Socrates, but it reached a new stage of development
in Papert's laboratory (Papert, 1970). In the LOGO lab, young students
were taught AI concepts such as hierarchical decomposition, opening up
a new dimension by which they could take apart a problem and reason
about its solution. In part, Polya's heuristics have seemed vague and too
general, too hard to follow in real problems (Newell, 1983). But progress
in AI programming, particularly expert system design, has suggested a
vocabulary of structural concepts that we now see must be conveyed along
with the heuristics to make them intelligible (see Chapter 29).
Developing in parallel with Papert's educational experiments and cap-
italizing even more directly on AI technology, programs called intelligent
tutoring systems (ITS) were constructed in the 1970s. In contrast with the
computer-aided instruction (CAI) programs of the 1960s, these programs
used new AI formalisms to separate out the subject matter they teach from
the programs that control interactions with students. This is called intel-
ligent computer-aided instruction (ICAI). This approach has several ad-
vantages: it becomes possible to keep records of what the student knows;
the logic of teaching can be generalized and applied to multiple problems
in multiple problem domains; and a model of student knowledge can be
inferred from student behavior and used as a basis for tutoring. The well-
known milestones in ITS research include:
interacting with the student in a mixed-initiative dialogue1 (Carbonell,
1970b) and tutoring by the Socratic method (Collins, 1976)
Parts of this chapter are taken from the final report to the Office of Naval Research for the
first period of GUIDON research (1979-1982). That report appeared as a technical memo
(HPP-82-2) written by William J. Clancey and Bruce G. Buchanan from the Heuristic Pro-
gramming Project, Department of Computer Science, Stanford University.
1In a mixed-initiative dialogue between a student and a program, either party can initiate
questions and expect reasonable responses from the other party. This contrasts sharply with
drill and practice programs or MYCIN's dialogue, in which users cannot volunteer infor-
mation or direct the program's reasoning.
2In GUIDON, teaching knowledge is treated as a form of expertise. That is, GUIDON has a
knowledge base of teaching rules that is distinct from MYCIN's knowledge base of infectious
disease rules.
It was within this intellectual context that Clancey began asking about
the adequacy of MYCIN's knowledge base for education. We initially be-
lieved that the rules and tables MYCIN used for diagnosing causes of
infections would be a sufficient instructional base for an ICAI program.
We felt that the only missing intelligence was pedagogical knowledge: how
to carry on a mixed-initiative dialogue, how to select and present infor-
mation, how to build and use a model of the student, and so on. Clancey
began work on a tutorial program, called GUIDON, within two years after
the material quoted above was written. The initial model of interaction
between MYCIN and GUIDON is shown schematically in Figure 25-1.
GUIDON was first conceived as an extension of the explanation system
of the MYCIN consultation program. This previous research provided the
building blocks for a teaching program:
FIGURE 25-1 Model of interaction between MYCIN and GUIDON. (Recoverable labels: MYCIN -- inference engine + medical knowledge = diagnostic knowledge; GUIDON -- tutorial program = pedagogical knowledge.)
ical kinds of questions that can be asked about MYCIN's reasoning ("Why
didn't you ask X?" or "How did you use X to conclude about Y?")
advice about what to do next--revealed that the "glue" that was missing
had something to do with the system of rules as a whole. With over 400
rules to learn, there had to be some kind of underlying logic that made
them fit together; the idea of teaching a set of weakly structured rules was
now seriously in question. Significantly, this issue had not arisen in the
years of developing MYCIN but was now apparently critical for teaching,
and probably had important implications for MYCIN's explanation and
knowledge acquisition capabilities as well.
It soon became clear that GUIDON needed to know more than MY-
CIN knows about diagnosis. MYCIN's route from goal to specific questions
is not the only acceptable line of reasoning or strategy for gathering evi-
dence. The order in which MYCIN asks for test results, for example, is
often arbitrary. Thus a student is not necessarily wrong if he or she deviates
from that order. Moreover, MYCIN's explicit knowledge about medicine is
often less complete than what a tutor needs to convey to a student. It is
associational knowledge and does not represent causal relationships ex-
plicitly. The causal models have been "compiled into" the associations.
Thus MYCIN cannot justify an inference from A to B in terms of a causal
chain, A → A1 → A2 → B. A student, therefore, is left with an incomplete,
and easily forgotten, model of the disease process. These two major short-
comings are discussed at length in Chapters 26 and 29.
deem to be the essential knowledge that separates the expert from the
novice and teaching it to the novice in practice sessions in which its value
for getting a handle on difficult, confusing problems will be readily ap-
parent. Empirical studies are a key part of this research.
We view our work as the logical "next step" in knowledge-based tu-
toring. Just as representing expert knowledge in a simulation program
provides a vehicle for testing hypotheses about how people reason, using
this knowledge in a tutoring system will enable us to see how the knowledge
might be explained and recognized in student behavior. The experience
with the first version of GUIDON, as detailed further in Chapter 26, il-
lustrates how the tutoring framework provides a "forcing function" that
requires us to clarify what we want to teach and how we want to teach it.
During 1979-1980 a study was undertaken to determine how an
expert remembered MYCIN's rules (the "model of process" glue) and how
he or she remembered to use them. This study utilized several common AI
methods for knowledge acquisition but built upon them significantly
through the development of an epistemological framework for character-
izing kinds of knowledge, detailed in Chapter 29. The expert's explanations
were characterized in terms of: strategy, structure, inference rule, and sup-
port. With this kind of framework, discussions with the expert were more
easily focused, and experiments were devised for filling in the gaps in what
we were told.
By the end of 1980, we had formulated and implemented a new, com-
prehensive psychological model of medical diagnosis (Clancey and Letsin-
ger, 1981) based on extensive discussions with Dr. Tim Beckett. NEO-
MYCIN is a consultation program in which MYCIN's rules are
reconfigured according to our epistemological framework. That is, the
knowledge representation separates out the inference rules (simple asso-
ciations among data and hypotheses) from the structural and strategic
knowledge: we separate out what a heuristic is from when it is to be applied.
Moreover, the strategies and structure we have chosen model how an ex-
pert reasons. We have attempted to capture the expert's forward-directed
inferences, "diagnostic task structure," and the types of focusing strategies
he or she uses. This explicit formulation of diagnostic strategy in the form
of meta-rules is exactly the material that our original proposal only men-
tioned as a hopeful aside. Recently, we have been fine-tuning NEOMYCIN,
investigating its applicability to other domains, and exploiting it as the
foundation of a student model.
William J. Clancey
FIGURE 26-1 Modules for a multiple-domain tutorial system. (Recoverable labels: knowledge base, data base, interpreter, teaching expertise, problem solution and trace, instruction, student.)
rules applied to achieve specific goals. In general, the topics of this dialogue
are precisely those "goals" that are concluded by MYCIN rules.1 During
the dialogue, only one goal at a time is considered; data that cannot be
used in rules to achieve this goal are "irrelevant." This is a strong constraint
on the student's process of asking questions and making hypotheses. A
goal-directed dialogue helps the tutor to follow the student as he or she
solves the problem, increasing the chance that timely assistance can be
provided.2
Our design of GUIDON has also been influenced by consideration of
the expected sophistication of the students using it. We assume the students
are well motivated and capable of a serious, mixed-initiative dialogue. Var-
ious features (not all described in this paper) make the program flexible,
so that students can use their judgment to control the depth and detail of
the discussion. These features include the capability to request:
1A typical sequence of (nested) goals is as follows: (a) reach a diagnosis, (b) determine which
organisms might be causing the infection, (c) determine the type of infection, (d) determine
if the infection has been partially treated, etc.
2Sleeman uses a similar approach for allowing a student to explore algorithms (Sleeman,
1977).
3See Carr and Goldstein (1977) for a related discussion.
4There is always the possibility that a student may present an exotic case to GUIDON that is
beyond its expertise. While MYCIN has been designed to detect simple instances of this (i.e.,
evidence of an infection other than bacteremia or meningitis), we decided to restrict GUIDON
tutorials to the physician-approved cases in the library (currently over 100 cases).
5In the WUMPUS program (Carr and Goldstein, 1977), for example, it is possible to rank
each legal move (analogous to seeking case data in MYCIN) and so rate the student according
to "rejected inferior moves" and "missed superior moves." The same analysis is possible in
the WEST program (Burton, 1979).
6See, for example, Sprosty (1963).
7MYCIN's rules are not based on Bayesian probabilities, so it is not possible to use optimization
techniques like those developed by Hartley et al. (1972). Arguments against using Bayes'
Theorem in expert systems can be found in Chapter 11.
to the given case.8 Many of the 450 rules are not tried because they con-
clude about goals that do not need to be pursued to solve the case.
Hundreds of others fail to apply because one or more preconditions are
not satisfied. Finally, 20% of the rules typically make conclusions that con-
tribute varying degrees of belief about the goals pursued.
Thus MYCIN's interpreter provides the tutorial program with much
information about the case solution (see Figure 26-1). It is not clear how
to present this to a student. What should the tutor do when the student
pursues a goal that MYCIN did not pursue? (Interrupt? Wait until the
student realizes that the goal contributes no useful information?) Which
dead-end search paths pursued by MYCIN should the tutor expect the
student to consider? For many goals there are too many rules to discuss
with the student; how is the tutor to decide which to present and which to
omit? What techniques can be used to produce coherent plans for guiding
the discussion through lines of reasoning used by the program? One so-
lution is to have a framework that allows guiding the dialogue in different
ways. The rest of this paper shows how GUIDON has been given this
flexibility by viewing it as a discourse program.
8Before a tutorial session, GUIDON scans each rule used by MYCIN and compiles a list of
all subgoals that needed to be achieved before the premise of the rule could be evaluated.
In the case of a rule that failed to apply, GUIDON determines all preconditions of the premise
that are false. By doing this, GUIDON's knowledge of the case is independent of the order
in which questions were asked and rules were applied by MYCIN, so topics can be easily
changed and the depth of discussion controlled flexibly by both GUIDON and the student.
This process of automatically generating a solution trace for any case can be contrasted with
SOPHIE's single, fixed, simulated circuit (Brown et al., 1976).
[It is] ... useful to have a model of how social interactions typically fit
together, and thus a model of discourse structure. Such a model can be
viewed as a heuristic which suggests likely action sequences .... There are
places in a discourse where questions make sense, others where explanations
are expected. [These paradigms] ... facilitate generation and subsequent
understanding.
[Figure residue: t-rule packet, discourse procedure, primitive function; I. META-LEVEL ABSTRACTIONS: rule models, rule schemata.]
Performance Tier
The performance knowledge consists of all the rules and tables used by
MYCIN to make goal-directed conclusions about the initial case data. The
output of the consultation is passed to the tutor: an extensive AND/OR
tree of traces showing which rules were applied, their conclusions, and the
case data required to apply them. GUIDON fills in this tree by determining
which subgoals appear in the rules. In Figure 26-4 COVERFOR signifies
the goal to determine which organisms should be "covered" by a therapy
recommendation; d-rule 578, shown in Figure 26-5, concludes about this
goal; BURNED is a subgoal of this rule.
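The extensive AND/OR tree of traces can be pictured with a small data-structure sketch like the one below (Python); the class TraceNode and its fields are hypothetical illustrations rather than GUIDON's actual representation.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class TraceNode:
        """One node of the AND/OR solution trace passed from MYCIN to the tutor."""
        goal: str                           # e.g., "COVERFOR"
        rule: Optional[str] = None          # e.g., "D-RULE 578" (None for raw case data)
        conclusion: Optional[str] = None    # what the rule concluded, if it applied
        subgoals: List["TraceNode"] = field(default_factory=list)

    # Fragment in the spirit of Figure 26-4: d-rule 578 concludes about COVERFOR,
    # and BURNED and TYPE are among its subgoals.
    trace = TraceNode(
        goal="COVERFOR",
        rule="D-RULE 578",
        conclusion="pseudomonas-aeruginosa might be causing the infection (.5)",
        subgoals=[
            TraceNode(goal="BURNED"),
            TraceNode(goal="TYPE",
                      subgoals=[TraceNode(goal="WBC"), TraceNode(goal="CSF-FINDINGS")]),
        ],
    )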
Tutorial rules make frequent reference to this data structure in order
to guide the dialogue. For example, the response to the request for help
shown in Figure 26-6 (line 17) is based first of all on the rules that were
used by MYCIN for the current goal. Similarly, the t-rules for supplying
the case data requested by the student check to see if MYCIN asked for
the same information, e.g., the WBC (white blood count) in the sample
FIGURE 26-4 (fragment): the goal COVERFOR is concluded by D-RULE 578, whose subgoals include BURNED and TYPE; further rules below TYPE lead to case data such as WBC and CSF-FINDINGS.
Support Tier
The support tier of the knowledge base consists of annotations to the rules
and the factors used by them.14 For example, there are "canned-text" de-
scriptions of every laboratory test in the MYCIN domain, including, for
instance, remarks about how the test should be performed. Mechanism
descriptions provided by the domain expert are used to provide some
explanation of a rule beyond the canned text of the justification. For the
infectious disease domain of MYCIN, they indicate how a given factor leads
13Other possibilities include: the question is not relevant to the current goal; the case data
can be deduced by definition from other known data; or a d-rule indicates that the requested
data are not relevant to this case.
14Rule justifications, author, and edit date were first proposed by Davis (1976) as knowledge
base maintenance records.
Abstraction Level
  RULE-SCHEMA:  MENINGITIS.COVERFOR.CLINICAL
  RULE-MODEL:   COVERFOR-IS-MODEL
  KEY-FACTOR:   BURNED
  DUAL:         D-RULE 577

Performance Level
  D-RULE 578
  IF:   1) The infection which requires therapy is meningitis, and
        2) Organisms were not seen on the stain of the culture, and
        3) The type of the infection is bacterial, and
        4) The patient has been seriously burned
  THEN: There is suggestive evidence (.5) that pseudomonas-aeruginosa is one of the organisms
        (other than those seen on cultures or smears) which might be causing the infection
  UPDATES: COVERFOR
  USES:    (TREATINF ORGSEEN TYPE BURNED)

Support Level
  MECHANISM-FRAME: BODY-INFECTION.WOUNDS
  JUSTIFICATION: "For a very brief period of time after a severe burn the surface of the wound is sterile.
    Shortly thereafter, the area becomes colonized by a mixed flora in which gram-positive organisms
    predominate. By the 3rd post-burn day this bacterial population becomes dominated by gram-negative
    organisms. By the 5th day these organisms have invaded tissue well beneath the surface of the burn.
    The organisms most commonly isolated from burn patients are Pseudomonas, Klebsiella-Enterobacter,
    Staph., etc. Infection with Pseudomonas is frequently fatal."
  LITERATURE: MacMillan BG: Ecology of Bacteria Colonizing the Burned Patient Given Topical and Systemic
    Gentamicin Therapy: a five-year study, J Infect Dis 124:278-286, 1971.
  AUTHOR: Dr. Victor Yu
  LAST-CHANGE: Sept. 8, 1976
Abstraction Tier
The abstraction tier of the knowledge base represents patterns in the per-
formance knowledge. For example, a rule schema is a description of a kind
of rule: a pattern of preconditions that appears in the premise, the goal
concluded, and the context of its application. The schema and a canned-
examine the rule (if it was tried in the consultation) and determine what
subgoals needed to be achieved before it could be applied; if the rule
failed to apply, determine all possible ways this could be determined
(perhaps more than one precondition is false)
examine the state of application of the rule during a tutorial interaction
(what more needs to be done before it can be applied?) and choose an
appropriate method of presentation
generate different questions for the student
use the rule (and variations of it) to understand a student's hypothesis
summarize arguments using the rule by extracting the key point it ad-
dresses
The d-rules that were fired during the consultation associated with the
given case are run in a forward direction as the student is given case data.17
In this way, GUIDON knows at every moment what the expert program
would conclude based on the evidence available to the student. We make
use of knowledge about the history and competence of the student to form
hypotheses about which of the expert's conclusions are probably known to
the student. This has been termed an overlay model of the student by Gold-
stein, because the student's knowledge is modeled in terms of a subset and
simple variations of the expert rule base (Goldstein, 1977). Our work was
originally motivated by the structural model used in the WEST system
(Burton and Brown, 1982).
Special t-rules for updating the overlay model are invoked whenever
the expert program successfully applies a d-rule. These t-rules must decide
whether the student has reached the same conclusion. This decision is
based on:
the inherent complexity of the d-rule (e.g., some rules are trivial defi-
nitions, others have involved iterations),
whether the tutor believes that the student knows how to achieve the
subgoals that appear in the d-rule (factors that require the application
of rules),
background of the student (e.g., year of medical school, intern, etc.), and
evidence gathered in previous interactions with the student.
The purpose of the focus record is to maintain continuity during the dia-
logue. It consists of a set of global variables that are set when the student
asks about particular goals and values for goals. T-rules reference these
variables when selecting d-rules to mention or when motivating a change
in the goal being discussed. An example is provided in Section 26.4.1.
18Goldstein's "syllabus" and BIP's "Curriculum Information Network" are fixed networks that
relate skills in terms of their complexities and dependencies. The lesson plan discussed here
is a program-generated plan for guiding discussion of a particular problem with a particular
student. We believe that a skill network relating MYCIN's rules will be useful for constructing
dialogue plans.
D-rule 578 (Figure 26-5) was chosen because it became the focus of the
discussion when the student asked about the relevance of the "burned"
factor. That is, when the student asked the question in line 8, a variable
was set to indicate that the most recent factor referred to for this goal was
"burned" (the focus topic). Then when the packet of t-rules for choosing
a d-rule to present was invoked, the following t-rule succeeded:20
19Student input to the GUIDON program is in the form of menu options and simple English
phrases that are parsed using keyword analysis and pattern-matching routines developed for
MYCIN's question-answering module (see Chapter 18).
20T-rule numbers are of the form <procedure number that invokes the rule>.<index of the
rule>. Thus t-rule 26.03 is the third rule in discourse procedure number 26.
** HELP
{The HELP option is a request for assistance: the student asks
"where do I go from here?"}
20   Try to determine the type of the infection: bacterial, fungal, viral, or Tb.
** WHAT IS THE PATIENT'S WBC?
30   The white count from the patient's peripheral CBC is 1.9 thousand.
T-RULE 26.03
Returning to our example, after selecting d-rule 578, the tutor needed to
select a method for presenting it. The following t-rule was successfully
applied:
21For example, if the goal is the "organism causing the infection" and the certainty associated
with the value "pseudomonas" is 0.3, then this value is significant.
T-RULE 2.04
IF:   1) The number of factors appearing in the d-rule which need to be asked by the student is
         zero, and
      2) The number of subgoals remaining to be determined before the d-rule can be applied is
         equal to 1
THEN: Substep i.   Say: subgoal-suggestion
      Substep ii.  Discuss the goal with the student in a goal-directed mode [Proc001]
      Substep iii. Wrap up the discussion of the rule being considered [Proc017]
The premise of this t-rule indicates that all preconditions of the d-rule can
be evaluated, save one, and this d-rule precondition requires that other d-
rules be considered. The action part of this t-rule is a sequence of actions
to be followed, i.e., a discourse pattern. In particular, substep (i) resulted
in the program printing "try to determine the type of the infection ... "
(line 22).22 The discourse procedure invoked by substep (ii) will govern
discussion of the type of the infection (in simple terms, a new context is
set up for interpreting student questions and use of options). After the
type of the infection is discussed (relevant data are collected and
hypotheses drawn), the tutor will direct the dialogue to a discussion of the
conclusion to be drawn from d-rule 578.
Other methods for suggesting a d-rule are possible and are selected
by other t-rules in the packet that contains t-rule 2.04. For example, the
program could simply tell the student the conclusion of the d-rule (if the
d-rule can be evaluated based on data currently available to the student),
or quiz the student about the d-rule, or sequentially discuss each precon-
dition of the d-rule, and so on.
22"Say <label>" designates something the program will "say" to the student. The label is
useful for debugging, because every print statement is uniquely labeled.
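How a packet of t-rules selects among such presentation methods can be sketched as a simple dispatcher: each t-rule pairs a test on the dialogue state with a discourse pattern, and the first t-rule whose premise succeeds is applied. The names below (TRule, Packet, the state keys) are invented for this Python illustration and are not GUIDON's code.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class TRule:
        name: str
        test: Callable[[dict], bool]     # premise over the dialogue state
        action: Callable[[dict], None]   # discourse pattern to carry out

    @dataclass
    class Packet:
        """A packet of t-rules; the first whose premise succeeds is applied."""
        rules: List[TRule]

        def apply(self, state: dict) -> str:
            for t in self.rules:
                if t.test(state):
                    t.action(state)
                    return t.name
            return "no t-rule applied"

    # In the spirit of t-rule 2.04: if no factors remain to ask and exactly one
    # subgoal remains, suggest the subgoal and discuss it in goal-directed mode.
    packet2 = Packet([
        TRule(
            "t-rule 2.04",
            test=lambda s: s["factors_to_ask"] == 0 and s["subgoals_remaining"] == 1,
            action=lambda s: s["script"].extend(
                ["say: subgoal-suggestion", "discuss subgoal [Proc001]",
                 "wrap up rule [Proc017]"]),
        ),
        TRule(  # a default, in the manner of t-rule 9.03 later in the chapter
            "default",
            test=lambda s: True,
            action=lambda s: s["script"].append("state the conclusion of the d-rule"),
        ),
    ])

    state = {"factors_to_ask": 0, "subgoals_remaining": 1, "script": []}
    print(packet2.apply(state), state["script"])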
dent (lines 17-32). "Entrapment," as used here, involves forcing the stu-
dent to make a choice that will reveal some aspect of his or her
understanding.23 In this example, all choices listed (lines 24-32) actually
Figure 26-8 illustrates how the overlay model is updated for the hypothesis
in line 1 of Figure 26-7. T-rules are invoked to determine how strongly
the tutor believes that the student has taken each of the relevant d-rules
into account. That is, a packet of t-rules (packet number 6 here) is tried
in the context of each d-rule. Those t-rules that succeed will modify the
cumulative belief that the given d-rule was considered by the student. T-
rule 6.05 succeeded when applied to d-rules 545 and 557. The student
mentioned a value (PSEUDOMONAS) that they conclude (clause 1 of the
t-rule) but missed others (clause 3). Moreover, the student did not mention
values that can only be concluded by these d-rules (clause 2), so the overall
evidence that these d-rules were considered is weak (-0.70).24
T-RULE 6.05
IF:   1) The hypothesis does include values that can be concluded by this d-rule, as well as others,
         and
      2) The hypothesis does not include values that can only be concluded by this d-rule, and
      3) Values concluded by the d-rule are missing in the hypothesis
THEN: Define the belief that the d-rule was considered to be -.70
T-RULE 7.05
IF:   This domain rule contains a factor that appears in several rules, none of which are believed to
      have been considered to make the hypothesis
THEN: Modify the cumulative belief that this rule was considered by -.30
24The certainty factor of -0.70 was chosen by the author. Experience with MYCIN shows
that the precise value is not important, but the scale from -1 to 1 should be used consistently.
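The effect of t-rules like 6.05 on the overlay model can be sketched as follows. This Python paraphrase assumes MYCIN-style combination of evidence on the -1 to 1 scale and simple sets of values; it is an illustration, not the actual implementation.

    def update_overlay(belief, delta):
        """Combine evidence on a -1..1 scale, in the spirit of MYCIN's
        certainty-factor combination (a simplifying assumption here)."""
        if belief >= 0 and delta >= 0:
            return belief + delta * (1 - belief)
        if belief <= 0 and delta <= 0:
            return belief + delta * (1 + belief)
        return (belief + delta) / (1 - min(abs(belief), abs(delta)))

    def consider_d_rule(hypothesis, d_rule_values, only_this_rule_values, belief=0.0):
        """Paraphrase of t-rule 6.05: weak evidence (-0.70) that the student
        considered the d-rule if the hypothesis shares some of its values,
        contains none of the values unique to it, and misses others."""
        shares_some = bool(hypothesis & d_rule_values)
        misses_some = bool(d_rule_values - hypothesis)
        mentions_unique = bool(hypothesis & only_this_rule_values)
        if shares_some and misses_some and not mentions_unique:
            belief = update_overlay(belief, -0.70)
        return belief

    # Example: the student's hypothesis mentions PSEUDOMONAS, which the d-rule
    # concludes along with another organism the student did not mention.
    print(consider_d_rule({"PSEUDOMONAS"},
                          {"PSEUDOMONAS", "E.COLI"},
                          only_this_rule_values=set()))    # -> -0.7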
[Figure residue (apparently Figure 26-8, the overlay-model update for the goal COVERFOR; recoverable labels include GOAL: COVERFOR and Pseudomonas); graphic not reproduced.]
1. Variation in the premise of a d-rule: The student is using a d-rule that fails
Returning to our example, after updating the overlay model, the tutor
needs to deal with discrepancies between the student's hypothesis and what
the expert program knows. The following t-rules are from a packet that
determines how to present a d-rule that the student evidently did not
consider. The tutor applies the first tutorial rule that is appropriate. In our
example, t-rule 9.02 generated the question shown in lines 10-14 of Figure
26-7. T-rule 9.03 (a default rule) generated the question shown in lines
17-32.
T-RULE 9.01
IF:   1) The d-rule is not on the lesson plan for this case, and
      2) Based on the overlay model, the student is ignorant about the d-rule
THEN: Affirm the conclusions made by the d-rule by simply stating the key factors and values to
      be concluded
T-RULE 9.02
IF:   The goal currently being discussed is a true/false parameter
THEN: Generate a question about the d-rule using "facts" format in the premise part and "actual
      value" format in the action part
T-RULE 9.03
IF:   True
THEN: Generate a question about the d-rule using "fill-in" format in the premise part and "actual
      value" format in the action part
T-RULE 3.06
IF:   1) The action part of the question is not "wrong value," and
      2) The action part of the question is not "multiple choice," and
      3) Not all of the factors in the premise of the d-rule are true/false parameters
THEN: Include "multiple choice" as a possible format for the premise part of the question
T-rule 3.06 says that if the program is going to present a conclusion that
differs from that in the d-rule it is quizzing about, it should not state the
premise as a multiple choice. Also, it would be nonsensical to state both
the premise and action in multiple-choice form. (This would be a matching
question--it is treated as another question type.) Clause 3 of this t-rule is
necessary because it is nonsensical to make a multiple-choice question when
the only choices are true and false.
As can be seen here, the choice of a question type is based on purely
logical properties of the rule and interactions among question formats.
About 20 question types (combined premise/conclusion formats) are pos-
sible in the current implementation.
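A small sketch of how a question format might be chosen from purely logical properties of the rule, in the manner of t-rule 3.06, is given below. The property names and candidate formats are assumptions for illustration and do not enumerate the roughly 20 formats of the actual implementation.

    def premise_formats(question):
        """Return candidate formats for the premise part of a question,
        given logical properties of the d-rule and the chosen action format."""
        formats = ["facts", "fill-in"]
        # Paraphrase of t-rule 3.06: offer "multiple choice" only when the
        # action part is neither "wrong value" nor "multiple choice" and at
        # least one premise factor is not a true/false parameter.
        if (question["action_format"] not in ("wrong value", "multiple choice")
                and not all(question["premise_factor_is_boolean"])):
            formats.append("multiple choice")
        return formats

    print(premise_formats({"action_format": "actual value",
                           "premise_factor_is_boolean": [True, False, False]}))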
26.6 Concluding Remarks
looking ahead to see what knowledge is needed to solve the problem) and
to carry on flexible dialogues (by being able to switch the discussion at any
time to any portion of the AND/OR solution tree).
Early experience with this program has shown that the tutor must be
selective about its choice of topics if the dialogues are not to be overly
tedious and complicated. That is, it is desirable for tutorial rules to exert
a great deal of control over which discourse options are taken. We believe
that it is chiefly in selection of topics and emphasis of discussion that the
"intelligence" of this tutor resides.
PART NINE
already known to be false (or not "true enough"). In both instances, MY-
CIN is reasoning about its rules before executing them. The important
difference between these mechanisms and the meta-knowledge that
evolved from work by Davis is that the former are buried in the code of
the rule interpreter and thus are not open to examination by other parts
of the system, or by the user. After these initial meta-level reasoning tech-
niques were added to the rule interpreter, however, Davis was careful to
separate any additional meta-level knowledge structures from the editor,
explanation generator, and interpreter, just as we had done with the (ob-
ject-level) medical knowledge. As a result, the new system (MYCIN plus
TEIRESIAS) contains considerably more knowledge about its own knowl-
edge structures than did MYCIN alone. Many of these ideas have subse-
quently been incorporated into EMYCIN. Chapter 28 provides a summary
of the knowledge structures used by TEIRESIAS for knowledge acquisition
(see Chapter 9) and control of MYCIN's inferences. This was a line of
development that was not anticipated in DENDRAL,1 and its systematic
treatment by Davis in his dissertation was an advance for AI.
Bill Clancey was working on GUIDON at about the same time and was
discovering that additional knowledge structures, including meta-level
knowledge, were essential for tutoring. TEIRESIAS's knowledge about the
form and contents of MYCIN's rules was certainly helpful in constructing
GUIDON, but Clancey began focusing more on representing MYCIN's
strategies. In the course of his research, he also uncovered the importance
of two additional kinds of knowledge: knowledge about the structure of the
domain (and thus about the structure of the rule set), and support knowl-
edge that justifies individual rules. Chapter 29 is a careful analysis of these
three types of meta-level knowledge that Clancey terms "strategic, struc-
tural and support knowledge." This analysis was written in 1981-1982 (and
published in 1983) and thus is a recent critique of the structure of MYCIN's
knowledge base. We were not unaware of many of the issues raised here,
but Clancey provides a coherent framework for thinking about them.
FIGURE 27-1 SACON's static tree of context-types (STRUCTURE, composed of SUBSTRUCTUREs, to which LOADINGs are applied; each LOADING is composed of LOAD-COMPONENTs).
itself. The instance tree organization makes clear which LOADING in-
stances are associated with which SUBSTRUCTURE instance.
If a rule is applied to some context-instance and uses information
about context-instances lower in the tree, however, an implicit iteration oc-
curs: the rule is applied to each of the lower instances in turn. If the lower
context-types have not yet been instantiated, the program digresses to ask
about their creation at this time. Thus contexts are instantiated because
rules need them,2 just as parameters are traced when rules need them. In
fact, since the goals of the consultation usually consist of finding out some-
thing about the root of the tree, the only way that lower context-types are
instantiated at all is through the application of rules that use information
about lower context-types.
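The implicit iteration over lower context-instances can be sketched as below; the Python classes and the ask_to_instantiate hook are hypothetical stand-ins for EMYCIN's context mechanism, intended only to illustrate the behavior just described.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ContextInstance:
        ctype: str                                   # e.g., "STRUCTURE", "LOADING"
        name: str
        children: List["ContextInstance"] = field(default_factory=list)

        def instances_of(self, ctype, ask_to_instantiate=None):
            """Return lower instances of the given context-type, digressing to
            create them (by asking the user) if none exist yet."""
            found = [c for c in self.children if c.ctype == ctype]
            if not found and ask_to_instantiate is not None:
                found = ask_to_instantiate(self, ctype)
                self.children.extend(found)
            return found

    def apply_rule(rule, instance, ask_to_instantiate=None):
        """Apply a rule to a context-instance; if the rule uses information about
        a lower context-type, it is implicitly applied to each lower instance."""
        lower = instance.instances_of(rule["uses_context"], ask_to_instantiate)
        return [rule["body"](instance, low) for low in lower]

    # Example: a rule about a SUBSTRUCTURE that uses information about each LOADING.
    sub = ContextInstance("SUBSTRUCTURE", "SUB-1",
                          [ContextInstance("LOADING", "LOADING-1"),
                           ContextInstance("LOADING", "LOADING-2")])
    rule = {"uses_context": "LOADING",
            "body": lambda s, l: f"{s.name}: stress response under {l.name}"}
    print(apply_rule(rule, sub))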
There have been a few rather stereotypic uses of the context tree. Although
experience to date has by no means exhausted the possible uses, the ex-
amples shown here should help readers to understand how an expert and
knowledge engineer might select appropriate context-types and organize
them in a new domain.
The primary use of additional contexts has been to structure the data or
evidence to be collected. Thus, in the MYCIN system, the culture contexts
describe the tests performed to isolate organisms. Additional information
about the patient's current and previous therapies, the cultures, and
MYCIN's own estimation of the suspected infections are also represented
in the tree. The current context organization for MYCIN is shown in Figure
27-3 and should be contrasted with the sample instance tree of Figure
5-1 (which reflects MYCIN's context-types as they were defined in 1974).
The second major use of the context tree has been to organize the
important components of some object. For example, in the SACON system the
substructures of the main structure correspond to components or regions
of the object that have some uniform property, typically a specific geometry
or material. Each substructure instance is considered independently, and
conclusions about individual responses to stress loadings are summarized
on the structure level to provide a "global" sense of the overall response
of the structure. A recent, additional example of this use of a part-whole
hierarchy is found in a system called LITHO (Bonnet, 1979), which inter-
prets data from oil wells. In this system, each well is decomposed into a
number of zones that the petrologist can distinguish by depth (Figure
27-4).
A context need not correspond to some physical object but may be an
abstract entity. However, the relationships among contexts are explicitly
fixed by the tree of context-types. For this reason, physical objects, repre-
sented in this part-whole fashion, lend themselves more readily to the current
context tree mechanism.

FIGURE 27-4 LITHO's static tree and an instance tree (a well decomposed, via composed-of links, into zones).
The last major use of the context tree, which is closely related to the
part-whole use described above, has been to represent important events or
situations that happen to an object. Thus, in the SACON system, a LOAD-
ING describes an anticipated scenario or maneuver (such as pounding or
braking) to which the particular SUBSTRUCTURE is subjected. Each
LOADING, in turn, is composed of a number of independent LOAD-
COMPONENTS, distinguished by the direction and intensity of the ap-
plied force. Other uses of this organizational idea have been to represent
individual past PREGNANCIES and current VISITS of a pregnant woman
in the GRAVIDA system of Catanzarite (unpublished; see Figure 27-5) and
the anticipated use of BLEEDING-EPISODES of a PATIENT in the CLOT
system4 (Figure 27-6; see also Chapter 16).
The primary reason for defining additional context-types in a consul-
tant is to represent multiple instances of an entity during a case. Some
users may like to define context-types that always have one instance and
no more, primarily for purposes of organization, but this is often unnec-
essary (and even cumbersome).5 For example, one might want to write
rules that use various attributes of a patient's liver, but since there is always
exactly one liver for a patient there is no need to have a liver context; any
attribute of the liver can simply be viewed as an attribute of the patient.
Reference to parameters of contexts in different parts of an instance
tree is currently very awkward. For example, in MYCIN, a particular drug
may be associated somehow with a particular organism (Figure 27-7). How-
ever, this relationship between context-instances is not one that always holds
4It should be noted that use of the context mechanism to handle sequential visits in the
GRAVIDA system is experimental and required the definition of numerous additional func-
tions for this purpose. They are not currently in EMYCIN.
5Note, however, that separating unique concepts out into single contexts may provide more
understandable rule translations due to the conventions of context-name substitutions in text
generation. See Chapter 18 for further discussion of this point.
between all organisms and all drugs: not all drugs are prescribed to treat
all identified organisms. This "prescribed for" relationship cannot be stated
statically, independently of the case. Special predicate and action functions
must be written to establish and manipulate these kinds of relationships
between instances. It is best to avoid these interactions between disjoint
parts of the tree during the initial design of the knowledge base.
Summing up our experience with this mechanism and considering its
relative inflexibility, we offer this final caveat: for an initial system design,
those using EMYCIN should start small and should use only one or two
context-types. They should plan the structure of the consultant's context
tree carefully before running the EMYCIN system, since restructuring a
context tree is perhaps the most difficult and time-consuming knowledge-
base construction task. Indeed, restructuring the context tree implies a
complete restructuring of the rest of the knowledge base.
to give with the first in order to produce the desired effect. Or, in a
nonmedical domain, a mechanic often makes adjustments in response to
manifestations of an automobile problem (e.g., adjusting the carburetor in
response to stalling) and considers more detail only if the first few adjust-
ments fail. An example from MYCINis cited by Clancey in Chapter 29,
in his discussion of the tetracycline rule: "If the patient is less than 8 years
old, don't prescribe tetracycline." This rule lacks ties to the deeper under-
standing of drug action of which it is a consequence. Thus it is not only
difficult for a student to remember, but also difficult for one to know how
to modify or to know exactly how far the premise clause can be stretched
safely.
We also recognized that many of the attributes mentioned in rules are
not primitive observational terms in the same sense that values of labora-
tory tests are. For example, MYCIN asks whether a patient is getting better
or worse in response to therapy, just as it asks for serum glucose levels.
Obviously, there are a number of rules that could be written to infer
whether the patient is better, mentioning such things as change in tem-
perature, eating habits, and general coloring. That is, we chose a rule of
the form A → B, with A as a primitive, rather than several rules in the
following form:

    A1 → A
    A2 → A
    ...
    An → A
    A → B
The missing knowledge is of three classes: strategic, structural, and sup-
port. Strategic knowledge is an important part of expertise. MYCIN's built-
in strategy is cautious: gather as much evidence as possible (without de-
manding new tests) for and against likely causes and then weigh the evi-
dence. Operationally, this translates into exhaustive rule invocation
whereby (a) all (relevant) rules are tried and (b) all rules whose left-hand
sides match the case (and whose right-hand sides are relevant to problem-
solving goals) have their right-hand sides acted upon. But under different
circumstances, other strategies would be more appropriate. In emergen-
cies, for example, physicians cannot take the time to gather much history
data. Or, with recurring illness, physicians will order new tests and wait
for the results. Deciding on the most appropriate strategy depends on
medical knowledge about the context of the case. MYCIN's control struc-
ture is not concerned with resource allocation; it assumes that there is time
to gather all available information that is relevant and time to process it.
Thus MYCIN asks 20-70 questions and processes 1-25 rules between
questions. We estimate that MYCIN executes about 50 rules per second
(exclusive of I/O wait time). With larger amounts of data or larger numbers
of rules, the control structure would need additional meta-rules that esti-
mate the costs of gathering data and executing rules, in order to weigh
costs against benefits. Also, in crisis situations or real-time data interpre-
tation, the control structure would need to be concerned with the allocation
of resources.7
One way to make strategic knowledge explicit is by putting it in meta-
rules, as discussed in Chapter 28. They are rules of the same IF/THEN
form as the medical rules, but they are "meta" in the sense that they talk
about and reason with the medical rules. One of the interesting aspects of
the meta-rule formalism, as Davis designed it, is that the same rule inter-
preter and explanation system work for meta-rules as for object-level rules.
(Chapter 23 discussed the use of prototypes, or frames, for representing
much of the same kind of knowledge about problem solving.) Making
strategy knowledge explicit has come to be recognized as an important
design consideration for expert systems (Barnett and Erman, 1982; de
Kleer et al., 1977; Genesereth, 1981; Patil et al., 1981) because it can make
a system's reasoning more efficient and more understandable.
Structural knowledge in medicine includes anatomical and physiolog-
ical information about the structure and function of the body and its sys-
tems.8 It is part of what we believe is needed for "deeper" reasoning about
diagnosis. A structural model showing, inter alia, the normal connections
of subparts can be used for reasoning about abnormalities. In contrast,
representing this information in rules would force explicit mention of the
7In the AM and EURISKO programs (Lenat, 1976; 1983), Lenat has added information
about maximum amounts of time to spend on various tasks, which keeps those programs
from "overspending" computer time on difficult tasks of low importance. (EURISKO can also
decide to change those time allocations.) In PROSPECTOR (Duda et al., 1978a), attention
focused on the rules that will add the most information, i.e., that will most increase or decrease
the probability of the hypothesis being pushed. In Fox's system (Fox, 1981), the estimated cost
of evaluating premises of rules helps determine which rules to invoke.
8More generally, we want to talk about the structure of any system or device we want an
expert system to analyze, such as electronic circuits or automobiles.
This chapter is an expanded and edited version of a paper originally appearing in Proceedings
of the Fifth IJCAI, 1977, pp. 920-928. Used by permission of International Joint Conferences
on Artificial Intelligence, Inc.; copies of the Proceedings are available from William Kaufmann,
Inc., 95 First Street, Los Altos, CA 94022.
1Following standard usage, knowledge about objects and relations in a particular domain will
be referred to as object-level knowledge.
28.1 Rule Models
FIGURE 28-2 Organization of the rule models. (Recoverable labels: (attribute), (attribute)-is, (attribute)-isnt.)
Figure 28-3 shows an example of a rule model, one that describes the
subset of rules concluding affirmatively about the area for an investment.3
(Since not all details of implementation are relevant here, this discussion
will omit some.) As indicated above, there is a list of rules from which this
model was constructed, descriptions characterizing the premises and ac-
tions, and pointers to more specific and more general models. Each char-
acterization in the description is shown split into its two parts, one con-
cerning the presence of individual attributes and the other describing
correlations. The first item in the premise description, for instance, indi-
cates that "most" rules about the area of investment mention the attribute
RETURNRATE in their premises; when they do mention it, they "typi-
cally" use the predicate functions SAME and NOTSAME; and the
"strength," or reliability, of this piece of advice is 3.83.
The fourth item in the premise description indicates that when the
attribute RETURNRATE (rate of return) appears in the premise of a rule
in this subset, the attribute TIMESCALE "typically" appears as well. As
before, the predicate functions are those usually associated with the attri-
butes, and the number is an indication of reliability.
3These examples were generated by substituting investment terms for medical terms in ex-
amples from TEIRESIAS using MYCIN's medical knowledge.
MODEL FOR RULES CONCLUDING AFFIRMATIVELY ABOUT INVESTMENT AREA

EXAMPLES     ((RULE116 .33) (RULE050 .70) (RULE037 .80)
              (RULE095 .90) (RULE152 1.0) (RULE140 1.0))
DESCRIPTION
  PREMISE    ((RETURNRATE SAME NOTSAME 3.83)
              (TIMESCALE SAME NOTSAME 3.83)
              (TREND SAME 2.83)
              ((RETURNRATE SAME) (TIMESCALE SAME) ...)
              ((TIMESCALE SAME) (RETURNRATE SAME) ...)
              ((BRACKET SAME) (FOLLOWS NOTSAME SAME) (EXPERIENCE SAME) ...))
  ACTION     ((INVESTMENT-AREA CONCLUDE 4.73)
              (RISK CONCLUDE 4.05)
              ((INVESTMENT-AREA CONCLUDE) (RISK CONCLUDE) 4.73))
MORE-GENL    (INVESTMENT-AREA)
MORE-SPEC    (INVESTMENT-AREA-IS-UTILITIES)
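Rule models such as the one above are computed by the system from its own rule set. The sketch below shows one plausible way to derive the first part of such a description (which attributes are typically mentioned in premises, and with which predicate functions); the data shapes and the "strength" measure are assumptions for illustration, not TEIRESIAS's actual algorithm.

    from collections import defaultdict

    def premise_description(rules):
        """rules: list of dicts whose 'premise' is a list of (attribute, predicate) pairs.
        Returns, per attribute, the predicates typically used with it and a crude
        'strength' (here just the fraction of rules mentioning the attribute, scaled)."""
        mentions = defaultdict(list)
        for rule in rules:
            for attribute, predicate in rule["premise"]:
                mentions[attribute].append(predicate)
        description = []
        for attribute, predicates in mentions.items():
            strength = round(5.0 * len(predicates) / len(rules), 2)  # assumed scale
            description.append((attribute, sorted(set(predicates)), strength))
        return sorted(description, key=lambda d: -d[2])

    rules = [
        {"name": "RULE116", "premise": [("RETURNRATE", "SAME"), ("TIMESCALE", "SAME")]},
        {"name": "RULE050", "premise": [("RETURNRATE", "NOTSAME"), ("TREND", "SAME")]},
        {"name": "RULE037", "premise": [("RETURNRATE", "SAME"), ("TIMESCALE", "NOTSAME")]},
    ]
    print(premise_description(rules))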
the knowledge base. The process starts with the expert challenging the
system with a specific problem and observing its performance. If the expert
believes its results are incorrect, there are available a number of tools that
will allow him or her to track down the source of the error by selecting
the appropriate rule model. For instance, if the problem is a missing rule
in the knowledge base to conclude about the appropriate area for an in-
vestment, then TEIRESIAS will select the model shown in Figure 28-3 as
the appropriate one to describe the rule it is about to acquire. Note that
the selection of a specific model is in effect an expression by TEIRESIAS
of its expectations concerning the new rule, and the generalizations in the
model become predictions about the likely content of the rule.
At this point the expert types in the new rule (Figure 28-4), using the
vocabulary specific to the domain. (In all traces, computer output is in
mixed upper and lower case, while user responses are in boldface capitals.)
As mentioned in Chapter 9 and further described in Chapter 18, En-
glish text is understood by allowing keywords to suggest partial interpre-
tations and intersecting those results with the expectations provided by the
selection of a particular rule model. We thus have a data-directed process
(interpreting the text) combined with a goal-directed process (the predic-
tions made by the rule model). Each contributes to the end result, but it
is their combination that is effective. TEIRESIAS displays the results of
this initial interpretation of the rule (Figure 28-5). If there are mistakes
(as there are in this case), a rule editor is available to allow the expert to
indicate necessary changes. This is easily accomplished, since TEIRESIAS
can often make an effective second choice by determining the likely source
of error in its initial guess.
Once the expert is satisfied that TEIRESIAS has correctly understood
what was said, it is the system's turn to see if it is satisfied with the content
of the rule. The main idea is to use the rule model to see how well this
new rule "fits into" the system's model of its knowledge--i.e., does it "look
like" a typical rule of the sort expected?
If the expert agrees to the inclusion of a new clause, TEIRESIAS
attempts to create it (Figure 28-6). The system relies on the context of the
current dialogue (which indicates that the clause should deal with the
amount of the client's investment experience) and the fact that the rule
must work for this case or it won't fix the bug (it is not shown here, but
earlier in the interaction the expert indicated that the client had a moderate
amount of experience). TEIRESIAS's guess is not necessarily correct, of
course, since the desired clause may be more general, but it is at least a
plausible attempt.
It should be noted that there is nothing in this concept of "second-
guessing" that is specific to the rule models as they are currently designed,
or indeed to associative triples of rules as a knowledge representation. The
most general and fundamental point was mentioned above--testing to see
how something "fits into" the system's model of its knowledge. At this point
This is my understanding of your rule:
RULE383
IF:   1) The client's income-tax bracket is 50%, and
      2) The market has followed an upward trend recently, and
      3) The client manages his assets carefully
THEN: There is evidence (.8) that the area of the investment should be high-technology

I hate to criticize, Randy, but did you know that most rules about what the area of investment
might be, that mention-
  the income-tax bracket of the client, and
  how closely the client follows the market
ALSO mention-
  [A] - the amount of investment experience of the client
Shall I try to write a clause to account for [A]?
++** Y

How about-
  [A] The amount of investment experience of the client is moderate
Ok?
++** Y
the system might perform any kind of check for violations of any estab-
lished prejudices about what the new chunk of knowledge should look like.
Additional kinds of checks for rules might concern the strength of the
inference, the number of clauses in the premise, etc. In general, this "sec-
ond-guessing" process can involve any characteristic that the system may
have "noticed" about the particular knowledge representation in use.
Automatic generation of rule models has several interesting implica-
tions, since it makes possible a synthesis of the ideas of model-based un-
derstanding and learning by experience. While both of these have been
developed independently in previous AI research, their combination pro-
duces a novel sort of feedback loop: rule acquisition relies on the set of
rule models to effect the model-based understanding process; this results
in the addition of a new rule to the knowledge base; and this in turn
triggers recomputation of the relevant rule model(s).
Note, first, that performance on the acquisition of a subsequent rule
may be better, because the system's "picture" of its knowledge base has
improved--the rule models are now computed from a larger set of in-
stances, and their generalizations are more likely to be valid. Second, since
the relevant rule models are recomputed each time a change is made to
the knowledge base, the picture they supply is kept constantly up to date,
and they will at all times be an accurate reflection of the shifting patterns
in the knowledge base.
Finally, and perhaps most interesting, the models are not hand-tooled
by the system architect or specified by the expert. They are instead formed
by the system itself, and formed as a result of its experience in acquiring
rules from the expert. Thus, despite its reliance on a set of models as a
basis for understanding, TEIRESIAS's abilities are not restricted by a pre-
existing set of models. As its store of knowledge grows, old models can
become more accurate, new models will be formed, and the system's stock
of knowledge about its knowledge will continue to expand.
28.2 Schemata
[Figure residue: schema hierarchy with ROOT above VALUE-SCHEMA and ATTRIBUTE-SCHEMA; graphic not reproduced.]
Figure 28-9 shows the schema for a stock name; information corre-
sponding to each of the categories listed above is grouped together. The
first five lines in Figure 28-9 contain structure information and indicate
some of the entries on the property list (PLIST) of the data structure that
represents a stock name. The information is a triple of the form

    <slot name> <blank> <advice>
The slot name labels the "kind" of thing that fills the blank and serves as
a point around which much of the "lower-level" information in the system
is organized. The blank specifies the format of the information required,
while the advice suggests how to find it. Some of the information needed
may be domain-specific, and hence must be requested from the expert.
But some of it may concern completely internal conventions of represen-
tation, and hence should be supplied by the system itself, to insulate the
domain expert from such details. The advice provides a way of indicating
which of these situations holds in a given case.
STOCKNAME-SCHEMA
PLIST      [(INSTOF STOCKNAME-SCHEMA GIVENIT
            SYNONYM (KLEENE (1 0) <ATOM>) ASKIT
            TRADEDON (KLEENE (1 1 2) <(MARKET-INST FIRSTYEAR-INST)>) ASKIT
            RISKCLASS CLASS-INST ASKIT
            CREATEIT]
RELATIONS  ((AND* STOCKNAMELIST HILOTABLE)
            (OR* CUMVOTINGRIGHTS)
            (XOR* COMMON PFD CUMPFD PARTICPFD)
            ((OR* PFD CUMPFD PARTICPFD) PFDRATETABLE)
            ((AND* CUMPFD) OMITTEDDIVS))
INSTANCES  (AMERICAN-MOTORS AT&T ... XEROX ZOECON)
FATHER     (VALUE-SCHEMA)
OFF-SPRING NIL
DESCR      "the STOCKNAME-SCHEMA describes the format for a stock name"
AUTHOR     DAVIS
DATE       1115
INSTOF     (SCHEMA-SCHEMA)
For instance, suppose in the process of adding a new rule to the system
the expert mentions a stock the system hasn't heard about yet. Learning
about the new stock (i.e., creating a new instance of the concept of stock
name) becomes a subproblem in the task of acquiring the new rule. The
schema for the concept of stock name is retrieved and used as the basis
for a dialogue that requests the necessary information from the expert.
An abbreviated version of the dialogue is shown in Figure 28-10.
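The way the schema's slots and advice drive such a dialogue can be sketched roughly as follows; the Slot and Schema classes and the advice tags GIVENIT, ASKIT, and CREATEIT are modeled on the listing above, but the code is an illustrative Python assumption, not TEIRESIAS's Interlisp implementation.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Slot:
        name: str      # e.g., "TRADEDON"
        blank: str     # format of the required information, shown to the expert
        advice: str    # "ASKIT", "GIVENIT", or "CREATEIT"

    @dataclass
    class Schema:
        name: str
        slots: List[Slot]

        def acquire_instance(self, instance_name: str, ask: Callable[[str], str]):
            """Create a new instance by 'reading' the schema: domain-specific
            slots (ASKIT) are requested from the expert; internal ones are
            filled by the system itself, insulating the expert from them."""
            instance = {"INSTOF": self.name, "NAME": instance_name}
            for slot in self.slots:
                if slot.advice == "ASKIT":
                    instance[slot.name] = ask(
                        f"What is the {slot.name} of {instance_name}? "
                        f"(form: {slot.blank})")
                elif slot.advice == "CREATEIT":
                    # A purely internal naming convention stands in for whatever
                    # the real system would construct here.
                    instance[slot.name] = f"{instance_name}-{slot.name}"
            return instance

    stockname_schema = Schema("STOCKNAME-SCHEMA", [
        Slot("TRADEDON", "<market name> <year first traded>", "ASKIT"),
        Slot("RISKCLASS", "<risk classification>", "ASKIT"),
    ])
    # In a real session `ask` would query the expert (cf. Figure 28-10);
    # a canned answer keeps this sketch self-contained.
    print(stockname_schema.acquire_instance("GEORGIA-PACIFIC", ask=lambda q: "NYSE 1945"))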
{... and with the new item clearly marked, the system can then do some very simple pattern matching to
decide what's going on.}
Now please tell me a few things about it...
{TEIRESIAS retrieves the schema for a stock, and "reads" it to find out what it needs to know about any
stock.}
On what markets is GEORGIA-PACIFIC traded?
The answer should be in the form of 1 or more of the following-
  <market name> <year first traded>
[Type 1 set to a line, then an empty line when done.]
++** NYSE 1945
++** PCE 1968
++**
{Each of the questions from the system is prompted by the attempt to fill in some component of the data
structure being built, according to the representation conventions indicated by the schema.}
What is the current risk classification of GEORGIA-PACIFIC?
The answer should be in the form of a
  <risk classification>
++** AMEX
FIGURE 28-10 continued
The preceding subsection showed one instance of using schemata for main-
tenance of the knowledge base. They help ensure that one change to the
knowledge base (adding a new instance of a known representation) will
not violate necessary relationships between data structures. The schemata
also support other capabilities. Besides being useful in maintaining the
knowledge base, they offer a convenient mechanism for organizing and
implementing data structure access and storage functions.
One of the ideas behind the design of the schemata is to use them as
points around which to organize knowledge. The information about struc-
ture and interrelationships described above, for instance, is stored this way.
In addition, access and storage information is also organized in this fash-
ion. By generalizing the advice concept slightly, it is possible to route all
data structure access and storage requests through the appropriate schema. That
is, code that needs to access a particular structure "sends" an access request,
and the structure "answers" by providing the requested item. 4 This offers
the well-known advantage of insulating the implementation of a data struc-
ture from its logical design. Code that refers only to the latter is far easier
to maintain in the face of modifications to data structure implementation.
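The idea can be sketched as simple message passing, in the spirit of the SMALLTALK and ACTORS work cited in the footnote; the class and method names are illustrative only, not TEIRESIAS's internal conventions.

class Schema:
    """Owns the storage conventions for one kind of data structure.
    Client code sends access/store requests; only the schema knows the layout."""
    def __init__(self, name):
        self.name = name
        self._instances = {}          # instance name -> property list (a dict)

    def store(self, instance, slot, value):
        self._instances.setdefault(instance, {})[slot] = value

    def access(self, instance, slot):
        # The "answer" to an access request; callers never touch _instances.
        return self._instances.get(instance, {}).get(slot)

stockname = Schema("STOCKNAME")
stockname.store("GEORGIA-PACIFIC", "TRADEDON", [("NYSE", 1945), ("PCE", 1968)])
print(stockname.access("GEORGIA-PACIFIC", "TRADEDON"))
# If the underlying layout changes (say, to a database), only Schema changes;
# code written against store/access is unaffected.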
28.3 Function Templates

Function Template
    SAME    (object attribute value)

FIGURE 28-11 Template for the predicate function SAME.
4 This was suggested by the perspective taken in work on SMALLTALK (Goldberg and Kay,
1976) and ACTORS (Hewitt et al., 1973). This style of writing programs has come to be
known as object-oriented programming.
28.4 Meta-Rules
28.4.1 Meta-Rules--Strategies to Guide the Use of Knowledge
would make a good investment, it retrieves all the rules that make a con-
clusion about that topic (i.e., they mention STOCKNAME in their action
clauses). It then invokes each one in turn, evaluating each premise to see
if the conditions specified have been met. The search is exhaustive because
the rules are inexact: even if one succeeds, it was deemed to be a wisely
conservative strategy to continue to collect all evidence about a subgoal.
The ability to use an exhaustive search is of course a luxury, and in
time the base of rules may grow large enough to make this infeasible. At
this point some choice would have to be made about which of the plausibly
useful rules should be invoked. Meta-rules were created to address this
problem. They are rules about object-level rules and provide a strategy for
pruning or reordering object-level rules before they are invoked.
METARULE001
IF:   1) the culture was not obtained from a sterile source, and
      2) there are rules which mention in their premise a previous organism which may be the same
         as the current organism
THEN: it is definite (1.0) that each of them is not going to be useful.

PREMISE: ($AND (NOTSAME CNTXT STERILESOURCE)
               (THEREARE OBJRULES (MENTIONS CNTXT PREMISE SAMEBUG) SET1))
ACTION:  (CONCLIST SET1 UTILITY NO TALLY 1.0)
METARULE002
IF:   1) the infection is a pelvic-abscess, and
      2) there are rules which mention in their premise enterobacteriaceae, and
      3) there are rules which mention in their premise gram-positive rods,
THEN: there is suggestive evidence (.4) that the former should be done before the latter.

PREMISE: ($AND (SAME CNTXT PELVIC-ABSCESS)
               (THEREARE OBJRULES (MENTIONS CNTXT PREMISE ENTEROBACTERIACEAE) SET1)
               (THEREARE OBJRULES (MENTIONS CNTXT PREMISE GRAMPOS-RODS) SET2))
ACTION:  (CONCLIST SET1 DOBEFORE SET2 TALLY .4)
METARULE003
IF:   1) there are rules which do not mention the current goal in their premise,
      2) there are rules which mention the current goal in their premise,
THEN: it is definite that the former should be done before the latter.

PREMISE: ($AND (THEREARE OBJRULES ($AND (DOESNTMENTION FREEVAR ACTION CURGOAL)) SET1)
               (THEREARE OBJRULES ($AND (MENTIONS FREEVAR PREMISE CURGOAL)) SET2))
ACTION:  (CONCLIST SET1 DOBEFORE SET2 1000)
METARULE004
IF:   1) there are rules which are relevant to positive cultures, and
      2) there are rules which are relevant to negative cultures,
THEN: it is definite that the former should be done before the latter.

PREMISE: ($AND (THEREARE OBJRULES ($AND (APPLIESTO FREEVAR POSCUL)) SET1)
               (THEREARE OBJRULES ($AND (APPLIESTO FREEVAR NEGCUL)) SET2))
ACTION:  (CONCLIST SET1 DOBEFORE SET2 1000)
of rules relevant to the current goal (call the list L). But before attempting
to invoke them, it first determines if there are any meta-rules relevant to
the goal.5 If so, these are invoked first. As a result of their actions, we may
obtain a number of conclusions about the likely utility and relative ordering
of the rules in L. These conclusions are used to reorder or shorten L, and
the revised list of rules is then used. Viewed in tree-search terms, the
current implementation of meta-rules can either prune the search space
or reorder the branches of the tree.
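In outline, the mechanism might be rendered as follows; the rule and meta-rule representations here are simplified inventions, far removed from the Lisp forms above, but the prune-then-reorder step is the one described in the text.

def apply_meta_rules(goal, rule_list, meta_rules, context):
    """Let each applicable meta-rule mark object-level rules as useless or
    impose a partial ordering, then prune and reorder the list."""
    useless = set()
    before = []                       # list of (earlier_set, later_set) pairs
    for mr in meta_rules:
        if mr["applies"](goal, context):
            concl = mr["conclude"](rule_list, context)
            useless |= concl.get("useless", set())
            before += concl.get("do_before", [])

    pruned = [r for r in rule_list if r not in useless]

    def rank(rule):
        # Rules named in an "earlier" set sort ahead of those in a "later" set.
        score = 0
        for earlier, later in before:
            if rule in earlier:
                score -= 1
            if rule in later:
                score += 1
        return score

    return sorted(pruned, key=rank)

# Hypothetical example in the spirit of METARULE002.
meta = [{
    "applies": lambda goal, ctx: ctx.get("infection") == "pelvic-abscess",
    "conclude": lambda rules, ctx: {
        "do_before": [({r for r in rules if "enterobacteriaceae" in r},
                       {r for r in rules if "grampos-rods" in r})]},
}]
rules = ["rule-grampos-rods", "rule-enterobacteriaceae", "rule-other"]
print(apply_meta_rules("IDENTITY", rules, meta, {"infection": "pelvic-abscess"}))
# ['rule-enterobacteriaceae', 'rule-other', 'rule-grampos-rods']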
28.5 Conclusions
We have reviewed four examples of meta-level knowledge and demon-
strated their application to the task of building and using large stores of
domain-specific knowledge. This has shown that supplying the system
with a store of information about its representations makes possible a num-
ber of useful capabilities. For example, by describing the structure of its
representations (schemata, templates), we make possible a form of transfer
of expertise, as well as a number of facilities for knowledge base mainte-
nance. By supplying strategic information (meta-rules), we make possible
a finer degree of control over use of knowledge in the system. And by
giving the system the ability to derive empirical generalizations about its
knowledge (rule models), we make possible a number of useful abilities
that aid in knowledge transfer.
The examples reviewed above illustrate a number of general ideas
about knowledge representation and use that may prove useful in building
large programs. We have, first, the notion that knowledge in programs
should be made explicit and accessible. Use of production rules to encode
29
Extensions to Rules for Explanation and Tutoring

William J. Clancey
ent things. Also, some rules are present mostly to control the invocation
of others. The uniformity of the representation obscures these various
functions of clauses and rules. In looking beyond the surface of the rule
representation to make explicit the intent of the rule authors, this paper
has a purpose similar to Woods' "What's in a Link?" (1975) and Brachman's
"What's in a Concept?" (1976). We ask, "What's in a Rule?"
In building GUIDON, we thought that we were simply being "appli-
cations engineers" by making use of MYCIN's explanation facility for a
tutorial setting. As noted in Chapter 26, it was surprising to find out how
little the explanation facility could accomplish for a student. Without a
crisp characterization of what we expected an explanation to convey, the
program was of questionable tutorial value. On the positive side, the study
of these shortcomings led to a radical change in our conception of MY-
CIN's rules and supplied a new epistemological framework for building
expert systems.
In this chapter we provide a review of MYCIN's explanatory capability
and an overview of an epistemological framework for enhancing that ca-
pability. The following two sections examine in detail the problems of jus-
tifying a rule and explaining an approach, thereby elucidating the support and
strategic aspects of the epistemological framework. Implications for per-
formance of a consultation system and modifiability are considered briefly.
Finally, in the last section, the framework is used to analyze other expert
systems.
Figure 29-2 illustrates how, in the questioning session after the consulta-
tion, one can inquire further about the program's intermediate reasoning
steps, including why it didn't ask about something. These are the expla-
nation capabilities that we sought to exploit in a teaching program.
MYCIN's explanations are entirely in terms of its rules and goals. The
question WHY means "Why do you want this information?" or "How is
this information useful?" and is translated internally as "In what rule does
this goal appear, and what goal does the rule conclude about?" Davis, who
developed the explanation facility, pointed out that MYCIN did not have
the knowledge to respond to other interpretations of a WHY question
(Davis, 1976). He mentioned specifically the lack of rule justifications and
planning knowledge addressed in this chapter.
In order to illustrate other meanings for the question WHY in
MYCIN, we illustrate the rule set as a network of goals, rules, and
hypotheses in Figure 29-3. At the top level are all of the system's goals that
it might want to pursue to solve a problem (diagnostic and therapeutic
decisions). Examples of goals, stated as questions to answer, are "What is
the shape of the organism?" and "What organism is causing the meningi-
tis?" At the second level are hypotheses or possible choices for each of the
goals. Examples of hypotheses are "The organism is a rod." and "E. coli is
causing the meningitis." At the third level are the rules that support each
hypothesis. At the fourth level appear the premises of these rules, specific
hypotheses that must be believed for the rule to apply. For example, for
Rule 543 to apply (shown in Figure 29-1) it must be the case that the
infection is meningitis, that the meningitis was caused by bacteria, that the
patient is receiving steroids, and so on.
A key aspect of MYCIN's interpreter is that, when confronted with a
hypothesis in a rule premise that it needs to confirm, it considers all related
hypotheses by pursuing the more general goal. For example, attempting
to apply Rule 543, the program will consider all rules that conclude about
the infection, rather than just those that conclude that the infection is
meningitis. Similarly, it will consider all rules that conclude about the kind
of meningitis (viral, fungal, TB, or bacterial), rather than just those that
hypothesize that the meningitis is bacterial. 2 These new goals deriving
from rules can now be seen conceptually as level 1 goals, and the process
recurs.
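The behavior can be illustrated with a toy backward chainer in Python: when a premise clause needs a hypothesis about some parameter, the interpreter pursues the general goal and applies every rule concluding about that parameter. This is only a sketch; certainty factors, the ask-the-user step, and MYCIN's actual control structure are omitted, and the miniature knowledge base is invented.

def findout(goal, kb, facts, traced=None):
    """Pursue a goal (a parameter) by applying *all* rules that conclude
    about it, accumulating every concluded value."""
    traced = traced or set()
    if goal in traced:                       # already pursued; final decision made
        return facts.get(goal, set())
    traced.add(goal)
    for rule in kb:
        if rule["concludes"][0] != goal:
            continue
        # Evaluate each premise clause (parameter, required value).
        if all(value in findout(param, kb, facts, traced)
               for param, value in rule["premise"]):
            facts.setdefault(goal, set()).add(rule["concludes"][1])
    return facts.get(goal, set())

# Hypothetical miniature knowledge base.
kb = [
    {"premise": [("CSF-FINDINGS", "abnormal")], "concludes": ("INFECTION", "meningitis")},
    {"premise": [("INFECTION", "meningitis"), ("STEROIDS", "yes")],
     "concludes": ("ORGANISM", "e.coli")},
]
facts = {"CSF-FINDINGS": {"abnormal"}, "STEROIDS": {"yes"}}
print(findout("ORGANISM", kb, facts))        # {'e.coli'}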
The links in Figure 29-3 and their ordering are points of flexibility in
the rule representation. For example, the author of a rule defines each
goal and its specific hypotheses (levels 1 and 2 and 4 and 5). Less trivially,
it is the author's choice to define rules that link hypotheses to one another.
For example, rules on level 3 link levels 2 and 4. We call the rationale
behind this link the justification of the rule. GUIDON cannot teach rule
justifications because they are not represented in MYCIN. Section 29.3
examines the nature of rule justifications and how a tutoring system can
provide them.
2 This is not inefficient, given the program's exhaustive search strategy and the fact that the
other hypotheses will be referenced by other rules. Note also that some hypotheses, such as
"the patient is receiving steroids," are not generalized but are represented as goals directly.
Whether or not a hypothesis is represented as a yes/no parameter or as a "value" of a multi-
valued parameter (such as "kind of meningitis") is a decision of the author of the rule, deriving
from a pattern of hypotheses that he or she wishes to collapse for clarity into a more general
goal. By this process of abstraction, a single multi-valued parameter dealing with kinds of
surgery would replace individual yes/no parameters that specified "cardiac surgery," "neu-
rosurgery," etc. These organizational decisions have no bearing on system performance, so
the knowledge base is somewhat inconsistent in how these choices are made.
** DID YOU CONSIDER WHETHER THE PATIENT WAS RECEIVING STEROIDS?
RULE543 could have used:
    whether J.Smith has received corticosteroids.
Clause 4 of RULE543 ["the patient has received corticosteroids"]
was already known to be false.
3 Meta-rules could have been used for ordering rules, as described in Chapter 28. The present
chapter is a rethinking of the whole question.
[Figure 29-3: the rule set viewed as a network of goals, rules, and hypotheses.]
RULE543
IF:   1) The infection which requires therapy is meningitis,
      2) Only circumstantial evidence is available for this case,
      3) The type of the infection is bacterial,
      4) The patient is receiving corticosteroids,
THEN: There is evidence that the organisms which might be causing the infection are
      e.coli (.4), klebsiella-pneumoniae (.2), or pseudomonas-aeruginosa

FIGURE 29-4 The steroids rule.
Figure 29-5 shows how this diagnostic heuristic is justified and incor-
porated in a problem-solving approach by relating it to strategic, structural,
and support knowledge. Recalling Section 29.1, we use the term strategy to
refer to a plan by which goals and hypotheses are ordered in problem
solving. A decision to determine "cause of the infection" before "therapy
to administer" is a strategic decision. Similarly, it is a strategic decision to
pursue the hypothesis "E. coli is causing meningitis" before "Cryptococcus is
causing meningitis." And recalling an earlier example, deliberately decid-
ing to ask the user about steroids before alcoholism would be a strategic
decision. These decisions all lie above the plane of goals and hypotheses,
1 The English form of rules stated in this paper has been simplified for readability. Sometimes
clauses are omitted. Medical examples are for purposes of illustration only.
[Figure 29-5: the steroids rule related to strategic, structural, and support knowledge --
  (STRATEGY)        ESTABLISH HYPOTHESIS SPACE: CONSIDER DIFFERENTIAL-BROADENING FACTORS
  (RULE MODEL)      IN BACTERIAL MENINGITIS, COMPROMISED HOST RISK FACTORS SUGGEST UNUSUAL ORGANISMS
  (STRUCTURE)       ANY DISORDER -- INFECTION -- MENINGITIS (ACUTE, CHRONIC); COMPROMISED HOST
  (INFERENCE RULE)  if STEROIDS then GRAM-NEGATIVE ROD ORGS
  (SUPPORT)         STEROIDS IMPAIR IMMUNO-RESPONSE, MAKING PATIENT SUSCEPTIBLE TO
                    INFECTION BY ENTEROBACTERIACEAE, NORMALLY FOUND IN THE BODY]
classify causes of disease into common and unusual causes, for example,
of bacterial meningitis. These concepts provide a handle by which a strategy
can be applied, a means of referencing the domain-specific knowledge. For
example, a strategy might specify considering common causes of a disease;
the structural knowledge about bacterial meningitis allows this strategy to
be instantiated in that context. This conception of structural knowledge
follows directly from Davis' technique of content-directed invocation of knowl-
edge sources (see Chapter 28). A handle is a means of indirect reference
and is the key to abstracting reasoning in domain-independent terms. The
discussion here elaborates on the nature of handles and their role in the
explanation of reasoning.
The structural knowledge we will be considering is used to index two
kinds of" hypotheses: problem features, which describe the problem at hand
(for example, whether or not the patient is receiving steroids is a problem
feature); and diagnoses, which characterize the cause of the observed prob-
lem features. For example, acute meningitis is a diagnosis. In general,
problem features appear in the premises of diagnostic rules, and diagnoses
appear in the conclusions. Thus organizations of problem features and
diagnoses provide two ways of indexing rule associations: one can use a
strategy that brings certain diagnoses to mind and consider rules that sup-
port those hypotheses; or one can use a strategy that brings certain prob-
lem features to mind, gather that information, and draw conclusions (apply
rules) in a data-directed way.
Figure 29-5 shows how a rule model, or generalized rule, 5 as a form of
structural knowledge, enables either data-directed consideration of the ste-
roids rule or hypothesis-directed consideration. Illustrated are partial hier-
archies of problem features (compromised host factors) and diagnoses
(kinds of infections, meningitis, etc.)--typical forms of structural knowl-
edge. The specific organisms of the steroids rule are replaced by the set
"gram-negative rods," a key hierarchical concept we use for understanding
this rule.
Finally, the justification of the steroids rule, a link between the problem
feature hypothesis "patient is receiving steroids" and the diagnostic hy-
pothesis "gram-negative rod organisms are causing acute bacterial infec-
tious meningitis," is based on a causal argument about steroids impairing
the body's ability to control organisms that normally reside in the body.
While this support knowledge is characteristically low-level or narrow in con-
trast with the strategical justification for considering compromised host
risk factors, it still makes interesting contact with structural terms, such as
the mention of Enterobacteriaceae, which are kinds of gram-negative rod
organisms. In the next section, we will consider the nature of rule justifi-
cations in more detail, illustrating how structural knowledge enables us to
make sense of a rule by tying it to the underlying causal process.
29.3 Explaining a Rule
Here we consider the logical bases for rules: what kinds of arguments
justify the rules, and what is their relation to a mechanistic model of the
domain? We use the terms "explain" and "justify" synonymously, although
the sense of "making clear what is not understood" (explain) is intended
more than "vindicating, showing to be right or lawful" (justify).
"If the patient is less than 8 years old, dont prescribe tetracycline."
This rule simply states one of the things that MYCINneeds to know to
properly prescribe drugs for youngsters. The rule does not mention the
underlying causal process (chelation, or drug deposition in developing
bones) and the social ramifications (blackened permanent teeth) on which
it is based. From this example, it should be clear that the justifications of
MYCIN's rules lie outside of the rule base. In other words, the record of
inference steps that ties premise to action has been left out. A few questions
need to be raised here: Did the expert really leave out steps of reasoning?
What is a justification for? And what is a good justification?
Frequently, we refer to rules like MYCIN's as "compiled knowledge."
However, when we ask physicians to justify rules that they believe and
follow, they very often can't explain why the rules are correct. Or their
rationalizations are so slow in coming and so tentative that it is clear they
are not articulating reasoning steps that are consciously followed. Leaps
from data to conclusion are justified because the intermediate steps (like
the process of chelation and the social ramifications) generally remain the
same from problem to problem. There is no need to step through this
knowledge--to express it conditionally in rules. Thus, for the most part,
MYCIN's rules are not compiled in the sense that they represent a delib-
erate composition of reasoning steps by the rule authors. They are com-
piled in the sense that they are optimizations that leave out unnecessary
steps--evolved patterns of reasoning that cope with the demands of or-
dinary problems.
If an expert does not think about the reasoning steps that justify a
rule, why does a student need to be told about them? One simple reason
[Figure: the expanded tetracycline rule --
  tetracycline in youngster -> chelation of the drug in growing bones
  -> teeth discoloration -> undesirable body change -> don't administer tetracycline]

[Figure 29-7: types of undesirable body changes (teeth discoloration, photosensitivity, diarrhea,
  nausea, ...) and the drugs that cause them (tetracycline, drug x, drugs y, z, ...).]
standing one) does not require that every detail of causality be considered.
Instead, a relatively high level of explanation is generally satisfying--most
readers probably feel satisfied by the explanation that tetracycline causes
teeth discoloration. This level of satisfaction has something to do with the
student's prior knowledge.
For an explanation to be satisfying, it must make contact with already
known concepts. We can characterize explanations by studying the kinds
of intermediate concepts they use. For example, it is significant that most
contraindication rules, reasons for not giving antibiotics, refer to "unde-
sirable body changes." This pattern is illustrated hierarchically in Figure
29-7. The first level gives types of undesirable changes; the second level
gives causes of these types of changes. Notice that this figure contains the
last step of" the expanded tetracycline rule and a leap from tetracycline to
this step. The pattern connecting drugs to the idea of undesirable body
changes forms the basis of" an expectation for explanations: we will be
satisfied if" a particular explanation connects to this pattern. In other words,
given an effect that we can interpret as an undesirable body change, we
will understand why a drug causing that effect should not be given. We
might want to knowhow the effect occurs, but here again, we will rest easy
on islands of familiarity, just as we dont feel compelled to ask why people
dont want black teeth.
To summarize, key concepts in rule explanations are abstractions that
connect to a pattern of reasoning we have encountered before. This sug-
gests that one way to explain a rule, to make contact with a familiar rea-
soning pattern, is to generalize the rule. We can see this more clearly from
the viewpoint of diagnosis, which makes rich use of hierarchical abstrac-
tions.
Consider the following fragment from a rule we call the leukopenia
rule:
How can we explain this rule? First, we generalize the rule, as shown
in Figure 29-8. The premise concepts in the rules on the left-hand side of
levels 1 through 3 are problem features (cf. Section 29.2), organized hier-
archically by different kinds of relations. Generally, a physician speaks
loosely about the connections--referring to leukopenia both as a cause of
immunosuppression as well as a kind of immunosuppression--probably
because the various causes are thought of hierarchically.
"caus/
pregnancy
I "subtype"
immunosu pp ression
T
"subset"
"caUS/steroids I "evidence"
leukopenia
I"is a"
E.coli, Pseudomonas,
and Klebsiella
~ ~mponentof"
CBC Oata
Here the relation is a social fact; if the patient is not an adult, we assume
that he is not an alcoholic. The third relation we observe is a subtype, as
in
"If... the patient has undergone surgery and the patient has
undergone neurosurgery, then ... "
All screening relations can be expressed as rules, and some are, such as
"If" the patient has not undergone surgery, then the patient
has not undergone cardiac surgery."
restate the rule on a higher level. We point out that a low WBC indicates
leukopenia, which is a form of immunosuppression, thus tying the rule to
the familiar pattern that implicates gram-negative rods and Enterobacteri-
aceae. This is directly analogous to pointing out that tetracycline causes
teeth discoloration, which is a form of undesirable body change, suggesting
that the drug should not be given.
By re-representing Figure 29-8 linearly, we see that it is an expansion
of the original rule:
The expansion marches up the problem feature hierarchy and then back
down the hierarchy of diagnoses. The links of this expansion involve caus-
ality composed with identification, subtype, and subset relations. By the
hierarchical relationships, a rule on one level "explains" the rule below it.
For example, the rule on level 3 provides the detail that links immuno-
suppression to the gram-negative rods. By generalizing, we have made a
connection to familiar concepts.
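The "march up, then back down" expansion can be sketched directly from two small hierarchies; the relation labels and hierarchy entries below are illustrative stand-ins for the structural knowledge discussed in Section 29.2, not the program's actual data.

def explanation_chain(key_factor, conclusion, feature_up, diagnosis_up):
    """Expand a rule: climb the problem-feature hierarchy from its key factor,
    then descend the diagnosis hierarchy to its specific conclusions."""
    up, node = [], key_factor
    while node in feature_up:                      # march up the problem features
        rel, parent = feature_up[node]
        up.append((node, rel, parent))
        node = parent
    down, d = [], conclusion
    while d in diagnosis_up:                       # collect the downward path
        rel, parent = diagnosis_up[d]
        down.append((parent, "includes", d))
        d = parent
    bridge = [(node, "implicates", d)]             # the familiar pattern in the middle
    return up + bridge + list(reversed(down))

for a, rel, b in explanation_chain(
        "low WBC",
        "E.coli, Pseudomonas, and Klebsiella",
        {"low WBC": ("is evidence for", "leukopenia"),
         "leukopenia": ("is a form of", "immunosuppression")},
        {"E.coli, Pseudomonas, and Klebsiella": ("kind of", "Enterobacteriaceae"),
         "Enterobacteriaceae": ("kind of", "gram-negative rods")}):
    print(f"{a} --{rel}--> {b}")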
Tabular rules provide an interesting special case. The CSF protein rule
shown in Figure 29-9 appears to be quite formidable. Graphing this rule
as shown in Figure 29-10, we find a relatively simple relation that an expert
states as "If the protein value is less than 40, I think of viral infections; if
it is more than 100, I think of bacterial, fungal, or TB." This is the first
level of generalization, the principle that is implicit in the rule. The second
level elicited from the expert is "If the protein value is low, I think of an
RULE500 (The CSF Protein Rule)
IF:   1) The infection which requires therapy is meningitis,
      2) A lumbar puncture has been performed on the patient, and
      3) The CSF protein is known
THEN: The type of the infection is as follows:
      If the CSF protein is:
      a) less than 41 then: not bacterial (.5), viral (.7), not fungal (.6), not tb
      b) between 41 and 100 then: bacterial (.1), viral (.4), fungal
      c) between 100 and 200 then: bacterial (.3), fungal (.3), tb
      d) between 200 and 300 then: bacterial (.4), not viral (.5), fungal (.4), tb
      e) greater than or equal to 300 then: bacterial (.4), not viral (.6), fungal (.4), tb

FIGURE 29-9 The CSF protein rule.
[Figure 29-10: graph of the CSF protein rule.]
In general, causal rules argue that some kind of process has occurred. We
expect a top-level explanation of a causal rule to relate the premise of the
rule to our most general idea of the process being explained. This provides
a constraint for how the rule should be generalized, the subject of the next
section.
Words in italics in the first sentence constitute the pattern of "portal and
passage." Wefind that the premise of a rule generally supplies evidence
for only a single step of the causal process; the other steps must be inferred
by default. For example, the alcoholic rule argues for passage of the Diplo-
coccus to the lungs. The person reading this explanation must know that
Diplococcus is normally found in the mouth and throat of any person and
that it proceeds from the lungs to the meninges by the blood. The organism
finds conditions favorable for growth because the patient is compromised,
as stated in the explanation. In contrast, the leukopenia rule only argues
for the patient being a compromised host, so the organisms are the default
organisms, those already in the body, which can proceed to the site of
infection.7
These explanations say which steps are enabled by the data. They place
the patient on the path of an infection, so to speak, and leave it to the
understander to fill in the other steps with knowledge of how the body
normally works. This is why physicians generally refer to the premise data
as "predisposing factors." To be understood, a rule must be related to the
prior steps in a causal process, the general concepts that explain many
rules.
The process of explanation is a bit more complicated in that causal
relations may exist between clauses in the rule. We have already seen that
one clause may screen another on the basis of world facts, multicomponent
test relations, and the subtype relation. The program described here knows
these relations and "subtracts off" screening clauses from the rule. More-
over, as discussed in Section 29.4, some clauses describe the context in
which the rule applies. These, too, are made explicit for the explanation
program and subtracted off. In the vast majority of MYCIN rules, only
one premise clause remains, and this is related to the process of infection
in the way described above.
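A sketch of this "subtracting off" step, assuming the clauses of a rule have already been tagged as screening, context, or neither; the tagging and the clause wording below are illustrative, not the program's internal form.

def key_clauses(rule, clause_kind):
    """Remove screening and context clauses; what remains should be the
    key factor(s) the rule is really about."""
    return [c for c in rule["premise"]
            if clause_kind.get(c) not in ("screening", "context")]

# Hypothetical rendering of the leukopenia rule's premise.
leukopenia_rule = {"premise": [
    "the infection is meningitis",              # context clause
    "the type of meningitis is bacterial",      # context clause
    "the white blood count (WBC) is low",       # key factor
]}
kinds = {
    "the infection is meningitis": "context",
    "the type of meningitis is bacterial": "context",
}
print(key_clauses(leukopenia_rule, kinds))      # ['the white blood count (WBC) is low']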
7 As physicians would expect, alcoholism also causes infection by gram-negative rods and
Enterobacteriaceae. We have omitted these for simplicity. However, this example illustrates that
a MYCIN rule can have multiple conclusions reached by different causal paths.
When more than one clause remains after the screening and contex-
tual clauses have been removed, our study shows that a causal connection
exists between the remaining clauses. We can always isolate one piece of
evidence that the rule is about (for example, WBC in the leukopenia rule);
we call this the key factor of the rule. We call the remaining clauses restriction
clauses.8 There are three kinds of relations between a restriction clause and
a key factor:
[Figure 29-12: COVERFOR rules -- PNEUMOCOCCUS: Q34 ALCOHOLIC (Rule 536), Q38 SPLENECTOMY (Rule 559);
  H.INFLUENZA: Q35 NOSOCOMIAL (Rule 545), Q36 EPIGLOTTITIS, Q37 OTITIS-MEDIA (Rule 395).]
RULE092 (The Goal Rule)
IF:   1) Gather information about cultures taken from the patient and therapy he is receiving,
      2) Determine if the organisms growing on cultures require therapy,
      3) Consider circumstantial evidence for additional organisms that therapy should cover
THEN: Determine the best therapy recommendation
RULE535 (The Alcoholic Rule)
IF:   1) The infection which requires therapy is meningitis,
      2) Only circumstantial evidence is available for this case,
      3) The type of meningitis is bacterial,
      4) The age of the patient is greater than 17 years, and
      5) The patient is an alcoholic,
THEN: There is evidence that the organisms which might be causing the infection are
      diplococcus-pneumoniae (.3) or e.coli (.2)

FIGURE 29-13 The goal rule and the alcoholic rule.
plan at this level; the program is simply applying rules (methods) exhaus-
tively. This lack of similarity to human reasoning severely limits the use-
fulness of the system for teaching problem solving.
However, MYCIN does have a problem-solving strategy above the level
of rule application, namely the control knowledge that causes it to pursue
a goal at a certain point in the diagnosis. We can see this by examining
how rules interact in backward chaining. Figure 29-13 shows the goal rule
and a rule that it indirectly invokes. In order to evaluate the third clause
of the goal rule, MYCIN tries each of the COVERFOR rules; the alcoholic
rule is one of these (see also Figure 29-12). We call the goal rule a task rule
to distinguish it from inference rules. Clause order counts here; this is
more a procedure than a logical conjunction. The first three clauses of the
alcoholic rule, the context clauses, also control the order in which goals are
pursued, just as is true for a task rule. We can represent this hidden struc-
ture of goals by a tree which we call the inference structure of the rule base
(produced by "hanging" the rule set from the goal rule). Figure 29-14
illustrates part of MYCIN's inference structure.
The program's strategy comes to light when we list these goals in the
order in which the depth-first interpreter makes a final decision about
them. For example, since at least one rule that concludes "significant" (goal
4 in Figure 29-14) mentions "contaminant" (goal 3), MYCIN applies all of
the "contaminant" rules before making a final decision about "significant."
Analyzing the entire rule set in a similar way gives us the ordering (shown
in Figure 29-14):
[Figure 29-14: part of MYCIN's inference structure, "hung" from the goal rule -- REGIMEN (the main
  goal), TREATFOR and COVERFOR via rule 92, and below them goals such as INFECTION?, WHAT-INF?,
  CONTAMINANT?, SIGNIFICANT?, IDENTITY?, MENINGITIS?, and BACTERIAL?, numbered in the order listed below.]
1. Is there an infection?
2. Is it bacteremia, cystitis, or meningitis?
3. Are there any contaminated cultures?
4. Are there any good cultures with significant growth?
5. Is the organism identity known?
6. Is there an infection? (already done in Step 1)
7. Does the patient have meningitis? (already done in Step 2)
8. Is it bacterial?
9. Are there specific bacteria to cover for?
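One way to see where such an ordering comes from is to traverse a goal dependency graph depth-first, recording when a final decision is made about each goal. The Python sketch below uses a hypothetical fragment of the dependency structure, not the actual rule set, so the printed order only approximates the list above.

def decision_order(goal, depends_on, done=None, order=None):
    """Post-order (depth-first) traversal: a goal is finally decided only
    after every goal its rules mention has been decided."""
    done = set() if done is None else done
    order = [] if order is None else order
    if goal in done:
        return order
    done.add(goal)
    for sub in depends_on.get(goal, []):
        decision_order(sub, depends_on, done, order)
    order.append(goal)
    return order

# Hypothetical fragment of the dependency structure behind Figure 29-14.
depends_on = {
    "REGIMEN":      ["TREATFOR", "COVERFOR"],
    "TREATFOR":     ["WHAT-INF?", "SIGNIFICANT?", "IDENTITY?"],
    "WHAT-INF?":    ["INFECTION?"],
    "SIGNIFICANT?": ["CONTAMINANT?"],
    "COVERFOR":     ["MENINGITIS?", "BACTERIAL?"],
    "MENINGITIS?":  ["INFECTION?"],
}
print(decision_order("REGIMEN", depends_on))
# ['INFECTION?', 'WHAT-INF?', 'CONTAMINANT?', 'SIGNIFICANT?', 'IDENTITY?',
#  'TREATFOR', 'MENINGITIS?', 'BACTERIAL?', 'COVERFOR', 'REGIMEN']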
META-RULE002
IF:   1) The infection is pelvic-abscess, and
      2) There are rules which mention in their premise enterobacteriaceae, and
      3) There are rules which mention in their premise gram-positive rods,
THEN: There is suggestive evidence (.4) that the former should be done before the latter

FIGURE 29-15 A MYCIN meta-rule.
space and the nature of the search strategy to students. This means that
we need to represent explicitly the fact that the diagnosis space is hierar-
chical and to represent strategies in a domain-independent form. If a strat-
egy is not in domain-independent form, it can be taught by examples, but
not explained.
RULE086
IF:   1) The aerobicity of the organism is not known, and
      2) The culture was obtained more than 2 days ago,
THEN: There is evidence that the aerobicity of the organism is obligate-aerob (.5) or facultative (.5)

FIGURE 29-16 The aerobicity rule.
This rule is tried only after all of the non-self-referencing rules have
been applied. The cumulative conclusion of the non-self-referencing rules
is held aside, then the self-referencing rules are tried, using in each rule
the tentative conclusion. Thus the first clause of Rule 86 will be true only
if none of the standard rules made a conclusion. The effect is to reconsider
a tentative conclusion. When the original conclusion is changed by the self-
referencing rules, this is a form of nonmonotonic reasoning (Winograd,
1980). We can restate MYCIN's self-referencing rules in domain-indepen-
dent terms:
If nothing has been observed, consider situations that have no visible manifesta-
tions. For example, the aerobicity rule: "If no organism is growing in the
culture, it may be an organism that takes a long time to grow (obligate-
aerob and facultative organisms)."
The self-referencing mechanism makes it possible to state this rule with-
out requiring a long premise that is logically exclusive from the remain-
der of the rule set.
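The two-pass mechanism can be sketched schematically; the rule format below is invented for illustration and omits certainty factors.

def conclude(standard_rules, self_ref_rules, evidence):
    """Two-pass evaluation: standard rules first, then self-referencing
    rules applied to the tentative conclusion (nonmonotonic revision)."""
    tentative = set()
    for rule in standard_rules:
        if rule["premise"](evidence, None):
            tentative |= rule["conclusion"]

    final = set(tentative)
    for rule in self_ref_rules:
        # Each self-referencing rule may consult the tentative conclusion.
        if rule["premise"](evidence, tentative):
            final |= rule["conclusion"]
    return final

# Hypothetical rendering of Rule 86 (the aerobicity rule).
standard = [
    {"premise": lambda ev, _: ev.get("growth-observed", False),
     "conclusion": {"aerobicity-from-growth"}},
]
self_ref = [
    {"premise": lambda ev, tentative: not tentative and ev["culture-age-days"] > 2,
     "conclusion": {"obligate-aerob", "facultative"}},
]
print(conclude(standard, self_ref, {"growth-observed": False, "culture-age-days": 3}))
# {'obligate-aerob', 'facultative'} -- concluded only because nothing else was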
To illustrate further the idea of the strategy, structure, and support frame-
work and to demonstrate its usefulness for explaining how a program
reasons, several knowledge-based programs are described below in terms
of the framework. For generality, we will call inference associations such
as MYCIN's rules knowledge sources (KSs). We will not be concerned here
with the representational notation used in a program, whether it be frames,
production rules, or something else. Instead, we are trying to establish an
understanding of the knowledge contained in the system: what kinds of
inferences are made at the KS level, how these KSs are structured explicitly
in the system, and how this structure is used by strategies for invoking
KSs. This is described in Table 29-1.
1. "Consider KSs that would demonstrate a prior cause for the best
hypothesis."
2. "Dont consider KSs that are subtypes of ruled-out hypotheses."
3. "Consider KSs that abstract known data."
4. "Consider KSs that distinguish between two competing kinds of
processes."
5. "Consider KSs relevant to the current problem domain."
29.6.4 Summary
29.7 Conclusions
Evaluating Performance

30
The Problem of Evaluation
The design inherently assumed that the opinions of recognized ex-
perts provided the "gold standard" against which the program's perfor-
mance should be assessed. For reasons outlined below, other criteria (such
as the actual organisms isolated or the patient's response to therapy) did
not seem appropriate. Despite the encouraging results of this experiment
(hereafter referred to as Study 1), several problems were discovered during
its execution:
The evaluators complained that they could not get an adequate "feel"
for the patients by merely reading a typescript of the questions MYCIN
asked (and they therefore wondered how the program could do so).
Because the evaluators knew they were assessing a computer program,
there was evidence that they were using different (and perhaps more
stringent) criteria for assessing its performance than they would use in
assessing the recommendations of a human consultant.
MYCINs"approval rating" of just under 75% was encouraging but in-
tuitively seemed to be too low for a truly expert program; yet we had
no idea how high a rating was realistically achievable using the gold
standard of" approval by experts;
The time required from evaluators was seen to be a major concern; the
faculty and fellows agreed to help with the study largely out of curiosity,
but they were all busy with other activities and some of them balked at
the time required to thoroughly consider the typescripts and treatment
plans for all 15 cases.
Questions were raised regarding the validity of a study in which the
evaluators were drawn from the same environment in which the pro-
gram was developed; because of regional differences in prescribing hab-
its and antimicrobial sensitivity patterns, some critics urged a study de-
sign in which MYCIN's performance in settings other than Stanford
could be assessed.
forms were designed to allow evaluators to fill them out largely by using
checklists, the time required to complete them was still lengthy if the phy-
sician was careful in the work, and there were once again long delays in
getting the evaluation forms back for analysis. In fact, despite the "moti-
vating honorarium," some of the evaluators took more than 12 months to
return the booklets.
Although the MYCIN knowledge base for bacteremia had been con-
siderably refined since Study 1, we were discouraged to find that the results
of Study 2 once again showed about 75% overall approval of the program's
advice. It was clear that we needed to devise a study design that would
"blind" the evaluators to knowledge of which advice was generated by
MYCIN and that would simultaneously allow us to determine the overall
approval ratings that could be achieved by experts in the field. We began
to wonder if the 75% figure might not be an upper limit in light of the
controversy and stylistic differences among experts.
As a result, our meningitis study (hereafter referred to as Study 3)
used a greatly streamlined design to encourage rapid turnaround in eval-
uation forms while keeping evaluators unaware of what advice was pro-
posed by MYCIN (as opposed to other prescribers from Stanford). Study
3 is the subject of Chapter 31, and the reader will note that it reflects many
of the lessons from the first two studies cited above. With the improved
design we were able to demonstrate formally that MYCIN's advice was
comparable to that of infectious disease experts and that 75% is in fact
better than the degree of agreement that could generally be achieved by
Stanford faculty being assessed under the same criteria.
In the next section we summarize some guidelines derived from our
experience. We believe they are appropriate when designing experiments
for the evaluation of expert systems. Then, in the final section of this
chapter, we look at some previously unpublished analyses of the Study 3
data. These demonstrate additional lessons that can be drawn and on
which future evaluative experiments may build.
Decisions/Advice/Performance
Correct Reasoning
Not all designers of expert systems are concerned about whether their
program reaches decisions in a "correct" way, so long as the advice that it
offers is appropriate. As we have indicated, for example, MYCIN was not
intended to simulate human problem solving in any formal way. However,
there is an increasing realization that expert-level performance may require
heightened attention to the mechanisms by which human experts actually
solve the problems for which the expert systems are being built. It is with
regard to this issue that the interface between knowledge engineering and
psychology is the greatest, and, depending on the motivation of the system
designers and the eventual users of the expert program, some attention to
the mechanisms of reasoning that the program uses may be appropriate
during the evaluation process. The issue of deciding whether or not the
reasoning used by the program is "correct" will be discussed further below.
the choice of" words used in the questions and responses generated by
the program;
the ability of the expert system to explain the basis for its decisions and
to customize those explanations appropriately for the level of expertise
of the user;
the ability of the system to assist the user when he or she is confused or
wants help; and
the ability of the expert system to give advice and to educate the user in
a congenial fashion so that the frequently cited psychological barriers to
computer use are avoided.
It is likely that issues such as these are as important to the ultimate success
of an expert system as is the quality of its advice. For this reason such issues
also warrant formal evaluation.
they are motivated to learn. For that reason we have seen the development
of light pen interfaces, touch screens, and specialized keypads, any of
which may be adequate to facilitate simple interactions between users and
systems. Details of the hardware interface often influence the design of the
system software as well. The intricacies of this interaction cannot be ig-
nored in system evaluation, nor can the mundane details of the user's
reaction to the terminal interface. Once again, it can be difficult to design
evaluations in which dissatisfaction with the terminal interface is isolated
as a variable, independent of discourse adequacy or decision-making per-
formance. As we point out below, one purpose of staged evaluations is to
eliminate some variables from consideration during the evolution of the
system.
Efficiency
Cost Effectiveness
The evaluation process is a continual one that should begin at the time of
system design, extend in an informal fashion through the early stages of
development, and become increasingly formal as a developing system
moves toward real-world implementation. It is useful to cite nine stages of
system development,2 which summarize the evolution of an expert system.
They are itemized in Table 30-1 and discussed in some detail below.
2 These implementation steps are based on a discussion of expert systems in Shortliffe and
Davis (1975).
TABLE 30-1 Steps in the Implementation of an Expert System

1. Top-level design with definition of long-range goals
2. First version prototype, showing feasibility
3. System refinement in which informal test cases are run to generate feedback
   from the expert and from users
4. Structured evaluation of performance
5. Structured evaluation of acceptability to users
6. Service functioning for extended period in prototype environment
7. Follow-up studies to demonstrate the system's large-scale usefulness
8. Program changes to allow wide distribution of the system
9. General release and distribution with firm plans for maintenance and updating
the results of the liver biopsy, it may be possible to avoid the more invasive
procedure in future patients. The parallel in expert system evaluation is
obvious; if we can demonstrate that the expert system's advice is compa-
rable to the gold standard for the domain in question, it may no longer
be necessary to turn to the gold standard itself if it is less convenient, less
available, or more expensive.
In general there are two views of how to define a gold standard for an
expert system's domain: (1) what eventually turns out to be the "correct"
answer for a problem, and (2) what a human expert says is the correct
answer when presented with the same information as is available to the
program. It is unfortunate that for many kinds of problems with which
expert systems are designed to assist, the first of these questions cannot be
answered or is irrelevant. Consider, for example, the performance of MY-
CIN. One might suggest that the gold standard in its domain should be
the identity of the bacteria that are ultimately isolated from the patient, or
the patient's outcome if he or she is treated in accordance with (or in
opposition to) the program's recommendation. Suppose, then, that MYCIN
suggests therapy that covers for four possibly pathogenic bacteria but that
the organism that is eventually isolated is instead a fifth rare bacterium
that was totally unexpected, even by the experts involved in the case. In
what sense should MYCIN be considered "wrong" in such an instance?
Similarly, the outcome for patients treated for serious infections is not
100% correlated with the correctness of therapy; patients treated in ac-
cordance with the best available medical practice may still die from ful-
minant infection, and occasionally patients will improve despite inappro-
priate antibiotic treatment. Accordingly, we said that MYCIN performed
at an expert level and was "correct" if it agreed with the experts, even if
both MYCIN and the experts turned out to be wrong. The CADUCEUS
program has been evaluated by comparing its diagnoses against those
published on selected hard cases from the medical literature (Miller et al.,
1982).
Informal Standards
Controlling Variables
Sensitivity Analysis
lead to detrimental effects on problems that were once handled very well
by the system. An awareness of this potential problem is crucial as system
builders iterate from Step 3 to Step 4 and back to Step 3 (see Table 30-1).
One method for protecting against the problem is to keep a library of old
cases available on-line for batch testing of the system's decisions. Then, as
changes are made to the system in response to the Step 4 evaluations of
the program's performance, the old cases can be run through the revised
version to verify that no unanticipated knowledge interactions have been
introduced (i.e., to show that the program's performance on the old cases
does not deteriorate).
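A minimal sketch of such batch regression testing, assuming a consult() function and a stored library of previously approved cases, both stand-ins for whatever the real system provides:

def regression_test(case_library, consult):
    """Re-run stored cases through the revised system and report any whose
    output no longer matches the previously approved decision."""
    regressions = []
    for case in case_library:
        new_decision = consult(case["inputs"])
        if new_decision != case["approved_decision"]:
            regressions.append((case["id"], case["approved_decision"], new_decision))
    return regressions

# Hypothetical use after a knowledge-base change.
library = [
    {"id": "case-07", "inputs": {"csf-protein": 30},  "approved_decision": "viral"},
    {"id": "case-12", "inputs": {"csf-protein": 250}, "approved_decision": "bacterial"},
]
toy_consult = lambda inputs: "viral" if inputs["csf-protein"] < 41 else "bacterial"
print(regression_test(library, toy_consult))    # [] -- no unanticipated interactions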
When the Study 3 data had been analyzed and published (Chapter 31),
we realized there were still several lingering questions. The journal editors
had required us to shorten the data analysis and discussion in the final
report. We also had asked ourselves several questions regarding the meth-
odology and felt that these warranted further study.
Accordingly, in 1979 Reed Letsinger (then a graduate student in our
group) undertook an additional analysis of the Study 3 data. What follows
is largely drawn from an internal memo that he prepared to report his
findings. The reader should be familiar with Chapter 31 before studying
the sections below.
The tendency of the experts to agree with one another has a direct impact
on the power of the study to discriminate good performance from bad.
Consider two extreme cases. At one end is the case where on the average
the evaluators agree with each other just as much as they disagree. This
means that on each case the prescribers would tend to get scores around
the midpoint--in the case of the MYCIN study, around 4 out of 8. The
cumulative scores would then cluster tightly around the midpoint of the
possible range, e.g., around 40 out of 80. The differences between the
quality of" performance of the various subjects would be "washed out," the
scores would all be close to one another, and consequently, it would be
very unlikely that any of the differences between scores would be signifi-
cant. At the other extreme, if the evaluators always agreed with each other,
the only "noise" in the data would be contributed by the choice of the
sample cases. Intermediate amounts of disagreement would correspond-
ingly have intermediate effects on the variability of the scores, and hence
on the power of" the test to distinguish the performance capabilities of the
subjects.
A rough preliminary indication of the extent of this agreement can be
derived from the MYCIN data. A judgment situation consists of a partic-
ular prescriber paired with a particular case. Thus there are 100 judgment
situations in the present study, and each receives a score between 0 and 8,
depending on how many of the evaluators found the performance of the
subject acceptable on the case. The range between 0 and 8 is divided into
three equal subranges, 0 to 2, 3 to 5, and 6 to 8. A judgment situation
receiving a score in the first of these ranges may be said to be generally
unacceptable, while those receiving scores in the third range are generally
acceptable. The situations scoring in the middle range, however, cannot be
decided by a two-thirds majority rule, and so may be considered to be
undecided due to the evaluators' inability to agree. It turns out that 53 out
of the 100 judgment situations were undecided in this sense in the MYCIN
study.
For a more accurate indication of the level of this disagreement, the
evaluators can be paired in all possible combinations, and the percentage
of judgment situations in which they agree can be calculated. The mean
of this percentage across all pairs of evaluators reflects how often we should
expect two experts to agree on the question of whether or not the perfor-
mance of a prescriber is acceptable (when the experts, the prescriber, and
the case are chosen from populations for which the set of evaluators, the
set of subjects, and the set of cases used in the study are representative
samples). In the MYCIN study, this mean was 0.591. Thus, if the evalua-
tors, prescribers, and cases used in this study are representative, we would
in general expect that if we choose two infectious disease experts and a
judgment situation at random on additional cases, the two experts will
disagree on the question of whether or not the recommended therapy is
acceptable 4 out of every 10 times!
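The procedure just described is easy to state computationally. The sketch below uses fabricated ratings for three evaluators and four judgment situations (the actual study had 8 evaluators and 100 judgment situations); the classification thresholds and the pairwise agreement calculation follow the text.

from itertools import combinations

def classify(score):
    """Classify a judgment situation by its 0-8 acceptability score."""
    if score <= 2:
        return "generally unacceptable"
    if score >= 6:
        return "generally acceptable"
    return "undecided"

def mean_pairwise_agreement(ratings):
    """ratings[e][s] is 1 if evaluator e found judgment situation s acceptable.
    Returns the mean, over all evaluator pairs, of the fraction of situations
    on which the two evaluators agree."""
    pairs = list(combinations(range(len(ratings)), 2))
    n = len(ratings[0])
    agree = [sum(ratings[a][s] == ratings[b][s] for s in range(n)) / n for a, b in pairs]
    return sum(agree) / len(agree)

# Toy data: 3 evaluators x 4 judgment situations (1 = acceptable).
ratings = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
]
print([classify(s) for s in (1, 4, 7)])
# ['generally unacceptable', 'undecided', 'generally acceptable']
print(round(mean_pairwise_agreement(ratings), 3))    # 0.667 with the toy data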
Before such a number can be interpreted, more must be known about
the pattern of agreement. One question is how the disagreement was dis-
tributed across the subjects and across the cases. It turns out that the var-
iation across subjects was remarkably low for the MYCIN data, with a
standard deviation of less than 6 percentage points. The standard devia-
tion across cases was slightly higher--just under 10 percentage points. Very
little of the high level of disagreement among the graders can be attributed
to the idiosyncrasies of a few subjects or of a few cases. If it had turned
The previous discussion of the tendency of the experts to agree with one
another is subject to at least one objection. Suppose that, for a particular
case, four of the ten prescribers made the same recommendation, and
expert e1 agreed with the recommendation while expert e2 did not. Then
e1 and e2 would be counted as disagreeing four times, when in fact they
are only disagreeing over one question. If a large number of the cases lead
to only a few different responses, then it might be worth lumping together
the prescribers that made the same therapy recommendation. Then the
experts will be interpreted as judging the responses the subjects made,
rather than the subjects themselves. As is noted in the next section, this
kind of collapsing of the data is useful for other purposes as well.
Deciding whether two treatments are identical may be nontrivial.
Sometimes the responses are literally identical, but in other cases the re-
sponses will differ slightly, although not in ways that would lead a physician
with a good understanding of the problem to accept one without also
accepting the other. One plausible criterion is to lump together two therapy
recommendations for a case if no evaluator accepts one without accepting
the other. A second test is available when one of the evaluators gives a
recommendation that is identical to one of the prescribers' recommenda-
tions. Recommendations that that evaluator judged to be equivalent to his
own can then be grouped with the evaluator's recommendation, so long as
doing so does not conflict with the first criterion. In using either of these
tests, the data should first be made consistent in the manner discussed in
Section 30.3.1.
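Expressed computationally, the first criterion lumps two recommendations exactly when their patterns of evaluator acceptance are identical; the following sketch, with invented data, shows the grouping.

from collections import defaultdict

def lump_recommendations(acceptance):
    """acceptance[rec] is a tuple of 0/1 flags, one per evaluator, saying whether
    that evaluator accepted the recommendation. Recommendations are lumped
    together when no evaluator accepts one without accepting the other, i.e.
    when their acceptance patterns are identical."""
    groups = defaultdict(list)
    for rec, pattern in acceptance.items():
        groups[pattern].append(rec)
    return list(groups.values())

# Invented data for one case: 4 recommendations, 3 evaluators.
acceptance = {
    "penicillin":                  (1, 1, 0),
    "penicillin (alternate dose)": (1, 1, 0),   # same pattern -> lumped together
    "ampicillin + gentamicin":     (1, 1, 1),
    "chloramphenicol":             (0, 1, 0),
}
print(lump_recommendations(acceptance))
# [['penicillin', 'penicillin (alternate dose)'], ['ampicillin + gentamicin'], ['chloramphenicol']]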
Using these tests, the ten subjects in the ten cases of the MYCIN study
reduced to an average of 4.2 different therapy recommendations for each
case, with a standard deviation of 1.55 and a range from 2 to 6. This seems
to be a large enough reduction to warrant looking at the data in this col-
lapsed form.
0.699, which is both higher than the mean agreement (0.591) and higher
than the mean of the prescribers' scores (0.585). This latter fact is to be
expected, since the subjects included people who were chosen for the study
because their level of expertise was assumed to be lower than that of the
evaluators. Nevertheless, half of the evaluators scored above the highest-
scoring prescriber (while the other half spread out evenly over the range
between the top-ranking subject and the eighth-ranking subject). The fact
that agreement between the evaluators looks higher on this measure than
it does on other measures indicates that much of the disagreement was
over therapies that none of the evaluators themselves recommended.
It is interesting to ask why the evaluators ranked higher in this analysis
than the Stanford faculty members among the prescribers, many of whom
would have qualified as experts by the criteria we used to select the national
panel. A plausible explanation is the method by which the evaluators were
asked to indicate their own preferred treatment for each of the ten cases.
As is described in Chapter 31, for each case the expert was asked to indicate
a choice of treatment on the first page of the evaluation form and then to
turn the page and rank the ten treatments that were recommended by the
prescribers. There was no way to force the evaluators to make a commit-
ment about therapy before turning the page, however. It is therefore quite
possible that the list of prescribers' recommendations served as "memory
joggers" or "filters" and accordingly influenced the evaluators' decisions
regarding optimal therapy for some of the cases. Since none of the pre-
scribers was aware of the decisions made by the other nine subjects, the
Stanford faculty members did not benefit from this possible advantage.
We suspect this may partly explain the apparent differences in ratings
among the Stanford and non-Stanford experts.
30.3.5 Summary
This chapter is an edited version of an article originally appearing in Journal of the American
Medical Association 242:1279-1282 (1979). Copyright 1979 by the American Medical As-
sociation. All rights reserved. Used with permission.
31
An Evaluation of MYCIN's Advice
to show that its therapeutic regimens are as reliable as those that an infec-
tious disease specialist would recommend. An evaluation of the system's
ability to diagnose and treat patients with bacteremia yielded encouraging
results (Yu et al., 1979a). The results of that study, however, were difficult
to interpret because of the potential bias in an unblinded study and the
disagreement among the infectious disease specialists as to the optimal
therapeutic regimen for each of the test cases.
The current study design enabled us to compare MYCIN's perfor-
mance with that of clinicians in a blinded fashion. This study involved a
two-phase evaluation. In the first phase, several prescribers, including MY-
CIN, prescribed therapy for the test cases. In the second phase of the
evaluation, prominent infectious disease specialists, the evaluators, assessed
these prescriptions without knowing the identity of the prescribers or
knowing that one of them was a computer program.1
31.1 Materials and Methods
in infectious diseases. None of these individuals was associated with the
MYCIN project. The seven Stanford physicians and the medical student
were asked to prescribe an antimicrobial therapy regimen for each case
based on the information in the summary. If they chose not to prescribe
antimicrobials, they were requested to specify which laboratory tests (if any)
they would recommend for determining the infectious etiology. There
were no restrictions concerning the use of textbooks or any other reference
materials, nor were any time limits set for completion of the prescriptions.
Ten prescriptions were compiled for each case: that actually given to
the patient by the treating physicians at the county hospital, the recom-
mendation made by MYCIN, and the recommendations of the medical
student and of the seven Stanford physicians. In the remainder of this
chapter, MYCIN, the medical student, and the eight physicians will be
referred to as prescribers.
The second phase of the evaluation involved eight infectious disease
specialists at institutions other than Stanford, hereafter referred to as eval-
uators, who had published clinical reports dealing with the management of
infectious meningitis. They were given the clinical summary and the set of
ten prescriptions for each of the ten cases. The prescriptions were placed
in random order and in a standardized format to disguise the identities of
the individual prescribers. The evaluators were asked to make their own
recommendations for each case and then to assess the ten prescriptions.
The 100 prescriptions (10 each by 10 prescribers) were classified by each
evaluator into the following categories:
31.2 Results
The evaluators' ratings of each prescriber are shown in the second column of Table 31-1. Since there were 8 evaluators and 10 cases, each prescriber received 80 ratings from the evaluators. Sixty-five percent of MYCIN's prescriptions were rated as acceptable by the evaluators. The corresponding mean rating for the five faculty specialists was 55.5% (range, 42.5% to 62.5%). A significant difference was found among the prescribers; the hypothesis that each of the prescribers was rated equally by the evaluators is rejected (standard F test, F = 3.29 with 9 and 70 df; p < 0.01).
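For readers who want to check the reported significance level, the following minimal sketch (modern Python with scipy; not part of the original analysis) computes the upper-tail probability for the reported F statistic and degrees of freedom.

    # Hypothetical check of the reported F statistic; not from the original study.
    from scipy import stats

    F, dfn, dfd = 3.29, 9, 70        # values reported in the text
    p = stats.f.sf(F, dfn, dfd)      # upper-tail (survival) probability of the F distribution
    print(f"F = {F}, df = ({dfn}, {dfd}), p = {p:.4f}")   # p is well below 0.01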
Consensus among evaluators was measured by determining the number of cases (n = 10) in which the prescriber received a rating of acceptable from the majority (five or more) of experts (third column of Table 31-1). Seventy percent of MYCIN's therapies were rated as acceptable by a majority of the evaluators. The corresponding mean rating for the five faculty prescribers was 44% (range, 30% to 50%). MYCIN failed to win a rating of acceptable from the majority of evaluators in three cases. MYCIN prescribed penicillin for a case of meningococcal meningitis, as did four evaluators. However, four other evaluators prescribed penicillin with chloramphenicol as initial therapy before identification of the organism, and they rated MYCIN's therapy as not acceptable. MYCIN prescribed penicillin as treatment for group B Streptococcus; however, most evaluators selected ampicillin and gentamicin as initial therapy. MYCIN prescribed penicillin as treatment for Listeria; however, most evaluators used combinations of two drugs.
31.3 Comment
In clinical medicine it may be difficult to define precisely what constitutes
appropriate therapy. Our study used two criteria for judging the appro-
priateness of therapy. One was simply whether or not the prescribed ther-
apy would be effective against the offending pathogen, which was ulti-
mately identified (fourth column of Table 31-1). Using this criterion, five
prescribers (MYCIN, three faculty prescribers, and the actual therapy
given the patient) gave effective therapy for all ten cases. However, this
was not the sole criterion, since failure to cover other likely pathogens and
the hazards of overprescribing are not considered. The second criterion
used was the judgment of eight independent authorities with expertise in
the management of meningitis (second and third columns of Table 31-1).
Using this criterion, MYCIN received a higher rating than any of the nine
human prescribers.
This shows that MYCIN's capability in the selection of antimicrobials
for meningitis compares favorably with the Stanford infectious disease spe-
cialists, who themselves represent a high standard of excellence. Three of
the Stanford faculty physicians would have qualified as experts in the man-
agement of meningitis by the criteria used for the selection of the national
evaluators.
Of the five prescribers who never failed to cover a treatable pathogen (fourth column of Table 31-1), MYCIN and the faculty prescribers were
relatively efficient and selective as to choice and number of antibiotics
prescribed. In contrast, while the actual therapy prescribed by the physi-
cians caring for the patient never failed to cover a treatable pathogen, their
therapeutic strategy was to prescribe several broad-spectrum antimicro-
bials. In eight cases, the physicians actually caring for the patient pre-
scribed two or three antimicrobials; in six of these eight cases, one or no
antimicrobial would have sufficed. Overprescribing of antimicrobials is not
necessarily undesirable, since redundant or ineffective antimicrobial ther-
apy can be discontinued after a pathogen has been identified. However,
an optimal clinical strategy attempts to limit the number and spectrum of
antimicrobials prescribed to minimize toxic effects of drugs and superin-
fection while selecting antimicrobials that will still cover the likely patho-
gens.
The primary limitation of our investigation is the small number of
cases studied. This was a practical necessity, since we had to consider the
time required for the evaluators to analyze 10 complex cases and rate 100
therapy recommendations. Although only 10 patient histories were used,
the selection criteria provided for diagnostically diverse and challenging
cases to evaluate MYCIN's accuracy. The selection of consecutive or ran-
dom cases of meningitis admitted to the hospital might have yielded a
limited spectrum of meningitis cases that would not have tested fully the
capabilities of either MYCIN or the Stanford physicians. In addition to
our evaluation, the program has undergone extensive testing involving
several hundred cases of retrospective patient histories, prospective patient
cases, and literature cases of meningitis. These have confirmed its com-
petence in determining the likely identity of the pathogen, selecting an
effective drug at an appropriate dosage, and recommending further di-
agnostic studies (a capability not evaluated in the current study).
Because of the diagnostic complexities of the test cases, unanimity in
all eight ratings in an individual case was difficult to achieve. For example,
in one case, although the majority of evaluators agreed with MYCIN's selection of antituberculous drugs for initial therapy, two evaluators did not and rated MYCIN's therapy as not acceptable. Six of the ten test cases had negative CSF smears for any organisms, so in these cases antimicrobial selection was made on a clinical basis. It is likely that if more routine cases
had been selected, there would have been greater consensus among eval-
uators.
The techniques used by MYCIN are derived from a subfield of computer science known as artificial intelligence. It may be useful to analyze some of the factors that contributed to the program's strong performance.
First, the knowledge base is extremely detailed and, for the domain of
meningitis, is more comprehensive than that of most physicians. The
knowledge base is derived from clinical experience of infectious disease
specialists, supplemented by information gathered from several series of
cases reported in the literature and from hundreds of actual cases in the
medical records of three hospitals.
Second, the program is systematic in its approach to diagnosis. A pop-
ular maxim among physicians is "One has to think of the disease to rec-
ognize it." This is not a problem for the program; rare diseases are never
"forgotten" once information about them has been added to the knowledge
base, and risk factors for specific meningitides are systematically analyzed.
For example, the duration of headache and other neurological symptoms
for one week before hospital admission was a subtle clue in the diagnosis
of tuberculous meningitis. The program does not overlook relevant data
but also does not require complete and exact information about the patient.
For example, in a case involving a patient with several complex medical
The CONGEN program within DENDRAL had just been recoded from Interlisp to BCPL, and we were acutely aware of the manpower investment it took by someone intimately familiar with the design and code. This effort could only have been undertaken under the conviction that the result would be widely used.
activities in the areas of human engineering and user attitudes. Our new work on ONCOCIN, for example, has been based on underlying knowledge structures developed for MYCIN but has been augmented and revised extensively because of our desire to overcome the barriers that prevented the clinical implementation of MYCIN. Our attitude on the importance of human factors in designing and building expert systems is reflected in the title of a recent editorial we prepared on the subject: "Good Advice is Not Enough" (Shortliffe, 1982b).
The design of the display is derived from the paper flow sheet used for many years for protocol data gathering and analysis. The display screen is divided into four sections as indicated in Figure 32-3:
a. the explanation field, which presents the justification for the recommendation indicated by the user-controlled cursor location (the black block in the figure)
b. the message field, which identifies the patient and provides a region for sending pertinent messages from ONCOCIN to the physician
c. the flow sheet, which displays a region of the conventional hard copy flow sheet; the display includes columns for past visits, and the physician enters data and receives recommendations in the right-hand column
d. the soft key identifiers, labels that indicate the special functions associated with numbered keys across the top of the terminal keyboard
Note that when the physician is entering patient data, the explanation field specifies the range of expected entries for the item with which the cursor is aligned. When the system has recommended therapy (as in Figure 32-3), the explanation field provides a brief justification of the drug dosage indicated by the cursor location.
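The screen organization just described can be summarized in a small data-structure sketch. Everything below is a hypothetical illustration in modern Python; the actual Interviewer was written in SAIL, and none of these names come from the ONCOCIN code.

    # Illustrative sketch of the four-region display described in the text (hypothetical names).
    from dataclasses import dataclass

    @dataclass
    class ScreenRegion:
        name: str
        purpose: str

    DISPLAY = [
        ScreenRegion("explanation field", "justification for the item at the cursor location"),
        ScreenRegion("message field", "patient identification and messages to the physician"),
        ScreenRegion("flow sheet", "columns for past visits; data entry in the right-hand column"),
        ScreenRegion("soft key identifiers", "labels for the numbered function keys"),
    ]

    def explanation_for(cursor_item: str, entering_data: bool) -> str:
        """Mimic the behavior described above: expected entries during data entry,
        a dose justification once therapy has been recommended."""
        if entering_data:
            return f"Expected entries for {cursor_item}"
        return f"Justification for the recommended dose of {cursor_item}"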
[FIGURE 32-3: the ONCOCIN display screen, showing the explanation field, message field, flow sheet, and soft key identifiers.]
maker about the patient's care, and that the computer-based consultant is intended to remind the physician about the complex details of the protocols and to collect patient data. Members of our group meet with oncology faculty and physicians occasionally to give them progress reports on our research.
We also enlisted the help of a data manager who is responsible for training sessions, ensures that on-line patient records are current, and sees that the system runs smoothly. The data manager is available whenever the system is running in the clinic and offers assistance when necessary. This role has proved to be particularly crucial. The data manager is the most visible representative of our group in the clinic (other than the collaborating oncologists themselves). The person selected for this role therefore must be responsible, personable, tactful, intelligent, aware of the system's goals and capabilities, and able to communicate effectively with the physicians. If the person in this role is unable to satisfy these qualifications, he or she can make system use seem difficult, undesirable, and imposing to the physician users.
Integration of the system into the clinic was planned as a gradual process. When the system was first released, the program handled a small number of patients and protocols. As the program became more familiar to the physicians, we added more patients to the system. We are in the process of adding new protocols, which in turn will mean additional patients being handled on the computer. ONCOCIN was initially available only three mornings per week. It is now available whenever patients who are being followed on the computer are scheduled. This plan for slow integration of the system into the clinic has made ONCOCIN's initial release less disruptive to the clinic routine than it would have been if we had attempted to incorporate a comprehensive system that handled all patients and protocols from the onset. This method of integration has also allowed us to fine-tune our system early in its development, based on responses and suggestions from our physician users.
After the system's initial release, the data manager and the collaborating oncologists collected comments and suggestions from the physicians who used the system. We have made numerous program changes in response to suggestions for modifications and desirable new features. We have also conducted a number of formal studies to evaluate the impact of the system on physicians' attitudes, the completeness and accuracy of data collection, and the quality of the therapeutic decisions.
We soon learned that some of our initial design decisions had failed to anticipate important physician concerns. For example, if the Reasoner needed an answer to a special question not on the regular flow sheet form,
our initial approach was to have the Interviewer interrupt data entry to request this additional information. The physicians were annoyed by these interruptions, so we modified the scheme to insert the question less obtrusively on a later section of the flow sheet, and to stop forcing the physician to answer such questions.
Another concern was that ONCOCIN was too stringent about its drug dosage recommendations, requesting justifications from the physician even for minor changes. We needed to take into account, for example, that a different pill size might decrease or increase a dose slightly and yet would be preferable for a patient's convenience. We subsequently obtained from the oncologists on our team ranges for each chemotherapeutic agent, within which any dosage modifications could be considered insignificant.4 Such minor modifications no longer generate requests for justification. We also modified the program to recommend the same dose that the physician prescribed during a prior visit if that recommendation is within the acceptable range calculated by the program.
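The tolerance behavior just described can be illustrated with a short sketch. The function names and the sample values below are hypothetical and are not taken from the ONCOCIN knowledge base.

    # Illustrative sketch of the dose-tolerance behavior described above (hypothetical values).
    def recommend_dose(computed: float, prior: float, tolerance: float) -> float:
        """Repeat the physician's prior dose when it lies within the acceptable
        range around the computed dose; otherwise recommend the computed dose."""
        return prior if abs(prior - computed) <= tolerance else computed

    def needs_justification(prescribed: float, recommended: float, tolerance: float) -> bool:
        """Only changes larger than the per-drug tolerance trigger a request for justification."""
        return abs(prescribed - recommended) > tolerance

    # Example: with a 5 mg tolerance, a small pill-size adjustment needs no justification.
    print(recommend_dose(computed=100.0, prior=98.0, tolerance=5.0))                 # -> 98.0
    print(needs_justification(prescribed=98.0, recommended=100.0, tolerance=5.0))    # -> False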
Some system users also asked whether the program could generate a progress note for the patient's visit. When we developed this feature and installed a small printer to prepare these notes in duplicate, use of the system was immediately made more desirable because this capability saved the physician the time required to dictate a note. This feature also helps to encourage the physician to enter relevant data completely and accurately because the quality of the resulting progress note is dependent on the data entry process.
When the system was first released, it was available only on the three mornings per week when the majority of lymphoma patients were seen (the computer, a DECSystem 2020, is used at other times by other members of our research community). This allowed us to provide rapid response time through an arrangement for high-priority use of the computer. Since some lymphoma protocol patients were seen at other times, however, there were continuing problems in keeping the computer-based files up to date and thus in establishing ONCOCIN's role as a reliable aid for the management of that subset of patients. In response to this problem, we have made the system available whenever a patient known to the system is seen in the clinic. When the physician initiates a consultation, the program checks to see if the computer response is likely to be slow and, if so, prints out a warning to that effect. The physician may then either abort the session or proceed with the anticipation that the interaction will take longer than usual. We have found that the physicians understand and appreciate this feature and will often continue despite the delays.
Alain Bonnet
I am ready
** THIS IS A 26 YEAR OLD MALE PATIENT
My understanding is:
   The age of the patient is 26
   The sex of the patient is male
** FIVE DAYS AGO, HE HAD RESPIRATORY-TRACT SYMPTOMS
What is his name?
** JO
My understanding is:
   The name of the patient is Jo
   Respiratory-tract is one of the symptoms that the patient had
** A COUPLE OF DAYS BEFORE THE ADMISSION, HE HAD A MALAISE
Please give me the date of admission
** MARCH 12, 1979
My understanding is:
   The patient was admitted at the hospital 3 days ago
   Malaise is one of the symptoms that the patient had 5 days ago
some of the frames might rule out others, thus enabling the space of possible inferences to be pruned. This issue has also been raised by Charniak (1978). Embodying world knowledge in frames (Minsky, 1975) or scripts (Abelson, 1973; Schank and Abelson, 1975) led to the development of programs that achieved a reasonably deep level of understanding, for example, GUS (Bobrow et al., 1977), NUDGE (Goldstein and Roberts, 1977), FRUMP (DeJong, 1977) and SAM (Cullingford, 1977).
BAOBAB and the other programs mentioned so far have a common feature: they do not interpret sentences in isolation. Rather, they interpret in the context of an ongoing discourse and, hence, use discourse structure. BAOBAB also explores issues of (a) what constitutes a model for structured texts and (b) how and when topic shifts occur. However, BAOBAB is interested neither in inferring implicit facts that might have occurred temporally between facts explicitly described in a text nor in explaining intentions of characters in stories (main emphases of works using scripts or plans). Our program focuses instead on coherence of texts, which is mainly a task of detecting anomalies, asking the user to clarify vague pieces of information or disappointed expectations, and suggesting omissions. The domain of application is patient medical summaries, a kind of text for which language-processing research has mainly consisted of filling in formatted grids without demanding any interactive behavior (Sager, 1978). BAOBAB's objectives are to understand a summary typed in "natural med-
In the $DESCRIPT schema (Figure 33-2), the first three global slots (AUTHOR, CREATION-DATE, and COMMENT) are used for documentation, whereas the next four are used to define strategies for schema-shifts (see below). Then six individual slots (corresponding to parameter names) define the schema. Each of them is described by subslots, or facets, some of which (e.g., EXPECT, TRANS, LEGALVALS, CHECK, PROMPT) already exist in the structure of MYCIN's knowledge base. Others have been
$DESCRIPT
   AUTHOR: BONNET
   CREATION-DATE: OCT-10-78
   COMMENT: Patient identification
   CONFIRMED-BY: (NAME AGE SEX RACE)
   TERMINATED-BY: ($SYMPTOM)
   SUGGESTED-BY: (WEIGHT HEIGHT)
   PREF-FOLLOWED-BY: ($SYMPTOM)
   NAME
      EXPECT: ANY
      TRANS: ("the name of" *)
      TOBEFILLED: T
      WHENFILLED: DEMONNAME
   AGE
      EXPECT: POSNUMB
      TRANS: ("the age of" *)
      CHECK: (CHECK VALU 0 100.0 (LIST "Is the patient really" VALU "years old?") T)
      TOBEFILLED: T
      WHENFILLED: SETSTATURE
   SEX
      EXPECT: (MALE FEMALE)
      TRANS: ("the sex of" *)
      TOBEFILLED: T
      WHENFILLED: SEXDEMON
   RACE
      EXPECT: (CAUCASIAN BLACK ASIAN INDIAN LATINO OTHER)
      TRANS: ("the race of" *)
   WEIGHT
      EXPECT: POSNUMB
      TRANS: ("the weight of" *)
      CHECK: (CHECK VALU LIGHT HEAVY (LIST "Does the patient really weigh" VALU "kilograms?") T)
   HEIGHT
      EXPECT: POSNUMB
      CHECK: (CHECK VALU SMALL TALL (LIST "Is the patient really" VALU "centimeters tall?") T)

FIGURE 33-2 Schema of a patient description.
created to allow the program to intervene during the course of the dialogue. For example, when the slot TOBEFILLED holds the value T (true), it means that the value of the variable must be asked if the physician does not provide it. The WHENFILLED feature specifies a procedure to run as soon as the slot is filled in. This is the classic way of making inferences. For example, SETSTATURE specifies narrower ranges of weight and height for a patient according to his or her age.
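A minimal sketch of this slot-and-demon mechanism is given below, written in modern Python rather than the Interlisp used by BAOBAB; the class name, the demon body, and the sample ranges are all hypothetical.

    # Illustrative sketch of slot filling with demons, in the spirit of TOBEFILLED/WHENFILLED.
    class Slot:
        def __init__(self, name, to_be_filled=False, when_filled=None):
            self.name = name
            self.to_be_filled = to_be_filled    # must be asked if the physician omits it
            self.when_filled = when_filled      # demon run as soon as a value is supplied
            self.value = None

        def fill(self, value, schema):
            self.value = value
            if self.when_filled:                # e.g., SETSTATURE narrows weight/height ranges
                self.when_filled(schema, value)

    def set_stature(schema, age):
        # Hypothetical demon: narrow the expected weight range once the age is known.
        schema["weight_range"] = (2, 40) if age < 10 else (30, 120)

    descript = {"weight_range": (0, 120)}
    age_slot = Slot("AGE", to_be_filled=True, when_filled=set_stature)
    age_slot.fill(3, descript)
    print(descript["weight_range"])   # -> (2, 40)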
33.2.2 Facets
a. Produce inferences. If the attribute of a clause that has just been built has an attached procedure, it can trigger the building of another clause; for example, INFERFEVER is run as soon as the temperature is known and can lead to a clause such as "The patient is not febrile."
b. Narrow a range of expected values. Consider, for example, the weight of a patient. This has a priori limits, by default, of 0 and 120 kilograms. This range is narrowed according to the age of the patient as soon as the latter is known.
c. Make predictions. An event like "a lumbar puncture" can cause predictions about "CSF data" (not about their values, but about the fact that
BAOBAB distinguishes among three kinds of default values:
are used to parse "BP 130/94" or "T 98 F." The category <TEMPNUM> has an attached procedure, a specific piece of code that recognizes "F" as Fahrenheit, detaches it from "98," verifies that 98 is a reasonable value for a temperature, and finally returns "98 degrees" as the value of the temperature.
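The kind of attached procedure described for <TEMPNUM> can be illustrated with a small sketch. The regular expressions and function below are hypothetical stand-ins; BAOBAB's actual semantic grammar rules and Interlisp procedures are not reproduced here.

    # Illustrative sketch of parsing terse vital-sign input such as "BP 130/94" or "T 98 F".
    import re

    def parse_vitals(text: str) -> dict:
        facts = {}
        if m := re.search(r"\bBP\s*(\d{2,3})/(\d{2,3})\b", text):
            facts["blood pressure"] = f"{m.group(1)}/{m.group(2)}"
        if m := re.search(r"\bT\s*(\d{2,3}(?:\.\d)?)\s*F?\b", text):
            temp = float(m.group(1))
            if 90 <= temp <= 110:             # attached check: a reasonable Fahrenheit value
                facts["temperature"] = f"{temp:g} degrees"
        return facts

    print(parse_vitals("BP 130/94"))   # -> {'blood pressure': '130/94'}
    print(parse_vitals("T 98 F"))      # -> {'temperature': '98 degrees'}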
The following are examples of the "syntax" of purely semantic rules:
This subset of the grammar enables the program to recognize inputs such as the following:
where <NP> stands for noun phrase, <VP> for verb phrase, <DET> for determiner, <PREPP> for prepositional phrase and <PREP> for preposition. The set of rules enables the system to recognize input sentence 1 above (except for the notion of time), as shown in the syntactic tree of Figure 33-3.
When the semantic component interprets such a syntactic tree, it checks that <NOUN> is matched by a person (whereas the direct use of <PATIENT> would make such a verification useless). Input sentences such as the following would thus be rejected:
[Figure 33-3: syntactic tree for the sample input — <SENTENCE> expands to <NP> and <VP>; <NP> to <NOUN>; <VP> to <VERB> and <PREPP>; <PREPP> to <PREP> and <NP>; <SAME> covers IS | HAS | ...]
33.4 Schema-Shift Strategies
Bullwinkle makes the distinction [Bullwinkle (1977); see also Sidner (1979)]
between potential and actual shifts of focus, pointing out that the cues
1. "The patient was found comatose. She was admitted to the hospital. A
lumbar puncture was performed. She denied syncope or diplopia..."
2. "The patient was found comatose. He was admitted to the hospital. The
protein from CSF was 58 mg%..." (CSF = cerebrospinal fluid)
In Example 1, the lumbar puncture suggests CSF results that are not given (weak clue). In Example 2, a detail of CSF results (strong clue) is given
directly ("the protein"). In other words, the physician jumps into detail,
and the frame is directly confirmed.
A simple case in which a schema can be terminated is when all of its slots have been filled. This is an ideal situation, but it does not occur very often. Another case is when the intervention of a schema implies that another schema is out of focus, which could be, but is not necessarily, the result of chronological succession. In general, this phenomenon occurs when the speaker actually starts the plot after setting the characters of the story. There is no standard way to decide when the setting is finished. However, as soon as the story actually starts, the setting could be closed and possibly completed with default values or with the answers to questions about whatever was not clear or omitted. A TERMINATED-BY slot has been created
to define which schemata can explicitly terminate others; for example, the $SYMPTOM schema usually closes the $DESCRIPT schema (name, age, sex, race), as it is very unlikely that the speaker will give the sex of the patient in the middle of the description of the symptoms. This fact is due to the highly constrained nature of the domain.
When a schema is terminated, the program infers all the default values of the unfilled slots. It also checks whether the expectations set during the story have been fulfilled. These actions can be performed only when a shift has been detected or at the end of the dialogue; otherwise, the program might ask too early about information that the user will give later. In the case where a schema has been exhausted (all its slots filled), an a priori choice with regard to the predicted next schema is made. This choice is possible by using a PREFERABLY-FOLLOWED-BY pointer that, in the absence of a bottom-up (data-driven) trigger for the next schema, decides in a top-down fashion which schema is the most probable to follow at a given point.
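The interplay of TERMINATED-BY, default filling, and the preference pointer can be sketched as follows. The schema table and helper functions are hypothetical simplifications; only the $DESCRIPT and $SYMPTOM names come from the text.

    # Illustrative sketch of schema termination and top-down prediction of the next schema.
    SCHEMAS = {
        "$DESCRIPT": {"terminated_by": ["$SYMPTOM"], "pref_followed_by": "$SYMPTOM"},
        "$SYMPTOM":  {"terminated_by": [],            "pref_followed_by": None},
    }

    def close_schema(slots, defaults):
        """On termination, infer default values for any slots still unfilled."""
        for name, value in slots.items():
            if value is None and name in defaults:
                slots[name] = defaults[name]
        return slots

    def next_schema(current, triggered=None):
        """Prefer a bottom-up (data-driven) trigger; otherwise use the preference pointer."""
        return triggered or SCHEMAS[current]["pref_followed_by"]

    slots = {"NAME": "Pat", "AGE": 26, "RACE": None}
    print(close_schema(slots, {"RACE": "unknown"}))   # RACE filled by default
    print(next_schema("$DESCRIPT"))                   # -> "$SYMPTOM"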
The BAOBAB grammar is currently not very large (about 200 rules), and only seven schemata have been implemented. In order to be able to parse more complicated texts efficiently, ones involving symptoms that might imply different infections (with interactions between them), a computer program will need more sophisticated clues to determine which schema is the most appropriate to apply and in what priority. Furthermore, certain concepts are currently ignored because their relevance to medical knowledge is not always straightforward. For example, an infection acquired by a member of the patient's family or even the patient's occupation could give hints to the physician during the early stages of the diagnostic formulation, but these are not currently used.
33.6 Sample Sessions
** A 19 YEAR OLD BOY WITH PETECHIAL RASH, LETHARGY AND POSITIVE BRUDZINSKIS SIGN.
Patient's name?
(The $DESCRIPT schema has been activated by the beginning of the input. Then there is a schema-shift which generates the request for the missing name.)
** BILL
My understanding is:
** REASONS FOR ADMISSION: CHILLS HEADACHE MALAISE    (sets up expectations)
So I will expect chill headache malaise as values for symptoms
DATE OF ADMISSION: DECEMBER 25 1978
My understanding is that:
   the patient was admitted at the hospital 4 days ago
** HISTORY OF THE PRESENT ILLNESS    (announces the $IDENTIFICATION schema)
Ok.
** PAT IS A 26 YEAR OLD CAUCASIAN FEMALE
My understanding is that:
   the age of the patient is 26 years
   the race of the patient is Caucasian
   the sex of the patient is female
   the name of the patient is Pat
** ONE DAY PRIOR TO ADMISSION SHE EXPERIENCED CHILLS AND HEADACHE    (shift to $SYMPTOMS)
My understanding is that:
   chill is one of the symptoms that the patient had 5 days ago
   headache is one of the symptoms that the patient had 5 days ago
   (time inferences)
** ONE DAY AFTER SHE HAD A MALAISE
My understanding is that:
** PERIPHERAL CBC    (sets another context)
Ok.
** WBC 23.5    (understood in context)
My understanding is that:
   the white count from the patient's peripheral CBC (in thousands) is 23.5
** THE VALUE OF WBC IS 22.5    (corrects a mistyping)
Instead of 23.5?    (making sure that it was meant)
** YES
My new understanding is that: The white count from the patient's peripheral CBC (in thousands) is 22.5
** BYE
I have assumed the following by default:    (inferences by default)
   the state of consciousness of the patient is alert
   the patient's X-ray is not abnormal
   the patient is not immunosuppressed
Do you agree?
** THE PATIENT IS IMMUNOSUPPRESSED    (overrides a default value)
My new understanding is that: The patient is immunosuppressed
Do you agree?
** YES
bye.
** REASONS FOR ADMISSION: HEADACHE, LETHARGY AND FEVER
So I will expect HEADACHE LETHARGY FEBRILE as symptoms
** DATE OF ADMISSION ON MARCH 14
That was 6 days ago.
My understanding is
   the patient was admitted at the hospital 6 days ago
** BARBARA IS A 3 YAER OLD WHITE FEMALE
= YEAR    (automatic spelling correction)
My understanding is
   the age of the patient is 3 years
   the race of the patient is Caucasian
   the sex of the patient is female
   the name of the patient is Barbara
** ONE DAY PRIOR TO ADMISSION, SHE BECAME LETHARGIC
My understanding is
   lethargy is one of the symptoms that the patient had 7 days ago
** SHE COMPLAINED OF HEADACHE AND HAD A TEMPERATURE OF 101 DEGREES
My understanding is
   headache is one of the symptoms that the patient had
   the temperature of the patient is 101
   the patient is febrile
** A PHYSICIAN OBTAINED A COMPLETE BLOOD COUNT
My understanding is
   the results of a recent CBC of the patient are available
** A LUMBAR PUNCTURE WAS PERFORMED
My understanding is
   a lumbar puncture has been performed on the patient
** WEIGHT 39 LBS
My understanding is
   the weight of the patient is 17.7 kgms
** HEIGHT 1.03 METER
Is the patient really 103.0 centimeters tall?    (asking for confirmation because that seems tall for 3 years)
(Y or N) ** YES
My understanding is
   the height of the patient is 103 cms
** BP 120/70 P 140
My understanding is
   the blood pressure of the patient is 120/70
   the pulse of the patient is 140
** RR 28 T 101    (redundancy, the temperature was already known)
My understanding is
   the respiratory rate of the patient is 28
** SKIN: PETECHIAE ON FACE AND SHOULDERS
My understanding is
   petechial is one of the types of rash which the patient has
** NECK: DEFINITE MENINGISMUS
My understanding is
   meningismus is one of the signs that the patient showed
** LAB DATA: CSF RESULTS    (announces lab results; no clause is built)
Ok.
** WBC 110 WITH 96% POLYS    (interpretation in the context of CSF)
My understanding is
   the WBC from the CSF is 110
   the percent of PMNs in the CSF is 96
** PROTEIN 25 MG% AND GLUCOSE 95
My understanding is
   the CSF protein is 25
   the CSF glucose value is 95
** PERIPHERAL CBC
Ok.
** WBC 12300
Is the white count from the patient's peripheral CBC (in thousands) really 12300?    (the physician forgot that a number of thousands was expected)
** NO
Please enter the right value now
** 12.3
My understanding is
33.7 Conclusion
The relationships could be rather loose, but the more constrained they are, the better this feature would work.
Expert systems usually need some kind of understanding to communicate in natural jargon with their users (expert, consultant, and/or student). The technique described here--breaking the knowledge down into schemata that correspond to different pieces of texts, associating semantic grammar rules with the schemata, and using strategies for recognizing episode shifts--should be generally applicable in such domains.
34
An Analysis of Physicians' Attitudes
Our study was motivated by the belief that the future of research in medical computing, particularly the development of computer-based consultation systems, depends on improving our understanding of the needs, expectations and performance demands of clinicians. The previous studies had not specifically addressed these issues. Our study used a questionnaire, similar in format to the instrument developed by Startsman and Robinson (1972) but different in content. One modification was to limit the scope of our survey by focusing only on physicians' attitudes regarding clinical consultation systems. Previous studies had been more general in their focus and had surveyed a broader range of opinion. We chose this more limited focus because several research groups currently developing medical consultation systems are concentrating on physician users and have recognized the need for better information about the concerns and performance demands of clinicians. Another change was the inclusion of statements designed to ascertain the performance capabilities that physicians consider necessary for a consultation program to be clinically acceptable. Previous studies had not addressed this important aspect. We hoped that with these modifications the study would yield results from which guidelines could be formulated to help medical computing experts design more acceptable clinical consultation systems.
34.1 Methods
34.1.1 Instrument
1The tutorial was offered by the Departments of Medicine and Computer Science at Stanford University in August of 1980. It was organized in conjunction with the Sixth Annual Workshop on Artificial Intelligence in Medicine, which was sponsored by the Division of Research Resources of the NIH.
2The statements are shown in Table 34-3. For identification purposes in this paper, each is identified by an E followed by a number. The letter E denotes that the statement belongs to the Expectation-scale.
34.1.2 Participants
Two samples of physicians were included in the study. One included reg-
istrants for the tutorial mentioned above. The 85 physicians who filled out
the questionnaire represented 90% of the physicians registered for the
tutorial. Twenty-nine nonphysician attendees who were engaged in either
basic medical research or medical computing also returned survey forms.
By announcing that the course was appropriate for physicians with
little or no knowledge of medical computing, we hoped to attract a cross
section of physicians. Although continuing medical education (CME) credit
was also available, we were aware that the backgrounds and attitudes of
these physicians might contrast with those who chose not to attend the
tutorial. Therefore, a second sample of physicians was selected from Stanford Medical School clinical faculty and from Stanford-affiliated physicians
practicing in the surrounding community.
34.1.3 Procedure
3The Demand-scale statements are shown in Table 34-5. Each statement is identified by a D followed by a number.
34.2 Results
34.2.1 Characteristics of Physicians Studied
4All recipients had also received an initial announcement for the course several weeks earlier, and none had registered in response to the initial mailing.
The options for the Acceptance question are shown in Table 34-1. Physicians had an average Acceptance rating of 5.5 applications out of the 8 included on the scale. The table shows that support for the 5 major applications exceeded 80% of respondents.
Medical specialty was the only characteristic that was significantly predictive of a respondent's Acceptance of computing applications. Table
34-2 shows that surgeons were less accepting of medical computing appli-
cations than either of the other two subgroups. There was no significant
difference in the Acceptance rating between tutorial and nontutorial par-
ticipants, private practice and academic physicians, those with several years
in practice and those who had recently graduated, physicians engaged in
research and those who were not, or physicians with and without comput-
ing experience.
Table 34-3 displays the ratings and standard deviations for each statement
on the Expectation-scale. The statements are listed in order of their av-
erage ratings, from those outcomes that physicians thought were the most
likely to occur to those that were expected to occur less frequently. The
average Expectation rating for physicians was slightly positive (X = .42).
This was comparable to that of the nonphysician sample, shown in the
right-hand column. Only 3 of the 17 statements received negative ratings
(i.e., were judged likely to occur), including fears about the possibility that
consultation systems will increase government control of medicine, con-
cerns that systems will increase the cost of care, and expectations that pa-
tients will blame the computer program for ineffective treatment decisions.
On the other hand, physicians felt strongly that consultation systems would
neither interfere with their efficiency nor force them to adapt their think-
ing to the reasoning process used by the computer program. They also
felt that the use of consultation systems would not reduce the need for
either specialists or paramedical personnel.
Subgroups of physicians displayed significant differences in their Ex-
pectations about how computer-assisted consultations will affect medical
practice. The means and standard deviations for all the significant findings
are summarized in Table 34-4. A significance level of .01 was used for each
analysis in order to maintain an overall significance level of less than .06.
The Expectations of tutorial registrants were on the average more positive
than those of the nontutorial group, although neither group thought that
consultation programs would adversely affect medical practice. Physicians
in academic settings and those in training indicated overall positive Ex-
pectations, whereas private practice physicians tended to hold slightly neg-
ative Expectations. Young doctors expressed more positive Expectations
than did physicians with 10 to 20 years of experience, although the recent
graduates were no more positive than physicians with at least 20 years
experience. Experience with computers was positively related to Expecta-
tions, as was Knowledge about computing concepts.
TABLE 34-5  Mean Ratings and Standard Deviations (in Parentheses) for Demand Statements

                                                            Physicians    Nonphysicians
                                                            (n = 146)     (n = 129)
D1.  Should be able to explain their diagnostic and         1.42 (.80)    1.78 (.42)
     treatment decisions to physician users
D2.  Should be portable and flexible so that the            1.14 (.81)    1.52 (.51)
     physician can access them at any time and place
D3.  Should display an understanding of their own            .99 (.94)    1.48 (.80)
     medical knowledge
D4.  Should improve the cost efficiency of tests and         .85 (.99)    1.11 (1.58)
     therapies
D5.  Should automatically learn new information when         .84 (1.02)   1.41 (.75)
     interacting with medical experts
D6.  Should display common sense                             .75 (1.20)   1.11 (.97)
D7.  Should simulate physicians' thought processes           .64 (1.16)    .93 (1.07)
D8.  Should not reduce the need for specialists              .46 (1.18)    .70 (1.07)
D9.  Should demand little effort from the physician          .35 (1.20)   1.19 (.92)
     to learn or use
D10. Should respond to voice command and not require         .26 (1.23)    .56 (1.05)
     typing
D11. Should not reduce the need for paraprofessionals        .26 (1.06)    .85 (1.03)
D12. Should significantly reduce the amount of              -.08 (1.34)    .00 (1.49)
     technical knowledge the physician must learn
     and remember
D13. Should never make an error in treatment planning       -.25 (1.33)   -.22 (1.34)
D14. Should never make an incorrect diagnosis               -.45 (1.31)   -.26 (1.46)
D15. Should become the standard for acceptable              -.80 (1.13)    .00 (1.07)
     medical practice

Total scale                                                   .44           .81
Factor 1 includes statements E7, E8, E11, E13, and E17 (Table 34-3). It relates to Expectations about how physicians might be personally affected by a consultation system. All of these statements received positive ratings (i.e., the outcomes were judged to be unlikely) ranging from .34 to 1.05. Factor loadings for the statements ranged from .43 to .59.
Factor 2 includes statements D1, D2, D3, D5, and D6 from the D-scale (Table 34-5). The factor is composed of the performance Demands thought by physicians to be the most important. Ratings of the statements ranged from .75 to 1.42. Factor loadings for the statements ranged from .41 to .65.
Factor 3 relates to Demands about system accuracy. It includes statements D13 and D14, which were rated relatively unimportant by the respondents. Factor loadings were .84 and .89, respectively.
Factor 4 includes statements from both scales and relates to physicians' attitudes regarding the effect of computing systems on the need for health care personnel. It includes statements E15, E16, D8, and D11. The factor reflects the opinion that consultation systems will not and should not affect the need for either specialists or paraprofessionals.
Factor 5 includes statements E1, E4, E5, E6, E8, E9, and E11 from the E-scale. It is similar to Factor 1 because statements E8 and E11 relate to both factors; however, its focus appears to be slightly different. Whereas Factor 1 related to the individual practitioner, Factor 5 is concerned with the effect of consultation programs on medical practice in general. Factor loadings ranged from -.70 to -.41.
Nearly the same pattern of differences among physicians was found
for the factors as was found for the full-scale ratings. Individual differences
in Expectations on Factors 1 and 5 were related to differences in knowledge
about computer concepts, experience with computers, time in medical
practice, professional orientation, and tutorial participation. Individual dif-
ferences were not found on ratings of the other three factors.
Table 34-6 shows the relationship between the scale ratings and Knowl-
edge about computers and medical computing concepts. Acceptance was
6Factor loadings can range from -1.0 to +1.0 and indicate the degree of relationship between each statement and the factor.
34.3 Discussion
The study we have described had three principal goals: (1) to measure
physicians' attitudes regarding consultation systems, (2) to compare the
attitudes of subgroups of physicians, including those who chose to attend
a medical computing tutorial and those who did not, and (3) to assess the
impact of the continuing education course on the attitudes and knowledge
of the physicians who enrolled. In this section, we discuss some of the
results relevant to each of these goals.
34.4 Recommendations
The results of this survey counter the common impression that physicians tend to be resistant to the introduction of clinical consultation systems. Although we have polled physicians only from the immediate vicinity of our medical center, there is no reason to assume that a nationwide survey would achieve markedly different results. We have found that a significant segment of the medical community believes that assistance from computer-based consultation systems will ultimately benefit medical practice. However, a major concern at present is whether system developers can respond adequately to physician demands for performance capabilities that extend
1. the Reasoner, a rule-based expert consultant that is the core of the sys-
tem; and
2. the Interviewer, an interface program that controls a high-speed terminal
and the interaction with the physicians using the system.
This chapter is based on an article originally appearing in Proceedings of the Seventh IJCAI, 1981, pp. 876-881. Used by permission of International Joint Conferences on Artificial Intelligence, Inc.; copies of the Proceedings are available from William Kaufmann, Inc., 95 First Street, Los Altos, CA 94022.
1Each program runs in a separate fork under the TENEX or TOPS-20 operating systems, thereby approximating a parallel processing system architecture. Another program, the Interactor, handles interprocess communication. There is also a process that provides background utility operations such as file backup. This chapter does not describe these aspects of the system design or their implementation. Details are available elsewhere (Gerring et al., 1982).
tumors, it is often the case that a busy clinic schedule, coupled with a complex protocol description, leads a physician to rely on memory when deciding on drug doses and laboratory tests. Furthermore, solutions for all possible treatment problems cannot be spelled out in protocols. Physicians use their own judgment in treating these patients, resulting in some variability in treatment from patient to patient. Thus patients being treated on a protocol do not always receive therapy in exactly the manner that the experimental design suggests, and the data needed for formal analysis of treatment results are not always completely and accurately collected. In some cases, patients suffer undue toxicity or are undertreated simply because protocol details cannot be remembered, located, or are not explicitly defined.
The problems we have described reach far beyond the oncology clinic at Stanford Medical Center. There are now several institutions designing protocol management systems to make the details of treatment protocols readily available to oncologists and to insure that complete and accurate data are collected.3 ONCOCIN is superficially similar to some of the developing systems, but both its short- and long-term goals are unique in ways we describe below. One overriding point requires emphasis: in order to achieve its goals, ONCOCIN must be used directly by busy clinicians; the implications of this constraint have pervaded all aspects of the system design.
35.2 Research Objectives
3A memo from the M.I.T. Laboratory for Computer Science (Szolovits, 1979) describes collaboration between M.I.T. and oncologists who have been building a protocol management system at Boston University (Horwitz et al., 1980). They are planning to develop a program for designing new chemotherapy protocols. To our knowledge, this is the only other project that proposes to use AI techniques in a clinical oncology system. However, the stated goals of that effort differ from those of ONCOCIN.
4We also implemented the complex protocol for treating oat cell carcinoma of the lung. Because the oat cell protocol is the most complex at Stanford, and it took only a month to encode the relevant rules, we are hopeful that the representation scheme we have devised will be able to manage, with only minor modifications, the other protocols we plan to encode in the future.
[Figure 35-1: system architecture — the Reasoner (Interlisp) and the Interviewer (SAIL), linked by the Interactor.]
see Chapter 32), reviewing time-oriented data from the patient's previous visits to the clinic, entering information regarding the current visit, and receiving recommendations, generated by the Reasoner, of appropriate therapy and tests. The Reasoner and Interviewer are linked with one another as shown in Figure 35-1. Each is able to use a data base of prior patient data. In addition, the Reasoner has access to information regarding the execution of chemotherapy protocols (control blocks) and specific information (rules) about the chemotherapy being used to treat the patient. Before terminating an interaction, the physician can examine the explanation provided with each recommendation.5 The physician may approve
5We have chosen a representation that has also facilitated early work to allow ONCOCIN to offer a justification for any intermediary conclusions that the system made in deriving the advice (Langlotz and Shortliffe, 1983).
6This same point led to the development of Fagan's VM system (Chapter 22), a rule-based program that was influenced by EMYCIN but differed in its detailed implementation because of the need to follow trends in patients under treatment in an intensive care unit. The development of similar capabilities for ONCOCIN is an active area of research at present.
35.4.2 Representation
Knowledge about the oncology domain is represented using five main data structures: contexts, parameters, data blocks, rules, and control blocks.7 In addition, we use a high-level description of each of these structures to serve as a template for guiding knowledge acquisition during the definition of individual instances.8
Contexts represent concepts or entities of the domain about which the system needs static knowledge. Individual contexts are classified by type (e.g., disease, protocol, or chemotherapy) and can be arranged hierarchically. During a consultation, a list of "current" contexts is created as information is gathered. These current contexts together provide a high-level description of the patient in terms of known chemotherapeutic plans. This description serves to focus the system's recommendation process.
Parameters represent the attributes of patients, drugs, tests, etc., that are relevant for the protocol management task (e.g., white blood count, recommended dose, or whether a patient has had prior radiotherapy). Each piece of information accumulated during a consultation is represented as the value of a parameter. There are three steps in determining the value of a parameter. First, the system checks to see if the value can be determined by definition in the current context. If not, the "normal" method of finding the value is used: if the parameter corresponds to a piece of laboratory data that the user is likely to know, it is requested from the user; otherwise, rules for concluding the parameter are tried. Finally, the system may have a (possibly context-dependent) default value that is used in the event that the normal mechanism fails to produce a value, or the user may be asked to provide the answer as a last resort.9
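A sketch of this three-step procedure appears below. All names are hypothetical modern-Python stand-ins; the actual Reasoner is an Interlisp program.

    # Illustrative sketch of the three-step procedure for determining a parameter's value.
    def find_value(param, context, ask_user, conclude_from_rules):
        """Definition first, then the "normal" method, then a default (or ask as a last resort)."""
        # Step 1: the value may be determined by definition in the current context.
        if param in context.get("definitions", {}):
            return context["definitions"][param]
        # Step 2: the "normal" method -- request laboratory data from the user, otherwise try rules.
        if param in context.get("lab_data", set()):
            value = ask_user(param)
        else:
            value = conclude_from_rules(param)
        if value is not None:
            return value
        # Step 3: a (possibly context-dependent) default value, or ask the user as a last resort.
        return context.get("defaults", {}).get(param) or ask_user(param)

    # Example with stubbed-in helpers:
    ctx = {"lab_data": {"white blood count"}, "defaults": {"prior radiotherapy": "no"}}
    print(find_value("prior radiotherapy", ctx,
                     ask_user=lambda p: None,
                     conclude_from_rules=lambda p: None))   # -> "no"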
Data blocks define logical groupings of related parameters (e.g., initial patient data or laboratory test results). A data block directs the system to treat related parameters as a unit when requesting their values from the Interviewer, storing the values on a patient's file, or retrieving previously stored values.
Rules are the familiar productions used in MYCIN and other rule-based systems; they may be invoked in either data-driven or goal-directed mode. A rule concludes a value for some parameter on the basis of values of other parameters. A rule may be designated as providing a definitional
7There are a few additional data structures designed to coordinate the interaction between the Reasoner and the Interviewer.
8The knowledge base editor is based on the similar programs designed and implemented for EMYCIN. A graphics editor has also been developed for use on the LISP machine workstations to which we intend to transfer ONCOCIN (Tsuji and Shortliffe, 1983).
9This "pure" description of ONCOCIN's technique for assigning values to parameters is actually further complicated by the free-form data entry allowed in the Interviewer. The details of how this is handled, and the corresponding relationship to control blocks, will not be described here.
value or a default value as defined above. The rules are categorized by the
context in which they apply.
As in EMYCIN systems, rules are represented in a stylized format so that they may be translated from Interlisp into English for explanation purposes.10 This representation scheme more generally allows the system to "read" and manipulate the rules. It has also facilitated the development of programs to check for consistency and completeness of the rules in the knowledge base (Chapter 8).
Below are the English translations of two ONCOCIN rules. Note that Rule 78 provides a default value for the parameter "attenuated dose."
RULE075

To determine the current attenuated dose for all drugs in MOPP or for all drugs in PAVe:
   IF:   1) This is the start of the first cycle after a cycle was aborted, and
         2) The blood counts do not warrant dose attenuation
   THEN: Conclude that the current attenuated dose is 75 percent of the previous dose.

RULE078

After trying all other methods to determine the current attenuated dose for all drugs:
   IF:   The blood counts do warrant dose attenuation
   THEN: Conclude that the current attenuated dose is the previous dose attenuated by the minimum of the dose attenuation due to low WBC and the dose attenuation due to low platelets.
10In keeping with the philosophy reflected in other systems we have designed, ONCOCIN is able to produce natural language explanations for its recommendations. See also the critiquing work of Langlotz and Shortliffe (1983).
11PAVe and MOPP are acronyms for two of the drug combinations used to treat Hodgkin's disease.
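The stylized rule format described above can be illustrated with a simple structure that supports both evaluation and English translation. The field names and the translate function below are hypothetical; ONCOCIN's actual rules are Interlisp structures, and the paraphrase simply follows the English translation of Rule 75 given in the text.

    # Illustrative sketch of a stylized rule that can be both manipulated and translated.
    RULE075 = {
        "context": "current attenuated dose for all drugs in MOPP or PAVe",
        "premise": [
            "this is the start of the first cycle after a cycle was aborted",
            "the blood counts do not warrant dose attenuation",
        ],
        "action": "the current attenuated dose is 75 percent of the previous dose",
        "kind": "normal",        # others might be marked "definitional" or "default"
    }

    def translate(rule):
        """Produce an English paraphrase from the stylized representation."""
        conditions = ", and ".join(f"{i + 1}) {c}" for i, c in enumerate(rule["premise"]))
        return f"IF: {conditions}\nTHEN: Conclude that {rule['action']}."

    print(translate(RULE075))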
35.4.3 Control
When a user specifies the task that ONCOCIN is to perform, the corresponding control block is invoked. This simply causes the steps in the control block to be taken in sequence (a minimal sketch of this interpretation loop follows the list). These steps may entail the following:
1. Fetching a data block, either by loading previously stored data or by requesting them from the user. This causes parameter values to be set, resulting in data-directed invocation of rules that use those parameters (and that apply in the current context).
2. Determining the value of a parameter. This causes goal-directed invocation of the rules that conclude the value of the parameter (and apply in the current context). Definitional rules are applied first, then the normal rules, and if no value has been found by these means, the default rules are tried. If a rule that is invoked in a goal-directed fashion uses some parameter whose value is not yet known, that parameter's value is determined so that the rule can be evaluated. In addition, concluding the value of any parameter, either by the action of rules or when information is entered by the user, may cause data-directed invocation of other rules.
3. Invoking another control block.
4. Calling a special-purpose function (which may be domain-dependent).
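The following is a minimal, hypothetical interpreter for control blocks of this kind. The step tags, handler names, and example block are invented simplifications of the design described above, not code from ONCOCIN.

    # Illustrative control-block interpreter: each step is taken in sequence.
    def run_control_block(block, steps, handlers):
        for kind, arg in steps[block]:
            if kind == "fetch-data-block":
                handlers["fetch"](arg)        # sets parameters; may trigger data-directed rules
            elif kind == "determine-parameter":
                handlers["determine"](arg)    # goal-directed invocation of rules
            elif kind == "invoke-control-block":
                run_control_block(arg, steps, handlers)
            elif kind == "call-function":
                handlers["call"](arg)         # special-purpose (possibly domain-dependent) code

    steps = {
        "therapy-visit": [("fetch-data-block", "laboratory tests"),
                          ("determine-parameter", "recommended dose"),
                          ("call-function", "print-recommendation")],
    }
    run_control_block("therapy-visit", steps,
                      {"fetch": print, "determine": print, "call": print})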
The effects of this control mechanism contrast with the largely backward-chained control used in MYCIN and other EMYCIN systems. Figure 35-2 shows the goal-oriented procedure used in EMYCIN. All invocation of rules results because the value of a specific parameter is being sought. Rules used to determine the value of that parameter can be referenced in any order, although ordering is maintained for the assessment of the parameters occurring in the conditional statements in each rule's premise.
[Figure 35-2: EMYCIN's goal-directed control structure. Key: P = find parameter; R = try rule; A = ask user; BC = backward chaining; FC = forward chaining.]
Antecedent (data-driven) rules are used when the user's response to a question, or (less commonly) the conclusion from another rule, triggers one of the system's forward-chained rules. These rules can only be used as antecedent rules, they typically have single conditions in their premises, and repeated forward chaining is permitted only if one rule concludes with certainty that the premise of another is true.
In ONCOCIN (Figure 35-3), on the other hand, initial control is derived from the control block invoked in response to the task selected by the user. Forward chaining and backward chaining of rules are intermingled,12 and any rule can be used in either direction.
12The broken line in Figure 35-3 outlines the portion of the ONCOCIN control structure that is identical to that found in EMYCIN (Figure 35-2).
[Figure 35-3: ONCOCIN's control structure. Key: CB = invoke control block; DB = fetch data block; R = try rule; P = find parameter's value; A = ask user; FC = forward chaining; BC = backward chaining.]
As shown here, the protocols often defer to the opinions of the attending physicians without providing guidelines on which they might base their decisions. Hence there is no standardization of responses to unusual problems, and the validity of the protocol analysis in these cases is accordingly subject to question. One goal is to develop approaches to these more complex problems that characterize the management of patients being treated for cancer. It is when these issues are addressed that the need for AI techniques is most evident and the task domain begins to look similar in complexity to the decision problems in a system like MYCIN. Rules will eventually have uncertainty associated with them (we have thus far avoided the need for certainty weights in the rules in ONCOCIN), and close collaboration with experts has been required in writing new rules that are not currently recorded in chemotherapy protocols or elsewhere. In addition, however, AI representation and control techniques have already allowed us to keep the knowledge base flexible and easily modified. They have also allowed us to develop explanation capabilities and to separate kinds of knowledge explicitly in terms of their semantic categories (Langlotz and Shortliffe, 1983; Tsuji and Shortliffe, 1983).
35.6 Conclusion
In summary, the project seeks to identify new techniques for bringing large AI programs to a clinical audience that would be intolerant of systems that are slow or difficult to use. The design of a novel interface that uses both custom hardware and efficient software has heightened the acceptability of ONCOCIN. Formal evaluations are underway to allow us to determine both the effectiveness and the acceptability of the system's clinical advice.
For the present we are trying to build a useful system to which increasingly complex decision rules can be added. We are finding, as expected, that the encoding of complex knowledge that is not already stated explicitly in protocols is arduous and requires an enthusiastic community of collaborating physicians. Hence we recognize the importance of one of our research goals noted earlier in this report: to establish an effective relationship with a specific group of physicians so as to facilitate ongoing research and implementation of advanced computer-based clinical tools.
PART TWELVE
Conclusions
36
Major Lessons from This
Work
reiterate the main goals that provide the context for the experimental
work;
discuss the experimental results from each of the major parts of the
book; and
summarize the key questions we have been asked, or have asked our-
selves, about the lessons we have learned.
36.1 Two Sets of Goals
36.2 Experimental Results
Although we were not always explicitly aware of the hypotheses our work
was testing, in retrospect a number of results can be stated as consequences
of the experiments performed. The nature of experiments in AI is not
well established. Yet, as we said in the preface, an experimental science
grows by experimentation and analysis of results. The experiments re-
ported here are not nearly as carefully planned as are, for example, clinical
trials in medicine. However, once some uncharted territory has been ex-
plored, it is possible to review the path taken and the results achieved.
Wehave used the phrase "MYCIN-likesystem" in many places to char-
acterize rule-based expert systems, and we have tried throughout the book
to say what these are. In summary, then, let us say what we mean by rule-
based systems. They are expert systems whose primary mode of represen-
tation is simple conditional sentences; they are extensions of production
systems in which the concepts are closer in grain size to concepts used by
experts than to psychological concepts. Rule-based systems are deductively
not as powerful as logical theorem-proving programs because their only
rule of inference is modus ponens and their syntax allows only a subset of
logically well-formed expressions to be clauses in conditional sentences.
Their primary distinction from logic-based systems is that rules define facts
in the context of how they will be used, while expressions in logic-based
systems are intended to define facts independently of their use.³ For ex-
ample, the rule A → B in a rule-based system asserts only that fact A is
evidence for fact B.
Rule-based systems are primarily distinguished from frame-based sys-
tems by their restricted syntax. The emphasis in a rule is on the inferential
relationship between facts (for example, "A is evidence for B" or "A causes
B"). In a frame the emphasis is on characterizing concepts by using links
of many types (including evidential relations).
Rule-based systems are sometimes characterized as "shallow" reasoning
systems in which the rules encode no causal knowledge. While this is largely
(but not entirely) true of MYCIN, it is not a necessary feature of rule-based
systems. An expert may elucidate the causal mechanisms underlying a set
of rules by "decompiling" the rules (see Section 29.3.2 for a discussion of
decompiling the knowledge on which the tetracycline rule is based). The
difficulties that one encounters with an expanded rule set are knowledge
engineering difficulties (construction and maintenance of the knowledge
base) and not primarily difficulties of representation or interpretation.
However, the causal knowledge thus encoded in an expanded rule set
would be usable only in the context of the inference chains in which it fits
and would not be as generally available to all parts of the reasoning system
as one might like. A circuit diagram and the theoretical knowledge under-
neath it, in contrast, can be used in many different ways.
Winston (1977) summarized the main features of MYCIN as follows:
While this is a reasonable summary of what the program can do, it stops
short of analyzing how the main features of MYCIN work or why they do
not work better. The analysis presented here is an attempt to answer those
questions. Not all of the experiments have positive results. Some of the
most interesting results are negative, occasionally counter to our initial
beliefs. Some experiments were conceived but never carried out. For ex-
ample, although it was explicitly our initial intention to implement and test
MYCIN on the hospital wards, this experiment was never undertaken.
Instead the infectious disease knowledge base was laid to rest in 1978⁴
despite studies demonstrating its excellent decision-making performance.
This decision reflects the unanticipated lessons regarding clinical imple-
mentation (described in Part Eleven) that would not have been realized
without the earlier work.
Finally, a word about the organization of this section on results. We
have described the lessons mostly from the point of view of what we have
learned about building an intelligent program. We were looking for ways
to build a high-performance medical reasoning program, and we made
many choices in the design of MYCIN to achieve that goal. For the program
itself, we had to choose (1) a model of diagnostic reasoning, (2) a repre-
sentation of knowledge, (3) a control structure for using that knowledge,
and (4) a model of how to tolerate and propagate uncertainty. We also had
to formulate (5) a methodology for building a knowledge base capable of
making good judgments. Our working hypothesis, then, was that the
choices we made were sufficient to build a program whose performance
was demonstrably good.⁵ If we had failed to demonstrate expert-level per-
formance, we would have had reason to believe that one or more of our
choices had been wrong. In addition, other aspects of the program were
⁴Much of the MYCIN-inspired work reported in this volume was done after this date, however.
⁵Note that sufficiency is a weak claim. We do not claim that any choice we made is necessary, nor do we claim that our choices cannot be improved.
also tested: (6) explanation and tutoring, (7) the user interface, (8) vali-
dation, (9) generality, and (10) project organization. The following
subsections review these ten aspects of the program and the environment
in which it was constructed.
or about 10⁹. Obviously, the method of evidence gathering does not gen-
erate all of them.
36.2.2 Representation
MYCIN is known partly for its model of inexact inference (the CF model),
a one-number calculus for propagating uncertainty through several levels
of inference from data to hypotheses. MYCIN's performance shows that,
for some problems at least, degrees of evidential support can be captured
adequately in a single number, and a one-number calculus can be devised
This was not done with meta-rules, however, because it could easily be handled by the preview mechanism and judicious use of screening clauses.
Although the CF model was originally based on separate concepts of belief and disbelief (as defined for MB and MD in Chapter 11), recall that even then the net belief is reflected in a single number and only one number is associated with each inferential rule.
[P(h|e) − P(h)] / [1 − P(h)]
A & B & C → A
Such a rule is saying, in effect, that if you already have reason to believe
A, and if B and C are likely in this case, then increase the importance of
A. In principle, we could have separated probabilities from utilities. In
practice, that would have required more precision than infectious disease
experts were willing or able to supply.
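To make the mechanics concrete, the sketch below is written in Python (the original system was written in Interlisp) and uses invented CF values. It shows how a self-referencing rule of the form A & B & C → A behaves: it can fire only after A has already accumulated some belief, and its effect is to strengthen A using the standard combining function for two positive CFs, x + y(1 − x).

    # Illustrative sketch only: not MYCIN code, and the CF values are invented.
    cf_a, cf_b, cf_c = 0.6, 0.8, 0.7        # current belief in clauses A, B, and C
    rule_cf = 0.3                            # strength of the rule  A & B & C -> A

    if all(cf > 0.2 for cf in (cf_a, cf_b, cf_c)):   # every clause is "true enough"
        # positive-CF combination: the rule closes rule_cf of the remaining doubt in A
        cf_a = cf_a + rule_cf * (1 - cf_a)           # 0.6 -> 0.72

    print(cf_a)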
A & B & C → D
if any clause is not "true enough," the subsequent clauses will not be pur-
sued. If clause A, after tracing, has not accumulated evidence over the 0.2
threshold then the system will not bother to ask about clauses B and C. In
brief, the threshold was invented for purposes of human engineering since
it shortens a consultation and reduces the number of questions asked of
the user.
This value of the threshold is arbitrary, of course. It should simply be
high enough to prevent the system from wasting its time in an effort to
use very small pieces of evidence. With a sick patient, there is a little evi-
dence for almost every disease, so the threshold also helps to avoid covering
for almost every possible problem. The threshold has to be low enough,
on the other hand, to be sure that important conclusions are considered.
Once the 0.2 threshold was chosen, CFs on rules were sometimes set with
it in mind. For example, two rules concluding Streptococcus, each at the
CF=0.1 level, would not be sufficient alone to include Streptococcus in the
list of possible causes to consider further.¹¹
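The arithmetic behind that example is easy to reproduce. The sketch below (Python, illustrative only, not MYCIN's code) combines the two CF = 0.1 rules with the positive-CF rule x + y(1 − x) and also shows how the 0.2 threshold cuts off evaluation of later premise clauses; the clause "evaluators" are hypothetical stand-ins for MYCIN's tracing of each clause.

    CF_THRESHOLD = 0.2   # a clause or hypothesis must be at least this "true" to be pursued

    # Two rules each conclude Streptococcus at CF = 0.1.  Combined with the
    # positive-CF rule x + y*(1 - x), that gives 0.1 + 0.1*0.9 = 0.19, which
    # stays below the threshold, so Streptococcus is not considered further.
    cf_strep = 0.1 + 0.1 * (1 - 0.1)
    print(cf_strep, cf_strep >= CF_THRESHOLD)        # 0.19 False

    def premise_holds(clause_evaluators):
        """Evaluate the clauses of a premise (A & B & C -> D) left to right.

        Each element is a callable that traces one clause and returns its CF.
        As soon as a clause is not "true enough," the remaining clauses are
        never traced, so the user is not asked needless questions about them.
        """
        for trace_clause in clause_evaluators:
            if trace_clause() < CF_THRESHOLD:
                return False
        return True

    # Clause A fails the threshold, so B and C (which would prompt questions)
    # are never evaluated at all.
    print(premise_holds([lambda: 0.15, lambda: 0.9, lambda: 0.8]))   # False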
Because we are not dealing with probabilities, or even with "pure"
strength of inference alone, our attempt to give a theoretical justification
for CFs was flawed. We based it on probability theory and tried to show
that CFs could be related to probabilities in a formal sense. Our desiderata
for the CF combining function were based on intuitions involving confir-
mation, not just probabilities, so it is not surprising, in retrospect, that the
justification in terms of formal probability theory is not convincing (see
Chapter 12). So the CF model must be viewed as a set of heuristics for
combining uncertainty and utility, and not as a calculus for confirmation
theory. As we noted in Chapter 13, the Dempster-Shafer theory of evi-
dence offers several potential advantages over CFs. However, simplifying
assumptions and approximations will be necessary to make it a computa-
tionally tractable approach.
In a deductive system the addition of new facts, as axioms, does not
change the validity of theorems already proved. In many interesting prob-
lem areas, such as medical diagnosis, however, new knowledge can invali-
date old conclusions. This is called nonmonotonic reasoning (McDermott
and Doyle, 1980) because new inferences are not always adding new con-
clusions monotonically to the accumulating knowledge about a problem.
In MYCIN, early conclusions are revised as new data are acquired--for
example, what looked like an infection of one type on partial evidence
looks like another infection after more evidence is accumulated. The prob-
lems of nonmonotonicity are mostly avoided, though, because MYCIN
gathers evidence for and against many conclusions, using CFs to adjust
the strength of evidence of each, and only decides at the end which con-
clusions to retain. As pointed out in Section 29.4.3, self-referencing rules
can change conclusions after all the evidence has been gathered and thus
may be considered a form of nonmonotonic reasoning.
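A small sketch (Python, hypothetical data, not MYCIN's own code) of the bookkeeping just described: evidence for and against each hypothesis is folded into a running net CF as it arrives, and no conclusion is selected until the end, so later evidence weakens earlier impressions rather than forcing retractions. The combining function shown is the standard MYCIN formula for same-sign and mixed-sign CFs.

    def combine(x, y):
        """Combine two CFs bearing on the same hypothesis (MYCIN-style)."""
        if x >= 0 and y >= 0:
            return x + y * (1 - x)
        if x <= 0 and y <= 0:
            return x + y * (1 + x)
        return (x + y) / (1 - min(abs(x), abs(y)))

    belief = {}                                   # hypothesis -> running net CF

    def record(hypothesis, cf):
        belief[hypothesis] = combine(belief.get(hypothesis, 0.0), cf)

    record("e.coli", 0.7)                         # early, partial evidence
    record("pseudomonas", 0.5)
    record("e.coli", -0.3)                        # later data weakens, never "retracts"

    # Only now, with all the evidence in, are the surviving conclusions ranked.
    print(sorted(belief.items(), key=lambda kv: -kv[1]))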
Data:
    Erroneous
    Incomplete
Rules:
    Erroneous (or only partly correct)
    Incomplete
Conceptual framework (domain-dependent and domain-independent parts):
    Incorrect vocabulary of attributes, predicates, and relations
    Incorrect inference structure
    Incomplete set of concepts
    Incomplete logical structure

FIGURE 36-1 Sources of uncertainty in rule-based systems.
with a conclusion, in this view, requires examining rules with similar evi-
dence or similar conclusions to see how strong the association should be,
relative to the others. For example, to set the CF on a new rule, A → Z,
one would look at other rules that conclude Z, such as X → Z (0.2) and
Y → Z (0.8). Then, if evidence A is about as strong as Y (0.8) and much
stronger than X (0.2), the new CF should be set around the 0.8 level. The
exchange of messages at the end of Chapter 10 reflects the controversy that arose in
our group over these two styles of CF assignment.
In both cases, the sensitivity analysis mentioned in Chapter 10 con-
vinced us that the rules we were putting into MYCIN were not dependent
on precise values of CFs. That realization helped persons writing rules to
see that they could be indifferent to the distinction between 0.7 and 0.8,
for example, and the system would not break down.
There are so-called clinical algorithms in medicine, but they do not carry the guarantees of correctness that characterize mathematical or computational algorithms. They are decision flow charts in which heuristics have been built into a branching logic so that paramedical personnel can use them to provide good care in many commonly occurring situations.
One of the major lessons of this and other work on expert systems is that
large knowledge bases must be built incrementally. In many domains, such
as medicine, the knowledge is not well codified, so it is to be expected that
the first attempts to build a knowledge base will result in approximations.
As noted earlier, incremental improvements require flexible knowledge
structures that allow easy extensions. This means not only that the syntax
should be relatively simple but that the system should allow room for
growth. Rapid feedback on the consequences of changes also facilitates
improvements. A knowledge base that requires extra compilation steps
before it can be tried (especially long ones) cannot grow easily or rapidly.
Knowledge acquisition is now seen as the critical bottleneck in building
expert systems. We came to understand through this work that the knowl-
edge-engineering process can be seen as a composite of three stages:
In each stage, the limiting factors are (a) the expressive power of the rep-
resentation, (b) the extent to which knowledge of the domain is already
well structured, (c) the ability of the expert to formulate new knowledge
based on past experience, (d) the power of the editing and debugging tools
available, and (e) the ability of the knowledge engineer to understand the
basic structure and vocabulary of the domain and to use the available tools
to encode knowledge and modify the framework.
Our experiments focus largely on the refinement stage.¹⁵ Within this
stage, the model that we have found most useful is that of debugging in
context; an expert can more easily critique a knowledge base and suggest
changes to it in the context of specific cases than in the abstract. Initial
formulations of rules are often too general since the conceptualization
stage appropriately demands generality. Such overgeneralizations can
often best be found and fixed empirically, i.e., by running cases and ex-
amining the program's conclusions.
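A sketch of what "debugging in context" amounts to in practice: keep a library of cases whose conclusions an expert has already approved, rerun them after every change to the knowledge base, and flag disagreements so that overgeneral rules surface empirically. The rule and case structures below are hypothetical simplifications in Python, not MYCIN's own representation or tooling.

    def run_consultation(rules, case_data):
        """Toy stand-in for the inference engine: the first matching rule wins."""
        for rule in rules:
            if all(case_data.get(attr) == value for attr, value in rule["premise"].items()):
                return rule["conclusion"]
        return "no-conclusion"

    def regression_check(rules, case_library):
        """Rerun approved cases and report any whose conclusion has drifted."""
        failures = []
        for case in case_library:
            got = run_consultation(rules, case["data"])
            if got != case["expected"]:
                failures.append((case["name"], case["expected"], got))
        return failures

    cases = [{"name": "case-1",
              "data": {"site": "csf", "gramstain": "gramneg"},
              "expected": "e.coli"}]
    rules = [{"premise": {"site": "csf"}, "conclusion": "pseudomonas"}]   # overgeneral rule
    print(regression_check(rules, cases))   # [('case-1', 'e.coli', 'pseudomonas')]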
One important limitation of our model is its failure to address the
problem of integrating knowledge from different experts. For some ex-
tensions to the knowledge base there is little difference between refinement
by one expert or many. For extensions in which different experts use dif-
ferent concepts (not just synonyms for the same concept), we have no tools
We do record the author of each rule with date, justification, and literature citations, but these are not used by the program except as text strings to be printed.
¹⁵More recent work by others at Stanford explores the use of knowledge-based techniques for inferring new medical knowledge from a large data base of patient information (Blum, 1982).
When we began this work, there had been little attempt in AI to provide
justifications of a program's conclusions because programs were mostly
used only by their designers. PARRY (Colby, 1981) had a selective trace
that allowed designers to debug the system and casual users to understand
its behavior. DENDRAL's Predictor also had a selective trace that could
explain the origins of predicted data points, but it was used only for de-
bugging. As part of our goal of making MYCIN acceptable to physicians,
we tried from the start to provide windows into the contents of the knowl-
edge base and into the line of reasoning. Our working assumption was
that physicians would not ask a computer program for advice if they had
to treat the program as an unexaminable source of expertise. They nor-
mally ask questions of, or consult, other physicians partly for education to
help with future cases and partly for clarification and understanding of
The Model
A1 & A2 & A3 → B
where A1 is already known (or believed) to be true. Then the user may ask
how A1 is known and will then see the rules that concluded it (or be told
that it is primary information entered at the terminal if no rules were used).
Similarly, the user may ask how A3 will be pursued if the condition re-
garding A2 is satisfied.
Explanations can be much richer. For example, they can provide in-
sights into the structure of the domain or the strategy behind the line of
reasoning. All of these extensions require more sophistication than is em-
bodied in looking up and down a history list. This is a minimal explanation capability.
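The sketch below shows roughly what such a minimal facility involves (hypothetical data structures in Python, not MYCIN's own): HOW answers are found by looking back through the rules already applied, or by reporting that a fact was typed in by the user; WHY answers come from the rule whose premise the system is currently trying to establish.

    # Hypothetical history of the consultation so far (not MYCIN's actual structures).
    history = [{"rule": "RULE545", "concluded": "A1"}]      # rules already applied
    user_supplied = {"site=blood"}                          # facts typed at the terminal
    current_goal = {"rule": "RULE092", "needs": ["A1", "A2", "A3"], "concludes": "B"}

    def how(fact):
        """Explain HOW a fact came to be believed (look back down the history)."""
        rules = [entry["rule"] for entry in history if entry["concluded"] == fact]
        if rules:
            return f"{fact} was concluded by {', '.join(rules)}"
        if fact in user_supplied:
            return f"{fact} is primary information entered at the terminal"
        return f"{fact} is not yet known"

    def why():
        """Explain WHY the current question is asked (look up at the pending rule)."""
        goal = current_goal
        return (f"to establish {', '.join(goal['needs'])} so that "
                f"{goal['rule']} can conclude {goal['concludes']}")

    print(how("A1"))          # A1 was concluded by RULE545
    print(how("site=blood"))  # primary information entered at the terminal
    print(why())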
Tutoring
We had initially assumed that physicians and students would learn about
infectious disease diagnosis and therapy by running MYCIN, especially if
they asked why and how. This mode of teaching was too passive, however,
to be efficient as a tutorial system, so we began to investigate a more active
tutor, GUIDON. The program has two parts: (a) the knowledge base used
by MYCIN,and (b) a set of domain-independent tutorial rules and pro-
cedures.
We originally assumed that a knowledge base that is sufficient for high-
performance problem solving would also be sufficient for tutoring. This
assumption turned out to be false, and this negative result spawned revi-
sions in our thinking about the underlying representation of MYCIN's
knowledge. We concluded that, for purposes of teaching, and for expla-
nation to novices, the facts and relations known to MYCIN are not well
enough grounded in a coherent model of medicine (Chapter 29). MYCIN's
knowledge is, in a sense, compiled knowledge. It performs well but is not
very comprehensible to students without the concepts that have been left
out. For example, a MYCIN rule such as
A → A1
A1 → A2
A2 → B
Consultation Model
¹⁶Our one attempt to permit volunteered information (Chapter 33) was of limited success, largely because of the complexity of getting a computer to understand free text.
¹⁷The ability to accept volunteered information is a major feature of the PROSPECTOR model of interaction embodied in KAS (Reboh, 1981).
usually), and the number increases as the knowledge base grows. Few phy-
sicians want to type answers to that many questions--in fact, few of them
want to type anything. With current technology, then, the consultation
model increases the cost of getting advice beyond acceptable limits. Clini-
cians would rather phone a specialist and discuss a case verbally. Moreover,
the consultation model sets up the program as an "expert" and leaves the
users in the undesirable position of asking a machine for help. In some
professions this may be acceptable, but in medicine it is difficult to sell.
One way to avoid the need for typing so many answers is to tap into
on-line patient data bases. Many of MYCIN's questions, for example, could
be answered by looking in automated laboratory records or (as PUFF now
does) could be gathered directly from medical instruments (Aikins et al.,
1983). Another way is to wait for advanced speech understanding and
graphical input.
The consultation model assumes a cooperative and knowledgeable
user. We attempted to make the system so robust that a user cannot cause
an unrecoverable error by mistake. But the designers of any knowledge
base still have to anticipate synonyms and strange paths through the rules
because we know of no safeguards against malice or ignorance. Some med-
ically impossible values are still not caught by MYCIN.¹⁸ If users are co-
operative enough to be careful about the medical correctness of what they
type, MYCIN's implementation of the consultation model is robust enough
to be helpful.
¹⁸For example, John McCarthy (maliciously) told MYCIN that the site of a culture was amniotic fluid--for a male patient--and MYCIN incorrectly accepted it (McCarthy, 1983). Nonmedical users (including one of the authors) have found similar "far-out bugs" as a consequence of sheer ignorance of medicine.
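The sketch below shows the kind of medical-consistency check that would have caught the "amniotic fluid for a male patient" input just described. The constraint table and field names are invented for illustration; nothing like this guard is claimed for MYCIN itself, which, as noted, accepted the value.

    # Hypothetical consistency check; not part of MYCIN.
    SITE_CONSTRAINTS = {
        "amniotic-fluid": lambda patient: patient["sex"] == "female",
    }

    def check_site(patient, site):
        """Reject a culture site that is inconsistent with what is known about the patient."""
        is_consistent = SITE_CONSTRAINTS.get(site, lambda p: True)
        if not is_consistent(patient):
            raise ValueError(f"site {site!r} is not medically possible for this patient")
        return site

    patient_538 = {"sex": "male"}
    try:
        check_site(patient_538, "amniotic-fluid")
    except ValueError as err:
        print(err)     # the impossible value is rejected instead of being accepted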
English Understanding
36.2.8 Validation
Decision-Making Performance
Acceptability
36.2.9 Generality
One of the most far-reaching sets of experiments in this work involved the
generalizability of the MYCIN representation scheme and inference en-
gine. We believed the skeletal program could be used for similar problem-
solving tasks in other domains, but no amount of analysis and discussion
The Data
EMYCIN was designed to analyze a static collection of data. The data may
be incomplete, interdependent, incorrect ("noisy"), and even inconsistent.
A system built in EMYCIN can, if the knowledge base is adequate, resolve
ambiguities and cope with uncertainty and imprecision in the data. EMY-
CIN does assume, however, that there is only one set of data to analyze
and that new data will not arrive later from experiments or monitoring.
The number of elements of data in the set has been small--roughly 20-
100--in the cases analyzed by MYCIN and other EMYCIN systems. But
there seems to be no reason why more data cannot be accepted.
Reasoning Processes
Knowledge Base
Solutions
Funding
Funding for the research presented here was not easy to find because of
the duality of goals mentioned above. Clinically oriented agencies of the
government were looking for fully developed programs that could be sent
to hospitals, private practices, military bases, or space installations. They
saw the initial demonstration with bacteremia as a sign that ward-ready
programs could be distributed as soon as knowledge of other infections
was added to MYCIN. And they seemed to believe that transcribing sen-
tences from textbooks into rules would produce knowledge bases with clin-
ical expertise. Other funding agencies recognized that research was still
required, but we failed to convince them that both medical and AI research
were essential. We felt that the kinds of techniques we were using could
help codify knowledge about infectious diseases and could help define a
consensus position on issues about which there are differences of medical
opinion. But we also felt that the AI techniques themselves needed analysis
and extension before they could be used for wholesale extensions to med-
ical knowledge. More generally, we saw medicine as a difficult real-world
domain that is typical of many other domains. Failing to find an agency
that would support both lines of activity, we submitted separate proposals
for the dual lines. After the initial three years of NIH support for MYCIN,
only the AI line was funded by the NSF, ONR, and DARPA (in the efforts
that produced EMYCIN, GUIDON, and NEOMYCIN). By 1977 our med-
ical collaborators were in transition for other reasons anyway, so we largely
stopped developing the infectious disease knowledge base.²⁰
Technology Transfer
²⁰That is not to say, however, that all medical efforts stopped. Shortliffe rejoined the project in 1979 and began defining and implementing ONCOCIN. Clancey needed to reformulate MYCIN's knowledge base in a form more suitable for tutoring (NEOMYCIN) and enlisted the help of Dr. Tim Beckett. Several medical problem areas were investigated and prototype systems were built using EMYCIN. These include pulmonary function testing (PUFF), blood clotting disorders (CLOT), and complications of pregnancy (GRAVIDA). And several master's and doctoral students have continued to use medicine as a test-bed for ideas in AI and decision making, causal reasoning, representation, and learning. Several projects undertaken after 1977 are included in the present volume.
operating systems. Since hospital wards and physicians' offices do not have
access to the same equipment that computer science laboratories do, we
would have had to rewrite this large and complex system in another lan-
guage to run on smaller machines. We were not motivated to undertake
this task. Now, however, smaller, cheaper machines are available that do
run Interlisp and other dialects of LISP, so technology transfer is much
more feasible than when MYCIN was written.
Stability
We were fortunate with MYCIN in finding stability in (a) the goals of the
project, (b) the code, and (c) the system environment.
The group of researchers defining the MYCIN project changed as
students graduated, as interests changed, and as career goals took people
out of our sphere. Shortliffe, Buchanan, Davis, Scott, Clancey, Fagan, Aik-
ins, and van Melle formed a core group, however, that maintained a certain
continuity. Even with a fluid group, we found stability in the overall goal
of trying to build an AI system with acknowledged medical expertise.
Those who felt this was too narrow a goal moved on quickly, while others
found this sharp focus to be an anchor for defining their own research.
Another anchor was the code itself. Much of any individual's code is
opaque to others, and MYCIN contains its share of "patches" and "hacks."
Yet because the persons writing code felt a responsibility to leave pieces of
program that could be maintained and modified by others, the program-
ming practices of most of the group were ecologically sound.²¹ Finally, the
stability of Interlisp, TENEX, and the SUMEX-AIM facility contributed
greatly to our ability to build a system incrementally. Without this outside
support, MYCIN could not have expanded in an orderly fashion and we
would have been forced to undertake massive rewrites just to keep old
code running.
We realize that a book of this size, describing several experiments that are
interrelated in complex and sometimes subtle ways, may leave the reader
asking exactly what has been learned by the research and what lessons can
be borrowed by others already working in the field or about to enter it.
This final chapter has attempted to summarize those lessons, but we feel
the need to close with a brief list of frequently asked questions and our
answers to them. The responses are drawn from the work described in
earlier chapters but are also colored by our familiarity with other work in
AI (particularly research on expert systems). Despite the brevity and sim-
plicity of the questions and answers, we feel that they do summarize the
key lessons learned in the MYCIN experiments. For those readers who like
to start at the end when deciding whether or not to read a book, we hope
that the list will pique their curiosity and motivate them to start reading
from the beginning.
Appendix: Annotated
Typescript of a Sample
MYCIN Consultation
Instructions? (Y or N)
** YES
This is a computer program named MYCIN that has been designed to advise you regarding an appropriate choice of infectious disease therapy. It is understood that you have a patient who may have an infection. Please answer the following questions, terminating each response with RETURN. To correct typing errors, use the DELETE key to delete single characters, <ctrl>W to delete a word, and <ctrl>Q to delete the whole line.
If you are not certain of your answer, you may modify the response by inserting a certainty factor (a number from 1 to 10) in parentheses after your response. Absolute certainty (10) is assumed for every unmodified answer. It is likely that some of the following questions cannot be answered with certainty.
You may change an answer to a previous question in two ways. If the program is waiting for a response from you (that is, has typed **), enter CHANGE followed by the number(s) of the question(s) whose answers will be altered. You may also change a previous answer at any time (even when the program is not waiting for a response from you) by typing <ctrl>F (Fix), which will cause the program to interrupt its computation and ask what you want to change. (If the response to <ctrl>F is not immediate, try typing the RETURN key in addition.) Try to avoid going back because the process requires reconsidering the patient from the beginning and therefore may be slow.
Note that you may also enter UNK (for UNKnown) if you do not know the answer to a question, ? if you wish to see a more precise definition of the question or some examples of recognized responses, ?? if you want to see all recognized responses, the word RULE if you would like to see the decision rule which has generated the question being asked, the word WHY if you would like to see a more detailed explanation of the question, or the letters QA if you would like to interrupt the consultation in order to ask questions regarding the decisions made so far in the consultation. If you are ever puzzled about what options are available to you during a consultation, enter the word HELP and a list of options will be listed for you.
SUMMARY:
(type ctrl-O to abort printout)
UNK - Answer not known.
? - Rephrases the question and gives examples of recognized responses.
?? - Prints a list of all recognized responses.
RULE - Prints the current decision rule.
QA - Program enters question-answering mode.
CHANGE # - Go back and re-request answer to question number #.
COMMENT - Enter comments about any aspect of the system's performance. Your comments will be forwarded to those in charge of the MYCIN program.
WHY - Gives high-level explanation of the current reasoning chain that provoked this question.
HOW # - Explains HOW the system will achieve a goal referred to by number # in a previous explanation.
EXPLAIN - Provides a more detailed explanation of a previous answer given by a WHY command.
FORGET - Resets the explanation of the reasoning chain back to lowest level, as if you never used the WHY/EXPLAIN commands.
STOP - Halts the program without completing the consultation.
SAVE - Halts the program, saving the current patient on a disk file, retrievable at a later date.
HELP - Prints this list.
6) ** CSF 1234 30-Jan-77 YES
stain of smear    morph    [MYCIN follows up on the stain of the smear and the morphology.]
[type DONE if no organism was seen on the smear]
7) ** DONE
[No information about stain or morphology is available. MYCIN now follows up question 6, asking about other positive cultures.]
site    cul #    date coll.    smear examined?    coll. method
8) ** DONE    [There are no more positive cultures.]
[i.e. WHY is it important to determine whether PATIENT-538 has been seriously burned?]
[4.0] This will aid in determining the organisms (other than those seen on cultures or smears) which might be causing the infection.
It has already been established that
[4.1] the infection which requires therapy is meningitis, and
[4.2] organisms were not seen on the stain of this pending CSF culture, and
[4.3] the type of the infection is bacterial
Therefore, if
[4.4] pt538 has been seriously burned
then
there is suggestive evidence (.5) that pseudomonas-aeruginosa is one of the organisms (other than those seen on cultures or smears) which might be causing the infection
Author: Yu
[RULE578]
[back to question 39...]
** WHY    [In this case, the user asks another WHY question, backing up the subgoal tree.]
[i.e. WHY is it important to determine the organisms (other than those seen on cultures or smears) which might be causing the infection?]
[5.0] ... in order to determine the therapeutic regimen of PATIENT-538.
If
[5.1] information has been gathered about organisms isolated from the patient, organisms noted on smears of the patient, negative cultures of the patient, suspected infections without microbiological evidence, current drugs of the patient and prior drugs of the patient, and
[5.2] an attempt has been made to deduce the organisms which require therapy, and
[5.3] you have given consideration to organisms (other than those seen in cultures or smears) which might be present, and found at least one organism which requires therapy,
then
determine the best therapy recommendation from among the drugs likely to be effective against the significant organisms
[RULE092]
[back to question 39...]
** y
[MYCIN now prints its intermediate conclusions about the likely causes of the infection.]
I have found evidence (RULE545 RULE557) that treatment should cover for E.coli causing meningitis.
I have found evidence (RULE545 RULE557 RULE578) that treatment should cover for Pseudomonas-aeruginosa causing meningitis.
I have found evidence (RULE545 RULE557) that treatment should cover for Klebsiella-pneumoniae causing meningitis.
I have found evidence (RULE507) that treatment should cover for Diplococcus-pneumoniae causing meningitis.
[Now MYCIN begins to ask questions related to appropriate therapy.]
40) Does Pt538 have focal neurological signs?
** UNKNOWN
[Some questions are motivated by the organism under consideration. Thus the following lines sometimes provide useful contextual information to a user.]
[Determining which drugs are desirable for use against the E.coli...]
[Determining which drugs are desirable for use against the Pseudomonas-aeruginosa...]
[Determining which drugs are desirable for use against the Klebsiella-pneumoniae...]
[Determining which drugs are desirable for use against the Diplococcus-pneumoniae...]
My therapy recommendations will be designed to treat for organisms that either are very likely or, although less likely, would have a significant effect on therapy selection if they were present. It is important to cover for the following probable infection(s) and associated organism(s):
INFECTION-1 is MENINGITIS
+ <ITEM-1> E.COLI [from clinical evidence only]
+ <ITEM-2> PSEUDOMONAS-AERUGINOSA [from clinical evidence only]
+ <ITEM-3> KLEBSIELLA-PNEUMONIAE [from clinical evidence only]
+ <ITEM-4> DIPLOCOCCUS-PNEUMONIAE [from clinical evidence only]
41) Does Pt538 have a clinically significant allergic reaction to any antimicrobial agent?
** NO
[Considering AMPICILLIN-AND-GENTAMICIN for use against INFECTION-1...]
42) Do you have reason to suspect that Pt538 may have impaired renal function?
** NO
43) Pt538's weight in kilograms (or <number> POUNDS):
** 70
[Finally, MYCIN prints the best therapy it has found to maximize the chances of effective coverage of the cause(s) of the infection while minimizing the number of drugs, their toxicity, and side effects.]
[REC-1] My preferred therapy recommendation is as follows:
In order to cover for Items <1 2 3 4>:
Give the following in combination:
1) AMPICILLIN
Dose: 3.5g (28.0 ml) q4h IV [calculated on basis of 50 mg/kg]
2) GENTAMICIN
Dose: 119 mg (3.0 ml, 80mg/2ml ampule) q8h IV [calculated on basis of 1.7 mg/kg] plus consider giving 5 mg q24h Intrathecal
Comments: Monitor serum concentrations
Since high concentrations of penicillins can inactivate aminoglycosides, do not mix these two antibiotics in the same IV bottle.
RECOMMENDATION-2 is equivalent to RECOMMENDATION-1.
Would you like to see details of this comparison?
** YES
Looking at the coverage of the most likely items, we see that:
RECOMMENDATION-1 consists of 1 first choice drug (AMPICILLIN for ITEM-1 and ITEM-4) and 1 third choice drug (GENTAMICIN for ITEM-1, ITEM-2 and ITEM-3)
RECOMMENDATION-2 consists of 1 first choice drug (PENICILLIN for ITEM-4) and 1 third choice drug (GENTAMICIN for ITEM-2, ITEM-1 and ITEM-3)

ORGANISMS      RECOMMENDATION-1                        RECOMMENDATION-2
most likely    Drug -- Choice Ranking                  Drug -- Choice Ranking
ITEM-1         AMPICILLIN-AND-GENTAMICIN -- 1st        GENTAMICIN -- 3rd
ITEM-2         GENTAMICIN -- 2nd                       GENTAMICIN -- 2nd
ITEM-3         GENTAMICIN -- 3rd                       GENTAMICIN -- 3rd
ITEM-4         AMPICILLIN -- 1st                       PENICILLIN -- 1st

(The desirability of a drug is defined to be its lowest ranking for the items it covers.)
Do you wish to see the next choice therapy?
** NO
** JUSTIFICATION
** LITERATURE
** WHY DID YOU TREAT E.COLI?
ITEM-1 is E.coli associated with meningitis. I decided to treat ITEM-1 on the basis of clinical evidence only.
The following rules concluded about: the likelihood that e.coli is one of the organisms (other than those seen on cultures or smears) which might be causing the infection
RULE545 was used to conclude that e.coli is one of the organisms (other than those seen on cultures or smears) which might be causing the infection (.74). The last question asked before the conclusion was made was 35.
RULE557 was used to conclude that e.coli is one of the organisms (other than those seen on cultures or smears) which might be causing the infection (.93). The last question asked before the conclusion was made was 37.
** HOW DO YOU USE THE TYPE OF THE INFECTION?
The following rules use: the type of the infection
154, 578, 574, 572, 570, 565, 564, 559, 557, 553, 551, 549, 547, 545, 543, 541, 539, 537, 535, 533, 531, 529, 523, 520, 519, 517, 515, 513, 511, 507, 300, 289, 288, 287, 280, 174, 167
The parameter most frequently concluded by these rules is: the organisms (other than those seen on cultures or smears) which might be causing the infection
Would you like a summary of ALL the parameters concluded by these rules?
** NO
Which of the rules do you wish to see?
** 280
RULE280
A.4 Therapy Recommendation Consultation Session
2) CHLORAMPHENICOL
Dose: 443 mg (4.4 ml) q6h IV [calculated on basis of 25 mg/kg]
Type HELP for the list of question options.
** JUSTIFICATION
** LITERATURE
[MYCIN can compare its recommendation against one that the user would like to consider.]
** REC
For each item in turn, enter the drug you would have prescribed in RECOMMENDATION-1. In order to minimize the number of drugs in your recommendation, you may not want to prescribe therapy for every item. Items which represent the most likely organisms are indicated with a plus sign (+).
+ ITEM-1--the virus **
+ ITEM-2--the hemophilus-influenzae ** CARBENICILLIN
+ ITEM-3--the neisseria-meningitidis ** CARBENICILLIN
[Checking for contraindications...]
[Considering CARBENICILLIN for use against INFECTION-1...]
[No contraindications found...]
[Now comparing your prescription to MYCIN's...]

ORGANISMS        MYCIN's regimen                             Your regimen
"most likely"    Drug -- Choice Ranking                      Drug -- Choice Ranking
ITEM-2           AMPICILLIN-AND-CHLORAMPHENICOL -- 1st       CARBENICILLIN -- 3rd
ITEM-3           AMPICILLIN -- 1st                           CARBENICILLIN -- 2nd

(The desirability of a drug is defined to be its lowest ranking for the items it covers.)
[Giving one drug is often preferable to giving two. In this case, however, the one would not be as effective.]
Your recommendation for the most likely organism(s) is not optimal, since your prescription of 1 third choice drug (CARBENICILLIN for ITEM-3 and ITEM-2) should be avoided.
[You may refer to your regimen as RECOMMENDATION-2 in later questions.]
** STOP
References
Barr, A., Beard, M., and Atkinson, R. C. 1976. The computer as a tutorial
laboratory: The Stanford BIP project. International Journal of Man-
Machine Studies 8: 567-596.
Bartlett, F. C. 1932. Remembering: A Study in Experimental and Social Psy-
chology. Cambridge, U.K.: Cambridge University Press.
Bennett, J. S. 1983. ROGET: A knowledge-based consultant for acquiring
the conceptual structure of an expert system. Report no. HPP-83-24,
Computer Science Department, Stanford University.
Bennett, J. S., and Goldman, D. 1980. CLOT: A knowledge-based consul-
tant for bleeding disorders. Report no. HPP-80-7, Computer Science
Department, Stanford University.
Bennett, J. S., and Hollander, C. R. 1981. DART: An expert system for
computer fault diagnosis. In Proceedings of the 7th International Joint
Conference on Artificial Intelligence (Vancouver, B.C.), pp. 843-845.
Bennett, J. S., Creary, L., Engelmore, R., and Melosh, R. 1978. SACON:
A knowledge-based consultant for structural analysis. Report no. HPP-
78-23, Computer Science Department, Stanford University.
Bischoff, M., Shortliffe, E. H., Scott, A. C., Carlson, R. W., and Jacobs, D.
1983. Integration of a computer-based consultant into the clinical set-
ting. In Proceedings of the 7th Symposium on Computer Applications in Med-
ical Care (Baltimore, MD), pp. 149-152.
Blum, B. I., Lenhard, R., and McColligan, E. 1980. Protocol directed pa-
tient care using a computer. In Proceedings of the 4th Symposium on Computer
Applications in Medical Care (Washington, D.C.), pp. 753-761.
Blum, R. L. 1982. Discovery and representation of causal relationships
from a large time-oriented clinical database: The RX project. Ph.D.
dissertation, Stanford University. (Also in Computers and Biomedical Re-
search 15: 164-187.)
Bobrow, D. G. 1968. Natural language input for a computer problem-
solving system. In Semantic Information Processing, ed. M. Minsky, pp.
146-226. Cambridge, MA: MIT Press.
Bobrow, D. G., and Winograd, T. 1977. An overview of KRL, a knowledge
representation language. Cognitive Science 1: 3-46.
Bobrow, D. G., Kaplan, R. M., Kay, M., Norman, D., Thompson, H., and
Winograd, T. 1977. GUS:A frame-driven dialog system. Artificial In-
telligence 8: 155-173.
Bobrow, R. J., and Brown, J. S. 1975. Systematic understanding: Synthesis,
analysis and contingent knowledge in specialized understanding sys-
tems. In Representation and Understanding: Studies in Cognitive Science,
eds. D. G. Bobrow and A. Collins, pp. 103-129. New York: Academic
Press.
Bonnet, A. 1981. LITHO: An expert system for lithographic analysis. In-
ternal working paper, Schlumberger Corp., Paris, France.
Boyd, E. 1935. The Growth of the Surface Area of the Human Body. Minne-
apolis: University of Minnesota Press.
References 719
Buchanan, B. G., Mitchell, T. M., Smith, R. G., and Johnson, C. R., Jr.
1978. Models of learning systems. In Encyclopedia of Computer Science
and Technology 11, ed. J. Belzer, pp. 24-51. New York: Marcel Dekker.
Bullwinkle, C. 1977. Levels of complexity in discourse for anaphora dis-
ambiguation and speech act interpretation. In Proceedings of the 5th
International Joint Conference on Artificial Intelligence (Cambridge, MA),
pp. 43-49.
Burton, R. R. 1976. Semantic grammar: An engineering technique for
constructing natural language understanding systems. Report no.
3453, Bolt Beranek and Newman.
1979. An investigation of computer coaching for informal learning activities. International Journal of Man-Machine Studies 11: 5-24.
Burton, R. R., and Brown, J. S. 1982. An investigation of computer coaching for informal learning activities. In Intelligent Tutoring Systems, eds. D. Sleeman and J. S. Brown, pp. 79-98. New York: Academic Press.
Carbonell, J. R. 1970a. AI in CAI: An artificial-intelligence approach to computer-assisted instruction. IEEE Transactions on Man-Machine Systems MMS-11: 190-202.
--. 1970b. Mixed-initiative man-computer instructional dialogues. Report no. 1971, Bolt Beranek and Newman.
Carbonell, J. R., and Collins, A. M. 1973. Natural semantics in artificial intelligence. In Advance Papers of the 3rd International Joint Conference on Artificial Intelligence (Stanford, CA), pp. 344-351.
Carden, T. S. 1974. The antibiotic problem (editorial). New Physician 23: 19.
Carnap, R. 1950. The two concepts of probability. In Logical Foundations of Probability, pp. 19-51. Chicago: University of Chicago Press.
--. 1962. The aim of inductive logic. In Logic, Methodology, and Philos-
ophy of Science, eds. E. Nagel, P. Suppes, and A. Tarski, pp. 303-318.
Stanford, CA: Stanford University Press.
Carr, B., and Goldstein, I. 1977. Overlays: A theory of modeling for CAI.
Report no. 406, Artificial Intelligence Laboratory, Massachusetts In-
stitute of Technology.
Chandrasekaran, B., Gomez, F., Mittal, S., and Smith, J. 1979. An approach
to medical diagnosis based on conceptual schemes. In Proceedings of
the 6th International Joint Conference on Artificial Intelligence (Tokyo), pp.
134-142.
Charniak, E. 1972. Toward a model of children's story comprehension.
Report no. AI TR-266, Artificial Intelligence Laboratory, Massachu-
setts Institute of Technology.
--. 1977. A framed painting: The representation of a commonsense knowledge fragment. Journal of Cognitive Science 1(4): 355-394.
--. 1978. With a spoon in hand this must be the eating frame. In Proceedings of the 2nd Conference on Theoretical Issues in Natural Language Processing, pp. 187-193.
Grosz, B. 1977. The representation and use of focus in a system for un-
derstanding dialogs. In Proceedings of the 5th International Joint Confer-
ence on Artificial Intelligence (Cambridge, MA), pp. 67-76.
Gustafson, D. H., Kestly, J. J., Greist, J. H., and Jensen, N. M. 1971. Initial
evaluation of a subjective Bayesian diagnostic system. Health Services
Research 6: 204-213.
Harré, R. 1970. Probability and confirmation. In The Principles of Scientific
Thinking, pp. 157-177. Chicago: University of Chicago Press.
Hartley, J., Sleeman, D., and Woods, E 1972. Controlling the learning of
diagnostic tasks. International Journal of Man-Machine Studies 4: 319-
340.
Hasling, D. W., Clancey, W. J., and Rennels, G. D. 1984. Strategic explanations
for a diagnostic consultation system. International Journal of Man-Ma-
chine Studies: forthcoming.
Hayes-Roth, F., and McDermott, J. 1977. Knowledge acquisition from
structural descriptions. In Proceedings of the 5th International Joint Con-
ference on Artificial Intelligence (Cambridge, MA), pp. 356-362.
Hayes-Roth, F., Waterman, D., and Lenat, D. (eds.). 1983. Building Expert
Systems. Reading, MA: Addison-Wesley.
Hearn, A. C. 1971. Applications of symbol manipulation in theoretical
physics. Communications of the Association for Computing Machinery 14(8):
511-516.
Heiser, J. F., Brooks, R. E., and Ballard, J. P. 1978. Progress report: A
computerized psychopharmacology advisor (abstract). In Proceedings
of the 11th Collegium Internationale Neuro-Psychopharmacologicum (Vi-
enna), p. 233.
Helmer, O., and Rescher, N. 1960. On the epistemology of the inexact
sciences. Report no. R-353, Rand Corporation.
Hempel, C. G. 1965. Studies in the logic of confirmation. In Aspects of
Scientific Explanation and Other Essays in the Philosophy of Science, pp. 3-
51. New York: Free Press.
Hendrix, G. G. 1976. The Lifer manual: A guide to building practical
natural language interfaces. Report no. 138, Artificial Intelligence
Center, Stanford Research Institute.
--. 1977. A natural language interface facility. SIGART Newsletter 61: 25-26.
Hewitt, C. 1972. Description and theoretical analysis (using schemata) of
PLANNER: A language for proving theorems and manipulating
models in a robot. Ph.D. dissertation, Massachusetts Institute of Tech-
nology.
Hewitt, C., Bishop, P., and Steiger, R. 1973. A universal modular ACTOR
formalism for artificial intelligence. In Advance Papers of the 3rd Inter-
national Joint Conference on Artificial Intelligence (Stanford, CA), pp.
235-245.
--. 1979. Decision analysis: A look at the chief complaint. New England
Journal of Medicine 300: 556.
Schwartz, W. B., Gorry, G. A., Kassirer, J. P., and Essig, A. 1973. Decision
analysis and clinical judgements. American Journal of Medicine 55: 459-
472.
Scott, A. C., Clancey, W. J., Davis, R., and Shortliffe, E. H. 1977. Expla-
nation capabilities of knowledge-based production systems. American
Journal of Computational Linguistics Microfiche 62. (Appears as Chapter
18 of this volume.)
Scragg, G. W. 1975a. Answering process questions. In Advance Papers of the
4th International Joint Conference on Artificial Intelligence (Tbilisi, USSR),
pp. 435-442.
--. 1975b. Answering questions about processes. In Explorations in Cog-
nition, eds. D. A. Norman and D. E. Rumelhart. San Francisco: Free-
man.
Selfridge, O. 1959. Pandemonium: A paradigm for learning. In Proceedings
of the Symposium on Mechanisation of Thought Processes, pp. 511-529.
Teddington, U.K.: National Physical Laboratory.
Shackle, G. L. S. 1952. Expectation in Economics. Cambridge, U.K.: Cam-
bridge University Press.
--. 1955. Uncertainty in Economics and Other Reflections. Cambridge,
U.K.: Cambridge University Press.
Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton, NJ: Princeton
University Press.
Shortliffe, E. H. 1974. MYCIN: A rule-based computer program for advising physicians regarding antimicrobial therapy selection. Ph.D. dissertation, Stanford University. (Reprinted with revisions as Shortliffe, 1976.)
--. 1976. Computer-Based Medical Consultations: MYCIN. New York: American Elsevier.
--. 1980. Consultation systems for physicians: The role of artificial intelligence techniques. In Proceedings of the 3rd National Conference of the Canadian Society for Computational Studies of Intelligence (Victoria, B.C.), pp. 1-11. (Also in Readings in Artificial Intelligence, eds. B. Webber and N. Nilsson, pp. 323-333. Menlo Park, CA: Tioga Press, 1981.)
--. 1982a. Computer-based clinical decision aids: Some practical considerations. In Proceedings of the AMIA Congress 82 (San Francisco, CA), pp. 295-298.
--. 1982b. The computer and medical decision making: Good advice is not enough (guest editorial). IEEE Engineering in Medicine and Biology Magazine 1(2): 16-18.
Shortliffe, E. H., and Buchanan, B. G. 1975. A model of inexact reasoning in medicine. Mathematical Biosciences 23: 351-379.
Shortliffe, E. H., and Davis, R. 1975. Some considerations for the implementation of knowledge-based expert systems. SIGART Newsletter 55: 9-12.
van Melle, W., Scott, A. C., Bennett, J. S., and Peairs, M. 1981. The EMYCIN manual. Report no. HPP-81-16, Computer Science Department, Stanford University.
Waldinger, R., and Levitt, K. N. 1974. Reasoning about programs. Artificial Intelligence 5: 235-316.
Warner, H. R., Toronto, A. F., Veasey, L. G., and Stephenson, R. 1961. A mathematical approach to medical diagnosis: Application to congenital heart disease. Journal of the American Medical Association 177(3): 177-183.
Warner, H. R., Toronto, A. F., and Veasy, L. G. 1964. Experience with Bayes theorem for computer diagnosis of congenital heart disease. Annals of the New York Academy of Sciences 115: 2.
Waterman, D. A. 1970. Generalization learning techniques for automating the learning of heuristics. Artificial Intelligence 1: 121-170.
--. 1974. Adaptive production systems. Complex Information Processing Working Paper, Report no. 285, Psychology Department, Carnegie-Mellon University.
--. 1978. Exemplary programming. In Pattern-Directed Inference Systems, eds. D. A. Waterman and F. Hayes-Roth, pp. 261-280. New York: Academic Press.
Weiner, J. L. 1979. The structure of natural explanation: Theory and application. Report no. SP-4305, System Development Corporation.
--. 1980. BLAH: A system which explains its reasoning. Artificial Intelligence 15: 19-48.
Weiss, C. F., Glazko, A. J., and Weston, J. K. 1960. Chloramphenicol in the newborn infant. New England Journal of Medicine 262: 787-794.
Weiss, S. M., Kulikowski, C. A., Amarel, S., and Safir, A. 1978. A model-
based method for computer-aided medical decision-making. Artificial
Intelligence 11: 145-172.
Weizenbaum, J. 1967. Contextual understanding by computers. Communications of the Association for Computing Machinery 10(8): 474-480.
--. 1976. Computer Power and Human Reason: From Judgment to Calculation. San Francisco: Freeman.
Wilson, J. V. K. 1956. Two medical texts from Nimrud. IRAQ 18: 130-146.
--. 1962. The Nimrud catalogue of medical and physiognomical omina. IRAQ 24: 52-62.
Winograd, T. 1972. Understanding natural language. Cognitive Psychology 3: 1-191.
--. 1975. Frame representations and the procedural/declarative controversy. In Representation and Understanding: Studies in Cognitive Science, eds. D. G. Bobrow and A. Collins, pp. 185-210. New York: Academic Press.
--. 1977. A framework for understanding discourse. Report no. AIM-297, Artificial Intelligence Laboratory, Stanford University.
ABEL, 387 Brown, J. S., 133, 134, 141, Cooper, G., 218, 335, 582,
Abelson, R., 615, 617 173,456, 457,459, 465, 6O3
ACT, 46 468, 470, 478, 5(36, 551, Cronbach, L. J., 639
ACTORS, 520 552, 555 CRYSALIS, 563
Adams, J. B., 214, 263 Bruce, B. C., 471 Cullingford, R., 615
ADVICE TAKER, 670 Buchanan, B. G., 8-9, 50, Cumberbatch, J., 263
AGE, 394f 92, 149, 153, 174, 201,
Aiello, N., 394-395 210, 221ff, 233, 263- Dambola, J., 16
Aikins, J. S., 19, 157,221ff, 271,302,383,455,507, DART, 11, 312
312, 392,424, 441,561, 525, 561,562,589, 699 Davis, R., 10, 18, 20, 42, 48,
565, 692, 699 Bullwinkle, C., 624 51, 171-205,326, 333,
AI/MM, 335, 396 Burton, R. R., 173, 456, 469, 338, 348, 355, 383, 396,
ALGOL,27, 39, 40, 41, 47 478, 621 464, 475, 477, 493ff,
Allen, J., 6 505, 506, 507, 520, 524,
AM, 505, 561,563, 564 CADUCEUS(see 528, 533f, 539, 564, 576,
Anderson, J., 21, 41, 46 INTERNIST) 699
Anthony, J. R., 364 Campbell, B., 653 Day, E., 635
ARL, 152, 307, 324-325, Carbonell, J. R., 9, 55, 199, De Dombal, E T., 263
687 200, 331,377,455,469 De Finetti, B., 241
Carden, T. S., 16 DeJong, G., 615
Armstrong, R., 590
Carnap, R., 242-244 De Kleer, J., 468, 505
Axline, S., 8-10, 55, 209,
571,599 Carr, B., 456, 468, 469, 479 Delfino, A. B., 462
CASNET, 506 DENDRAL, 8, 11, 23, 25ff,
Catanzarite, V., 500 29, 32, 37, 39, 46f, 49,
BAIL, 173 55f, 149, 151, 171,209,
BAOBAB, 11,602,613-634 CENTAUR,11, 19, 392-
461-462,494, 506, 562,
Barker, S. E, 244 394, 424-440, 444, 561,
671,674, 676f, 687f,
Barnett, G. O., 234-236 562,565,676, 679
692, 700
Barnett, H. L., 365 Chandrasekaran, B., 275
Dempster, A., 215, 272
Barnett, J. A., 272, 288, 292, Charniak, E., 6, 615, 619, Deutsch, B. G., 471
5O5 625
Ditlove, J., 364
Barr, A., 153 Chi, M., 456 Doyle, J., 682
Bartlett, E C., 613 CHRONICLER, 138-146 Duda, R. O., 3, 55, 211,214,
BASIC, 394 Ciesielski, V., 153 374, 392,425, 505
Beckett, T., 461,698 Clancey, W. J., 19, 57, 133,
Bennet, J. E., 590 214,217, 221ff, 328, Edelmann, C. M., 365
Bennett, J. S., 12, 152, 307, 333-334, 335-337,338, Edwards, L. E, 18
312, 314, 412, 589, 686 372, 383, 396, 455ff, Edwards, W., 236, 259
Bennett, S. W., 334, 363, 589 464, 494, 504, 506, 531, Eisenberg, L., 635
BIP, 479 557,561,582, 589, 679, ELIZA, 693
Bischoff, M. B., 604, 653 698, 699 Elstein, A. S., 211,439, 451,
BIount, S. E., 478 Clark, K. L., 333 552, 651
BLUEBOX, 312 Clayton, J. E., 394, 441 EMYCIN,6, 11, 18, 60, 132,
Blum, B. l., 604 CLOT, 11,312, 314, 318- 152, 154-157, 160-162,
Blum, R. L., 153,687 323,500f, 698 165, 196, 210, 214-216,
Bobrow, D. G., 45, 134, 141, Cohen, S. N., 8-10, 55, 209, 284, 295-301,302-313,
471,525,614, 615 571,589 314-328, 393, 412, 439,
Bobrow, R. G., 134, 141 Colby, K. M., 151,333, 352, 451,494ff, 602,605,
Bonnet, A., 312, 498, 602, 688, 693 653, 658-663, 670,
613 Collen, M. E, 263 674f, 685f, 696ff
Boyd, E., 365, 370 Collins, A., 199, 200, 455, Engelmore, R. S., 314, 563
Brachman, R. J., 532 457,469, 484, 613 EPAM, 26
Bransford, J., 613 Conchie, J. M., 364 Erman, L. D., 395, 505,561
Brown, B. W., 210 CONGEN, 11,600 EURISKO, 505
Evans, A., 22 HEADMED, 312 Interlisp, 110, 140, 157,
EXPERT, 152 Heaps, H. S., 263 173, 188, 2t9, 308,
Hearn, A. C., 151 431,472,600, 605,
Fagan, L. M., 19, 392-393, HEARSAY-1I,44, 195, 561, 616, 660, 664, 699
397,589, 658, 699 563 LITHO, 312,498
Falk, G., 189 Heiser, J. E, 312 LOGO, 455
Fallat, R., 393 Helmer, O., 233 London, R., 335
Faught, W., 471 Hempel, C. G., 244 Luce, R. D., 246
Feigenbaum, E. A., 8, 11, 23, Hendrix, G. G., 412,621, Ludwig, D., 214
26, 151, 153, 171,201, 624
392-393, 397, 561 Hewitt, C., 49, 103, 520 Manna, Z., 528
Feigin, R. D., 590 Hilberman, M., 398 Mayne, J. G., 635
Feldman, J., 8, 86 Hollander, C. R., 312 McCabe, E G., 333
Feltovich, P., 456 Horn, B., 6 McCarthy, J., 6, 86, 152, 670,
Feurzeig, W., 465 Horwitz, J., 604,655 672, 692
Fisher, L. S., 364 Howrey, S. E, 370 McDermott, D. V., 681
Floyd, R., 6, 22 McDermott, J., 174, 201
Forsythe, G., 8 Interlisp (see LISP) MEDIPHOR, 8-9
Fox, M., 505 INTERNIST, 283, 289, 386- Melhorn, J. M., 636
Franks, J. L., 613 387, 425,580 Merigan, T., 9, 590
Friedman, L., 272 Meta-DENDRAL, 1l, 153,
Friedman, R. B., 635 Jacobs, C. D., 653 494
FRL, 614 Jacques, J. A., 263 Michie, D., 151
FRUMP, 615 Jaynes, J., 12 Miller, R. A., 282,289, 387,
Jelliffe, R. W., 365 580
Garvey, T. D., 272 Johnson, P. E., 457, 581 Minsky, M. L., 60, 392,425,
Gaschnig, J., 578 613, 615,617
Genesereth, M. R., 456, 505, Kahneman, D., 211 Mitchell, T. M., 174
5O6 Kay, A., 520 Model, M., 213, 337
Gerring, P. E., 605, 653 Keynes, J. M., 242 MOLGEN,561,563, 565
Gibaldi, M., 367 King, J. J., 20 Moran, T., 23, 38, 42, 46, 52
Ginsberg, A. S., 263 Kintsch, W., 613
Moses, J., 171,304
Glantz, S. A., 635 Koffman, E. B., 478 Muller, C., 16
Glesser, M. A., 263 KRL, 614
Mulsant, B., 312
Goguen, J. A., 245,373 Kulikowski, C., 152, 506
Goldberg, A., 520 Kunin, C. M., 16, t8
NEOMYCIN, 11,396, 460-
Goldman, D., 314 Kunz, J. C., 335, 396, 397, 461,506, 557, 560, 561,
Goldstein, I. P., 374, 456, 424, 441,506, 603
562,565, 567, 676, 679,
457, 464, 465, 468, 469,
698
478, 551,552, 555,614, Langlotz, C. J., 335-336,
615, 617 603, 611,657,660, 664, NESTOR, 335
Gordon, J., 215, 272 692 Neu, H. C., 370
Gorry, G. A., 234-236, 263, Leaper, D. J., 386 Newell, A., 6, 8, 22, 25, 27,
332, 371 Lederberg, J., 8, 296 40, 45, 52, 171,303, 455
GPS, 303,304 Ledley, R. S., 259 Nie, N. H., 639
GRAVIDA,11,500f, 698 Lenat, D. B., 44, 149, 153, Nii, H. E, 11,213, 392-394
Grayson, C. J., 241 442, 464, 505, 562, 563, Nilsson, N. J., 304
Green, C. C., 8-10 573 Norusis, M. J., 263
Greiner, R., 442 Lerner, E, 590
Grinberg, M. R., 506 Lesgold, A. M., 456, 581 OBrien, T. E., 364
Grosz, B., 614 Letsinger, R., 396, 460, 506, ONCOCIN,11, 58, 152, 156,
GUIDON,11, 19, 126, 372- 557, 561,583 159-170, 335, 396, 599,
373,451,458-463, Levitt, K. N., 528 601,603,604-612,
464-492, 494, 531ff, Levy, A. H., 636 653ff, 676, 685, 692,
690, 691,698 LIFER, 624 693,698
Gustafson, D. H., 263,635 Linde, C., 373 Osborn, J. J., 392-394, 397,
Lindsay, R. K., 8, 23, 29, 398
Hannigan, J. E, 589 153, 304
Harr6, R., 240, 244, 249, 257 LISP, 6, 9, 23, 46, 47, 49, 50, Papert, s., 455
Hartley, J., 469 57, 70, 80, 81, 86, 90, PARRY, 333,688, 693
Hasling, D. W., 336 93, 124, 132, 149, 154, Parzen, E., 239
HASP/SIAP(see SU/X) 164, 169, 174, 182, 194, PAS-1I, 25, 32, 40, 44, 46-48
Hayes-Roth, E, 149f, 174, 296, 307, 407, 601,670f, Patil, R. S., 381,387, 396,
201,573 687, 698 505,506, 691
Pauker, S. G., 214, 217, 332, SCA paradigm, 141 Teach, R., 336, 603, 635
386, 387-388,411,417, Schank, R., 333,615,617 TEIRESIAS, 11, 18, 152ff,
425,451,540, 551 Scheckler, W. E., 16 153-157, 160f, 165,
Perrier, D., 367 Schefe, E, 214 168, 171-205, 310, 333,
Peterson, O. L., 16 Scheinok, E A., 263 493ff, 507ff, 601,687f
PIP, 332, 386-387 SCHOLAR,9, 55, 331 Teitelman, W., 110, 173
Pipberger, H. V., 263 Schwartz, G. J., 365 Terry, A., 563
PLANNER,49, 50, 103 Schwartz, W. B., 234, 635 Tesler, L. G., 23
Poker player, 149-153 Scott, A. C., 10, 159, 212, Trigoboff, M., 214
Politakis, 1:, 152 221ff, 333-334, 338, Tsuji, S., 659, 664
Polya, G., 455 363, 653, 699 Turing, A. M., 694
Pople, H. F., 304, 386-387, Scragg, G. W., 133, 134 Tversky, A., 246
425,464, 5(16, 683 Selfridge, O., 619
Popper, K. R., 249 Sendray, J., 370 Van Lehn, K., 456
Post, E., 20 Servan-Schreiber, D., 312 Van Melle, W., 18, 67, 157,
Present Illness Pr()gram (see Shackle, G. L., 246, 247 215, 295-301,302, 325,
PIP) Shackleford, E J., 590 494, 653, 699
PROLOG, 333 Shafer, G., 215, 272, 282 VIS, 23, 25, 38, 42, 26
PROSPECTOR, 55, 211, Shortliffe, E. H., 3, 8-9, 50, Visconti, J. A., 16, 18
214,425, 505,578, 581 58, 78, 92, 106-107, VM,11, 19, 313, 392-394,
PSG, 23, 25, 26, 38, 40, 47, 153, 159, 210, 211,214, 397-423,658, 685, 692
48 221ff, 233, 252, 263- Vosti, K., 226
PUFE 11,312, 393-394, 271,272, 302,333,338,
417,424,437-440, 348, 371,373,386, 458, Waldinger, R., 528
441-452,495, 565, 675, 525, 571,576, 599, 601, Wallis, J., 220, 335,371
692, 698 603,611,635, 653, 657, Warner, D. (see Hasling, D.
660, 664, 659, 692, 695, W.)
Rahal, J. J., 590 698, 699 Warner, H. R., 234, 236, 267
Ramsey, E P., 241 Siber, G. R., 364 Waterman, D. A., 8, 32-34,
Reddy, D. R., 195 Sidner, C., 614, 624 40, 41, 46, 48, 52, 149,
Reimann, H. H., 16 Simmons, H. E., 16 153, 201,573
Reiser, J. E, 173 Simon, H. A., 22, 27, 52, Wehrle, E E, 590
Remington, J. S., 590 171,303 Weiner, J. L., 373
Resnikoff, M., 635 Sleeman, D. H., 468 Weiss, C. E, 364
Resztak, K. E., 16 Slovic, P., 267 Weiss, S. M., 152, 374, 469,
Rieger, C., 374 SMALLTALK, 520 506
Rinaldo, J. A., 263 Smith, D. E., 394, 441 Weizenbaum, J., 365,693
RLL, 442 Smith, D. H., 462 WEST, 469, 478
Roberts, A. W., 16, 17 SOPHIE, 457, 470 WHEEZE, 392-394, 441-
Roberts, B., 464, 614, 617 Sprosty, P. J., 469 452, 676
Robinson, R. E., 590 Startsman, T. S., 635, 636 Williams, R. B., 16
ROGET,152, 307, 686 Stefik, M. J., 561,565, 614, Wilson, J. V. K., 12
Rosenberg, S., 614 617 Winograd, T., 28, 30, 32,
Ross, E, 263 Stevens, A. L., 456 133, 134, 352, 471,525,
Rubin, M. I., 365 Stolley, P. D., 16 558, 614, 627
Rumelhart, D., 627 STUDENT, 45 Winston, P., 6, 153, 174, 673
RX, 153 Suppes, P., 210, 211, 246 Wirtschafter, D. D., 604
Rychener, M. D., 44, 45 Suwa, M., 159 Woods, W. A., 532,621
SU/X, 11,393 Wraith, S. (see Bennett, S.
SACON,11,304, 312, 417, Swanson, D. B., 567 W.)
495ff, 675, 691,697 Swartout, W. R., 328, 372- WUMPUS,469, 479
Sager, N., 615 373,691
SAIL, 86, 173, 664 Swinburne, R. G., 240, 242, Yeager, A. S., 590
Salmon, W. C., 245,257 246 Yu, V. L., 221ff, 572, 589,
SAM, 615 Szolovits, P., 214, 217, 386- 590, 599
Sanner, L., 612 388, 411,417,425,451,
Savage, L. J., 241 540, 551,655 Zadeh, L. A., 210, 245
Subject Index
abbreviated rule language (ARL) (see rule causal knowledge (see knowledge)
language) causal models, 374, 381,456, 460, 484, 539,
acceptance, by user community (see human 548ff
engineering) certainty factors (see also inexact inference),
acid-base disorders, 381 23, 61, 63, 65, 210ff, 81, 91-93, 112,
adaptive behavior, 52 202, 209-232, 233, 247ff, 262, 267-271,
agenda, 441-452, 525f, 561 272ff, 321,374,434, 443f, 472,485,
algebra, 304 525,540, 545, 582,675, 679ff, 700
algorithm (see also therapy algorithm), 3, 125, assigning values to, 154f, 221ff, 252
133, 134, 150, 185, 283 with associative triples, 70
allergies (see drugs, contraindications) combining function, 116, 216, 219, 254ff,
anatomical knowledge (see knowledge, 277, 284
structural) gold standard for, 221ff
$AND (see also predicates), 80, 97ff, 105 justification for, 56, 221ff, 239ff, 681
AND/OR goal tree, 49, 103-112 propagation of, 162, 212ff, 255, 444
answers, to questions (see dialogue) sensitivity analysis, 217ff, 582, 682f
antecedent rules (see rules) threshold, 94, 211,283
antecedents (see also rules; syntax), CFs (see certainty factors)
architecture (see control; representation) chemistry (see also DENDRAL), 8, 26, 37,
artificial intelligence, 3, 6, 86, 150, 331f, 360, 149, 304
chunks of knowledge (see modularity)
381,424, 455, 663f, 687
circular reasoning, 63, 116ff
as an experimental science, 19, 672
classification problems, 312,426, 675, 697
ASKFIRST (LABDATA), 64, 89, 105, 120,
clinical algorithms, 683
374
clinical parameters (see parameters)
associative triples (see representation)
closed-world assumption, 469, 675
attitudes of physicians (see also human
CNTXT (see contexts)
engineering), 57, 602f, 605, 635-652
code generation (see automatic programming)
attributes (see parameters) cognitive modeling (see also psychology), 26, 211
automatic programming, 188, 193f, 520
combinatorial explosion, 524
commonsense knowledge (see knowledge)
backtracking, 82, 127, 410, 420, 697 deductions (see unity path)
backward chaining (see control) completeness (see also knowledge base; logic),
batch mode (see patient data) 199f, 305, 656, 684
Bayes Theorem, 79, 210, 211,214, 215, complexity, 335, 375,377ff, 387
234ff, 263ff, 385, 386 computer-aided instruction (see tutoring)
belief (see certainty factors) concept broadening (see diagnosis, strategies
biases (see evaluation) for)
big switch, 13 concept identification (see knowledge
blackboard model, 395,563 acquisition, conceptualization)
blood clotting (see CLOT) conceptual framework, 374f, 391,495, 684f
bookkeeping information, 433,472,516, 527, conceptualization (see knowledge acquisition,
676 conceptualization)
bottom-up reasoning (see control, forward conclude function, 113ff
chaining) confirmation (see also certainty factors), 57,
breadth-first reasoning (see control) 210, 218, 240, 241,242, 243-245, 247,
272,426, 681
CAI (see tutoring) conflict resolution, 22, 38, 43, 48, 50, 162
cancer chemotherapy (see ONCOCIN) conflicts (see knowledge base, conflicts in)
case library (see also patient data), 137, 156, consequents (see also rules),
479, 583, 594, 602 consequent theorems (see rules, consequent)
case-method tutoring (see tutoring) consistency (see also rule checking;
categorical reasoning (see certainty factors; subsumption), 65, 77, 156, 159-170,
knowledge, inexact), 56, 209 195,202, 324,432,440, 456, 656, 686
checking, 41, 180 data:
contradictions, 308 acceptable values (see expectations)
constraints, 135ff, 145 collection, 398, 409f, 655
constraint satisfaction, 133, 313, 685, 697 snapshot of, 313, 393, 675
consultation, 3, 201, 302, 361f, 422, 426, 457, time varying, 409f, 655ff
610, 635ff, 671, 691, 701 uncertainty in, 674, 684, 696
example of, 69f, 298ff, 319f, 323f, 427- data base (see also patient data), 22, 112, 386,
430, 533, 553, 704-711 655, 692
subprogram in MYCIN, 5, 10, 67-73, 78- data-directed reasoning (see control, forward
132, 184 chaining)
content-directed invocation (see control) data structures (see representation)
contexts, 60, 64, 70-71, 82, 99, 163, 297, debugging (see also knowledge base,
344, 353, 360, 493,670 refinement), 51, 152, 159
context tree, 60, 62, 79, 82-86, 99 104, decision analysis (see also utilities), 217,234,
112, 118ff, 128, 132, 295, 324,494- 332
503, 675,678 decision trees, 23f, 311
context types, 82ff, 495ff declarative knowledge (see knowledge;
instantiation, 62, 118ff, 495ff representation)
in ONCOCIN's rules, 163ff, 659 deep knowledge (see knowledge, causal)
contextual information, 179, 185-198,201, defaults (see knowledge)
203, 335, 393f, 396, 398, 410, 421ff, definitional rules (see rules)
471,477,677 definitions (see knowledge, support)
contradictions (see consistency) demand ratings, 637,644-647
contraindications (see drugs, demons (see control)
contraindications), 543 Dempster-Shafer theory of evidence, 215,
control (see also control knowledge), 28, 32, 272ff, 681
33, 43-45, 48-50, 60-65, 103-112, depth-first reasoning (see control)
220f, 358, 416, 435ff, 441-452, 493, design considerations, 3ff, 10, 19, 51, 57-59,
495, 526, 531f, 670, 673,677ff, 696f 67, 78, 176, 238, 304, 331,340, 342f,
backward chaining, 5, 27, 40, 57, 60, 71ff, 349, 397f, 403f, 417, 421ff, 458, 467f,
104, 176, 187, 304, 346, 376, 395,426, 505, 531,576ff, 603, 605f, 636, 648,
447,465,511,532, 539, 601,659ff, 649ff, 671ff
677,681,700 diagnosis, 13-16, 234, 312,441,461,545
blocks, 659f strategies for, 426, 448f, 537, 552ff, 673,
content-directed invocation, 527, 539 679, 702
data-directed (see control, forward dialogue (see also human engineering), 335,
chaining) 467ff, 615, 670, 687
demons, 29, 619 evaluation, 575
of dialogue, 71
exhaustive search, 56, 521 management of, 9, 60, 71, 105, 110, 119,
127, 260, 374, 395,439f, 447,456,
forward chaining, 4f, 13, 27, 57, 60, 195,
387,419, 426, 449, 456, 461,511,539, 459, 465, 470ff, 480ff, 483ff, 601,
606ff, 613f, 618, 651,656
561,601,606, 626, 658, 659, 661ff,
677, 681 mixed initiative, 455,458
goal-directed (see control, backward dictionary (see also human engineering), 68,
chaining) 73, 99, 193, 306, 349, 620
hypothesis-directed (see control, backward disbelief (see also inexact inference), 247ff,
chaining) 273
message passing, 561 disconfirmation (see confirmation)
model-directed, 195 discourse (see dialogue)
MONITOR function (see MONITOR) discrimination nets, 625
prototypes for (see prototypes) disease hierarchies (see inference structure)
of search, 57, 220, 04, 521, 674 documentation, 529
select-execute loop, 24 domain independence (see generality)
control knowledge (see also rules, meta-rules), drugs:
134, 394ff, 677 allergies to (see drugs, contraindications)
explicitness, 394 antibiotics, 13ff, 122ff, 234, 363ff, 372,
correctness (see evaluation) 395, 593, 600
cost-benefit analysis, 62, 215,217, 235, 246, contraindications, 15ff, 135
522, 565, 576, 578, 680 dosing, 17, 125f, 137, 163-170, 334, 363-
COVERFOR,222, 223,474ff, 486, 554 370
credit assignment (see also knowledge base, optimal therapy (see also therapy), 137
refinement), 177,688 overprescribing, 16ff
critiquing model, 467,692 prophylactic use, 17
sensitivities, 15, 133, 135 user models in (see user models)
toxicity (see drugs, contraindications) WHY?/HOW?, 75f, 111, 173, 310, 373,
533f, 601, 689f
editor (see also rule editor; rule language), explicitness (see also knowledge; transparency;
180, 307, 391,670 understandability; modularity), 545, 564f
education (see also tutoring), 337,450, 575 extensibility (see flexibility)
efficiency, 48, 576, 578
electronics, 396 facets, 617,619
ELSE clauses, 61, 79ff, 115 facts (see representation, of facts)
English understanding (see dialogue; human fear of computers, 648
engineering; natural language) feedback, 9f, 204, 459, 513, 551, 577, 686,
entrapment, 483ff 702
error checking (see rule checking) FINDOUT (see also rule interpreter), 105-
EVAL, 71 110, 116f, 121, 125f, 130, 132
evaluation, 67, 137, 155ff, 337, 439f, 450, flexibility (see also knowledge base,
571-588, 589ff, 602, 651, 674, 694f, 701 refinement), 3, 6, 50, 149, 296, 311, 342,
of acceptability, 575, 578, 602, 636 450, 465, 470, 488, 493, 559f, 565, 669f,
of attitudes, 610f, 635-652 687
gold standard, 572, 579 inflexibility, 503, 520
methodology, 573, 579, 581,588, 590 focus of attention (see also control), 179, 186,
of MYCIN, 571-577, 583-588, 589-596 441,447, 471,479
of ONCOCIN, 606, 610 FOREACH, 223
of performance, 218, 574, 644 formal languages, 6
sensitivity analysis, 217-219, 582 formation problems (see synthesis problems)
events, representation of, 500 forward chaining (see control)
evidence, 498, 550 frames, 60, 63, 394ff, 425,431ff, 437, 441-
evidence gathering (see also control; 452,505,613ff, 617, 633,672,676
confirmation), 5, 176, 460, 469, 674f, function templates (see templates; predicates)
696, 700 funding, 599, 698
evidential support (see inexact inference) fuzzy logic, 210, 214, 245-247
hard and soft evidence, 152
exhaustive search, 505, 534 game-playing, 150
EXPECT (attribute of parameters), 88ff, 350 generality (see also EMYCIN),451,465,656,
expectations, 177, 182f, 188, 195, 203,401f, 674, 677,695f, 701
411,417-419, 450, 511,637ff generate and test, 135ff, 145, 674, 697
expertise, 580ff, 636 geography (see SCHOLAR)
nature of, 233, 373, 456, 459f, 467f geology (see PROSPECTOR)
transfer of (see knowledgeacquisition) glaucoma (see CASNET)
use in explanation, 378ff global criteria, 135
experts, 158, 170, 234, 236, 242, 262, 264, goal-directed reasoning (see control)
580, 686 goal rule, 104, 554f
agreement among, 584ff, 592 goal tree (see rule invocation, record of)
disagreement among, 582, 584-588, 682 gold standard (see evaluation; certainty
evaluations of, 582, 584-588 factors)
interactions with (see knowledge grain size (see modularity)
engineering) grammar, 22, 80f, 620-624
expert systems, 3ff, 7, 25, 247, 272, 282,385, graphics and graphical presentations, 336,
455f, 460, 530, 568, 574, 577ff, 634 368, 399f, 419, 608ff
building (see also knowledge acquisition), GRID/GRIDVAL, 102f
150, 387,577, 670, 686ff
validating (see evaluation) handcrafting (see knowledge acquisition)
explanation (see also question-answering; hardware, 575,578, 612, 659, 665
reasoning status checker; natural help facilities, 64, 111f, 310, 474, 480f, 599,
language), 27, 31, 42, 65, 133, 161, 171, 704f
233, 331-337, 338-362,363-370, 371- HERSTORY list (see rule invocation, record
388, 394,451,457,465,475, 493,531- of)
568, 575, 599f, 644, 651, 664, 670, 674, heuristics, 3, 48, 50, 133, 144, 150, 211, 482,
677,688ff, 693, 695, 705, 707f 524, 550f, 676, 681
of drug dosing, 363-370 heuristic search (see control)
of meta-rules, 526, 528 hierarchical organization of knowledge (see
of rules, 38, 72, 132, 133, 238, 305f knowledge)
subprogram in MYCIN, 4, 7, 10, 57, 67, Hodgkin's disease (see also ONCOCIN), 656
73ff, 79, 111, 112, 339, 371f, 458, 532, HOW? (see explanation)
537 human engineering, 19, 42, 146, 156, 308,
of therapy (see therapy, explanation of) 309f, 331-337, 338, 349, 411,439, 599-
612, 674, 678, 688ff, 691ff causal, 335, 374ff, 377ff, 385ff, 396, 460,
acceptance, 3"421", 337, 371ff, 578, 595, 599, 503, 552ff, 672, 676, 702
637,688, 695 commonsense, 73, 150, 540, 559, 651
dictionary of terms and synonyms (see compiled, 503f, 541,551,566, 679, 690
dictionary) default, 61, 164f, 376, 432, 509, 559, 620,
English understanding (see also natural 659
language), 67, 73, 76, 693L 701 domain-specific (see also vocabulary), 149
I/O handling (see also dialogue), 68, 110f, hierarchic organization (see also contexts,
297,600 context tree), 274f, 292, 403f, 515ff,
models of interaction, 671, 691f, 701 678
preview (see preview mechanism) inexact (see also certainty factors), 67, 209ff,
unity path (see unity path) 416f, 673, 683ff
hypothesis-directed reasoning (see control) interactions, 582
hypothesis formation, 8 intermediate concepts, 551, 560
hysteresis, 406, 422 judgmental, 3,236ff, 316, 525,540, 663,
682
I/O (see dialogue) about knowledge (see meta-level knowledge)
ICAI (see tutoring) meta-level (see also rules, meta-rules), 172-
IDENT, 93, 107, 116, 123, 222, 223 205, 328, 336, 342, 396, 458, 461, 464,
ill-structured problems, 9, 209, 683, 686 474, 476ff, 488, 493-506, 507-530
importance (see also CFs), 335, 375, 377ff, multiple uses of, 468f, 477, 507, 529, 673
387, 4.42, 438, 442, 449 pedagogical (see also tutoring), 464, 691
incompleteness (see completeness) procedural, 57, 64, 341, 446, 528, 554,
inconsistency (see consistency) 557, 619, 677
independence, 258f, 264, 267, 270, 386, 685 separation from inference procedure, 6,
indexing, 13,416, 441,524, 538f, 557,562, 174, 175, 295-301,464, 527,678, 696
565,670, 677,679, 697 separation of types, 134, 437,457, 460f,
indirect referencing (see also control, content- separation of types, 134, 437, 457, 460f,
directed invocation), 564 493, 506, 508, 531, 670, 676, 679, 691
induction, 174, 201,687f 73,315, 336, 407,467, 470, 503,
inexact inference (see also certainty factors), 73, 315, 336, 407, 467, 470, 503,
50, 56, 63, 162, 209, 233ff, 255f, 392, 559, 564f, 678, 691,702
416, 433,442ff, 482,664, 679-685 structural, 316, 496, 504ff, 516, 538ff,
vs. categorical reasoning, 56, 295, 317 562ff, 676, 691
combining function, 93, 116, 211, 216, support, 126, 372, 385, 464, 469, 474,
one-number calculus, 214 475f, 504ff, 539, 556, 565
precision, 210, 680, 682, 700 of syntax (see templates)
inexact knowledge (see knowledge) taxonomic, 396, 425, 670, 676
infectious diseases, 13ff, 55, 104, 214, 217, temporal, 406f, 416, 420, 658
234, 260, 370, 591 textbook, 456
inference (see also control): knowledge acquisition (see also ROGET;
deductive (see logic) TEIRESIAS; knowledge engineering),
engine (see also rule interpreter), 175f, 33, 50f, 55f, 59, 76f, 149-158, 159ff,
295ff 159f, 168, 171-205, 225ff, 297ff, 306ff,
structure (see also contexts, context tree), 314, 318, 325ff, 372, 387,411,461,462,
55, 314, 316f, 321f, 326f, 374ff, 392, 493, 507,510ff, 517ff, 560, 670, 673,
407f, 448f, 485f, 534ff, 542ff, 554f, 493, 507, 510ff, 517ff, 560, 670, 673,
567 advice taking, 670
inheritance, 515, 563, 676f conceptualization, 155, 161, 170, 314, 326f,
INITIALDATA (MAINPROPS), 56, 60, 119, 503, 686
120, 705 debugging (see knowledge base, refinement),
intensive care unit (ICU), 393, 397-423 160ff
interaction (see models of interaction) hand crafting, 151, 171, 513, 687
interdisciplinary research, 8ff learning, 33, 52, 152f, 186f, 203, 205, 513,
interface (see human engineering) 644, 651
Interlisp (see LISP) models of, 150ff, 687f
Interviewer (in ONCOCIN), 605, 653, 656 subprogram in MYCIN, 4, 7, 10, 67, 76f
iteration, 313 knowledge base, 342, 343, 465, 697, 700
completeness, 156, 159ff, 159-170
jaundice, 273ff conflicts in, 162, 559, 582
construction (see knowledge acquisition)
key factors, in rules, 477, 543, 550, 702 czar, 221-228, 687
keyword matching (see parsing) display of (see also explanation), 160, 169
knowledge: maintenance, 309, 519, 521,582,644,
algorithmic, 57, 66, 124 686ff
refinement, 9, 72, 137, 150, 152ff, 159, chunks of knowledge, 27, 39, 42, 52, 55,
161, 172ff, 187f, 297ff, 310f, 327f, 71, 72, 85, 154, 224, 238, 242, 438
331,337, 391,439, 528, 582, 644, 686 global, 30, 32
structure of, 493-506 grain size, 503f, 672
validation (see also evaluation), 129, 152, modus ponens (see logic)
594 MONITOR (see also rule interpreter), 105-
knowledge-based system (see expert system) 110, 116f, 121f, 125f, 130, 132
knowledge engineering, 5-7, 55f, 145f, 149- monitoring, 9, 393, 397-423,675
158, 159f, 170, 202, 567,672, 686, 700 MYCINgang, 222-232, 699, 703
tools for, 152-158, 170, 171,295-301,
302-313, 324, 655, 686ff, 699 natural language (see also human
knowledge sources, 557, 560ff engineering), 57, 67, 73, 76, 144, 176,
KNOWN (see predicates) 179f, 182, 188-196, 202, 210, 306, 331,
333, 335,340, 342, 348ff, 422, 458,601,
LABDATA (see ASKFIRST) 333, 335, 340, 342, 348ff, 422, 458, 601,
language: nonmonotonic reasoning (see logic)
formal, 22
understanding (see natural language) object-centered programming, 56
learning (see knowledge acquisition) oncology (see ONCOCIN)
least commitment, 565 opportunistic control (see blackboard model)
lesson plan, 471, 479 optimization (see also constraints, satisfaction),
LHS (see also rules), 133
linguistic variables (see fuzzy logic) ordering (see also control):
logic, 65, 392, 212, 343, 345,672, 681 of clauses/questions (see also dialogue,
completeness, 156 management of), 61, 63, 72, 130f, 395,
conflict, 162 535, 554, 678f
consistency, 41, 42, 43, 238 of rules (see also rules, meta-rules), 130,
contradiction, 41,230, 238 535, 679
modus ponens, 21, 65 organisms (see infectious diseases)
nonmonotonic (see also backtracking), 558, overlay model (see student models)
681
predicate calculus, 28, 233 parallel processing, 82
quantification, 62, 65 parameters, 70, 86-90, 118, 163f, 297, 298ff,
redundancy, 162 321,353, 374, 376, 407ff, 496, 659
subsumption, 41, 156, 162,230, 259 multi-valued, 87, 108, 283, 534, 619
LOOKAHEAD,89f, 115, 355 properties of, 88-90, 408
LTM (see also memory), 33 single-valued, 87, 282, 619
symbolic values for, 403,418f
MAINPROPS (see INITIALDATA) types, 87, 408
maintenance (see knowledge-base typical values for, 445
yes-no, 87, 93f, 534
maintenance)
management (see project management) parsing, 73, 76, 188, 193ff, 333, 349-354,
man-machine interface (see dialogue) 412, 480, 511,616, 620ff, 693, 701
mass spectrometry (see DENDRAL) part-whole relations (see also contexts, context
matching (see also predicates), 186 tree), 498, 545, 677
mathematical models, 316, 334, 335, 396 patient data, 65, 79, 112-115, 127-129,
445f, 583
mathematics, 151
pattern matching, 73
MB/MD (see also certainty factors), 211,215,
patterns, in rules (see rule models)
247ff, 265ff, 288, 679 pedagogical knowledge (see knowledge)
medicine, use of computers in, 304, 640, 652 performance (see evaluation)
memory, 22, 26, 31, 33, 44, 613 pharmacokinetics, 334, 363ff
meningitis, 217 philosophy of science, 210, 239ff
message passing (see control) planning, 136, 313, 336, 534, 563
meta-rules (see rules) poker, 8, 46
mineral exploration (see PROSPECTOR) precision, 210, 680, 682, 700
missing rules (see also knowledge base, predicates (see also templates), 37, 62, 65, 70,
completeness), 162f, 511 72, 80, 87, 93-99, 182, 192, 324, 412-
models (see rule models) 415, 421, 510
models of interaction (see also consultation; presentation methods (see dialogue)
critiquing model; monitoring), 301f, 692 preview mechanism, 61, 63, 72, 131, 395,
modifiability (see design considerations; 493,678, 679
flexibility) probabilities (see also inexact reasoning; Bayes
modularity, 10, 47f, 56, 305, 361,458, 529, ~Iheorem), 70, 79, 91,234ft. 239-242,
670, 676, 684, 702 259, 263-271,385-387, 680
problem difficulty, 675 uniform, 52, 396, 441, 526, 532, 568, 675
problem solving (see control; evidence REPROMPT, 210
gathering) resource allocation, 505
production systems, 6ff, 12f, 20ff, 672, 675, response time (see human engineering)
700 restart (see also backtracking), 129
appropriate domains, 28 RHS (see also rules),
pure, 20, 30 risks (see utilities)
taxonomy, 21, 45 robustness, 67, 685, 692
programming: rule-based system, 672
environment, 306-311 rule checking (see also knowledge base,
knowledge programming, 153, 670, 688 completeness), 180, 183, 197f, 307f, 324,
style, 529f 513
program understanding, 528 rule compilation, 311
project management, 674 rule editor, 180, 195f, 493, 512
PROMPT,88, 110, 118, 210, 617, 619 rule interpreter (see also inference engine),
prompts, 64, 88 24, 31, 61, 71ff, 212, 304f, 310, 341,
propagation of uncertainty (see certainty 524, 534
factors; knowledge, inexact) rule invocation, record of, 65, 74, 115, 133,
protocols, 604ff, 654 138ff, 160, 187, 333, 345, 354, 358, 458,
prototypes (see also frames; rule models), 56, 469
189f, 424-440, 505 rule language, 153, 297
prototypical values (see knowledge, default) rule model, 76, 156, 165, 168, 189-200, 202,
psychology, 25, 47, 52, 210, 338, 388, 439, 355, 477, 508, 509ff, 520, 539
448, 451, 461, 566, 613, 651 rule network (see inference structure)
psychopharmacology (see BLUEBOX; rule pointers, 374
HEADMED) rules, 4, 6, 12f, 55-66, 79-103, 134, 209,
pulmonary physiology (see PUFF; VM) 297, 305, 375-377, 410-413,431-434,
675-677
QA (see question-answering) advantages, 72, 238, 669f
quantification (see logic) annotations in, 62, 367
question-answering (see also explanation), 73, antecedent, 60, 678
138ff, 198ff, 306, 333, 340, 342, 348- Babylonian, 12f
362, 457, 601 causal, 383, 540f
examples, 74, 143, 348, 349, 350f, 355ff, circular (see circular reasoning)
361, 711-713 consequent, 49, 103
default, 164
randomized controlled trials, 579 definitional, 164, 295, 383,541,676, 678
Reasoner (in ONCOCIN),606, 653, 657 domain fact, 541
reasoning network, 103ff, 108 examples of, 71, 100, 164, 238, 296, 317,
reasoning status checker (RSC) (see also 322, 344, 432, 447, 543ff, 660
explanation), 73, 75, 340ff, 346ff grain size (see modularity)
recursion, 524 identification, 540
redundancy, 157, 162, 684f independence of (see modularity)
refinement (see control; knowledge indexing, 164
acquisition) initial, 164
reliability (see robustness) justifications for, 367, 475, 506, 531ff,
relevancy tags, 377 540ff, 675, 690
renal failure (see also drugs, dosing), 332, mapping, 62
365ff meta-rules, 19, 48, 56, 63, 65, 73, 130, 212,
representation (see also frames; logic; 383, 395, 521-527, 535, 556ff, 676,
prototypes; rules; schemata; semantic 678f
networks), 8, 19, 161, 173, 323ff, 391ff, ordering of clauses in (see ordering)
406f, 424-440, 441-452, 514ff, 527ff, predictive, 462
531-568, 651, 673, 675ff, 697 premises of, 496
associative triples, 23, 68, 76, 86, 87, 190, production rules, 21ff, 55ff, 59ff, 70ff, 70f,
209, 282, 304, 509, 516 136, 161,391f, 700
explicitness of (see explicitness) refinement rules, 434
expressive power of, 134, 670, 676f, 686 restriction clauses, 550
of facts (see also representation, associative schemata (see schemata)
triples), 431,434 screening, 661
lists, 99 screening clauses in, 61,394f, 544f, 549,
procedures, 20, 28, 57, 64, 392, 446, 557, 566, 679
566 self-referencing, 42, 61, 115, 130, 383, 385,
tabular knowledge, 99f 394, 558f, 680, 682
uncertainty (see knowledge, inexact) statistics, 157f, 218, 688
strategy, 47, 56, 387, 396, 556ff theory of choice, 246
syntax of (see also predicates), 4, 35, 46f, therapy, 9, 13-18, 57, 133-146, 234, 336,
70, 76, 79, 157,212,392, 401,410- 399-407,411,593,671,713-715
412 algorithm, 57, 63, 66, 122ff, 132, 133ff,
summary rules, 434ff 261-262,685
tabular, 62, 217, 546ff comparison, 141-144
therapy, 136, 140 explanation of, 133, 138-141, 144f, 333,
translations of, 71, 90, 102f, 238 715
triggering, 434, 441,444 protocols, 163-170,654f
tutoring (see tutoring) threshold, in CF model (see also certainty
uncertainty in, 674 factors), 211,216, 220, 222-232,681
world fact, 540 time (see knowledge, temporal)
rule types, 383 top-down refinement (see also control), 555,
562, 565
SAME (see predicates) topic shifts, 615ff
scene analysis (see vision) toxicity (see contraindications)
schemata, 476, 508, 514-520, 613ff, 616ff, trace, of reasoning (see rule invocation,
624, 627,633 record of)
screening clauses (see rules) tracing, of parameters, 64, 108, 304, 345
scripts, 548ff, 615, 617 TRANS,90, 102f, 119, 210, 617,619
search (see control) transfer of expertise (see knowledge
second-guessing (see expectations) acquisition)
semantic nets, 9, 55, 374, 392, 425,545 transition network, 138ff, 145, 348, 404ff,
sensitivity analysis (see evaluation, sensitivity 421
analysis) transparency (see understandability)
signal understanding, 343 trigger (see control, forward chaining), 387
simplicity, 323f, 392, 670, 676f triples (see representation)
simulation, of human problem solving, 313, Turing machines, 21, 52
315, 327, 439, 461 Turing's test (see evaluation)
smart instruments, 345 tutoring (see also GUIDON), 19, 58, 126, 145,
Socratic dialogue, 455,484 238, 328, 335, 371,372,396, 455-463,
speech understanding (see also HEARSAY), 464-489,494, 531-568, 670, 674, 676,
201,692f 688ff, 701
spelling correction (see human engineering, case method, 457, 467ff
I/O handling) rules, 372,463,472ff, 690
spirometer (see PUFF)
state transition network (see also uncertainty (see certainty factors; knowledge,
representation), 134, 138, 404-407,421 inexact)
statistics (see also rules, statistics), 209, 210, understandability (see also explanation), 3, 9,
234, 239, 509, 591, 603, 639, 671 41, 56, 150, 174, 331f, 334, 337,
STM (see also memory), 22ff 403, 437-440, 450f, 493, 503, 506
strategies (see knowledge, strategy) uniformity, of representation (see
structural analysis (see SACON) representation)
structured programming, 35 unity path, 63, 73, 130, 377, 396, 493
student models, 466, 471,473,478, 483ff UPDATED-BY,90, 105, 229, 231,355, 679
subsumption, 156, 162, 308, 324, 685 user interaction (see human engineering)
summaries of conclusions, 399, 419, 430 user models (see also student models), 335,
symbolic reasoning (see artificial intelligence) 373ff, 387,466
synonyms (see dictionary) utilities (see cost-benefit analysis)
syntax (see also rules, syntax of), 35, 508, 521,
529, 620ff validation (see evaluation)
verification/checking, 159, 161, 184
tabular data, 62, 482 vision, 189, 201,613
tabular knowledge (see representation) vocabulary, of a domain, 73, 150, 210, 442ff,
TALLY (see also certainty factors), 98, 114, 467, 503,564, 684,686, 702
211 volunteered information (see also control,
taxonomy (see knowledge, taxonomy) forward chaining), 602,613ff, 678, 691,
teaching (see tutoring) 693
technology transfer, 395,698f examples, 628ff
templates, for functions or predicates, 37, 72,
157, 164f, 188, 194, 305, 344, 477, 508, weight of evidence (see inexact inference)
520f what-how spectrum, 315
terse mode, 64 WHY? (see explanation)
test cases (see case library) workstations (see hardware)
testing (see evaluation) world knowledge (see knowledge, common
theorem proving (see logic) sense)