
Rule-Based Expert Systems

The Addison-Wesley Series in Artificial Intelligence

Buchanan and Shortliffe (eds.): Rule-Based Expert Systems: The MYCIN


Experiments of the Stanford Heuristic Programming
Project. (1984)
Clancey and Shortliffe (eds.): Readings in Medical Artificial Intelligence: The
First Decade. (1984)
Pearl: Heuristics: Intelligent Search Strategies for Computer Problem Solving.
(1984)
Sager: Natural Language Information Processing: A Computer Grammar of
English and Its Applications. (1981)
Wilensky: Planning and Understanding: A Computational Approach to Human
Reasoning. (1983)
Winograd: Language as a Cognitive Process, Vol. I: Syntax. (1983)
Winston: Artificial Intelligence, Second Edition. (1984)
Winston and Horn: LISP, Second Edition. (1984)
Rule-Based Expert Systems
The MYCIN Experiments
of the Stanford Heuristic
Programming Project

Edited by

Bruce G. Buchanan
Department of Computer Science
Stanford University

Edward H. Shortliffe
Department of Medicine
Stanford University School of Medicine

Addison-Wesley Publishing Company


Reading, Massachusetts Menlo Park, California
London Amsterdam Don Mills, Ontario Sydney
This book is in The Addison-Wesley Series in Artificial Intelligence.

Library of Congress Cataloging in Publication Data

Main entry under title:

Rule-based expert systems.

Bibliography: p.
Includes index.
1. Expert systems (Computer science) 2. MYCIN
(Computer system) I. Buchanan, Bruce G. II. Shortliffe, Edward Hance.
QA76.9.E96R84 1984 001.535 83-15822
ISBN 0-201-10172-6

Reprinted with corrections, October 1984

Copyright © 1984 by Addison-Wesley Publishing Company, Inc. All rights reserved. No
part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher. Printed in the United States of
America. Published simultaneously in Canada.

BCDEFGHIJ-MA-8987654
For Sally and Linda
Contents

Contributors ix

Foreword by Allen Newell xi

Preface xvii

PART ONE Background

Chapter 1 The Context of the MYCIN Experiments 3
Chapter 2 The Origin of Rule-Based Systems in AI 20
Randall Davis and Jonathan J. King

PART TWO Using Rules

Chapter 3 The Evolution of MYCIN's Rule Form 55
Chapter 4 The Structure of the MYCIN System 67
William van Melle
Chapter 5 Details of the Consultation System 78
Edward H. Shortliffe
Chapter 6 Details of the Revised Therapy Algorithm 133
William J. Clancey

PART THREE Building a Knowledge Base

Chapter 7 Knowledge Engineering 149
Chapter 8 Completeness and Consistency in a Rule-Based System 159
Motoi Suwa, A. Carlisle Scott, and Edward H. Shortliffe
Chapter 9 Interactive Transfer of Expertise 171
Randall Davis

PART FOUR Reasoning Under Uncertainty

Chapter 10 Uncertainty and Evidential Support 209
Chapter 11 A Model of Inexact Reasoning in Medicine 233
Edward H. Shortliffe and Bruce G. Buchanan
Chapter 12 Probabilistic Reasoning and Certainty Factors 263
J. Barclay Adams
Chapter 13 The Dempster-Shafer Theory of Evidence 272
Jean Gordon and Edward H. Shortliffe

PART FIVE Generalizing MYCIN

Chapter 14 Use of the MYCIN Inference Engine 295
Chapter 15 EMYCIN: A Knowledge Engineer's Tool for Constructing Rule-Based Expert Systems 302
William van Melle, Edward H. Shortliffe, and Bruce G. Buchanan
Chapter 16 Experience Using EMYCIN 314
James S. Bennett and Robert S. Engelmore

PART SIX Explaining the Reasoning

Chapter 17 Explanation as a Topic of AI Research 331
Chapter 18 Methods for Generating Explanations 338
A. Carlisle Scott, William J. Clancey, Randall Davis, and Edward H. Shortliffe
Chapter 19 Specialized Explanations for Dosage Selection 363
Sharon Wraith Bennett and A. Carlisle Scott
Chapter 20 Customized Explanations Using Causal Knowledge 371
Jerold W. Wallis and Edward H. Shortliffe

PART SEVEN Using Other Representations

Chapter 21 Other Representation Frameworks 391
Chapter 22 Extensions to the Rule-Based Formalism for a Monitoring Task 397
Lawrence M. Fagan, John C. Kunz, Edward A. Feigenbaum, and John J. Osborn
Chapter 23 A Representation Scheme Using Both Frames and Rules 424
Janice S. Aikins
Chapter 24 Another Look at Frames 441
David E. Smith and Jan E. Clayton

PART EIGHT Tutoring

Chapter 25 Intelligent Computer-Aided Instruction 455
Chapter 26 Use of MYCIN's Rules for Tutoring 464
William J. Clancey

PART NINE Augmenting the Rules

Chapter 27 Additional Knowledge Structures 493
Chapter 28 Meta-Level Knowledge 507
Randall Davis and Bruce G. Buchanan
Chapter 29 Extensions to Rules for Explanation and Tutoring 531
William J. Clancey

PART TEN Evaluating Performance

Chapter 30 The Problem of Evaluation 571
Chapter 31 An Evaluation of MYCIN's Advice 589
Victor L. Yu, Lawrence M. Fagan, Sharon Wraith Bennett, William J. Clancey, A. Carlisle Scott, John F. Hannigan, Robert L. Blum, Bruce G. Buchanan, and Stanley N. Cohen

PART ELEVEN Designing for Human Use

Chapter 32 Human Engineering of Medical Expert Systems 599
Chapter 33 Strategies for Understanding Structured English 613
Alain Bonnet
Chapter 34 An Analysis of Physicians' Attitudes 635
Randy L. Teach and Edward H. Shortliffe
Chapter 35 An Expert System for Oncology Protocol Management 653
Edward H. Shortliffe, A. Carlisle Scott, Miriam B. Bischoff, A. Bruce Campbell, William van Melle, and Charlotte D. Jacobs

PART TWELVE Conclusions

Chapter 36 Major Lessons from This Work 669

Epilog 703
Appendix 705
References 717
Name Index 739
Subject Index 742
Contributors

J. Barclay Adams, M.D., Ph.D.
Associate Physician
Department of Medicine
Brigham and Women's Hospital
Harvard Medical School
Boston, Massachusetts 02115

Janice S. Aikins, Ph.D.
Research Computer Scientist
IBM Palo Alto Scientific Center
1530 Page Mill Road
Palo Alto, California 94304

James S. Bennett, M.S.
Senior Knowledge Engineer
Teknowledge, Inc.
525 University Avenue
Palo Alto, California 94301

Sharon Wraith Bennett, R.Ph.
Clinical Pharmacist
University Hospital, RC32
University of Washington
Seattle, Washington 98109

Miriam B. Bischoff, M.S.
Research Affiliate
Medical Computer Science, TC-135
Stanford University Medical Center
Stanford, California 94305

Robert L. Blum, M.D., Ph.D.
Research Associate
Department of Computer Science
Stanford University
Stanford, California 94305

Alain Bonnet, Ph.D.
Professor
École Nationale Supérieure des Télécommunications
46, rue Barrault
75013 Paris
France

Bruce G. Buchanan, Ph.D.
Professor of Computer Science (Research)
Department of Computer Science
Stanford University
Stanford, California 94305

A. Bruce Campbell, M.D., Ph.D.
Practice of Hematology/Oncology
9834 Genesee Avenue, Suite 311
La Jolla, California 92037

William J. Clancey, Ph.D.
Research Associate
Department of Computer Science
Stanford University
Stanford, California 94305

Jan E. Clayton, M.S.
Knowledge Engineer
Teknowledge, Inc.
525 University Avenue
Palo Alto, California 94301

Stanley N. Cohen, M.D.
Professor of Genetics and Medicine
Stanford University Medical Center
Stanford, California 94305

Randall Davis, Ph.D.
Assistant Professor of Computer Science
Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, Massachusetts 02139

Robert S. Engelmore, Ph.D.
Director, Knowledge Systems Development
Teknowledge, Inc.
525 University Avenue
Palo Alto, California 94301

Lawrence M. Fagan, M.D., Ph.D.
Senior Research Associate
Department of Medicine
Stanford University Medical Center
Stanford, California 94305

Edward A. Feigenbaum, Ph.D.
Professor of Computer Science
Department of Computer Science
Stanford University
Stanford, California 94305

Jean Gordon, Ph.D.
Research Assistant
Medical Computer Science, TC-135
Stanford University Medical Center
Stanford, California 94305

John F. Hannigan, Ph.D.
Statistician
Northern California Cancer Program
1801 Page Mill Road
Palo Alto, California 94304

Charlotte D. Jacobs, M.D.
Assistant Professor of Medicine (Oncology)
Stanford University Medical Center
Stanford, California 94305

Jonathan J. King, Ph.D.
Knowledge Engineer
Teknowledge, Inc.
525 University Avenue
Palo Alto, California 94301

John C. Kunz
Manager, Custom Systems
IntelliGenetics
124 University Avenue
Palo Alto, California 94301

John J. Osborn, M.D.
President
Jandel Corporation
3030 Bridgeway
Sausalito, California 94965

A. Carlisle Scott, M.S.
Senior Knowledge Engineer
Teknowledge, Inc.
525 University Avenue
Palo Alto, California 94301

Edward H. Shortliffe, M.D., Ph.D.
Assistant Professor of Medicine and Computer Science
Medical Computer Science, TC-135
Stanford University Medical Center
Stanford, California 94305

David E. Smith
Research Assistant
Department of Computer Science
Stanford University
Stanford, California 94305

Motoi Suwa, Ph.D.
Chief, Man-Machine Systems Section, Computer Systems Division
Electrotechnical Laboratory
1-1-4 Umezono, Sakura-mura
Niihari-gun, Ibaraki 305
Japan

Randy L. Teach
Deputy Assistant Secretary for Evaluation and Technical Analysis
Department of Health and Human Services
Washington, D.C. 20201

William van Melle, Ph.D.
Computer Scientist
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304

Jerold W. Wallis, M.D.
Resident in Medicine
University of Michigan Hospital
Ann Arbor, Michigan 48105

Victor L. Yu, M.D.
Associate Professor of Medicine
Division of Infectious Disease
968 Scaife Hall
University of Pittsburgh School of Medicine
Pittsburgh, Pennsylvania 15261
Foreword

The last seven years have seen the field of artificial intelligence (AI) trans-
formed. This transformation is not simple, nor has it yet run its course.
The transformation has been generated by the emergence of expert systems.
Whatever exactly these are or turn out to be, they first arose during the
1970s, with a triple claim: to be AI systems that used large bodies of heu-
ristic knowledge, to be AI systems that could be applied, and to be the
wave of the future. The exact status of these claims (or even whether my
statement of them is anywhere close to the mark) is not important. The
thrust of these systems was strong enough and the surface evidence im-
pressive enough to initiate the transformation. This transformation has at
least two components. One comes from the resulting societal interest in
AI, expressed in the widespread entrepreneurial efforts to capitalize on
AI research and in the Japanese Fifth-Generation plans with their subse-
quent worldwide ripples. The other component comes from the need to
redraw the intellectual map of AI to assimilate this new class of systems--
to declare it a coherent subarea, or to fragment it into intellectual subparts
that fit the existing map, or whatever.
A side note is important. Even if the evidence from politics is not
persuasive, science has surely taught us that more than one revolution can
go on simultaneously. Taken as a whole, science is currently running at
least a score of revolutions--not a small number. AI is being transformed
by more than expert systems. In particular, robotics, under the press of
industrial productivity, is producing a revolution in AI in its own right.
Although progressing somewhat more slowly than expert systems at the
moment, robotics in the end will produce an effect at least as large, not
just on the applied side, but on the intellectual structure of the field as
well. Even more, both AI and robotics are to some degree parts of an
overarching revolution in microelectronics. In any event, to focus on one
revolution, namely expert systems, as I will do here for good reason, is not
to deny the importance of the others.
The book at whose threshold this foreword stands has (also) a triple
claim on the attention of someone interested in expert systems and AI.
First, it provides a detailed look at a particular expert system, MYCIN.
Second, it is of historical interest, for this is not just any old expert system,
but the granddaddy of them all--the one that launched the field. Third,
it is an attempt to advance the science of AI, not just to report on a system
or project. Each of these deserves a moment's comment, for those readers
who will tarry at a foreword before getting on with the real story.

MYCIN as Example It is sometimes noted that the term expert system
is a pun. It designates a system that is expert in some existing human art,
and thus that operates at human scale--not on some trifling, though perhaps
illustrative task, not on some toy task, to use the somewhat pejorative
term popular in the field. But it also designates a system that plays the role
of a consultant, i.e., an expert who gives advice to someone who has a task.
Such a dual picture cannot last long. The population of so-called expert
systems is rapidly becoming mongrelized to include any system that is ap-
plied, has some vague connection with AI systems and has pretensions of
success. Such is the fate of terms that attain (if only briefly) a positive halo,
when advantage lies in shoehorning a system under its protective and pro-
ductive cover.
MYCIN provides a pure case of the original pun. It is expert in an
existing art of human scale (diagnosing bacterial infections and prescribing
treatment for them) and it operates as a consultant (a physician describes
a patient to MYCIN and the latter then returns advice to the physician).
The considerations that came to the fore because of the consultant mode--
in particular, explanation to the user--play a strong role throughout all of
the work. Indeed, MYCIN makes explicit most of the issues with which
any group who would engineer an expert system must deal. It also lays
out some of the solutions, making clear their adequacies and inadequacies.
Because the MYCIN story is essentially complete by now and the book tells
it all, the record of initial work and response gives a perspective on the
development of a system over time. This adds substantially to the time-
sliced picture that constitutes the typical system description. It is a good
case to study, even though, if we learn our lessons from it and the other
early expert systems, we will not have to recapitulate exactly this history
again.
One striking feature of the MYCIN story, as told in this book, is its
eclecticism. Those outside a systems project tend to build brief, trenchant
descriptions of a system. MYCIN is an example of approach X leading to
a system of type Y. Designers themselves often characterize their own sys-
tems in such abbreviated terms, seeking to make particular properties
stand out. And, of course, critics do also, although the properties they
choose to highlight are not usually the same ones. Indeed, I myself use
such simplified views in this very foreword. But if this book makes anything
clear, it is that the MYCIN gang (as they called themselves) continually
explored, often with experimental variants, the full range of ideas in the
AI armamentarium. We would undoubtedly see that this is true of many
projects if we were to follow their histories carefully. However, it seems to
have been particularly true of the effort described here.

MYCIN as History MYCIN comes out of the Stanford Heuristic Programming
Project (HPP), the laboratory that without doubt has had the
most impact in setting the expert-system transformation in motion and
determining its initial character. I said that MYCIN is the granddaddy of
expert systems. I do not think it is so viewed in HPP. They prefer to talk
about DENDRAL, the system for identifying chemical structures from
mass spectrograms (Lindsay, Buchanan, Feigenbaum, and Lederberg,
1980), as the original expert system (Feigenbaum, 1977). True, DENDRAL
was the original system built by the group that became HPP, and its origins
go back into the mid-1960s. Also true is that many basic design decisions
that contributed to MYCIN came from lessons learned in DENDRAL. For
instance, the basic production-system representation had been tried out in
DENDRAL for modeling the mass spectrometer, and it proved highly serviceable,
as seen in all the work on Meta-DENDRAL, which learned production
rules. And certainly true, as well, is that the explicit focus on the
role of expertise in AI systems predates MYCIN by a long stretch. I trace
the focus back to Joel Moses's dissertation at M.I.T. in symbolic integration
(Moses, 1967), which led to the MACSYMA project on symbolic mathematics
(Mathlab Group, 1977), a system often included in the roster of
early expert systems.
Even so, there are grounds for taking DENDRAL and MACSYMA as
precursors. DENDRAL has strong links to classical problem-solving pro-
grams, with a heuristically shaped combinatorial search in a space of all
isomers at its heart and a representation (the chemical valence model) that
provided the clean space within which to search. DENDRAL started out
as an investigation into scientific induction (on real tasks, to be sure) and
only ended up becoming an expert system when that view gradually
emerged. MYCIN, on the other hand, was a pure rule-based system that
worked in an area unsupported by a clean, scientifically powerful representation.
Its search was limited enough (being nongenerative in an important
sense) to be relegated to the background; thus MYCIN could be
viewed purely as a body of knowledge. MYCIN embodied all the features
that have (it must be admitted) become the clichés of what expert systems
are. MACSYMA also wears the mantle of original expert system somewhat
awkwardly. It has never been an AI system in any central way. It has been
regarded by those who created it, and now nurture it, as not belonging to
the world of AI at all, but rather to the world of symbolic mathematics.
Only its roots lie in AI--though they certainly include the attitude that
computer systems should embody as much expertise as possible (which
may or may not imply a large amount of knowledge).
My position here is as an outsider, for I did not witness the day-to-day
development of MYCIN in the research environment within which (in the
early 1970s) DENDRAL was the reigning success and paradigm. But I still
like my view that MYCIN is the original expert system that made it evident
to all the rest of the world that a new niche had opened up. Indeed, an
outsider's view may have a validity of its own. It is, at least, certain that in
the efflorescence of medical diagnostic expert systems in the 1970s (CASNET,
INTERNIST, and the Digitalis Therapy Advisor; see Szolovits,
1982), MYCIN epitomized the new path that had been created. Thus,
gathering together the full record of this system and the internal history
of its development serves to record an important event in the history of
AI.

MYCIN as Science The first words of this foreword put forth the
image of a development within AI of uncertain character, one that needed
to be assimilated. Whatever effects are being generated on the social organization
of the field by the development of an applied wing of AI, the
more important requirement for assimilation, as far as I am concerned,
comes from the scientific side. Certainly, there is nothing very natural about
expert systems as a category, although the term is useful for the cluster of
systems that is causing the transformation.
AI is both an empirical discipline and an engineering discipline. This
has many consequences for its course as a science. It progresses by building
systems and demonstrating their performance. From a scientific point of
view, these systems are the data points out of which a cumulative body of
knowledge is to develop. However, an AI system is a complex join of many
mechanisms, some new, most familiar. Of necessity, on the edge of the art,
systems are messy and inelegant joins--that's the nature of frontiers. It is
difficult to extract from these data points the scientific increments that
should be added to the cumulation. Thus, AI is case-study science with a
vengeance. But if that were not enough of a problem, the payoff structure
of AI permits the extraction to be put off, even to be avoided permanently.
If a system performs well and breaks new ground--which can often be
verified by global output measures and direct qualitative assessment--then
it has justified its construction. Global conclusions, packaged as the dis-
cursive views of its designers, are often the only increments to be added
to the cumulated scientific base.
Of course, such a judgment is too harsh by half. The system itself
constitutes a body of engineering know-how. Through direct study and
emulation, the next generation of similar systems benefits. However, the
entire history of science shows no alternative to the formation of explicit
theories, with their rounds of testing and modification, as the path to genuine
understanding and control of any domain, whether natural or technological.
In the present state of AI, it is all too easy to move on to the
next system without devoting sufficient energies to trying to understand
what has already been wrought and to doing so in a way that adds to the
explicit body of science. An explosive development, such as that of expert
systems, is just the place where engineering progress can be expected to
occur pell-mell, with little attention to obtaining other than global scientific
lessons.
This situation is not to be condemned out of hand, but accepted as a
basic condition of our field. For the difficulties mentioned above stem from
the sources that generate our progress. Informal and experiential techniques
work well because programmed systems are so open to direct inspection
and assessment, and because the loop to incremental change and
improvement is so short, with interactive creation and modification. AI,
like any other scientific field, must find its own particular way to science,
building on its own structure and strengths. But the field is young, and
that way is not yet clear. We must continue to struggle to find out how to
extract scientific knowledge from our data points. The situation is hardly
unappreciated, and many people in the field are trying their hands at
varying approaches, from formal theory to more controlled system experimentation.
There has been exhortation as well. Indeed, I seem to have
done my share of exhortation, especially with respect to expert systems.
The editors of the present volume, in inviting me to provide a foreword
to it, explicitly noted that the book was (in small part) an attempt to meet
the calls I had made for more science from our expert-systems experi-
ments. And recently, Harry Pople asserted that his attempt at articulating
the task domain of medical diagnosis for INTERNIST was (again, in small
part) a response to exhortation (he called it criticism) of mine (Pople, 1982).
I am not totally comfortable with the role of exhorter--I prefer to be in
the trenches. However, if comments of mine have helped move anyone to
devote energy to extracting the science from our growing experience with
expert systems, I can only rejoice.
The third claim of this book, then, is to extract and document the
scientific lessons from the experience with MYCIN. This extraction and
documentation occurs at two levels. First, there has been a very substantial
exploration in the last decade of many of the questions that were raised
by MYCIN. Indeed, there are some 26 contributors to this book, even
though the number of people devoted to MYCIN proper at any one time
was never very large. Rather, the large number of contributors reflects the
large number of follow-on and alternative-path studies that have been un-
dertaken. This book documents this work. It does so by collecting the
papers and reports of the original researchers who did the work, but the
present editors have made substantial revisions to smooth the whole into
a coherent story. This story lays to rest the simplified view that MYCIN
was a single system that was designed, built, demonstrated and refined; or
even that it was only a two-stage affair--MYCIN, the original task-specific
system, followed by a single stage of generalization into EMYCIN, a kernel
system that could be used in other tasks. The network of studies was much
more ramified, and the approaches considered were more diverse.
The step to EMYCIN does have general significance. It represents a
major way we have found of distilling our knowledge and making it avail-
able to the future. It is used rather widely; for example, the system called
EXPERT (Kulikowski and Weiss, 1982) bears the same relation to the CASNET
system as EMYCIN does to MYCIN. It is of a piece with the strategy
of building special-purpose problem-oriented programming languages to
capture a body of experience about how to solve a class of problems, a
strategy common throughout computer science. The interesting aspect of
this step, from the perspective of this foreword, is its attempt to capitalize
on the strong procedural aspects of the field. The scientific abstraction is
embodied in the streamlined and clean structure of the kernel system (or
programming language). The scientific advance is communicated by direct
study of the new artifact and, importantly, by its use. Such kernel systems
still leave much to be desired as a vehicle for science. For example, evaluation
still consists in global discussion of features and direct experience,
and assessment of its use. (Witness the difficulty that computer science has
in assessing programming languages, an entirely analogous situation.) Still,
the strategy represented by EMYCIN is an important and novel response
by AI to producing science.
The second level at which this book addresses the question of science
is in surveying the entire enterprise and attempting to draw the major
lessons (see especially the last chapter). Here the editors have faced a hard
task. Of necessity, they have had to deal with all the complexity of a case
study (more properly, of a collection of them). Thus, they have had to
settle for reflecting on the enterprise and its various products and experiences,
and to encapsulate these in what I referred to above as qualitative
discussion. But they have a long perspective available to them, and there
is a lot of substance in the individual studies. Thus, the lessons that they
draw are indeed a contribution to our understanding of expert systems.
In sum, for all these reasons I've enumerated, I commend to you a
volume that is an important addition to the literature on AI expert systems.
It is noteworthy that the Stanford Heuristic Programming Project previ-
ously produced an analogous book describing the DENDRAL effort and
summarizing their experience with it (Lindsay, Buchanan, Feigenbaum and
Lederberg, 1980). Thus, HPP has done its bit twice. It is well ahead of
many of the rest of us in providing valuable increments to the accumulation
of knowledge about expert systems.

Pittsburgh, Pennsylvania Allen Newell


March 1984

REFERENCES

Feigenbaum, E. A. The art of artificial intelligence: Themes and case studies
in knowledge engineering. In Proceedings of the Fifth International
Joint Conference on Artificial Intelligence. Pittsburgh, PA: Computer Science
Department, Carnegie-Mellon University, 1977.
Kulikowski, C. A., and Weiss, S. M. Representation of expert knowledge
for consultation: The CASNET and EXPERT projects. In P. Szolovits
(ed.), Artificial Intelligence in Medicine. Boulder, CO: Westview Press,
1982.
Lindsay, R. K., Buchanan, B. G., Feigenbaum, E. A., and Lederberg, J.
Applications of Artificial Intelligence to Chemistry: The DENDRAL Project.
New York: McGraw-Hill, 1980.
Mathlab Group. MACSYMA Reference Manual (Tech. Rep.). Computer Science
Laboratory, M.I.T., 1977.
Moses, J. Symbolic Integration. Doctoral dissertation, M.I.T., 1967.
Pople, H. E., Jr. Heuristic methods for imposing structure on ill-structured
problems: The structuring of medical diagnosis. In P. Szolovits (ed.),
Artificial Intelligence in Medicine. Boulder, CO: Westview Press, 1982.
Szolovits, P. Artificial Intelligence in Medicine. Boulder, CO: Westview Press,
1982.
Preface

Artificial intelligence, or AI, is largely an experimental science--at least as


much progress has been made by building and analyzing programs as by
examining theoretical questions. MYCIN is one of several well-known pro-
grams that embody some intelligence and provide data on the extent to
which intelligent behavior can be programmed. As with other AI pro-
grams, its development was slow and not always in a forward direction.
But we feel we learned some useful lessons in the course of nearly a decade
of work on MYCIN and related programs.
In this book we share the results of many experiments performed in
that time, and we try to paint a coherent picture of the work. The book is
intended to be a critical analysis of several pieces of related research, performed
by a large number of scientists. We believe that the whole field of
AI will benefit from such attempts to take a detailed retrospective look at
experiments, for in this way the scientific foundations of the field will
gradually be defined. It is for all these reasons that we have prepared this
analysis of the MYCIN experiments.
The MYCIN project is one of the clearest representatives of the experi-
mental side of AI. It was begun in the spring of 1972 with a set of discus-
sions among medical school and computer science researchers interested
in applying more intelligence to computer programs that interpret medical
data. Shortliffe's Ph.D. dissertation in 1974 discussed the problem and the
MYCIN program that implemented a solution. In itself, the 1974 version
of MYCIN represents an experiment. We were testing the hypothesis, advanced
in previous work at Stanford, that a rule-based formalism was sufficient
for the high performance, flexibility, and understandability that we
demanded in an expert consultation system. The positive answer to this
question is one of the best-known lessons in the history of AI.
In addition to, or rather because of, the original MYCIN program and
the medical knowledge base that was accumulated for that work, many
derivative projects explored variations on the original design. EMYCIN¹
is among the best known of these, but there are several others. In this book
we discuss many of the experiments that evolved in the period from 1972

¹We use the name EMYCIN for the system that evolved from MYCIN as a framework for
building and running new expert systems. The name stands for "essential MYCIN," that is,
MYCIN's framework without its medical knowledge base. We have been reminded that E-MYCIN
is the name of a drug that Upjohn Corp. has trademarked. The two names should
not be confused: EMYCIN should not be ingested, nor should E-MYCIN be loaded into a
computer.


to 1982 based on the 1972-1974 design effort. We have chosen those


pieces of work that, at least in retrospect, can be seen as posing clear
questions and producing clear results, most of which were documented in
the AI or medical literature and in technical reports.
We are taking a retrospective view, so as to restate questions and reinterpret
results in a more meaningful way than that in which they were
originally documented. Among other things, we now present these pieces
of work as a collected whole, whereas they were not originally written as
such. Each paper is heavily edited--new sections have been added to put
the work in context, old sections have been deleted to avoid redundancies
and "red herrings," and the entire text has been reworked to fit each paper
into the unified picture. Each part begins with an overview chapter posing
the central question of the section, discussing the implications of the question
in its historical context, and providing a current framework for interpreting
the results. Some entirely new papers were prepared specifically
for this book. In addition, we are including several papers and technical
reports that have previously been difficult to find and will therefore be
generally available for the first time.
The last chapter is entirely new and could not have been written until
the experiments were performed. It presents a set of conclusions that we
have drawn from the experimental results. In a sense, the rest of the book
discusses the data that support these conclusions. Webelieve this book is
unique in its attempt to synthesize 10 years of work in order to demonstrate
scientific foundations and the way in which AI research evolves as key
issues emerge.

Acknowledgments

We gratefully acknowledge the help and friendship of Edward Feigenbaum and Joshua Lederberg. Over many years not only did they motivate and encourage our work in applying artificial intelligence to medicine, but they also created the intellectual and computing environment at Stanford that made this work possible.
Many individuals have contributed to the varied aspects of MYCIN and to the ideas in this book. In the past we have affectionately referred to each other as "the MYCIN gang." All of the authors of chapters built parts of MYCIN, performed experiments, contributed to overall design, and/or wrote the original articles on which the chapters are based. The persons who have been part of the MYCIN gang with us over the years are Janice S. Aikins, Stanton G. Axline, Timothy F. Beckett, James S. Bennett, Sharon Wraith Bennett, Robert L. Blum, Miriam B. Bischoff, Alain Bonnet, A. Bruce Campbell, Robert Carlson, Ricardo Chavez-Pardo, William J. Clancey, Jan E. Clayton, Stanley N. Cohen, Gregory F. Cooper, Randall Davis, Robert S. Engelmore, Lawrence M. Fagan, Robert Fallat, Edward A. Feigenbaum, John Foy, Jean Gordon, Cordell Green, John F. Hannigan, Diane Warner Hasling, Robert Illa, Charlotte D. Jacobs, Jonathan J. King, John C. Kunz, Reed Letsinger, Robert London, Dana Ludwig, Thomas C. Merigan, John J. Osborn, Frank Rhame, Louis Sanner, A. Carlisle Scott, David E. Smith, Motoi Suwa, Randy L. Teach, William J. van Melle, Jerold Wallis, and Victor L. Yu.
We wish to thank, also, the numerous other persons in the AI, particularly the AIM (AI in Medicine), community with whom conversations over the years have been valuable sources of ideas. In particular, Randy Davis, John McDermott, Carli Scott, Bill van Melle, Bill Clancey, and Jim Bennett gave us help in clearly formulating Chapter 36.
We were fortunate in having Joan Differding and Dikran Karaguezian organize and edit much of the material here. Their help was invaluable. Jane Hoover carefully copyedited the manuscript and substantially contributed to its readability. We also wish to thank Darlene Vian, Juanita Mullen, Susan Novak, Cindy Lawton, and Barbara Elspas for assistance in preparing the manuscript.
Almost all of the computing work reported here (and the manuscript preparation) was done on the SUMEX-AIM computer at Stanford. Mr. Thomas Rindfleisch, former director of SUMEX, made it possible for us to attend to the research without undue worry about system reliability by making SUMEX a stable, high-quality computing environment. We also wish to acknowledge the importance of SUMEX funding from the National Institutes of Health, Division of Research Resources (RR-00785) and thank, particularly, Dr. William Baker for his leadership in sustaining the quality of the resource.
Finally, the individual projects described here were funded by a variety of agencies over the years. We gratefully acknowledge the assistance of the Bureau of Health Services Research and Evaluation (HS-01544), the National Institutes of Health, General Medical Sciences (GM 01922 and GM 29662), the National Science Foundation (MCS-7903753), the Defense Advanced Research Projects Agency (MDA-903-77-C-0322), the Office of Naval Research (N0014-79-C-0302, NR-049-479), the National Library of Medicine (LM-03395, LM-00048), and the Henry J. Kaiser Family Foundation.

Stanford University B.G.B.


March 1984 E.H.S.
In the early stages of the development of any science different men confronting the same range of phenomena, but not usually all the same particular phenomena, describe and interpret them in different ways. What is surprising, and perhaps also unique in its degree to the fields we call science, is that such initial divergences should ever largely disappear.

T. S. Kuhn, The Structure of Scientific Revolutions (International Encyclopedia of Unified Science, vol. II, no. 2). Chicago: University of Chicago Press, 1962.

The philosopher's treatment of a question is like the treatment of an illness.

L. Wittgenstein, Philosophical
Investigations, para. 255 (trans.
G. E. M. Anscombe). New York:
Macmillan, 1953.

Every one then who hears these words of mine and does them will be like a
wise man who built his house upon the rock; and the rain fell, and the floods
came, and the winds blew and beat upon that house, but it did not fall, because
it had been founded on the rock. And every one who hears these words of mine
and does not do them will be like a foolish man who built his house upon the
sand; and the rain fell, and the floods came, and the winds blew and beat against
that house, and it fell; and great was the fall of it.

Matthew 7:24-27
(Revised Standard Version)
PART ONE

Background
1
The Context of the MYCIN Experiments

Artificial Intelligence (AI) is that branch of computer science dealing with symbolic, nonalgorithmic methods of problem solving. Several aspects of this statement are important for understanding MYCIN and the issues discussed in this book. First, most uses of computers over the last 40 years have been in numerical or data-processing applications, but most of a person's knowledge of a subject like medicine is not mathematical or quantitative. It is symbolic knowledge, and it is used in a variety of ways in problem solving. Also, the problem-solving methods themselves are usually not mathematical or data-processing procedures but qualitative reasoning techniques that relate items through judgmental rules, or heuristics, as well as through theoretical laws and definitions. An algorithm is a procedure that is guaranteed either to find the correct solution to a problem in a finite time or to tell you there is no solution. For example, an algorithm for opening a safe with three dials is to set the dials on every combination of numbers and try the lock after each one. Heuristic methods, on the other hand, are not guaranteed to work, but will often find solutions in much shorter times than will exhaustive trial and error or other algorithms. For the example of the safe, one heuristic is to listen for tumblers to drop into place. Few problems in medicine have algorithmic solutions that are both practical and valid. Physicians are forced to reason about an illness using judgmental rules and empirical associations along with definitive truths of physiology.
MYCIN is an expert system (Duda and Shortliffe, 1983). By that we mean that it is an AI program designed (a) to provide expert-level solutions to complex problems, (b) to be understandable, and (c) to be flexible enough to accommodate new knowledge easily. Because we have designed MYCIN to provide advice through a consultative dialogue, we sometimes refer to it as a consultation system.
There are two main parts to an expert system like MYCIN: a knowledge base and an inference mechanism, or engine (Figure 1-1). In addition, there are often subprograms designed to facilitate interaction with users,

[Figure 1-1 appeared here: the user supplies a description of a new case to the expert system through a user interface and receives advice and explanations; within the expert system, an inference engine draws on a knowledge base.]

FIGURE 1-1 Major parts of an expert system. Arrows indicate information flow.

to help build a knowledge base, to explain a line of reasoning, and so forth.
The knowledge base is the program's store of facts and associations it "knows" about a subject area such as medicine. A critical design decision is how such knowledge is to be represented within the program. There are many choices, in general. For MYCIN, we chose to represent knowledge mostly as conditional statements, or rules, of the following form:

IF: There is evidence that A and B are true,
THEN: Conclude there is evidence that C is true.

This form is often abbreviated to one of the following:

If A and B, then C
A & B → C

We refer to the antecedent of a rule as the premise or left-hand side (LHS) and to the consequent as the action or right-hand side (RHS).
The inference mechanism can take many forms. We often speak of the control structure or control of inference to reflect the fact that there are different controlling strategies for the system. For example, a set of rules may be chained together, as in this example:

If A, then B (Rule 1)
If B, then C (Rule 2)
A (Data)
∴ C (Conclusion)

This is sometimes called forward chaining, or data-directed inference, because the data that are known (in this case A) drive the inferences from left to right in rules, with rules chaining together to deduce a conclusion (C).
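The forward-chaining loop described above can be sketched in a few lines of modern code. This is our own illustrative sketch, not MYCIN code (MYCIN was written in Interlisp); the rule representation and the function name are invented for the example.

```python
# A minimal forward chainer. Rules are plain data (premise set, conclusion),
# kept separate from the inference procedure that interprets them.

RULES = [
    ({"A"}, "B"),  # Rule 1: If A, then B
    ({"B"}, "C"),  # Rule 2: If B, then C
]

def forward_chain(facts, rules):
    """Fire every rule whose premises are all known, adding its
    conclusion, and repeat until no new facts can be deduced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

print(forward_chain({"A"}, RULES))  # datum A drives Rule 1, then Rule 2, deducing C
```

Note that the inference procedure never mentions A, B, or C by name: all the domain knowledge lives in the rule list, which is exactly the separation of knowledge base from inference engine emphasized above.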
MYCIN primarily uses backward chaining, or a goal-directed control strategy. The deductive validity of the argument is established in the same way, but the system's behavior is quite different. In goal-directed reasoning a system starts with a statement of the goal to achieve and works "backward" through inference rules, i.e., from right to left, to find the data that establish that goal, for example:

Find out about C (Goal)
If B, then C (Rule 1)
If A, then B (Rule 2)
∴ If A, then C (Implicit rule)

Question: Is A true? (Data)

Since there are many rule chains and many pieces of data about which the system needs to inquire, we sometimes say that MYCIN is an evidence-gathering program.
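Under the same toy rule representation, goal-directed reasoning can be sketched as a recursive procedure. This again is our own invented illustration; real MYCIN rules carry certainty factors and clinical context, which are omitted here.

```python
# A minimal backward chainer. To establish a goal, find rules that conclude
# it and recursively pursue their premises; a premise that no rule concludes
# becomes a question put to the user ("Is A true?").

RULES = [
    ({"B"}, "C"),  # Rule 1: If B, then C
    ({"A"}, "B"),  # Rule 2: If A, then B
]

def backward_chain(goal, rules, ask):
    """Return True if the goal can be established from the rules plus
    answers obtained by calling ask(datum)."""
    concluding = [r for r in rules if r[1] == goal]
    if not concluding:
        return ask(goal)  # raw datum: generate a question
    for premises, _ in concluding:
        if all(backward_chain(p, rules, ask) for p in premises):
            return True
    return False

# Simulate a user who answers "yes" only when asked about datum A.
print(backward_chain("C", RULES, lambda datum: datum == "A"))  # prints True
```

Starting from the goal C, the procedure works from right to left through Rules 1 and 2 and ends by asking about A, mirroring the goal-directed example above.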
The whole expert system is used to perform a task, in MYCIN's case to provide diagnostic and therapeutic advice about a patient with an infection as described in Section 1.2. We sometimes refer to the whole system, shown in Figure 1-1, as the performance system to contrast it with other subsystems not so directly related to giving advice. MYCIN contains an explanation subsystem, for example, which explains the reasoning of the performance system (see Part Six).
Several of the chapters in this book deal with the problems of constructing a performance system in the first place. We have experimented with different kinds of software tools that aid in the construction of a new system, mostly by helping with the formulation and understanding of a new knowledge base. We refer to the process of mapping an expert's knowledge into a program's knowledge base as knowledge engineering.1 The intended users of these kinds of tools are either (a) the so-called knowledge engineers who help an expert formulate and represent domain-specific knowledge for the performance system or (b) the experts themselves. Although either group might also run the performance system to test it, neither overlaps with the intended routine users of the performance system. Our model is that engineers help experts build a system that others later use to get advice. Elaborating on the previous diagrams, we show this model in Figure 1-2.

1The term knowledge engineering was, to the best of our knowledge, coined by Edward Feigenbaum after Donald Michie's phrase epistemological engineering. Like the phrases expert system and knowledge-based system, however, it did not come into general use until about 1975. For more discussion of expert systems, see Buchanan and Duda (1983).

Choice of Programming Language

LISP has been the programming language of choice for AI programs for nearly two decades (McCarthy et al., 1962). It is a symbol manipulation language of extreme flexibility based on a small number of simple constructs.2 We are often asked why we chose LISP for work on MYCIN, so a brief answer is included here. Above all, we needed a language and programming environment that would allow rapid modification and testing and in which it was easy and natural to separate medical rules in the knowledge base from the inference procedures that use the rules. LISP is an interpretive language and thus does not require that programs be recompiled after they have been modified in order to test them. Moreover, LISP removes the distinction between programs and data and thus allows us to use rules as parts of the program and to examine and edit them as data structures. The editing and debugging facilities of Interlisp also aided our research greatly.
Successful AI programs have been written in many languages. Until recently LISP was considered to be too slow and too large for important applications. Thus there were reasons to consider other languages. But for a research effort, such as this one, we were much more concerned with saving days during program development than with saving seconds at run time. We needed the flexibility that LISP offered. When Interlisp became available, we began using it because it promised still more convenience than other versions. Now that additional tools, such as EMYCIN, have been built on top of Interlisp, more savings can be realized by building new systems using those tools (when appropriate) than by building from the base-level LISP system. At the time we began work on MYCIN, however, we had no choice.

1.1 Historical Perspective on MYCIN

As best as we can tell, production rules were brought into artificial intelligence (AI) by Allen Newell, who had seen their power and simplicity demonstrated in Robert Floyd's work on formal languages and compilers
2See Winston and Horn (1981), Charniak et al. (1980), and Allen (1978) for more information about the language itself.
[Figure 1-2 appeared here: the model in which knowledge engineers help experts build an expert system that routine users later consult for advice.]

(Floyd, 1961) at Carnegie-Mellon University. Newell saw in production systems an elegant formalism for psychological modeling, a theme still pursued at Carnegie-Mellon University and elsewhere. Through conversations between Newell and himself at Stanford in the 1960s (see Newell, 1966), Edward Feigenbaum began advocating the use of production rules to encode domain-specific knowledge in DENDRAL. Don Waterman picked up on the suggestion, but decided to work with rules and heuristics of the game of poker (Waterman, 1970) rather than of mass spectrometry. His success, and Feigenbaum's continued advocacy, led to recoding much of DENDRAL's knowledge into rules (Lindsay et al., 1980).
The DENDRAL program was the first AI program to emphasize the power of specialized knowledge over generalized problem-solving methods (see Feigenbaum et al., 1971). It was started in the mid-1960s by Joshua Lederberg and Feigenbaum as an investigation of the use of AI techniques for hypothesis formation. It constructed explanations of empirical data in organic chemistry, specifically, explanations of analytic data about the molecular structure of an unknown organic chemical compound.3 By the mid-1970s there were several large programs, collectively called DENDRAL, which interacted to help organic chemists elucidate molecular structures. The programs are knowledge-intensive; that is, they require very specialized knowledge of chemistry in order to produce plausible explanations of the data. Thus a major concern in research on DENDRAL was how to represent specialized knowledge of a domain like chemistry so that a computer program could use it for complex problem solving.
MYCIN was an outgrowth of DENDRAL in the sense that many of the lessons learned in the construction of DENDRAL were used in the design and implementation of MYCIN. Foremost among these was the newfound power of production rules, as discussed in Chapter 2. The senior members of the DENDRAL team, Lederberg and Feigenbaum, had convinced themselves and Bruce Buchanan that the AI ideas that made DENDRAL work could be applied to a problem of medical import. At about that time, Edward Shortliffe had just discovered AI as a medical student enrolled in a Computer Science Department course entitled "Models of Thought Processes," taught at the time by Jerome Feldman. Also, Stanley Cohen, then Chief of Clinical Pharmacology at the Stanford University Medical School, had been working on a medical computing project, the MEDIPHOR drug interaction warning system (Cohen et al., 1974). He had sought Buchanan's involvement and had also just accepted Shortliffe as a research assistant on the project. In addition, the late George Forsythe, then Chairman of the Computer Science Department, was strongly supportive of this kind of interdisciplinary research project and encouraged
3Even more specifically, the data about the unknown compound were data from a mass spectrometer, an instrument that bombards a small sample of a compound with high-energy electrons and produces data on the resulting fragments.

Shortliffe in his efforts to obtain formal training in the field. Thus the scene was set for a collaborative effort involving Cohen, Buchanan, and Shortliffe--an effort that ultimately grew into Shortliffe's dissertation.
After six months of collaborative effort on MEDIPHOR, our discussions began to focus on a computer program that would monitor physicians' prescriptions for antibiotics and generate warnings on inappropriate prescriptions in the same way that MEDIPHOR produced warnings regarding potential drug-drug interactions. Such a program would have needed to access data bases on three Stanford computers: the pharmacy, clinical laboratory, and bacteriology systems. It would also have required considerable knowledge about the general and specific conditions that make one antibiotic, or combination of antibiotics, a better choice than another. Cohen interested Thomas Merigan, Chief of the Infectious Disease Division at Stanford, in lending both his expertise and that of Stanton Axline, a physician in his division. In discussing this new kind of monitoring system, however, we quickly realized that it would require much more medical knowledge than had been the case for MEDIPHOR. Before a system could monitor for inappropriate therapeutic decisions, it would need to be an "expert" in the field of antimicrobial selection. Thus, with minor modifications for direct data entry from a terminal rather than from patient data bases, a monitoring system could be modified to provide consultations to physicians. Another appeal of focusing on an interactive system was that it provided us with a short-term means to avoid the difficulty of linking three computers together to provide data to a monitoring system. Thus our concept of a computer-based consultant was born, and we began to model MYCIN after infectious disease consultants. This model also conformed with Cohen's strong belief that a computer-based aid for medical decision making should suggest therapy as well as diagnosis.
Shortliffe synthesized medical knowledge from Cohen and Axline and AI ideas from Buchanan and Cordell Green. Green suggested using Interlisp (then known as BBN-LISP), which was running at SRI International (then Stanford Research Institute) but was not yet available at the university. Conversations with him also led to the idea of using Carbonell's program, SCHOLAR (Carbonell, 1970a), as a model for MYCIN. SCHOLAR represented facts about the geography of South America in a large semantic network and answered questions by making inferences over the net. However, this model was not well enough developed for us to see how a long dialogue with a physician could be focused on one line of reasoning at a time. We also found it difficult to construct semantic networks for the ill-structured knowledge of infectious disease. We turned instead to a rule-based approach that Cohen and Axline found easier to understand, particularly because chained rules led to lines of reasoning that they could understand and critique.
One important reason for the success of our early efforts was Shortliffe's ability to provide quickly a working prototype program that would show Cohen and Axline the consequences of the rules they had stated at

each meeting. The modularity of the rules was an important benefit in providing rapid feedback on changes. Focusing early on a working program not only kept the experts interested but also allowed us to design the emerging program in response to real problems instead of trying to imagine the shape of the problems entirely in advance of their manifestations in context.
Green recommended hiring Carli Scott as our first full-time employee, and the MYCIN research began to take shape as a coordinated project. Axline subsequently enlisted help from infectious disease fellows to complement the expertise of Cohen's clinical pharmacology fellow. Graduate students from the Computer Science Department were also attracted to the work, partly because of its social relevance and partly because it was new and exciting. Randall Davis, for example, had been working on vision understanding at the Stanford AI Lab and had been accepted for medical school when he heard about MYCIN and decided to invest his research talents with us.
In our first grant application (October, 1973), we described the goals of the project.

For the past year and a half the Divisions of Clinical Pharmacology and Infectious Disease plus members of the Department of Computer Science have collaborated on initial development of a computer-based system (termed MYCIN) that will be capable of using both clinical data and judgmental decisions regarding infectious disease therapy. The proposed research involves development and acceptable implementation of the following:
A. CONSULTATION PROGRAM. The central component of the MYCIN system is an interactive computer program to provide physicians with consultative advice regarding an appropriate choice of antimicrobial therapy as determined from data available from the microbiology and clinical chemistry laboratories and from direct clinical observations entered by the physician in response to computer-generated questions;
B. INTERACTIVE EXPLANATION CAPABILITIES. Another important component of the system permits the consultation program to explain its knowledge of infectious disease therapy and to justify specific therapeutic recommendations;
C. COMPUTER ACQUISITION OF JUDGMENTAL KNOWLEDGE. The third aspect of this work seeks to permit experts in the field of infectious disease therapy to teach the MYCIN system the therapeutic decision rules that they find useful in their clinical practice.

The submission of our initial grant application encouraged us to choose a name for the project on which we had already been working for two years. After failing to find a suitable acronym, we selected the name MYCIN at Axline's suggestion. This name is simply the common suffix associated with many antimicrobial agents.
Although we were aiming at a program that would help physicians,
we also realized that there were many computer science problems with

[Figure 1-3 appeared here: a timeline from the 1960s through the 1980s of HPP programs relating to MYCIN, among them CONGEN, Meta-DENDRAL, SU/X, TEIRESIAS, EMYCIN, BAOBAB, GUIDON, SACON, CENTAUR, GRAVIDA, NEOMYCIN, ONCOCIN, WHEEZE, CLOT, and DART, along with the MYCIN subprojects (QA), (Inference), and (Evaluation).]

FIGURE 1-3 HPP programs relating to MYCIN. (Program names in boxes were Ph.D. dissertation research programs.)

which we had to grapple. No other AI program, including DENDRAL, had been built using so much domain-specific knowledge so clearly separated from the inference procedures.
A schematic review of the history of the work on MYCIN and related projects is shown in Figure 1-3. MYCIN was one of several projects in the Stanford Heuristic Programming Project (HPP); others were DENDRAL, CONGEN, Meta-DENDRAL, and SU/X.4 There was much interaction
4Later renamed HASP/SIAP (Nii and Feigenbaum, 1978; Nii et al., 1982).

among the individuals working in HPP that is not shown in this simplified diagram, of course. Within the MYCIN project individuals were working on several nearly separable subprojects, some of which are shown: Question Answering (QA), Inference (including certainty factors, or CFs, and the therapy recommendation code), Explanation, Evaluation, and Knowledge Acquisition. These subprojects formed the basis of several of the experiments reported in this volume. All were well-focused projects since we were undertaking them partly to improve the knowledge base and the performance of MYCIN. Figure 1-3 shows roughly the chronology of work; however, in the organization of this book chronology is not emphasized.

Ancient History

Jaynes (1976) refers to a collection of 20,000-30,000 Babylonian tablets, about 20% of which contain sets of production rules ("omens") for governing everyday affairs.5 These were already written and catalogued by about 650 B.C. He describes the form of each entry as "an if-clause or protasis followed by a then-clause or apodosis." For example,

"If a horse enters a man's house and bites either an ass or a man,
the owner of the house will die and his household will be scattered."
"If a man unwittingly treads on a lizard and kills it,
he will prevail over his adversary."

Included in these are medical rules, correlating symptoms with prognoses. According to one of Jaynes' sources (Wilson, 1956; 1962), these tablets of scientific teachings were catalogued by subject matter around 700 B.C. Among the left-hand sides quoted from the medical tablets are the following (Wilson, 1956):

"If, after a day's illness, he begins to suffer from headache ..."
"If, at the onset of his illness, he had prickly heat ..."
"If he is hot (in one place) and cold (in another) ..."
"If the affected area is clammy with sweat ..."

Each clause is catalogued as appearing in 60-150 entries on the tablets. One right-hand side for the medical rules cited by Wilson is the following:

"... he will die suddenly."

5We are indebted to James Bennett for pointing out this reference.



Thus we see that large collections of simple rules were used for medical diagnosis long before MYCIN and that some thought had been given to the organization of the knowledge base.6

1.2 MYCIN's Task Domain--Antimicrobial Selection

Because a basic understanding of MYCIN's task domain is important for understanding much of what follows, we include here a brief description of infectious disease diagnosis and therapy.7

1.2.1 The Nature of the Decision Problem

An antimicrobial agent is any drug designed to kill bacteria or to arrest their growth. Thus the selection of antimicrobial therapy refers to the problem of choosing an agent (or combination of agents) for use in treating a patient with a bacterial infection. The terms antimicrobial and antibiotic are often used interchangeably, even though the latter actually refers to any one of a number of drugs that are isolated as naturally occurring products of bacteria or fungi. Thus the well-known penicillin mold is the source of an antibiotic, penicillin, that is used as an antimicrobial. Some antibiotics are too toxic for use in treating infectious diseases but are still used in research laboratories (e.g., dactinomycin) or in cancer chemotherapy (e.g., daunomycin). Furthermore, some antimicrobials (such as the sulfonamides) are synthetic drugs and are therefore not antibiotics. There are also semisynthetic antibiotics (e.g., methicillin) that are produced in chemical laboratories by manipulating a naturally occurring antibiotic molecule. In writing about MYCIN we have tended not to rely on this formal distinction between antimicrobial and antibiotic and have used the terms as though they were synonymous.
Antimicrobial selection would be a trivial problem if there were a single nontoxic agent effective against all bacteria capable of causing human disease. However, drugs that are highly useful against certain organisms are often not the most effective against others. The identity (genus) of the organism causing an infection is therefore an important clue for deciding

6The fact that the rules on the tablets were themselves indexed by premise clauses would suggest that they were used in data-directed fashion. Yet the global organization of rules on tablets was by subject matter, so that medical rules were together, house-building rules together, and so on. This "big switch" organization of the knowledge base is an early instance of using rule groups to focus the attention of the problem solver, a pressing problem, especially in large, data-directed systems such as the Babylonian omens.
7This section is based on a similar discussion by Shortliffe (1974).

what drugs are apt to be beneficial for the patient. Initially, MYCIN did not consider infections caused by viruses or pathogenic fungi, but since these other kinds of organisms are particularly significant as causes of meningitis, they were later added when we began to work with that domain.
Selection of therapy is a four-part decision process. First, the physician must decide whether or not the patient has a significant infection requiring treatment. If there is significant disease, the organism must be identified or the range of possible identities must be inferred. The third step is to select a set of drugs that may be appropriate. Finally, the most appropriate drug or combination of drugs must be selected from the list of possibilities. Each step in this decision process is described below.

Is the Infection Significant?

The human body is normally populated by a wide variety of bacteria. Organisms can invariably be cultured from samples taken from a patient's skin, throat, or stool. These normal flora are not associated with disease in most patients and are, in fact, often important to the body's homeostatic balance. The isolation of bacteria from a patient is therefore not presumptive evidence of significant infectious disease.
Another complication is the possibility that samples obtained from normally sterile sites (such as the blood, cerebrospinal fluid, or urinary tract) will be contaminated with external organisms either during the collection process itself or in the microbiology laboratory where the cultures are grown. It is therefore often wise to obtain several samples and to see how many contain organisms that may be associated with significant disease.
Because the patient does have a normal bacterial flora and contamination of cultures may occur, determination of the significance of an infection is usually based on clinical criteria. Does the patient have a fever? Is he or she coughing up sputum filled with bacteria? Does the patient have skin or blood findings suggestive of serious infection? Is his or her chest x-ray normal? Does the patient have pain or inflammation? These and similar questions allow the physician to judge the seriousness of the patient's condition and often demonstrate why the possibility of infection was considered in the first place.

What Is the Organism's Identity?

There are several laboratory tests that allow an organism to be identified. The physician first obtains a sample from the site of suspected infection (e.g., a blood sample, an aspirate from an abscess, a throat swabbing, or urine specimen) and sends it to the microbiology laboratory for culture.
MYCIN's Task Domain--Antimicrobial Selection 15

There the technicians first attempt to grow organisms from the sample on
an appropriate nutritional medium. Early evidence of growth may allow
them to report the morphological and staining characteristics of the or-
ganism. However, complete testing of the organism to determine a definite
identity usually requires 24-48 hours or more.
The problem with this identification process is that the patient may be
so ill at the time when the culture is first obtained that the physician cannot
wait two days before beginning antimicrobial therapy. Early data regarding
the organism's staining characteristics, morphology, growth conformation,
and ability to grow with or without oxygen may therefore become crucially
important for narrowing down the range of possible identities. Further-
more, historical information about the patient and details regarding his or
her clinical status may provide additional useful clues as to the organism's
identity.

What Are the Potentially Useful Drugs?

Even once the identity of an organism is known with certainty, its range
of antimicrobial sensitivities may be unknown. For example, although a
Pseudomonas is usually sensitive to gentamicin, an increasing number of
gentamicin-resistant Pseudomonae are being isolated. For this reason the
microbiology technicians will often run in vitro sensitivity tests on an or-
ganism they are growing, exposing the bacterium to several commonly
used antimicrobial agents. This sensitivity information is reported to the
physician so that he or she will know those drugs that are likely to be
effective in vivo (i.e., in the patient).
Sensitivity data do not become available until one or two days after
the culture is obtained, however. The physician must therefore often select
a drug on the basis of the list of possible identities plus the antimicrobial
agents that are statistically likely to be effective against each of the ident-
ities. These statistical data are available from many hospital laboratories
(e.g., 82% of E. coli isolated at Stanford Hospital are sensitive in vitro to
gentamicin), although, in practice, physicians seldom use the probabilistic
information except in a rather intuitive sense (e.g., "Most of the E. coli
infections I have treated recently have responded to gentamicin.").

Which Drug Is Best for This Patient?

Once a list of drugs that may be useful has been considered, the best
regimen is selected on the basis of a variety of factors. These include the
likelihood that the drug will be effective against the organism, as well as a
number of clinical considerations. For example, it is important to know
whether or not the patient has any drug allergies and whether or not the
drug is contraindicated because of age, sex, or kidney status. If the patient
16 The Context of the MYCIN Experiments

has meningitis or brain involvement, whether or not the drug crosses the
blood-brain barrier is an important question. Since some drugs can be
given only orally, intravenously (IV), or intramuscularly (IM), the desired
route of administration may become an important consideration. The se-
verity of the patient's disease may also be important, particularly for those
drugs whose use is restricted on ecological grounds or which are particu-
larly likely to cause toxic complications. Furthermore, as the patient's clin-
ical status varies over time and more definitive information becomes avail-
able from the microbiology laboratory, it may be wise to change the drug
of choice or to modify the recommended dosage.

1.2.2 Evidence That Assistance Is Needed

The "antimicrobial revolution" began with the introduction of the sulfon-


amides in the 1930s and penicillin in 1943. The beneficial effects that these
and subsequent drugs have had on humanity cannot be overstated. How-
ever, as early as the 1950s it became clear that antibiotics were being mis-
used. A study of office practice involving 87 general practitioners (Peterson
et al., 1956) revealed that antibiotics were given indiscriminately to all pa-
tients with upper respiratory infections by 67% of the physicians, while
only 33% ever tried to separate viral from bacterial etiologies. Despite
attempts to educate physicians regarding this kind of inappropriate ther-
apy, similar data have continued to be reported (Kunin, 1973).
At the time we began work on MYCIN, antibiotic misuse was receiving
wide attention (Scheckler and Bennett, 1970; Roberts and Visconti, 1972;
Kunin, 1973; Simmons and Stolley, 1974; Carden, 1974). The studies
showed that very few physicians go through the methodical decision pro-
cess that was described above. In the outpatient environment antibiotics
are often prescribed without the physicians having identified or even cul-
tured the offending organism (Kunin, 1973). In 1972 the FDA certified
enough (2,400,000 kg) of the commonly used antibiotics to treat two ill-
nesses of average duration in every man, woman, and child in the country.
Yet it has been estimated that the average person has an illness requiring
antibiotic treatment no more often than once every five to ten years (Kunin,
1973). Part of the reason for such overprescribing is the patient's demand
for some kind of prescription with every office visit (Muller, 1972). It is
difficult for many physicians to resist such demands; thus improved public
education is one step toward lessening the problem.
However, antibiotic use is widespread among hospitalized patients as
well. Studies have shown that, on any given day, one-third of the patients
in a general hospital are receiving at least one systemic antimicrobial agent
(Roberts and Visconti, 1972; Scheckler and Bennett, 1970; Resztak and
Williams, 1972). The monetary cost to both patients and hospitals is enor-
mous (Reimann and D'Ambola, 1966; Kunin, 1973). Simmons and Stolley
(1974) have summarized the issues as follows:

1. Has the wide use of antibiotics led to the emergence of new resistant
bacterial strains?
2. Has the ecology of "natural" or "hospital" bacterial flora been shifted
because of antibiotic use?
3. Have nosocomial (i.e., hospital-acquired) infections changed in inci-
dence or severity due to antibiotic use?
4. What are the trends of antibiotic use?
5. Are antibiotics properly used in practice?
   Is there evidence that prophylactic use of antibiotics is harmful, and
   how common is it?
   Are antibiotics often prescribed without prior bacterial culture?
   When cultures are taken, is the appropriate antibiotic usually pre-
   scribed and correctly used?
6. Is the increasingly more frequent use of antibiotics presenting the med-
ical community and the public with a new set of hazards that should be
approached by some new administrative or educational measures?

Having stated the issues, these authors proceed to cite evidence that in-
dicates that each of these questions has frightening answers--that the ef-
fects of antibiotic misuse are so far-reaching that the consequences may
often be worse than the disease (real or imagined) being treated!
Our principal concern has been with the fifth question: are physicians
rational in their prescribing habits and, if not, why not? Roberts and Vis-
conti examined these issues in 1,035 patients consecutively admitted to a
500-bed community hospital (Roberts and Visconti, 1972). Of 340 patients
receiving systemic antimicrobials, only 35% were treated for infection. The
rest received either prophylactic therapy (55%) or treatment for symptoms
without verified infection (10%). A panel of expert physicians and phar-
macists evaluated these therapeutic decisions, and only 13% were judged
to be rational, while 66% were assessed as clearly irrational. The remainder
were said to be questionable.
Of particular interest were the reasons why therapy was judged to be
irrational in those patients for whom some kind of antimicrobial therapy
was warranted. This group consisted of 112 patients, or 50.2% of the 223
patients who were treated irrationally. It is instructive to list the reasons
that were cited, along with the percentages indicating how many of the
112 patients were involved:

Antimicrobial contraindicated in patient                          7.1%
Patient allergic                                                  2.7%
Inappropriate sequence of antimicrobials                         26.8%
Inappropriate combination of antimicrobials                      24.1%
Inappropriate antimicrobial used to treat condition              62.5%
Inappropriate dose                                               18.7%
Inappropriate duration of therapy                                 9.8%
Inappropriate route                                               3.6%
Culture and sensitivity needed                                   17.0%
Culture and sensitivity indicate wrong antibiotic being used     16.1%

The percentages add up to more than 100% because a given therapy may
have been judged inappropriate for more than one reason. Thus 62.5%
of the 112 patients who required antimicrobial therapy but were treated
irrationally were given a drug that was inappropriate for their clinical con-
dition. This observation reflects the need for improved therapy selection
for patients requiring therapy--precisely the decision task that MYCIN
was designed to assist.
Once a need for improved continuing medical education in antimi-
crobial selection was recognized, there were several valid ways to respond.
One was to offer appropriate post-graduate courses for physicians. An-
other was to introduce surveillance systems for the monitoring and ap-
proval of antibiotic prescriptions within hospitals (Edwards, 1968; Kunin,
1973). In addition, physicians were encouraged to seek consultations with
infectious disease experts when they were uncertain how best to proceed
with the treatment of a bacterial infection. Finally, we concluded that an
automated consultation system that could substitute for infectious disease
experts when they are unavailable or inaccessible could provide a valuable
partial solution to the therapy selection problem. MYCIN was conceived
and developed in an attempt to fill that need.

1.3 Organization of the Book

This volume is organized into twelve parts of two to four chapters, each
highlighting a fundamental theme in the development and evolution of
MYCIN. This introductory part closes with a classic review paper that
outlines the production rule methodology.
The design and implementation of MYCIN are discussed in Part Two.
Shortliffe's thesis was the beginning, but the original system he developed
was modified as required.
In Part Three we focus on the problems of building a knowledge base
and on knowledge acquisition in general. TEIRESIAS, the program result-
ing from Randy Davis's dissertation research, is described.
In Part Four we address the problems of reasoning under uncertainty.
The certainty factor model, one answer to the question of how to propagate
uncertainty in an inference mechanism, forms the basis of this part.
Part Five discusses the generality of the MYCIN formalism. The EMY-
CIN system, written largely by William van Melle as part of his dissertation
Organization of the Book 19

work, is a strongly positive answer to the question of whether MYCIN could


be generalized.
Work on explanation is reviewed in Part Six. Explanation was a major
design requirement from the start, and many persons contributed to MY-
CIN's explanation capabilities.
In Part Seven we discuss some of the experimentation we were doing
with alternative representations. Jan Aikins's thesis work on CENTAUR
examined the advantages of combining frames and production rules. Larry
Fagan's work on VM examined the augmentations to a production rule
system that are needed to reason effectively with data monitored over time.
As an outgrowth of the explanation work, we came to believe that
MYCIN had some pedagogical value to students trying to learn about
infectious disease diagnosis and therapy. William Clancey took this idea
one step further in his research on the GUIDON system, described in Part
Eight. GUIDON is an intelligent tutor that we initially believed could tutor
students about the contents of any knowledge base for an EMYCIN system.
There is now strong evidence that this hypothesis was false because more
knowledge is needed for tutoring than for advising.
In Part Nine we discuss the concept of meta-level knowledge, some of
which we found to be necessary for intelligent tutoring. We first examined
rules of strategy and control, called meta-rules, in the context of the TEI-
RESIAS program. One working hypothesis was that meta-rules could be
encoded as production rules similar to those at the object level (medical
rules) and that the same inference and explanation routines could work
with them as well.
From the start of the project, we had been concerned about perfor-
mance evaluation, as described in Part Ten. We undertook three different
evaluation experiments, each simpler and more realistic but somewhat
more limited than the last.
Another primary design consideration was human engineering, the
subject of Part Eleven. We knew that a useful system had to be well enough
engineered to make people want to use it; high performance alone was
not sufficient. The chapters in this part discuss experiments with both
natural language interfaces and customized hardware and system archi-
tectures.
Finally, in Part Twelve, we attempt to summarize the lessons about
rule-based expert systems that we have learned in nearly a decade of re-
search on the programs named in Figure 1-3. We believe that AI is largely
an experimental science in which ideas are tested in working programs.
Although there are many experiments we neglected to perform, we believe
the descriptions of several that we did undertake will allow others to build
on our experience and to compare their results with ours.
2
The Origin of Rule-Based
Systems in AI

Randall Davis and Jonathan J. King

Since production systems (PSs) were first proposed by Post (1943) as a
general computational mechanism, the methodology has seen a great deal
of development and has been applied to a diverse collection of problems.
Despite the wide scope of goals and perspectives demonstrated by the
various systems, there appear to be many recurrent themes. We present
an analysis and overview of those themes, as well as a conceptual frame-
work by which many of the seemingly disparate efforts can be viewed, both
in relation to each other and to other methodologies. Accordingly, we use
the term production system in a broad sense and show how most systems that
have used the term can be fit into the framework. The comparison to other
methodologies is intended to provide a view of PS characteristics in a
broader context, with primary reference to procedurally based techniques,
but also with reference to more recent developments in programming and
the organization of data and knowledge bases.
This chapter begins by offering a review of the essential structure and
function of a PS, presenting a picture of a "pure" PS to provide a basis for
subsequent elaborations. Current views of PSs fall into two distinct classes,
and we shall demonstrate that this dichotomy may explain much of the
existing variation in goals and methods. This is followed by some specu-
lations on the nature of appropriate and inappropriate problem domains
for PSs, i.e., what is it about a problem that makes the PS methodology
appropriate, and how do these factors arise out of the system's basic struc-
ture and function? Next, we review characteristics common to all systems,
explaining how they contribute to the basic character and noting their

This chapter is based on an article taken with permission from Machine Intelligence 8: Machine
Representations of Knowledge, edited by E. W. Elcock and D. Michie, published in 1977 by Ellis
Horwood Ltd., Chichester, England.
"Pure" Production Systems 21

interrelationships. Finally, we present a taxonomy for PSs, selecting four


dimensions of characterization and indicating the range of possibilities
suggested by recent efforts.
Two points of methodology should be noted. First, we make frequent
reference to what is "typically" found, and what is "in the spirit of things."
Since there is really no one formal design for PSs and recent implemen-
tations have explored variations on virtually every aspect, their use becomes
more an issue of a programming style than of anything else. It is difficult
to exclude designs or methods on formal grounds, and we refer instead
to an informal but well-established style of approach. A second, related
point is important to keep in mind as we compare the capabilities of PSs
with those of other approaches. Since it is possible to imagine coding any
given Turing machine in either procedural or PS terms [see Anderson,
(1976) for a formal proof of the latter], in the formal sense their compu-
tational power is equivalent. This suggests that, given sufficient effort, they
are ultimately capable of solving the same problems. The issues we wish
to examine are not, however, questions of absolute computational power
but of the impact of a particular methodology on program structure, as
well as of the relative ease or difficulty with which certain capabilities can
be achieved.

2.1 "Pure" Production Systems

A production system may be viewed as consisting of three basic compo-


nents: a set of rules, a data base, and an interpreter for the rules. In the
simplest design a rule is an ordered pair of symbol strings, with a left-hand
side and a right-hand side (LHS and RHS). The rule set has a predeter-
mined, total ordering, and the data base is simply a collection of symbols.
The interpreter in this simple design operates by scanning the LHS of
each rule until one is found that can be successfully matched against the
data base. At that point the symbols matched in the data base are replaced
with those found in the RHS of the rule and scanning either continues
with the next rule or begins again with the first. A rule can also be viewed
as a simple conditional statement, and the invocation of rules as a sequence
of actions chained by modus ponens.
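The select-and-replace loop of this simplest design can be captured in a short sketch. The following is an illustration only (in Python, our choice; none of the systems discussed here were written this way), with rules as ordered (LHS, RHS) string pairs, literal matching, and literal replacement:

```python
# A minimal sketch of the "pure" production system described above:
# rules are (LHS, RHS) pairs of symbol strings with a fixed total ordering,
# and the data base is simply a string of symbols.

def run(rules, data, max_cycles=100):
    """Repeatedly fire the first rule whose LHS matches the data base."""
    for _ in range(max_cycles):
        for lhs, rhs in rules:                    # scan rules in the fixed order
            if lhs in data:                       # match LHS against the data base
                data = data.replace(lhs, rhs, 1)  # replace matched symbols with RHS
                break                             # begin scanning again with rule 1
        else:
            break                                 # no rule matched: halt
    return data

# Each occurrence of "ab" is rewritten to "b" until no rule applies.
print(run([("ab", "b")], "aabab"))  # -> "bb"
```

The single rule here rewrites "aabab" to "abab", then "bab", then "bb", at which point no LHS matches and the interpreter halts.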

2.1.1 Rules

More generally, one side of a rule is evaluated with reference to the data
base, and if this succeeds (i.e., evaluates to TRUE in some sense), the action
specified by the other side is performed. Note that evaluate is typically taken
22 The Origin of Rule-Based Systems in AI

to mean a passive operation of "perception," or "an operation involving


only matching and detection" (Newell and Simon, 1972), while the action
is generally one or more conceptually primitive operations (although more
complex constructs are also being examined; see Section 2.4.9). As noted,
the simplest evaluation is a matching of literals, and the simplest action, a
replacement.
Note that we do not specify which side is to be matched, since either
is possible. For example, given a grammar written in production rule
form,¹

S → ABA
A → A1
A → 1
B → B0
B → 0

matching the LHS on a data base that consists of the start symbol S gives
a generator for strings in the language. Matching on the RHS of the same
set of rules gives a recognizer for the language. We can also vary the
methodology slightly to obtain a top-down recognizer by interpreting ele-
ments of the LHS as goals to be obtained by the successful matching of
elements from the RHS. In this case the rules "unwind." Thus we can use
the same set of rules in several ways. Note, however, that in doing so we
obtain quite different systems, with characteristically different control
structures and behavior.
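The generator reading of the grammar above can be sketched as follows (an illustration, not code from the chapter; the helper names are ours). Repeatedly matching LHSs starting from S derives a terminal string; since A generates 1...1 and B generates 0...0, S → ABA generates exactly the strings of the form 1+0+1+:

```python
import random
import re

# The grammar above as (LHS, RHS) production rules.
RULES = [("S", "ABA"), ("A", "A1"), ("A", "1"), ("B", "B0"), ("B", "0")]

def generate(start="S", rng=random):
    """Match LHSs against the data base, starting from S: a string generator."""
    s = start
    while True:
        applicable = [(l, r) for l, r in RULES if l in s]
        if not applicable:            # only terminal symbols remain
            return s
        lhs, rhs = rng.choice(applicable)   # nondeterministic rule selection
        s = s.replace(lhs, rhs, 1)

random.seed(0)
print(generate())                                  # some string of the form 1+0+1+
print(bool(re.fullmatch(r"1+0+1+", generate())))   # True
```

Running the same rules "backward" (reducing RHS occurrences toward S) would give the recognizer reading, though, as the text notes, that yields a system with quite different control behavior.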
The organization and accessing of the rule set is also an important
issue. The simplest scheme is the fixed, total ordering already mentioned,
but elaborations quickly grow more complex. The term conflict resolution
has been used to describe the process of selecting a rule. These issues of
rule evaluation and organization are explored in more detail below.

2.1.2 Data Base

In the simplest production system the data base is simply a collection of


symbols intended to reflect the state of the world, but the interpretation
of those symbols depends in large part on the nature of the application.
For those systems intended to explore symbol-processing aspects of human
cognition, the data base is interpreted as modeling the contents of some
memory mechanism (typically short-term memory, STM), with each symbol
representing some "chunk" of knowledge; hence its total length (typically
around seven elements) and organization (linear, hierarchical, etc.) are

¹One class of production systems we will not address at any length is that of grammars for
formal languages. While the intellectual roots are similar (Floyd, 1961; Evans, 1964), their
use has evolved a distinctly different flavor. In particular, their nondeterminism is an impor-
tant factor that provides a different perspective on control and renders the question of rule
selection a moot point.

important theoretical issues. Typical contents of STM for psychological
models are those of PSG (Newell, 1973), where STM might contain purely
content-free symbols such as:
QQ
(EE FF)
TT

or of VIS (Moran, 1973a), where STM contains symbols representing di-
rections on a visualized map:

(NEW C-1 CORNER WEST L-1 NORTH L-2)
(L-2 LINE EAST P-2 P-1)
(HEAR NORTHEAST % END)

For systems intended to be knowledge-based experts, the data base


contains facts and assertions about the world, is typically of arbitrary size,
and has no a priori constraints on the complexity of organization. For ex-
ample, the MYCIN system uses a collection of quadruples, consisting of
an associative triple and a certainty factor (CF), which indicates (on a scale
from -1 to 1) how strongly the fact has been confirmed (CF > 0) or
disconfirmed (CF < 0):

(IDENTITY ORGANISM-1 E.COLI .8)
(SITE CULTURE-2 BLOOD 1.0)
(SENSITIVE ORGANISM-1 PENICILLIN -1.0)
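For illustration, such a quadruple store might be modeled as follows (a sketch in Python; the function and variable names are our assumptions, not MYCIN's actual LISP internals):

```python
# An illustrative sketch of the quadruple store: an associative triple
# (attribute, object, value) plus a certainty factor CF in [-1, 1].
# CF > 0 means the fact has been confirmed to that degree; CF < 0, disconfirmed.

FACTS = [
    ("IDENTITY",  "ORGANISM-1", "E.COLI",     0.8),
    ("SITE",      "CULTURE-2",  "BLOOD",      1.0),
    ("SENSITIVE", "ORGANISM-1", "PENICILLIN", -1.0),
]

def lookup(attribute, obj):
    """Return the (value, CF) pairs recorded for an attribute of an object."""
    return [(val, cf) for a, o, val, cf in FACTS if a == attribute and o == obj]

print(lookup("IDENTITY", "ORGANISM-1"))   # [('E.COLI', 0.8)]
print(lookup("SENSITIVE", "ORGANISM-1"))  # [('PENICILLIN', -1.0)]
```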

As another example, in the DENDRAL system (Feigenbaum et al., 1971;


Lindsay et al., 1980) the data base contains complex graph structures that
represent molecules and molecular fragments.
A third style of organization for the data base is the "token stream"
approach used, for example, in LISP70 (Tesler et al., 1973). Here the data
base is a linear stream of tokens, accessible only in sequence. Each pro-
duction in turn is matched against the beginning of the stream (i.e., if the
first character of a production and the first character of the stream differ,
the whole match fails), and if the rule is invoked, it may act to add, delete,
or modify characters in the matched segment. The anchoring of the match
at the first token offers the possibility of great efficiency in rule selection
since the productions can be "compiled" into a decision tree that keys off
sequential tokens from the stream. A very simple example is shown in
Figure 2-1.
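The anchored-match scheme can be sketched as follows, using the productions of Figure 2-1 (a toy illustration of the idea, not LISP70 itself; the names are ours):

```python
# A toy sketch of "compiling" front-anchored productions into a decision
# tree that keys off sequential tokens from the stream.

PRODUCTIONS = [("ABC", "XY"), ("ACF", "WZ"), ("BBA", "XZ"), ("ACD", "WY")]

def compile_tree(productions):
    """Build a nested-dict decision tree; a leaf holds the replacement string."""
    tree = {}
    for lhs, rhs in productions:
        node = tree
        for token in lhs[:-1]:
            node = node.setdefault(token, {})
        node[lhs[-1]] = rhs
    return tree

def rewrite_front(tree, stream):
    """Match at the start of the stream; fail on the first mismatched token."""
    node = tree
    for i, token in enumerate(stream):
        if not isinstance(node, dict) or token not in node:
            return stream                    # whole match fails; stream unchanged
        node = node[token]
        if isinstance(node, str):            # reached a leaf
            return node + stream[i + 1:]     # replace the matched segment
    return stream

TREE = compile_tree(PRODUCTIONS)
print(rewrite_front(TREE, "ACD111"))  # "WY111"  (matched ACD -> WY)
print(rewrite_front(TREE, "AXC111"))  # "AXC111" (mismatch at the 2nd token)
```

Because matching is anchored at the first token, a single walk down the tree decides among all four productions at once, which is the source of the efficiency noted above.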
Whatever the organization of the data base, one important character-
istic that should be noted is that it is the sole storage medium for all state
variables of the system. In particular, unlike procedurally oriented lan-
guages, PSs do not provide for separate storage of control state informa-
tion--there is no separate program counter, pushdown stack, etc.--and all
information to be recorded must go into the single data base. We refer to
this as unity of data and control store and examine some of its implications
below. This store is, moreover, universally accessible to every rule in the

production set          decision tree

ABC → XY                A
ACF → WZ                  B
BBA → XZ                    C → XY
ACD → WY                  C
                            F → WZ
                            D → WY
                        B
                          B
                            A → XZ

(The tree branches on the 1st, 2nd, and 3rd characters of each left-hand side.)

FIGURE 2-1 Production rule and decision tree representa-
tions of a simple system that replaces sequences of three sym-
bols in the data base with sequences of two others.

system, so that anything put there is potentially detectable by any rule. We


shall see that both of these points have significant consequences for the
use of the data base as a communication channel.

2.1.3 Interpreter

The interpreter is the source of much of the variation found among dif-
ferent systems, but it may be seen in the simplest terms as a select-execute
loop in which one rule applicable to the current state of the data base is
chosen and then executed. Its action results in a modified data base, and
the select phase begins again. Given that the selection is often a process of
choosing the first rule that matches the current data base, it is clear why
this cycle is often referred to as a recognize-act, or situation-action, loop. The
range of variations on this theme is explored in Section 2.5.3 on control
cycle architecture.
This alternation between selection and execution is an essential ele-
ment of PS architecture, which is responsible for one of its most funda-
mental characteristics. By choosing each new rule for execution on the
basis of the total contents of the data base, we are effectively performing
a complete reevaluation of the control state of the system at every cycle.
This is distinctly different from procedurally oriented approaches in which
control flow is typically the decision of the process currently executing and
is commonly dependent on only a small fraction of the total number of
state variables. PSs are thus sensitive to any change in the entire environ-
ment, and potentially responsive to such changes within the scope of a
single execution cycle. The price of such responsiveness is, of course, the
computation time required for the reevaluation.
An example of one execution of the recognize-act loop for a greatly
Two Views of Production Systems 25

simplified version of Newell's PSG system will illustrate some of the fore-


going notions. The production system, called PS.ONE, is assumed for this
example to contain two productions, PD1 and PD2. We indicate this as
follows:
PS.ONE: (PD1 PD2)

PD1: (DD AND (EE) → BB)

PD2: (XX → CC DD)

PD1 says that if the symbol DD and some expression beginning with EE,
i.e., (EE . . .), is found in STM, then insert the symbol BB at the front of
STM. PD2 says that if the symbol XX is found in STM, then first insert
the symbol CC, then the symbol DD, at the front of STM.
The initial contents of STM are
STM: (QQ (EE FF) XX RR SS)

This STM is assumed to have a fixed maximum capacity of five elements.


As new elements are inserted at the front (left) of STM, therefore, other
elements will be lost (forgotten) off the right end. In addition, elements
accessed when matching the condition of a rule are refreshed (pulled to the
front of STM) rather than replaced.
The production system scans the productions in order: PD1, then PD2.
Only PD2 matches, so it is evoked. The contents of STM after this step are
STM: (DD CC XX QQ (EE FF))

PD1 will match during the next cycle to yield

STM: (BB DD (EE FF) CC XX)

completing two cycles of the system.
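The two cycles above can be traced with a small sketch (an illustrative reconstruction of these PSG conventions in Python, not Newell's code). Note that PD2 can fire only if XX is present in the initial STM, as the resulting state (DD CC XX QQ (EE FF)) confirms:

```python
# An illustrative reconstruction of the PS.ONE example: STM is a
# five-element list; insertions go at the front (left), pushing excess
# elements off the right end; matched elements are "refreshed" to the front.

CAPACITY = 5

def insert_front(stm, symbols):
    return (list(symbols) + stm)[:CAPACITY]

def refresh(stm, matched):
    """Pull the elements matched by a rule's condition to the front of STM."""
    return list(matched) + [e for e in stm if e not in matched]

def pd1(stm):
    # PD1 (DD AND (EE) -> BB): DD plus an expression (EE ...) inserts BB.
    ee = next((e for e in stm if isinstance(e, tuple) and e[:1] == ("EE",)), None)
    if "DD" in stm and ee is not None:
        return insert_front(refresh(stm, ["DD", ee]), ["BB"])
    return None

def pd2(stm):
    # PD2 (XX -> CC DD): XX inserts CC, then DD, at the front.
    if "XX" in stm:
        return insert_front(refresh(stm, ["XX"]), ["DD", "CC"])
    return None

def cycle(stm):
    for rule in (pd1, pd2):        # scan productions in order
        result = rule(stm)
        if result is not None:     # the first matching rule is evoked
            return result
    return stm

stm = ["QQ", ("EE", "FF"), "XX", "RR", "SS"]
stm = cycle(stm)   # PD2 fires
print(stm)         # ['DD', 'CC', 'XX', 'QQ', ('EE', 'FF')]
stm = cycle(stm)   # PD1 fires
print(stm)         # ['BB', 'DD', ('EE', 'FF'), 'CC', 'XX']
```

In the first cycle RR and SS are "forgotten" off the right end; in the second, QQ is lost the same way, reproducing both STM states shown above.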

2.2 Two Views of Production Systems

Prior work has suggested that there are two major views of PSs, charac-
terized on one hand by psychological modeling efforts (PSG, PAS II, VIS,
etc.) and on the other by performance-oriented, knowledge-based expert
systems (e.g., MYCIN, DENDRAL). These distinct efforts have arrived at
similar methodologies while pursuing differing goals.
The psychological modeling efforts are aimed at creating a program
that embodies a theory of human performance of simple tasks. From the
performance record of experimental human subjects, the modeler for-
mulates the minimally competent set of production rules that is able to
reproduce the behavior. Note that "behavior" here is meant to include all
aspects of human performance (mistakes, the effects of forgetting, etc.),

including all shortcomings or successes that may arise out of (and hence
may be clues to) the "architecture" of human cognitive systems.²
An example of this approach is the PSG system, from which we con-
structed the example above. This system has been used to test a number
of theories to explain the results of the Sternberg memory-scanning tasks
(Newell, 1973), with each set of productions representing a different theory
of how the human subject retains and recalls the information given to him
or her during the psychological task. Here the subject first memorizes a
small subset of a class of familiar symbols (e.g., digits) and then attempts
to respond to a symbol flashed on a screen by indicating whether or not it
was in the initial set. His or her response times are noted.
The task was first simulated with a simple production system that per-
formed correctly but did not account for timing variations (which were
due to list length and other factors). Refinements were then developed to
incorporate new hypotheses about how the symbols were brought into
memory, and eventually a good simulation was built around a small num-
ber of productions. Newell has reported (Newell, 1973) that use of a PS
methodology led in this case to the novel hypothesis that certain timing
effects are caused by a decoding process rather than by a search process.
The experiment also clearly illustrated the possible tradeoffs in speed and
accuracy between differing processing strategies. Thus the PS model was
an effective vehicle for the expression and evaluation of theories of be-
havior.
The performance-oriented expert systems, on the other hand, start
with productions as a representation of knowledge about a task or domain
and attempt to build a program that displays competent behavior in that
domain. These efforts are not concerned with similarities between the re-
sulting systems and human performance (except insofar as the latter may
provide a possible hint about ways to structure the domain or to approach
the problem or may act as a yardstick for success, since few AI programs
approach human levels of competence). They are intended simply to per-
form the task without errors of any sort, humanlike or otherwise. This
approach is characterized by the DENDRAL system, in which much of the
development has involved embedding a chemist's knowledge about mass
spectrometry into rules usable by the program, without attempting to
model the chemist's thinking. The program's knowledge is extended by
adding rules that apply to new classes of chemical compounds. Similarly,
much of the work on the MYCIN system has involved crystallizing informal
knowledge of clinical medicine in a set of production rules.
Despite the difference in emphasis, researchers in both fields have

²For example, the critical evaluation of EPAM must ultimately depend not on the interest it
may have as a learning machine, but on its ability to explain and predict phenomena of verbal
learning (Feigenbaum, 1963). These phenomena include stimulus and response generaliza-
tion, oscillation, retroactive inhibition, and forgetting--all of which are "mistakes" for a system
intended for high performance but are important in a system meant to model human learning
behavior.

been drawn to PSs as a methodology. For the psychological modelers,


production rules offer a clear, formal, and powerful way of expressing
basic symbol-processing acts that form the primitives of information-pro-
cessing psychology (cf. Newell and Simon, 1972). For the designer of
knowledge-based systems, production rules offer a representation of
knowledge that can be accessed and modified with relative ease, making it
quite useful for systems designed for incremental approaches to compe-
tence. For example, much of the MYCIN system's capability for explaining
its actions is based on the representation of knowledge as individual pro-
duction rules. This makes the knowledge far more accessible to the pro-
gram itself than it might be if it were embodied in the form of ALGOL-
like procedures. As in DENDRAL, the modification and upgrading of the
system occur via incremental modification of, or addition to, the rule set.
Note that we are suggesting that it is possible to view a great deal of
the work on PSs in terms of a unifying formalism. The intent is to offer
a conceptual structure that can help organize what may appear to be a
disparate collection of efforts. The presence of such a formalism should
not, however, obscure the significant differences that arise from the various
perspectives. For example, the decision to use RHS-driven rules in a goal-
directed fashion implies a control structure that is simple and direct but
relatively intlexible. This offers a very different programming tool than
the LHS-driven systems do. The latter are capable of much more complex
control structures, giving them capabilities muchcloser to those of a com-
plete progrannning language. Recent efforts have begun to explore the
issues of more complex, higher-level control within the PS methodology
(see Section 2.4.9).
Production systems are seen by some as more than a convenient par-
adigm for approaching psychological modeling--rather as a methodology
whose power arises out of its close similarity to fundamental mechanisms
of human cognition. Newell and Simon (1972, pp. 803-804, 806) have
argued that human problem-solving behavior can be modeled easily and
successfully by a production system because it in fact is being generated
by one:

We confess to a strong premonition that the actual organization of human
programs closely resembles the production system organization .... We
cannot yet prove the correctness of this judgment, and we suspect that the
ultimate verification may depend on this organization's proving relatively
satisfactory in many different small ways, no one of them decisive.

In summary, we do not think a conclusive case can be made yet for
production systems as the appropriate form of [human] program organization.
Many of the arguments . . . raise difficulties. Nevertheless, our judgment
stands that we should choose production systems as the preferred language
for expressing programs and program organization.

Observations such as this have led to speculation that the interest in pro-
28 The Origin of Rule-Based Systems in AI

duction systems on the part of those building high-performance
knowledge-based systems is more than a coincidence. Some suggest that this is
occurring because current research is (re)discovering what has been
learned by naturally intelligent systems through evolution: that structuring
knowledge in a production system format is an effective approach to
the organization, retrieval, and use of very large amounts of knowledge.
The success of some rule-based AI systems does lend weight to this
argument, and the PS methodology is clearly powerful. But whether or
not this is a result of its equivalence to human cognitive processes and
whether or not this implies that artificially intelligent systems ought to be
similarly structured are still open questions, in our opinion.

2.3 Appropriate and Inappropriate Domains

Program designers have found that PSs easily model problems in some
domains but are awkward for others. Let us briefly investigate why this
may be so, and relate it to the basic structure and function of a PS.
We can imagine two very different classes of problems--the first is best
viewed and understood as consisting of many independent states, while
the second seems best understood via a concise, unified theory, perhaps
embodied in a single law. Examples of the former include some views of
perceptual psychology or clinical medicine, in which there are many states
relative to the number of actions (this may be due either to our lack of a
cohesive theory or to the basic complexity of the system being modeled).
Examples of the latter include well-established areas of physics and math-
ematics, in which a few basic tenets serve to embody much of the required
knowledge, and in which the discovery of unifying principles has empha-
sized the similarities in seemingly different states. This first distinction
appears to be one important factor in distinguishing appropriate from
inappropriate domains.
A second distinction concerns the complexity of control flow. At two
extremes, we can imagine two processes, one of which is a set of indepen-
dent actions and the other of which is a complex collection of multiple,
parallel processes involving several dependent subprocesses.
A third distinction concerns the extent to which the knowledge to be
embedded in a system can be separated from the manner in which it is to
be used [also known as the controversy between declarative and procedural
representations; see Winograd (1975) for an extensive discussion]. As one
example, we can imagine simply stating facts, perhaps in a language like
predicate calculus, without assuming how those facts will be employed.
Alternatively, we could write procedural descriptions of how to accomplish

a stated goal. Here the use of the knowledge is for the most part predetermined
during the process of embodying it in this representation.
In all three of these distinctions, a PS is well-suited to the first descrip-
tion and ill-suited to the latter. The existence of multiple, nontrivially dif-
ferent, independent states is an indication of the feasibility of writing mul-
tiple, nontrivial, modular rules. A process composed of a set of
independent actions requires only limited communication between the ac-
tions, and, as we shall see, this is an important characteristic of PSs. The
ability to state what knowledge ought to be in the system without also
describing its use greatly improves the ease with which a PS can be written
(see Section 2.4.9).
For the second class of problems (unified theory, complex control flow,
predetermined use for the knowledge), the economy of the relevant basic
theory makes for either trivial rules or multiple, almost redundant, rules.
In addition, a complex looping and branching process requires explicit
communication between actions, in which one action explicitly invokes the
next, while interacting subgoals require a similarly advanced communica-
tion process to avoid conflict. Such communication is not easily supplied
in a PS-based system. The same difficulty also makes it hard to specify in
advance exactly how a given fact should be used.
It seems also to be the nature of production systems to focus upon the
variations within a domain rather than upon the common threads that link
different facts or operations. Thus, for example, the process of addition
is naturally expressed via productions as n² rewrite operations involving
two symbols (the digits being added). The fact that addition is commuta-
tive, or rather that there is a property of "commutativity" shared by all
operations that we consider to be addition, is a rather awkward one to
express in production system terms. This same characteristic may, con-
versely, be viewed as a capability for focusing on and handling significant
amounts of detail. Thus, where the emphasis of a task is on recognition of
large numbers of distinct states, PSs provide a significant advantage. In a
procedurally oriented approach, it is both difficult to organize and trou-
blesome to update the repeated checking of large numbers of state vari-
ables and the corresponding transfers of control. The task is far easier in
PS terms, where each rule can be viewed as a "demon" awaiting the
occurrence of a specific state.3
The potential sensitivity and responsiveness of PSs, which arise from
their continual reevaluation of the control state, has also been referred to
as the openness of rule-based systems. It is characterized by the principle
that "any rule can fire at any time," which emphasizes the fact that at any
point in the computation any rule could be the next to be selected, de-
pending only on the state of the data base at the end of the current cycle.
Compare this to the normal situation in a procedurally oriented language,

3In the case of one PS (DENDRAL) the initial, procedural approach proved sufficiently
inflexible that the entire system was rewritten in production rule terms (Lindsay et al., 1980).

where such a principle is manifestly untrue: it is simply not typically the


case that, depending on the contents of that data base, any procedure in
the entire program could potentially be the next to be invoked.
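This openness principle can be made concrete with a minimal interpreter sketch. The code below is our own illustration in a modern language (the names `run`, `rules`, and `wm` are ours; it is not drawn from PSG, PAS, or the other systems discussed): on every cycle the entire rule set is rescanned against the data base, so which rule fires next depends only on the contents of the data base at that moment, never on which rule fired last.

```python
# Minimal recognize-act loop: a hypothetical sketch of a pure PS interpreter.
# Rules are (name, condition, action) triples over a working-memory set of
# facts. Every cycle reconsiders ALL rules; the first whose condition matches
# the current data base fires. No rule ever calls another directly.

def run(rules, wm, max_cycles=100):
    for _ in range(max_cycles):
        for name, cond, act in rules:          # scan in fixed priority order
            if cond(wm):                       # match phase: "read" the data base
                act(wm)                        # act phase: leave traces in the data base
                break                          # one firing per cycle, then rescan
        else:
            return wm                          # quiescence: no rule can fire
    return wm

# Two "demon" rules awaiting specific states; neither references the other.
rules = [
    ("report", lambda wm: "wet" in wm and "sleet" not in wm,
               lambda wm: wm.add("sleet")),
    ("sense",  lambda wm: "raining" in wm and "wet" not in wm,
               lambda wm: wm.update({"wet", "cold"})),
]

wm = run(rules, {"raining"})
print(wm)   # traces left by "sense" later trigger "report"
```

Note that the rule that fires second is listed first: since no rule names another, the chaining is carried entirely by the traces left in the data base, and the list order supplies nothing but a tie-breaking discipline.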
We do not mean to imply that both approaches couldn't perform in
both domains, but that there are tasks for which one of them would prove
awkward and the resulting system unenlightening. Such tasks are far more
elegantly accomplished in only one of the two methodologies. The main
point is that we can, to some extent, formalize our intuitive notion of which
approach seems more appropriate by considering two essential character-
istics of any PS: its set of multiple, independent rules and its limited, in-
direct channel of interaction via the data base.

2.4 Production System Characteristics

Despite the range of variation in methodologies, there appear to be many


characteristics common to almost all PSs. It is the presence of these and
their interactions that contribute to the "nature" of a PS, its capabilities,
deficiencies, and characteristic behavior.
The network of Figure 2-2 is a summary of features and relationships.
Each box represents some feature, capability, or parameter of interest, with
arrows labeled with +s and -s suggesting the interactions between them.
This rough scale of facilitation and inhibition is naturally very crude, but
does indicate the interactions as we see them. Figure 2-2 contains at least
three conceptually distinct sorts of factors: (a) those fundamental characteristics
of the basic PS scheme (e.g., indirect, limited channel, constrained
format); (b) secondary effects (e.g., automated modifiability of behavior);
and (c) performance parameters of implementation (e.g., visibility of behavior
flow, extensibility), which are helpful in characterizing PS strengths
and weaknesses.

2.4.1 Indirect, Limited Channel of Interaction

Perhaps the most fundamental and significant characteristic of PSs is their


restriction on the interactions between rules. In the simplest model, a pure
PS, we have a completely ordered set of rules, with no interaction channel
other than the data base. The total effect of any rule is determined by its
modifications to the data base, and hence subsequent rules must "read"
there any traces the system may leave behind. Winograd (1975, p. 194)
characterizes this feature in discussing global modularity in programming:

[Figure 2-2 appears here: a network of boxes, including INTERACTION, CONSISTENCY
CHECKING, EXTENSIBILITY, SELECTION ALGORITHM, BEHAVIOR, and EXPLANATIONS
OF PRIMITIVE ACTIONS, connected by arrows labeled + and -.]
FIGURE 2-2 Basic features and relationships of a production
system. Links labeled with a + indicate a facilitating relationship,
while those labeled with a - indicate an inhibiting relationship.

We can view production systems as a programming language in which


all interaction is forced through a very narrow channel .... The temporal
interaction [of individual productions] is completely determined by the data
in this STM, and a uniform ordering regime for deciding which productions
will be activated in cases where more than one might apply .... Of course it
is possible to use the STM to pass arbitrarily complex messages which embody
any degree of interaction we want. But the spirit of the venture is very much
opposed to this, and the formalism is interesting to the degree that complex
processes can be described without resort to such kludgery, maintaining the
clear modularity between the pieces of knowledge and the global process
which uses them.

While this characterization is clearly true for a pure PS, with its limitations
on the size of STM, we can generalize on it slightly to deal with a broader
class of systems. First, in the more general case, the channel is not so much
32 TheOriginof Rule-BasedSystemsin AI

narrow as indirect and unique. Second, the kludgery4 arises not from
arbitrarily complex messages but from specially crafted messages, which force
highly specific, carefully chosen interactions.
With reference to the first point, one of the most fundamental char-
acteristics of the pure PS organization is that rules must interact indirectly
through a single channel. Indirection implies that all interaction must oc-
cur by the effect of modifications written in the data base; uniqueness of
the channel implies that these modifications are accessible to every one of
the rules. Thus, to produce a system with a specified behavior, one must
not think in the usual terms of having one section of code call another
explicitly, but rather use an indirect approach in which each piece of code
(i.e., each rule) leaves behind the proper traces to trigger the next relevant
piece. The uniform access to the channel, along with the openness of PSs,
implies that those traces must be constructed in the light of a potential
response from any rule in the system.
With reference to Winograd's second point, in many systems the action
of a single rule may, quite legitimately, result in the addition of very com-
plex structures to the data base (e.g., DENDRAL; see Section 2.5). Yet
another rule in the same system may deposit just one carefully selected
symbol, chosen solely because it will serve as an unmistakable symbol for
precisely one other (carefully preselected) rule. Choosing the symbol care-
fully provides a way of sending what becomes a private message through
a public channel; the continual reevaluation of the control state assures
that the message can take immediate effect. The result is that one rule has
effectively called another, procedure style, and this is the variety of kludg-
ery that is contrary to the style of knowledge organization typically asso-
ciated with a PS. It is the premeditated nature of such message passing
(typically in an attempt to "produce a system with specified behavior") that
is the primary violation of the "spirit" of PS methodology.
The primary effect of this indirect, limited interaction is the devel-
opment of a system that is strongly modular, since no rule is ever called
directly. The indirect, limited interaction is also, however, the most signif-
icant factor that makes the behavior of a PS more difficult to analyze. This
results because, even for very simple tasks, overall behavior of a PS may
not be at all evident from a simple review of its rules.
To illustrate many of these issues, consider the algorithm for addition
of positive, single-digit integers used by Waterman (1974) with his PAS
production system interpreter. First, the procedural version of the algo-
rithm, in which transfer of control is direct and simple:

add(m,n) ::=
A] count ← 0; nn ← n;
B] L1: if count = m then return(nn);

4Kludge is a term drawn from the vernacular of computer programmers. It refers to a "patch"
or "trick" in a program or system that deals with a potential problem, usually in an inelegant
or nongeneralized way. Thus kludgery refers to the use of kludges.

C] count ← successor(count);
D] nn ← successor(nn);
E] go(L1);

Compare this with the set of productions for the same task in Figure 2-3.
The S in Rules 2, 3, and 5 indicates the successor function. After initiali-
zation (Rules 1 and 2), the system loops around Rules 4 and 5 producing
the successor rules it needs (Rule 5) and then incrementing NN by 1 for
M iterations. In this loop, intermediate calculations (the results of successor
function computations) are saved via (PROD) in Rule 5, and the final
answer is saved by (PROD) in Rule 3. Thus, as shown in Figure 2-4, after
computing 4 + 2 the rule set will contain seven additional rules; it is
recording its intermediate and final results by writing new productions and
in the future will have these answers available in a single step. Note that
the set of productions therefore is memory (and in fact long-term memory,
or LTM, since productions are never lost from the set). The two are not
precisely analogous, since the procedural version does simple addition,
while the production set both adds and "learns." As noted by Waterman
(1974), the production rule version does not assume the existence of a
successor function. Instead Rule 5 writes new productions that give the
successor for specific integers. Rule 3 builds what amounts to an addition
table, writing a new production for each example that the system is given.
Placing these new rules at the front of the rule set (i.e., before Rule 1)
means that the addition table and successor function table will always be
consulted before a computation is attempted, and the answer obtained in
one step if possible. Without these extra steps, and with a successor func-
tion, the production rule set could be smaller and hence slightly less com-
plex.
Waterman also points out some direct correspondences between the
production rules in Figure 2-3 and the statements in the procedure above.
For example, Rules 1 and 2 accomplish the initialization of line A, Rule 3
corresponds to line B, and Rule 4 to lines C and D. There is no production
equivalent to the "goto" of line E because the production system execution
cycle takes care of that implicitly. On the other hand, note that in the
procedure there is no question whatsoever that the initialization step
nn *-- n is the second statement of "add" and that it is to be executed just
once, at the beginning of the procedure. In the productions, the same
action is predicated on an unintuitive condition of the STM(essentially it
says that if the value of N is known, but NNhas never been referenced or
incremented, then initialize NNto the value that N has at that time). This
degree of explicitness is necessary because the production system has no
notion that the initialization step has already been performed in the given
ordering of statements, so the system must check the conditions each time
it goes through a new cycle.
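The contrast can be simulated compactly in a modern language. The sketch below is ours, not Waterman's PAS code: it casts the count/nn loop as condition-action rules over a working memory, and, as just noted, the initialization rule must state its applicability condition explicitly (fire only if NN has never been set), since nothing like statement ordering guarantees it runs just once.

```python
# A sketch (ours, not Waterman's PAS code) of single-digit addition as a
# production system. Working memory is a dict; every rule's condition must
# say explicitly when it applies, including the "initialize nn only once"
# condition that the procedural version gets for free from statement order.

def ps_add(m, n):
    wm = {"m": m, "n": n, "count": 0}
    rules = [
        # initialize nn only if it has never been referenced
        ("init", lambda w: "nn" not in w,
                 lambda w: w.__setitem__("nn", w["n"])),
        # termination: count has reached m, so nn holds the answer
        ("done", lambda w: "nn" in w and w["count"] == w["m"],
                 lambda w: w.__setitem__("answer", w["nn"])),
        # lines C and D of the procedure: increment count and nn together
        ("step", lambda w: "nn" in w and w["count"] < w["m"],
                 lambda w: w.update(count=w["count"] + 1, nn=w["nn"] + 1)),
    ]
    while "answer" not in wm:
        fired = False
        for name, cond, act in rules:   # recognize-act cycle; no explicit goto
            if cond(wm):
                act(wm)
                fired = True
                break
        if not fired:                   # quiescence: no rule can fire
            break
    return wm.get("answer")

print(ps_add(4, 2))   # -> 6
```

There is no counterpart to line E's goto: the rescan of the rule set on every pass plays that role implicitly, just as the production system execution cycle does for Waterman's rules.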
Thus procedural languages are oriented toward the explicit handling
of control flow and stress the importance of its influence on the funda-
mental organization of the program (as, for example, in recent develop-
34 The Origin of Rule-BasedSystemsin AI

Production Rules:

Condition (LHS)                       Action (RHS)

1] (READY) (ORDER X1)                 (REP (READY) (COUNT X1))
                                      (ATTEND)
2] (N X1) -(NN) -(S NN)               (DEP (NN X1))
3] (COUNT X1) (M X1) (NN X2) (N X3)   (SAY X2 IS THE ANSWER)
                                      (COND (M X1) (N X3))
                                      (ACTION (STOP))
                                      (ACTION (SAY X2 IS THE ANSWER))
                                      (PROD)
                                      (STOP)
4] (COUNT) (NN)                       (REP (COUNT) (S COUNT))
                                      (REP (NN) (S NN))
5] (ORDER X1 X2)                      (REP (X1 X2) (X2))
                                      (COND (S X3 X1))
                                      (ACTION (REP (S X3 X1) (X3 X2)))
                                      (PROD)

Initial STM:

(READY) (ORDER 0 1 2 3 4 5 6 7 8 9)

Notation:

The Xi's in the condition are variables in the pattern match; all other symbols
are literals. An Xi appearing only in the action is also taken as a literal. Thus if
Rule 5 is matched with X1 = 4 and X2 = 5, as its second action it would deposit
(COND (S X3 4)) in STM. These variables are local to each rule; that is, their
previous bindings are disregarded.
All elements of the LHS must be matched for a match to succeed.
A hyphen indicates the AND NOT operation.
An expression enclosed in parentheses and starting with a literal [e.g., (COUNT)
in Rule 4] will match any expression in STM that starts with the same literal
[e.g., (COUNT 2)]. The expression (ORDER X1 X2) will match (ORDER 0 1 2
3 . . . 9) and bind X1 = 0 and X2 = 1.
REP stands for REPlace, so that, for example, the RHS of Rule 1 will replace
the expression (READY) in the data base with the expression (COUNT X1)
[where the variable X1 stands for the element matched by the X1 in (ORDER
X1)].
DEP stands for DEPosit symbols at front of STM.
ATTEND means wait for input from computer terminal. For this example, typing
(M 4) (N 2) will have the system add 4 and 2.
SAY means output to terminal.

FIGURE 2-3 A production system for the addition of two single-digit
integers [after Waterman (1974), simplified slightly].

(COND . . .) is shorthand for (DEP (COND . . .)).
(ACTION . . .) is shorthand for (DEP (ACTION . . .)).
PROD means gather all items in the STM of the form (COND . . .) and put them
together into an LHS, gather all items of the form (ACTION . . .) and put them
together into an RHS, and remove all these expressions from the STM. Form a
production from the resulting LHS and RHS, and add it to the front of the set
of productions (i.e., before Rule 1).

FIGURE 2-3 continued

ments in structured programming). PSs, on the other hand, emphasize


the statement of independent chunks of knowledge from a domain and
make control flow a secondary issue. Given the limited form of commu-
nication available in PSs, it is more difficult to express concepts that require
structures larger than a single rule. Thus, where the emphasis is on global
behavior of a system rather than on the expression of small chunks of
knowledge, PSs are, in general, less transparent than equivalent procedural
routines.

2.4.2 Constrained Format

While there are wide variations in the format permitted by various PSs, in
any given system the syntax is traditionally quite restrictive and generally
follows the conventions accepted for PSs.5 Most commonly this means,
first, that the side of the rule to be matched should be a simple predicate
built out of a Boolean combination of computationally primitive opera-
tions; these involve (as noted above) only matching and detection. Second,
it means the side of the rule to be executed should perform conceptually
simple operations on the data base. In many of the systems oriented toward
psychological modeling, the side to be matched consists of a set of literals
or simple patterns, with the understanding that the set is to be taken as a
conjunction, so that the predicate is an implicit one regarding the success
or failure of matching all of the elements. Similarly, the side to be executed
performs a simple symbol replacement or rearrangement.
Whatever the format, though, the conventions noted lead to clear re-
strictions for a pure production system. First, as a predicate, the matching
side of a rule should return only some indication of the success or failure
of the match.6 Second, as a simple expression, the matching operation is

5Note, however, that the tradition arises out of a commonly followed convention rather than
any essential characteristic of a PS.
6While binding individual variables or segments in the process of pattern matching is quite
often used, it would be considered inappropriate to have the matching process produce a
complex data structure intended for processing by another part of the system.

RULE     STATUS     STM AFTER RULE SUCCEEDS                NEW RULES/COMMENTS

CYCLE #1
                    (READY) (ORDER 0 1 2 3 4 5 6 7 8 9)    initial state
Rule 1   Succeeds   (COUNT 0) (ORDER 0 1 2 3 4 5 6 7 8 9)  awaits input (M 4) (N 2)
                    (N 2) (M 4) (COUNT 0)                  after input
                    (ORDER 0 1 2 3 4 5 6 7 8 9)
Rule 2   Succeeds   (NN 2) (N 2) (M 4) (COUNT 0)           X1 bound to 2
                    (ORDER 0 1 2 3 4 5 6 7 8 9)
Rule 3   Fails
Rule 4   Succeeds   (S NN 2) (N 2) (M 4) (S COUNT 0)
                    (ORDER 0 1 2 3 4 5 6 7 8 9)
Rule 5   Succeeds   (S NN 2) (N 2) (M 4) (S COUNT 0)       X1 bound to 0
                    (ORDER 1 2 3 4 5 6 7 8 9)              New Rule 6:
                                                           (S X3 0) → (REP (S X3 0) (X3 1))
CYCLE #2
Rule 6   Succeeds   (S NN 2) (N 2) (M 4) (COUNT 1)         X3 bound to the literal COUNT
                    (ORDER 1 2 3 4 5 6 7 8 9)
Rule 1   Fails
Rule 2   Fails
Rule 3   Fails
Rule 4   Fails
Rule 5   Succeeds   (S NN 2) (N 2) (M 4) (COUNT 1)         New Rule 7:
                    (ORDER 2 3 4 5 6 7 8 9)                (S X3 1) → (REP (S X3 1) (X3 2))
CYCLE #3
Rule 7   Fails
Rule 6   Fails
Rule 1   Fails
Rule 2   Fails
Rule 3   Fails
Rule 4   Fails
Rule 5   Succeeds   (S NN 2) (N 2) (M 4) (COUNT 1)         New Rule 8:
                    (ORDER 3 4 5 6 7 8 9)                  (S X3 2) → (REP (S X3 2) (X3 3))
CYCLE #4
Rule 8   Succeeds   (NN 3) (N 2) (M 4) (COUNT 1)           X3 bound to NN
                    (ORDER 3 4 5 6 7 8 9)
Rule 7   Fails
Rule 6   Fails
Rule 1   Fails
Rule 2   Fails
Rule 3   Fails
Rule 4   Succeeds   (S NN 3) (N 2) (M 4) (S COUNT 1)
                    (ORDER 3 4 5 6 7 8 9)
Rule 5   Succeeds   (S NN 3) (N 2) (M 4) (S COUNT 1)       New Rule 9:
                    (ORDER 4 5 6 7 8 9)                    (S X3 3) → (REP (S X3 3) (X3 4))
CYCLE #5
Rule 9   Succeeds   (NN 4) (N 2) (M 4) (S COUNT 1)
                    (ORDER 4 5 6 7 8 9)

etc.     <continued cycling>                               Rules 10 and 11 generated

Rule 3   Succeeds   (NN 6) (N 2) (M 4) (COUNT 4)           Bind X1 to 4, X2 to 6, X3 to 2;
                    (ORDER 6 7 8 9)                        Prints 6 IS THE ANSWER;
                                                           Rule 12 produced;
                                                           Terminates.

FIGURE 2-4 Trace of production system shown in Figure 2-3.
Adding 4 and 2.

precluded from using more complex control structures like iteration or


recursion within the expression itself (although such operations can be
constructed from multiple rules). Finally, as a matching and detection op-
eration, it must only "observe" the state of the data base and not change
it in the operation of testing it.
We can characterize a continuum of possibilities for the side of the
rule to be executed. There might be a single primitive action, a simple
collection of independent actions, a carefully ordered sequence of actions,
or even more complex control structures. We suggest that there are two
related forms of simplicity that are important here. First, each action to be
performed should be one that is a conceptual primitive for the domain.
In the DENDRAL system, for example, it is appropriate to use chemical
bond breaking as the primitive, rather than to describe the process at some
lower level. Second, the complexity of control flow for the execution of
these primitives should be limited--in a pure production system, for ex-
ample, we might be wary of a complex set of actions that is, in effect, a
small program of its own. Again, it should be noted that the system de-
signer may of course follow or disregard these restrictions.
These constraints on form make the dissection and "understanding"
of productions by other parts of the program a more straightforward task,
strongly enhancing the possibility of having the program itself read and/
or modify (rewrite) its own productions. For example, the MYCIN system
makes strong use of the concept of allowing one part of the system to read
the rules being executed by another part. The system does a partial eval-
uation of rule premises. Since a premise is a Boolean combination of pred-
icate functions such as
($AND (SAME CNTXT SITE BLOOD)       (the site of the culture is blood and
      (SAME CNTXT GRAM GRAMPOS)     the gram stain is gram positive and
      (DEFIS CNTXT AIR AEROBIC))    the aerobicity is definitely aerobic)

and since clauses that are unknown cause subproblems that may involve
long computations to be set up, it makes sense to check to see if, based on
what is currently known, the entire premise is sure to fail (e.g., if any clause
of a conjunction is known to be false). We cannot simply EVAL each clause,
since this will trigger a search if the value is still unknown.But if the clause
can be "unpacked" into its proper constituents, it is possible to determine
whether or not the value is known as yet, and if so, what it is. This is done
via a template associated with each predicate function. For example, the
template for SAME is
(SAME CNTXT PARM VALUE)

and it gives the generic type and order of arguments for the function
(much like a simplified procedure declaration). By using this as a guide to
unpack and extract the needed items, we can safely do a partial evaluation
of the rule premise. A similar technique is used to separate the known and

unknown clauses of a rule for the user's benefit when the system is ex-
plaining itself (see Chapter 18 for several examples).
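The template mechanism can be sketched in a few lines. The code below captures only the idea; it is not MYCIN's actual Lisp, and the `TEMPLATES` table and the names `unpack` and `premise_sure_to_fail` are our own illustrative inventions.

```python
# A sketch of template-guided partial evaluation: the idea behind MYCIN's
# rule preview, not its actual Lisp code. Each predicate function carries a
# template naming its argument roles, so another part of the program can
# "unpack" a clause and ask whether its value is already known, without
# EVALing it (which would set up a costly subproblem for unknown values).

TEMPLATES = {
    "SAME":  ("CNTXT", "PARM", "VALUE"),
    "DEFIS": ("CNTXT", "PARM", "VALUE"),
}

def unpack(clause):
    """Map a clause like ('SAME', 'CNTXT', 'SITE', 'BLOOD') to its roles."""
    pred, *args = clause
    return dict(zip(TEMPLATES[pred], args))

def premise_sure_to_fail(premise, known):
    """True if some conjunct is already known to be false; `known` maps a
    parameter name to its established value, if any."""
    for clause in premise:
        roles = unpack(clause)
        parm, value = roles["PARM"], roles["VALUE"]
        if parm in known and known[parm] != value:
            return True          # this conjunct is known false: skip the rule
    return False                 # every conjunct is satisfied or still unknown

premise = [("SAME", "CNTXT", "SITE", "BLOOD"),
           ("SAME", "CNTXT", "GRAM", "GRAMPOS")]
print(premise_sure_to_fail(premise, {"SITE": "URINE"}))   # -> True
print(premise_sure_to_fail(premise, {"SITE": "BLOOD"}))   # -> False: GRAM unknown
```

Because the templates are consulted rather than the predicate code itself, adding a new predicate function requires only a new template entry, mirroring the extensibility point made below.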
Note that part of the system is reading the code being executed by the
other part. Furthermore, note that this reading is guided by information
carried in the rule components themselves. This latter characteristic as-
sures that the capability is unaffected by the addition of new rules or
predicate functions to the system.
This kind of technique limits expressibility, however, since the limited
syntax may not be sufficiently powerful to make expressing each piece of
knowledge an easy task. This in turn both restricts extensibility (adding
something is difficult if it is hard to express it) and makes modification of
the system's behavior more difficult (e.g., it might not be particularly at-
tractive to implement a desired iteration if doing so requires several rules
rather than a line or two of code).

2.4.3 Rules as Primitive Actions

In a pure PS, the smallest unit of behavior is a rule invocation. At its


simplest, this involves the matching of literals on the LHS, followed by
replacement of those symbols in the data base with the ones found on the
RHS. While the variations can be more complex, it is in some sense a
violation of the spirit of things to have a sequence of actions in the RHS.
Moran (1973b), for example, acknowledges a deviation from the spirit
of production systems in VIS when he groups rules in "procedures" within
which the rules are totally ordered for the purpose of conflict resolution.
He sees several advantages in this departure. It is "natural" for the user (a
builder of psychological models) to write rules as a group working toward
a single goal. This grouping restricts the context of the rules. It also helps
minimize the problem of implicit context: when rules are ordered, a rule
that occurs later in the list may really be applicable only if some of the
conditions checked by earlier rules are untrue. This dependency, referred
to as implicit context, is often not made explicit in the rule, but may be
critical to system performance. The price paid for these advantages is two-
fold: first, extra rules, less directly attributable to psychological processes,
are needed to switch among procedures; second, it violates the basic
production system tenet that any rule should (in principle) be able to fire at
any time--here only those in the currently active procedure can fire.
To the extent that the pure production system restrictions are met, we
can consider rules as the quanta of intelligent behavior in the system.
Otherwise, as in the VIS system, we must look at larger aggregations of
rules to trace behavior. In doing so, we lose some of the ability to quantify
and measure behavior, as is done, for example, with the PSG system sim-
ulation of the Sternberg task, where response times are attributed to in-
dividual production rules and then compared against actual psychological
data.

A different sort of deviation is found in the DENDRAL system, and


in a few MYCIN rules. In both, the RHS is effectively a small program,
carrying out complex sequences of actions. In this case, the quanta of
behavior are the individual actions of these programs, and understanding
the system thus requires familiarity with them. By embodying these bits of
behavior in a stylized format, we make it possible for the system to "read"
them to its users (achieved in MYCIN as described above) and hence pro-
vide some explanation of its behavior, at least at this level. This prohibition
against complex behaviors within a rule, however, may force us to imple-
ment what are (conceptually) simple control structures by using the com-
bined effects of several rules. This of course may make overall behavior
of the system much more opaque (see Section 2.4.5).

2.4.4 Modularity

We can regard the modularity of a program as the degree of separation of


its functional units into isolatable pieces. A program is highly modular if any
functional unit can be changed (added, deleted, or replaced) with no un-
anticipated change to other functional units. Thus program modularity is
inversely related to the strength of coupling between its functional units.
The modularity of programs written as pure production systems arises
from the important fact that the next rule to be invoked is determined
solely by the contents of the data base, and no rule is ever called directly.
Thus the addition (or deletion) of a rule does not require the modification
of any other rule to provide for or delete a call to it. We might demonstrate
this by repeatedly removing rules from a PS: many systems will continue
to display some sort of "reasonable" behavior.7 By contrast, adding a
procedure to an ALGOL-like program requires modification of other parts of
the code to insure that the procedure is invoked, while removing an ar-
bitrary procedure from such a program will generally cripple it.
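This contrast can be demonstrated directly with the kind of toy interpreter sketched earlier (again our own illustration, not taken from the systems under discussion): because rules never name one another, deleting one merely removes some behavior, and adding one requires no edits elsewhere.

```python
# Demonstrating PS modularity with a toy interpreter (our illustration, not
# from the chapter's systems). No rule names another: each fires purely on
# the state of the data base, so rules can be deleted or added freely.

def cycle(rules, wm):
    while True:
        for cond, act in rules:
            if cond(wm):
                act(wm)
                break
        else:
            return wm            # quiescence: no rule applicable

# Listed "out of order" on purpose: chaining happens via the data base.
rules = [
    (lambda w: "b" in w and "c" not in w, lambda w: w.add("c")),
    (lambda w: "a" in w and "b" not in w, lambda w: w.add("b")),
]

full = cycle(list(rules), {"a"})       # both rules chain: a leads to b, then c
degraded = cycle(rules[:1], {"a"})     # delete a rule: no crash, just less behavior
extended = cycle(rules + [(lambda w: "c" in w and "d" not in w,
                           lambda w: w.add("d"))], {"a"})   # add with no other edits
print(full, degraded, extended)
```

Deleting the rule that adds "b" leaves the interpreter running, merely quiescent sooner; a procedural program missing a called procedure would instead fail outright, for the reasons discussed next.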
Note that the issue here is more than simply the "undefined function"
error message, which would result from a missing procedure. The problem
would persist even if the compiler or interpreter were altered to treat
undefined functions as no-ops. The issue is a much more fundamental one
concerning organization of knowledge: programs written in procedure-
oriented languages stress the kind of explicit passing of control from one
section of code to another that is characterized by the calling of procedures.

7The number of rules that could be removed without performance degradation (short of redundancies) is an interesting characteristic that would appear to be correlated with which of the two common approaches to PSs is taken. The psychological modeling systems would apparently degenerate fastest, since they are designed to be minimally competent sets of rules. Knowledge-based expert systems, on the other hand, tend to embody numerous independent subproblems in rules and often contain overlapping or even purposefully redundant representations of knowledge. Hence, while losing their competence on selected problems, it appears they would often function reasonably well, even with several rules removed.
40 The Origin of Rule-Based Systems in AI

This is typically done at a selected time and in a particular context, both carefully chosen by the programmer. If a no-op is substituted for a missing procedure, the context upon returning will not be what the programmer expected, and subsequent procedure calls will be executed in increasingly incorrect environments. Similarly, procedures that have been added must be called from somewhere in the program, and the location of the call must be chosen carefully if the effect is to be meaningful.
Production systems, on the other hand, especially in their pure form,
emphasize the decoupling of control flow from the writing of rules. Each
rule is designed to be, ideally, an independent chunk of knowledge with its own statement of relevance (either the conditions of the LHS, as in a data-driven system, or the action of the RHS, as in a goal-directed system). Thus, while the ALGOL programmer carefully chooses the order of procedure calls to create a selected sequence of environments, in a production system it is the environment that chooses the next rule for execution. And since a rule can only be chosen if its criteria of relevance have been met, the choice will continue to be a plausible one, and system behavior will remain "reasonable," even as rules are successively deleted.
This inherent modularity of pure production systems eases the task of programming in them. Given some primitive action that the system fails to perform, it becomes a matter of writing a rule whose LHS matches the relevant indicators in the data base and whose RHS performs the action. Whereas the task is then complete for a pure PS, systems that vary from this design have the additional task of assuring proper invocation of the rule (not unlike assuring the proper call of a new procedure). The difficulty of this varies from trivial in the case of systems with goal-oriented behavior (like MYCIN) to substantial in systems that use more complex LHS scans and conflict resolution strategies.
For systems using the goal-oriented approach, rule order is usually
unimportant. Insertion of a new rule is thus simple and can often be totally
automated. This is, of course, a distinct advantage where the rule set is
large and the problems of system complexity are significant. For others
(like PSG and PAS II) rule order can be critical to performance and hence
requires careful attention. This can, however, be viewed as an advantage,
and indeed, Newell (1973) tests diffi~rent theories of behavior by the simple
expedient of changing the order of rules. The family of Sternberg task
simulators includes a number of production systems that differ only by the
interchange of two rules, yet display very different behavior. Watermans
system (Waterman, 1974) accomplishes "adaptation" by the simple heuristic
~
of placing a new rule immediately before a rule that causes an error.

8One specific example of the importance of rule order can be seen in our earlier example of addition (Figure 2-3). Here Rule 5 assumes that an ordering of the digits exists in STM in the form (ORDER 0 1 2 ...) and from this can be created the successor function for each digit. If Rule 5 were placed before Rule 1, the system wouldn't add at all. In addition, acquiring the notion of successor in subsequent runs depends entirely on the placement of the new successor productions before Rule 3, or the effect of this new knowledge would be masked.
Production System Characteristics 41

2.4.5 Visibility of Behavior Flow

Visibility of behavior flow is the ease with which the overall behavior of a
PS can be understood, either by observing the system or by reviewing its
rule base. Even for conceptually simple tasks, the stepwise behavior of a
PS is often rather opaque. The poor visibility of PS behavior compared to
that of the procedural formalism is illustrated by the Waterman integer
addition example outlined in Section 2.4.1. The procedural version of the
iterative loop there is reasonably clear (lines B, C, and E), and an ALGOL-type

FOR I := 1 UNTIL N DO ...

would be completely obvious. Yet the PS formalism for the same thing requires nonintuitive productions (like 1 and 2) and symbols like NN whose only purpose is to "mask" the condition portion of a rule so it will not be invoked later [such symbols are termed control elements (Anderson, 1976)].
The requirement for control elements, and much of the opacity of PS behavior, is a direct result of two factors noted above: the unity of control and data store, and the reevaluation of the data base at every cycle. Any attempt to "read" a PS requires keeping in mind the entire contents of the data base and scanning the entire rule set at every cycle. Control is much more explicit and localized in procedural languages, so that reading ALGOL code is a far easier task.9
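The masking role of a control element can be made concrete with a small sketch (Python is used here purely as illustration; the marker name and rule contents are invented, and no system discussed here used this notation). The second rule deposits a marker whose sole purpose is to keep the first rule from matching again, yielding loop-like behavior by side effect:

```python
# Two "productions" for counting to n.  The control element "done" exists
# only to mask Rule 1's condition once the loop should stop -- a marker
# (like NN above) whose sole job is inter-rule communication.

def cycle(memory):
    """One recognition-act cycle; returns False at quiescence."""
    # Rule 1: "i < n and not done  ->  increment i"
    if "done" not in memory and memory["i"] < memory["n"]:
        memory["i"] += 1
        return True
    # Rule 2: "i >= n and not done  ->  deposit the control element"
    if "done" not in memory:
        memory["done"] = True
        return True
    return False                      # no rule matches: halt

memory = {"i": 0, "n": 5}
while cycle(memory):
    pass
```

Reading these two rules in isolation reveals what each "knows," but the iteration they jointly produce is visible only by tracing the cycle, which is precisely the opacity at issue.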
The perspective on knowledge representation implied by PSs also con-
tributes to this opacity. As suggested above, PSs are appropriate when it is
possible to specify the content of required knowledge without also speci-
fying the way in which it is to be used. Thus, reading a PS does not gen-
erally make clear how it works so much as what it may know, and the
behavior is consequently obscured. The situation is often reversed in pro-
cedural languages: program behavior may be reasonably clear, but the
domain knowledge used is often opaquely embedded in the procedures.
The two methodologies thus emphasize different aspects of knowledge and
program organization.

2.4.6 Machine Readability

Several interesting capabilities arise from making it possible for the system
to examine its own rules. As one example, it becomes possible to implement
automatic consistency checking. This can proceed at several levels. In the
simplest approach we can search for straightforward syntactic problems
such as contradiction (e.g., two rules of the form A & B → C and A & B → ¬C) or subsumption (e.g., two rules of the form D & E & F → G and

9One of the motivations for the interest in structured programming is the attempt to emphasize still further the degree of explicitness and localization of control.

D & F → G). A more sophisticated approach, which would require extensive domain-specific knowledge, might be able to detect "semantic" problems, such as, for example, a rule of the form A & B → C when it is known from the meanings of A and B that A → B. Many other (domain-specific) tests may also be possible. The point is that by automating the process, extensive (perhaps exhaustive) checks of newly added productions are possible (and could perhaps be run in background mode when the system is otherwise idle).
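Once rules are reduced to (condition-set, conclusion) pairs, the two syntactic checks just described take only a few lines. The sketch below is an illustration in Python, not the checker of any system discussed here; "~X" stands for the negation of X, and the rule contents are invented:

```python
# Syntactic consistency checks over rules written as
# (frozenset_of_conditions, conclusion); "~X" denotes the negation of "X".

def negates(c1, c2):
    return c1 == "~" + c2 or c2 == "~" + c1

def contradictions(rules):
    """Pairs with identical LHSs and opposite conclusions,
    e.g. A & B -> C  and  A & B -> ~C."""
    return [(r, s) for r in rules for s in rules
            if r[0] == s[0] and negates(r[1], s[1])]

def subsumptions(rules):
    """Pairs where one LHS is a proper subset of another with the same
    conclusion, e.g. D & F -> G  subsumes  D & E & F -> G."""
    return [(r, s) for r in rules for s in rules
            if r[0] < s[0] and r[1] == s[1]]

rules = [
    (frozenset({"A", "B"}), "C"),
    (frozenset({"A", "B"}), "~C"),
    (frozenset({"D", "E", "F"}), "G"),
    (frozenset({"D", "F"}), "G"),
]
# Each contradictory pair is reported twice (once in each order).
bad_pairs = contradictions(rules)
redundant = subsumptions(rules)
```

Such pairwise scans are quadratic in the number of rules, which is one reason checks of this kind are plausibly run in background mode, as suggested above.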
A second sort of capability (described in the example in Section 2.4.2) is exemplified by the MYCIN system's approach to examining its rules. This is used in several ways (Davis, 1976) and produces both a more efficient control structure and precise explanations of system behavior.

2.4.7 Explanation of Primitive Actions

Production system rules are intended to be modular chunks of knowledge and to represent primitive actions. Thus explaining primitive acts should be as simple as stating the corresponding rule--all necessary contextual information should be included in the rule itself. Achieving such clear explanations, however, strongly depends on the extent to which the assumptions of modularity and explicit context are met. In the case where stating a rule does provide a clear explanation, the task of modification of program behavior becomes easier.
As an example, the MYCIN system often successfully uses rules to explain its behavior. This form of explanation fails, however, when considerations of system performance or human engineering lead to rules whose context is obscure. One class of rule, for example, says, in effect, "If A seems to be true, and B seems to be true, then that's (more) evidence in favor of A."10 It is phrased this way rather than simply "If B seems true, that's evidence in favor of A," because B is a very rare condition, and it appears counterintuitive to ask about it unless A is suspected to begin with. The first clause of the rule is thus acting as a strategic filter, to insure that the rule is not even tried unless it has a reasonable chance of succeeding. System performance has been improved (especially as regards human engineering considerations), at the cost of a somewhat more opaque rule.

2.4.8 Modifiability, Consistency, and Rule Selection Mechanism

As noted above, the tightly constrained format of rules makes it possible for the system to examine its own rule base, with the possibility of modifying it in response to requests from the user or to ensure consistency with
10These are known as self-referencing rules; see Chapter 5.

respect to newly added rules. While all these are conceivable in a system using a standard procedural approach, the heavily stylized format of rules, and the typically simple control structure of the interpreters, makes them all realizable prospects in a PS.
Finally, the relative complexity of the rule selection mechanism will have varying effects on the ability to automate consistency checks, or behavior modification and extension. An RHS scan with backward chaining (i.e., a goal-directed system; see Section 2.5.3) seems to be the easiest to follow since it mimics part of human reasoning behavior, while an LHS scan with a complex conflict resolution strategy makes the system generally more difficult to understand. As a result, predicting and controlling the effects of changes in, or additions to, the rule base are directly influenced in either direction by the choice of rule selection mechanism.

2.4.9 Programmability

The answer to "How easy is it to program in this formalism?" is "It's reasonably difficult." The experience has been summarized (Moran, 1973a):

Any structure which is added to the system diminishes the explicitness of rule conditions.... Thus rules acquire implicit conditions. This makes them (superficially) more concise, but at the price of clarity and precision.... Another questionable device in most present production systems (including mine) is the use of tags, markers, and other cute conventions for communicating between rules. Again, this makes for conciseness, but it obscures the meaning of what is intended. The consequence of this in my program is that it is very delicate: one little slip with a tag and it goes off the track. Also, it is very difficult to alter the program; it takes a lot of time to readjust the signals.

One source of the difficulties in programming production systems is the necessity of programming "by side effect." Another is the difficulty of using the PS methodology on a problem that cannot be broken down into the solution of independent subproblems or into the synthesis of a behavior that is neatly decomposable.
Several techniques have been investigated to deal with this difficulty. One of them is the use of tags and markers (control elements), referred to above. We have come to believe that the manner in which they are used, particularly in psychological modeling systems, can be an indication of how successfully the problem has been put into PS terms. To demonstrate this, consider two very different (and somewhat idealized) approaches to writing a PS. In the first, the programmer writes each rule independently of all the others, simply attempting to capture in each some chunk of required knowledge. The creation of each rule is thus a separate task. Only when all of them have been written are they assembled, the data base initialized,

and the behavior produced by the entire set of rules noted. As a second approach, the programmer starts out with a specific behavior that he or she wants to recreate. The entire rule set is written as a group with this in mind, and, where necessary, one rule might deposit a symbol like A00124 in STM solely to trigger a second specific rule on the next cycle.
In the first case the control elements would correspond to recognizable states of the system. As such, they function as indicators of those states and serve to trigger what is generally a large class of potentially applicable rules.11 In the second case there is no such correspondence, and often only a single rule recognizes a given control element. The idea here is to insure the execution of a specific sequence of rules, often because a desired effect could not be accomplished in a single rule invocation. Such idiosyncratic use of control elements is formally equivalent to allowing one rule to call a second, specific rule and hence is very much out of character for a PS. To the extent that such use takes place, it appears to us to be suggestive of a failure of the methodology--perhaps because a PS was ill-suited to the task to begin with or because the particular decomposition used for the task was not well chosen.12 Since one fundamental assumption of the PS methodology as a psychological modeling tool is that states of the system correspond to what are at least plausible (if not immediately recognizable) individual "states of mind," the relative abundance of the two uses of control elements mentioned above can conceivably be taken as an indication of how successfully the methodology has been applied.
A second approach to dealing with the difficulty of programming in PSs is the use of increasingly complex forms within a single rule. Where a pure PS might have a single action in its RHS, several psychological modeling systems (PAS II, VIS) have explored the use of more complex sequences of actions, including the use of conditional exits from the sequence.
Finally, one effort (Rychener, 1975) has investigated the use of PSs that are unconstrained by prior restrictions on rule format, use of tags, etc. The aim here is to employ the methodology as a formalism for explicating knowledge sources, understanding control structures, and examining the effectiveness of PSs for attacking the large problems typical of artificial intelligence. The productions in this system often turn out to have a relatively simple format, but complex control structures are built via carefully orchestrated interaction of rules. This is done with several techniques, including explicit reliance on both control elements and certain characteristics of the data base architecture. For example, iterative loops

11This basic technique of "broadcasting" information and allowing individual segments of the system to determine their relevance has been extended and generalized in systems like HEARSAY II (Lesser et al., 1974) and BEINGS (Lenat, 1975).
12The possibility remains, of course, that a "natural" interpretation of a control element will be forthcoming as the model develops, and additional rules that refer to it will be added. In that case the ease of adding the new rules arises out of the fact that the technique of allowing one rule to call another was not used.
Taxonomy of Production Systems 45

are manufactured via explicit use of control elements, and data are (redundantly) reasserted in order to make use of the "recency" ordering on rules (the rule that mentions the most recently asserted data item is chosen first; see Section 2.5.3). These techniques have supported the reincarnation as PSs of a number of sizable AI programs [e.g., STUDENT (Bobrow, 1968)], but, Bobrow notes, "control tends to be rather inflexible, failing to take advantage of the openness that seems to be inherent in PSs."
This reflects something of a new perspective on the use of PSs. Previous efforts have used them as tools for analyzing both the core of knowledge essential to a given task and the manner in which such knowledge is used. Such efforts relied in part on the austerity of the available control structure to keep all of the knowledge explicit. The expectation is that each production will embody a single chunk of knowledge. Even in the work of Newell (1973), which used PSs as a medium for expressing different theories in the Sternberg task, an important emphasis is placed on productions as a model of the detailed control structure of humans. In fact, every aspect of the system is assumed to have a psychological correlate.
The work reported by Rychener (1975), however, after explicitly detailing the chunks of knowledge required in the word problem domain of STUDENT, notes a many-to-many mapping between its knowledge chunks and productions. That work also focuses on complex control regimes that can be built using PSs. While still concerned with knowledge extraction and explication, it views PSs more as an abstract programming language and uses them as a vehicle for exploring control structures. While this approach does offer an interesting perspective on such issues, it should also be noted that as productions and their interactions grow more complex, many of the advantages associated with traditional PS architecture may be lost (for example, the loss of openness noted above). The benefits to be gained are roughly analogous to those of using a higher-level programming language: while the finer grain of the process being examined may become less obvious, the power of the language permits large-scale tasks to be undertaken and makes it easier to examine phenomena like the interaction of entire categories of knowledge.
The use of PSs has thus grown to encompass several different forms, many of which are far more complex than the pure PS model described initially.

2.5 Taxonomy of Production Systems

In this section we suggest four dimensions along which to characterize PSs: form, content, control cycle architecture, and system extensibility. For each dimension we examine related issues and indicate the range as evidenced by systems currently (or recently) in operation.

2.5.1 Form--How Primitive or Complex Should the Syntax of Each Side Be?

There is a wide variation in the syntax used by PSs and corresponding differences in both the matching and detection process and the subsequent action caused by rule invocation. For matching, in the simplest case only literals are allowed, and it is a conceptually trivial process (although the rule and data base may be so large that efficiency becomes a consideration). Successively more complex approaches allow free variables [Waterman's poker player (Waterman, 1970)], syntactic classes (as in some parsing systems), and increasingly sophisticated capabilities of variable and segment binding and of pattern specification (PAS II, VIS, LISP70).13
The content of the data base also influences the question of form. One interesting example is Anderson's ACT system (Anderson, 1976), whose rules have node networks in their LHSs. The appearance of an additional piece of network as input results in a "spread of activation" occurring in parallel through the LHS of each production. The rule that is chosen is the one whose LHS most closely matches the input and that has the largest subpiece of network already in its working memory.
As another example, the DENDRAL system uses a literal pattern match, but its patterns are graphs representing chemical classes. Each class is defined by a basic chemical structure, referred to as a skeleton. As in the data base, atoms composing the skeleton are given unique numbers, and chemical bonds are described by the numbers of the atoms they join (e.g., "5 6"). The LHS of a rule is the name of one of these skeletons, and a side effect of a successful match is the recording of the structural correspondence between atoms in the skeleton and those in the molecule. The action parts of these rules describe a sequence of actions to perform: break one or more bonds, saving a molecular fragment, and transfer one or more hydrogen atoms from one fragment to another. An example of a simple rule is
ESTROGEN → (BREAK (14 15) (13 ...))
           (HTRANS +1 +2)

The LHS here is the name of the graph structure that describes the estrogen class of molecules, while the RHS indicates the likely locations for bond breakages and hydrogen transfers when such molecules are subjected to mass spectral bombardment. Note that while both sides of the rule are relatively complex, they are written in terms that are conceptual primitives in the domain.
A related issue is illustrated by the rules used by MYCIN, where the LHS consists of a Boolean combination of standardized predicate functions. Here the testing of a rule for relevance consists of having the stan-
13For an especially thorough discussion of pattern-matching methods in production systems as used in VIS, see Moran (1973a, pp. 42-45).

dard LISP evaluator assess the LHS, and all matching and detection are controlled by the functions themselves. While using functions in LHSs provides power that is missing from a simple pattern match, it creates the temptation to write one function to do what should be expressed by several rules. For example, one small task in MYCIN is to deduce that certain organisms are present, even though they have not been recovered from any culture. This is a conceptually complex, multistep operation, which is currently (1975) handled by invocation of a single function. If one succumbs often to the temptation to write one function rather than several rules, the result can be a system that may perform the initial task but that loses a great many of the other advantages of the PS approach. The problem is that the knowledge embodied in these functions is unavailable to anything else in the system. Whereas rules can be accessed and their knowledge examined (because of their constrained format), chunks of ALGOL-like code are not nearly as informative. The availability of a standardized, well-structured set of operational primitives can help to avoid the temptation to create new functions unnecessarily.

2.5.2 Content--Which Conceptual Levels of Knowledge Belong in Rules?

The question here is how large a reasoning step should be embodied in a single rule, and there seem to be two distinct approaches. Systems designed for psychological modeling (PAS II, PSG, etc.) try to measure and compare tasks and determine required knowledge and skills. As a result, they try to dissect cognition into its most primitive terms. While there is, of course, a range of possibilities, from the simple literal replacement found in PSG to the more sophisticated abilities of PAS II to construct new productions, rules in these systems tend to embody only the most basic conceptual steps. Grouped at the other end of this spectrum are the task-oriented systems, such as DENDRAL and MYCIN, which are designed to be competent at selected real-world problems. Here the conceptual primitives are at a much higher level, encompassing in a single rule a piece of reasoning that may be based both on experience and on a highly complex model of the domain. For example, the statement "a gram-negative rod in the blood is likely to be an E. coli" is based in part on knowledge of physiological systems and in part on clinical experience. Often the reasoning step is sufficiently large that the rule becomes a significant statement of a fact or principle in the domain, and, especially where reasoning is not yet highly formalized, a comprehensive collection of such rules may represent a substantial portion of the knowledge in the field.
An interesting, related point of methodology is the question of what kinds of knowledge ought to go into rules. Rules expressing knowledge about the domain are the necessary initial step, but interest has been generated lately in the question of embodying strategies in rules. We have
been actively pursuing this in the implementation of meta-rules in the MYCIN system (Davis et al., 1977). These are "rules about rules," and they contain strategies and heuristics. Thus, while the ordinary rules contain standard object-level knowledge about the medical domain, meta-rules contain information about rules and embody strategies for selecting potentially useful paths of reasoning. For example, a meta-rule might suggest:

If the patient has had a bowel tumor, then in concluding about organism identity, rules that mention the gastrointestinal tract are more likely to be useful.
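Operationally, a meta-rule of this kind prunes or reorders the object-level rules retrieved for a subgoal by inspecting their content rather than their names. The following Python sketch is purely illustrative (the rule names, the "mentions" field, and the patient-record format are all invented; MYCIN's actual representation differs):

```python
# Object-level rules carry a description of their content; the meta-rule
# inspects that content (never rule names), so newly added object-level
# rules are handled automatically.  All names here are invented.

object_rules = [
    {"name": "RULE047", "mentions": {"blood"}},
    {"name": "RULE109", "mentions": {"gastrointestinal-tract"}},
    {"name": "RULE212", "mentions": {"skin"}},
]

def bowel_tumor_meta_rule(patient, rules):
    """If the patient has had a bowel tumor, try rules mentioning the
    gastrointestinal tract first (a stable reordering of the rule set)."""
    if "bowel-tumor" not in patient["history"]:
        return list(rules)
    return sorted(rules,
                  key=lambda r: "gastrointestinal-tract" not in r["mentions"])

patient = {"history": {"bowel-tumor"}}
ordered = bowel_tumor_meta_rule(patient, object_rules)
```

Because the ordering criterion is computed from rule content, a newly added rule mentioning the gastrointestinal tract would be promoted with no change to the meta-rule.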

There is clearly no reason to stop at one level, however--third-order rules could be used to select from or order the meta-rules, by using information about how to select a strategy (and hence represent a search through "strategy space"); fourth-order rules would suggest how to select criteria for choosing a strategy; etc.
This approach appears to be promising for several reasons. First, the expression of any new level of knowledge in the system can mean an increase in competence. This sort of strategy information, moreover, may translate rather directly into increased speed (since fewer rules need be tried) or no degradation in speed even with large increases in the number of rules. Second, since meta-rules refer to rule content rather than rule names, they automatically take care of new object-level rules that may be added to the system. Third, the possibility of expressing this information in a format that is essentially the same as the standard one means a uniform expression of many levels of knowledge. This uniformity in turn means that the advantages that arise out of the embodiment of any knowledge in a production rule (accessibility and the possibility of automated explanation, modification, and acquisition of rules) should be available for the higher-order rules as well.

2.5.3 Control Cycle Architecture

The basic control cycle can be broken down into two phases called recognition and action. The recognition phase involves selecting a single rule for execution and can be further subdivided into selection and conflict resolution.14 In the selection process, one or more potentially applicable rules are chosen from the set and passed to the conflict resolution algorithm, which chooses one of them. There are several approaches to selection, which can be categorized by their rule scan method. Most systems (e.g., PSG, PAS II) use some variation of an LHS scan, in which each LHS is evaluated in turn. Many stop scanning at the first successful evaluation (e.g., PSG), and
14The range of conflict resolution algorithms in this section was suggested in a talk by Don Waterman.

hence conflict resolution becomes a trivial step (although the question then remains of where to start the scan on the next cycle: to start over at the first rule or to continue from the current rule).
Some systems, however, collect all rules whose LHSs evaluate successfully. Conflict resolution then requires some criterion for choosing a single rule from this set (called the conflict set). Several have been suggested, including:

(i) Rule order--there is a complete ordering of all rules in the system, and the rule in the conflict set with the highest priority is chosen.
(ii) Data order--elements of the data base are ordered, and that rule is
chosen which matches element(s) in the data base with highest priority.
(iii) Generality order--the most specific rule is chosen.
(iv) Rule precedence--a precedence network (perhaps containing cycles)
determines the hierarchy.
(v) Recency order--either the most recently executed rule or the rule
containing the most recently updated element of the data base is
chosen.

For example, the LISP70 interpreter uses (iii), while DENDRAL uses (iv).
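Strategy (iii), generality ordering, is simple to state concretely: among the rules whose conditions are all satisfied (the conflict set), fire the one with the most specific condition set. The Python sketch below is an illustration with invented rule content, not the notation of any system named here:

```python
# Conflict resolution by generality ordering: collect the conflict set
# (all rules whose conditions are satisfied by working memory), then fire
# the most specific rule, i.e., the one with the largest condition set.

def conflict_set(rules, memory):
    """All rules whose LHS conditions are a subset of working memory."""
    return [r for r in rules if r[0] <= memory]

def most_specific(conflicting):
    """Generality ordering: prefer the rule with the most conditions."""
    return max(conflicting, key=lambda r: len(r[0]))

rules = [
    (frozenset({"fever"}), "suspect-infection"),
    (frozenset({"fever", "rash"}), "suspect-measles"),
]
memory = frozenset({"fever", "rash"})
chosen = most_specific(conflict_set(rules, memory))
```

Here both rules match, but the two-condition rule wins, which is the behavior a specificity-ordered compiler such as LISP70's can exploit.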
A different approach to the selection process is used in the MYCIN system. The approach is goal-oriented and uses an RHS scan. The process is quite similar to the unwinding of consequent theorems in PLANNER (Hewitt, 1972): given a required subgoal, the system retrieves the (unordered) set of rules whose actions conclude something about that subgoal. The evaluation of the first LHS is begun, and if any clause in it refers to a fact not yet in the data base, a generalized version of this fact becomes the new subgoal, and the process recurs. However, because MYCIN is designed to work with judgmental knowledge in a domain where collecting all relevant data and considering all possibilities are very important, in general, it executes all rules from the conflict set rather than stopping after the first success.
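The general scheme can be sketched as a small backward chainer (in Python, as an illustration of the approach rather than MYCIN's code; the rules and facts are invented). Note that every rule concluding the current goal is tried, not just the first that succeeds:

```python
# Goal-directed (RHS-scan) control: to establish a goal, retrieve every rule
# whose action concludes it; any LHS clause not yet in the data base becomes
# a new subgoal.  Rules and facts are invented for illustration.

def backchain(goal, rules, facts):
    """Return True if goal is a known fact or can be concluded by some rule."""
    if goal in facts:
        return True
    proved = False
    for conditions, conclusion in rules:       # every rule for this subgoal...
        if conclusion == goal:
            if all(backchain(c, rules, facts) for c in conditions):
                facts.add(goal)
                proved = True                  # ...not just the first success
    return proved

rules = [
    (("gram-negative", "rod", "blood-culture"), "e-coli-likely"),
    (("e-coli-likely",), "therapy-indicated"),
]
facts = {"gram-negative", "rod", "blood-culture"}
result = backchain("therapy-indicated", rules, facts)
```

Because rules are retrieved by what they conclude, a newly written rule can simply be added to the set and will be found whenever its conclusion arises as a subgoal.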
The meta-rules mentioned above may also be seen as a way of selecting a subset of the conflict set for execution. There are several advantages to this. First, the conflict resolution algorithm is stated explicitly in the meta-rules (rather than implicitly in the system's interpreter) and in the same representation as the rest of the rule-based knowledge. Second, since there can be a set of meta-rules for each subgoal type, MYCIN can specify distinct, and hence potentially more customized, conflict resolution strategies for each individual subgoal. Since the backward chaining of rules may also be viewed as a depth-first search of an AND/OR goal tree,15 we may view
15An AND/OR goal tree is a reasoning network in which ANDs (conjunctions of LHS conditionals) and ORs (disjunctions of multiple rules that all allow the same goal/conclusion to be reached) alternate. This structure is described in detail during the discussion of MYCIN's control structure in Chapter 5.

the search tree as storing at every branch point a collection of specific heuristics about which path to take. In addition, rules in the system are inexact, judgmental statements with a model of "approximate implication" in which the user may specify a measure of how firmly he or she believes that a given LHS implies its RHS (Shortliffe and Buchanan, 1975). This admits the possibility of writing numerous, perhaps conflicting heuristics, whose combined judgment forms the conflict resolution algorithm.
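For confirming evidence, the certainty-factor model referred to here combines two belief measures x and y as x + y(1 - x), so belief grows with each confirming rule but never exceeds 1. The Python sketch below covers only this positive-evidence case of the full model described by Shortliffe and Buchanan (1975):

```python
# Combining confirming evidence under "approximate implication": each rule
# contributes a certainty factor in (0, 1], and contributions combine as
#     cf_new = cf_old + cf_rule * (1 - cf_old)
# so net belief grows monotonically but never exceeds 1.

from functools import reduce

def combine(cf_old, cf_rule):
    return cf_old + cf_rule * (1.0 - cf_old)

def belief(cfs):
    """Net belief after several confirming rules fire."""
    return reduce(combine, cfs, 0.0)

two_rules = belief([0.4, 0.4])      # two moderately confirming rules
```

Two rules each contributing 0.4 thus yield a combined belief of 0.4 + 0.4(0.6) = 0.64, illustrating how several weak, possibly overlapping judgments accumulate into a single measure used during conflict resolution.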
Control cycle architecture affects the rest of the production system in several ways. Overall efficiency, for example, can be strongly influenced. The RHS scan in a goal-oriented system insures that only relevant rules are considered in the conflict set. Since this is often a small subset of the total, and one that can be computed once and stored for reference, there is no search necessary at execution time; thus the approach can be quite efficient. In addition, since this approach seems natural to humans, the system's behavior becomes easier to follow.
Among the conflict resolution algorithms mentioned, rule order and
recency order require a minimal amount of checking to determine the rule
with highest priority. Generality order can be efficiently implemented, and
the LISP70 compiler uses it effectively. Data order and rule precedence
require a significant amount of bookkeeping and processing, and hence
may be slower (PSH, a development along the lines of PSG, attacks pre-
cisely this problem).
The relative difficulty of adding a new rule to the system is also de-
termined to a significant degree by the choice of control cycle architecture.
Like PLANNER with its consequent theorems, the goal-oriented approach
makes it possible to simply "throw the rule in the pot" and still be assured
that it will be retrieved properly. The generality-ordering technique also
permits a simple, automatic method for placing the new rule, as do the
data-ordering and recency strategies. In the latter two cases, however, the
primary factor in ordering is external to the rule, and hence, while rules
may be added to the rule set easily, it is somewhat harder to predict and
control their subsequent selection. For both rule order and rule precedence
networks, rule addition may be a substantially more difficult problem that
depends primarily on the complexity of the criteria used to determine the
hierarchy.

2.5.4 System Extensibility

Learning, viewed as augmentation of the system's rule base, is of concern


both to the information-processing psychologists, who view it as an essential
aspect of human cognition, and to designers of knowledge-based systems,
who acknowledge that building truly expert systems requires an incremen-
tal approach to competence. As yet we have no range or even points of

comparison to offer because of the scarcity of examples. Instead, we sug-
gest some standards by which the ease of augmentation may be judged.16
Perhaps the most basic question is "How automatic is it?" The ability
to learn is clearly an area of competence by itself, and thus we are really
asking how much of that competence has been captured in the system, and
how much the user has to supply. Some aspects of this competence include:

If the current system displays evidence of a bug caused by a missing or
incorrect rule, how much of the diagnosing of the bug is handled by the
system, and how much tracing must be done by the user?
Once the bug is uncovered, who fixes it? Must the user modify the code
by hand? ... tell the system in some command language what to do? ...
indicate the generic type of the error? Can the user simply point out the
offending rule, or can the system locate and fix the bug itself?
Can the system indicate whether the new rule will in fact fix the bug or
if it will have side effects or undesired interactions?
How much must the user know about rule format conventions when
expressing a new (or modified) rule? Must he or she know how to code
it explicitly? ... know precisely the vocabulary to use? ... know generally
how to phrase it? Or can the user indicate in some general way the
desired rule and allow the system to make the transformation? Who has
to know the semantics of the domain? For example, can the system detect
impossible conjunctions (A & B, where A → not-B), or trivial disjunctions
(A ∨ B, where A → not-B)? Who knows enough about the system's
idiosyncrasies to suggest optimally fast or powerful ways of expressing
rules?
How difficult is it to enter strategies?
How difficult is it to enter control structure information? Where is the
control structure information stored: in aggregations of rules or in
higher-order rules? The former makes augmentation or modification a
difficult problem; the latter makes it somewhat easier, since the
information is explicit and concentrated in one place.
Can you assure continued consistency of the rule base? Who has to do
the checking?

These are questions that will be important and useful to confront in de-
signing any system intended to do knowledge acquisition, especially any
built around production rules as the underlying knowledge representation.

16It should be noted that this discussion is oriented primarily toward an interactive, mixed-
initiative view of learning, in which the human expert teaches the system and answers ques-
tions it may generate. It has also been influenced by our experience in attacking this problem
for the MYCIN system (Davis, 1976). Many other models of the process (e.g., teaching by
selected examples) are of course possible.

2.6 Conclusions
In artificial intelligence research, production systems were first used to
embody primitive chunks of information-processing behavior in simulation
programs. Their adaptation to other uses, along with increased experience
with them, has focused attention on their possible utility as a general pro-
gramming mechanism. Production systems permit the representation of
knowledge in a highly uniform and modular way. This may pay off hand-
somely in two areas of investigation: development of programs that can
manipulate their own representations and development of a theory of
loosely coupled systems, both computational and psychological. Production
systems are potentially useful as a flexible modeling tool for many types
of systems; current research efforts are sufficiently diverse to discover the
extent to which this potential may be realized.
Information-processing psychologists continue to be interested in pro-
duction systems. PSs can be used to study a wide range of tasks (Newell
and Simon, 1972). They constitute a general programming system with
the full power of a Turing machine, but use a homogeneous encoding of
knowledge. To the extent that the methodology is that of a pure production
system, the knowledge embedded is completely explicit and thus aids
experimental verification or falsifiability of theories that use PSs as a me-
dium of expression. Productions may correspond to verifiable bits of psy-
chological behavior (Moran, 1973a), reflecting the role of postulated hu-
man information-processing structures such as short-term memory. PSs
are flexible enough to permit a wide range of variation based on reaction
times, adaptation, or other commonly tested psychological variables. Fi-
nally, they provide a method for studying learning and adaptive behavior
(Waterman, 1974).
For those wishing to build knowledge-based expert systems, the homo-
geneous encoding of knowledge offers the possibility of automating parts
of the task of dealing with the growing complexity of such systems. Knowl-
edge in production rules is both accessible and relatively easy to modify. It
can be executed by one part of the system as procedural code and exam-
ined by another part as if it were a declarative expression. Despite the
difficulties of programming PSs, and their occasionally restrictive syntax,
the fundamental methodology suggests a convenient and appropriate
framework for the task of structuring and specifying large amounts of
knowledge. (See Hayes-Roth et al., 1983, for recent uses of production
systems.) It may thus prove to be of great utility in dealing with the prob-
lems of complexity encountered in the construction of large knowledge
bases.
PART TWO

Using Rules
3
The Evolution of MYCIN's
Rule Form

There is little doubt that the decision to use rules to encode infectious
disease knowledge in the nascent MYCIN system was largely influenced by
our experience using similar techniques in DENDRAL. However, as men-
tioned in Chapter 1, we did experiment with a semantic network repre-
sentation before turning to the production rule model. The impressive
published examples of Carbonell's SCHOLAR system (Carbonell, 1970a;
1970b), with its ability to carry on a mixed-initiative dialogue regarding
the geography of South America, seemed to us a useful model of the kind
of rich interactive environment that would be needed for a system to advise
physicians.
Our disenchantment with a pure semantic network representation of
the domain knowledge arose for several reasons as we began to work with
Cohen and Axline, our collaborating experts. First, the knowledge of in-
fectious disease therapy selection was ill-structured and, we found, difficult
to represent using labeled arcs between nodes. Unlike South American
geography, our domain did not have a clear-cut hierarchical organization,
and we found it challenging to transfer a page or two from a medical
textbook into a network of sufficient richness for our purposes. Of partic-
ular importance was our need for a strong inferential mechanism that
would allow our system to reason about complex relationships among di-
verse concepts; there was no precedent for inferences on a semantic net
that went beyond the direct, labeled relationships between nodes.1
Perhaps the greatest problem with a network representation, and the
greatest appeal of production rules, was our gradually recognized need to
deal with small chunks of domain knowledge in interacting with our expert
collaborators. Because they were not used to dissecting their clinical rea-
soning processes, it was totally useless to ask them to "tell us all that you
know." However, by discussing specific difficult patients, and by encour-

1The PROSPECTOR system (Duda et al., 1978a; 1978b), which was developed shortly after
MYCIN, uses a network of inferential relations--a so-called inference net--to combine a
semantic network with inference rules.


aging our collaborators to justify their questions or decisions, those of us


who were not expert in the field began to tease out "nuggets" of domain
knowledge--individual inferential facts that the experts identified as per-
tinent for problem solving in the domain. By encoding these facts as in-
dividual production rules, rather than attempting to decompose them into
nodes and links in a semantic network, we found that the experts were
able to examine and critique the rules without difficulty. This transparency
of the knowledge base, coupled with the inherent modularity of knowledge
expressed as rules, allowed us to build a prototype system quickly and
allowed the experts to identify sources of performance problems with rel-
ative ease. They particularly appreciated having the ability to observe the
effects of chained reasoning based on individual rules that they themselves
had provided to us. In current AI terminology, the organization of knowl-
edge was not object-centered but was centered around inferential processes.
Our early prototype rapidly diverged from DENDRAL because we
were driven by different performance goals and different characteristics
of the knowledge in the domain. Of particular importance was the need
to deal with inexact inference; unlike the categorical conclusions in DEN-
DRAL's rules, the actions in MYCIN's productions were typically conclu-
sions about the state of the world that were not known with certainty. We
soon recognized the need to accumulate evidence regarding alternative
hypotheses as multiple rules lent credence to the conclusions. The need
for a system to measure the weight of evidence of competing hypotheses
was not surprising; it had also characterized conventional statistical ap-
proaches to computer-based medical decision making. Our certainty factor
model, to which we refer frequently throughout this book (and which is
the subject of Part Four), was developed in response to our desire to deal
with uncertainty while attempting to keep knowledge modular and in rules.
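One element of the model can be illustrated concretely. When two rules lend positive support to the same hypothesis, their certainty factors are combined so that belief accumulates but never exceeds 1. The following is a sketch of that combining function for positive evidence only; the full model, the subject of Part Four, also handles negative and mixed evidence:

```python
def cf_combine(cf1, cf2):
    """Combine two positive certainty factors: the second rule adds
    belief in proportion to the uncertainty the first left behind."""
    assert 0 <= cf1 <= 1 and 0 <= cf2 <= 1
    return cf1 + cf2 * (1 - cf1)

# Two moderately suggestive rules together are more convincing than
# either alone, but never reach complete certainty:
print(round(cf_combine(0.6, 0.4), 2))                    # 0.76
print(round(cf_combine(cf_combine(0.6, 0.4), 0.5), 2))   # 0.88
```

Because the function is commutative and associative, the order in which rules fire does not affect the accumulated belief, which is one reason rule ordering could remain unimportant.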
The absence of complete certainty in most of our rules meant that we
needed a control structure that would consider all rules regarding a given
hypothesis and not stop after the first one had succeeded. This need for
exhaustive search was distinctly different from control in DENDRAL,
where the hierarchical ordering of rules was particularly important for
correct prediction and interpretation (see Chapter 2). Because rule order-
ing was not important in MYCIN, the modularity of rules was heightened;
the experts did not need to worry about ordering the rules they gave us
or about other details of control.2
Another important distinction between the reasoning paradigms of
DENDRALand MYCIN was recognized early. DENDRALgenerated
hypotheses regarding plausible chemical structures and used its rule set to

2The arbitrary order of MYCIN's rules did lead to some suboptimal performance character-
istics, however. In particular, the ordering of questions to the user often seemed unfocused.
It was for this reason that the MAINPROPS (later known as INITIALDATA) feature was
devised (see Chapter 5), and the concept of meta-rules was developed to allow rule selection
and ordering based on strategic knowledge of the domain (see Chapter 28). The development
of prototypes in CENTAUR (Chapter 23) was similarly motivated.

test these hypotheses and to select the best ones. Thus DENDRAL's control
scheme involved forward invocation of rules for the last phase of the plan-
generate-and-test paradigm. On the other hand, it was unrealistic for MY-
CIN to start by generating hypotheses regarding likely organisms or com-
binations of pathogens; there were no reasonable heuristics for pruning
the search space, and there was no single piece of orienting information
similar to the mass spectrum, which provided the planning information to
constrain DENDRAL's hypothesis generator. Thus MYCIN was dependent
on a reasoning model based on evidence gathering, and its rules were used
to guide the process of input data collection. Because we wanted to avoid
problems of natural language understanding, and also did not want to
teach our physician users a specialized input language, we felt it was un-
reasonable to ask the physician to enter some subset of the relevant patient
descriptors and then to have the rules fire in a data-driven fashion. Instead,
we chose a goal-directed control structure that allowed MYCIN to ask the
relevant questions and therefore permitted the physician to respond, in
general, with simple one-word answers. Thus domain characteristics led
to forward-directed use of the generate-and-test paradigm in DENDRAL
and to goal-directed use of the evidence-gathering paradigm in MYCIN.
We were not entirely successful in putting all of the requisite medical
knowledge into rules. Chapter 5 describes the problems encountered in
trying to represent MYCIN's therapy selection algorithm as rules. Because
therapy selection was initially implemented as LISP code rather than in
rules, MYCIN's explanation system was at that time unable to justify spe-
cific therapy decisions in the same way it justified its diagnostic decisions.
This situation reflects the inherent tension between procedural and pro-
duction-based representation of this kind of algorithmic knowledge. The
need for further work on the problem was clear. A few years later Clancey
assumed the challenge of rewriting the therapy selection part of MYCIN
so that appropriate explanations could be generated for the user. We were
unable to encode the entire algorithm in rules, however, and instead settled
on a solution reminiscent of the generate-and-test approach used in DEN-
DRAL: rules were used to evaluate therapeutic hypotheses after they had
been proposed (generated) by an algorithm that was designed to support
explanations of its operation. This clever solution, described in Chapter 6,
seemed to provide an optimal mix of procedural and rule-based knowl-
edge.

3.1 Design Considerations


Many of the decisions that led to MYCIN's initial design resulted from a
pragmatic response to perceived demands of physicians as computer users.
Our perceptions were largely based on our own intuitions and observations

about problems that had limited the success of previous computer-based


medical decision-making systems. More recently we have undertaken for-
mal studies of physician attitudes (Chapter 34), and the data that resulted,
coupled with our prior experience building MYCIN, have had a major
impact on our more recent work with ONCOCIN (Chapter 35). These
issues are addressed in detail in Part Eleven.
However, since many of the features and technical decisions that are
reflected in the other chapters in Part Two are based on our early analysis
of design considerations for MYCIN (Shortliffe, 1976), we summarize
those briefly here. We have already alluded to several ways in which MY-
CIN departed from the pure production systems described in Chapter 2.
These are further discussed throughout the book (see especially Chapter
36), but it is important to recognize that the system's development was
evolutionary. Most such departures resulted from characteristics of the
medical domain, from our perceptions of physicians as potential computer
users, or from unanticipated problems that arose as MYCIN grew in size
and complexity.
We recognized at the outset that educational programs designed for
instruction of medical students had tended to meet with more long-term
success than had clinical consultation programs. A possible explanation,
we felt, was that instructional programs dealt only with hypothetical pa-
tients in an effort to teach diagnostic or therapeutic concepts, whereas
consultation systems were intended to assist physicians with the manage-
ment of real patients in the clinical setting. A program aiding decisions
that can directly affect patient well-being must fulfill certain responsibilities
to physicians if they are to accept the computer and make use of its knowl-
edge. For example, we observed that physicians had tended to reject com-
puter programs designed as decision-making aids unless they were
accessible, easy to use, forgiving of simple typing errors, reliable, and fast
enough to save time. Physicians also seemed to prefer that a program
function as a tool, not as an "all-knowing" machine that analyzes data and
then states its conclusions as dogma without justifying them. We had also
observed that physicians are most apt to need advice from consultation
programs when an unusual diagnostic or therapeutic problem has arisen,
which is often the circumstance when a patient is acutely ill. Time is an
important consideration in such cases, and a physician will probably be
unwilling to experiment with an "unpolished" prototype. In fact, time will
always be an important consideration given the typical daily schedule of a
practicing physician.
With considerations such as these in mind from the start, we defined
the following list of prerequisites for the acceptance of a clinical consul-
tation program (Shortliffe et al., 1974):3

3This analysis was later updated, expanded, and analyzed after we gained more experience
with MYCIN (Shortliffe, 1980).

1. The program should be useful; i.e., it should respond to a well-docu-


mented clinical need and, ideally, should tackle a problem with which
physicians have explicitly requested assistance.
2. The program should be usable; i.e., it should be fast, accessible, easy to
learn, and simple for a novice computer user.
3. The program should be educational when appropriate; i.e., it should allow
physicians to access its knowledge base and must be capable of convey-
ing pertinent information in a form that they can understand and from
which they can learn.
4. The program should be able to explain its advice; i.e., it should provide
the user with enough information about its reasoning so that he or she
can decide whether to follow the recommendation.
5. The program should be able to respond to simple questions; i.e., it should
be possible for the physician to request justifications of specific infer-
ences by posing questions, ideally using natural language.
6. The program should be able to learn new knowledge; i.e., it should be
possible to tell it new facts and have them easily and automatically in-
corporated for future use, or it should be able to learn from experience
as it is used on large numbers of cases.
7. The program's knowledge should be easily modified; i.e., adding new
knowledge or correcting errors in new knowledge should be straight-
forward, ideally accomplished without having to make explicit changes
to the program (code) itself.

This list of design considerations played a major role in guiding our early
work on MYCIN, and, as we suggested earlier in this chapter, they largely
account for our decision to implement MYCIN as a rule-based system. In
Chapters 4 through 6, and in subsequent discussions of knowledge acqui-
sition (Part Three) and explanation (Part Six), it will become clear how the
production system formalism provided a powerful foundation for an evolv-
ing system intended to satisfy the design goals we have outlined here.

3.2 MYCIN as an Evolutionary System

One of the lessons of the MYCIN research has been the way in which the
pure theory of production systems, as described in Chapter 2, has required
adaptation in response to issues that arose during system development.
Many of these deviations from a pure production system approach with
backward chaining will become clear in the ensuing chapters. For reference
we summarize here some of those deviations, citing the reasons for changes

that were introduced, even though this anticipates more complete discus-
sions in later chapters.

1. The context tree: We realized the need to allow our rules to make
conclusions about multiple objects and to keep track of the hierarchical
relationships among them. The context tree (described in Chapter 5) was
created to provide a mechanism for representing hierarchical relationships
and for quantifying over multiple objects. For instance, ORGANISM-1 and
ORGANISM-2 are contexts of the same type that are related to cultures
in which they are observed to be growing and that need to be compared,
collected, and reasoned with together at times.

2. Instantiation of contexts: When a new object required attention, we


needed a mechanism for creating it, naming it, and recording its associa-
tions with other contexts in the system. Prototypical contexts, similar in
concept to the "frames" of more recent AI work (Minsky, 1975), provided
a mechanism for creating new objects when they were needed. These are
called context-types to distinguish them from individual contexts. For in-
stance, ORGANISM is a context-type.
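The relationship between context-types and the individual contexts instantiated from them can be sketched as follows (the class layout and field names are illustrative, not MYCIN's actual representation):

```python
# Sketch of the context tree: context-types act as prototypes ("frames")
# from which individual, automatically named contexts are instantiated
# and linked to a parent context.
class Context:
    counter = {}  # per-type instance counters, e.g. {"ORGANISM": 2}

    def __init__(self, ctype, parent=None):
        n = Context.counter.get(ctype, 0) + 1
        Context.counter[ctype] = n
        self.name = f"{ctype}-{n}"      # e.g. ORGANISM-1, ORGANISM-2
        self.ctype = ctype
        self.parent = parent
        self.children = []
        self.params = {}                # clinical parameter -> value
        if parent:
            parent.children.append(self)

patient = Context("PATIENT")
culture = Context("CULTURE", parent=patient)
org1 = Context("ORGANISM", parent=culture)
org2 = Context("ORGANISM", parent=culture)

# Easy comparison within a parent context (all organisms in one culture):
print([c.name for c in culture.children])  # ['ORGANISM-1', 'ORGANISM-2']
```

As the text notes, comparisons within one parent (the organisms of a single culture) fall out of the tree directly, while comparisons across the whole tree require the mapping predicates discussed below in item 6.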

3. Development of MAINPROPS: Physicians using the evolving system


began to complain that MYCIN did not ask questions in the order they
were used to. For example, they indicated it was standard practice to discuss
the site, timing, and method of collection for a culture as soon as it was
first mentioned. Thus we created a set of parameters called the MAIN-
PROPS for each prototypical context.4 The values of these parameters
were automatically asked for when a context was first created, thereby
providing the kind of focused questioning with which physicians felt most
comfortable. The benefit was in creating a more natural sequence of ques-
tions. The risk was in asking a few more questions than might be logically
necessary for some cases. This was a departure from the pure production
system approach of asking questions only when the information was needed
for evaluating the premise of a rule.

4. Addition of antecedent rules: The development of MAINPROPS


meant that we knew there were a small number of questions that would
be asked every time a context was created. In a pure backward-chaining
system, rules that had premise conditions that depended only on the values
of parameters on MAINPROPS lists would be invoked when needed so
there was no a priori reason to do anything special with such rules. How-
ever, two situations arose that made us flag such rules as antecedent rules
to be invoked in a data-driven fashion rather than await goal-oriented
invocation. First, there were cases in which an answer to one MAINPROPS

4This name was later changed to INITIALDATA in EMYCIN systems.

question could uniquely determine (via a definitional antecedent rule) the


value of another subsequent MAINPROPS property for the same context
(e.g., if an organism's identity was known, its gram stain and morphology
were of course immediately determined). By implementing such rules as
antecedent rules and by checking to see if the value of a MAINPROPS
parameter was known before asking the user, we avoided inappropriate or
unnecessary questions.
The second use of antecedent rules arose when the preview mecha-
nism was implemented (see paragraph 12 below). Because an antecedent
rule could determine that a premise condition of another rule was false,
such rules could be rejected immediately during the preview phase. If
antecedent rules had been saved for backward-chained invocation, how-
ever, the preview mechanism would have failed to reject the rule in ques-
tion. Thus the MONITOR would have inappropriately pursued the first
two or three conditions in the premise of the rule, perhaps at considerable
computational expense, only to discover that the subsequent clause was
clearly false due to an answer of an earlier MAINPROPS question. Thus
antecedent rules offered a considerable enhancement to efficiency in such
cases.

5. Self-referencing rules: As will be discussed in Chapter 5, it became


necessary to write rules in which the same parameter appeared in both the
premise and the action parts. Self-referencing rules of the form A & B &
C → A are a departure from the pure production system approach, and
they required changes to the goal-oriented rule invocation mechanism.
They were introduced for three purposes: default reasoning, screening,
and using information about risks and utilities.
a. Default reasoning: MYCIN makes no inferences except those that are
explicitly stated in rules, as executed under the certainty factor (CF) model
(see Chapter 11) and backward-chaining control. There are no implicit
ELSE clauses in the rules that assign default values to parameters.5 When
rules fail to establish a value for a parameter, its value is considered to be
UNKNOWN--no other defaults are used. One use of the self-referencing
rules is to assign a default value to a parameter explicitly:

IF a value for X is not known (after trying to establish one),
THEN conclude that the value of X is Z.

Thus, reasoning with defaults is done in the rules and can be explained
in the same way as any other conclusions. The control structure had to be
changed, however, to delay executing these rules until all other relevant
rules had been tried.
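The delayed execution of such a default rule might be sketched as follows (the function names and rule representation are invented for illustration):

```python
def conclude_value(param, ordinary_rules, default):
    """Try every ordinary rule for the parameter; only if none of them
    establishes a value does the explicit default rule fire last.  There
    is never an implicit ELSE: the default is itself a stated rule."""
    for rule in ordinary_rules:
        value = rule(param)
        if value is not None:
            return value
    # Self-referencing default rule, executed after all others:
    # IF a value for X is not known, THEN conclude that the value of X is Z.
    return default

no_op = lambda p: None  # stands in for a rule whose premise failed
print(conclude_value("identity", [no_op, no_op], "UNKNOWN"))  # UNKNOWN
```

Because the default is an ordinary (if late-firing) rule, its conclusion can be explained to the user in exactly the same way as any other conclusion, which is the point the text emphasizes.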
b. Screening: For purposes of human engineering, we needed a screen-

5Explicit ELSE clauses were defined in the syntax (see Chapter 5) but were eliminated, mostly
for the sake of simplicity.

ing mechanism to avoid asking about unusual parameters (B and C, above)


unless there is already some other evidence for the hypothesis (A) under
consideration. For example, we did not want MYCINto use the simple
rule

Pseudomonas-type skin lesions → Pseudomonas

unless there already was evidence for Pseudomonas--otherwise, the pro-


gram would appear to be asking for minute pieces of data inappropriately.
c. Utilities: Self-referencing rules gave us a way to consider the risks
of failing to consider a hypothesis. Once there is evidence for Pseudomonas,
say, being a possible cause of an infection, then a self-referencing rule can
boost the importance of considering it in therapy, based on the high risk
of failing to treat for it.

6. Mapping rules: We soon recognized the need for rules that could
be applied iteratively to a set of contexts (e.g., a rule comparing a current
organism to each bacterium in the set of all previous organisms in the
context tree). Special predicate functions (e.g., THERE-IS, FOR-EACH,
ONE-OF) were therefore written so that a condition in a rule premise could
map iteratively over a set of contexts. This was a partial solution to the
general representation problem of expressing universal and existential
quantification. Only by considering all contexts of a type could we deter-
mine if all or some of them had specified properties. The context tree
allowed easy comparisons within any parent context (e.g., all the organisms
growing in CULTURE-2) but did not allow easy comparison across contexts
(e.g., all organisms growing in all cultures).
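These mapping predicates amount to existential and universal quantification over a set of contexts, as the following sketch suggests (the data and predicate names are invented; MYCIN's actual predicates operated on contexts in the tree):

```python
# Sketch of mapping predicates that quantify over a set of contexts.
organisms = [
    {"name": "ORGANISM-1", "gramstain": "gramneg"},
    {"name": "ORGANISM-2", "gramstain": "gramneg"},
]

def there_is(contexts, pred):
    """Existential quantification: at least one context satisfies pred."""
    return any(pred(c) for c in contexts)

def for_each(contexts, pred):
    """Universal quantification: every context satisfies pred."""
    return all(pred(c) for c in contexts)

is_gramneg = lambda c: c["gramstain"] == "gramneg"
print(there_is(organisms, is_gramneg))  # True
print(for_each(organisms, is_gramneg))  # True
```

Only by mapping a condition over all contexts of a type can a rule determine whether all, or merely some, of them share a property, which is the partial solution to quantification that the text describes.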

7. Tabular representation of knowledge: When large numbers of rules


had been written, each having essentially the same form, we recognized
the efficiency of collapsing them into a single rule that read the values for
its premise conditions and action from a specialized table. (A related con-
cept was implemented in changes that allowed physicians to enter infor-
mation in a more natural way. If they were looking at a patient's record
for answers to questions, it was more convenient to enter many items at
once into a table of related parameters. There was, however, the attendant
risk of asking for information that would not actually be used in some
cases.) Chapter 5 describes the implementation of this feature.

8. Augmentation of rules: As multiple experts joined to collaborate on


development of the knowledge base, we recognized the need to keep track
of who wrote individual rules. Thus extra properties were added to rules
that allowed us to keep track of authorship, to record literature references
that defended the inference stored in the rule, and to allow recording of
free-form text justification of certain complicated rules for which the nor-
mal rule translation was somewhat cryptic. These extra slots associated with

rules gave the latter more the character of frames than of pure produc-
tions.

9. The therapy algorithm: As described in Chapter 5, the final step in


MYCIN's decision process was largely algorithmic and proved difficult to
encode in rules. Chapter 6 describes our eventual solution, in which we
integrated algorithmic and rule-based approaches in a novel manner.

10. Management of uncertainty: Previous PSs had not encoded the un-
certainty in rules. Thus MYCIN's certainty factor model (see Part Four)
was an augmentation mandated by the nature of decision making in this
complex medical domain.

11. Addition of meta-rules: As mentioned in Chapter 2 and described


in Chapter 28, we began to realize that strategies for optimal rule invo-
cation could themselves be encoded in rules. MYCIN's PS approach was
modified to manage high-level meta-rules that could be invoked via the
usual rule monitor and that would assist in determining optimal problem-
solving strategies.

12. Addition of a preview mechanism: It became clear that it was ineffi-
cient for the rule interpreter to assess the first few conditions in a rule
premise if it was already known that a subsequent condition was false. Thus
a preview mechanism was added to the interpreter so that it first examined
the whole premise to see if there were parameters whose values had pre-
viously been determined. The addition of the preview mechanism made it
important to add antecedent rules, as mentioned above (paragraph 4).
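The preview idea can be sketched as follows (names invented; the real mechanism operated on the premises of MYCIN's rules, where evaluating a clause could mean asking the user a question):

```python
def preview_then_evaluate(premise, known, evaluate_clause):
    """Preview: reject a rule outright if any premise clause is already
    known to be false (e.g., from an antecedent rule).  Otherwise
    evaluate clauses in order, which may be expensive."""
    if any(known.get(clause) is False for clause in premise):
        return False                      # previewed and rejected cheaply
    return all(evaluate_clause(c) for c in premise)

known = {"site-is-blood": False}          # set earlier by an antecedent rule
asked = []
def evaluate_clause(clause):
    asked.append(clause)                  # stands in for asking the user
    return True

result = preview_then_evaluate(["febrile", "hospitalized", "site-is-blood"],
                               known, evaluate_clause)
print(result, asked)  # False [] -- the rule was rejected without questions
```

Without the preview pass, the first two clauses would have been pursued (and two questions possibly asked) before the known-false third clause was reached, which is exactly the inefficiency described above.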

13. The concept of a unity path: Because many MYCIN rules reached


conclusions with less than certainty, it was generally necessary to invoke all
rules that could bear on the value of a parameter under consideration.
This is part of MYCIN's cautious evidence-gathering strategy in which all
relevant evidence available at the time of a consultation is used. However,
if a rule successfully reaches a conclusion with certainty (i.e., it has CF=1),
then it is not necessary to try alternate rules. Thus the rule monitor was
altered to try first those rules that could reach a conclusion with certainty,
either through a single rule with CF = 1 or through a chain of rules, each
with CF = 1 (a so-called unity path). When certain rules succeeded, the
alternate rules were ignored, and this prevented inefficiencies in the de-
velopment of the reasoning network and in the generation of questions to
the user.
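The unity-path optimization might be sketched as follows (the rule representation is invented for illustration, and the accumulation step shows only the positive-CF case of the combining function):

```python
def establish(param_rules):
    """param_rules: list of (cf, applies) pairs for one parameter, where
    applies() reports whether the rule's premise holds.  Certain rules
    (CF = 1) are tried first; a success short-circuits everything else.
    Otherwise fall back to exhaustive evidence gathering."""
    certain = [r for r in param_rules if r[0] == 1.0]
    for _, applies in certain:
        if applies():
            return 1.0                         # unity path: skip the rest
    total = 0.0
    for cf, applies in param_rules:
        if applies():
            total = total + cf * (1 - total)   # accumulate positive evidence
    return total

rules = [(0.6, lambda: True), (1.0, lambda: True), (0.4, lambda: True)]
print(establish(rules))  # 1.0 -- the certain rule preempted the others
```

When no certain rule succeeds, the exhaustive pass still runs, preserving the cautious strategy; the optimization only prunes work (and questions to the user) that could not change a conclusion already held with certainty.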

14. Prevention of circular reasoning: The issue of circular reasoning


does not normally arise in pure production systems but was a serious po-
tential problem for MYCIN. (Self-referencing rules, discussed in para-
graph 5 above, are a special case of the general circular reasoning problem

involving any number of rules.) Special changes to the rule monitor were
required to prevent this undesirable occurrence (see Chapter 5).

15. The tracing mechanism: As is described in Chapter 5, we made the
decision to determine all possible values of a parameter instead of
determining only the value specified in the premise condition of
interest. This potential inefficiency was tolerated for reasons of user
acceptance. We found that physicians preferred a focused and exhaustive
consideration of one topic at a time, rather than having the system
return subsequently to the subject when another possible value of the
same parameter was under consideration.

16. The ASKFIRST concept: Pure production systems have not generally
distinguished between attributes that the user may already know with
certainty (such as values of laboratory tests) and those that inherently
require inference. In MYCIN this became an important distinction, which
required that each parameter be labeled as an ASKFIRST attribute
(originally named LABDATA, as discussed in Chapter 5) or as a parameter
that should first be determined by using rules rather than by asking the
user.

17. Procedural conditions associated with parameters: We also discovered
unusual circumstances in which a special test was necessary before MYCIN
could decide whether it was appropriate to ask the user for the value of
a parameter. This was solved through a kind of procedural attachment,
i.e., an executable piece of conditional code associated with a
parameter, which would allow the rule monitor to decide whether a
question to the user was appropriate. Each parameter thus began to be
represented as a frame with several slots, including some whose values
were procedures.

18. Rephrasing prompts: As users became more familiar with MYCIN, we
found that they preferred short, less detailed prompts when the program
requested information. Thus a "terse" mode was implemented and could be
selected by an experienced user. Similarly, a reprompt mechanism was
developed so that a novice user, puzzled by a question, could be given a
more detailed explanation of what MYCIN needed to know. These features
were added to an already existing HELP facility, which showed examples
of acceptable answers to questions.

19. Multiple instances of contexts: Some of the questions asked by MYCIN
are necessary for deciding whether or not to create contexts (rather
than for determining the value of a parameter). Furthermore, optimal
human engineering requires that this kind of question be phrased
differently for the first instance of a context-type than for subsequent
instances. These alternate prompts are discussed in Chapter 5.

20. HERSTORY List: Another addition to the rule monitor in MYCIN was a
mechanism for keeping track of all rules invoked, failing, succeeding,
etc., and the reasons for these various outcomes. The so-called HERSTORY
List, or history tree, then provided the basis for MYCIN's explanations
in response to users' queries.

21. Creation of a Patient Data Table: Finally, we recognized the need to
develop mechanisms for (a) reevaluating cases when more information
became available and (b) assessing the impact of modifications to the
knowledge base on a library of cases previously handled well. These
goals were achieved by the development of a Patient Data Table, i.e., a
mechanism for storing and accessing the initializing conditions
necessary for full consideration of cases. See Chapter 5 for further
discussion of this feature.

3.3 A Word About the Logic of MYCIN

The logic of MYCIN's reasoning is propositional logic, where the
elementary propositions are fact triples and the primary rule of
inference is modus ponens (A and A ⊃ B implies B). It is extended (and
somewhat complicated) in the following respects:

- Certainty factors (CFs) are attached (or propagated) to all
  propositions.
- CFs are associated with all implications.
- Predicates are associated with fact triples to change the way facts
  stated in rules are matched against facts in the dynamically
  constructed case record. A variety of predicates have been defined
  (see Section 5.1.5); some refer to values of attributes (e.g.,
  NOT-SAME, ONE-OF) and some reference values of CFs (e.g., KNOWN,
  DEFINITE).
- Limited quantification is allowed over conjunctions of propositions
  (e.g., THERE-IS, FOR-EACH).
- Meta-level reasoning is allowed in order to increase efficiency (e.g.,
  using meta-rules or looking for a unity path).
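The CF extensions can be made concrete with a little arithmetic. The Python sketch below is our illustration, not MYCIN's code; the minimum rule for conjunctions, the attenuation of a rule's CF by its premise CF, the 0.2 premise threshold, and the combining function for concurring positive evidence follow the CF model described in Part Four.

```python
# Illustrative CF arithmetic for MYCIN-style extended modus ponens.
def premise_cf(clause_cfs):
    """$AND: a premise is only as certain as its weakest clause."""
    return min(clause_cfs)

def apply_rule(rule_cf, clause_cfs):
    """Modus ponens with CFs: attenuate the rule's CF by the premise CF."""
    p = premise_cf(clause_cfs)
    if p <= 0.2:                 # premise not considered sufficiently true
        return 0.0
    return round(rule_cf * p, 3)

def combine(x, y):
    """Two positive lines of evidence for the same fact reinforce."""
    return round(x + y * (1 - x), 3)

cf1 = apply_rule(0.6, [1.0, 0.8, 0.9])   # -> 0.48
cf2 = apply_rule(0.4, [1.0, 1.0])        # -> 0.4
print(combine(cf1, cf2))                 # -> 0.688
```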

MYCIN's logic is incomplete in the sense that we know there are
propositions that can be expressed in the language but are not provable
as theorems. MYCIN's logic is not inconsistent in itself (we believe),
but it is not immune to inconsistencies introduced into its knowledge
base.

3.4 Overview of Part Two

The remainder of this part consists of three papers that summarize MYCIN
and its use of production rules. In order to orient the reader to
MYCIN's overall motivation and design, we first include as Chapter 4 an
introductory paper that provides an overview of the system as of 1978
(approximately the time when development of the medical knowledge base
stopped). Chapter 5 is the original detailed description of MYCIN from
1975. It provides technical information on the system's representation
and control mechanisms. Chapter 6 is a brief paper from 1977 that
discusses the way in which production rules were adapted to deal with
the algorithmic knowledge regarding therapy selection.
4
The Structure of the MYCIN System

William van Melle

A number of constraints influenced the design of the MYCIN system. In
order to be useful, the system had to be easy to use and had to provide
consistently reliable advice. It needed to be able to accommodate the
large body of task-specific knowledge required for high performance, a
knowledge base that is subject to change over time. The system also had
to be able to use inexact or incomplete information. This applies not
only to the absence of definitive laboratory data, but also to the
medical domain itself (which is characterized by much judgmental
knowledge). Finally, to be a useful interactive system, MYCIN needed to
be capable of supplying explanations for its decisions and responding to
physicians' questions, rather than simply printing orders.
The MYCIN system comprises three major subprograms, as depicted in
Figure 4-1. The Consultation Program is the core of the system; it
interacts with the physician to obtain information about the patient,
generating diagnoses and therapy recommendations. The Explanation
Program provides explanations and justifications for the program's
actions. The Knowledge-Acquisition Program is used by experts to update
the system's knowledge base.

This chapter is based on an article originally appearing under the title
"MYCIN: A knowledge-based consultation program for infectious disease
diagnosis" in International Journal of Man-Machine Studies 10:313-322
(1978). Copyright 1979 by Academic Press Inc. (London) Limited. Used
with permission.


[Figure 4-1 block diagram. Labels: Physician User; Explanation Program;
Dynamic Patient Data; Static Factual and Judgmental Knowledge; Knowledge
Acquisition Program; Infectious Disease Expert.]

FIGURE 4-1 Organization of the MYCIN system. Arrows denote information
flow between modules, knowledge base, and users.

4.1 The Consultation Program

Excerpts from a sample consultation are shown in Figure 4-2. (A complete
annotated typescript is shown in the Appendix.) The doctor's responses
are in upper-case letters following the double-asterisk prompt.
Considerable effort has gone into the human engineering aspects of the
system, to make the interaction as smooth as possible. If the user finds
a question unclear, he or she can request a rephrasing with a list of
sample responses. A large dictionary of synonyms allows the user great
latitude in choice of responses, and simple typing or spelling errors
are automatically corrected.

4.1.1 Knowledge Organization

Information about the patient and conclusions drawn during the
consultation are represented as associative (object-attribute-value)
triples. The

1) Patient's name: (first-last)
** FRED BRAUN
2) Sex: (Questions 1-3 request background patient data.)
** M
3) Age:
** 55
4) Are there any cultures for Fred Braun which may be related to the
   present illness, and from which organisms have been grown
   successfully in the microbiology laboratory?
** Y
-------- CULTURE-1 --------
5) From what site was the specimen for CULTURE-1 taken?
** BLOD
= BLOOD (System corrects typing error.)
6) Please give the date and time when CULTURE-1 was obtained.
   (mo/da/yr time)
** JUN 20, 1977
The first organism isolated from the blood culture of 20-JUN-77
(CULTURE-1) will be referred to as:
-------- ORGANISM-1 --------
7) Enter the laboratory-reported identity of ORGANISM-1:
** UNKNOWN (Typically identity is not yet known.)
8) The stain (gram or Ziehl-Neelsen acid-fast) of ORGANISM-1:
** NEG (Preliminary lab results give some clues.)
9) Is ORGANISM-1 rod or coccus (etc.):
** ROD
10) What is the form of the individual organisms (e.g. lancet-shaped
    for cocci, fusiform for rods, etc.)?
** FUSIFORM
(... more questions follow in order to gather sufficient information to
infer the identity and significance of the infecting organisms ...)
23) What is the suspected portal of entry of the gramneg rod from the
    blood culture of 20-JUN-77 (ORGANISM-1) into this sterile site?
** GI (6) (User indicates the uncertainty of his response.)
24) Is Fred Braun a burn patient?
** N
25) Is Fred Braun a compromised host (e.g. alcoholic,
    sickle-cell-disease, immunosuppressed)?
** N
26) Is Fred Braun febrile?
** Y

(... after about 50-60 questions, MYCIN prints the diagnostic hypotheses
on which therapy will be based ...)
My therapy recommendations will be designed to treat for organisms that
are either very likely or, although less likely, would have a
significant effect on therapy selection if they were present. It is
important to cover for the following probable infection(s) and
associated organism(s):
INFECTION-1 is BACTEREMIA
<ITEM-1> E.COLI [ORGANISM-1]
<ITEM-2> KLEBSIELLA [ORGANISM-1]
<ITEM-3> ENTEROBACTER [ORGANISM-1]
<ITEM-4> KLEBSIELLA-PNEUMONIAE [ORGANISM-1]

FIGURE 4-2 Excerpts from a MYCIN consultation. (Comments in italics are
not part of the actual interaction.)

(... questions follow to evaluate possible therapy choices, and finally
MYCIN prints its therapy recommendations ...)
[REC-1] My preferred therapy recommendation is as follows:
In order to cover for items <1 2 3 4>:
Give: GENTAMICIN (One drug covers 4 possible identities.)
Dose: 119 mg (6.0 ml) q8h IV for 10 days [calculated on basis of
1.7 mg/kg]
Comments: Modify dose in renal failure.

FIGURE 4-2 continued

objects, known as contexts in MYCIN, are such things as individual
cultures taken from the patient, organisms that grew out of them, and
drugs the patient is currently receiving. Various attributes, termed
clinical parameters, characterize these objects. Questions asked during
the consultation attempt to fill in the values for relevant attributes
of these objects. To represent the uncertainty of data or competing
hypotheses, attached to each triple is a certainty factor (CF), a number
between -1 and 1 indicating the strength of the belief in (or a measure
of the importance of) that fact. A CF of 1 represents total certainty of
the truth of the fact, while a CF of -1 represents certainty regarding
the negation of the fact. While certainty factors are not conditional
probabilities, they are informally based on probability theory (see Part
Four). Some triples (with CFs) from a typical consultation might be as
follows:

(IDENTITY ORGANISM-1 PSEUDOMONAS 0.8)
(IDENTITY ORGANISM-1 E.COLI 0.15)
(SITE CULTURE-2 THROAT 1.0)
(BURNED PATIENT-298 YES -1.0)

Here ORGANISM-1 is probably Pseudomonas, but there is some evidence to
believe it is E. coli; the site of CULTURE-2 is (without doubt) the
throat; and PATIENT-298 is known not to be a burn patient.
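This associative-triple data base can be mimicked in a few lines. The Python sketch below is our illustration (the class and method names are invented, not MYCIN's internals): it stores the example triples above and retrieves the competing hypotheses for one attribute of one object.

```python
# A minimal stand-in for MYCIN's dynamic data base of associative
# triples with certainty factors (names are ours, not MYCIN's).
class TripleStore:
    def __init__(self):
        self.facts = []               # (attribute, object, value, cf)

    def conclude(self, attr, obj, value, cf):
        self.facts.append((attr, obj, value, cf))

    def values(self, attr, obj):
        """All competing hypotheses for one attribute of one object."""
        return {(v, cf) for a, o, v, cf in self.facts
                if a == attr and o == obj}

db = TripleStore()
db.conclude("IDENTITY", "ORGANISM-1", "PSEUDOMONAS", 0.8)
db.conclude("IDENTITY", "ORGANISM-1", "E.COLI", 0.15)
db.conclude("SITE", "CULTURE-2", "THROAT", 1.0)
db.conclude("BURNED", "PATIENT-298", "YES", -1.0)

print(sorted(db.values("IDENTITY", "ORGANISM-1")))
```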

4.1.2 Production Rules

MYCIN reasons about its domain using judgmental knowledge encoded as
production rules. Each rule has a premise, which is a conjunction of
predicates regarding triples in the knowledge base. If the premise is
true, the conclusion in the action part of the rule is drawn. If the
premise is known with less than certainty, the strength of the
conclusion is modified accordingly.

A typical rule is shown in Figure 4-3. The predicates (such as SAME) are
simple LISP functions operating on associative triples, which match the
declared facts in the premise clause of the rule against the dynamic
data known so far about the patient. $AND, the multi-valued analogue of

RULE036
PREMISE: ($AND (SAME CNTXT GRAM GRAMNEG)
               (SAME CNTXT MORPH ROD)
               (SAME CNTXT AIR ANAEROBIC))
ACTION: (CONCLUDE CNTXT IDENTITY BACTEROIDES TALLY .6)

IF: 1) The gram stain of the organism is gramneg, and
    2) The morphology of the organism is rod, and
    3) The aerobicity of the organism is anaerobic
THEN: There is suggestive evidence (.6) that the identity of the
      organism is bacteroides

FIGURE 4-3 A MYCIN rule, in both its internal (LISP) form and English
translation. The term CNTXT appearing in every clause is a variable in
MYCIN that is bound to the current context, in this case a specific
organism (ORGANISM-2), to which the rule may be applied.

the Boolean AND function, performs a minimization operation on CFs.

The body of the rule is actually an executable piece of LISP code, and
"evaluating" a rule entails little more than the LISP function EVAL.
However, the highly stylized nature of the rules permits the system to
examine and manipulate them, enabling many of the system's capabilities
discussed below. One of these is the ability to produce an English
translation of the LISP rule, as shown in the example. This is possible
because each of the predicate functions has associated with it a
translation pattern indicating the logical roles of the function's
arguments.

It is intended that each rule be a single, modular chunk of medical
knowledge. The number of rules in the MYCIN system grew to about 500.

4.1.3 Application of Rules—The Rule Interpreter

The control structure is a goal-directed backward chaining of rules. At
any given time, MYCIN is working to establish the value of some clinical
parameter. To this end, the system retrieves the (precomputed) list of
rules whose conclusions bear on this goal. The rule in Figure 4-3, for
example, would be retrieved in the attempt to establish the identity of
an organism. If, in the course of evaluating the premise of one of these
rules, some other piece of information that is not yet known is needed,
MYCIN sets up a subgoal to find out that information; this in turn
causes other rules to be tried. Questions are asked during the
consultation when rules fail to deduce the necessary information. If the
user cannot supply the requested information, the rule is simply
ignored. This control structure results in a highly focused search
through the rule base.
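This control structure can be sketched in miniature. The Python below is an illustration, not MYCIN's interpreter (the rule encoding and the canned user answers are invented): deducing a goal parameter recursively spawns subgoals, and a question is "asked" only when no rule concludes the needed parameter.

```python
# Goal-directed backward chaining in miniature (illustrative only).
RULES = [
    # (premise clauses as (param, value) pairs, concluded param, value, cf)
    ((("gram", "gramneg"), ("morph", "rod"), ("air", "anaerobic")),
     "identity", "bacteroides", 0.6),
]
ANSWERS = {"gram": "gramneg", "morph": "rod", "air": "anaerobic"}
asked = []                      # questions put to the "user", in order

def findout(param, known):
    if param in known:
        return known[param]
    for premise, concl, value, cf in RULES:
        if concl == param and all(findout(p, known) == v for p, v in premise):
            known[param] = (value, cf)
            return known[param]
    asked.append(param)         # no rule concluded it: ask the user
    known[param] = ANSWERS[param]
    return known[param]

known = {}
print(findout("identity", known))   # -> ('bacteroides', 0.6)
print(asked)                        # -> ['gram', 'morph', 'air']
```

Note how the order of questions falls directly out of the order of premise clauses, which is one source of the focused, topic-by-topic dialogue described above.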

4.1.4 Advantages of the Rule Methodology

The modularity of rules simplifies the task of updating the knowledge
base. Individual rules can be added, deleted, or modified without
drastically affecting the overall performance of the system. And because
each rule is a coherent chunk of knowledge, it is a convenient unit for
explanation purposes. For example, to explain why the system is asking a
question during the consultation, a first approximation is simply to
display the rule currently under consideration.

The stylized nature of the rules is useful for many operations. While
the syntax of the rules permits the use of any LISP function, there is a
small set of standard predicates that make up the vast majority of the
rules. The system contains information about the use of these predicates
in the form of function templates. For example, the predicate SAME is
described as follows:

function template: (SAME CNTXT PARM VALUE)

sample function call: (SAME CNTXT SITE BLOOD)

The system can use these templates to "read" its own rules. For example,
the template shown here contains the standard tokens CNTXT, PARM, and
VALUE (for context, parameter, and corresponding value), indicating the
components of the associative triple that SAME tests. If the clause
above appears in the premise of a given rule, the system can determine
that the rule needs to know the site of the culture, and that the rule
can only succeed if that site is, in fact, blood. When asked to display
rules that are relevant to blood cultures, MYCIN will be able to choose
that rule.
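Template-driven reading can be sketched as follows (illustrative Python; the tuple encoding of clauses and the index structure are ours, not MYCIN's internal form): a template maps slot names to argument positions, so the system can pull the parameter out of any clause and precompute which rules conclude about which parameter.

```python
# Template-driven "reading" of rules (illustrative encoding).
TEMPLATES = {"SAME": ("CNTXT", "PARM", "VALUE"),
             "CONCLUDE": ("CNTXT", "PARM", "VALUE", "TALLY", "CF")}

def slot(clause, name):
    """Pull the argument filling a named template slot of a clause."""
    func, *args = clause
    return args[TEMPLATES[func].index(name)]

rule036 = {
    "premise": [("SAME", "CNTXT", "SITE", "BLOOD")],
    "action": ("CONCLUDE", "CNTXT", "IDENTITY", "BACTEROIDES", "TALLY", 0.6),
}

def index_rules(rules):
    """Precompute, per parameter, which rules conclude about it."""
    index = {}
    for name, rule in rules.items():
        index.setdefault(slot(rule["action"], "PARM"), []).append(name)
    return index

print(slot(rule036["premise"][0], "PARM"))   # -> SITE
print(index_rules({"RULE036": rule036}))     # -> {'IDENTITY': ['RULE036']}
```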
An important function of the templates is to permit MYCIN to precompute
automatically (at system generation time) the set of rules that conclude
about a particular parameter; it is this set that the rule monitor
retrieves when the system needs to deduce the value of that parameter.

The system can also read rules to eliminate obviously inappropriate
ones. It is often the case that, of a large set of rules under
consideration, several are provably false by information already known.
That is, the information needed to evaluate one of the clauses in the
premise has already been determined, and that clause is false, thereby
making the entire premise false. By reading the rules before actually
invoking them, many can be immediately discarded, thereby avoiding the
deductive work necessary in evaluating the premise clauses that precede
the false one (this is called the preview mechanism). In some cases this
means the system avoids the useless search of one or more subgoal trees,
when the information thereby deduced would simply be overridden by the
demonstrably false premise.
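The preview mechanism reduces to a single scan over the premise. A minimal Python illustration (the rule encoding is invented): a rule is discarded as soon as any clause whose parameter is already known is found to be false, before any clause is actually evaluated.

```python
# The preview mechanism in miniature (illustrative encoding).
def preview(premise, known):
    """False if some already-answered clause disproves the rule."""
    for param, required in premise:
        if param in known and known[param] != required:
            return False
    return True

known = {"morph": "rod"}
rule_a = [("gram", "gramneg"), ("morph", "coccus")]   # clause 2 provably false
rule_b = [("gram", "gramneg"), ("morph", "rod")]
candidates = [r for r in (rule_a, rule_b) if preview(r, known)]
print(len(candidates))   # -> 1: rule_a is discarded without any subgoaling
```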
Another more dramatic case occurs when it is possible, on the basis of
information currently available, to deduce with certainty the value of
some parameter that is needed by a rule. This is the case when there
exists a chain of one or more rules whose premises are known (or
provable, as above) with certainty and that ultimately conclude the
desired value with certainty. Since each rule in this chain must have a
certainty factor of 1.0, we term such a chain a unity path; and since a
value known with certainty excludes all other potential values, no other
rules need be tried. MYCIN always seeks a unity path before trying a set
of rules or asking a question; typically, this means "commonsense"
deductions are made directly, without asking the user "silly" questions
or blindly invoking all the rules pertaining to the goal. Since there
are usually few rules on any potential unity path, the search tends to
be small.
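A unity-path search can be sketched as a restricted recursion that follows only CF = 1.0 rules (illustrative Python; the two invented rules loosely echo the sterile-source reasoning in the surrounding figures).

```python
# Unity-path search in miniature (illustrative only).
RULES = [
    # (premise (param, value) pairs, concluded (param, value), rule CF)
    ([("site", "blood")], ("sterile-source", "yes"), 1.0),
    ([("sterile-source", "yes")], ("significant", "yes"), 0.9),
]

def unity_path(goal, known):
    """Value for goal reachable through CF = 1.0 rules only, else None."""
    if goal in known:
        return known[goal]
    for premise, (param, value), cf in RULES:
        if param == goal and cf == 1.0 and \
           all(unity_path(p, known) == v for p, v in premise):
            return value
    return None

print(unity_path("sterile-source", {"site": "blood"}))  # -> yes: certain,
                                                        #    no question asked
print(unity_path("significant", {"site": "blood"}))     # -> None: only a
                                                        #    CF = 0.9 rule exists
```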
The ability to read rules opens the way to the writing of rules that
manipulate other rules. We term such rules meta-rules (see Part Nine);
they are used to make deductions not about the medical entities of the
domain but about strategies to be used by the system. Whenever the rule
interpreter is about to invoke a list of rules to establish some goal,
it first applies any meta-rules associated with that goal. These
meta-rules can reorder or prune the rule list to make the search more
suitable for the given case.
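In miniature, a meta-rule is just a function from a rule list (plus case facts) to a reordered or pruned rule list. An illustrative Python sketch (the rule names are borrowed from Figure 4-4; the "tags" and the ordering criterion are our inventions):

```python
# A meta-rule in miniature (illustrative): reorder the rule list for a
# goal before the interpreter invokes it.
def metarule_prefer_blood(rule_list, case):
    """If the culture site is blood, try blood-specific rules first."""
    if case.get("site") != "blood":
        return rule_list
    return sorted(rule_list, key=lambda r: "blood" not in r["tags"])

rules = [{"name": "RULE163", "tags": []},
         {"name": "RULE156", "tags": ["blood"]},
         {"name": "RULE190", "tags": ["blood"]}]

ordered = metarule_prefer_blood(rules, {"site": "blood"})
print([r["name"] for r in ordered])   # -> ['RULE156', 'RULE190', 'RULE163']
```

Because Python's sort is stable, rules within each preference class keep their original relative order, which mirrors a meta-rule that reorders without otherwise disturbing the list.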

4.2 Explanation Capability

A major subprogram of MYCIN is a general Question-Answering (QA) Module,
which answers simple English-language questions concerning the system's
decisions in a particular consultation or about the system's general
knowledge. A limited set of commonly desired explanations is also
provided in a command style during the consultation by the Reasoning
Status Checker. This special module explains the system's current line
of reasoning and motivation for the questions asked. (A more detailed
study of explanation in MYCIN is included in Part Six of this book.) The
QA Module is entered automatically at the end of each consultation and
is also available to the user during the consultation if the need
arises. The user is encouraged to ask questions about any aspect of
MYCIN's performance.

Because of the relatively unambiguous vocabulary used in the medical
domain, and because of our desire to minimize the delay in answering
questions, we have forgone any complicated parsing techniques in the QA
Module. Rather, we use a simple pattern-matching approach, relying on
the strong implications of certain keywords in the user's questions. The
words in the translation of a clinical parameter, and values of that
parameter, are tagged in the system dictionary as implying that
parameter. By suitably weighing the implications of each word in the
sentence, the system determines one or more parameter/value pairs being
asked about. Pattern matching is used to determine the general type of
the question, which indicates what is being asked about those
parameter/value pairs. Typical

** IS ORGANISM-1 A STREPTOCOCCUS?
No. The identity of ORGANISM-1 is e.coli (.74) bacteroides (.7)
klebsiella-pneumoniae (.4) pseudomonas-aeruginosa (.22) enterobacter
(.21) proteus-non-mirabilis (.19) serratia (.03) salmonella
** WHY DID YOU RULE OUT STREPTOCOCCUS AS A POSSIBILITY FOR ORGANISM-1?
RULE033 could have been used to determine that the identity of
ORGANISM-1 is streptococcus. However, clause 2 of RULE033 ["the
morphology of the organism is coccus"] was already known to be false
for ORGANISM-1, so the rule was never tried.
** HOW DO YOU KNOW THAT CULTURE-1 WAS FROM A STERILE SOURCE?
I used RULE189 to conclude that this blood culture was taken from a
sterile source. This gave a cumulative CF of (1.0). The last question
asked before the conclusion was made was 7.
** WHAT DOES GU MANIPULATION TELL YOU ABOUT THE IDENTITY OF AN ORGANISM?
The rules listed below use:
    whether the patient has had a genito-urinary manipulative procedure
to conclude about:
    the identity of the organism
156, 163, 190
Which do you wish to see?
** 156
RULE156
IF: 1) The site of the culture is blood, and
    2) The gram stain of the organism is gramneg, and
    3) The morphology of the organism is rod, and
    4) The portal of entry of the organism is urine, and
    5) The patient has not had a genito-urinary manipulative procedure,
       and
    6) Cystitis is not a problem for which the patient has been treated
THEN: There is suggestive evidence (.6) that the identity of the
      organism is e.coli
** IS BLOOD A STERILE SITE?
Yes. Blood is one of those sites that is normally sterile.
** WHAT ARE THE NONSTERILE SITES?
Those sites that are normally nonsterile are: CERVIX, CUTANEOUS-ULCER,
LOCHIA, NOSE, SKIN, STOOL, THROAT, URETHRA, VAGINA.
** IS BACTEROIDES AEROBIC?
No, the aerobicity of bacteroides is anaerobic.

FIGURE 4-4 Interaction with the QA Module. The user's questions follow
the double-asterisk prompt.

questions (see Figure 4-4) seek the value of a certain parameter, how
this value was concluded, and how this parameter was used. Questions
about the value of a given parameter of an object are answered by simply
displaying the current hypothesis regarding it. To explain how the value
of a parameter was concluded, MYCIN retrieves the list of rules that
were successfully applied and prints them, along with the conclusions
drawn. Alternatively, if the user supplied the value by answering a
previous question, this is noted. More general questions about how
information is used or concluded are answered by retrieving the relevant
rules from the rule base.

34) Was penicillinase added to this blood culture (CULTURE-1)?
** WHY
[i.e. WHY is it important to determine whether penicillinase was added
to CULTURE-1?]
[3.0] This will aid in determining whether ORGANISM-1 is a contaminant.
It has already been established that
    [3.1] the site of CULTURE-1 is blood, and
    [3.2] the gram stain of ORGANISM-1 is grampos
Therefore, if
    [3.3] penicillinase was added to this blood culture
then
    there is weakly suggestive evidence (.4) that ORGANISM-1 is a
    contaminant [RULE039]
** WHY
[i.e. WHY is it important to determine whether ORGANISM-1 is a
contaminant?]
[4.0] ... In order to determine whether there is significant disease
associated with this occurrence of ORGANISM-1.
It has already been established that
    [4.1] this blood culture was taken from a sterile source
Therefore, if
    [4.2] ORGANISM-1 is not a contaminant
then
    there is strongly suggestive evidence (.9) that there is
    significant disease associated with this occurrence of ORGANISM-1
    [RULE032]
** WHY
[i.e. WHY is it important to determine whether there is significant
disease associated with this occurrence of ORGANISM-1?]
[5.0] ... In order to determine the organisms which require therapy.
It has already been established that
    [5.1] the infection is known, and
    [5.2] the identity of ORGANISM-1 is known
Therefore, if
    [5.3] there is significant disease associated with this occurrence
    of ORGANISM-1
then
    it is definite (1.0) that ORGANISM-1 is one of the organisms which
    require therapy [RULE090]
** HOW 5.2
[i.e. HOW was it established that the identity of ORGANISM-1 is known?]
Question 9 was asked, but you weren't sure; so the following was used:
[5.5] RULE084 was used to conclude that the identity of ORGANISM-1 is
streptococcus-group-a (.29).

FIGURE 4-5 Use of the Reasoning Status Checker during the consultation
to explain MYCIN's line of reasoning.

As shown in Figure 4-5, the Reasoning Status Checker is invoked by the
HOW and WHY commands. At any time during the consultation, when the user
is asked a question, he or she can delay answering it and instead ask
why the question was asked. Since questions are asked in order to
establish the truth of the premise of some rule, a simple answer to WHY
is "because I'm trying to apply the following rule." Successive WHY
questions unwind the chain of subgoals, citing the rules that led to the
current rule being tried.
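In outline, successive WHYs just walk back up the interpreter's goal stack. An illustrative Python sketch (the goal names follow Figure 4-5; the stack representation and phrasing are ours):

```python
# Successive WHY answers as a walk back up the goal stack (illustrative).
goal_stack = [              # outermost goal first, innermost goal last
    ("the organisms which require therapy", "RULE090"),
    ("significant disease", "RULE032"),
    ("whether ORGANISM-1 is a contaminant", "RULE039"),
]

def why(stack):
    """Answer one WHY: cite the current rule and the goal one level up."""
    goal, rule = stack.pop()
    if stack:
        return f"to apply {rule}, in order to determine {stack[-1][0]}"
    return f"to apply {rule}"

print(why(goal_stack))   # cites RULE039 and points one level up
print(why(goal_stack))   # the next WHY cites RULE032, and so on
```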
Besides examining the current line of reasoning, the user can also ask
about previous decisions, or about how future decisions might be made,
by giving the HOW command. Explaining how the truth of a certain clause
was established is accomplished as described above for the general QA
Module. To explain how a presently unknown clause might be established,
MYCIN retrieves the set of rules that the rule interpreter would select
to establish that clause and selects the relevant rules from among them
by "reading" the premises for applicability and the conclusions for
relevance to the goal.

4.3 Knowledge Acquisition

The knowledge base is expanded and improved by acquiring new rules, or
modifications to old rules, from experts. Ordinarily, this process
involves having the medical expert supply a piece of medical knowledge
in English, which a system programmer converts into the intended LISP
rule. This mode of operation is suitable when the expert and the skilled
programmer can work together. Ideally, however, the expert should be
able to convey his or her knowledge directly to the system.

Work has been undertaken (see Part Three) to allow experts to update the
rule base directly. A rule-acquisition routine parses an
English-language rule by methods similar to those used in parsing
questions in the QA Module. Each clause is broken down into one or more
object-attribute-value triples, which are fitted into the slots of the
appropriate predicate function template. This process is further guided
by rule models (see Chapter 28), which supply expectations about the
structure of rules and the interrelationships of the clinical
parameters.
One mode of acquisition that has received special attention is acquiring
new rules in the context of an error. In this case, the user is trying
to correct a localized deficiency in the rule base; if a new rule is to
correct the program's faulty behavior, it must at the very least apply
to the consultation at hand. In particular, each of the premises must
evaluate to TRUE for the given case. These expectations greatly simplify
the task of the acquisition program, and also aid the expert in
formulating new rules.

One difficult aspect of rule acquisition is the actual formulation of
medical knowledge into decision rules. Our desire to keep the rule
format

simple is occasionally at odds with the need to encode the many aspects
of medical decision making. The backward chaining of rules by the
deductive system is also often a stumbling block for experts who are new
to the system. However, they soon learn to structure their knowledge
appropriately. In fact, some experts have felt that encoding their
knowledge into rules has helped them formalize their own view of the
domain, leading to greater consistency in their decisions.
5
Details of the Consultation System

Edward H. Shortliffe

In this chapter MYCIN's implementation is presented in considerable
detail. Our goals are to explain the data and control structures used by
the program and to describe some of the complex and often unexpected
problems that arose during system implementation. In Chapter 1 the
motivations behind many of MYCIN's capabilities were mentioned. The
reader is encouraged to bear those design criteria in mind throughout
this chapter.

This chapter specifically describes the Consultation System. This
subprogram uses both system knowledge from the corpus of rules and
patient data entered by the physician to generate advice for the user.
Furthermore, the program maintains a dynamic data base, which provides
an ongoing record of the current consultation. As a result, this chapter
must discuss both the nature of the various data structures and how they
are used or maintained by the Consultation System.
Section 5.1 describes the corpus of rules and the associated data
structures. It provides a formal description of the rules used by MYCIN.
Our quantitative truth model is briefly introduced, and the mechanism
for rule evaluation is explained. This section also describes the
clinical parameters with which MYCIN is familiar and which form the
basis for the conditional expressions in the premise of a rule.

In Section 5.2 MYCIN's goal-oriented control structure is described.
Mechanisms for rule invocation and question selection are explained at
that time. The section also discusses the creation of the dynamic data
base, which is the foundation for both the system's advice and its
explanation capabilities (to be described in Part Six).

Section 5.3 is devoted to an explanation of the program's context tree,
i.e., the network of interrelated organisms, drugs, and cultures that
characterize the patient and his or her current clinical condition. The
need for such a data structure is clarified, and the method for
propagation (growth) of the tree is described.

The final tasks in MYCIN's clinical problem area are the identification
of potentially useful drugs and the selection of the best drug or drugs
from that list. MYCIN's early mechanism for making these decisions is
discussed in Section 5.4 of this chapter. Later refinements are the
subject of Chapter 6.

Section 5.5 discusses MYCIN's mechanisms for storing patient data and
for permitting a user to change the answer to a question. As will be
described, these two capabilities are closely interrelated.

In Section 5.6 we briefly mention extensions to the system that were
contemplated when this material was written in 1975. Several of these
capabilities were eventually implemented.

This chapter is condensed from Chapter 3 of Computer-Based Medical
Consultations: MYCIN, New York: Elsevier/North-Holland, 1976. Copyright
1976 by Elsevier/North-Holland. All rights reserved. Used with
permission.

5.1 System Knowledge

5.1.1 Decision Rules

Automated problem-solving systems use criteria for drawing conclusions
that often support a direct analogy to the rule-based knowledge
representation used by MYCIN. Consider, for example, the conditional
probabilities that underlie Bayesian diagnosis programs. Each
probability provides information that may be stated in an explicit rule
format:

P(h|e) = X means
IF: e is known to be true
THEN: conclude that h is true with probability X

It is important to note, therefore, that the concept of rule-based
knowledge is not unique, even for medical decision-making programs.

Representation of the Rules

The 200 rules in the original MYCIN system consisted of a premise, an
action, and sometimes an else clause. Else clauses were later deleted
from the system because they were seldom used, and a general
representation

of inference statements could be achieved without them. Every rule has a
name of the form RULE### where ### represents a three-digit number.
The details of rules and how they are used are discussed throughout
the remainder of this chapter. We therefore offer a formal definition of
rules, which will serve in part as a guide for what is to follow. The rules
are stored as LISP data structures in accordance with the following Backus-
Naur Form (BNF) description:

<rule> ::= <premise> <action> | <premise> <action> <else>
<premise> ::= ($AND <condition> ... <condition>)
<condition> ::= (<func1> <context> <parameter>) |
                (<func2> <context> <parameter> <value>) |
                (<special-func> <arguments>) |
                ($OR <condition> ... <condition>)
<action> ::= <concpart>
<else> ::= <concpart>
<concpart> ::= <conclusion> | <actfunc> |
               (DO-ALL <conclusion> ... <conclusion>) |
               (DO-ALL <actfunc> ... <actfunc>)
<context> ::= see Section 5.1.2
<parameter> ::= see Section 5.1.3
<value> ::= see Section 5.1.4
<func1> ::= see Section 5.1.5
<func2> ::= see Section 5.1.5
<special-func> ::= see Section 5.1.6
<arguments> ::= see Section 5.1.6
<conclusion> ::= see Section 5.2.3
<actfunc> ::= see Section 5.4

Thus the premise of a rule consists of a conjunction of conditions, each of
which must hold for the indicated action to be taken. Negations of con-
ditions are handled by individual predicates (<func1> and <func2>) and
therefore do not require a $NOT function to complement the Boolean
functions $AND and $OR. If the premise of a rule is known to be false,
the conclusion or action indicated by the else clause is taken. If the truth

of the premise cannot be ascertained, or the premise is false but no else
condition exists, the rule is simply ignored.
The premise of a rule is always a conjunction of one or more condi-
tions. Disjunctions of conditions may be represented as multiple rules with
identical action clauses. A condition, however, may itself be a disjunction
of conditions. These conventions are somewhat arbitrary but do provide
sufficient flexibility so that any Boolean expression may be represented by
one or more rules. As is discussed in Section 5.2, multiple rules are effec-
tively ORed together by MYCIN's control structure.
For example, two-leveled Boolean nestings of conditions are acceptable
as follows:

Legal:
[1] A & B & C → D
[2] A & (B or C) → D
[3] (A or B or C) & (D or E) → F

Illegal:
[4] A or B or C → D
[5] A & (B or (C & D)) → E

Rule [4] is correctly represented by the following three rules:

[6] A → D
[7] B → D
[8] C → D

whereas [5] must be written as:

[9] A & C & D → E
[10] A & B → E
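The splitting convention above is easy to mechanize. The following sketch (illustrative Python, not MYCIN's LISP code) splits a top-level disjunctive premise into several conjunctive rules with identical actions, as in rules [6]-[8]:

```python
def split_rule(premise, action):
    """Split a top-level disjunction into one rule per disjunct.

    `premise` is ('or', c1, c2, ...) or ('and', c1, c2, ...); each ci is
    a condition name or a nested ('or', ...) disjunction -- the one
    level of nesting permitted inside a conjunction."""
    if premise[0] == 'or':
        # One conjunctive rule per disjunct, all with the same action.
        return [(('and', c), action) for c in premise[1:]]
    return [(premise, action)]

# Rule [4], "A or B or C -> D", becomes rules [6], [7], and [8].
rules = split_rule(('or', 'A', 'B', 'C'), 'D')
```

A premise like [5], with a conjunction nested inside a disjunction, would first have to be distributed into the legal two-level form by the rule author.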

Unlike rules that involve strict implication, MYCIN's rules allow the
strength of an inference to be modified by a certainty factor (CF). A CF is
a number from -1 to +1, the nature of which is described in Section
5.1.4 and in Chapter 11.
The following three examples are rules from MYCIN that have been
translated into English from their internal LISP representation (Section
5.1.7). They represent the range of rule types available to the system. The
details of their internal representation will be explained as we proceed.

RULE037
IF: 1) The identity of the organism is not known with certainty, and
    2) The stain of the organism is gramneg, and
    3) The morphology of the organism is rod, and
    4) The aerobicity of the organism is aerobic
THEN: There is strongly suggestive evidence (.8) that the class of the
      organism is enterobacteriaceae

RULE145
IF: 1) The therapy under consideration is one of: cephalothin
       clindamycin erythromycin lincomycin vancomycin, and
    2) Meningitis is an infectious disease diagnosis for the patient
THEN: It is definite (1) that the therapy under consideration is not a
      potential therapy for use against the organism

RULE060
IF: The identity of the organism is bacteroides
THEN: I recommend therapy chosen from among the following drugs:
    1 - clindamycin (.99)
    2 - chloramphenicol (.99)
    3 - erythromycin (.57)
    4 - tetracycline (.28)
    5 - carbenicillin (.27)

Before we can explain how rules such as these are invoked and eval-
uated, it is necessary to describe further MYCIN's internal organization.
We shall therefore temporarily digress in order to lay some groundwork
for the description of the evaluation functions in Section 5.1.5.

5.1.2 Categorization of Rules by Context

The Context Tree

Although it is common to describe diagnosis as inference based on attri-
butes of the patient, MYCIN's decisions must necessarily involve not only
the patient but also the cultures that have been grown, organisms that have
been isolated, and drugs that have been administered. Each of these is
termed a context of the program's reasoning (see <context> in the BNF
description of rules).¹
MYCIN currently (1975) knows about ten different context-types:

¹The use of the word context should not be confused with its meaning in high-level languages
that permit temporary saving of all information regarding a program's current status--a
common mechanism for backtracking and parallel-processing implementations.

CURCULS     A current culture from which organisms were isolated
CURDRUGS    An antimicrobial agent currently being administered to a patient
CURORGS     An organism isolated from a current culture
OPDRGS      An antimicrobial agent administered to the patient during a
            recent operative procedure
OPERS       An operative procedure the patient has undergone
PERSON      The patient
POSSTHER    A therapy being considered for recommendation
PRIORCULS   A culture obtained in the past
PRIORDRGS   An antimicrobial agent administered to the patient in the past
PRIORORGS   An organism isolated from a prior culture

Except for PERSON, each of these context-types may be instantiated more
than once during any given run of the consultation program. Some may
not be created at all if they do not apply to the given patient. However,
each time a context-type is instantiated, it is given a unique name. For
example, CULTURE-1 is the first CURCUL and ORGANISM-1 is the first
CURORG. Subsequent CURCULS or PRIORCULS are called CULTURE-
2, CULTURE-3, etc.
The context-types instantiated during a run of the consultation pro-
gram are arranged hierarchically in a data structure termed the context tree.
One such tree is shown in Figure 5-1. The context-type for each instan-
tiated context is shown in parentheses near its name. Thus, to clarify ter-
minology, we note that a node in the context tree is called a context and is
created as an instantiation of a context-type. This sample context tree cor-
responds to a patient from whom two current cultures and one prior cul-
ture were obtained. One organism was isolated from each of the current
cultures, but the patient is being treated (with two drugs) for only one of
the current organisms. Furthermore, two organisms were grown from the
prior culture, but therapy was instituted to combat only one of these. Fi-
nally, the patient has had a recent operative procedure during which he
or she was treated with an antimicrobial agent.
The context tree is useful not only because it gives structure to the
clinical problem (Figure 5-1 already tells us a good deal about PATIENT-
1), but also because we often need to be able to relate one context to
another. For example, in considering the significance of ORGANISM-2,
MYCIN may well want to be able to reference the site of the culture from
which ORGANISM-2 was obtained. Since the patient has had three dif-
ferent cultures, we need an explicit mechanism for recognizing that OR-
GANISM-2 came from CULTURE-2, not from CULTURE-1 or CUL-
TURE-3. The technique for dynamic propagation (i.e., growth) of the
context tree during a consultation is described in Section 5.3.
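The parent links in such a tree are what make cross-context references possible. The following is a minimal Python sketch of the idea (the names and structure are assumptions for illustration, not MYCIN's internal LISP representation): each context-type keeps an instantiation counter, and an ancestor search lets a rule applied to an organism reach the culture it was isolated from.

```python
# Illustrative context-tree sketch (assumed structure, not MYCIN's).
class Context:
    _counts = {}

    def __init__(self, ctype, prefix, parent=None):
        n = Context._counts.get(prefix, 0) + 1
        Context._counts[prefix] = n
        self.name = f"{prefix}-{n}"   # e.g. CULTURE-2, ORGANISM-1
        self.ctype = ctype            # e.g. CURCULS, CURORGS
        self.parent = parent
        self.props = {}               # clinical parameters (Section 5.1.3)

    def ancestor(self, ctypes):
        """Nearest ancestor whose context-type is in the set ctypes."""
        node = self.parent
        while node is not None and node.ctype not in ctypes:
            node = node.parent
        return node

patient = Context("PERSON", "PATIENT")
culture1 = Context("CURCULS", "CULTURE", parent=patient)
organism1 = Context("CURORGS", "ORGANISM", parent=culture1)
```

A rule that mentions the site of a culture, while being applied to `organism1`, would look it up on `organism1.ancestor({"CURCULS", "PRIORCULS"})` rather than on the organism itself.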
FIGURE 5-1 A sample context tree.

Interrelationship of Rules and the Tree

The 200 rules currently used by MYCIN 2 are not explicitly linked in a
decision tree or reasoning network. This feature is in keeping with our
desire to keep system knowledge modular and manipulable. However, rules
are subject to categorization in accordance with the context-types for which
they are most appropriately invoked. For example, some rules deal with
organisms, somewith cultures, and still others deal solely with the patient.
MYCINscurrent rule categories are as follows (context-types to which they
may be applied are enclosed in parentheses):

CULRULES      Rules that may be applied to any culture
              (CURCULS or PRIORCULS)
CURCULRULES   Rules that may be applied only to current cultures
              (CURCULS)
CURORGRULES   Rules that may be applied only to current
              organisms (CURORGS)
DRGRULES      Rules that may be applied to any antimicrobial
              agent that has been administered to combat a
              specific organism (CURDRUGS or PRIORDRGS)
OPRULES       Rules that may be applied to operative procedures
              (OPERS)
ORDERRULES    Rules that are used to order the list of possible
              therapeutic recommendations (POSSTHER)
ORGRULES      Rules that may be applied to any organism
              (CURORGS or PRIORORGS)
PATRULES      Rules that may be applied to the patient (PERSON)
PDRGRULES     Rules that may be applied only to drugs given to
              combat prior organisms (PRIORDRGS)
PRCULRULES    Rules that may be applied only to prior cultures
              (PRIORCULS)
PRORGRULES    Rules that may be applied only to organisms
              isolated from prior cultures (PRIORORGS)
THERULES      Rules that store information regarding drugs of
              choice (Section 5.4.1)

Every rule in the MYCIN system belongs to one, and only one, of these
categories. Furthermore, selecting the proper category for a newly ac-
quired rule does not present a problem. In fact, category selection can be
automated to a large extent.
Consider a rule such as this:

²Ed. note: This number increased to almost 500 by 1978.



RULE124
IF: 1) The site of the culture is throat, and
    2) The identity of the organism is streptococcus
THEN: There is strongly suggestive evidence (.8) that the subtype of
      the organism is not group-D
This is one of MYCIN's ORGRULES and may thus be applied to either a
CURORGS context or a PRIORORGS context. Referring back to Figure
5-1, suppose RULE124 were applied to ORGANISM-2. The first condition
in the premise refers to the site of the culture from which ORGANISM-2
was isolated (i.e., CULTURE-2) and not to the organism itself (i.e., orga-
nisms do not have sites, but cultures do). The context tree is therefore
important for determining the proper context when a rule refers to an
attribute of a node in the tree other than the context to which the rule is
being explicitly applied. Note that this means that a single rule may refer
to nodes at several levels in the context tree. The rule is categorized simply
on the basis of the lowest context-type (in the tree) that it may reference.
Thus RULE124 is an ORGRULE rather than a CULRULE.

5.1.3 Clinical Parameters

This subsection describes the data types indicated by <parameter> and
<value> in the BNF description of rules. Although we have previously
asserted that all MYCIN's knowledge is stored in its corpus of rules, the
clinical parameters and their associated properties comprise an important
class of second-level knowledge. We shall first explain the kind of param-
eters used by the system and then describe their representation.
A clinical parameter is a characteristic of one of the contexts in the
context tree, i.e., the name of the patient, the site of a culture, the mor-
phology of an organism, the dose of a drug, etc. A patient's status would
be completely specified by a context tree in which values were known for
all the clinical parameters characterizing each node in the tree (assuming
the parameters known to MYCIN encompass all those that are clinically
relevant--a dubious assumption at present). In general, this is more in-
formation than is needed, however, and one of MYCIN's tasks is to identify
those clinical parameters that need to be considered for the patient about
whom advice is being sought.
The concept of an attribute-object-value triple is common within the
AI field. This associative relationship is a basic data type for the SAIL
language (Feldman et al., 1972) and is the foundation for the property-list
formalism in LISP (McCarthy et al., 1962). Relational predicates in pred-
icate calculus also represent associative triples. The point is that many facts
may be expressed as triples that state that some object has an attribute with
some specified value. Stated in the order <attribute object value>, ex-
amples include:
(COLOR BALL RED)
(OWNS FIREMAN RED-SUSPENDERS)
(AGE BOB 22)
(FATHER CHILD DADDY)
(GRAMSTAIN ORGANISM GRAM-POSITIVE)
(DOSE DRUG 1.5-GRAMS)
(MAN BOB TRUE)
(WOMAN BOB FALSE)

Note that the last two examples are different from the others in that they
represent a rather different kind of relationship. In fact, several authors
would classify the first six as "relations" and the last two as "predicates,"
using the simpler notation:

MAN(BOB)
¬WOMAN(BOB)

Regardless of whether it is written as MAN(BOB) or (MAN BOB TRUE),
this binary predicate statement has rather different characteristics from
the relations that form natural triples. This distinction will become clearer
later (see yes-no parameters below).
MYCIN stores inferences and data using the attribute-object-value
concept. The object is always some context in the context tree, and the
attribute is a clinical parameter appropriate for that context. Information
stored using this mechanism may be retrieved and updated in accordance
with a variety of conventions described throughout this chapter.
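The storage scheme can be pictured as a table keyed by (object, attribute) pairs. Here is a minimal Python sketch of the idea (an illustration only, not MYCIN's property-list implementation); the certainty factor attached to each value anticipates Section 5.1.4:

```python
# Attribute-object-value storage, sketched as a nested mapping:
# store[(object, attribute)] -> {value: CF}.
from collections import defaultdict

store = defaultdict(dict)

def conclude(context, parameter, value, cf):
    """Record a hypothesis about one clinical parameter of one context."""
    store[(context, parameter)][value] = cf

def val(context, parameter):
    """All current hypotheses for parameter P of context C."""
    return dict(store[(context, parameter)])

conclude("ORGANISM-1", "IDENT", "STREPTOCOCCUS", 0.6)
conclude("ORGANISM-1", "IDENT", "STAPHYLOCOCCUS", 0.4)
```

Because the key pairs an object with an attribute, the same parameter name (e.g., IDENT) can be tracked independently for every organism in the context tree.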

The Three Kinds of Clinical Parameters

There are three fundamentally different kinds of clinical parameters. The
simplest variety is single-valued parameters. These are attributes such as the
name of the patient and the identity of the organism. In general, they have
a large number of possible values that are mutually exclusive. As a result,
only one can be the true value, although several may seem likely at any
point during the consultation.
Multi-valued parameters also generally have a large number of possible
values. The difference is that the possible values need not be mutually
exclusive. Thus such attributes as a patient's drug allergies and a locus of
an infection may have multiple values, each of which is known to be correct.
The third kind of clinical parameter corresponds to the binary pred-
icate discussed above. These are attributes that are either true or false for
the given context. For example, the significance of an organism is either
true or false (yes or no), as is the parameter indicating whether the dose
of a drug is adequate. Attributes of this variety are called yes-no parameters.
They are, in effect, a special kind of single-valued parameter for which
there are only two possible values.

Classification and Representation of the Parameters

The clinical parameters known to MYCIN are categorized in accordance
with the context to which they apply. These categories include:

PROP-CUL    Those clinical parameters which are attributes of cultures
            (e.g., site of the culture, method of collection)
PROP-DRG    Those clinical parameters which are attributes of
            administered drugs (e.g., name of the drug, duration of
            administration)
PROP-OP     Those clinical parameters which are attributes of operative
            procedures (e.g., the cavity, if any, opened during the
            procedure)
PROP-ORG    Those clinical parameters which are attributes of organisms
            (e.g., identity, gram stain, morphology)
PROP-PT     Those clinical parameters which are attributes of the patient
            (e.g., name, sex, age, allergies, diagnoses)
PROP-THER   Those clinical parameters which are attributes of therapies
            being considered for recommendation (e.g., recommended
            dosage, prescribing name)

These categories encompass all clinical parameters used by the system.


Note that any of the nodes (contexts) in the context tree for the patient
may be fully characterized by the values of the set of clinical parameters
in one of these categories.
Each of the 65 clinical parameters currently (1975) known to MYCIN
has an associated set of properties that is used during consideration of the
parameter for a given context. Figure 5-2 presents examples of the three
types of clinical parameters, which together demonstrate several of these
properties:

EXPECT      This property indicates the range of expected values that
            the parameter may have.
            IF equal to (YN), then the parameter is a yes-no
            parameter.
            IF equal to (NUMB), then the expected value of the
            parameter is a number.
            IF equal to (ONE-OF <list>), then the value of the
            parameter must be a member of <list>.
            IF equal to (ANY), then there is no restriction on the
            range of values that the parameter may have.
PROMPT      This property is a sentence used by MYCIN when it
            requests the value of the clinical parameter from the user;
            if there is an asterisk in the phrase (see Figure 5-2), it is
            replaced by the name of the context about which the
            question is being asked; this property is used only for
            yes-no or single-valued parameters.
PROMPT1     This property is similar to PROMPT but is used if the
            clinical parameter is a multi-valued parameter; in these
            cases MYCIN only asks the question about

Yes-No Parameter

FEBRILE: <FEBRILE is an attribute of a patient and is therefore a member of
the list PROP-PT>
  EXPECT: (YN)
  LOOKAHEAD: (RULE149 RULE109 RULE045)
  PROMPT: (Is * febrile?)
  TRANS: (* IS FEBRILE)

Single-Valued Parameter

IDENT: <IDENT is an attribute of an organism and is therefore a member of
the list PROP-ORG>
  CONTAINED-IN: (RULE030)
  EXPECT: (ONEOF (ORGANISMS))
  LABDATA: T
  LOOKAHEAD: (RULE004 RULE054 ... RULE168)
  PROMPT: (Enter the identity (genus) of *:)
  TRANS: (THE IDENTITY OF *)
  UPDATED-BY: (RULE021 RULE003 ... RULE166)

Multi-Valued Parameter

INFECT: <INFECT is an attribute of a patient and is therefore a member of
the list PROP-PT>
  EXPECT: (ONEOF (PERITONITIS BRAIN-ABCESS MENINGITIS
          BACTEREMIA UPPER-URINARY-TRACT-INFECTION ...
          ENDOCARDITIS))
  LOOKAHEAD: (RULE115 RULE149 ... RULE045)
  PROMPT1: (Is there evidence that the patient has a (VALU)?)
  TRANS: (AN INFECTIOUS DISEASE DIAGNOSIS FOR *)
  UPDATED-BY: (RULE157 RULE022 ... RULE105)

FIGURE 5-2 Examples of the three types of clinical parame-
ters. As shown, each clinical parameter is characterized by a set
of properties described in the text.

a single one of the possible parameter values; the value
            of interest is substituted for (VALU) in the question.
LABDATA       This property is a flag, which is either T or NIL; if T it
              indicates that the clinical parameter is a piece of
              primitive data, the value of which may be known with
              certainty to the user (see Section 5.2.2).
LOOKAHEAD     This property is a list of all rules in the system that
              reference the clinical parameter in the premise.
UPDATED-BY    This property is a list of all rules in the system in
              which the action or else clause permits a conclusion to
              be made regarding the value of the clinical parameter.
CONTAINED-IN  This property is a list of all rules in the system in
              which the action or else clause references the clinical
              parameter but does not cause its value to be updated.
TRANS         This property is used to translate an occurrence of this
              parameter into its English representation; the context
              of the parameter is substituted for the asterisk during
              translation.
DEFAULT       This property is used only with clinical parameters for
              which EXPECT = (NUMB); it gives the expected units
              for numerical answers (days, years, grams, etc.).
CONDITION     This property, when utilized, is an executable LISP
              expression that is evaluated before MYCIN requests
              the value of the parameter; if the CONDITION is true,
              the question is not asked (e.g., "Don't ask for an
              organism's subtype if its genus is not known by the
              user").

The uses of these properties will be discussed throughout the remain-
der of this chapter. However, a few additional points are relevant here.
First, it should be noted that the order of rules for the properties LOOK-
AHEAD, UPDATED-BY, and CONTAINED-IN is arbitrary and does not
affect the program's advice. Second, EXPECT and TRANS are the only
properties that must exist for every clinical parameter. Thus, for example,
if there is no PROMPT or PROMPT1 stored for a parameter, the system
assumes that it simply cannot ask the user for the value of the parameter.
Finally, note in Figure 5-2 the difference in the TRANS property for yes-
no and non-yes-no parameters. In general, a parameter and its value may
be translated as follows:

THE <attribute> OF <object> IS <value>

However, for a yes-no parameter such as FEBRILE, it is clearly necessary
to translate the parameter in a fashion other than this:

THE FEBRILE OF PATIENT-1 IS YES

Our solution has been to suppress the YES altogether and simply to say:

PATIENT-1 IS FEBRILE
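The translation convention above amounts to a simple template substitution, sketched here in Python (an illustration, not MYCIN's translator): the context name replaces the asterisk, and for yes-no parameters the value is suppressed rather than rendered as "... IS YES".

```python
# Sketch of the TRANS convention. Passing no value marks a yes-no
# parameter, whose YES is suppressed per the text above.
def translate(trans_template, context, value=None):
    phrase = trans_template.replace("*", context)
    if value is None:          # yes-no parameter: the phrase stands alone
        return phrase
    return f"{phrase} IS {value}"

s1 = translate("THE IDENTITY OF *", "ORGANISM-1", "E.COLI")
s2 = translate("* IS FEBRILE", "PATIENT-1")
```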

5.1.4 Certainty Factors

Chapter 11 presents a detailed description of certainty factors and their
theoretical foundation. This section therefore provides only a brief over-
view of the subject. A familiarity with the characteristics of certainty factors
(CFs) is necessary for the discussion of MYCIN during the remainder of
this chapter.
The value of every clinical parameter is stored by MYCIN along with
an associated certainty factor that reflects the system's "belief" that the
value is correct. This formalism is necessary because, unlike domains in
which objects either have or do not have some attribute, in medical diag-
nosis and treatment there is often uncertainty regarding attributes such as
the significance of the disease, the efficacy of a treatment, or the diagnosis
itself. CFs are an alternative to conditional probability that has several
advantages in MYCIN's domain.
A certainty factor is a number between -1 and +1 that reflects the
degree of belief in a hypothesis. Positive CFs indicate there is evidence
that the hypothesis is valid. The larger the CF, the greater is the belief in
the hypothesis. When CF = 1, the hypothesis is known to be correct. On
the other hand, negative CFs indicate that the weight of evidence suggests
that the hypothesis is false. The smaller the CF, the greater is the belief
that the hypothesis is invalid. CF = -1 means that the hypothesis has been
effectively disproven. When CF = 0, there is either no evidence regarding
the hypothesis or the supporting evidence is equally balanced by evidence
suggesting that the hypothesis is not true.
MYCIN's hypotheses are statements regarding values of clinical pa-
rameters for the various nodes in the context tree. For example, sample
hypotheses are

h1 = The identity of ORGANISM-1 is streptococcus
h2 = PATIENT-1 is febrile
h3 = The name of PATIENT-1 is John Jones

We use the notation CF[h,E] = X to represent the certainty factor
for the hypothesis h based on evidence E. Thus, if CF[h1,E] = .8,
CF[h2,E] = -.3, and CF[h3,E] = +1, the three sample hypotheses above
may be qualified as follows:

CF[h1,E] = .8 : There is strongly suggestive evidence (.8) that
                the identity of ORGANISM-1 is streptococcus
CF[h2,E] = -.3 : There is weakly suggestive evidence (.3) that
                 PATIENT-1 is not febrile
CF[h3,E] = +1 : It is definite (1) that the name of PATIENT-1
                is John Jones
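The English qualifications above follow directly from the sign and magnitude of the CF, as this Python sketch shows. The wording mirrors the three examples, but the magnitude cutoffs separating "strongly suggestive," "suggestive," and "weakly suggestive" are assumptions of the sketch, not a phrase table quoted from MYCIN:

```python
# Render CF[h,E] in English. A negative CF qualifies the negated
# hypothesis; magnitude thresholds here are illustrative assumptions.
def qualify(cf, hypothesis, negated_hypothesis):
    text = hypothesis if cf >= 0 else negated_hypothesis
    mag = abs(cf)
    if mag == 1:
        return f"It is definite (1) that {text}"
    strength = ("strongly suggestive" if mag >= 0.8 else
                "suggestive" if mag > 0.4 else
                "weakly suggestive")
    return f"There is {strength} evidence ({mag}) that {text}"

print(qualify(0.8, "the identity of ORGANISM-1 is streptococcus",
              "the identity of ORGANISM-1 is not streptococcus"))
```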

Certainty factors are used in two ways. First, as noted, the value of
every clinical parameter is stored with its associated certainty factor. In this
case the evidence E stands for all information currently available to MY-
CIN. Thus, if the program needs the identity of ORGANISM-1, it may
look in its dynamic data base and find:

IDENT of ORGANISM-1 = ((STREPTOCOCCUS .8))

The second use of CFs is in the statement of decision rules themselves.
In this case the evidence E corresponds to the conditions in the premise
of the rule. Thus

A & B & C → D   (with CF = X)

is a representation of the statement CF[D,(A & B & C)] = X. For example,
consider the following rule:

IF: 1) The stain of the organism is grampos, and
    2) The morphology of the organism is coccus, and
    3) The growth conformation of the organism is chains
THEN: There is suggestive evidence (.7) that the identity of the
      organism is streptococcus

This rule may also be represented as CF[h1,e] = .7, where h1 is the hy-
pothesis that the organism (context of the rule) is Streptococcus and e is
the evidence that it is a gram-positive coccus growing in chains.
Since diagnosis is, in effect, the problem of selecting a disease from a
list of competing hypotheses, it should be clear that MYCIN may simul-
taneously be considering several hypotheses regarding the value of a clin-
ical parameter. These hypotheses are stored together, along with their CFs,
for each node in the context tree. We use the notation Val[C,P] to signify
the set of all hypotheses regarding the value of the clinical parameter P
for the context C. Thus, if MYCIN has reason to believe that ORGANISM-
1 may be either a Streptococcus or a Staphylococcus, but Pneumococcus has
been ruled out, its dynamic data base might well show:

Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS .6) (STAPHYLOCOCCUS .4)
                         (DIPLOCOCCUS-PNEUMONIAE -1))

It can be shown that the sum of the CFs for supported hypotheses
regarding a single-valued parameter (i.e., those parameters for which the
hypotheses are mutually exclusive) cannot exceed 1 (Shortliffe and Buch-
anan, 1975). Multi-valued parameters, on the other hand, may have several
hypotheses that are all known to be true, for example:

Val[PATIENT-1,ALLERGY] = ((PENICILLIN 1) (AMPICILLIN 1)
                          (CARBENICILLIN 1) (METHICILLIN 1))

As soon as a hypothesis regarding a single-valued parameter is proved to
be true, all competing hypotheses are effectively disproved:

Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS 1) (STAPHYLOCOCCUS -1)
                         (DIPLOCOCCUS-PNEUMONIAE -1))

In Chapter 11 we demonstrate that CF[h,E] = -CF[¬h,E]. This ob-
servation has important implications for the way MYCIN handles the bi-
nary-valued attributes we call yes-no parameters. Since "yes" is "¬no," it is
not necessary to consider "yes" and "no" as competing hypotheses for the
value of a yes-no parameter (as we do for single-valued parameters). In-
stead, we can always express "no" as "yes" with a reversal in the sign of the
CF. This means that Val[C,P] is always equal to the single value "yes," along
with its associated CF, when P is a yes-no parameter.
We discuss below MYCIN's mechanism for adding to the list of hy-
potheses in Val[C,P] as new rules are invoked and executed. However, the
following points should be emphasized here:

1. The strength of the conclusion associated with the execution of a rule
reflects not only the CF assigned to the rule, but also the program's
degree of belief regarding the validity of the premise.
2. The support of several rules favoring a single hypothesis may be assim-
ilated incrementally on the list Val[C,P] by using the special combining
functions described in Chapter 11.
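For concreteness, the incremental assimilation mentioned in point 2 can be sketched as follows. The combining formula shown is one published form of the MYCIN CF-combination function treated in Chapter 11; its exact shape here should be read as an assumption of this sketch rather than a quotation from that chapter:

```python
# Merge the current CF x for a hypothesis with a new rule's
# contribution y (an assumed form of the Chapter 11 combining function).
def combine(x, y):
    if x >= 0 and y >= 0:
        return x + y * (1 - x)              # both support the hypothesis
    if x <= 0 and y <= 0:
        return x + y * (1 + x)              # both count against it
    return (x + y) / (1 - min(abs(x), abs(y)))   # conflicting evidence

# Two rules each lending positive support accumulate toward certainty.
cf = combine(0.6, 0.4)
```

Note that once either piece of evidence is definite (CF of 1 or -1), the combined value stays pinned at that extreme, matching the behavior of single-valued parameters described above.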

5.1.5 Functions for the Evaluation of Premise Conditions

This section describes the evaluation of the individual conditions (see
<condition>, Section 5.1.1) in the premise of rules. Conditions in general
evaluate to true or false (T or NIL). Thus they may at first glance be
considered simple predicates on the values of clinical parameters. However,
since there may be several competing hypotheses on the list Val[C,P], each
associated with its own degree of belief as reflected by the CF, conditional
statements regarding the value of parameters can be quite complex. All
predicates are implemented as LISP functions. The functions that under-
take the required analysis are of three varieties, specified by the designa-
tions <func1>, <func2>, and <special-func> in the BNF rule descrip-
tion. This section explains the <func1> and <func2> predicates. The
<special-func> category is deferred until later, however, so that we may
first introduce our specialized knowledge structures.
There are four predicates in the category <func1>. These functions
do not form conditionals on specific values of a clinical parameter but are
concerned with the more general status of knowledge regarding the attri-
butes in question. For example, KNOWN[ORGANISM-1,IDENT] is an
invocation of the <func1> predicate KNOWN; it would return true if the
identity of ORGANISM-1 were known, regardless of the value of the clin-
ical parameter IDENT. KNOWN and the other <func1> predicates may
be formally defined as follows:

Predicates of the Category <func1>

Let V = Val[C,P] be the set of all hypotheses regarding the value of the
clinical parameter P for the context C.
Let Mv = Max[V] be the most strongly supported hypothesis in V (i.e., the
hypothesis with the largest CF).
Let CFmv = CF[Mv,E] where E is the total available evidence.

Then, if P is either a single-valued or multi-valued parameter, the four
predicates (functions) may be specified as follows:

Function              If          Then   Else

KNOWN[C,P]            CFmv > .2    T     NIL
NOTKNOWN[C,P]         CFmv ≤ .2    T     NIL
DEFINITE[C,P]         CFmv = 1     T     NIL
NOTDEFINITE[C,P]      CFmv < 1     T     NIL

In words, these definitions reflect MYCIN's convention that the value of a
parameter is known if the CF of the most highly supported hypothesis
exceeds .2. The .2 threshold was selected empirically. The implication is
that a positive CF less than .2 reflects so little evidence supporting the
hypothesis that there is virtually no reasonable hypothesis currently known.
The interrelationships among these functions are diagrammed on a CF
number line in Figure 5-3. Regions specified are the range of values for
CFmv over which the function returns T.
As was pointed out in the preceding section, however, yes-no param-
eters are special cases because we know CF[YES,E] = -CF[NO,E]. Since
the values of yes-no parameters are always stored in terms of YES, MYCIN
must recognize that a YES with CF = -.9 is equivalent to a NO with CF
= .9. The definitions of the four <func1> predicates above do not reflect
this distinction. Therefore, when P is a yes-no parameter, the four func-
tions are specified as follows:

Function              If            Then   Else

KNOWN[C,P]            |CFmv| > .2    T     NIL
NOTKNOWN[C,P]         |CFmv| ≤ .2    T     NIL
DEFINITE[C,P]         |CFmv| = 1     T     NIL
NOTDEFINITE[C,P]      |CFmv| < 1     T     NIL

Figure 5-4 shows the relationship among these functions for yes-no param-
eters.
There are nine predicates in the category <func2>. Unlike the
<func1> predicates, these functions control conditional statements re-
garding specific values of the clinical parameter in question. For example,
SAME[ORGANISM-1,IDENT,E.COLI] is an invocation of the <func2>

NOTKNOWN ..I
"I

KNOWN

4 NOTDEFINITE ~)

-1 -.2 0 .2

J J i
t
DEFINITE

FIGURE 5-3 Diagramindicating the range of CF values


over which the <funcl> predicates hold true whenapplied to
multi-valuedor single-valued (i.e., non-yes-no)clinical param-
eters. Vertical lines andparenthesesdistinguish closed andnon-
closed certainty factor ranges, respectively.

predicate SAME; it would return a non-NIL value if the identity of OR-
GANISM-1 were known to be E. coli. SAME and the other <func2> pred-
icates may be formally defined as follows:

Predicates of the Category <func2>

Let V = Val[C,P] be the set of all hypotheses regarding the value of the
clinical parameter P for the context C.
Let I = Intersection[V,LST] be the set of all hypotheses in V that also occur
in the set LST; LST contains the possible values of P for comparison
by the predicate function; it usually contains only a single element; if
no element in LST is also in V, I is simply the empty set.
Let Mi = Max[I] be the most strongly confirmed hypothesis in I; thus Mi is
NIL if I is the empty set.
Let CFmi = CF[Mi,E] where CFmi = 0 if Mi is NIL.

Then the <func2> predicates are specified as follows:



FIGURE 5-4 Diagram indicating the range of CF values over
which the <func1> predicates hold true when applied to yes-
no clinical parameters.

Function                If                 Then    Else

SAME[C,P,LST]           CFmi > .2          CFmi    NIL
THOUGHTNOT[C,P,LST]     CFmi < -.2         -CFmi   NIL
NOTSAME[C,P,LST]        CFmi ≤ .2          T       NIL
MIGHTBE[C,P,LST]        CFmi ≥ -.2         T       NIL
VNOTKNOWN[C,P,LST]      |CFmi| ≤ .2        T       NIL
DEFIS[C,P,LST]          CFmi = +1          T       NIL
DEFNOT[C,P,LST]         CFmi = -1          T       NIL
NOTDEFIS[C,P,LST]       .2 < CFmi < 1      T       NIL
NOTDEFNOT[C,P,LST]      -1 < CFmi < -.2    T       NIL
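The definitions above can be rendered as a short Python sketch (a modern, hypothetical rendering; MYCIN itself was written in Interlisp, and the `val_db` representation, function names, and data here are all illustrative):

```python
# Sketch of the <func2> predicate machinery. val_db maps a
# (context, parameter) pair to a list of (hypothesis, CF) pairs,
# mirroring Val[C,P].

def cfmi(val_db, context, param, lst):
    """CF of the most strongly confirmed hypothesis also present in lst
    (CFmi in the text); 0 when the intersection is empty."""
    hypotheses = val_db.get((context, param), [])
    matching = [cf for value, cf in hypotheses if value in lst]
    return max(matching) if matching else 0

def same(val_db, c, p, lst):
    """Returns the CF itself (not just T) when CFmi > .2."""
    cf = cfmi(val_db, c, p, lst)
    return cf if cf > 0.2 else None

def thoughtnot(val_db, c, p, lst):
    """Returns -CFmi when CFmi < -.2."""
    cf = cfmi(val_db, c, p, lst)
    return -cf if cf < -0.2 else None

def notsame(val_db, c, p, lst):
    return True if cfmi(val_db, c, p, lst) <= 0.2 else None

def mightbe(val_db, c, p, lst):
    return True if cfmi(val_db, c, p, lst) >= -0.2 else None
```

With the data from the worked example later in this section, `same(db, "ORGANISM-1", "IDENT", ["STREPTOCOCCUS"])` yields .7 rather than a bare T.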

The names of the functions have been selected to reflect their semantics. Figure 5-5 shows a graphic representation of each function and also explicitly states the interrelationships among them.
Note that SAME and THOUGHTNOT are different from all the other functions in that they return a number (CF) rather than T if the defining condition holds. This feature permits MYCIN to record the degree to which premise conditions are satisfied. In order to explain this

SAME or NOTSAME = THOUGHTNOT or MIGHTBE = T
NOTSAME = VNOTKNOWN or THOUGHTNOT
THOUGHTNOT = NOTDEFNOT or DEFNOT
MIGHTBE = VNOTKNOWN or SAME
SAME = NOTDEFIS or DEFIS

FIGURE 5-5 Diagram indicating the range of CF values over which the <func2> predicates hold true. The logical relationships of these predicates are summarized below the diagram.

point, we must discuss the SAND function that oversees the evaluation of the premise of a rule. The reader will recall the BNF description:

<premise> ::= (SAND <condition> ... <condition>)

SAND is similar to the standard LISP AND function in that it evaluates its conditional arguments one at a time, returning false (NIL) as soon as a condition is found to be false, and otherwise returning true (T). The difference is that SAND expects some of its conditions to return numerical values rather than simply T or NIL. If an argument condition returns NIL

(or a number equal to .2 or less), it is considered false and SAND stops considering subsequent arguments. On the other hand, nonnumeric values of conditions are interpreted as indicating truth with CF = 1. Thus each true condition either returns a number or a non-NIL value that is interpreted as 1. SAND then maintains a record of the lowest value returned by any of its arguments. This number, termed TALLY, is a certainty tally, which indicates MYCIN's degree of belief in the premise (see Combining Function 2 in Chapter 11). Thus .2 < TALLY ≤ 1, where TALLY = 1 indicates that MYCIN believes the premise to be true with certainty.
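A minimal Python sketch of this evaluation loop (hypothetical; MYCIN's SAND was an Interlisp function, and the representation of a condition as a zero-argument callable is invented for illustration):

```python
def sand(conditions):
    """Evaluate premise conditions in order, keeping the lowest CF
    returned (the certainty tally, TALLY)."""
    tally = 1.0
    for condition in conditions:          # each condition is a callable
        result = condition()
        if result is True:
            cf = 1.0                      # nonnumeric truth counts as CF = 1
        elif result is None or result <= 0.2:
            return None                   # false: stop evaluating further args
        else:
            cf = float(result)
        tally = min(tally, cf)            # record the lowest value returned
    return tally                          # .2 < TALLY <= 1 when the premise holds
```

On the worked example later in this section (conditions returning 1.0, .8, and .6), the tally comes out as .6.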
Most of the predicates that evaluate conditions in the premise of a rule
return either T or NIL as we have shown. Consider, however, the semantics
of the most commonly used function, SAME, and its analogous function,
THOUGHTNOT. Suppose MYCIN knows:

Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS .7) (STAPHYLOCOCCUS .3))

Then it seems clear that

SAME[ORGANISM-1,IDENT,STREPTOCOCCUS]

is in some sense "more true" than

SAME[ORGANISM-1,IDENT,STAPHYLOCOCCUS]

even though both hypotheses exceed the threshold CF = .2. If SAME merely returned T, this distinction would be lost. Thus, for this example:

SAME[ORGANISM-1,IDENT,STREPTOCOCCUS] = .7
SAME[ORGANISM-1,IDENT,STAPHYLOCOCCUS] = .3
whereas KNOWN[ORGANISM-1,IDENT] = T
and NOTDEFIS[ORGANISM-1,IDENT,STREPTOCOCCUS] = T

A similar argument explains why THOUGHTNOT returns a CF rather than T. It is unclear whether any of the other <func2> predicates should return a CF rather than T; our present conviction is that the semantics of those functions do not require relative weightings in the way that SAME and THOUGHTNOT do.
Consider a brief example, then, of the way in which the premise of a
rule is evaluated by SAND. The following ORGRULE:

IF: 1) The stain of the organism is gramneg, and
    2) The morphology of the organism is rod, and
    3) The aerobicity of the organism is aerobic
THEN: There is strongly suggestive evidence (.8) that the class of the organism is enterobacteriaceae

is internally coded in LISP as:

PREMISE: (SAND (SAME CNTXT GRAM GRAMNEG)
               (SAME CNTXT MORPH ROD)
               (SAME CNTXT AIR AEROBIC))
ACTION:  (CONCLUDE CNTXT CLASS ENTEROBACTERIACEAE TALLY .8)

Suppose this rule has been invoked for consideration of ORGANISM-1; i.e., the context of the rule (CNTXT) is the node in the context tree termed ORGANISM-1. Now suppose that MYCIN has the following information in its data base (we will discuss later how it gets there):

Val[ORGANISM-1,GRAM] = ((GRAMNEG 1.0))
Val[ORGANISM-1,MORPH] = ((ROD .8) (COCCUS ...))
Val[ORGANISM-1,AIR] = ((AEROBIC .6) (FACUL ...))

SAND begins by evaluating SAME[ORGANISM-1,GRAM,GRAMNEG]. The function returns CF = 1.0, so TALLY is set to 1.0 (see definition of TALLY in the description of SAND above). Next SAND evaluates the second premise condition, SAME[ORGANISM-1,MORPH,ROD], which returns .8. Since the first two conditions both were found to hold, SAND evaluates SAME[ORGANISM-1,AIR,AEROBIC], which returns .6. Thus TALLY is set to .6, and SAND returns T. Since the premise is true, MYCIN may now draw the conclusion indicated in the action portion of the rule. Note, however, that CONCLUDE has as arguments both .8 (i.e., the CF for the rule as provided by the expert) and TALLY (i.e., the certainty tally for the premise). CONCLUDE and the other functions that control inferences are described later.

5.1.6 Static Knowledge Structures

Although all MYCIN's inferential knowledge is stored in rules, there are various kinds of static definitional information, which are stored differently even though they are accessible from rules.

Tabular and List-Based Knowledge

There are three categories of knowledge structures that could be discussed in this section. However, one of them, MYCIN's dictionary, is used principally for natural language understanding and will therefore not be described. The other two data structures are simple lists and knowledge tables.

Simple lists: Simple lists provide a mechanism for simplifying references to variables and optimizing knowledge storage by avoiding unnecessary duplication. Two examples should be sufficient to explain this point.
As was shown earlier, the EXPECT property for the clinical parameter IDENT is

(ONEOF (ORGANISMS))

ORGANISMS is the name of a linear list containing the names of all bacteria known to MYCIN. There is also a clinical parameter named COVERFOR for which the EXPECT property is

(ONEOF ENTEROBACTERIACEAE (ORGANISMS) G+COCCI G-COCCI)

Thus, by storing the organisms separately on a list named ORGANISMS, we avoid having to duplicate the list of names in the EXPECT property of both IDENT and COVERFOR. Furthermore, using the variable name rather than internal pointers to the list structure facilitates references to the list of organisms whenever it is needed.
A second example involves the several rules in the system that make conclusions based on whether an organism was isolated from a site that is normally sterile or nonsterile. STERILESITES is the name of a simple list containing the names of all normally sterile sites known to the system. There is a similar list named NONSTERILESITES. Thus many rules can have the condition (SAME CNTXT SITE STERILESITES), and the sites need not be listed explicitly in each rule.

Knowledge tables: In conjunction with the special functions discussed in the next subsection, MYCIN's knowledge tables permit a single rule to accomplish a task that would otherwise require several rules. A knowledge table contains a comprehensive record of certain clinical parameters plus the values they take on under various circumstances. For example, one of MYCIN's knowledge tables itemizes the gram stain, morphology, and aerobicity for every bacterial genus known to the system. Consider, then, the task of inferring an organism's gram stain, morphology, and aerobicity if its identity is known with certainty. Without the knowledge table, MYCIN would require several rules of the following form:

IF: The identity of the organism is definitely W
THEN: 1) It is definite (1) that the gram stain of the organism is X, and
      2) It is definite (1) that the morphology of the organism is Y, and
      3) It is definite (1) that the aerobicity of the organism is Z
Instead, MYCIN contains a single rule of the following form:

RULE030
IF: The identity of the organism is known with certainty
THEN: It is definite (1) that these parameters - GRAM MORPH AIR - should be transferred from the identity of the organism to this organism

Thus if ORGANISM-1 is known to be a Streptococcus, MYCIN can use RULE030 to access the knowledge table to look up the organism's gram stain, morphology, and aerobicity.
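A sketch of this table-driven transfer in Python (hypothetical: the table rows, data-base representation, and function names are illustrative, not MYCIN's actual contents):

```python
# Illustrative knowledge table: gram stain, morphology, and aerobicity
# per genus. The two rows shown are examples, not the full MYCIN table.
KNOWLEDGE_TABLE = {
    "STREPTOCOCCUS": {"GRAM": "GRAMPOS", "MORPH": "COCCUS", "AIR": "FACUL"},
    "E.COLI":        {"GRAM": "GRAMNEG", "MORPH": "ROD",    "AIR": "FACUL"},
}

def transfer_parameters(val_db, context, params=("GRAM", "MORPH", "AIR")):
    """RULE030-style inference: when the identity is known with certainty,
    conclude each listed parameter from the table with CF = 1."""
    identity = val_db.get((context, "IDENT"))      # a (value, CF) pair
    if identity is None or identity[1] != 1.0:
        return                                     # premise fails; no conclusion
    row = KNOWLEDGE_TABLE[identity[0]]
    for param in params:
        val_db[(context, param)] = (row[param], 1.0)
```

A single generic rule plus a table thus replaces one hand-written rule per genus.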

Specialized Functions

The efficient use of knowledge tables requires the existence of four specialized functions (the category <special-func> from Section 5.1.1). As explained below, each function attempts to add members to a list named GRIDVAL and returns T if at least one element has been found to be placed in GRIDVAL.

Functions of the Category <special-func>


Let V = Val[C,P] be the set of all hypotheses regarding the value of the clinical parameter P for the context C.
Let CLST be a list of objects that may be characterized by clinical parameters.
Let PLST be a list of clinical parameters.

Then:

Function                      Value of GRIDVAL

SAME2[C,CLST,PLST]            {X | X ∈ CLST & (for all P in PLST) SAME[C,P,Val[X,P]]}
NOTSAME2[C,CLST,PLST]         {X | X ∈ CLST & (for at least one P in PLST) NOTSAME[C,P,Val[X,P]]}
SAME3[C,P,CLST,P*]            {X | X ∈ CLST & SAME[C,P,Val[X,P*]]}
NOTSAME3[C,P,CLST,P*]         {X | X ∈ CLST & NOTSAME[C,P,Val[X,P*]]}
GRID[<object>,<attribute>]    {X | X is a value of the <attribute> of <object>}

GRID is merely a function for looking up information in the specialized knowledge table.
The use of these functions is best explained by example. Consider the following verbalization of a rule given us by one of our collaborating experts:

If you know the portal of entry of the current organism and also
know the pathogenic bacteria normally associated with that site, you
have evidence that the current organism is one of those pathogens
so long as there is no disagreement on the basis of gram stain,
morphology, or aerobicity.

This horrendous-sounding rule is coded quite easily using SAME2[C,CLST,PLST], where C is the current organism, CLST is the list of pathogenic bacteria normally associated with the portal of entry of C, and PLST is the set of properties (GRAM MORPH AIR). GRID is used to set up CLST. The LISP version of the rule is

PREMISE: (SAND (GRID (VAL CNTXT PORTAL) PATH-FLORA)
               (SAME2 CNTXT GRIDVAL (QUOTE (GRAM MORPH AIR))))
ACTION:  (CONCLIST CNTXT IDENT GRIDVAL .8)

Note that GRID sets up the initial value of GRIDVAL for use by SAME2, which then redefines GRIDVAL for use in the action clause. This rule is translated (to somewhat stilted English) as follows:

IF: 1) The list of likely pathogens associated with the portal of entry of the organism is known, and
    2) This current organism and the members you are considering agree with respect to the following properties: GRAM MORPH AIR
THEN: There is strongly suggestive evidence (.8) that each of them is the identity of this current organism

SAME2 and NOTSAME2 can also be used for comparing the values of the same clinical parameters for two or more different contexts in the context tree, for example:

SAME2[ORGANISM-1, (ORGANISM-2 ORGANISM-3), (GRAM MORPH)]

On the other hand, SAME3 and NOTSAME3 are useful for comparing different parameters of two or more contexts. Suppose you need a predicate that returns T if the site of a prior organism (ORGANISM-2) is the same as the portal of entry of the current organism (ORGANISM-1). This is accomplished by the following:

SAME3[ORGANISM-1, PORTAL, (ORGANISM-2), SITE]

5.1.7 Translation of Rules into English

Rules are translated into a subset of English using a set of recursive functions that piece together bits of text. We shall demonstrate the process using the premise condition (GRID (VAL CNTXT PORTAL) PATH-FLORA), which is taken from the rule in the preceding section.
The reader will recall that every clinical parameter has a property named TRANS that is used for translation (Section 5.1.3). In addition, every function, simple list, or knowledge table that is used by MYCIN's rules also has a TRANS property. For our example the following TRANS properties are relevant:
GRID:       (THE (2) ASSOCIATED WITH (1) IS KNOWN)
VAL:        (((2)))
PORTAL:     (THE PORTAL OF ENTRY OF *)
PATH-FLORA: (LIST OF LIKELY PATHOGENS)
Use of the Rules to Give Advice 103

The numbers in the translations of functions indicate where the translation of the corresponding argument should be inserted. Thus the translation of GRID's second argument is inserted for the (2) in GRID's TRANS property. The extra parentheses in the TRANS for VAL indicate that the translation of VAL's first argument should be substituted for the asterisk in the translation of VAL's second argument. Since PORTAL is a PROP-ORG, CNTXT translates as "the organism," and the translation of (VAL CNTXT PORTAL) becomes

The portal of entry of the organism

Substituting VAL's translation for the (1) in GRID's TRANS and PATH-FLORA's translation for the (2) yields the final translation of the conditional clause:

The list of likely pathogens associated with the portal of entry of the organism is known

Similarly, (GRID (VAL CNTXT CLASS) CLASSMEMBERS) translates as:

The list of members associated with the class of the organism is known

All other portions of rules use essentially this same procedure for
translation. An additional complexity arises, however, if it is necessary to
negate the verbs in action or else clauses when the associated CF is negative.
The translator program must therefore recognize verbs and know how to
negate them when evidence in a premise supports the negation of the
hypothesis that is referenced in the action of the rule.
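The recursive substitution can be sketched as follows (a hypothetical Python rendering: templates are simplified to strings with `{1}`, `{2}` marking the numbered argument slots and `*` marking VAL's substitution point; the TRANS entries are adapted from the example above):

```python
TRANS = {
    "GRID":       "THE {2} ASSOCIATED WITH {1} IS KNOWN",
    "PORTAL":     "THE PORTAL OF ENTRY OF *",
    "PATH-FLORA": "LIST OF LIKELY PATHOGENS",
    "CNTXT":      "THE ORGANISM",
}

def translate(expr):
    """Recursively piece together the English rendering of a clause."""
    if isinstance(expr, str):
        return TRANS[expr]
    fn, *args = expr
    if fn == "VAL":  # substitute arg 1's translation for the * in arg 2's
        return translate(args[1]).replace("*", translate(args[0]))
    text = TRANS[fn]
    for i, arg in enumerate(args, start=1):
        text = text.replace("{%d}" % i, translate(arg))  # fill numbered slot
    return text

clause = ("GRID", ("VAL", "CNTXT", "PORTAL"), "PATH-FLORA")
```

Calling `translate(clause)` reproduces the sentence derived step by step in the text.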

5.2 Use of the Rules to Give Advice

The discussion in Section 5.1 was limited to the various data structures
used to represent MYCIN's knowledge. The present section proceeds to
an explanation of how MYCIN uses that knowledge in order to give advice.

5.2.1 MYCIN's Control Structure

MYCIN's rules are directly analogous to the consequent theorems introduced by Hewitt in his PLANNER system (Hewitt, 1972). They permit a reasoning chain to grow dynamically on the basis of the user's answers to questions regarding the patient. This subsection describes that reasoning network, explaining how it grows and how MYCIN manages to ask questions only when there is a reason for doing so.

Consequent Rules and Recursion

MYCIN's task involves a four-stage decision problem:

1. Decide which organisms, if any, are causing significant disease.
2. Determine the likely identity of the significant organisms.
3. Decide which drugs are potentially useful.
4. Select the best drug or drugs.

Steps 1 and 2 are closely interrelated since determination of an organism's significance may well depend on its presumed identity. Furthermore, MYCIN must consider the possibility that the patient has an infection with an organism not specifically mentioned by the user (e.g., an occult abscess suggested by historical information or subtle physical findings). Finally, if MYCIN decides that there is no significant infection requiring antimicrobial therapy, it should skip Steps 3 and 4, advising the user that no treatment is thought to be necessary. MYCIN's task area therefore can be defined by the following rule:
RULE092
IF: 1) There is an organism which requires therapy, and
    2) Consideration has been given to the possible existence of additional organisms requiring therapy, even though they have not actually been recovered from any current cultures
THEN: Do the following:
    1) Compile the list of possible therapies which, based upon sensitivity data, may be effective against the organisms requiring treatment, and
    2) Determine the best therapy recommendations from the compiled list
OTHERWISE: Indicate that the patient does not require therapy

This rule is one of MYCIN's PATRULES (i.e., its context is the patient) and is known as the goal rule for the system. A consultation session with MYCIN results from a simple two-step procedure:

1. Create the patient context as the top node in the context tree (see Section 5.3 for an explanation of how nodes are added to the tree).
2. Attempt to apply the goal rule to the newly created patient context.

After the second step, the consultation is over. Thus we must explain how the simple attempt to apply the goal rule to the patient causes a lengthy consultation with an individualized reasoning chain.
When MYCIN first tries to evaluate the premise of the goal rule, the first condition requires that it know whether there is an organism that requires therapy. MYCIN then reasons backwards in a manner that may be informally paraphrased as follows:

How do I decide whether there is an organism requiring therapy? Well, RULE090 tells me that organisms associated with significant disease require therapy. But I don't even have any organisms in the context tree yet, so I'd better ask first if there are any organisms, and if there are I'll try to apply RULE090 to each of them. However, the premise of RULE090 requires that I know whether the organism is significant. I have a bunch of rules for making this decision (RULE038 RULE042 RULE044 RULE108 RULE122). For example, RULE038 tells me that if the organism came from a sterile site it is probably significant. Unfortunately, I don't have any rules for inferring the site of a culture, however, so I guess I'll have to ask the user for this information when I need it...

This goal-oriented approach to rule invocation and question selection is automated via two interrelated procedures, a MONITOR that analyzes rules and a FINDOUT mechanism that searches for data needed by the MONITOR.
The MONITOR analyzes the premise of a rule, condition by condition, as shown in Figure 5-6.³ When the value of the clinical parameter referenced in a condition is not yet known to MYCIN, the FINDOUT mechanism is invoked in an attempt to obtain the missing information. FINDOUT then either derives the necessary information (from other rules) or asks the user for the data.
FINDOUT has a dual strategy depending on the kind of information required by the MONITOR. This distinction is demonstrated in Figure 5-7. In general, a piece of data is immediately requested from the user (an ASK1 question) if it is considered in some sense "primitive," as are, for example, most laboratory data. Thus, if the physician knows the identity of an organism (e.g., from a lab report), we would prefer that the system request that information directly rather than try to deduce it via decision rules. However, if the user does not know the identity of the organism, MYCIN uses its knowledge base in an effort to deduce the range of likely organisms. Nonlaboratory data are those kinds of information that require inference even by the clinician, e.g., whether or not an organism is a contaminant or whether or not a previously administered drug was effective. FINDOUT always attempts to deduce such information first, asking the physician only when MYCIN's knowledge base of rules is inadequate for making the inference from the information at hand (an ASK2 question).
We have previously described the representation of clinical parameters and their associated properties. The need for two of these properties, LABDATA and UPDATED-BY, should now be clear. The LABDATA flag for a parameter allows FINDOUT to decide which branch to take through

³As discussed in Section 5.1.5, the MONITOR uses the SAND function to oversee the premise evaluation.

FIGURE 5-6 Flow chart describing how the MONITOR analyzes a rule and decides whether or not it applies in the clinical situation under consideration. Each condition in the premise of the rule references some clinical parameter, and all such conditions must be true for the rule to be accepted (Shortliffe et al., 1975).

FIGURE 5-7 Flow chart describing the strategy for determining which questions to ask the physician. The derivation of values of parameters may require recursive calls to the MONITOR, thus dynamically creating a reasoning chain specific to the patient under consideration (Shortliffe et al., 1975).

its decision process (Figure 5-7). Thus IDENT is marked as being LABDATA in Figure 5-2.
Recall that the UPDATED-BY property is a list of all rules in the system that permit an inference to be made regarding the value of the indicated parameter. Thus UPDATED-BY is precisely the list called Y in Figure 5-7. Every time a new rule is added to MYCIN's knowledge base, the name of the rule is added to the UPDATED-BY property of the clinical parameter referenced in its action or else clause. Thus the new rule immediately becomes available to FINDOUT at times when it may be useful. It is not necessary to specify explicitly its interrelationships with other rules in the system.
Note that FINDOUT is accessed from the MONITOR, but the MONITOR may also be accessed from FINDOUT. This recursion allows self-propagation of a reasoning network appropriate for the patient under consideration and selects only the necessary questions and rules. The first rule passed to the MONITOR is always the goal rule. Since the first condition in the premise of this rule references a clinical parameter named TREATFOR, and since the value of TREATFOR is of course unknown before any data have been gathered, the MONITOR asks FINDOUT to trace the value of TREATFOR. This clinical parameter is not LABDATA, so FINDOUT takes the left-hand pathway in Figure 5-7 and sets Y to the UPDATED-BY property of TREATFOR, the two-element list (RULE090 RULE149). The MONITOR is then called again with RULE090 as the rule for consideration, and FINDOUT is used to trace the values of clinical parameters referenced in the premise of RULE090. Note that this process parallels the informal paraphrase of MYCIN's reasoning given above.
It is important to recognize that FINDOUT does not check to see whether the premise condition is true. Instead, the FINDOUT mechanism traces the clinical parameter exhaustively and returns its value to the MONITOR, where the conditional expression may then be evaluated.⁴ Hence FINDOUT is called one time at most for a clinical parameter (in a given context; see Section 5.3). When FINDOUT returns a value to the MONITOR, it marks the clinical parameter as having been traced. Thus when the MONITOR reaches the question "HAS ALL NECESSARY INFORMATION BEEN GATHERED TO DECIDE IF THE CONDITION IS TRUE?" (Figure 5-6), the parameter is immediately passed to FINDOUT unless it has been previously marked as traced.
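The mutual recursion of the MONITOR and FINDOUT can be sketched in Python (hypothetical and much simplified: single-valued parameters, exact-match premise conditions, and no certainty factors; all rule and parameter names here are illustrative):

```python
def findout(param, db, labdata, rules_by_param, ask_user, traced):
    """Obtain a value for param, per Figure 5-7: ask first when the
    parameter is LABDATA, otherwise try the UPDATED-BY rules first."""
    traced.add(param)                      # trace each parameter at most once
    if labdata.get(param):
        db[param] = ask_user(param)        # ASK1: primitive laboratory data
        if db[param] is not None:
            return
    for rule in rules_by_param.get(param, ()):   # the list Y
        conclusion = monitor(rule, db, labdata, rules_by_param, ask_user, traced)
        if conclusion is not None:
            db[param] = conclusion[1]
            return
    if not labdata.get(param):
        db[param] = ask_user(param)        # ASK2: the rules were inadequate

def monitor(rule, db, labdata, rules_by_param, ask_user, traced):
    """Evaluate a rule's premise condition by condition (Figure 5-6)."""
    for param, expected in rule["premise"]:
        if param not in traced:            # gather the needed information
            findout(param, db, labdata, rules_by_param, ask_user, traced)
        if db.get(param) != expected:
            return None                    # reject the rule
    return rule["action"]                  # premise holds: draw the conclusion
```

Because `monitor` calls `findout` and `findout` calls `monitor`, evaluating one goal rule unwinds into exactly the patient-specific chain of questions and rules described above.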
Figure 5-8 is a portion of MYCIN's initial reasoning chain. In Figure 5-8 the clinical parameters being traced are underlined. Thus REGIMEN is the top goal of the system (i.e., it is the clinical parameter in the action clause of the goal rule). Below each parameter are the rules (from the UPDATED-BY property) that may be used for inferring the parameter's value. Clinical parameters referenced in the premise of each of these rules are then listed at the next level in the reasoning network. Rules with multiple premise conditions have their links numbered in accordance with the order in which the parameters are traced (by FINDOUT). ASK1 indicates that a parameter is LABDATA, so its value is automatically asked of the user when it is needed. ASK2 refers to parameters that are not LABDATA but for which no inference rules currently exist, e.g., if the dose of a drug is adequate. One of the goals in the future development of MYCIN's knowledge base is to acquire enough rules allowing the values of non-LABDATA parameters to be inferred so that ASK2 questions need no longer occur.

⁴The process is slightly different for multi-valued parameters; see Section 5.2.1.

[FIGURE 5-8: a portion of MYCIN's initial reasoning network; the diagram is not reproducible here.]
Note that the reasoning network in Figure 5-8 is drawn to reflect maximum size. In reality many portions of such a network need not be considered. For example, RULE042 (one of the UPDATED-BY rules under SIGNIFICANCE) is rejected if the SITE condition is found to be false by the MONITOR. When that happens, neither COLLECT nor SIGNUM needs to be traced by FINDOUT, and those portions of the reasoning network are not created. Thus the order of conditions within a premise is highly important. In general, conditions referencing the most common parameters (i.e., those that appear in the premises of the most rules) are put first in the premises of new rules to act as an effective screening mechanism.
A final comment is necessary regarding the box labeled "REJECT THE RULE" in Figure 5-6. This step in the MONITOR actually must check to see if the rule has an else clause. If so, and if the premise is known to be false, the conclusion indicated by the else clause is drawn. If there is no else clause, or if the truth status of the premise is uncertain (e.g., the user has entered UNKNOWN when asked the value of one of the relevant parameters), the rule is simply ignored without any conclusion having been reached.

Asking Questions of the User

The conventions for communication between a program and a physician are a primary factor determining the system's acceptability. We have therefore designed a number of features intended to simplify the interactive process that occurs when FINDOUT reaches one of the boxes entitled "ASK USER FOR THE VALUE OF THE PARAMETER" (Figure 5-7).
When MYCIN requests the value of a single-valued or yes-no parameter, it uses the PROMPT property of the parameter. The user's response is then compared with the EXPECT property of the parameter. If the answer is one of the expected responses, the program simply continues through the reasoning network. Otherwise, MYCIN checks the system dictionary to see if the user's response is a synonym for one of the recognized answers. If this attempt also fails, MYCIN uses Interlisp spelling-correction routines (Teitelman, 1974) to see if a simple spelling or typographical error will account for the unrecognized response. If so, the program makes the correction, prints its assumption, and proceeds as though the user had made no error. If none of these mechanisms succeeds, MYCIN tells the user that the response is not recognized, displays a list of sample responses, and asks the question again.
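A Python sketch of this screening cascade (hypothetical; `difflib` stands in for the Interlisp spelling-correction routines, and the similarity cutoff is invented):

```python
import difflib

def screen_response(answer, expected, synonyms):
    """Screen a typed answer: exact match, then dictionary synonym,
    then spelling correction; None means ask the question again."""
    answer = answer.strip().upper()
    if answer in expected:
        return answer
    if answer in synonyms:
        return synonyms[answer]                    # standardized synonym
    close = difflib.get_close_matches(answer, expected, n=1, cutoff=0.8)
    if close:
        print("Assuming you meant %s" % close[0])  # print the assumption
        return close[0]
    return None                                    # unrecognized: reprompt
```

Each stage fails over to the next, so the physician is reprompted only when all three mechanisms fail.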
Multi-valued parameters are handled somewhat differently. FINDOUT recursively traces such parameters in the normal fashion, but when forced to ask a question of the user, it customizes its question to the condition being evaluated in the MONITOR. Suppose, for example, the MONITOR were evaluating the condition (SAME CNTXT INFECT MENINGITIS), i.e., "Meningitis is an infectious disease diagnosis for the patient." If FINDOUT were to ask the question using the regular PROMPT strategy, it would request:

What is the infectious disease diagnosis for PATIENT-1?

The problem is that the patient may have several diagnoses, each of which can be expressed in a variety of ways. If the physician were to respond:

A meningeal inflammation that is probably of infectious origin

MYCIN would be forced to try to recognize that this answer implies meningitis. Our solution has been to customize questions for multi-valued parameters to reflect the value being checked in the current premise condition. The PROMPT1 property is used, and questions always expect a yes or no response:

Is there evidence that the patient has a meningitis?

The advantages of this approach are the resulting ability to avoid natural language processing during the consultation itself and the posing of questions that are specific to the patient under consideration.
In addition to the automatic spelling-correction capability described above, there are a number of options that may be utilized whenever MYCIN asks the user a question:

UNKNOWN   Used to indicate that the physician does not know the answer to the question, usually because the data are unavailable (may be abbreviated U or UNK)
?         Used to request a list of sample recognized responses
??        Used to request a list of all recognized responses
RULE      Used to request that MYCIN display the translation of the current decision rule. FINDOUT simply translates the rule being considered by the MONITOR. This feature provides a simple capability for explaining why the program is asking the question. However, it cannot explain motivation beyond the current decision rule.
QA        Used to digress temporarily in order to use the Explanation System. The features of this system are explained in Chapter 18.
WHY       Used to request a detailed explanation of the question being asked. This feature is much more conversational than the RULE option above and permits investigation of the current state of the entire reasoning chain.

CHANGE ### Used to change the answer to a previous question. Whenever MYCIN asks a question, it prints a number in front of the prompt. Thus CHANGE 4 means "Go back and let me reanswer question 4." The complexities involved in this process are discussed below.
STOP      Halts the program without completing the consultation
HELP      Prints this list

5.2.2 Creation of the Dynamic Data Base

The Consultation System maintains an ongoing record of the consultation. These dynamic data include information entered by the user, inferences drawn using decision rules, and record-keeping data structures that facilitate question answering by the Explanation System (Chapter 18).

Data Acquired from the User

Except for questions related to propagation of the context tree, all queries from MYCIN to the physician request the value of a specific clinical parameter for a specific node in the context tree. The FINDOUT mechanism screens the user's response, stores it in MYCIN's dynamic data base, and returns the value to the MONITOR for evaluation of the conditional statement that generated the question in the first place. The physician's response is stored, of course, so that future rules containing conditions referencing the same clinical parameter will not cause the question to be asked a second time.
As has been noted, however, the values of clinical parameters are always stored along with their associated certainty factors. A physician's response must therefore have a CF associated with it. MYCIN's convention is to assume CF = 1 for the response unless the physician explicitly states otherwise. Thus the following exchange:

7) Staining characteristics of ORGANISM-1 (gram):
** GRAMNEG

results in: Val[ORGANISM-1,GRAM] = ((GRAMNEG 1.0))

If, on the other hand, the user is fairly sure of the answer to a question but wants to indicate uncertainty, he or she may enter a certainty factor in parentheses after the response. MYCIN expects the number to be an integer between -10 and +10; the program divides the number by 10 to obtain a CF. Using integers simplifies the user's response and also discourages comparisons between the number and a probability measure. Thus the following exchange:

8) Enterthe identity (genus)ORG


ANISM-l:
** ENTEROCOCCUS (8)

results in: VaI[ORGANISM-I,IDENT]= ((STREPTOCOCCUS-GROUP-D


,8))

This example also shows how the dictionary is used to put synonyms into
standardized form for the patients data base (i.e., Enterococcus is another
name for a group-D Streptococcus).
A variant of this last example is the user's option to enter multiple responses to a question, as long as each is modified by a CF. For example:

13) Did ORGANISM-2 grow in clumps, chains, or pairs?
** CLUMPS (6) CHAINS (3) PAIRS (-8)

results in: Val[ORGANISM-2,CONFORM] = ((CLUMPS .6) (CHAINS .3) (PAIRS -.8))

The CFs associated with the parameter values are then used for evaluation of premise conditions as described earlier. Note that the user's freedom to modify answers increases the flexibility of MYCIN's reasoning. Without the CF option, the user might well have responded UNKNOWN to question 13 above. The demonstrated answer, although uncertain, gives MYCIN much more information than would have been provided by a response of UNKNOWN.
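The response convention just described (a bare value defaults to CF = 1.0; an appended integer between -10 and +10 is divided by 10) can be sketched as follows. This is an illustrative reconstruction in Python, not MYCIN's actual Interlisp code, and the function name is invented:

```python
import re

def parse_response(reply):
    """Split a reply such as "CLUMPS (6) CHAINS (3)" into (value, CF)
    pairs; a bare value defaults to CF = 1.0, and an integer in
    parentheses is divided by 10 to obtain the CF."""
    pairs = []
    for value, cf in re.findall(r"([A-Z][A-Z\d-]*)(?:\s*\((-?\d+)\))?", reply):
        pairs.append((value, int(cf) / 10 if cf else 1.0))
    return pairs
```

A synonym such as ENTEROCOCCUS would then still be mapped through the dictionary to its standardized form before storage, as in the exchange above.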

Data Inferred by the System

This subsection explains the <conclusion> item from the BNF rule description, i.e., the functions that are used in action or else clauses when a premise has shown that an indicated conclusion may be drawn. There are only three such functions, two of which (CONCLIST and TRANSLIST) reference knowledge tables (Section 5.1.6) but are otherwise dependent on the third, a function called CONCLUDE. CONCLUDE takes five arguments:

CNTXT    The node in the context tree about which the conclusion is being made
PARAM    The clinical parameter whose value is being added to the dynamic data base
VALUE    The inferred value of the clinical parameter
TALLY    The certainty tally for the premise of the rule (see Section 5.1.5)
CF       The certainty factor for the rule as judged by the expert from whom the rule was obtained

The translation of CONCLUDE depends on the size of CF:

|CF| >= .8        "There is strongly suggestive evidence that..."
.4 <= |CF| < .8   "There is suggestive evidence that..."
|CF| < .4         "There is weakly suggestive evidence that..."
Computed CF       "There is evidence that..."

Thus the following conclusion:

(CONCLUDE CNTXT IDENT STREPTOCOCCUS TALLY .7)

translates as:

There is suggestive evidence (.7) that the identity of the organism is streptococcus

If, for example, the rule with this action clause were successfully applied to ORGANISM-1, an organism for which no previous inferences had been made regarding identity, the result would be:

Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS X))

where X is the product of .7 and TALLY (see Combining Function 4, Chapter 11). Thus the strength of the conclusion reflects both the CF for the rule and the extent to which the premise of the rule is believed to be true for ORGANISM-1.

Suppose a second rule were now found that contains a premise true for ORGANISM-1 and that adds additional evidence to the assertion that the organism is a Streptococcus. This new evidence somehow has to be combined with the CF (=X) that is already stored for the hypothesis that ORGANISM-1 is a Streptococcus. If Y is the CF calculated for the second rule (i.e., the product of the TALLY for that rule and the CF assigned to the rule by the expert), the CF for the hypothesis is updated to Z so that:

Val[ORGANISM-1,IDENT] = ((STREPTOCOCCUS Z))

where Combining Function 1 gives Z = X + Y(1 - X). This function is justified and discussed in detail in Chapter 11.
Similarly, additional rules leading to alternate hypotheses regarding the identity of ORGANISM-1 may be successfully invoked. The new hypotheses, along with their associated CFs, are simply appended to the list of hypotheses in Val[ORGANISM-1,IDENT]. Note, of course, that the CFs of some hypotheses may be negative, indicating that there is evidence suggesting that the hypothesis is not true. When there is both positive and negative evidence for a hypothesis, Combining Function 1 must be used in a modified form.
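For concreteness, the incremental combination can be sketched in Python. The same-sign cases follow Combining Function 1 directly; the mixed-sign branch shown here is the modified form as published for later versions of MYCIN (discussed in Chapter 11) and is included as an assumption, not a quotation from this chapter:

```python
def combine(x, y):
    """Combine two CFs (each in [-1, 1]) bearing on the same hypothesis."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)          # Combining Function 1
    if x <= 0 and y <= 0:
        return x + y * (1 + x)          # symmetric form for disconfirming evidence
    # one confirming, one disconfirming (later published variant)
    return (x + y) / (1 - min(abs(x), abs(y)))
```

Note that the same-sign forms are commutative and associative, which is why the order of the non-self-referencing rules does not matter (Section 5.2.3).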
A final point to note is that values of parameters are stored identically regardless of whether the information has been inferred or acquired from the user. The source of a piece of information is maintained in a separate record. It is therefore easy to incorporate new rules that infer values of parameters for which ASK2 questions to the user were once necessary.

Creating an Ongoing Consultation Record

In addition to information provided or inferred regarding nodes in the context tree, MYCIN's dynamic data base contains a record of the consultation session. This record provides the basis for answering questions about the consultation (Chapter 18).

Two general types of records are kept. One type is information about how values of clinical parameters were obtained. If the value was inferred using rules, a record of those inferences is stored with the rules themselves. Thus whenever an action or else clause is executed, MYCIN keeps a record of the details. The second type of record provides a mechanism for explaining why questions were asked. MYCIN maintains a list of questions, their identifying numbers, the clinical parameter and context involved, plus the rule that led to generation of the question. This information is useful when the user retrospectively requests an explanation for a previous question (Chapter 18).

5.2.3 Self-Referencing Rules

As new rules were acquired from the collaborating experts, it became apparent that MYCIN would need a small number of rules that departed from the strict modularity to which we had otherwise been able to adhere. For example, one expert indicated that he would tend to ask about the typical Pseudomonas-type skin lesions only if he already had reason to believe that the organism was a Pseudomonas. If the lesions were then said to be evident, however, his belief that the organism was a Pseudomonas would be increased even more. A rule reflecting this fact must somehow imply an orderedness of rule invocation; i.e., "Don't try this rule until you have already traced the identity of the organism by using other rules in the system." Our solution has been to reference the clinical parameter early in the premise of the rule as well as in the action, for example:

RULE040
IF: 1) The site of the culture is blood, and
    2) The identity of the organism may be pseudomonas, and
    3) The patient has ecthyma gangrenosum skin lesions
THEN: There is strongly suggestive evidence (.8) that the
      identity of the organism is pseudomonas

Note that RULE040 is thus a member of both the LOOKAHEAD property and the UPDATED-BY property for the clinical parameter IDENT. Rules having the same parameter in both premise and action are termed self-referencing rules. The ordered invocation of such rules is accomplished by a generalized procedure described below.
As discussed in Section 5.2.1, a rule such as RULE040 is originally invoked because MYCIN is trying to infer the identity of an organism; i.e., FINDOUT is asked to trace the parameter IDENT and recursively sends the UPDATED-BY list for that parameter to the MONITOR. When the MONITOR reaches RULE040, however, the second premise condition references the same clinical parameter currently being traced by FINDOUT. If the MONITOR merely passed IDENT to FINDOUT again (as called for by the simplified flow chart in Figure 5-6), FINDOUT would begin tracing IDENT for a second time, RULE040 would be passed to the MONITOR yet again, and an infinite loop would occur.
The solution to this problem is to let FINDOUT screen the list called Y in Figure 5-7, i.e., the UPDATED-BY property for the parameter it is about to trace. Y is partitioned by FINDOUT into regular rules and self-referencing rules (where the latter category is defined as those rules that also occur on the LOOKAHEAD list for the clinical parameter). FINDOUT passes the first group of rules to the MONITOR in the normal fashion. After all these rules have been tried, FINDOUT marks the parameter as having been traced and then passes the self-referencing rules to the MONITOR. In this way, when the MONITOR considers the second condition in the premise of RULE040, the condition is evaluated without a call to FINDOUT because the parameter has already been marked as traced. Thus the truth of the premise of a self-referencing rule is determined on the basis of the set of non-self-referencing rules, which were evaluated first. If one of the regular rules permitted MYCIN to conclude that an organism might be a Pseudomonas, RULE040 might well succeed when passed to the MONITOR. This mechanism for handling self-referencing rules satisfies the intention of an expert when he or she gives us decision criteria in self-referencing form.

It should be noted that this approach minimizes the potential for self-referencing rules to destroy certainty factor commutativity. By holding these rules until last, we insure that the certainty tally for any of their premises (see Section 5.1.5) is the same regardless of the order in which the non-self-referencing rules were executed. If there is more than one self-referencing rule successfully executed for a given context and parameter, however, the order of their invocation may affect the final CF. The approach we have implemented thus seeks merely to minimize the potential undesirable effects of self-referencing rules.
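The deferral of self-referencing rules can be sketched as follows. The function and argument names are invented for illustration (MYCIN itself implemented this in Interlisp):

```python
def trace_with_deferral(param, updated_by, lookahead, apply_rule, mark_traced):
    """Apply a parameter's UPDATED-BY rules, holding back the
    self-referencing ones (those also on its LOOKAHEAD list)
    until after the parameter has been marked as traced."""
    regular = [r for r in updated_by if r not in lookahead]
    self_ref = [r for r in updated_by if r in lookahead]
    for rule in regular:       # ordinary rules establish the initial belief
        apply_rule(rule)
    mark_traced(param)         # premise clauses on `param` now evaluate
    for rule in self_ref:      # ...without re-invoking FINDOUT
        apply_rule(rule)
```

With RULE040 on both lists for IDENT, it is applied only after the regular rules, matching the behavior described above.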

5.2.4 Preventing Reasoning Loops

Self-referencing rules are actually a special case of a more general problem. Reasoning loops involving multiple rules cannot be handled by the mechanism described above. The difference is that self-referencing rules are intentional parts of MYCIN's knowledge base whereas reasoning loops are artifacts that must somehow be avoided.

For the following discussion we introduce the following notation:

[q] X ::> Y

means that decision rule [q] uses clinical parameter X to reach a conclusion regarding the value of clinical parameter Y. Thus a self-referencing rule may be represented by:

[a] E ::> E

where E is the clinical parameter that is referenced in both the premise and the action of the rule. Consider now the following set of rules:

[1] A ::> B
[2] B ::> C
[3] C ::> D
[4] D ::> A

Rule [1], for example, says that under certain unspecified conditions, the value of A can be used to infer the value of B. Now suppose that the MONITOR asks FINDOUT to trace the clinical parameter D. Then MYCIN's recursive mechanism would create the following reasoning chain:

      [4]    [1]    [2]    [3]
... D ::> A ::> B ::> C ::> D

The difference between this looped reasoning chain and a self-referencing rule is that Rule [4] was provided as a mechanism for deducing the value of A, not for reinforcing the system's belief in the value of D. In cases where the value of A is of primary interest, the use of Rule [4] would be appropriate.

MYCIN solves this problem by keeping track of all parameters currently being traced by the FINDOUT mechanism. The MONITOR then simply ignores a rule if one of the parameters checked in its premise is already being traced. The result, with the value of D as the goal, is a three-membered reasoning chain in the case above:

   [1]    [2]    [3]
A ::> B ::> C ::> D

Rule [4] is rejected because parameter D is already being traced elsewhere in the current reasoning chain. If the value of A were the main goal, however, the chain would be

   [2]    [3]    [4]
B ::> C ::> D ::> A

Note that this simple mechanism allows us to have potential reasoning loops in the knowledge base but to select only the relevant nonlooping portions for consideration of a given patient.

A similar problem can occur when a rule permits two conclusions to be made, each about a different clinical parameter. MYCIN prevents loops in such circumstances by refusing to permit the same rule to occur twice in the current reasoning chain.
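The chain construction above can be sketched with a recursive trace that carries the set of parameters currently being traced; the names and the data layout are illustrative only:

```python
def trace(goal, updated_by, tracing=frozenset()):
    """Return the rules applied while tracing `goal`, skipping any rule
    whose premise mentions a parameter already being traced (which
    would close a reasoning loop). `updated_by` maps a parameter to
    a list of (rule_number, premise_parameters) pairs."""
    tracing = tracing | {goal}
    applied = []
    for rule, premises in updated_by.get(goal, []):
        if any(p in tracing for p in premises):
            continue                      # rejected: would close a loop
        for p in premises:
            applied += trace(p, updated_by, tracing)
        applied.append(rule)
    return applied
```

Run on the four rules above, tracing D yields the chain [1], [2], [3] with Rule [4] rejected, while tracing A yields [2], [3], [4] with Rule [1] rejected, exactly as described.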

5.3 Propagation of the Context Tree

The mechanism by which the context tree is customized for a given patient has not yet been discussed. As described in Section 5.2.2, the consultation system begins simply by creating the patient context and then attempting to execute the goal rule. All additional nodes in the context tree are thus added automatically during the unwinding of MYCIN's reasoning regarding the premise of the goal rule. This section first explains the data structures used for creating new nodes. Mechanisms for deciding when new nodes should be added are then discussed.

5.3.1 Data Structures Used for Sprouting Branches

Section 5.1.2 was devoted to an explanation of the context tree. At that time we described the different kinds of contexts and explained that each node in the tree is an instantiation of the appropriate context-type. Each context-type is characterized by the following properties:

PROMPT1      A sentence used to ask the user whether the first node of this type should be added to the context tree; expects a yes-no answer
PROMPT2      A sentence used to ask the user whether subsequent nodes of this type should be added to the context tree
PROMPT3      Replaces PROMPT1 when it is used. This is a message to be printed out if MYCIN assumes that there is at least one node of this type in the tree.
PROPTYPE     Indicates the category of clinical parameters (see Section 5.1.3) that may be used to characterize a context of this type
SUBJECT      Indicates the categories of rules that may be applied to a context of this type
SYN          Indicates a conversational synonym for referring to a context of this type. MYCIN uses SYN when filling in the asterisk of PROMPT properties for clinical parameters.
TRANS        Used for English translations of rules referencing this type of context
TYPE         Indicates what kind of internal name to give a context of this type
MAINPROPS    Lists the clinical parameters, if any, that are to be automatically traced (by FINDOUT) whenever a context of this type is created
ASSOCWITH    Gives the context-type of nodes in the tree immediately above contexts of this type

Two sample context-types are shown in Figure 5-9. The following observations may help clarify the information given in that figure:

1. PRIORCULS: Whenever a prior culture is created, it is given the name CULTURE-# (see TYPE), where # is the next unassigned culture number. The values of SITE and WHENCUL are immediately traced using the FINDOUT mechanism (see MAINPROPS). The culture node is put in the context tree below a node of type PERSON (see ASSOCWITH), and the new context may be characterized by clinical parameters of the type PROP-CUL (see PROPTYPE). The prior culture may be the context for either PRCULRULES or CULRULES (see SUBJECT) and is translated, in questions to the user, as "this (site) culture" (see SYN), where (site) is replaced by the site of the culture if it is known.

2. CURORG: Since there is a PROMPT3 rather than a PROMPT1, MYCIN prints out the PROMPT3 message and assumes (without asking) that there is at least one CURORG for each CURCUL (see ASSOCWITH); the other CURORG properties correspond to those described above for PRIORCULS.

Whenever MYCIN creates a new context using these models, it prints out the name of the new node in the tree, e.g.:

------ORGANISM-1------

Thus the user is familiar with MYCIN's internal names for the cultures, organisms, and drugs under discussion. The node names may then be used in MYCIN's questions at times when there may be ambiguity regarding which node is the current context, e.g.:

Is the patient's illness with the staphylococcus (ORGANISM-2) a hospital-acquired infection?

PRIORCULS
ASSOCWITH: PERSON
MAINPROPS: (SITE WHENCUL)
PROMPT1: (Were any organisms that were significant (but no longer require therapeutic attention) isolated within the last approximately 30 days?)
PROMPT2: (Any other significant earlier cultures from which pathogens were isolated?)
PROPTYPE: PROP-CUL
SUBJECT: (PRCULRULES CULRULES)
SYN: (SITE (this * culture))
TRANS: (PRIOR CULTURES OF *)
TYPE: CULTURE-

CURORG
ASSOCWITH: CURCUL
MAINPROPS: (IDENT GRAM MORPH SENSITIVS)
PROMPT2: (Any other organisms isolated from * for which you would like a therapeutic recommendation?)
PROMPT3: (I will refer to the first offending organism from * as:)
PROPTYPE: PROP-ORG
SUBJECT: (ORGRULES CURORGRULES)
SYN: (IDENT (the *))
TRANS: (CURRENT ORGANISMS OF *)
TYPE: ORGANISM-

FIGURE 5-9 Context trees such as that shown in Figure 5-1 are generated from prototype context-types such as those shown here. The defining properties are described in the text.

It should also be noted that when PROMPT1 or PROMPT2 is used to ask a question, the physician need not be aware that the situation is different from that occurring when FINDOUT asks questions. All the user options described in Section 5.2.1 operate in the normal fashion.
Finally, the MAINPROPS property (later called INITIALDATA) requires brief explanation. The claim was previously made that clinical parameters are traced and their values requested by FINDOUT only when they are needed for evaluation of a rule that has been invoked. Yet we must now acknowledge that certain LABDATA parameters are automatically traced whenever a node for the context tree is created. The reason for this departure is an attempt to keep the program acceptable to physicians. Since the order of rules on UPDATED-BY lists is arbitrary, the order in which questions are asked is somewhat arbitrary as well. We have found that physicians are annoyed if the "basic" questions are not asked first, as soon as the context is created. The MAINPROPS convention forces certain standard questions early in the characterization of a node in the context tree. Parameters not on the MAINPROPS list are then traced in an arbitrary order that depends on the order in which rules are invoked. Since the parameters on MAINPROPS lists are important pieces of information that would uniformly be traced by FINDOUT anyway, the convention we have implemented forces a standardized ordering of the "basic" questions without generating useless information.

5.3.2 Explicit Mechanisms for Branching

There are two situations under which MYCIN attempts to add new nodes to the context tree. The simpler case occurs when rules explicitly reference contexts that have not yet been created. Suppose, for example, MYCIN is trying to determine the identity of a current organism and therefore invokes the following CURORGRULE:

IF: 1) The identity of the organism is not known with certainty, and
    2) This current organism and prior organisms of the patient agree
       with respect to the following properties: GRAM MORPH
THEN: There is weakly suggestive evidence that each of them is a
      prior organism with the same identity as this current organism

The second condition in the premise of this rule references other nodes in the tree, namely nodes of the type PRIORORGS. If no such nodes exist, the MONITOR asks FINDOUT to trace PRIORORGS in the normal fashion. The difference is that PRIORORGS is not a clinical parameter but a context-type. FINDOUT therefore uses PROMPT1 of PRIORORGS to ask the user if there is at least one organism. If so, an instantiation of PRIORORGS is added to the context tree, and its MAINPROPS are traced. PROMPT2 is then used to see if there are any additional prior organisms, and the procedure continues until the user indicates there are no more PRIORORGS that merit discussion. Finally, FINDOUT returns the list of prior organisms to the MONITOR so that the second condition in the rule above can be evaluated.

5.3.3 Implicit Mechanisms for Branching

There are two kinds of implicit branching mechanisms. One of these is closely associated with the example of the preceding section. As shown in Figure 5-1, a prior organism is associated with a prior culture. But the explicit reference to prior organisms in the rule above made no mention of prior cultures. Thus if FINDOUT tries to create a PRIORORGS in response to an explicit reference but finds there are no PRIORCULS, the program knows there is an implied need to ask the user about prior cultures before asking about prior organisms. Since PRIORCULS are associated with the patient, and since the patient node already exists in the context tree, only one level of implicit branching is required in the evaluation of the rule.

The other kind of implicit branching occurs when the MONITOR attempts to evaluate a rule for which no appropriate context exists. For example, the first rule invoked in an effort to execute the goal rule is a CURORGRULE (see RULE090, Figure 5-8). Since no current organism has been created at the time the MONITOR is passed this CURORGRULE, MYCIN automatically attempts to create the appropriate nodes and then to apply the invoked rule to each.

5.4 Selection of Therapy

The preceding discussion concentrated on the premise of MYCIN's principal goal rule (RULE092). This section explains what happens when the premise is found to be true and the two-step action clause is executed. Unlike other rules in the system, the goal rule does not lead to a conclusion (Section 5.2.2) but instead instigates actions. The functions in the action of the goal rule thus correspond to the <actfunc> class that was introduced in the BNF description. The first of these functions causes a list of potential therapies to be created. The second allows the best drug or drugs to be selected from the list of possibilities.

5.4.1 Creation of the Potential Therapy List

There is a class of decision rules, the THERULES, that are never invoked by MYCIN's regular control structure because they do not occur on the UPDATED-BY list of any clinical parameter. These rules contain sensitivity information for the various organisms known to the system, for example:

IF: The identity of the organism is pseudomonas
THEN: I recommend therapy chosen from among the following drugs:
    1 - colistin (.96)
    2 - polymyxin (.96)
    3 - gentamicin (.96)
    4 - carbenicillin (.65)
    5 - sulfisoxazole (.64)

The numbers associated with each drug are the probabilities that a Pseudomonas isolated at Stanford Hospital will be sensitive (in vitro) to the indicated drug. The sensitivity data were acquired from Stanford's microbiology laboratory (and could easily be adjusted to reflect changing resistance patterns at Stanford or the data for some other hospital desiring a version of MYCIN with local sensitivity information). Rules such as the one shown here provide the basis for creating a list of potential therapies. There is one such rule for every kind of organism known to the system.
MYCIN selects drugs only on the basis of the identity of offending organisms. Thus the program's first task is to decide, for each current organism deemed to be significant, which hypotheses regarding the organism's identity (IDENT) are sufficiently likely that they must be considered in choosing therapy. MYCIN uses the CFs of the various hypotheses in order to select the most likely identities. Each identity is then given an item number (see below) and the process is repeated for each significant current organism. The Set of Indications for therapy is then printed out, e.g.:

My therapy recommendation will be based on the following possible
identities of the organism(s) that seem to be significant:
<Item 1> The identity of ORGANISM-1 may be STREPTOCOCCUS-GROUP-D
<Item 2> The identity of ORGANISM-1 may be STREPTOCOCCUS-ALPHA
<Item 3> The identity of ORGANISM-2 is PSEUDOMONAS
Each item in this list of therapy indications corresponds to one of the THERULES. Thus MYCIN retrieves the list of potential therapies for each indication from the associated THERULE. The default (in vitro) statistical data are also retrieved. MYCIN then replaces the default sensitivity data with real data about those of the patient's organisms, if any, for which actual sensitivity information is available from the laboratory. Furthermore, if MYCIN has inferred sensitivity information from the in vivo performance of a drug that has already been administered to the patient, this information also replaces the default sensitivity data. Thus the compiled list of potential therapies is actually several lists, one for each item in the Set of Indications. Each list contains the names of drugs and, in addition, the associated numbers representing MYCIN's judgment regarding the organism's sensitivity to each of the drugs.
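The compilation of one item's therapy list can be sketched as a simple merge of the three data sources. The names are invented, and the precedence shown (laboratory results overriding in vivo inferences, both overriding the defaults) is an assumption for illustration:

```python
def compile_therapy_list(defaults, in_vivo, lab):
    """Build one indication's drug list: start from the THERULE's
    default in vitro sensitivities, then override with inferred
    in vivo performance and, finally, actual laboratory results."""
    merged = dict(defaults)
    merged.update(in_vivo)   # inferred from drugs already administered
    merged.update(lab)       # actual lab sensitivity data
    return merged
```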

5.4.2 Selecting the Preferred Drug from the List

When MYCIN recommends therapy, it tries to suggest a drug for each of the items in the Set of Indications. Thus the problem reduces to selecting the best drug from the therapy list associated with each item. Clearly, the probability that an organism will be sensitive to a drug is an important factor in this selection process. However, there are several other considerations. MYCIN's strategy is to select the best drug on the basis of sensitivity information but then to consider contraindications for that drug. Only if a drug survives this second screening step is it actually recommended. Furthermore, MYCIN also looks for ways to minimize the number of drugs recommended and thus seeks therapies that cover for more than one of the items in the Set of Indications. The selection/screening process is described in the following two subsections.

Choosing the Apparent First-Choice Drug

The procedure used for selecting the apparent first-choice drug is a complex algorithm that is somewhat arbitrary and is thus currently (1974) under revision. This section describes the procedure in somewhat general terms since the actual LISP functions and data structures are not particularly enlightening.

There are three initial considerations used in selecting the best therapy for a given item:

1. the probability that the organism is sensitive to the drug;
2. whether the drug is already being administered;
3. the relative efficacy of drugs that are otherwise equally supported by the first two criteria.

As is the case with human consultants, MYCIN does not insist on a change in therapy if the physician has already begun a drug that may work, even if that drug would not otherwise be MYCIN's first choice. Drugs with sensitivity numbers within .05 of one another are considered to be almost identical on the basis of the first criterion. Thus the rule in the previous section, for example, indicates no clear preference among colistin, polymyxin, and gentamicin⁵ for Pseudomonas infections (if default sensitivity information from the rule is used). However, our collaborating experts have ranked the relative efficacy of antimicrobials on a scale from 1 to 10. The number reflects such factors as whether the drug is bacteriostatic or bacteriocidal or its tendency to cause allergic sensitization. Since gentamicin has a higher relative efficacy than either colistin or polymyxin, it is the first drug considered for Pseudomonas infections (unless known sensitivity information or previous drug experience indicates that an alternate choice is preferable).
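The ranking just described can be sketched as follows. The dictionary layout and the efficacy numbers in the example are invented (the text gives only the 1-to-10 scale), and treating "already being administered" as a preference among near-equal sensitivities is an interpretation of the criteria above:

```python
def first_choice(drugs, current_therapy=()):
    """Pick the apparent first-choice drug: sensitivities within .05
    of the best are treated as ties; a drug already being given wins
    among the near-equals; remaining ties go to higher efficacy."""
    best = max(d["sens"] for d in drugs)
    near = [d for d in drugs if best - d["sens"] <= 0.05]
    given = [d for d in near if d["name"] in current_therapy]
    pool = given or near
    return max(pool, key=lambda d: d["efficacy"])["name"]
```

With the Pseudomonas sensitivities from the THERULE above and hypothetical efficacy rankings favoring gentamicin, the sketch reproduces the stated behavior: gentamicin is chosen unless, say, colistin is already being administered.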
Once MYCIN has selected the apparent best drug for each item in the Set of Indications, it checks to see if one of the drugs is also useful for one or more of the other indications. For example, if the first-choice drug for Item 1 is the second-choice drug for Item 2 and if the second-choice drug for Item 2 is almost as strongly supported as the first-choice drug, Item 1's first-choice drug also becomes Item 2's first-choice drug. This strategy permits MYCIN to attempt to minimize the number of drugs to be recommended.

⁵Ed. note: Amikacin and tobramycin were not yet available in 1974 when this rule was written. The knowledge base was later updated with the new drug information.
A similar strategy is used to avoid giving two drugs of the same drug class. For example, MYCIN knows that if the first choice for one item is penicillin and the first choice for another is ampicillin, then the ampicillin may be given for both indications (because ampicillin covers essentially all organisms sensitive to penicillin).

In the ideal case MYCIN will find a single drug that effectively covers for all the items in the Set of Indications. But even if each item remains associated with a different drug, a screening stage to look for contraindications is required. This rule-based process is described in the next subsection. It should be stressed, however, that the manipulation of drug lists described above is algorithmic; i.e., it is coded in LISP functions that are called from the action clause of the goal rule. There is considerable "knowledge" in this process. Since rule-based knowledge provides the foundation of MYCIN's ability to explain its decisions, it would be desirable eventually to remove this therapy selection method from functions and place it in decision rules.⁶

Rule-Based Screening for Contraindications

Unlike the complex list manipulations described in the preceding subsection, criteria for ruling out drugs under consideration may be effectively placed in rules. The rules in MYCIN for this purpose are termed ORDERRULES. A sample rule of this type is:

IF: 1) The therapy under consideration is tetracycline, and
    2) The age (in years) of the patient is less than
THEN: There is strongly suggestive evidence (.8) that
      tetracycline is not a potential therapy for use
      against the organism

In order to use MONITOR and FINDOUT with such rules, we must construct appropriate nodes in the context tree and must be able to characterize them with clinical parameters. The context-type used for this purpose is termed POSSTHER and the parameters are classified as PROP-THER. Thus when MYCIN has selected the apparent best drugs for the items in the Set of Indications, it creates a context corresponding to each of these drugs. POSSTHER contexts occur below CURORGS in the context tree. FINDOUT is then called to trace the relevant clinical parameter, which collects contraindication information (i.e., this becomes a new goal statement), and the normal recursive mechanism through the MONITOR insures that the proper ORDERRULES are invoked.

⁶Ed. note: See the next chapter for a discussion of how this was later accomplished.
ORDERRULES allow a great deal of drug-specific knowledge to be stored. For example, the rule above insures that tetracycline is ruled out in youngsters who still have developing bone and teeth.⁷ Similar rules tell MYCIN never to give streptomycin or carbenicillin alone, not to give sulfonamides except in urinary tract infections, and not to give cephalothin, clindamycin, lincomycin, vancomycin, cefazolin, or erythromycin if the patient has meningitis. Other ORDERRULES allow MYCIN to consider the patient's drug allergies, dosage modifications, or ecological considerations (e.g., save gentamicin for Pseudomonas, Serratia, and Hafnia unless the patient is so sick that you cannot risk using a different aminoglycoside while awaiting lab sensitivity data). Finally, there are rules that suggest appropriate combination therapies (e.g., add carbenicillin to gentamicin for known Pseudomonas infections). In considering such rules MYCIN often is forced to ask questions that never arose during the initial portion of the consultation. Thus the physician is asked additional questions during the period after MYCIN has displayed the items in the Set of Indications but before any therapy is actually recommended.
After the presumed first-choice drugs have been exposed to the ORDERRULE screening process, MYCIN checks to see whether any of the drugs is now contraindicated. If so, the drug-ranking process is repeated. New first-choice drugs are then subjected to the ORDERRULES. The process continues until all the first-choice drugs have been instantiated as POSSTHERs. These then become the system's recommendations. Note that this strategy may result in the recommendation of drugs that are only mildly contraindicated so long as they are otherwise strongly favored. The therapy recommendation itself takes the following form:
My preferred therapy recommendation is as follows:
In order to cover for Items <1> <2> <3>:
Give the following in combination:
1. PENICILLIN
   Dose: 285,000 UNITS/KG/DAY - IV
2. GENTAMICIN
   Dose: 1.7 MG/KG Q8H - IV OR IM
   Comments: MODIFY DOSE IN RENAL FAILURE

The user may also ask for second, third, and subsequent therapy recommendations until MYCIN is able to suggest no reasonable alternatives. The mechanism for these iterations is merely a repeat of the processes described above but with recommended drugs removed from consideration.

7. Ed. note: This rule ignores any statement of the mechanism whereby its conclusion follows from its premise. The lack of underlying "support" knowledge accounts for changes introduced in GUIDON when MYCIN's rules were used for education. See Part Eight for further discussion of this point.
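The screen-and-rerank loop described above can be sketched compactly. The following is a minimal Python illustration, not MYCIN's Interlisp code; the function name, data shapes, and drug lists are all invented for this example.

```python
def select_therapy(candidates, contraindicated):
    """candidates: {item: [drugs, best first]}; contraindicated: drugs
    ruled out by ORDERRULE-style screening."""
    remaining = {item: list(drugs) for item, drugs in candidates.items()}
    while True:
        first_choices = {item: drugs[0] for item, drugs in remaining.items()}
        bad = {d for d in first_choices.values() if d in contraindicated}
        if not bad:
            return first_choices          # every first choice survived screening
        for drugs in remaining.values():  # drop contraindicated drugs, re-rank
            drugs[:] = [d for d in drugs if d not in bad]

candidates = {"ITEM-1": ["tetracycline", "penicillin"],
              "ITEM-2": ["gentamicin", "chloramphenicol"]}
# e.g., tetracycline ruled out for a child with developing bone and teeth
rec = select_therapy(candidates, contraindicated={"tetracycline"})
```

Here tetracycline fails the screen, the ranking is repeated, and penicillin becomes the new first choice for ITEM-1 while gentamicin stands for ITEM-2.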

5.5 Mechanisms for Storage of Patient Data

5.5.1 Changing Answers to Questions

If a physician decides he or she wants to change a response to a question that has already been answered, MYCIN must do more than merely redisplay the prompt, accept the user's new answer, and make the appropriate change to the value of the clinical parameter in question. In general, the question was originally asked because the premise of a decision rule referenced the clinical parameter. Thus the original response affected the evaluation of at least one rule, and subsequent pathways in the reasoning network may have been affected as well. It is therefore necessary for MYCIN somehow to return to the state it was in at the time the question was originally asked. Its subsequent actions can then be determined by the corrected user response.
Reversing all decisions made since a question was asked is a complex problem, however. The most difficult task is to determine what portions of a parameter's cumulative CF preceded or followed the question requiring alteration. In fact, the extra data structures needed to permit this kind of backing up are so large and complicated, and would be used so seldom, that it seems preferable simply to restart the consultation from the beginning when the user wants to change one of his or her answers.
Restarting is of course also less than optimal, particularly if it requires that the physician reenter the answers to questions that were correct the first time around. Our desire to make the program acceptable to physicians required that we devise some mechanism for changing answers, but restarting from scratch also had obvious drawbacks regarding user acceptance of the system. We therefore needed a mechanism for restarting MYCIN's reasoning process but avoiding questions that had already been answered correctly. When FINDOUT asks questions, it therefore uses the following three-step algorithm:

1. Before asking the question, check to see if the answer is already stored (in the Patient Data Table--see Step 3 below); if the answer is there, use that value rather than asking the user; otherwise go to Step 2.
2. Ask the question using PROMPT or PROMPT1 as usual.
3. Store the user's response in the dynamic record of facts about the patient, called the Patient Data Table, under the appropriate clinical parameter and context.
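The three steps above amount to a cache keyed by clinical parameter and context. The following Python sketch is an illustrative paraphrase only (MYCIN was written in Interlisp); `findout_ask` and the key shape are invented names, not MYCIN identifiers.

```python
patient_data_table = {}   # (parameter, context) -> user's text response

def findout_ask(parameter, context, ask=input):
    """ask: how to query the user (defaults to reading from the terminal)."""
    key = (parameter, context)
    if key in patient_data_table:          # Step 1: reuse a stored answer
        return patient_data_table[key]
    response = ask("What is the %s of %s? " % (parameter, context))  # Step 2
    patient_data_table[key] = response     # Step 3: record it for later reuse
    return response

# The first call asks; the second finds the answer already stored.
first = findout_ask("SITE", "CULTURE-1", ask=lambda q: "BLOOD")
again = findout_ask("SITE", "CULTURE-1", ask=lambda q: "CSF")  # never asked
```

Because Step 1 consults the table before Step 2 ever runs, replaying a consultation asks the user nothing that the table already answers, which is exactly what the CHANGE mechanism below exploits.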

The Patient Data Table, then, is a growing record of the user's responses to questions from MYCIN. It is entirely separate from the dynamic data record that is explicitly associated with the nodes in the context tree. Note that the Patient Data Table contains only the text responses of the user--there is no CF information (unless included in the user's response), nor are there data derived from MYCIN's rule-based inferences.
The Patient Data Table and the FINDOUT algorithm make the task of changing answers much simpler. The technique MYCIN uses is the following:

a. Whenever the user wants to change the answer to a previous question, he or she enters CHANGE <numbers>, where <numbers> is a list of the questions whose answers need correction.
b. MYCIN looks up the indicated question numbers in its question record.
c. The user's responses to the indicated questions are removed from the current Patient Data Table.
d. MYCIN reinitializes the system, erasing the entire context tree, including all associated parameters; however, it leaves the Patient Data Table intact except for the responses deleted in (c).
e. MYCIN restarts the consultation from the beginning.

This simple mechanism results in a restarting of the Consultation System but does not require that the user enter correct answers a second time. Since the Patient Data Table is saved, Step 1 of the FINDOUT algorithm above will find all the user's responses until the first question requiring alteration is reached. Thus the first question asked the user after he or she gives the CHANGE command is, in fact, the earliest of the questions he or she wants to change. There may be a substantial pause after the CHANGE command while MYCIN reasons through the network to the first question requiring alteration, but a pause is to be preferred over a mechanism requiring reentry of all answers. The implemented technique is entirely general because answers to questions regarding context tree propagation are also stored in the Patient Data Table.
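Steps (a) through (e) can be sketched as delete-and-replay. This Python fragment is a hedged illustration under invented names (the question log, the stand-in `rerun_consultation`, and the sample answers are all hypothetical, not MYCIN's).

```python
question_log = {1: ("SITE", "CULTURE-1"), 2: ("IDENT", "ORGANISM-1")}
table = {("SITE", "CULTURE-1"): "BLOOD", ("IDENT", "ORGANISM-1"): "E.COLI"}
reasked = []

def change(numbers):
    """Steps (b)-(c): drop the corrected answers; steps (d)-(e): restart.
    The context tree would be erased here, but the table survives."""
    for n in numbers:
        table.pop(question_log[n], None)
    return rerun_consultation()

def rerun_consultation():
    # Stand-in for a full restart: stored answers are reused silently and
    # only the deleted question is re-asked (answered anew as "KLEBSIELLA").
    for key in question_log.values():
        if key not in table:
            reasked.append(key)
            table[key] = "KLEBSIELLA"
    return dict(table)

result = change([2])
```

Only question 2 is re-asked; the answer to question 1 is replayed from the table, mirroring the behavior described in the text.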

5.5.2 Remembering Patients for Future Reference

When a consultation is complete, the Patient Data Table contains all responses necessary for generating a complete consultation for that patient. It is therefore straightforward to store the Patient Data Table (on disk or tape) so that it may be reloaded in the future. FINDOUT will automatically read responses from the table, rather than ask the user, so a consultation may be run several times on the basis of only a single interactive session.

There are two reasons for storing Patient Data Tables for future reference. One is their usefulness in evaluating changes to MYCIN's knowledge base. The other is the resulting ability to reevaluate patients once new clinical information becomes available.

Evaluating New Rules

New rules may have a large effect on the way a given patient case is handled by MYCIN. For example, a single rule may reference a clinical parameter not previously sought or may lead to an entirely new chain in the reasoning network. It is therefore useful to reload Patient Data Tables and run a new version of MYCIN on old patient cases. A few new questions may be asked (because their responses are not stored in the Patient Data Table). Conclusions regarding organism identities may then be observed, as may the program's therapeutic recommendations. Any changes from the decisions reached during the original run (i.e., when the Patient Data Table was created) must be explained. When a new version of MYCIN evaluates several old Patient Data Tables in this manner, aberrant side effects of new rules may be found. Thus a library of stored patient cases provides a useful mechanism for screening new rules before they become an integral part of MYCIN's knowledge base.
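In modern terms this is regression testing over a case library. A minimal sketch, with stand-in consultation functions (nothing here is MYCIN code; `run_old` and `run_new` abbreviate full consultations over a stored table):

```python
def screen_new_rules(case_library, run_old, run_new):
    """Return {case_id: (old advice, new advice)} for every case whose
    advice changed; each difference must then be explained by hand."""
    changed = {}
    for case_id, table in case_library.items():
        old, new = run_old(table), run_new(table)
        if old != new:
            changed[case_id] = (old, new)
    return changed

library = {"PT-1": {"SITE": "BLOOD"}, "PT-2": {"SITE": "CSF"}}
run_old = lambda t: "PENICILLIN"
run_new = lambda t: "AMPICILLIN" if t["SITE"] == "CSF" else "PENICILLIN"
diffs = screen_new_rules(library, run_old, run_new)
```

Here only PT-2's advice changes, so only that case demands an explanation from the rule author.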

Reevaluating Patient Cases

The second use for stored Patient Data Tables is the reevaluation of patient data once additional laboratory or clinical information becomes available. If a user answers several questions with UNKNOWN during the initial consultation session, MYCIN's advice will of course be based on less than complete information. After storing the Patient Data Table, however, the physician may return for another consultation in a day or so once he or she has more specific information. MYCIN can use the previous Patient Data Table for responses to questions whose answers are still up to date. The user therefore needs to answer only those questions that reference new information. A mechanism for the physician to indicate directly what new data are available has not yet been automated, however.8
A related capability to be implemented before MYCIN becomes available in the clinical setting is a SAVE command.9 If a physician must leave the computer terminal midway through a consultation, this option will save the current Patient Data Table on the disk. When the physician returns to complete the consultation, he or she will reload the patient record and the session will continue from the point at which the SAVE command was entered.
It should be stressed that saving the current Patient Data Table is not the same as saving the current state of MYCIN's reasoning. Thus, as we have stated above, changes to MYCIN's rule corpus may result in different advice from the same Patient Data Table.

8. Ed. note: A RESTART option was subsequently developed to permit reassessment of cases over time.
9. Ed. note: This option was also subsequently implemented.

5.6 Suggested Improvements to the System

This section summarizes some ideas for improvement of the consultation program described in this chapter. Each of the topics mentioned is the subject of current (1974) efforts by one or more of the researchers associated with the MYCIN project.

5.6.1 Dynamic Ordering of Rules

The order in which rules are invoked by the MONITOR is currently controlled solely by their order on the UPDATED-BY property of the clinical parameter being traced.10 The order of rules on the UPDATED-BY property is also arbitrary, tending to reflect nothing more than the order in which rules were acquired. Since FINDOUT sends all rules on such lists to the MONITOR and since our certainty factor combining function is commutative, the order of rules is unimportant.

Some rules are much more useful than others in tracing the value of a clinical parameter. For example, a rule with a six-condition premise that infers the value of a parameter with a low CF requires a great deal of work (as many as six calls to FINDOUT) with very little gain. On the other hand, a rule with a large CF and only one or two premise conditions may easily provide strong evidence regarding the value of the parameter in question. It may therefore be wise for FINDOUT to order the rules in the UPDATED-BY list on the basis of both information content (CF) and the work necessary to evaluate the premise. Then if the first few rules are successfully executed by the MONITOR, the CF associated with one of the values of the clinical parameter may be so large that invocation of subsequent rules will require more computational effort than they are worth. If FINDOUT therefore ignores such rules (i.e., does not bother to pass them to the MONITOR), considerable time savings may result. Furthermore, entire reasoning chains will in some cases be avoided, and the number of questions asked the user could accordingly be decreased.11
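The proposed ordering can be sketched as a benefit-per-cost sort with an early cutoff. This Python fragment is purely illustrative: the combining function is the standard two-positive-CF formula, but the cutoff value, rule triples, and names are assumptions for the example, not MYCIN's.

```python
def combine(cf1, cf2):
    """Commutative CF combination for two positive certainty factors."""
    return cf1 + cf2 * (1 - cf1)

def trace_parameter(rules, cutoff=0.9):
    """rules: (cf, premise size, name) triples for one clinical parameter.
    Order by expected gain per unit of work; stop once the CF is strong."""
    ordered = sorted(rules, key=lambda r: r[0] / r[1], reverse=True)
    cf, fired = 0.0, []
    for rule_cf, n_conditions, name in ordered:
        if cf >= cutoff:          # remaining rules cost more than they gain
            break
        cf = combine(cf, rule_cf)
        fired.append(name)
    return cf, fired

rules = [(0.2, 6, "weak-but-costly"), (0.8, 1, "strong-and-cheap"),
         (0.6, 2, "moderate")]
cf, fired = trace_parameter(rules)
```

The six-condition, low-CF rule is never passed to the MONITOR: the first two rules already push the combined CF past the cutoff.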

5.6.2 Dynamic Ordering of Conditions Within Rules

The MONITOR diagram in Figure 5-6 reveals that conditions are evaluated strictly in the order in which they occur within the premise of the rule. The order of conditions is therefore important, and the most commonly referenced clinical parameters should be placed earliest in the premise.

10. Ed. note: An exception to this point is the self-referencing rules--see Section 5.2.3.
11. Ed. note: Many of these ideas were later implemented and are briefly mentioned in Chapter 4. For example, meta-rules provided a mechanism for encoding strategies to help select the most pertinent rules in a set, and the concept of a unity path was implemented to favor chains of rules that reached conclusions with certainty at each step in the chain.
Suppose, however, that in a given consultation the clinical parameter referenced in the fourth condition of a rule has already been traced by FINDOUT because it was referenced in some other rule that the MONITOR has already evaluated. As currently designed, MYCIN checks the first three conditions first, even if the fourth condition is already known to be false. Since the first three conditions may well require calls to FINDOUT, the rule may generate unnecessary questions and expand useless reasoning chains.

The solution to this problem would be to redesign the MONITOR so that it reorders the premise conditions, first evaluating those that reference clinical parameters that have already been traced by FINDOUT. In this way a rule will not cause new questions or additions to the reasoning network if any of its conditions are known to be false at the outset.12
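The redesigned premise evaluation can be sketched as a stable sort that moves already-traced parameters to the front. Everything below (parameter names, the `test` and `findout` callables) is hypothetical scaffolding for illustration.

```python
def evaluate_premise(conditions, traced, test, findout):
    """conditions: parameter names in premise order; traced: values already
    known from earlier tracing; findout: traces a parameter (may ask the user)."""
    ordered = sorted(conditions, key=lambda p: p not in traced)  # traced first
    for param in ordered:
        value = traced[param] if param in traced else findout(param)
        if not test(param, value):
            return False               # fail fast: no new questions asked
    return True

questions = []
def findout(param):
    questions.append(param)            # tracing here would query the user
    return True

ok = evaluate_premise(["fever", "age", "wbc", "site"],
                      traced={"site": False},
                      test=lambda p, v: bool(v), findout=findout)
```

Because the fourth condition's parameter is already known false, the rule fails immediately and no questions are generated for the first three conditions.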

5.6.3 Prescreening of Rules

An alternate approach to the problem described in the preceding section would be for FINDOUT to judge the implications of every parameter it traces. Once the value has been determined by the normal mechanism, FINDOUT could use the LOOKAHEAD list for the clinical parameter in order to identify all rules referencing the parameter in their premise conditions. FINDOUT could then evaluate the relevant conditions and mark the rule as failing if the condition turns out to be false. Then, whenever the MONITOR begins to evaluate rules that are invoked by the normal recursive mechanism, it will check to see if the rule has previously been marked as false by FINDOUT. If so, the rule could be quickly ruled out without needing to consider the problem of reordering the premise conditions.

At first glance, the dynamic reordering of premise conditions appears to be a better solution than the one just described. The problem with rule prescreening is that it requires consideration of all rules on the parameter's LOOKAHEAD list, some of which may never actually be invoked during the consultation.13

5.6.4 Placing All Knowledge in Rules

Although most of MYCIN's knowledge is placed in decision rules, we have pointed out several examples of knowledge that is not rule-based. The simple lists and knowledge tables may be justified on the basis of efficiency, especially since those knowledge structures may be directly accessed by rules.

12. Ed. note: The preview mechanism in MYCIN was eventually implemented to deal with this issue.
13. Ed. note: It was for this reason that the idea outlined here was never implemented.
However, the algorithmic mechanisms for therapy selection are somewhat more bothersome. Although we have managed to put many drug-related decision criteria in the ORDERRULES, the mechanisms for creating the potential therapy lists and for choosing the apparent first-choice drug are programmed explicitly in a series of relatively complex LISP functions. Since MYCIN's ability to explain itself is based on rule retrieval, the system cannot give good descriptions of these drug selection procedures. It is therefore desirable to place more of the drug selection knowledge in rules.

Such efforts should provide a useful basis for evaluating the power of our rule-based formalism. If the goal-oriented control structure we have developed is truly general, one would hope that algorithmic approaches to the construction and ordering of lists could also be placed in decision rule format. We therefore intend to experiment with ways for incorporating the remainder of MYCIN's knowledge into decision rules that are invoked by the standard MONITOR/FINDOUT process.14

5.6.5 The Need for a Context Graph

The context tree used by MYCIN is the source of one of the system's primary problems in attempting to simulate the consultation process. Every node in the context tree leads to the uppermost patient node by a single pathway. In reality, however, drugs, patients, organisms, and cultures are not interrelated in this highly structured fashion. For example, drugs are often given to cover for more than one organism. The context tree does not permit a single CURDRUG or PRIORDRUG to be associated with more than a single organism. What we need, therefore, is a network of contexts in the form of a graph rather than a pure tree. The reasons why MYCIN currently needs a tree-structured context network are explained in Section 5.1.2. We have come to recognize that a context graph capability is an important extension of the current system, however, and this will be the subject of future design modifications.15 When implemented, for example, it will permit a physician to discuss a prior drug only once, even though it may have been given to cover for several prior organisms.

14. Ed. note: Rule-based encoding of the therapy selection algorithm was eventually undertaken and is described in the next chapter.
15. Ed. note: This problem was never adequately solved and remains a limitation of the EMYCIN architecture (Part Five). A partial solution was achieved when predicate functions were developed that allowed a specific rule to be applied to all contexts of a given type and to draw inferences in one part of the context tree based on findings elsewhere in the context tree.
6
Details of the Revised Therapy Algorithm

William J. Clancey

A program that is designed to provide sophisticated expert advice must cope with the needs of naive users who may find the advice puzzling or difficult to accept. This chapter describes additions to MYCIN that provide for explanations of its therapy decisions, the lack of which was a shortcoming of the original therapy recommendation code described in Section 5.4 of Chapter 5. It deals with an optimization problem that seeks to provide "coverage" for organisms while minimizing the number of drugs prescribed. There are many factors to consider, such as prior therapies and drug sensitivities, and a person often finds it hard to juggle all of the constraints at once. When the optimal solution is provided by a computer program, its correctness may not be immediately obvious to the user. This motivates our desire to provide an explanation capability to justify the program's results.
The explanation capability derives from two basic programming considerations. First, we have used heuristics that capture what expert physicians consider to be good medical practice. Thus, while the program is not designed to mimic the step-by-step problem-solving behavior of a physician, its chief decision criteria have been provided by expert physicians. It is accordingly plausible that the criteria will make sense to other physicians.

The second consideration is that the program must maintain records of decisions that were made. These are used for explaining what occurred during the optimization process and why the output was not different. While the maintenance of records for explanation purposes is not new (e.g., see Winograd, 1972; Bobrow and Brown, 1975; Scragg, 1975a; 1975b), the means that we use to retrieve them are novel, namely a state transition representation of the algorithm. Our work demonstrates that a cleanly structured algorithm can provide both sophisticated performance and a simple, useful explanation capability.

This chapter is an expanded version of a paper originally appearing in Proceedings of the IJCAI 1977. Used by permission of International Joint Conferences on Artificial Intelligence, Inc.; copies of the Proceedings are available from William Kaufmann, Inc., 95 First Street, Los Altos, CA 94022.

6.1 The Problem

The main problem of the therapy selector is to prescribe the best drug for each organism thought to be a likely cause of the infection, while minimizing the total number of drugs. These two constraints often conflict: the best prescription for, say, four items may require four different drugs, although for any patient usually no more than two drugs need to be given (or should be, for reasons of drug interaction, toxic side effects, cost, etc.).

The original therapy program lacked a general scheme for relating the local constraints (best drug for each item) to the global constraint (fewest possible number of drugs). As we began to investigate the complexities of therapy selection, it became necessary to patch the program to deal with the special cases we encountered. Before long we were losing track of how any given change would affect the program's output. We found it increasingly difficult to keep records during the program execution for later use in the explanation system; indeed, the logic of the program was too confusing to explain easily. We decided to start over, aiming for a more structured algorithm that would provide sophisticated therapy, and by its very organization would provide simple explanations for a naive user. The question was this: what organization could balance these two, sometimes contradictory, goals?
Because we wanted to formulate judgments that could be provided by physicians and would appear familiar to them, we decided not to use mathematical methods such as evaluation polynomials or Bayesian analysis. On the other hand, MYCIN's inferential rule representation seemed to be inadequate because of the general algorithmic nature of the problem (i.e., iteration and complex data structures). We turned our attention to separating out the optimization criteria of therapy selection from control information (specifications for iteratively applying the heuristics). As is discussed below, the key improvement was to encode canonically the optimization performed by the inner loop of the algorithm.

6.2 Our Solution

6.2.1 Local and Global Criteria

We found that viewing the optimization problem in terms of local and global criteria provides a fruitful means for structuring the problem. Local criteria are the item-specific factors, such as sensitivity of the organism to preferred drugs, toxicity of drugs, the desire to "reserve" drugs for more serious diseases, and the desire to continue current therapy if possible. Global criteria deal with the entire recommendation; we wished to minimize the number of drugs, prescribing only two drugs if possible to cover for all of the most likely organisms.1 In addition, there were a few patient factors to consider, such as allergies to antibiotics.

Besides providing for optimal therapy, we wished to provide for an explanation capability that would list simple descriptions of the therapy selection heuristics used by the algorithm, as well as reasons for not making a different recommendation.

FIGURE 6-1 Therapy selection viewed as a plan-generate-and-test process. [The figure aligns the steps PLAN (local factors), GENERATE, TEST, and OUTPUT (with the global criteria applied in the later steps) against the corresponding algorithm steps RANK, PROPOSE, APPROVE, and PRESCRIBE.]

After clearly stating these design goals, we needed an implementation scheme that would bring about the optimization. The key to our solution was the use of a generate-and-test control structure for separately applying the local and global factors. Figure 6-1 shows the steps of the plan-generate-and-test method and, below them, the corresponding steps of our algorithm. Briefly, the steps are

1. plan by ranking the drugs--the local factors are considered here;
2. propose a recommendation and test it, thus dealing with the global factors; and
3. make a final recommendation.

The following sections consider these steps in more detail.

1. Here we realized that we could group the items into those that should definitely be treated ("most likely") and those that could be left out when three or more drugs would be necessary.

Number of drugs of each rank:

Instruction    first    second    third
1              1        0         0
2              2        0         0
3              1        1         0
4              1        0         1

FIGURE 6-2 Instructions for the therapy proposer.

6.2.2 Plan

We start with an initial list of drugs to which each organism is sensitive and sort it by applying production rules for ranking. These reranking rules are applied independently for every organism to be treated. The chief purpose of this sorting process is to incorporate drug sensitivity information for the organisms growing in cultures taken from the patient.2 Thus we arrive at a patient-specific list of drugs for each organism, reranked and grouped into first, second, and third ranks of choices.

Because this sorting process is a consideration specific to each organism, we refer to it as a local criterion of optimal therapy. We call it (loosely) a planning step because it makes preparations for later steps.
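The plan step can be illustrated with a single reranking rule of the kind quoted in footnote 2. The sketch below is an assumption-laden Python paraphrase; the drug names and the single rule shown are examples only.

```python
def rerank(initial_drugs, resistant_to):
    """Group one organism's treatment-of-choice list into preference ranks,
    demoting drugs to which the cultured organism appears resistant
    (the example rule from footnote 2)."""
    ranks = {1: [], 2: [], 3: []}
    for drug in initial_drugs:
        rank = 3 if drug in resistant_to else 1
        ranks[rank].append(drug)
    return ranks

# Applied independently per organism: ampicillin is demoted to third choice.
ranks = rerank(["ampicillin", "gentamicin"], resistant_to={"ampicillin"})
```

A real rule set would also populate the second rank and weigh toxicity, reserve status, and current therapy; the point here is only that each organism's list is reranked in isolation.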

6.2.3 Generate

The second step of the algorithm is to take the ordered drug lists and generate possible recommendations. This is done by a proposer that selects subsets of drugs (a recommendation) from the collection of drugs for all of the organisms to be treated. Selection is directed by a fixed, ordered set of instructions that specify how many drugs to select from each preference group. The first few instructions are listed in Figure 6-2. For example, the third instruction tells the proposer to select a drug from each of the first and second ranks. Instructions for one- and two-drug recommendations are taken from a static list; those for recommendations containing three or more drugs are generated from a simple pattern.

It should be clear that the ordering of the instructions ensures that two of the global criteria will be satisfied: prescribing one or two drugs if possible, and selecting the best possible drug(s) for each organism. An instruction therefore serves as a canonical description of a recommendation. Consequently, we can "reduce" alternate subsets of drugs to this form (the number of drugs of each rank) and compare them.

2. A typical rule might be: "If the organism growing from the culture appears to be resistant to the drug, then classify the drug as a third choice."
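The proposer's instruction-driven generation can be sketched directly from Figure 6-2. This Python fragment is an illustration under invented data (the pooled rank lists are hypothetical); the instruction tuples themselves mirror the figure.

```python
from itertools import combinations, product

INSTRUCTIONS = [(1, 0, 0), (2, 0, 0), (1, 1, 0), (1, 0, 1)]  # Figure 6-2

def proposals(ranked):
    """ranked: {rank: [drugs]} pooled over all organisms to be treated;
    yields candidate drug sets in instruction order."""
    for counts in INSTRUCTIONS:
        per_rank = [combinations(ranked.get(rank, []), n)
                    for rank, n in zip((1, 2, 3), counts)]
        for parts in product(*per_rank):
            yield {drug for part in parts for drug in part}

ranked = {1: ["penicillin", "gentamicin"], 2: ["chloramphenicol"], 3: []}
all_proposals = list(proposals(ranked))
```

Because one-drug instructions precede two-drug instructions, the first proposal to pass the tests below is automatically the smallest acceptable recommendation built from the best-ranked drugs.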

6.2.4 Test

Since all of the drugs for all of the organisms were grouped together for use by the proposer, it is quite possible that a proposed recommendation will not cover all of the most likely organisms. For example, the proposal might have two drugs that are in the first rank for one item but are second or third for other items, or are not even on their lists. Thus the first step of testing is to make sure that all of the most likely items are covered.

The second test ensures that each drug is in a unique drug class. For example, a proposal having both gentamicin and streptomycin would be rejected because these two drugs are aminoglycosides and therefore cause a "redundant" effect.

The last test is for patient-specific contraindications. These rules take into account allergies, age of the patient, pregnancy, etc. These rules are relatively expensive to apply, so they are done last, rather than applying them to each possible drug in the plan step. With this test we have dealt with the last global criterion of therapy selection. The first proposal that satisfies these three tests becomes the therapy advice. The details of drug prescription will not be considered further here; it consists primarily of algorithmic dosage calculation and adjustment in the case of renal failure.
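The three tests, in the order given (coverage, class uniqueness, then the costly contraindication rules), can be sketched as one predicate. All names and data below are illustrative assumptions, not MYCIN's representation.

```python
def approve(proposal, must_cover, drug_class, contraindicated):
    """proposal: set of drugs; must_cover: {item: drugs acceptable for it}."""
    if not all(proposal & drugs for drugs in must_cover.values()):
        return False                         # a "most likely" item is uncovered
    classes = [drug_class[d] for d in proposal]
    if len(classes) != len(set(classes)):
        return False                         # redundant effect within one class
    return not (proposal & contraindicated)  # costly rules, applied last

classes = {"penicillin": "penicillin", "gentamicin": "aminoglycoside",
           "streptomycin": "aminoglycoside"}
ok = approve({"penicillin", "gentamicin"},
             must_cover={"ITEM-1": {"penicillin"}, "ITEM-2": {"gentamicin"}},
             drug_class=classes, contraindicated=set())
redundant = approve({"gentamicin", "streptomycin"},
                    must_cover={"ITEM-1": {"gentamicin"}},
                    drug_class=classes, contraindicated=set())
```

The gentamicin-plus-streptomycin proposal fails the class test exactly as in the text's example, and the expensive contraindication check never runs for it.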

6.2.5 Performance

We have found that the algorithm described above is manageable and performs well. It is straightforward to add new rules for ranking the drugs and for testing the proposals. The canonical instructions are relatively fixed, but it would not be difficult, for example, to provide infection-specific instruction sets. The program has made acceptable recommendations for a library of more than 100 meningitis patients.

6.3 The Explanation Capability

We will now consider how the structure of the algorithm is exploited to produce simple explanations. A sample question about therapy selection is shown in Figure 6-3. The medical decisions that were applied to the drug chloramphenicol are listed as a logical sequence of reasons, which is produced by retrieving and printing traces that were left behind by the program. The trace retrieval program is termed CHRONICLER because its explanations consist of a chronicle of decision events.

** WHY DID YOU GIVE CHLORAMPHENICOL FOR E.COLI IN REC-1?

CHLORAMPHENICOL was prescribed for ITEM-2 in RECOMMENDATION-1:
Since
-- CHLORAMPHENICOL is a treatment of choice for e.coli in meningitis
-- ITEM-2 is sensitive to CHLORAMPHENICOL
-- there were no contraindications for it
CHLORAMPHENICOL was prescribed because it was part of the recommendation that covers for all of the items, using the fewest number of drugs.

FIGURE 6-3 A question concerning why a drug was prescribed. (User's input follows the double asterisks.)

Figure 6-4 shows the general organization of the Explanation System. The traces (discussed below) constitute a dynamic event history. A chronicle of events is printed by using a process transition diagram to selectively retrieve the relevant traces.

Figure 6-5 shows the kind of transition diagram we use to represent the steps of therapy selection. The states roughly correspond to the generate and test steps shown in Figure 6-1. The arrows are labeled as positive (pos) and negative (neg) criteria (i.e., criteria that support or oppose the recommendation of a given drug). These correspond to the medical strategies, e.g., "The drug is on the treatment-of-choice list for the organism (the initial list) and so was considered to cover for the organism." If a drug is prescribed, there must be a sequence of positive criteria leading from the first state to the output state. These are the reasons offered the user as an explanation for prescribing the drug. To make the explanation clearer, the states are reordered into three groups (planning criteria, testing criteria, and generate and output criteria) to conform to the following general scheme:
Since
--<plan criteria>
--<test criteria>
(therefore)
<generate and output criteria>

FIGURE 6-4 Organization of the Explanation System. [The figure shows the MYCIN program leaving records behind as a dynamic event history; a static transition diagram imposes an event structure on that history to produce the answer to a question.]

On the other hand, if a drug is not prescribed, there must be a negative criterion to explain why it dropped out of contention if it was on the initial list. Failure to prescribe can be caused by either failure to consider the
FIGURE 6-5 The state transition diagram. [Positive criteria label the transitions carrying a drug from STATE-1 through STATE-2 and STATE-3 toward STATE-N (the goal state); negative criteria branch from the intermediate states to corresponding final states where the drug drops out.]

** WHY DIDN'T YOU SUGGEST PENICILLIN IN REC-1 FOR STAPH-COAG+?

PENICILLIN was not prescribed for ITEM-1 in RECOMMENDATION-1:
PENICILLIN was discounted for ITEM-1 because it is NOT DEFINITE that the item is sensitive to this drug. There are other potential therapies under consideration which are much more desirable, viz., current therapies or drugs to which the item is definitely sensitive.
Would you like to see some details?
** YES
The drugs to which the staphylococcus-coag-pos is sensitive are: cephalothin (1.0) vancomycin (1.0) gentamycin (1.0) tobramycin (1.0) erythromycin-and-tetracycline (1.0) chloramphenicol-and-erythromycin (1.0)
[RULE098 RULE445]
Would you like to know about the history of PENICILLIN in the decision process up to this point?
** YES
-- PENICILLIN is a treatment of choice for staphylococcus-coag-pos in meningitis.
But as explained above, PENICILLIN was discounted.

FIGURE 6-6 Question concerning why a drug was not prescribed.

drug (plan) or failure of a test. A third possibility is that the drug wasn't
part of an acceptable recommendation, but was otherwise a plausible choice
(when considered alone). In this case, the drug needs to be considered in
the context of a full recommendation for the patient.3 (See Figure 6-9 for
an example.)
Figure 6-6 shows an example of a question concerning why a drug was
not prescribed. In response to a question of this type, the negative criterion
is printed and the user is offered an opportunity to see the positive deci-
sions accrued up to this point. In this example we see that penicillin was
not prescribed because it is not definite that the item is sensitive to this
drug. That is the negative criterion. The fact that penicillin was a potential
treatment of choice permitted its transition to the reranking step.4 This is
shown in Figure 6-7. When MYCIN's rules (as opposed to Interlisp code)
are used to make a transition decision, we can provide further details, as
shown in Figure 6-6.
For questions involving two drugs, e.g., "Why did you prescribe chlor-
amphenicol instead of penicillin for Item-1?", CHRONICLER is invoked
to explain why the rejected drug was not given. Then the user is offered
the opportunity to see why the other drug was given.
To summarize, MYCIN leaves behind traces that record the application

3Events are recorded as properties of the drugs they involve. The trace includes other contexts
such as the item being considered. To deal with iteration, events are of two types: enduring
and pass-specific. Enduring events represent decisions that, once made, are never reconsidered,
e.g., the initial ranking of drugs for each organism. Pass-specific events may not figure in the
final result; they may indicate computation that failed to produce a solution, e.g., proposing
a drug as part of a specific recommendation. Thus traces are accessed by drug name and the
context of the computation, including which pass of the generate-and-test process produced
the final solution.
4Penicillin is given for staph-coag+ only if the organism is known to be sensitive to that agent.

[Diagram: initial → plan1 → plan2, with "treatment of choice" as the positive criterion admitting PENICILLIN and "item not sensitive" as the negative criterion discounting it; plan2 is final.]

FIGURE 6-7 Trace history for the question shown in Figure 6-6.

of the positive and negative criteria. The Explanation System uses a state
transition diagram that represents the steps of the algorithm to retrieve
the relevant traces in a logical order.
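The recording and ordered retrieval of traces can be sketched roughly as follows. This is a hypothetical reconstruction in Python, not the actual Interlisp implementation; the state names, criteria, and record format are illustrative assumptions.

```python
# Hypothetical sketch of CHRONICLER-style traces: the therapy selector
# records an event each time a drug crosses a step of the algorithm, and
# the explainer replays those events in the order given by a state
# transition diagram.

# Fixed order of algorithm steps (the "state transition diagram").
STATES = ["initial-ranking", "reranking", "plan-generation", "final"]

def record(traces, drug, state, criterion, positive):
    """Leave a trace: the drug, the step reached, and the criterion applied."""
    traces.setdefault(drug, []).append(
        {"state": state, "criterion": criterion, "positive": positive})

def explain(traces, drug):
    """Retrieve a drug's traces in the logical order of the algorithm."""
    events = sorted(traces.get(drug, []),
                    key=lambda e: STATES.index(e["state"]))
    lines = []
    for e in events:
        verb = "advanced at" if e["positive"] else "was discounted at"
        lines.append(f"{drug} {verb} {e['state']}: {e['criterion']}")
    return lines

traces = {}
record(traces, "PENICILLIN", "initial-ranking",
       "treatment of choice for staph-coag-pos in meningitis", True)
record(traces, "PENICILLIN", "reranking",
       "not definite that the item is sensitive to this drug", False)
print("\n".join(explain(traces, "PENICILLIN")))
```

Because retrieval is ordered by the transition diagram rather than by the (possibly iterative) order of computation, the explanation reads as a coherent history even when traces were left on different passes.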
It is interesting to note that CHRONICLER is described well by Bobrow
and Brown's synthesis, contingent knowledge, and analysis (SCA) para-
digm for understanding systems (Bobrow and Brown, 1975). Contingent
knowledge is a record of program-synthesized observations for later use
by an analysis program to answer questions or comment on the observed
system. In CHRONICLER the traces and transition diagram constitute the
contingent knowledge structure. Synthesis (abstraction of results) is per-
formed by the therapy selector as it classifies the drugs in the various
decision steps and records its "observations" in traces. Analysis is per-
formed by CHRONICLER as it "reads" the traces, interpreting them in
terms of the state transition diagram. The meteorology question-answering
system described by Brown et al. (1973) uses a similar knowledge repre-
sentation.

6.4 Comparing Alternative Recommendations

The therapy program will print alternative recommendations if asked to
do so. This is done by eliminating drugs from contention and rerunning
the algorithm. Figure 6-8 shows an example of an alternative recommen-
dation and a comparison of this alternative to the algorithm's first choice.
The comparison is made by simply reducing the recommendations to ca-
nonical form and then determining their relative ranking in the list of in-
structions. A recommendation is better than another when it reduces to a
better instruction, i.e., when the preference description of its drugs appears
earlier in the fixed list of instructions. It is also possible that the comparison

Do you wish to see the next choice therapy?
** YES
Which, if any, of the above drugs do you wish eliminated from consideration?
** CHLORAMPHENICOL
[REC-2] Next best therapy recommendation:
In order to cover for Items <2 3 4>:
Give: GENTAMICIN
In order to cover for Item <1>:
Give: METHICILLIN
RECOMMENDATION-2 is substantially inferior to RECOMMENDATION-1 (because it
requires more drugs or drugs that are not as good).
Would you like to see details of this comparison?
** YES
Looking at the coverage of the most likely items, we see that:
RECOMMENDATION-1 consists of 1 first choice drug (CHLORAMPHENICOL for ITEM-2
and ITEM-3) and 1 second choice drug (GENTAMICIN for ITEM-3 and ITEM-1)
RECOMMENDATION-2 consists of 1 first choice drug (METHICILLIN for ITEM-1) and
1 third choice drug (GENTAMICIN for ITEM-2 and ITEM-3)
Do you wish to see the next choice therapy?
** NO

FIGURE 6-8 Comparing alternative recommendations.

will be a "close call," because one of the recommendations might use better
drugs for the most likely organisms but cover for fewer of the less likely
organisms. Again, it is the ability to encode output canonically that gives
us the ability to make such a direct comparison of alternatives.
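The canonical-form comparison can be sketched as follows. This is a hypothetical Python illustration; the instruction list, rank names, and data layout are invented for the example and do not reproduce the program's actual tables.

```python
# Hypothetical sketch of comparing two recommendations by reducing each
# to a canonical preference description and ranking the descriptions
# against a fixed, ordered list of instructions (illustrative entries).

# Earlier entries are better; each describes the mix of drug choices a
# recommendation reduces to, e.g. first and second choice drugs.
INSTRUCTIONS = [
    ("first",),            # all first choice drugs
    ("first", "second"),   # first and second choice drugs
    ("first", "third"),    # first and third choice drugs
    ("second", "third"),
]

def canonical(recommendation):
    """Reduce a recommendation {drug: choice_rank} to its sorted ranks."""
    order = {"first": 0, "second": 1, "third": 2}
    return tuple(sorted(set(recommendation.values()), key=order.get))

def compare(rec_a, rec_b):
    """Return the better recommendation, judged by instruction position."""
    pos_a = INSTRUCTIONS.index(canonical(rec_a))
    pos_b = INSTRUCTIONS.index(canonical(rec_b))
    return "A" if pos_a <= pos_b else "B"

rec1 = {"CHLORAMPHENICOL": "first", "GENTAMICIN": "second"}
rec2 = {"METHICILLIN": "first", "GENTAMICIN": "third"}
print(compare(rec1, rec2))  # rec1 reduces to an earlier instruction
```

The point of the canonical reduction is that two recommendations built from entirely different drugs become directly comparable as positions in one fixed ordering.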

6.5 Evaluating a User's Choice of Therapy

The comparison described above is useful to a physician who prefers to
give therapy other than MYCIN's first choice and wishes to know if the
program truly considers it to be suboptimal therapy. However, it is tedious
for the user to request all possible alternatives to be printed, so we offer
the user the opportunity to enter his or her own choice of drugs for the
organisms that require therapy (Figure 6-9).
Each drug the user suggests for an item is first formed into a standard
internal question for CHRONICLER: "Why wasn't <drug> prescribed for
<item>?" If there is a negative criterion about this drug for this item in
the event history, it is printed and the user is given the option of selecting
another drug.

** WHY DIDN'T YOU GIVE AMPICILLIN-AND-GENTAMICIN FOR E.COLI IN REC-1?
AMPICILLIN-AND-GENTAMICIN was not prescribed for ITEM-2 in RECOMMENDATION-1:
AMPICILLIN-AND-GENTAMICIN is a plausible choice for e.coli in meningitis, and
was not explicitly rejected for use against ITEM-2 in RECOMMENDATION-1.
However, the best therapy did not include AMPICILLIN-AND-GENTAMICIN.
If you would like to suggest therapy which includes AMPICILLIN-AND-GENTAMICIN,
your regimen will be compared to MYCIN's.
Would you like to do this? ** YES
For each item in turn, enter the drug you would have prescribed in
RECOMMENDATION-1.
In order to minimize the number of drugs in your recommendation, you may not
want to prescribe therapy for every item. Items which represent the most likely
organisms are indicated with a plus sign (+).
+ ITEM-1 -- the staphylococcus-coag-pos ** GENTAMICIN
+ ITEM-2 -- the e.coli ** AMPICILLIN-AND-GENTAMICIN
+ ITEM-3 -- the klebsiella-pneumoniae ** GENTAMICIN
ITEM-4 -- the pseudomonas-aeruginosa **
[Checking for contraindications...]
[Considering AMPICILLIN-AND-GENTAMICIN for use against INFECTION-1...]
[No contraindications found...]
[Now comparing your prescription to MYCIN's...]
Perhaps you did not realize that one of the drugs you prescribed, GENTAMICIN,
will cover for ITEM-4, an item for which you did not prescribe therapy. I have
changed your prescription accordingly.

ORGANISMS       Your regimen                      MYCIN's regimen
                Drug -- Choice                    Drug -- Choice
"most likely"
ITEM-3          GENTAMICIN -- 3rd                 CHLORAMPHENICOL-AND-GENTAMICIN -- 1st
ITEM-2          AMPICILLIN-AND-GENTAMICIN -- 1st  CHLORAMPHENICOL -- 1st
ITEM-1          GENTAMICIN -- 2nd                 GENTAMICIN -- 2nd
"less likely"
ITEM-4          GENTAMICIN -- 2nd                 GENTAMICIN -- 2nd

(The desirability of a drug is defined to be its lowest ranking for the items
it covers.)
Both prescriptions include fewer than 3 drugs, so we must look at how highly
ranked each prescription is for the most likely organism(s).
Your prescription of 1 first choice drug (AMPICILLIN for ITEM-2) and 1 third
choice drug (GENTAMICIN for ITEM-3) is not as good as MYCIN's prescription of
1 first choice drug (CHLORAMPHENICOL for ITEM-2 and ITEM-3) and 1 second
choice drug (GENTAMICIN for ITEM-1).
[You may refer to your regimen as RECOMMENDATION-2 in later questions.]

FIGURE 6-9 Evaluating a user's choice of therapy.



Once the user has supplied a set of drugs to cover for all of the most
likely organisms, his or her proposal is tested for the criteria of drug class
uniqueness and patient-specific factors (described in Section 6.2.4). If the
proposal is approved, this recommendation is compared to the program's
choice of therapy, just as the program compares its alternatives to its own
first-choice recommendation.5 It is also possible to directly invoke the ther-
apy comparison routine.

6.6 Some Unsolved Problems

There are a number of improvements that could be made to this system.
Among the most important to potential users is a more flexible question
format. In our experience physicians tend to address short, unspecific
questions to the program, e.g., "Why ampicillin?" or "What happened to
E. coli?" Processing these questions will require a fairly sophisticated pre-
processor that can help the user define such a question more precisely, or
at least make some plausible assumptions.
Second, we anticipate the need to explain the heuristics, which now
are describable only in a template form.6 A user might like to know what
a "drug sensitivity" is or why a heuristic was not used. Providing simple,
fixed-text definitions is easy, but discussing a particular heuristic to the
extent of explaining why it was not applicable is well beyond the capabilities
of this Explanation System. One possible solution is to represent the heu-
ristics internally in a rulelike form with a set of preconditions in program-
readable predicates, like MYCIN's rules. We could then say, for example,
that a drug was lowered in rank because its sensitivity was "intermediate,"
even though it was a current therapy (which would otherwise be reason
for continuing to prescribe it). Thus we would be splitting a medical cri-
terion into its logical components. Moreover, human explanations some-
times include hypothetical relations that have important instructional ben-
efit, e.g., "If all of the drugs had been intermediate, then this current
therapy would have been given preference." In general, paraphrasing ex-
planations, explaining why an event failed to take place, and relating de-
cisions are difficult because they require some representation of what the
heuristics mean. Providing a handle on these underlying concepts is a far
cry from a system that can only fill in templates.
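Such a rule-like representation might look something like the following sketch. It is hypothetical Python, with invented precondition names and facts; it only illustrates how machine-readable preconditions would let the system report why a heuristic was not applied.

```python
# Hypothetical sketch: a medical heuristic represented with
# program-readable preconditions instead of a fill-in-the-blank
# template, so the system can say which precondition failed.

HEURISTIC = {
    "name": "lower-rank-for-intermediate-sensitivity",
    "preconditions": [
        ("sensitivity-is-intermediate",
         lambda facts: facts["sensitivity"] == "intermediate"),
        ("not-current-therapy",
         lambda facts: not facts["current_therapy"]),
    ],
    "action": "lower the drug's rank",
}

def why_not_applied(heuristic, facts):
    """Report the first precondition that failed, or None if all held."""
    for name, test in heuristic["preconditions"]:
        if not test(facts):
            return f"precondition '{name}' did not hold"
    return None

# The drug is intermediate but is a current therapy, so the heuristic
# is blocked by its second precondition.
facts = {"sensitivity": "intermediate", "current_therapy": True}
print(why_not_applied(HEURISTIC, facts))
```

Splitting the criterion into named predicates is what makes explanations like "it was a current therapy, which would otherwise be reason for continuing to prescribe it" mechanically derivable.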
Third, it is important to justify the medical heuristics and initial pref-
erence ranks for drugs. We now provide text annotations that include ref-
erences and comments about shortcomings and intent.

5The explanations at this point are more pedagogical than those supplied when the program
compares its own alternatives. It seems desirable to phrase comparisons as positively as pos-
sible to avoid irritating the user.
6That is, each medical heuristic has a string with blanks associated with it, e.g., <drug> "was
discounted for" <item> "because it was not definite that the item was sensitive to this drug."
Finally, we could further develop the tutorial aspects of the Explana-
tion System. Rather than passively answering questions, the Explanation
System might endeavor to teach the user about the overall structure and
philosophy of the program (upon request!). For example, a user might
appreciate the optimality of the results better if he or she understood the
separation of factors into local and global considerations. Besides explain-
ing the results of a particular run, an Explanation System might charac-
terize individual decisions in the context of the program's overall design.
Parts Six and Eight discuss the issues of explanation and education in more
detail.

6.7 Conclusions
We have developed a system that prescribes optimal therapy and is able to
provide simple, useful explanations. The system is based on a number of
design ideas that are summarized as follows:

1. separate the local and global optimality criteria;
2. apply these criteria in comprehensible steps--a generate-and-test con-
trol structure was found to be suitable;
3. justify selected therapies by using canonical descriptions that
a. juggle several global criteria at once, and
b. permit direct comparison of alternatives; and
4. exploit the simple control structure by using a state transition diagram
to order retrieval of traces.

In addition, the Explanation System has benefited from a few simplifying
factors:

1. There are relatively few traces (fewer than 50 drugs to keep track of
and fewer than 25 strategies that might be applied).
2. There is a single basic question: Why was (or was not) a particular drug
prescribed for a particular organism?

While this therapy selection algorithm may appear straightforward, it
is the product of trying to codify an unstructured list of factors presented
by physicians. The medical experts did not order these considerations and
were not sure how conflicting constraints should be resolved. The frame-
work we imposed, namely, invoking optimality criteria locally and globally
within a generate-and-test control structure and describing output can-
onically, provided a language that enabled us to codify the physicians'
judgments, thereby significantly improving the performance and manage-
ability of the program.
Moreover, this well-structured design enables us to print simple ex-
planations of the program's decisions and to compare alternative solutions.
We have provided this facility because we want the program to be used
intelligently. If a user is confused or disagrees with the optimality criteria,
we expect him or her to feel free to reject the results. The explanation
system we have provided is intended to encourage thoughtful use of the
therapy selection program.
PART THREE

Building a Knowledge Base
7
Knowledge Engineering

From early experience building the DENDRAL system, it was obvious to
us that putting domain-specific knowledge into a program was a bottleneck
in building knowledge-based systems (Buchanan et al., 1970). In other
systems of the 1960s and early 1970s, items of knowledge were cast as LISP
functions. For example, in the earliest version of DENDRAL the fact that
the atomic weight of carbon is 12 was built into a function, called WEIGHT,
which returned 12 when called with the argument C. The function "knew
about" several common chemical elements, but when new elements or new
isotopes were encountered, the function had to be changed. Because we
wanted to keep our programs "lean" to run in 64K of working memory,
we gave our programs only as much knowledge as we thought they would
have to know. Thus we often encountered missing items in running new
test cases. It was very quickly seen that LISP property lists (data structures)
were a superior alternative to LISP code as a way of storing simple facts,
so definitions of functions like WEIGHT were changed to retrievals from
property lists (using GETPROPs and macros). Defining new objects and
properties was trivial in comparison to the overhead of editing functions.
This was the beginning of our realization that there is considerable flexi-
bility to be gained by separating domain-specific knowledge from the code
that uses that knowledge. This was also our first encounter with the prob-
lem that has come to be known as knowledge acquisition (Buchanan et al.,
1970).
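The shift from knowledge-in-code to knowledge-as-data can be illustrated with a small sketch. The Python below is a loose analogue of the WEIGHT example, not DENDRAL's actual Interlisp code; the table of weights stands in for a LISP property list.

```python
# Sketch of the shift described above: the atomic weight of carbon
# embedded in code versus stored as data. Integer atomic weights are
# used for illustration, as in early DENDRAL.

def weight_in_code(element):
    # Early style: knowledge buried in a function; adding an element
    # or a new isotope means editing the function itself.
    if element == "C":
        return 12
    if element == "H":
        return 1
    raise ValueError(f"unknown element: {element}")

# Later style: knowledge as data (the analogue of a LISP property
# list); adding an element is a table entry, not a code change.
WEIGHTS = {"C": 12, "H": 1, "N": 14, "O": 16}

def weight_from_data(element):
    return WEIGHTS[element]

print(weight_in_code("C"), weight_from_data("N"))  # 12 14
```

The data-driven version is also what makes the knowledge inspectable: a program (or a person) can enumerate the table, which is impossible with facts compiled into conditional branches.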

7.1 The Nature of the Knowledge Acquisition Process

Knowledge acquisition is the transfer and transformation of problem-solving
expertise from some knowledge source to a program. There are many

Section 7.1 is largely taken from material originally written for Chapter 5 of Building Expert
Systems (eds., F. Hayes-Roth, D. Waterman, and D. Lenat). Reading, Mass.: Addison-Wesley,
1983.

sources we might turn to, including human experts, textbooks, data bases,
and our own experience. In this section we will concentrate mostly on
acquiring knowledge from human experts in an enterprise known as
knowledge engineering (Hayes-Roth et al., 1983). These experts are spe-
cialists (but not necessarily unique individuals) in a narrow area of knowl-
edge about the world. The expertise that we hope to elucidate is a collection
of definitions, relations, specialized facts, algorithms, strategies, and heu-
ristics about the narrow domain area. It is different from general knowl-
edge about the domain and from commonsense knowledge about the
world, some of which is also needed by expert systems.
A knowledge base for an expert system is constructed through a pro-
cess of iterative development. After initial design and prototype imple-
mentation, the system grows incrementally both in breadth and depth.
While other large software systems are sometimes built by accretion, this
style of construction is inescapable for expert systems because the requisite
knowledge is impossible to define as one complete block.
One of the key ideas in constructing an expert system is transparency--
making the system understandable despite the complexity of the task. An
expert system needs to be understandable for the following reasons:

- the system matures through incremental improvements, which require
thorough understanding of previous versions and of the reasons for
good and poor performance on test cases;
- the system improves through criticism from persons who are not (or
need not be) familiar with the implementation details;
- the system uses heuristic methods and symbolic reasoning because math-
ematical algorithms do not exist (or are inefficient) for the problems it
solves.

7.1.1 Modes of Knowledge Acquisition

The transfer and transformation required to represent expertise for a
program may be automated or partially automated in some special cases.
Most of the time a person, called a knowledge engineer, is required to
communicate with the expert and the program. The most difficult aspect
of knowledge acquisition is the initial one of helping the expert concep-
tualize and structure the domain knowledge for use in problem solving.
Because the knowledge engineer has far less knowledge of the domain
than does the expert, by definition, the process of transferring expertise
into a program is bound to suffer from communication problems. For
example, the vocabulary that the expert uses to talk about the domain with
a novice is probably inadequate for high-performance problem solving.
There are several modes of knowledge acquisition for an expert sys-
tem, which can be seen as variations on the process shown in Figure 7-1.
[Diagram: the EXPERT and DATA, TEXTS flow through a KNOWLEDGE ENGINEER to the EXPERT SYSTEM and its KNOWLEDGE BASE.]

FIGURE 7-1 Important elements in the transfer of expertise.
Feedback to the expert about the system's performance on test
cases is not shown.

All involve transferring, in one way or another, the expertise needed for
high-performance problem solving in a domain from a source to a program.
The source is generally a human expert, but could also be the primary
sources from which the expert has learned the material: journal articles
(and textbooks) or experimental data. A knowledge engineer translates
statements about the domain from the source to the program with more
or less assistance from intelligent programs. And there is variability in the
extent to which the knowledge base is distinct from the rest of the system.

Handcrafting

Conceptually, the simplest way for a programmer to put knowledge into a
program is to code it in. This was the standard mode of building AI pro-
grams in the 1950s and 1960s because the main emphasis of most of those
systems was demonstrating intelligent behavior for a few problems. AI
programmers could be their own experts for many game-playing, puzzle-
solving, and mathematics programs. And a few domain specialists became
their own AI programmers in order to construct complex systems (Colby,
1981; Hearn, 1971). When the programmer and the specialist are not the
same person, however, it is risky to rely on handcrafting to build complex
programs embodying large amounts of judgmental knowledge. Generally,
it is slow to build and debug such a program, and it is nearly impossible
to keep the problem-solving expertise consistent if it grows large by small
increments.

Knowledge Engineering

The process of working with an expert to map what he or she knows into
a form suitable for an expert system to use has come to be known as
knowledge engineering (Feigenbaum, 1978; Michie, 1973).
As DENDRAL matured, we began to see patterns in the interactions
between the person responsible for the code and the expert responsible
for the knowledge. There is a dialogue which, at first, is much like a systems
analysis dialogue between analyst and specialist. The relevant concepts are
named, and the relations among them made explicit. The knowledge engi-
neer has to become familiar enough with the terminology and structure
of the subject area that his or her questions are meaningful and relevant.
As the knowledge engineer learns more about the subject matter, and as
the specialist learns more about the structure of the knowledge base and
the consequences of expressing knowledge in different forms, the process
speeds up.
After the initial period of conceptualization, in which most of the
framework for talking about the subject matter is laid out, the knowledge
structures can be filled in rather rapidly. This period of rapid growth of
the knowledge base is then followed by meticulous testing and refinement.
Knowledge-engineering tools can speed up this process. For example, in-
telligent editing programs that help keep track of changes and help find
inconsistencies can be useful to both the knowledge engineer and the ex-
pert. At times, an expert can use the tools independently of the knowledge
engineer, thus approaching McCarthy's idea of a program accepting advice
from a specialist (McCarthy, 1958). The ARL editor incorporated in EMY-
CIN (see Chapters 14-16) is a simple tool; the TEIRESIAS debugging
system (discussed in Chapter 9) is a more complex tool. Politakis (1982)
has recently developed a tool for examining a knowledge base for the
EXPERT system (Kulikowski and Weiss, 1982) and suggesting changes,
much like the tool for ONCOCIN discussed in Chapter 8.
A recent experiment in knowledge engineering is the ROGET pro-
gram (Bennett, 1983), a knowledge-based system that aids in the concep-
tualization of knowledge bases for EMYCIN systems. Its knowledge is part
of what a knowledge engineer knows about helping an expert with the
initial process of laying out the structure of a new body of knowledge. It
carries on a dialogue about the relationships among objects in the new
domain, about the goal of the new system, about the evidence available,
and about the inferences from evidence to conclusions. Although it knows
nothing (initially) about a new knowledge base, it knows something about
the structure of other knowledge bases. For example, it knows that evi-
dence can often be divided into "hard" evidence from instruments and
laboratory analysis and "soft" evidence from subjective reports and that
both are different from identifying features such as gender and race. Much
more remains to be done, but ROGET is an important step in codifying
the art of knowledge engineering.

Various Forms of "Learning"

For completeness, we mention briefly several other methods of building
knowledge-based programs. We have not experimented with these in the
context of MYCIN, so we will not dwell on them.

Learning from examples may automate much of the knowledge ac-
quisition process by exploiting large data bases of recorded experience
(e.g., hospital records of patients, field service records of machine failures).
The conceptualization stage may be bypassed if the terminology of the
records is sufficient for problem solving. Induction of new production
rules from examples was used by Waterman (1970) in the context of the
game of poker and in Meta-DENDRAL (Lindsay et al., 1980) in the context
of mass spectrometry. The RX system (Blum, 1982) uses patient records
to discover plausible associations.
Other methods of learning are discovery by exploration of new con-
cepts and relations (Lenat, 1983), reading published accounts (Ciesielski,
1980), learning by watching (Waterman, 1978), and learning by analogy
(Winston, 1979). See Buchanan et al. (1978) and Barr and Feigenbaum
(1982) for reviews of automatic learning methods.

7.2 Knowledge Acquisition in MYCIN

In the MYCIN work we experimented with computer-based tools to ac-
quire knowledge from experts through interactive dialogues. TEIRESIAS,
discussed in Chapter 9, is the best-known example. In discussing knowl-
edge acquisition, it is important to remember that there are separate pro-
grams under discussion: the expert system, i.e., MYCIN, and the programs
that provide help in knowledge acquisition, i.e., TEIRESIAS.
As mentioned above, MYCIN itself was an experiment in keeping med-
ical knowledge separate from the rest of the program. We believed that
this would simplify knowledge acquisition, and it does, but not to the extent
we had hoped. Because the syntax of the elements carrying knowledge was
simplified, however, our focus shifted from the mechanics of editing those
elements to the contents of those knowledge structures. That is, there was
an important conceptual shift from thinking of editing data structures to
thinking of modifying knowledge structures; we have come to call the latter
process knowledge programming.
The processes of constructing and editing a knowledge base became
interesting subjects of our research. We could see that the communication
between expert and program was very slow. So we began investigating
computer-based tools that would facilitate the transfer of expertise. In the
original version of MYCIN, there were some tools for helping Shortliffe,
as knowledge engineer, build and modify the infectious disease knowledge
base. No attempt was made to get experts to use the tools directly, although
that was clearly a next step. These first tools included a rule language
(syntax and parser) that allowed entering a new rule in a quasi-English
form. In the example shown in Figure 7-2, the user indicates a desire to
enter a new rule by typing NR. He or she is then asked for a rule in English,
** NR   [The knowledge engineer starts the rule acquisition routine by typing NR
         for New Rule.]
The new rule will be called RULE200.
If: 1-** THE ORGANISM IS A GRAM NEGATIVE ROD
and 2-** IT IS ANAEROBIC
and 3-** IT WAS ISOLATED FROM THE BLOOD
and 4-** YOU THINK THE PORTAL WAS THE GI TRACT
and 5-**   [user: carriage return with no entry]
Then: 1-** IT IS PROBABLY A BACTEROIDES
On a scale of 1 to 10, how much certainty would you affix to this conclusion?
** 9
and 2-**   [user: carriage return with no entry]
This is my understanding of your rule:
RULE200
IF: 1) The site of the culture is blood, and
    2) The stain of the organism is gramneg, and
    3) The morphology of the organism is rod, and
    4) The aerobicity of the organism is anaerobic, and
    5) The portal of entry of the organism is GI
    [Note that the original clause 1 has been expanded to separate the two
     attributes, stain and morphology.]
THEN: There is strongly suggestive evidence (.9) that the organism is bacteroides

Okay? (YES or NO)
** YES

FIGURE 7-2 Example of rule acquisition in the original (1974) MYCIN program.
(User's input follows double asterisks.)

following the format of other rules in the system. MYCIN translates the
rule into its internal LISP representation and then translates it back into
English to print out a version of the rule as it has understood the meaning.
The user is then asked to approve the rule or modify it. The original system
also allowed simple changes to rules in a quick and easy interaction, much
as is shown in Figure 7-2 for acquiring a new rule.
This simple model of knowledge acquisition was subsequently ex-
panded, most notably in the work on TEIRESIAS (Chapter 9). Many of
the ideas (and lines of LISP code) from TEIRESIAS were incorporated in
EMYCIN (Part Five). Contrast Figure 7-2 with the TEIRESIAS example
in Section 9.2 and the EMYCIN example in Chapter 14 for snapshots of
our ideas on knowledge acquisition. Research on this problem continues.
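The round trip from quasi-English input to an internal representation and back to an English paraphrase might be sketched as follows. This is hypothetical Python; the vocabulary table and rule format are invented for illustration and are far simpler than MYCIN's actual rule language.

```python
# Hypothetical sketch of the parse/approve cycle: quasi-English clauses
# are mapped to (attribute, value) pairs, and the internal rule is then
# rendered back into English so the expert can confirm what the system
# understood.

VOCAB = {  # quasi-English phrase -> (attribute, value); illustrative only
    "IT IS ANAEROBIC": ("aerobicity", "anaerobic"),
    "IT WAS ISOLATED FROM THE BLOOD": ("site", "blood"),
}

def parse(premises, conclusion, certainty_1_to_10):
    """English clauses -> internal rule structure (CF scaled to 0-1)."""
    return {"if": [VOCAB[p] for p in premises],
            "then": conclusion,
            "cf": certainty_1_to_10 / 10.0}

def render(rule):
    """Internal rule structure -> English paraphrase for approval."""
    clauses = [f"the {attr} of the organism is {val}"
               for attr, val in rule["if"]]
    return (f"IF {' and '.join(clauses)} THEN there is evidence "
            f"({rule['cf']}) that the organism is {rule['then']}")

rule = parse(["IT IS ANAEROBIC", "IT WAS ISOLATED FROM THE BLOOD"],
             "bacteroides", 9)
print(render(rule))
```

The value of the back-translation step is that the expert approves the system's interpretation, not the raw input, which catches vocabulary mismatches before a faulty rule enters the knowledge base.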
Two of our initial working hypotheses about knowledge acquisition
have had to be qualified. We had assumed that the rules were sufficiently
independent of one another that an expert could always write new rules
without examining the rest of the knowledge base. Such modularity is
desirable because the less interaction there is among rules, the easier and
safer it is to modify the rule set. However, we found that some experts are

1. Expert tells knowledge engineer what rules to add or modify.
2. Knowledge engineer makes changes to the knowledge base.
3. Knowledge engineer runs one or more old cases for consistency checking.
4. If any problems with old cases, knowledge engineer discusses them with expert,
then goes to Step 1.
5. Expert runs modified system on new case(s) until problems are discovered.
6. If no problems on substantial number of cases, then stops; otherwise, goes to
Step 1.

FIGURE 7-3 The major steps of rule writing and refinement after
conceptualization.

helped if they see the existing rules that are similar to a new rule under
consideration, where similar means either that the conclusion mentions the
same parameter (but perhaps different values) or that the premise clauses
mention the same parameters. The desire to compare a proposed rule with
similar rules stems largely from the difficulty of assigning CFs to new rules.
Comparing other evidence and other conclusions puts the strength of the
proposed rule into a partial ordering. For example, evidence e1 for con-
clusion C could be seen to be stronger than e2 but weaker than e3 for the
same conclusion. We also assumed, incorrectly, that the control structure
and CF propagation method were details that the expert could avoid learn-
ing. That is, an expert writing a new rule sometimes needs to understand
how the rule will be used and what its effect will be in the overall solution
to a problem. These two problems are illustrated in the transcripts of
several electronic mail messages reprinted at the end of Chapter 10. The
transcripts also reveal much about the vigorous questioning of assumptions
that was taking place as rules were being written.
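The retrieval of similar rules could be sketched like this. The Python below is a hypothetical illustration; the rule contents and slot names are invented, and the similarity test is just the one described above: a shared conclusion parameter or overlapping premise parameters.

```python
# Hypothetical sketch: find existing rules "similar" to a proposed rule
# so an expert can calibrate a new CF against a partial ordering of
# comparable evidence. Rule contents are illustrative.

RULES = [
    {"id": "RULE098", "premise_params": {"stain", "morphology"},
     "conclusion_param": "identity", "cf": 0.7},
    {"id": "RULE200", "premise_params": {"site", "aerobicity"},
     "conclusion_param": "identity", "cf": 0.9},
    {"id": "RULE310", "premise_params": {"dose"},
     "conclusion_param": "regimen", "cf": 1.0},
]

def similar_rules(rules, premise_params, conclusion_param):
    """Rules sharing the conclusion parameter or any premise parameter."""
    return [r["id"] for r in rules
            if r["conclusion_param"] == conclusion_param
            or r["premise_params"] & premise_params]

# A proposed rule mentioning stain and site, concluding about identity:
print(similar_rules(RULES, {"stain", "site"}, "identity"))
# → ['RULE098', 'RULE200']
```

Showing the expert these neighbors, with their CFs, is what lets a new rule's strength be placed relative to existing evidence rather than assigned in isolation.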
Throughout the development of MYCIN's knowledge base about in-
fectious diseases (once a satisfactory conceptualization for the problem was
found), the primary mode of interaction between the knowledge engineer
and expert was a recurring cycle as shown in Figure 7-3. Much of the
actual time, particularly in the early years, was spent on changes to the
code, outside of this loop, in order to get the system to work efficiently (or
sometimes to work at all) with new kinds of knowledge suggested by ex-
perts. Considerable time was spent with the experts trying to understand
their larger perspective on diagnosis and therapy in infectious disease. And
some time was spent trying to reconceptualize the program's problem-
solving framework. We believed that the time-consuming nature of the six-
step loop shown in Figure 7-3 was one of the key problems in building an
expert system, although the framework itself was simple and effective.
Thus we looked at several ways to improve the expert's and knowledge
engineer's efficiency in the loop.
For Step 1 of the loop we created facilities for experts (or other users)
to leave comments for the knowledge engineers. We gave them an English-
like language for describing new relationships. And we created the expla-
nation facility described in Part Six, so they could understand a faulty line
of reasoning well enough to correct the knowledge base. For Step 2, as
mentioned, we created tools for the knowledge engineer, to facilitate entry
and modification of rules. For Step 3, we created an indexed library of test
cases and facilities for running many cases in batch mode overnight. For
Step 4, the batch system recorded differences caused by a set of modifi-
cations in the advice given on the test cases. The record was then used by
the knowledge engineer to assess the detrimental effects, if any, of recent
changes to the rules. Some of our concern with human engineering, dis-
cussed in Part Eleven, was motivated by Step 5 because we realized the
necessity of an expert's "playing with" the system in order to discover its
weaknesses.
The TEIRESIAS system discussed in Chapter 9 was the product of an
experiment on interactive transfer of expertise. TEIRESIAS was designed
to help an expert at Steps 1, 2, and 5. Although the program was never
used routinely in its entirety by collaborating infectious disease specialists,
we considered the experiment to be highly successful. It showed the power
of using a model of the domain-specific knowledge with syntactic editors.
It showed that debugging in the context of a specific case is an effective
means to focus the expert's attention. TEIRESIAS analyzed a rule set stat-
ically to build rule models, which, in turn, were used during the dynamic
debugging. It thus "knew what it knew," that is, it had models of the knowl-
edge base. It used the rule models to provide advice about incomplete
areas of the knowledge base, to provide suggestions and help during in-
teractive debugging sessions, and to provide summary explanations. Much
of TEIRESIAS is now embedded in the knowledge acquisition code of
EMYCIN.
The rule checker discussed in Chapter 8 was an experiment in static
analysis of a rule set, in contrast to TEIRESIAS's dynamic analysis in con-
text. It was not a large project, but it does demonstrate the power of ana-
lyzing a rule set for the expert. Its analysis of rules is simpler than
TEIRESIAS's static analysis for two reasons: the rules it considers all make
conclusions with certainty (i.e., CF = 1); and the clusterings of rules are
easier to identify as a result of an extra slot attached to each rule naming
the context in which it applies. It analyzes rules for the ONCOCIN system,
described in more detail in Chapters 32 and 35.
As we had believed from the start, the kind of analysis performed by
the rule checker provides helpful information to the expert writing new
rules. To some extent, it is orthogonal to the six-step interactive loop men-
tioned above, but it might also be seen as Step 2a between entering a set
of changes and running test cases. After the expert adds several new rules
(through the interactive loop or not), the rule checker will point out logical
problems of inconsistency and subsumption and pragmatic problems of
redundancy and incompleteness. Any of these is a signal to the expert to
examine the subsets of rules in which the rule checker identifies problems.
Because this analysis is more systematic than the empirical testing in Steps
3-5 of the six-step loop, it can catch potential problems long before they
would manifest themselves in test cases.
Some checking of rules is also done in EMYCIN, as described in the
EMYCIN manual (van Melle et al., 1981). As each rule is entered or edited,
it is checked for syntactic validity to catch common input errors. By syn-
tactic, we mean issues of rule form--viz., that terms are spelled correctly,
values are legal for the parameters with which they are associated, etc.--
rather than the actual information (semantic) content (i.e., whether or not
the rule "makes sense"). Performing the syntactic checks at acquisition time
reduces the likelihood that the consultation program will later fail due to
"obvious" errors. This permits the expert to concentrate on debugging
logical errors and omissions.
The purely syntactic checks are made by comparing each rule clause
with the internal function template corresponding to the predicate or action
function used in the clause. Using this template, EMYCIN determines
whether the argument slots for these functions are correctly filled. For
example, each argument requiring a parameter must be assigned a valid
parameter (of some context), and any argument requiring a value must be
assigned a legal value for the associated parameter. If an unknown param-
eter is found, the checker tries to correct it with the Interlisp spelling
corrector, using a spelling list of all parameters in the system. If that fails,
it asks if this is a new (previously unmentioned) parameter. If so, it defines
the new parameter and, in a brief diversion, prompts the system builder
to describe it. Similar action is also taken if an unrecognized value for a
parameter is found.
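The template check and spelling correction described above can be sketched in modern terms roughly as follows. This is Python rather than Interlisp, and the TEMPLATES and PARAMETERS tables, the SAME predicate, and the messages are illustrative stand-ins, not EMYCIN's actual data structures:

```python
import difflib

# Hypothetical miniature of EMYCIN's syntactic rule check: each predicate
# function has a template naming the kind of argument each slot expects,
# and unknown parameter names are run through a spelling corrector.
TEMPLATES = {"SAME": ("parameter", "value")}      # e.g., (SAME <parm> <value>)
PARAMETERS = {"NORMALCOUNTS": {"YES", "NO"}}      # legal values per parameter

def check_clause(function, args):
    """Return a list of problems found in one rule clause (empty if clean)."""
    problems = []
    template = TEMPLATES.get(function)
    if template is None:
        return ["unknown function: " + function]
    for slot, arg in zip(template, args):
        if slot == "parameter" and arg not in PARAMETERS:
            # Try the spelling corrector before declaring a new parameter.
            guess = difflib.get_close_matches(arg, PARAMETERS, n=1)
            if guess:
                problems.append(f"{arg}: did you mean {guess[0]}?")
            else:
                problems.append(f"{arg}: new parameter, please describe it")
        elif slot == "value":
            legal = PARAMETERS.get(args[0], set())
            if arg not in legal:
                problems.append(f"{arg} is not a legal value for {args[0]}")
    return problems
```

A misspelled parameter such as NORMLCOUNTS would then draw the suggestion NORMALCOUNTS, much as the Interlisp corrector does against its spelling list of all parameters in the system.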
A limited semantic check is also performed: each new or changed rule
is compared with any existing rules that conclude about the same param-
eter to make sure it does not directly contradict or subsume any of them.
A contradiction occurs when two rules with the same set of premise clauses
make conflicting conclusions (contradictory values or CFs for the same
parameter); subsumption occurs when one rule's premise is a subset of the
other's, so that the first rule succeeds whenever the second one does (i.e.,
the second rule is more specific), and both conclude about the same values.
In either case, the interaction is reported to the expert, who may then
examine or edit any of the conflicting or redundant rules.
Another experimental system we incorporated into MYCIN was a
small body of code that kept statistics on the use of rules and presented
the statistical results to the knowledge base builders.1 It provided another
way of analyzing the contents of a knowledge base so potential problems
could be examined. It revealed, for example, that some rules never suc-
ceeded, even though they were called many times. Even though their con-
clusions were relevant (mentioned a subgoal that was traced), their premise
conditions never matched the specific facts of the cases. Sometimes this
happens because a rule is covering a very unusual set of circumstances not
instantiated in the test cases. Since much expertise resides in such rules,
we did not modify them if they were in the knowledge base for that reason.
Sometimes, though, the lack of successful invocation of rules indicated a
problem. The premises might be too specific, perhaps because of tran-
scription errors in premise clauses, and these did need attention. This
experimental system also revealed that some rules always succeeded when
called, occasionally on cases where they were not supposed to. Although it
was a small experiment, it was successful: empirically derived statistics on
rule use can provide valuable information to the persons building the
knowledge base.

1This code was largely written by Jan Aikins.
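The statistics code itself was written in Interlisp and is not reproduced here; a minimal modern sketch of the same idea — counting how often each rule is tried and how often it succeeds, then flagging the two extremes described above — might look like this (the class, field names, and threshold are my invention):

```python
from collections import Counter

class RuleStats:
    """Illustrative rule-use bookkeeping, not MYCIN's actual code."""

    def __init__(self):
        self.tried = Counter()
        self.succeeded = Counter()

    def record(self, rule, success):
        # Called once each time the interpreter tries a rule.
        self.tried[rule] += 1
        if success:
            self.succeeded[rule] += 1

    def report(self, min_trials=10):
        """Rules that never succeed may have over-specific premises;
        rules that always succeed may be firing on unintended cases."""
        never = [r for r, n in self.tried.items()
                 if n >= min_trials and self.succeeded[r] == 0]
        always = [r for r, n in self.tried.items()
                  if n >= min_trials and self.succeeded[r] == n]
        return never, always
```

As the text notes, a rule in the "never succeeds" list is not automatically wrong — it may encode a rare circumstance — so the report is a prompt for the expert's judgment, not an error list.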
One of the most important questions we have been asking in our work
on knowledge acquisition is

How (or to what extent) can an intelligent system replace a knowledge
engineer in helping an expert build a knowledge base?

The experimental systems we have written are encouraging in pointing
toward automated assistance (see Chapter 16), but they are far from a
definitive solution. We have built tools for the knowledge engineer more
readily than for the expert. In retrospect we now believe that we under-
estimated both the intellectual effort involved in building a good knowl-
edge base and the amount of global information about the expert system
that the expert needs to know.
8
Completeness and Consistency in a Rule-Based System

Motoi Suwa, A. Carlisle Scott, and Edward H. Shortliffe

The builders of a knowledge-based expert system must ensure that the
system will give its users accurate advice or correct solutions to their prob-
lems. The process of verifying that a system is accurate and reliable has
two distinct components: checking that the knowledge base is correct, and
verifying that the program can interpret and apply this information cor-
rectly. The first of these components has been the focus of the research
described in this chapter; the second is discussed in Part Ten (Chapters 30
and 31).
Knowledge base debugging, the process of checking that a knowledge
base is correct and complete, is one component of the larger problem of
knowledge acquisition. This process involves testing and refining the sys-
tem's knowledge in order to discover and correct a variety of errors that
can arise during the process of transferring expertise from a human expert
to a computer system. In this chapter, we discuss some common problems
in knowledge acquisition and debugging and describe an automated assis-
tant for checking the completeness and consistency of the knowledge base
in the ONCOCIN system (discussed in Chapters 32 and 35).
As discussed in Chapters 7 and 9, an expert's knowledge must undergo
a number of transformations before it can be used by a computer. First,
the person acquires expertise in some domain through study, research,
and experience. Next, the expert attempts to formalize this expertise and
to express it in the internal representation of an expert system. Finally, the
knowledge, in a machine-readable form, is added to the computer system's
knowledge base. Problems can arise at any stage in this process: the expert's
knowledge may be incomplete, inconsistent, or even partly erroneous. Al-
ternatively, while the expert's knowledge may be accurate and complete, it
may not be adequately transferred to the computer-based representation.
The latter problem typically occurs when an expert who does not under-
stand computers works with a knowledge engineer who is unfamiliar with
the problem domain; misunderstandings that arise are often unrecognized
until performance errors occur. Finally, mistakes in spelling or syntax
(made when the knowledge base is entered into the computer) are frequent
sources of errors.

This chapter is based on an article originally appearing in The AI Magazine 3:16-21 (Autumn
1982). Copyright 1982 by AAAI. All rights reserved. Used with permission.
The knowledge base is generally constructed through collaboration
between experts in the problem domain and knowledge engineers. This
difficult and time-consuming task can be facilitated by a program that:

1. checks for inconsistencies and gaps in the knowledge base,
2. helps the experts and knowledge engineers communicate with each
other, and
3. provides a clear and understandable display of the knowledge as the
system will use it.

In the remainder of this chapter we discuss an experimental program with
these capabilities.

8.1 Earlier Work
One goal of the TEIRESIAS program, described in the next chapter, was
to provide aids for knowledge base debugging. TEIRESIAS allows an ex-
pert to judge whether or not MYCIN's diagnosis is correct, to track down
the errors in the knowledge base that led to incorrect conclusions, and to
alter, delete, or add rules in order to fix these errors. TEIRESIAS makes
no formal assessment of rules at the time they are initially entered into the
knowledge base.
In the EMYCIN system for building knowledge-based consultants
(Chapter 15), the knowledge acquisition program fixes spelling errors,
checks that rules are semantically and syntactically correct, and points out
potentially erroneous interactions among rules. In addition, EMYCIN's
knowledge base debugging facility includes the following options:

1. a trace of the system's reasoning process during a consultation, available
to knowledge engineers familiar with the program's internal represen-
tation and control processes;
2. an interactive mechanism for reviewing and correcting the system's con-
clusions (a generalization of the TEIRESIAS program);
3. an interface to the system's explanation facility to produce automatically,
at the end of a consultation, explanations of how the system reached its
results; and
4. a verification mechanism, which compares the system's results at the
end of a consultation with the stored "correct" results for the case that
were saved from a previous interaction with the TEIRESIAS-like op-
tion. The comparison includes explanations of why the system made its
incorrect conclusions and why it did not make the correct ones.

8.2 Systematic Checking of a Knowledge Base

The knowledge base debugging tools mentioned above allow a system
builder to identify problems with the system's knowledge base by observing
errors in its performance on test cases. While thorough testing is an essen-
tial part of verifying the consistency and completeness of a knowledge base,
it is rarely possible to guarantee that a knowledge base is completely de-
bugged, even after hundreds of test runs on sample test cases. TEIRESIAS
was designed to aid in debugging an extensive rule set in a fully functional
system. EMYCIN was designed to allow incremental building of a knowl-
edge base and running consultations with only a skeletal knowledge base.
However, EMYCIN assumes that the task of building a system is simply to
encode and add the knowledge.
In contrast, building a new expert system typically starts with the se-
lection of knowledge representation formalisms and the design of a pro-
gram to use the knowledge. Only when this has been done is it possible to
encode the knowledge and write the program. The system may not be
ready to run tests, even on simple cases, until much of the knowledge base
is encoded. Regardless of how an expert system is developed, its developers
can profit from a systematic check on the knowledge base without gath-
ering extensive data for test runs, even before the full reasoning mecha-
nism is functioning. This can be accomplished by a program that checks a
knowledge base for completeness and consistency during the system's de-
velopment.

8.2.1 Logical Checks for Consistency

When knowledge is represented in production rules, inconsistencies in the
knowledge base appear as:

Conflict: two rules succeed in the same situation but with conflicting re-
sults.
Redundancy: two rules succeed in the same situation and have the same
results.
Subsumption: two rules have the same results, but one contains additional
restrictions on the situations in which it will succeed. Whenever the more
restrictive rule succeeds, the less restrictive rule also succeeds, resulting
in redundancy.

Conflict, redundancy, and subsumption are defined above as logical con-
ditions. These conditions can be detected if the syntax allows one to ex-
amine two rules and determine if situations exist in which both can succeed
and whether the results of applying the two rules are identical, conflicting,
or unrelated.
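Under the simplifying assumption that a premise is a set of parameter = value clauses and a conclusion assigns a single value with certainty, the three conditions can be detected by a direct pairwise comparison. The following is a sketch of that idea, not the chapter's program; the tuple representation is mine:

```python
# Each rule is (premises, conclusion): premises is a frozenset of
# (parameter, value) clauses; conclusion is a single (parameter, value).

def compare(rule_a, rule_b):
    """Classify a pair of rules as 'conflict', 'redundancy',
    'subsumption', or None (no interaction)."""
    prem_a, concl_a = rule_a
    prem_b, concl_b = rule_b
    if concl_a[0] != concl_b[0]:
        return None                      # conclude about different parameters
    if prem_a == prem_b:
        # Both succeed in exactly the same situations.
        return "redundancy" if concl_a == concl_b else "conflict"
    if concl_a == concl_b and (prem_a < prem_b or prem_b < prem_a):
        # One premise set strictly contains the other: whenever the more
        # restrictive rule succeeds, the less restrictive one does too.
        return "subsumption"
    return None
```

With weighted conclusions (as in EMYCIN's certainty factors) the comparison of conclusions would also have to consider CFs, as the text's definition of contradiction does.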

8.2.2 Logical Checks for Completeness

Incompleteness of the knowledge base is the result of:

Missing rules: a situation exists in which a particular inference is required,
but there is no rule that succeeds in that situation and produces the
desired conclusion.

Missing rules can be detected logically if it is possible to enumerate all
circumstances in which a given decision should be made or a given action
should be taken.

8.2.3 Pragmatic Considerations

It is often pragmatic conditions, not purely logical ones, that determine
whether or not there are inconsistencies in a knowledge base. The seman-
tics of the domain may modify syntactic analysis. Of the three types of
inconsistency described above, only conflict is guaranteed to be a true error.
In practice, logical redundancy may not cause problems. In a system
where the first successful rule is the only one to succeed, a problem will
arise only if one of two redundant rules is revised or deleted while the
other is left unchanged. On the other hand, in a system using a scoring
mechanism, such as the certainty factors in EMYCIN systems, redundant
rules cause the same evidence to be counted twice, leading to erroneous
increases in the weight of their conclusions.
In a set of rules that accumulate evidence for a particular hypothesis,
one rule that subsumes another may cause an error by causing the same
evidence to be counted twice. Alternatively, the expert might have pur-
posely written the rules so that the more restrictive one adds a little more
weight to the conclusion made by the less restrictive one.
An exhaustive syntactic approach for identifying missing rules would
assume that there should be a rule that applies in each situation defined
by all possible combinations of domain variables. Some of these combina-
tions, however, are not meaningful. For example, there are no males who
are pregnant (by definition) and no infants who are alcoholics (by reason
of circumstances). Like checking for consistency, checking for complete-
ness generally requires some knowledge of the problem domain.
Because of these pragmatic considerations, an automated rule checker
should display potential errors and allow an expert to indicate which ones
represent real problems. It should prompt the expert for domain-specific
information to explain why apparent errors are, in fact, acceptable. This
information should be represented so that it can be used to make future
checking more accurate.

8.3 Rule Checking in ONCOCIN

8.3.1 Brief Description of ONCOCIN

ONCOCIN (see Chapter 35) is a rule-based consultation system to advise
physicians at the Stanford Medical Center cancer clinic on the management
of patients who are on experimental treatment protocols. These protocols
serve to ensure that data from patients on various treatment regimens can
be compared in order to evaluate the success of therapy and to assess the
relative effectiveness of alternative regimens. A protocol specifies when the
patient should visit the clinic, what chemotherapy and/or radiation therapy
the patient should receive on each visit, when laboratory tests should be
performed, and under what circumstances and in what ways the recom-
mended course of therapy should be modified.
As in MYCIN, a rule in ONCOCIN has an action part that concludes
a value for some parameter on the basis of values of other parameters in
the rule's condition part. Currently, however, all parameter values can be
determined with certainty; there is no need to use weighted belief mea-
sures. When a rule succeeds, its action parameter becomes known, so no
other rules with the same action parameter will be tried.
In contrast to MYCIN, rules in ONCOCIN specify the context in which
they apply. Examples of ONCOCIN contexts are drugs, chemotherapies
(i.e., drug combinations), and protocols. A rule that determines the dose
of a drug may be specific to the drug alone or to both the drug and the
chemotherapy. In the latter case, the context of the rule would be the list
of pairs of drug and chemotherapy for which the rule is valid. At any time
during a consultation, the current context represents the particular drug,
chemotherapy, and protocol currently under consideration.
In order to determine the value of a parameter, the system tries rules
that conclude about that parameter and that apply in the current context.
For example, Rule 75 shown below is invoked to determine the value of
the parameter current attenuated dose. The condition will be checked only
when the current context is a drug in the chemotherapy MOPP or a drug
in the chemotherapy PAVE. Clause 1 of the condition gives a reason to
attenuate (lessen) the doses of drugs, and clause 2 mentions a reason not
to attenuate more than 75%.

RULE 75
[action parameter]  (a) To determine the current attenuated dose
[context]           (b) for all drugs in MOPP, or for all drugs in PAVE:
[condition]     IF: 1) This is the start of the first cycle
                       after a cycle was aborted, and
                    2) The blood counts do not warrant dose attenuation
[action]      THEN: Conclude that the current attenuated dose is 75
                    percent of the previous dose

Certain rules for determining the value of a parameter serve special func-
tions. Some give a "definitional" value in the specified context. These are
called initial rules and are tried first. Other rules provide a (possibly context-
dependent) "default" or "usual" value in the event that no other rule suc-
ceeds. These are called default rules and are applied last. Rules that do not
serve either of these special functions are called normal rules. Concluding
a parameter's value consists of trying, in order, three groups of rules:
initial, normal, then default. A rule's classification tells which of these three
groups it belongs to.1

1Internally in LISP, the context, condition, action, and classification are properties of an atom
naming the rule. The internal form of Rule 75 is

RULE075
  CONTEXT: ((MOPP DRUG) (PAVE DRUG))
  CONDITION: (AND ($IS POSTABORT YES)
                  ($IS NORMALCOUNTS YES))
  ACTION: (CONCLUDEVALUE ATTENDOSE (PERCENTOF 75 PREVIOUSDOSE))
  CLASSIFICATION: NORMAL

As in MYCIN, the LISP functions that are used in conditions or actions in ONCOCIN have
templates indicating what role their arguments play. For example, both $IS and CON-
CLUDEVALUE take a parameter as their first argument and a value of that parameter as
their second argument. Each function also has a descriptor representing its meaning. For
example, the descriptor of $IS shows that the function will succeed when the parameter value
of its first argument is equal to its second argument.
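The control strategy just described — try initial rules, then normal rules, then default rules, within the current context, stopping at the first success — can be sketched as follows. The dictionary fields only loosely mirror the footnote's rule properties, and the function is an illustration, not ONCOCIN's interpreter:

```python
def conclude(parameter, context, rules, facts):
    """Return the value concluded for parameter, or None if no rule fires.

    Each rule is a dict with keys: action_parm, contexts (a set),
    classification ('initial' | 'normal' | 'default'), condition (a list
    of (parameter, value) pairs), and value.
    """
    applicable = [r for r in rules
                  if r["action_parm"] == parameter and context in r["contexts"]]
    for group in ("initial", "normal", "default"):
        for rule in (r for r in applicable if r["classification"] == group):
            # All values are known with certainty, so a simple match suffices.
            if all(facts.get(p) == v for p, v in rule["condition"]):
                return rule["value"]     # first success ends the search
    return None
```

Note that a default rule can share a situation with a normal rule without conflict, exactly as in the Rule 76 / Rule 80 example discussed later: the default group is only reached after every normal rule has failed.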

8.3.2 Overview of the Rule-Checking Program

A rule's context and condition together describe the situations in which it
applies. The templates and descriptors of rule functions make it possible
to determine the combination of values of condition parameters that will
cause a rule to succeed. The rule's context property shows the context(s)
in which the rule applies. The contexts and conditions of two rules can
therefore be examined to determine if there are situations in which both
can succeed. If so, and if the rules conclude different values for the same
parameter, they are in conflict. If they conclude the same thing, except
that one contains extra condition clauses, then one subsumes the other.
These definitions of inconsistencies simplify the task of checking the
knowledge base. The rules can be partitioned into disjoint sets, each of
which concludes about the same parameter in the same context. The re-
sulting rule sets can be checked independently. To check a set of rules, the
program:

1. finds all parameters used in the conditions of these rules;
2. makes a table, displaying all possible combinations of condition param-
eter values and the corresponding values that will be concluded for the
action parameters (see Figure 8-1);2 and
3. checks the tables for conflict, redundancy, subsumption, and missing
rules; then displays the table with a summary of any potential errors
that were found. The rule checker assumes that there should be a rule
for each possible combination of values of condition parameters; it hy-
pothesizes missing rules on this assumption (see Figure 8-2).3
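These three steps can be sketched as follows, for one rule set (i.e., one action parameter in one context). The simplified rule format is mine: a condition is a list of (parameter, value) pairs, and an omitted parameter matches any value; default rules would be excluded before the table is built:

```python
from itertools import product

def check_rule_set(rules, value_sets):
    """Build the combination table for one rule set and flag problems.

    rules: list of dicts with keys 'condition' and 'value'.
    value_sets: dict mapping each condition parameter to its legal values.
    Returns (missing, conflicts, redundant) lists of combinations.
    """
    parms = sorted(value_sets)
    table = {}                       # combination -> values concluded there
    for combo in product(*(value_sets[p] for p in parms)):
        bound = dict(zip(parms, combo))
        table[combo] = [r["value"] for r in rules
                        if all(bound[p] == v for p, v in r["condition"])]
    missing = [c for c, vals in table.items() if not vals]
    conflicts = [c for c, vals in table.items() if len(set(vals)) > 1]
    redundant = [c for c, vals in table.items()
                 if len(vals) > 1 and len(set(vals)) == 1]
    return missing, conflicts, redundant
```

The refinement proposed in footnote 3 would fit naturally here: semantically impossible combinations, once described by the expert, would simply be skipped in the enumeration.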

ONCOCIN's rule checker dynamically examines a rule set to determine
which condition parameters are currently used to conclude a given action
parameter. These parameters determine what columns should appear in
the table for the rule set. The program does not expect that each of the
parameters should be used in every rule in the set (as illustrated by Rule
76 in the example of the next subsection). In contrast, TEIRESIAS (see
next chapter) examined the "nearly complete" MYCIN knowledge base and
built static rule models showing (among other things) which condition pa-
rameters were used (in the existing knowledge base) to conclude a given
action parameter. When a new rule was added to MYCIN, it was compared

2Because a parameter's value is always known with certainty and the possible values are
mutually exclusive, the different combinations of condition parameter values are disjoint. If
a rule corresponding to one combination succeeds, rules corresponding to other combinations
in the same table will fail. This would not be true in an EMYCIN consultation system in
which the values of some parameters can be concluded with less than complete certainty. In
such cases, the combinations in a given table would not necessarily be disjoint.
3We plan to add a mechanism to acquire information about the meanings of parameters and
the relationships among them and to use this information to omit semantically impossible
combinations from subsequent tables.
[FIGURE 8-1 (original pages 166-167): summary table for the rule set discussed in Section 8.3.3; the table itself is not legible in this scan.]


Missing rule corresponding to combination C4:
  To determine the current attenuated dose for Cytoxan in CVP
  IF:   1) The blood counts do warrant dose attenuation,
        2) The current chemotherapy cycle number is 1, and
        3) This is not the start of the first cycle after
           significant radiation
  THEN: Conclude that the current attenuated dose is ...

FIGURE 8-2 Proposed missing rule (English translation).
Note that no value is given for the action parameter; this could
be filled in by the system builder if the rule looked appropriate
for addition to the knowledge base.

with the rule model for its action parameter. TEIRESIAS proposed missing
clauses if some condition parameters in the model did not appear in the
new rule.

8.3.3 An Example

ONCOCIN's rule-checking program can check the entire rule base, or can
interface with the system's knowledge acquisition program and check only
those rules affected by recent changes to the knowledge base. This latter
mode is illustrated by the example in Figure 8-1. Here the system builder
is trying to determine if the recent addition of one rule and deletion of
another have introduced errors.
The rules checked in the example conclude the current attenuated
dose for the drug cytoxan in the chemotherapy named CVP. There are
three condition parameters commonly used in those rules. Of these, NOR-
MALCOUNTS takes YES or NO as its value. CYCLE and SIGXRT take
integer values. The only value of CYCLE or SIGXRT that was mentioned
explicitly in any rule is 1; therefore, the table has rows for values 1 and
OTHER (i.e., other than 1).
The table shows that Rule 80 concludes that the attenuated dose
should have a value of 250 milligrams per square meter when the blood
counts do not warrant dose attenuation (NORMALCOUNTS = YES), the
chemotherapy cycle number is 1 (CYCLE = 1), and this is the first cycle
after significant radiation (SIGXRT = 1). This combination of values of
the condition parameters is labeled C1.
Rule 76, shown next in Figure 8-1, can succeed in the same situation
(C1) as Rule 80, but it concludes a different dose. These rules do not
conflict, however, because Rule 76 is a default rule, which will be invoked
only if all normal rules (including Rule 80) fail. Note that NORMAL-
COUNTS is the only condition parameter that appears explicitly in Rule
76, as indicated by the parentheses around the values of the other two

Rule set: 33, 24

Context: the drug DTIC in the chemotherapy ABVD

Action parameter: the dose attenuation due to low WBC

Default value: 100

Evaluation
Rule   Value          WBC (in thousands)         Combination
       (percentage)   0    1.5    2    3    5
 33       25          .....***0...............       C1
 24       50          ............***0........       C2

Summary of Comparison
No problems were found.

Notes
Asterisks appear beneath values included by the rule.
Zeros appear beneath upper and lower bounds that are not included
(e.g., Rule 33 applies when 1.5 ≤ WBC < 2.0).

FIGURE 8-3 A table of rules with ranges of numerical values.

parameters. Rule 76 will succeed in all combinations that include NOR-
MALCOUNTS = YES (namely C1, C3, C5, and C7).
Rules 66 and 67 are redundant (marked R) because both use com-
bination C2 to conclude the value labeled V2 (250 mg/m2 attenuated by
the minimum count attenuation).
Rule 60 is in conflict with Rule 69 (both marked C) because both use
combination C6 but conclude different values (and both are categorized as
normal rules).
No rules exist for combinations C4 and C8, so the program hypoth-
esizes that rules are missing.
The system builder can enter ONCOCIN's knowledge acquisition pro-
gram to correct any of the errors found by the rule checker. A missing
rule can be displayed in either LISP or English (Figure 8-2) and then added
to the system's knowledge base after the expert has provided a value for
its action parameter.
If a summary table is too big to display, it is divided into a number of
subtables by assigning constant values to some of the condition parameters.
If the conditions involve ranges of numeric values, the table will display
these ranges graphically as illustrated in Figure 8-3.

8.4 Effects of the Rule-Checking Program

The rule-checking program described in this chapter was developed at the
same time that ONCOCIN's knowledge base was being built. During this
time, periodic runs of the rule checker suggested missing rules that had
been overlooked by the oncology expert. They also detected conflicting
and redundant rules, generally because a rule had the incorrect context
and therefore appeared in the wrong table.
A number of inconsistencies in the use of domain concepts were re-
vealed by the rule checker. For example, on one occasion the program
proposed a missing rule for a meaningless combination of condition pa-
rameter values. In discussing the domain knowledge that expressed the
interrelationship among the values, it became clear that a number of in-
dividual yes/no valued parameters could be represented more logically as
different values for the same parameter.
The knowledge engineers and oncology experts alike have found the
rule checker's tabular display of rule sets much easier to interpret than a
rule-by-rule display. Having tabular summaries of related rules has facili-
tated the task of modifying the knowledge base. Although the program
described assists a knowledge engineer in ensuring the consistency and
completeness of the rule set in the ONCOCIN system, its design is general,
so it can be adapted to other rule-based systems.
9
Interactive Transfer of Expertise

Randall Davis
Randall Davis

Whereasmuchearly work in artificial intelligence was devoted to the search


for a single, powerful, domain-independent problem-solving methodology
[e.g., GPS(Newell and Simon, 1972)], subsequent efforts have stressed the
use of large stores of domain-specific knowledge as a basis for high per-
formance. The knowledge base for this sort of program [e.g., DENDRAL
(Feigenbaum et al., 1971), MACSYMA (Moses, 1971)] is often assembled
by hand, an ongoing task that may involve several person-years of effort.
A key element in constructing a knowledge base is the transfer of expertise
from a human expert to the program. Since the domain expert often knows
nothing about programming, the interaction between the expert and the
pertormance program usually requires the mediation of a human pro-
grammer.
Wehave sought to create a program that could supply much the same
sort of assistance as that provided by the programmer in this transfer of
expertise. The result is a system called TEIRESIAS 1 (Davis, 1976; 1978;
Davis et al., 1977), a large Interlisp program designed to offer assistance
in the interactive transfer of knowledge from a human expert to the knowl-
edge base of a high-performance program (Figure 9-1). Information flow
from right to left is labeled explanation. This is the process by which TEI-
RESIASclarifies for the expert the source of the performance programs
results and motivations for its actions. This is a prerequisite to knowledge
acquisition, since the expert must first discover what the performance pro-

This chapter originally appeared in Artificial Intelligence 12:121-157 (1979). It has been
shortened and edited. Copyright 1979 by Artificial Intelligence. All rights reserved. Used
with permission.
IThe program is named for the blind seer in Oedipus the King, since the program, like the
prophet, has a form of "higher-order" knowledge.

171
172 InteractiveTransferof Expertise

TEIRESIAS
explanation

I DOMAIN I
EXPERT knowledge,.~[
transfer
vI
PERFORMANCE
PROGRAM
]

FIGURE9-1 Interaction between the expert and the perfor-


manceprogramis facilitated by TEIRESIAS.

gram already knows and how it used that knowledge. Information flow
from left to right is labeled knowledgetransfer. This is the process by which
the expert adds to or modifies the store of domain-specific knowledge in
the performance program.
Work on TEIRESIAS has had two general goals. We have attempted
first to develop a set of tools for knowledge base construction and main-
tenance and to abstract from them a methodology applicable to a range
of systems. The second, more general goal has been the development of
an intelligent assistant. This task involves confronting many of the tradi-
tional problems of AI and has resulted in the exploration of a number of
solutions, reviewed below.
This chapter describes a number of the key ideas in the development
of TEIRESIAS and discusses their implementation in the context of a
specific task (acquisition of new inference rules²) for a specific rule-based
performance program. While the discussion deals with a specific task, sys-
tem, and knowledge representation, several of the main ideas are appli-
cable to more general issues concerning the creation of intelligent pro-
grams.

9.1 Meta-Level Knowledge

A central theme that runs through this chapter (and is discussed more
fully in Part Nine) is the concept of meta-level knowledge, or knowledge about
knowledge. This takes several different forms, but can be summed up
generally by saying that a program can "know what it knows." That is, not
only can a program use its knowledge directly, but it may also be able to
examine it, abstract it, reason about it, and direct its application.
To see in general terms how this might be accomplished, recall that

²Acquisition of new conceptual primitives from which rules are built is discussed by Davis
(1978), while the design and implementation of the explanation capability suggested in Figure
9-1 is discussed in Part Six.

one of the principal problems of AI is the question of representation of


knowledge about the world, for which numerous techniques have been
developed. One way to view what we have done is to imagine turning this
in on itself, using some of these same techniques to describe the program
itself. The resulting system contains both object-level representations, which
describe the external world, and meta-level representations, which describe
the internal world of representations. As the discussion of rule models in
Sections 9.6 and 9.7 will make clear, such a system has a number of inter-
esting capabilities.
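The flavor of this dual representation can be suggested with a small latter-day sketch, in Python rather than the Interlisp of the original system; the attribute names and structures below are invented for illustration and are not TEIRESIAS's actual ones. Object-level rules describe the domain, while a meta-level summary computed from the rules themselves lets the program answer questions about what it knows:

```python
# Illustrative sketch only: object-level structures describe the world,
# while a meta-level summary describes the rules themselves, so the
# program can examine and reason about its own knowledge.

# Object level: each rule pairs the attributes its premise mentions with
# the attribute its conclusion establishes (contents are invented).
OBJECT_LEVEL_RULES = [
    ({"gramstain", "morphology", "aerobicity"}, "category"),
    ({"category", "site"}, "identity"),
    ({"gramstain", "site"}, "identity"),
]

def meta_description(rules):
    """Meta level: for each concluded attribute, collect the premise
    attributes that rules concluding about it have mentioned."""
    summary = {}
    for premise_attrs, concluded in rules:
        summary.setdefault(concluded, set()).update(premise_attrs)
    return summary

# The program can now "know what it knows": which attributes do rules
# that conclude about identity tend to examine?
model = meta_description(OBJECT_LEVEL_RULES)
print(sorted(model["identity"]))   # ['category', 'gramstain', 'site']
```

A summary of this kind is a primitive analogue of the rule models whose use is described later in the chapter.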

9.2 Perspective on Knowledge Acquisition

We view the interaction between the domain expert and the performance
program as interactive transfer of expertise. We see it in terms of a teacher
who continually challenges a student with new problems to solve and care-
fully observes the student's performance. The teacher may interrupt to
request a justification of some particular step the student has taken in
solving the problem or may challenge the final result. This process may
uncover a fault in the student's knowledge of the subject (the debugging
phase) and result in the transfer of information to correct it (the knowledge
acquisition phase). Other approaches to knowledge acquisition can be com-
pared to this by considering their relative positions along two dimensions:
(i) the sophistication of their debugging facilities, and (ii) the independence
of their knowledge acquisition mechanism.
The simplest sort of debugging tool is characterized by programs like
DDT, used to debug assembly language programs. The tool is totally pas-
sive (in the sense that it operates only in response to user commands),
low-level (since it operates at the level of machine or assembly language),
and knows nothing about the application domain of the program. Debug-
gers like BAIL (Reiser, 1975) and Interlisp's break package (Teitelman,
1974) are a step up from this since they function at the level of program-
ming languages such as SAIL and Interlisp. The explanation capabilities
in TEIRESIAS, in particular the HOW and WHY commands (see Part Six
for examples), represent another step, since they function at the level of
the control structure of the application program. The guided debugging
that TEIRESIAS can also provide (illustrated in Section 9.5) represents yet
another step, since here the debugger is taking the initiative and has
enough built-in knowledge about the control structure that it can track
down the error. Finally, at the most sophisticated level are knowledge-rich
debuggers like the one described by Brown and Burton (1978). Here the
program is active, high-level, informed about the application domain, and
capable of independently localizing and characterizing bugs.
By independence of the knowledge acquisition mechanism, we mean the
degree of human cooperation necessary. Much work on knowledge acqui-
174 Interactive Transfer of Expertise

sition has emphasized a highly autonomous mode of operation. There is,


for example, a large body of work aimed at inducing the appropriate gen-
eralizations from a set of test data; see, for example, Buchanan and Mit-
chell (1978) and Hayes-Roth and McDermott (1977). In these efforts
interaction is limited to presenting the program with the data and perhaps
providing a brief description of the domain in the form of values for a
few key parameters; the program then functions independently. Winston's
work on concept formation (Winston, 1970) relied somewhat more heavily
on user interaction. There the teacher was responsible for providing an
appropriate sequence of examples (and nonexamples) of a concept. In
describing our work, we have used the phrase "interactive transfer of ex-
pertise" to indicate that we view knowledge acquisition as information
transfer from an expert to a program. TEIRESIAS does not attempt to
derive new knowledge on its own, but rather tries to "listen" as attentively
as possible, commenting appropriately to help the expert augment the
knowledge base. It thus requires strong cooperation from the expert.
There is an important assumption involved in the attempt to establish
this sort of communication: we are assuming that it is possible to distinguish
between the problem-solving paradigm and the expertise or, equivalently, that
control structure and representation in the performance program can be
considered separately from the content of its knowledge base. The basic
control structure(s) and representations are assumed to be established and
debugged, and the fundamental approach to the problem is assumed to
be acceptable. The question of how knowledge is to be encoded and used
is settled by the selection of one or more of the available representations
and control structures. The expert's task is to enlarge what it is the program
knows.
There is a corollary assumption, too, in the belief that the control
structures and knowledge representations can be made sufficiently com-
prehensible to the expert that he or she can (a) understand the system's
behavior in terms of them and (b) use them to codify his or her own
knowledge. This ensures that the expert understands system performance
well enough to know what to correct, and can then express the required
knowledge, i.e., can "think" in those terms. Thus part of the task of estab-
lishing the link shown in Figure 9-1 involves insulating the expert from
the details of implementation, by establishing a discourse at a level high
enough that he or she does not have to program in LISP.

9.3 Design of the Performance Program

Figure 9-2 shows the major elements of the performance program that
TEIRESIAS is designed to help construct. Although the performance pro-
gram described here is MYCIN, the context within which TEIRESIAS was

[Figure 9-2 shows the Performance Program comprising an INFERENCE
ENGINE and a KNOWLEDGE BASE.]

FIGURE 9-2 Architecture of the performance program.

actually developed, many of the features of TEIRESIAS have been incor-


porated in EMYCIN (see Chapter 15) and are independent of any domain.
The knowledge base is the program's store of task-specific knowledge that
makes possible high performance. The inference engine is an interpreter
that uses the knowledge base to solve the problem at hand. The main point
of interest in this very simple design is the explicit division between these
two parts of the program. This design is in keeping with the assumption
noted above that the expert's task is to augment the knowledge base of a
program whose control structure (inference engine) is assumed to be both
appropriate and debugged.
Two important advantages accrue from keeping this division as strict
as possible. First, if all of the control structure information has been kept
in the inference engine, then we can engage the domain expert in a dis-
cussion of the knowledge base alone rather than of questions of program-
ming and control structures. Second, if all of the task-specific knowledge
has been kept in the knowledge base, then it is possible to remove the
current knowledge base, "plug in" another, and obtain a performance pro-
gram for a new task (see Part Five). The explicit division thus offers a
degree of domain-independence. It does not mean, however, that the in-
ference engine and knowledge base are totally independent: knowledge
base content is strongly influenced by the control paradigm used in the
inference engine. It is this unavoidable interaction that motivates the im-
portant assumption, noted in Section 9.2, that the control structure and
knowledge representation are comprehensible to the expert, at least at the
conceptual level.
An example of the program in action is shown in Section 9.5. The
program interviews the user, requesting various pieces of information that
are relevant to selecting the most appropriate antibiotic therapy, then
prints its recommendations. In the remainder of this chapter the user will

be an expert running MYCIN in order to challenge it, offering it a difficult


case and observing and correcting its performance.
We have noted earlier that the expert must have at least a high-level
understanding of the operation of the inference engine and the manner
of knowledge representation in order to be able to express new knowledge
for the performance program. An example of a rule, with brief explana-
tions of the terms premise, Boolean combination, conclusion, and certainty
factor, suffices to allow understanding of the representation of knowledge.
An equally brief explanation of backward chaining and the conservative
strategy of exhaustive evidence gathering suffices to allow understanding
of the inference engine. As mentioned in Section 9.2, we are assuming that
the expert can understand these concepts without having to deal with
details of implementation. Note as well that TEIRESIAS's basic design and
the notion of interactive transfer of expertise do not depend on this par-
ticular control structure, only on the (nontrivial) assumption that an equally
comprehensible explanation can be found for whatever control structure
is actually used in the inference engine.
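These two concepts can be suggested schematically. The sketch below is a latter-day toy in Python, not MYCIN's Interlisp; the rules, values, and the 0.2 belief threshold are invented for illustration, though the formula for combining two positive certainty factors is the one MYCIN used:

```python
# A toy reconstruction: rules pair premise clauses with a conclusion and a
# certainty factor (CF), and goals are traced by backward chaining that
# exhaustively tries every rule concluding about the goal attribute.

RULES = [
    # (premise: attribute -> required value, concluded attribute, value, CF)
    ({"gramstain": "gramneg", "morphology": "rod"},
     "category", "enterobacteriaceae", 0.8),
    ({"category": "enterobacteriaceae"},
     "identity", "e.coli", 0.3),
]

def satisfied(attr, value, facts):
    """A premise clause holds if the attribute has the required value
    (raw case data) or is believed with CF above the threshold."""
    if attr not in facts:            # subgoal: chain backward first
        trace_goal(attr, facts)
    known = facts[attr]
    if isinstance(known, dict):      # concluded attribute: {value: CF}
        return known.get(value, 0.0) > 0.2
    return known == value            # raw case datum

def trace_goal(attr, facts):
    """Exhaustively try every rule concluding about `attr`, combining
    evidence from successive rules by CF = A + B*(1 - A)."""
    conclusions = {}
    for premise, concluded, value, cf in RULES:
        if concluded == attr and all(
                satisfied(a, v, facts) for a, v in premise.items()):
            prior = conclusions.get(value, 0.0)
            conclusions[value] = prior + cf * (1.0 - prior)
    facts[attr] = conclusions
    return conclusions

case = {"gramstain": "gramneg", "morphology": "rod"}
print(trace_goal("identity", case))   # {'e.coli': 0.3}
```

Note that the rules, the case data, and the chaining strategy live in separate structures, mirroring the knowledge-base/inference-engine split of Figure 9-2: a different rule set can be "plugged in" without touching trace_goal.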

9.4 A Word About Natural Language

Natural language has not been a major focus of this work, and for the
most part we have used the simplest techniques that would support the
level of performance required. For instance, all questions and responses
from TEIRESIAS are either preformed or manufactured by filling in tem-
plates with appropriate words (as evidenced by the occasional appearance
of ungrammatical phrases like "a organism"). Where answers are free text
(rather than multiple choice), the system relies primarily on the keyword-
and template-oriented techniques described in Chapter 18. There is no
parser in the system, in part to minimize processing time and in part be-
cause users often give ungrammatical responses.
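A crude sketch may convey the flavor of this keyword-and-template style; the keywords and clause templates below are hypothetical stand-ins, far simpler than the system's actual vocabulary. It also shows how scanning word by word lets a single English line spawn more than one clause, as happens in the trace of Section 9.5:

```python
# Hypothetical keyword-to-template table: each keyword suggests one or
# more clause templates (a word like "infection" may carry several
# connotations, each yielding a clause).
KEYWORD_CLAUSES = {
    "infection": ["the infection is primary-bacteremia",
                  "the infection was acquired while hospitalized"],
    "sterile":   ["the site of the culture is one of: those sites that "
                  "are normally sterile"],
}

def interpret(line):
    """Scan the free text word by word, with no parsing at all, and
    emit every clause any recognized keyword suggests."""
    words = line.lower().replace("'s", "").split()
    clauses = []
    for word in words:
        clauses.extend(KEYWORD_CLAUSES.get(word, []))
    return clauses

# One English line can yield two premise clauses:
print(len(interpret("THE PATIENT'S INFECTION IS PRIMARY-BACTEREMIA")))  # 2
```

This is also why the first interpretation in the trace is wrong yet "reasonable": the clauses come from individual words, not from a grammatical analysis of the sentence.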

9.5 Example of TEIRESIAS in Operation

In the interaction shown in Figure 9-3, the expert is challenging MYCIN


with a particular case, observing the program's performance, and then
making the necessary correction to the knowledge base. The bug encoun-
tered here was created by deleting an existing rule from the knowledge
base and removing all references to it. The expert then repairs the bug by
tracking down the source of the problem and teaching the system the "new"

rule. The deleted rule was chosen in order to display a large number of
TEIRESIAS's capabilities in a single example. It is special in no other sense,
however, and offers a real trace of the system in action.
In the trace, computer output is in mixed upper and lower case, user
responses follow a double asterisk and are in bold upper case, and com-
mentary is indented, italicized, and enclosed in parentheses. The example
is also divided into five phases for later reference.

(MYCIN has been provided with details of a case and has just concluded the
likely causes of the infection.)

My therapy recommendation will be based on the following probable infection(s) and potential causative
organism(s):
INFECTION-1 is PRIMARY-BACTEREMIA
<Item 1> E.COLI [ORGANISM-1]
<Item 2> PSEUDOMONAS-AERUGINOSA [ORGANISM-1]
Are all these identities correct?
++** N

(The expert disagrees with the results and will indicate to TEIRESIAS the identities that he thinks should
have resulted.)

(PHASE I: TRACKING DOWN THE BUG)

Are there any identities missing?
++** Y
Which one(s)?
++** KLEBSIELLA-PNEUMONIAE PROTEUS-NON-MIRABILIS
Are there any identities that shouldn't have appeared?
++** PSEUDOMONAS-AERUGINOSA
Do you want to
1] track down the problem yourself, or
2] allow me to guide the debugging, or
3] ignore it for now and fix it another time?
Type 1, 2 or 3
++** 2
We'll start with the PSEUDOMONAS-AERUGINOSA first.

(This time, TEIRESIAS will guide the process. It applies the heuristic of starting with an identity that
appeared incorrectly.)

The following rules were used in deducing that the identity of ORGANISM-1 is pseudomonas-aeruginosa.

RULE184
Since [1.1] the category of ORGANISM-1 is not known
[1.2] the gram stain of ORGANISM-1 is gramneg
[1.3] the morphology of ORGANISM-1 is rod
[1.4] the aerobicity of ORGANISM-1 is facultative
There is weakly suggestive evidence (.3) that the identity of ORGANISM-1 is pseudomonas-aeruginosa

FIGURE 9-3 TEIRESIAS example.

Is this rule correct?
++** Y

Should its premise have failed for this case?
++** Y

Which clause of the premise should have been false? [give #]
++** 1

No conclusions were reached concerning the category of ORGANISM-1

Should it have been possible to conclude a category?
++** Y

Which one?
++** ENTEROBACTERIACEAE

There were no rules successfully invoked in the attempt to deduce that the category of ORGANISM-1 is
enterobacteriaceae.

The following (potentially applicable) rules were not executed because the starred [**] clause failed:

RULE116
IF: 1) the identity of ORGANISM-1 is not known
** 2) the gram stain of ORGANISM-1 is not known **
3) the morphology of ORGANISM-1 is not known
4) the site of CULTURE-1 is csf
5) the infection is meningitis
6) the age (in years) of the patient is less than or equal to .17
THEN: There is weakly suggestive evidence (.3) that the
category of ORGANISM-1 is enterobacteriaceae

RULE050
IF: 1) the morphology of ORGANISM-1 is rod
2) the gram stain of ORGANISM-1 is gramneg
3) the aerobicity of ORGANISM-1 is facultative
** 4) the infection with ORGANISM-1 was acquired while the
patient was hospitalized **
THEN: There is evidence that the category of ORGANISM-1
is enterobacteriaceae

Are all these rules correct?
++** Y

Should any of them have been successfully invoked?
++** N

(This would be the case if one of the starred premise clauses was false because it had been "stepped
on" by yet another rule further back in the chain.)

Is there a rule missing?
++** Y


Ok, then I need a rule that would allow me to deduce that the category of ORGANISM-1 is enterobacteriaceae.
Would you like to give me such a rule now?
++** Y

Good...

(To review briefly: The expert is tracking down the reason why MYCIN incorrectly concluded that the
identity was pseudomonas-aeruginosa. Pseudomonas-aeruginosa appeared because Rule 184 executed
successfully. This occurred in part because category was unknown. Category was unknown because all
rules to conclude it (50 and 116) failed. Since the expert believes all existing rules are correct, the
knowledge base must lack a rule that will conclude about category; in particular, that the category is
enterobacteriaceae. This will cause category to be known, inhibiting Rule 184, fixing at least this first
problem, the incorrect conclusion about pseudomonas-aeruginosa.
Note how sharply focused the issue has become, for both the expert and TEIRESIAS, as a result of the
large amount of contextual information available. The doctor is not being asked to "discuss bacteremia";
instead, the system can at this point say: "I need a rule that would allow me to deduce that the category
of ORGANISM-1 is enterobacteriaceae." And it must be a rule which is invocable in the context of this
patient. This focus makes it easier for the expert to specify a rule that may never have been explicitly
stated or recognized as such. This can be an important advantage in any domain where the knowledge
is not yet fully formalized.
The contextual information is useful for TEIRESIAS as well. It provides a number of expectations about
the content of the rule that are useful both in understanding what the expert types in and in checking the
new rule to make sure that it does indeed solve the problem at hand. Details of this process are
explained below.
The trace continues with the expert typing in the rule...)

(PHASE II: INTERPRETING THE RULE)

The new rule will be called RULE383

If: 1 - THE PATIENT'S INFECTION IS PRIMARY-BACTEREMIA
and 2 - THE SITE IS ONE OF THE STERILE SITES
and 3 -
Then: 1 - THERE IS EVIDENCE (.8) THAT THE CATEGORY IS ENTEROBACTERIACEAE
and 2 -
This may take a few moments...

(Generally, it takes on the order of 10 to 30 seconds of cpu time.)

This is my understanding of your rule:

RULE383
IF: 1) Primary-bacteremia is an associated infection for
which therapy would cover, and
2) The infection with the organism was acquired while
the patient was hospitalized, and
3) The culture was taken from a sterile source, and
4) The organism is not normally found at this site
THEN: There is strongly suggestive evidence (.8) that the
category of the organism is enterobacteriaceae

(PHASE III: MODIFYING THE INTERPRETATION)

(TEIRESIAS's initial interpretation of the natural language is incorrect (recall that the example was chosen
in part for this reason). In the system's first attempt to understand the rule, each line of the original text
has been interpreted as two clauses. This results primarily from the word-by-word approach to meaning.
For instance, despite the obvious grammatical simplicity of the first line of text, the system used two
connotations of the word "infection" to generate the first two clauses. In the second line of text the word
"sterile" was responsible for clause 3, and the word "sites" used as the basis for clause 4. The details of
this process are explained below, where it will become clear that while the translation is wrong, it is not
unreasonable given the simple natural language facilities.
Nor is it unreasonable that TEIRESIAS has turned each line of English into more than one premise
clause. The expert is not restricted to typing the English equivalent of a single premise clause on each
line. If he were, rather than typing "the organism is a gram-negative aerobic rod," he would have to type
three lines.
We will see later that while TEIRESIAS's interpretations are incorrect, they are all "reasonable" and
"consistent," in light of the system's somewhat primitive approach to natural language. The rest of the
example shows that even when the initial guesses are incorrect, straightening things out may be very
easy.)

Okay? (Yes or No)
++** N

(The expert indicates that there is a problem with the interpretation and uses the system's rule editor to
specify the changes needed.)

Change premise?
++** Y

What clause should be changed in premise?
(type HELP for list of commands)
++** CHANGE 1
Suppose (1) were changed to read-
1 - The infection is primary-bacteremia
Would that be ok?
++** Y

(Several simple heuristics are invoked to select the next most likely interpretation, based on the English
source of this clause and the set of interpretations already rejected. In this case, the second choice is
correct.)

Then you'll probably want to delete:

2) The infection with the organism was acquired while the
patient was hospitalized

Correct?
++** Y

(TEIRESIAS enforces consistency constraints on its interpretation of the English text. One of these
constraints indicates that if the new interpretation of clause 1 is correct, then clause 2 must have arisen
spuriously. That is, the system realizes it cannot get both the new version of clause 1 and the old clause
2 from the same line of text, and indicates this to the expert.)


Anything else?
++** D3
clause(s) 3 deleted.

(Realizing that there was another spurious clause, the expert deletes clause 3, but then has second
thoughts about the one he chose, so he checks, using the ? command.)

Anything else?
++** ?
1 - the infection is primary-bacteremia
and 2 - <deleted>
and 3 - <deleted>
and 4 - the organism is not normally found at this site

(He then discovers that he has typed the wrong number, undoes the damage, and deletes the one
initially intended. All commands can be abbreviated to a single letter.)

Anything else?
++** U3
clause(s) 3 undeleted.
Anything else?
++** D4
clause(s) 4 deleted.

(There is one last correction to be made for which TEIRESIAS suggests the wording...)

Anything else?
++** C3
Suppose (3) were changed to read-
3 - The site of the culture is one of: those sites that are
normally sterile
Would that be ok?
++** Y

Anything else?
++** ?
1 - the infection is primary-bacteremia
and 2 - <deleted>
and 3 - the site of the culture is one of: those sites that are
normally sterile
and 4 - <deleted>

++** OK

(... and at last he's happy with the rule.)

Change action?
++** N
This is my understanding of your rule:

RULE383
IF: 1) The infection is primary-bacteremia, and
2) The site of the culture is one of: those sites that are normally sterile
THEN: There is strongly suggestive evidence (.8) that the category of the organism is
enterobacteriaceae

Okay? (Yes or No)
++** Y

(This is the classic test of comprehension: the original English text has been turned into a rule coded in
LISP; the LISP is then translated back into English for approval. Compare the results with the expert's
original input.
Now that the expert is satisfied that the system understands the rule as he has entered it, TEIRESIAS
checks several more things about the rule.)

(PHASE IV: SECOND-GUESSING)

I hate to criticize, Dr. Davis, but did you know that most rules about
what the category of a organism might be, that mention-
the site of a culture
the infection
Also mention-
[A] - the portal of entry of a organism
Shall I try to write a clause to account for [A]?
++** Y

(The mechanism that makes possible this sort of second-guessing by TEIRESIAS is explained in detail
below. Briefly, the system maintains a model of the knowledge base. In checking to see how this new
piece of information "fits into" that model, it discovers that its expectations have been only partially met.
It indicates to the expert what expectations were left unfulfilled and asks if it should attempt to fill in the
missing parts.)

How about-
[A] The portal of entry of the organism is GI
Ok?
++** Y

(The expert agreed to the inclusion of a clause about the portal of entry; this determines which attribute
to use. TEIRESIAS examines its model of the knowledge base to see which predicate function is likely to
be the right one; the model evidently suggested the function SAME. Finally, in order to fix the original
bug, it must be possible to invoke the new rule for this patient. Since, in the original consultation, the
expert responded that the portal of entry was the GI tract, this is a plausible guess for the value to use.
This demonstrates another benefit of the "knowledge acquisition in context" methodology mentioned in
Chapter 7.)

Also, most rules about what the category of a organism may be
Also conclude about-
[A] - the identity of a organism
Shall I try to write a clause to account for [A]?


++** N

(The system noticed another unfulfilled expectation concerning the action part of the rule, but the expert
always has the option of overriding the system's suggestions, and does so.)

The rule now reads:

RULE383
IF: 1) The infection is primary-bacteremia, and
2) The site of the culture is one of: those sites
that are normally sterile, and
3) The portal of entry of the organism is GI
THEN: There is strongly suggestive evidence (.8) that the
category of the organism is enterobacteriaceae

(Now that both the expert and TEIRESIAS are satisfied with the rule, there is a bit of bookkeeping to do.
First, the new rule is checked by the system (behind the scenes) to be sure that there is nothing
obviously wrong with it--for instance, one of the tests ensures that premise clauses are not contradictory.
Had this been the case, the system would have noticed it at this point and the user would have been
given the option of returning to the rule editor to fix the problem. There are several other things the
system checks at this point, but this rule passes all the tests, and hence there is no indication of any
problem.)

(PHASE V: FINAL CHECKOUT, BOOKKEEPING, RERUNNING THE CONSULTATION)

Please describe briefly why you added (or changed) this rule. Type an
empty line when done.

++** THE SYSTEM FAILED TO REALIZE IT COULD CONCLUDE
++** CATEGORY, AND THIS ALLOWED RULE 184 TO INCORRECTLY
++** CONCLUDE IDENTITY
++**

(Over the years of MYCIN development, the experts associated with the project have occasionally looked
in astonishment at a rule which, though it had been around for some time, seemed, superficially at least,
to make very little sense. The question then arises as to why that rule was added to the knowledge base.
To avoid this problem, the expert is required to explain why he wrote or changed the rule.

There are also a number of other bookkeeping-type chores that are being attended to here, again,
behind the scenes. TEIRESIAS takes care of tagging the rule with information that will simplify keeping
track of the large, and constantly changing, body of rules. We will see in a moment how that information
is accessed.)

RULE383 has now been added to the knowledge base.
I will now rerun the consultation to check the effectiveness of your new
rule. Get comfortable, since this may take a bit.
Also, new questions may be asked as a result of this new rule. If a
question is printed, please answer it with reference to this current
patient.

(TEIRESIAS now invokes the performance program as a subprocess to make sure the bug is fixed, and
to test for any other side effects. Since MYCIN has saved all the answers from the initial consultation, the
process is fairly inexpensive. The expert is warned about new questions because his new rule may
mention an attribute that was not previously traced, or it may indirectly cause the tracing of one. While
there are some cases in which it is possible to compute the incremental effect on a given consultation of
adding a single new rule to the knowledge base, these are rare enough that in the long run it is
computationally cheaper to simply rerun the consultation.)

18-DEC-75 16:30:46
.......... PATIENT-281 ..........
.......... INFECTION-1 ..........
........... CULTURE-1 ...........
........... ORGANISM-1 ..........

(The answers to all questions were available internally, so the performance program asks nothing new.
The context names are printed to let the expert follow the progress of the consultation.)

My therapy recommendation will be based on the following probable
infection(s) and potential causative organism(s):

INFECTION-1 is PRIMARY-BACTEREMIA
<Item 1> E. COLI [ORGANISM-1]
<Item 2> KLEBSIELLA-PNEUMONIAE [ORGANISM-1]
<Item 3> PROTEUS-NON-MIRABILIS [ORGANISM-1]

Well, it appears that things are straightened out.

(The new conclusions are the ones the expert indicated earlier as the correct answers. This does not go
unnoticed by TEIRESIAS, which, as before, interrupts the consultation at this point to check the validity of
the answers produced.)

Are all these values correct now?
++** Y

(As always, TEIRESIAS gives the expert the last word, and waits for approval before proceeding.)

Good, then the problem's been fixed.
... continuing with therapy selection now.

(At this point TEIRESIAS allows the performance program subprocess to continue with the normal
therapy-selection phase. This has been omitted here for brevity.
The reader may be wondering why three errors disappeared with the addition of one rule. Once the new
rule permitted the category of the organism to be concluded, not only did Rule 184 fail, but a second rule
that had previously failed was now successfully invoked. This rule used the category of the organism to
conclude that klebsiella-pneumoniae and proteus-non-mirabilis were likely identities.

All of the bookkeeping that was done is accessible via the INFOrmation command added to the
question-answering facility of the performance program. This gives the expert the background for any
rule in the system.)

** INFO 383
was written by Dr. Davis
on December 18, 1975


for the following reason:

THE SYSTEM FAILED TO REALIZE IT COULD CONCLUDE CATEGORY, AND
THIS ALLOWED RULE 184 TO INCORRECTLY CONCLUDE IDENTITY.
for patient [281]

[who was described as follows:
CLASSIC CASE OF GRAM NEGATIVE ROD INFECTION FOR A
PATIENT WITH A NON-NOSOCOMIAL DISEASE]


9.6 How It All Works

9.6.1 Overview of the Main Ideas

Before reviewing the trace in more detail, we describe the ideas that make
possible the capabilities displayed. This subsection serves primarily to name
and briefly sketch each in turn; the details are supplied in subsequent
subsections reviewing the example. [See Davis (1976) for more details.]

Knowledge Acquisition in Context

Performance programs of the sort TEIRESIAS helps create will typically


find their greatest utility in domains where there are no unifying laws on
which to base algorithmic methods. In such domains there is instead a
collection of informal knowledge based on accumulated experience. This
means an expert specifying a new rule may be codifying a piece of knowl-
edge that has never previously been isolated and expressed as such. Since
this is difficult, anything that can be done to ease the task will prove very
useful.
In response, we have emphasized knowledge acquisition in the context
of a shortcoming in the knowledge base. To illustrate the utility of this
approach, consider the difference between asking the expert:

What should I know about the patient?

and saying to him:

Here is an example in which you say the performance program made


a mistake. Here is all the knowledge the program used, here are all
the facts of the case, and here is how it reached its conclusions. Now,

what is it that you know and the system doesn't that allows you to avoid
making that same mistake?

Note how much more focused the second question is and how much easier
it is to answer.

Building Expectations

The focusing provided by the context is also an important aid to TEIRE-


SIAS. In particular, it permits the system to build up a set of expectations
concerning the knowledge to be acquired, facilitating knowledge transfer
and making possible several useful features illustrated in the trace and
described below.

Model-Based Understanding

Model-based understanding suggests that some aspects of understanding


can be viewed as a process of matching: the entity to be understood is
matched against a collection of prototypes, or models, and the most ap-
propriate model is selected. This sets the framework in which further in-
terpretation takes place. While this view is not new, TEIRESIAS employs
a novel application of it, since the system has a model of the knowledge it
is likely to be acquiring from the expert.

Giving a Program a Model of Its Own Knowledge

We will see that the combination of TEIRESIAS and the perfi)rmance


program amounts to a system that has a picture of its own knowledge. That
is, it not only knows something about a particular domain but also in a
primitive sense knows what it knows and employs that model of its knowl-
edge in several ways.

Learning as a Process of Comparison

We do not view learning as simply the addition of information to an ex-
isting base of knowledge, but instead take it to include various forms of
comparison of the new information with the old. This of course has its
corollary in human behavior: a student will quickly point out discrepancies
between newly taught material and his or her current stock of information.
TEIRESIAS has a similar, though very primitive, capability: it compares
new information supplied by the expert with the existing knowledge base,
points out inconsistencies, and suggests possible remedies.
How It All Works 187

Learning by Experience

One of the long-recognized potential weaknesses of any model-based sys-
tem is dependence on a fixed set of models, since the scope of the pro-
gram's "understanding" of the world is constrained by the number and
types of models it has. As will become clear, the models TEIRESIAS em-
ploys are not handcrafted and static, but are instead formed and contin-
ually revised as a by-product of its experience in interacting with the ex-
pert.

9.6.2 Phase I: Tracking Down the Bug

To provide the debugging facility shown in the dialogue of Section 9.5,
TEIRESIAS maintains a detailed record of the actions of the performance
program during the consultation and then interprets this record on the
basis of an exhaustive analysis of the performance program's control struc-
ture. This presents the expert with a comprehensible task because (a) the
backward-chaining technique used by the performance program is
straightforward and intuitive, even to a nonprogrammer, and (b) the rules
are designed to encode knowledge at a reasonably high conceptual level.
As a result, even though TEIRESIAS is running through an exhaustive
case analysis of the preceding consultation, the expert is presented with a
task of debugging reasoning rather than code.
The availability of an algorithmic debugging process is also an impor-
tant factor in encouraging the expert to be as precise as possible in making
responses. Note that at each point in tracking down the error the expert
must either approve of the rules invoked and the conclusions made or
indicate which one was in error and supply the correction. This approach
is extremely useful in domains where knowledge has not yet been formal-
ized and where the traditional reductionist approach of dissecting reason-
ing down to observational primitives is not yet well established.3
TEIRESIAS further encourages precise comments by keeping the de-
bugging process sharply focused. For instance, when it became clear that
there was a problem with the inability to deduce the category, the system
first asked which category it should have been. It then displayed only those
rules appropriate to that answer, rather than all the rules concerning that
topic that were tried.
Finally, consider the extensive amount of contextual information that
is now available. The expert has been presented with a detailed example

3The debugging process does allow the expert to indicate that the performance program's
results are incorrect but that he or she cannot find an error in the reasoning. This choice is
offered only as a last resort and is intended to deal with situations where there may be a bug
in the underlying control structure of the performance program (contrary to our assumption
in Section 9.2).

of the performance program in action, has available all of the facts of the
case, and has seen how the relevant knowledge has been applied. This
makes it much easier for him or her to specify the particular chunk of
knowledge that may be missing. This contextual information will prove
very useful for TEIRESIAS as well. It is clear, for instance, what the effect
of invoking the new rule must be (as TEIRESIAS indicates, it must be a
rule that will deduce that the category should be Enterobacteriaceae), and it
is also clear what the circumstances of its invocation must be (the rule must
be invocable for the case under consideration, or it won't repair the bug).
Both of these pieces of information are especially useful in Phase II and
Phase V.

9.6.3 Phase II: Interpreting the Rule

As is traditional, "understanding" the expert's natural language version of
the rule is viewed in terms of converting it to an internal representation
and then retranslating that into English for the expert's approval. In this
case the internal representation is the Interlisp form of the rule, so the
process is also a simple type of code generation.
There were a number of reasons for rejecting a standard natural lan-
guage understanding approach to this problem. First, as noted, under-
standing natural language is well known to be a difficult problem and was
not a central focus of this research. Second, our experience suggested that
experts frequently sacrifice precise grammar in favor of the compactness
available in the technical language of the domain. As a result, approaches
that were strongly grammar-based might not fare well. Finally, technical
language often contains a fairly high percentage of unambiguous words,
so a simpler approach that includes reliance on keyword analysis has a
good chance of performing adequately.
As will become clear, our approach to analyzing the expert's new rule
is based on both simple keyword spotting and predictions TEIRESIAS is
able to make about the likely content of the rule. Code generation is ac-
complished via a form of template completion that is similar in some re-
spects to template completion processes that have been used in generating
natural language. Details of all these processes are given below.

Models and Model-Based Understanding

To set the stage for reviewing the details of the interpretation process, we
digress for a moment to consider the idea of models and model-based
understanding, and then to explore their application in TEIRESIAS. In
the most general terms, a model can be seen as a compact, high-level description
of structure, organization, or content that may be used both to provide a frame-
work for lower-level processing and to express expectations about the world. One

early, particularly graphic example of this idea can be found in the work
on computer vision by Falk (1970). The task there was understanding
block-world scenes; the goal was to determine the identity, location, and
orientation of each block in a scene containing one or more blocks selected
from a known set of possibilities. The key element of this work of interest
to us here is the use of a set of prototypes for the blocks, prototypes that
resembled wire frame models. Although such a description oversimplifies,
part of the operation of Falk's system can be described in terms of two
phases. The system first performed a preliminary pass to detect possible
edge points in the scene and attempted to fit a block model to each col-
lection of edges. The model chosen was then used in the second phase as
a guide to further processing. If, for instance, the model accounted for all
but one of the lines in a region, this suggested that the extra line might be
spurious. If the model fit well except for some line missing from the scene,
that was a good hint that a line had been overlooked and indicated as well
where to go looking for it.
We can imagine one further refinement in the interpretation process,
though it was not a part of Falk's system, and explain it in these same
terms. Imagine that the system had available some a priori hints about what
blocks might be found in the next scene. One way to express those hints
would be to bias the matching process. That is, in the attempt to match a
model against the data, the system might (depending on the strength of
the hint) try the indicated models first, make a greater attempt to effect a
match with one of them, or even restrict the set of possibilities to just those
contained in the hint.
Note that in this system (i) the models supply a compact, high-level
description of structure (the structure of each block), (ii) the description is
used to guide lower-level processing (processing of the array of digitized
intensity values), (iii) expectations can be expressed by a biasing or restric-
tion on the set of models used, and (iv) "understanding" is viewed in terms
of a matching and selection process (matching models against the data and
selecting one that fits).

Rule Models

Now, recall our original task of interpreting the expert's natural language
version of the rule, and view it in the terms described above. As in the
computer vision example, there is a signal to be processed (the text), it is
noisy (words can be ambiguous), and there is context available (from the
debugging process) that can supply some hints about the likely content of
the signal. To complete the analogy, we need a model that can (a) capture
the structure, organization, or content of the expert's reasoning, (b) guide
the interpretation process, and (c) express expectations about the likely
content of the new rule.
Where might we get such a thing? There are interesting regularities

EXAMPLES--the subset of rules this model describes
DESCRIPTION--characterization of a typical member of this subset
    characterization of the premise
    characterization of the action
MORE-GENERAL--pointers to models describing more general subsets of rules
MORE-SPECIFIC--pointers to models describing more specific subsets of rules

FIGURE 9-4 Rule model structure.

in the knowledge base that might supply what we need. Not surprisingly,
rules about a single topic tend to have characteristics in common--there
are ways of reasoning about a given topic. From these regularities we have
constructed rule models. These are abstract descriptions of subsets of rules,
built from empirical generalizations about those rules and used to char-
acterize a typical member of the subset.
Rule models are composed of four parts as shown in Figure 9-4. They
contain, first, a list of EXAMPLES, the subset of rules from which this
model was constructed. Next, a DESCRIPTION characterizes a typical
member of the subset. Since we are dealing in this case with rules composed
of premise-action pairs, the DESCRIPTION currently implemented con-
tains individual characterizations of a typical premise and a typical action.
Then, since the current representation scheme used in those rules is based
on associative triples, we have chosen to implement those characterizations
by indicating (a) which attributes typically appear in the premise (or action)
of a rule in this subset and (b) correlations of attributes appearing in the
premise (or action).4 Note that the central idea is the concept of character-
izing a typical member of the subset. Naturally, that characterization will look
different for subsets of rules, procedures, theorems, or any other repre-
sentation. But the main idea of characterization is widely applicable and
not restricted to any particular representational formalism.
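Read as a data structure, the four parts described above might be sketched as follows. This is a hypothetical modern rendering in Python; the original models were Interlisp structures, and the field contents here are abridged from the CATEGORY-IS model of Figure 9-6:

```python
from dataclasses import dataclass, field

@dataclass
class RuleModel:
    """Abstract description of a subset of rules (cf. Figure 9-4)."""
    examples: list   # rules this model was built from, with their strengths
    premise: list    # characterization of a typical premise
    action: list     # characterization of a typical action
    more_genl: list = field(default_factory=list)  # pointers up the tree
    more_spec: list = field(default_factory=list)  # pointers down the tree

# Abridged rendering of the CATEGORY-IS model shown in Figure 9-6:
category_is = RuleModel(
    examples=[("RULE116", 0.33), ("RULE050", 0.78)],
    premise=[("GRAM", ["SAME", "NOTSAME"], 3.83)],   # attribute, predicates, strength
    action=[("CATEGORY", ["CONCLUDE"], 4.73)],
    more_genl=["CATEGORY-MOD"],
)
```

The two pointer fields default to empty, mirroring the MORE-SPEC value of NIL for models at the leaves of the tree.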
The two remaining parts of the rule model are pointers to models
describing more general and more specific subsets of rules. The set of
models is organized into a number of tree structures, each of the general
form shown in Figure 9-5. At the root of each tree is the model made from
all the rules that conclude about the attribute (i.e., the CATEGORY model);
below this are two models dealing with all affirmative and all negative rules
(e.g., the CATEGORY-IS model). Below these are models dealing with rules
that affirm or deny specific values of the attribute. These models are not
handcrafted by the expert. They are instead assembled by TEIRESIAS on
the basis of the current contents of the knowledge base, in what amounts
to a simple statistical form of concept formation. The combination of TEI-
RESIAS and the performance program thus presents a system that has a
model of its own knowledge, one it forms itself.

4Both (a) and (b) are constructed via simple thresholding operations.
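The footnote says only that the characterizations are built by "simple thresholding operations." One plausible form of that empirical generalization is sketched below; the threshold value, the counting scheme, and the example data are assumptions for illustration, not details taken from TEIRESIAS:

```python
# Sketch: characterize a subset of rules by keeping the attributes that
# appear in at least a threshold fraction of them, along with the
# predicate functions typically used with each attribute.
from collections import Counter

def characterize(rules, threshold=0.5):
    """rules: list of premises, each a list of (attribute, predicate) pairs."""
    n = len(rules)
    # count each attribute once per rule
    attr_counts = Counter(attr for premise in rules
                          for attr in {a for a, _ in premise})
    typical = {}
    for attr, count in attr_counts.items():
        if count / n >= threshold:
            preds = Counter(p for premise in rules
                            for a, p in premise if a == attr)
            typical[attr] = [p for p, _ in preds.most_common()]
    return typical

rules = [[("GRAM", "SAME"), ("MORPH", "SAME")],
         [("GRAM", "NOTSAME"), ("MORPH", "SAME")],
         [("AIR", "SAME")]]
# GRAM and MORPH appear in two of the three rules and survive the cut;
# AIR, appearing in only one, does not.
```

A second pass of the same kind over pairs of attributes would yield the correlation part of the description.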



                            <attribute>

             <attribute>-is              <attribute>-isnt

<attribute>-is-X   <attribute>-is-Y   <attribute>-isnt-X   <attribute>-isnt-Y

FIGURE 9-5 Organization of the rule models.

The rule models are the primary example of meta-level knowledge
used in knowledge acquisition (for discussion of other forms, see Chapter
28). This form of knowledge and its generation by the system itself have
several interesting implications illustrated in later sections.
Figure 9-6 shows a rule model; this is the one used by TEIRESIAS in
the interaction shown earlier. (Since not all of the details of implementation
are relevant here, this discussion will omit some.) As indicated above, there
is a list of the rules from which this model was constructed, descriptions
characterizing the premise and the action, and pointers to more specific
and more general models. Each characterization in the description is shown

CATEGORY-IS
EXAMPLES   ((RULE116 .33) (RULE050 .78) (RULE037 .80)
            (RULE095 .90) (RULE152 1.0) (RULE140 1.0))
PREMISE    ((GRAM SAME NOTSAME 3.83)
            (MORPH SAME NOTSAME 3.83)
            ((GRAM SAME) (MORPH SAME) 3.83)
            ((MORPH SAME) (GRAM SAME) 3.83)
            ((AIR SAME) (NOSOCOMIAL NOTSAME SAME) (MORPH SAME)
             (GRAM SAME) 1.50)
            ((NOSOCOMIAL NOTSAME SAME) (AIR SAME) (MORPH SAME)
             (GRAM SAME) 1.50)
            ((INFECTION SAME) (SITE MEMBF SAME) (PORTAL SAME) 1.23)
            ((SITE MEMBF SAME) (INFECTION SAME) (PORTAL SAME)
             1.23))
ACTION     ((CATEGORY CONCLUDE 4.73)
            (IDENT CONCLUDE 4.05)
            ((CATEGORY CONCLUDE) (IDENT CONCLUDE) 4.73))
MORE-GENL  (CATEGORY-MOD)
MORE-SPEC  NIL

FIGURE 9-6 Rule model for rules concluding affirmatively
about CATEGORY.

split into its two parts, one concerning the presence of individual attributes
and the other describing correlations. The first item in the premise de-
scription, for instance, indicates that most rules reaching conclusions about
the category mention the attribute GRAM (for gram stain) in their prem-
ises; when they do mention it, they typically use the predicate functions
SAME and NOTSAME; and the "strength," or reliability, of this piece of
advice is 3.83 [see Davis (1976) for precise definitions of the quoted terms].
Correlations are shown as several lists of attribute-predicate pairs. The
fourth item in the premise description, for example, indicates that when
the attribute gram stain (GRAM) appears in the premise of a rule in this
subset, the attribute morphology (MORPH) typically appears as well. As
before, the predicate functions are those frequently associated with the
attributes, and the number is an indication of reliability.

Choosing a Model

It was noted earlier that tracking down the bug in the knowledge base
provides useful context and, among other things, serves to set up TEI-
RESIAS's expectations about the sort of rule it is about to receive. As sug-
gested, these expectations are expressed by restricting the set of models
that will be considered for use in guiding the interpretation. At this point
TEIRESIAS chooses a model that expresses what it knows thus far about
the kind of rule to expect, and in the current example it expects a rule
that will deduce that the category should be Enterobacteriaceae.
Since there is not necessarily a rule model for every characterization,
the system chooses the closest one. This is done by starting at the top of
the tree of models and descending until either reaching a model of the
desired type or encountering a leaf of the tree. In this case the process
descends to the second level (the CATEGORY-IS model), notices that there
is no model for CATEGORY-IS-ENTEROBACTERIACEAE at the next
level, and settles for the former.5
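The descent just described can be sketched as a small search over the model tree. The nested-dictionary encoding and the function name below are illustrative, not taken from the original Interlisp code:

```python
# Sketch of the closest-model search: descend from the root toward the
# desired characterization, stopping when the path runs out of existing
# models or reaches the desired one.
def closest_model(tree, path):
    """tree: nested dict {model_name: subtree}; path: model names from
    general to specific. Returns the most specific model on the path
    that actually exists in the tree."""
    name, children = path[0], tree[path[0]]
    for step in path[1:]:
        if step not in children:     # no more specific model: settle
            break
        name, children = step, children[step]
    return name

models = {"CATEGORY": {"CATEGORY-IS": {}, "CATEGORY-ISNT": {}}}
wanted = ["CATEGORY", "CATEGORY-IS", "CATEGORY-IS-ENTEROBACTERIACEAE"]
# No model exists at the third level, so the search settles for the
# CATEGORY-IS model, as in the running example.
```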

Using the Rule Model: Guiding the Natural Language Interpretation

TEIRESIAS uses the rule models in two different ways in the acquisition
process. The first is as a guide in understanding the text typed by the
expert, as is described here. The second is as a means of allowing TEI-

5This technique is used in several places throughout the knowledge transfer process, and in
general supplies the model that best matches the current requirements, by accommodating
varying levels of specificity in the stated expectations. If, for instance, the system had known
only that it expected a rule that concluded about category, it would have selected the first
node in the model tree without further search. TEIRESIAS also has techniques for checking
that the appropriate model has been chosen and can advise the expert if a discrepancy
appears. See Davis (1976) for an example.

The patient's infection is primary bacteremia
      |          |       |        |
     OBJ    ATTRIBUTE  PREDICATE  VALUE
                       FUNCTION

(a) Connotations found in the new rule.

Function    Template
SAME        (OBJ ATTRIBUTE VALUE)

(b) Template for the predicate function SAME.

1) (SAME CNTXT TREAT-ALSO PRIMARY-BACTEREMIA)
   "Primary bacteremia is an associated infection for which
   therapy should cover."
2) (SAME CNTXT INFECTION PRIMARY-BACTEREMIA)
   "The infection is primary bacteremia."

(c) Two choices for the resulting code (with translations).

FIGURE 9-7 Use of rule models to guide the understanding
of a new rule.

RESIAS to see whether the new rule "fits into" its current model of the
knowledge base in Phase IV.
To see how the rule models are used to guide the interpretation of the
text of the new rule in the example, consider the first line of text typed by
the expert in the new rule, Rule 383 (THE PATIENT'S INFECTION IS
PRIMARY-BACTEREMIA). Each word is first reduced to a canonical form
by a process that can recognize plural endings and that has access to a
dictionary of synonyms (see Chapter 18). We then consider the possible
connotations that each word may have (Figure 9-7a). Here connotation
means the word might be referring to one or more of the conceptual
primitives from which rules are built (i.e., it might refer to a predicate
function, attribute, object, or value).6 One set of connotations is shown.
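The canonicalization and connotation steps might be sketched as follows. The tiny synonym and connotation dictionaries here are illustrative stand-ins for the real ones, and the plural-stripping rule is deliberately crude:

```python
# Sketch: reduce each word to a canonical form (recognizing plurals,
# consulting a synonym table), then collect the conceptual primitives
# each word may connote.
SYNONYMS = {"patients": "patient", "patient's": "patient"}
CONNOTATIONS = {
    "patient": [("OBJECT", "PATIENT")],
    "is": [("PREDICATE-FUNCTION", "SAME")],
    # "infection" is ambiguous: it appears in the phrases for two attributes
    "infection": [("ATTRIBUTE", "TREAT-ALSO"), ("ATTRIBUTE", "INFECTION")],
    "primary-bacteremia": [("VALUE", "PRIMARY-BACTEREMIA")],
}

def canonical(word):
    word = SYNONYMS.get(word.lower(), word.lower())
    if word not in CONNOTATIONS and word.endswith("s"):
        word = word[:-1]                 # crude plural stripping
    return word

def connote(text):
    out = []
    for word in text.split():
        out.extend(CONNOTATIONS.get(canonical(word), []))
    return out
```

Running `connote` on the first line of Rule 383 yields one OBJECT, two candidate ATTRIBUTEs (the source of the ambiguity discussed below), a PREDICATE-FUNCTION, and a VALUE, much as in Figure 9-7a.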
Code generation is accomplished via a fill-in-the-blank mechanism.
Associated with each predicate function is a template (see Chapter 5), a list
structure that resembles a simplified procedure declaration and gives the

6The connotations of a word are determined by a number of pointers associated with it,
which are in turn derived from the English phrases associated with each of the primitives.

order and generic type of each argument to a call of that function (Figure
9-7b). Associated with each of the primitives that make up a template (e.g.,
ATTRIBUTE, VALUE) is a procedure capable of scanning the list of con-
notations to find an item of the appropriate type to fill in that blank. The
whole process is begun by checking the list of connotations for the predi-
cate function implicated most strongly (in this case, SAME), retrieving the
template for that function, and allowing it to scan the connotations and
"fill itself in" using the procedures associated with the primitives. The set
of connotations in Figure 9-7a produces the LISP code in Figure 9-7c. The
ATTRIBUTE routine finds two choices for the attribute name, TREAT-
ALSO and INFECTION, based on associations of the word infection with
the phrases used to mention those attributes. The VALUE routine finds
an appropriate value (PRIMARY-BACTEREMIA), and the OBJect routine
finds the corresponding object type (PATIENT) (but, following the con-
vention noted earlier, returns the variable name CNTXT to be used in the
actual code).
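The fill-in-the-blank step might be sketched as below. The template table and the connotation encoding are illustrative, and for simplicity each primitive's "procedure" is reduced to a lookup; per the convention noted above, the OBJ slot always yields the variable CNTXT:

```python
# Sketch of template completion: retrieve the template for the most
# strongly implicated predicate function and let each slot fill itself
# in from the connotation list, producing every syntactically valid
# interpretation.
from itertools import product

TEMPLATES = {"SAME": ["OBJ", "ATTRIBUTE", "VALUE"]}

def fill(function, connotations):
    def candidates(slot):
        if slot == "OBJ":
            return ["CNTXT"]        # variable name used in the actual code
        found = [v for kind, v in connotations if kind == slot]
        return found or [None]
    slots = [candidates(s) for s in TEMPLATES[function]]
    return [(function, *combo) for combo in product(*slots)]

conns = [("ATTRIBUTE", "TREAT-ALSO"), ("ATTRIBUTE", "INFECTION"),
         ("VALUE", "PRIMARY-BACTEREMIA")]
calls = fill("SAME", conns)
# Two candidate interpretations result, as in Figure 9-7c.
```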
There are several points to note here. First, the first interpretation in
Figure 9-7c is incorrect (the system has been misled by the use of the word
infection in the English phrase associated with TREAT-ALSO); we'll see
in a moment how it is corrected. Second, several plausible (syntactically
valid) interpretations are usually available from each line of text, and TEI-
RESIAS generates all of them. Each is assigned a score (the text score)
indicating how likely it is, based on how strongly it was implicated by the
text. Finally, we have not yet used the rule models, and it is at this point
that they are employed.
We can view the DESCRIPTION part of the rule model selected ear-
lier as a set of predictions about the likely content of the new rule. In these
terms the next step is to see how well each interpretation fulfills those
predictions. Note, for example, that the last line of the premise description
in Figure 9-6 "predicts" that a rule about category of organism will contain
the attribute PORTAL, and the third clause of Rule 383 fulfills this pre-
diction. Each interpretation is scored (employing the "strength of advice"
number in the rule model) according to how many predictions it fulfills,
yielding the prediction satisfaction score. This score is then combined with the
text score to indicate the most likely interpretation. Because more weight
is given to the prediction satisfaction score, the system tends to "hear what
it expects to hear."
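The combination of the two scores might be sketched as a simple weighted sum. The particular weights below are assumptions; the text states only that the prediction satisfaction score is weighted more heavily than the text score:

```python
# Sketch of interpretation ranking: each candidate carries a text score
# (how strongly the words implicated it); its prediction-satisfaction
# score sums the "strength of advice" numbers of the rule-model
# predictions it fulfills.  Weighting the latter more heavily makes the
# system "hear what it expects to hear."
def rank(candidates, model_predictions, w_text=1.0, w_pred=2.0):
    def score(cand):
        pred_score = sum(strength for attr, strength in model_predictions
                         if attr in cand["attributes"])
        return w_text * cand["text_score"] + w_pred * pred_score
    return max(candidates, key=score)

candidates = [
    {"attributes": {"TREAT-ALSO"}, "text_score": 1.2},   # stronger text score
    {"attributes": {"INFECTION"}, "text_score": 1.0},
]
predictions = [("INFECTION", 1.23)]   # e.g., from the CATEGORY-IS model
best = rank(candidates, predictions)
# The prediction weight lets the second candidate win despite its
# weaker text score.
```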

Rule Interpretation: Sources of Performance

While our approach to natural language is very simple, the overall perfor-
mance of the interpretation process is adequate. The problem is made
easier, of course, by the fact that we are dealing with a small amount of
text in a restricted context and written in a semiformal technical language,
rather than with large amounts of text in unrestricted dialogue written in
unconstrained English. Even so, the problem of interpretation is substan-

tial. TEIRESIAS's performance is based on the application of the ideas
noted above (Section 9.6.1), notably the ideas of building expectations and
model-based understanding. Its performance is also based on the use of
two additional techniques: the intersection of data-driven and model-dri-
ven processing, and the use of multiple sources of knowledge.
First, the interpretation process proceeds in what has been called the
recognition mode: it is the intersection of a bottom-up (data-directed) process
(the interpretations suggested by the connotations of the text) with a top-
down (goal-directed) process (the expectations set up by the choice of
rule model). Each process contributes to the end result, but it is the com-
bination of them that is effective. This intersection of two processing modes
is important when the interpretation techniques are as simple as those
employed here, but the idea is more generally applicable as well. Even with
more powerful interpretation techniques, neither data-directed nor goal-
directed processing is in general capable of eliminating all ambiguity and
finding the correct answer. By moving from both directions, top-down and
bottom-up, we make use of all available sources of information, resulting
in a far more focused search for the answer. This technique is applicable
across a range of different interpretation problems, including those of text,
vision, and speech.
Second, in either direction of processing, TEIRESIAS uses a number
of different sources of knowledge. In the bottom-up direction, for exam-
ple, distinct information about the appropriate interpretation of the text
comes from (a) the connotations of individual words (interpretation of each
piece of data), (b) the function template (structure for the whole interpre-
tation), and (c) internal consistency constraints (interactions between
points), as well as several other sources [see Davis (1976) for the full list].
Any one of these knowledge sources alone will not perform very well, but
acting in concert they are much more effective [a principle developed ex-
tensively in the HEARSAY system (Reddy et al., 1973)].
The notion of program-generated expectations is also an important
source of power, since the selection of a particular rule model supplies the
focus for the top-down part of the processing. Finally, the idea of model-
based understanding offers an effective way of using the information in
the rule model to effect the top-down processing.
Thus our relatively simple techniques supply adequate power because
of the synergistic effect of multiple, independent sources of knowledge,
because of the focusing and guiding effect of intersecting data-directed
and goal-directed processing, and because of the effective mechanism for
interpretation supplied by the idea of model-based understanding.

9.6.4 Phase III: Modifying the Interpretation

TEIRESIAS has a simple rule editor that allows the expert to modify ex-
isting rules or (as in our example) to indicate changes to the system's at-

tempts to understand a new rule.7 The editor has a number of simple
heuristics built into it to make the rule modification process as effective as
possible. In dealing with requests to change a particular clause of a new
rule, for instance, the system reevaluates the alternative interpretations,
taking into account the rejected interpretation (trying to learn from its
mistakes) and making the smallest change possible (using the heuristic that
the original clause was probably close to correct). In our example, this
succeeds in choosing the correct clause next (the second choice shown in
Figure 9-7c).
There are also various forms of consistency checking available. One
obvious but effective constraint is to ensure that each word of the text is
interpreted in only one way. In the trace shown earlier, for instance, ac-
cepting the new interpretation of clause 1 means clause 2 must be spurious,
since it attempts to use the word infection in a different sense.
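The one-sense-per-word constraint might be sketched as follows; the clause encoding and function name are illustrative:

```python
# Sketch of the consistency check used by the editor: once the expert
# accepts an interpretation for one clause, any other clause that uses
# the same word in a different sense is marked spurious.
def spurious_clauses(clauses, accepted):
    """clauses: {clause_id: {word: sense}}; accepted: the id of the
    clause whose interpretation the expert approved."""
    fixed = clauses[accepted]
    return [cid for cid, senses in clauses.items()
            if cid != accepted and any(
                w in fixed and fixed[w] != s for w, s in senses.items())]

clauses = {1: {"infection": "INFECTION"},    # accepted reinterpretation
           2: {"infection": "TREAT-ALSO"}}   # uses the word in another sense
# Accepting clause 1's reading makes clause 2 spurious, as in the trace.
```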

9.6.5 Phase IV: Second-Guessing, Another Use of the Rule Models

After the expert indicates that TEIRESIAS has correctly understood what
he or she has written, the system checks to see if it is satisfied with the
content of the rule. The idea is to use the rule model to see how well this
new rule "fits into" the system's model of its knowledge; i.e., does it "look
like" a typical rule of the sort expected?
In the current implementation, an incomplete match between the new
rule and the rule model triggers a response from TEIRESIAS. Recall the
last line of the premise description in the rule model of Figure 9-6:
((SITE MEMBF SAME) (INFECTION SAME) (PORTAL SAME))

This indicates that when the culture SITE for the patient appears in the
premise of a rule of this sort, then INFECTION type and organism POR-
TAL of entry typically appear as well. Note that the new rule in the ex-
ample has the first two of these, but is missing the last, and the system
points this out.
If the expert agrees to the inclusion of a new clause, TEIRESIAS
attempts to create it. Since in this case the agreed-on topic for the clause
was the portal of entry of the organism, this must be the attribute to use.
The rule model suggests which predicate function to use (SAME, since
that is the one paired with PORTAL in the relevant line of the rule model),
and the template for this function is retrieved. It is filled out in the usual
way, except that TEIRESIAS checks the record of the consultation when
seeking items to fill in the template blanks. In this case only a value is still
missing. Note that since the expert indicated that the portal of entry was

7Much of the editor has subsequently been incorporated into EMYCIN--see Chapter 15.

GI, TEIRESIAS uses this as the value for PORTAL. The result is a plau-
sible guess, since it ensures that the rule will in fact work for the current
case (note this further use of the debugging in context idea). It is not
necessarily correct, of course, since the desired clause may be more general,
but it is at least a plausible attempt.
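The trigger for this second-guessing, an incomplete match against a correlation line, might be sketched as follows; the encoding of correlations as attribute lists is an illustrative simplification:

```python
# Sketch of the second-guessing check: a correlation line from the rule
# model whose attributes are mostly, but not all, present in the new
# rule triggers a suggestion to add a clause about the missing one.
def missing_clauses(rule_attrs, correlations):
    """correlations: lists of attributes that typically co-occur."""
    suggestions = []
    for corr in correlations:
        present = [a for a in corr if a in rule_attrs]
        absent = [a for a in corr if a not in rule_attrs]
        if present and len(absent) == 1:   # incomplete match: one gap
            suggestions.extend(absent)
    return suggestions

model_corrs = [["SITE", "INFECTION", "PORTAL"]]
new_rule_attrs = {"SITE", "INFECTION"}   # the new rule's premise attributes
# PORTAL typically accompanies SITE and INFECTION but is missing,
# so it is the system's suggestion, as in the running example.
```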
It should be noted that there is nothing in this concept of second-
guessing that is specific to the rule models as they are currently designed,
or indeed to associative triples or rules as a knowledge representation. The
fundamental point (as mentioned above) is testing to see how the new
knowledge "fits into" the system's current model of its knowledge. At this
point the system might perform any kind of check, for violations of any
established prejudices about what the new chunk of knowledge should look
like. Additional kinds of checks of rules might concern the strength of the
inference, number of clauses in the premise, etc. In general, this second-
guessing process can involve any characteristic that the system may have
"noticed" about the particular knowledge representation in use.
Note also that this use of the rule model for second-guessing is quite
different from the first use mentioned--guiding the understanding of En-
glish. Earlier we were concerned about interpreting text and determining
what the expert actually said; here the task is to see what the expert plau-
sibly should have said. Since, in assembling the rule models, TEIRESIAS
may have noticed regularities in the reasoning about the domain that may
not yet have occurred to the expert, the system's suggestions may conceiv-
ably be substantive and useful.
Finally, all this is in turn an instance of the more general notion of
using meta-level knowledge in the process of knowledge acquisition: TEI-
RESIAS does not simply accept the new rule and add it to the knowledge
base; it instead uses the rule model to evaluate the new knowledge in light
of its current knowledge base. In a very simple way, learning is effected as
a process of examining the relationships between what is already known
and the new information being taught.

9.6.6 Phase V: Final Checkout, Bookkeeping, Rerunning the Consultation

When both the expert and TEIRESIAS are satisfied, there is one final
sequence of tests to be performed, reflecting once again the benefit of
knowledge acquisition in context. At this point TEIRESIAS examines sev-
eral things about the rule, attempting to make sure that it will in fact fix
the problem uncovered. In this case, for instance, the action of the new
rule should be a conclusion about category, the category mentioned should
be Enterobacteriaceae, and the conclusion should be affirmative. The premise
should not contain any clauses that are sure to fail in the context in which
the rule will be invoked. All these are potential sources of error that would
make it obvious that the rule will not fix the bug.
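The final checkout might be sketched as below. The expected action mirrors the running example (an affirmative conclusion about ENTEROBACTERIACEAE); the rule encoding and function name are illustrative:

```python
# Sketch of the Phase V tests: the new rule's action must draw the
# expected affirmative conclusion, and no premise clause may be sure
# to fail in the case that prompted its creation.
def will_fix(rule, expected_attr, expected_value, case_facts):
    action_attr, action_value, affirmative = rule["action"]
    if (action_attr, action_value) != (expected_attr, expected_value):
        return False
    if not affirmative:
        return False
    # every premise clause must be satisfiable in the triggering case
    return all(case_facts.get(attr) == val for attr, val in rule["premise"])

rule383 = {"premise": [("INFECTION", "PRIMARY-BACTEREMIA"),
                       ("PORTAL", "GI")],
           "action": ("CATEGORY", "ENTEROBACTERIACEAE", True)}
facts = {"INFECTION": "PRIMARY-BACTEREMIA", "PORTAL": "GI"}
```

Only after checks of this kind succeed does the system proceed to the bookkeeping tasks and rerun the consultation.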

There are also a number of straightforward bookkeeping tasks to be
performed, including hooking the new rule into the knowledge base so
that it is retrieved and invoked appropriately (in this case it gets added to
the list of rules that conclude about category),8 and tagging it with infor-
mation that will make it easier to maintain the large and constantly chang-
ing body of rules (e.g., the name of the rule author, date of creation, the
author's justification for adding the rule, a pointer to the consultation that
prompted its creation).
At this point, the system also performs any necessary recomputation
of rule models. The operation is very fast, since it is clear from the action
part of the rule which models may need to be recomputed, and the EX-
AMPLES part of the model then supplies the names of the other relevant
rules. TEIRESIAS then reruns the performance program as a subprocess,
and checks the results to see if all of the problems have been repaired.

9.7 Other Uses for the Rule Models

Two other uses have been developed for the rule models, which demon-
strate capabilities made possible by meta-level knowledge.

9.7.1 "Knowing What You Know"

As described in Part Six, MYCIN has the ability to answer simple natural
language questions about the knowledge base. In response to a question
such as "How do you determine the identity of an organism causing an
infection?" MYCIN would originally have printed the relevant rules. But
a rule model, as a generalization of an entire class of rules, answers the
question, too. Figure 9-8 shows one example of MYCIN's capabilities after
rule models had been added. By simply "reading" the rule model to the
user, TEIRESIAS can supply an overview of the knowledge in the relevant
rules. This suggests the structure of global trends in the knowledge of the
expert who assembled the knowledge base, and thus helps to make clear
the overall approach of the system to a given topic.

8Note that these tests require the ability to dissect and partially evaluate the rule. The same
function template that is used as a pattern for constructing rules is also used as a guide in
this dissection and partial evaluation process.

** HOW DO YOU DECIDE THAT AN ORGANISM IS PSEUDOMONAS AERUGINOSA?
Rules which conclude that the identity of the organism is
pseudomonas-aeruginosa generally use one or more of the following pieces
of information:
  the site of the culture
  the gram stain of the organism
  the morphology of the organism
Furthermore, the following relationships hold:
  The gram stain of the organism and the morphology of the
  organism tend to appear together in these rules.
RULE184, RULE116, RULE047, RULE085, and RULE040 conclude that the
identity of the organism is pseudomonas-aeruginosa.
Which of these do you wish to see?

FIGURE 9-8 "Knowing what you know."

9.7.2 "Knowing What You Don't Know"

Another use of the models demonstrates that, in a primitive fashion, they
give TEIRESIAS a model of what it doesn't know. There are models in the
current system made from between 2 (the defined minimum) and 35 rules.
We have defined a metric to measure the strength of a model, based on
both the total number of rules from which the model was constructed and
the strength of the inference of each of those rules. The entire model set
is kept ordered from weakest to strongest, giving the system some indi-
cation of its likely competence on a range of subjects. In a very primitive
way, it thus gains knowledge of where it is ignorant.
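The ordering just described can be sketched in a few lines. The scoring formula below (the average rule CF scaled by a saturating rule count) is a hypothetical stand-in, since the chapter does not give TEIRESIAS's actual metric, but it captures the two ingredients named above: how many rules a model summarizes and how strong their inferences are.

```python
# Hypothetical sketch of the model-strength ordering described above.
# The "saturation" cutoff and the scoring formula are illustrative
# assumptions, not TEIRESIAS's actual metric.

def model_strength(rule_cfs, saturation=10):
    """Combine rule count and average inference strength into one score."""
    if not rule_cfs:
        return 0.0
    coverage = min(len(rule_cfs), saturation) / saturation
    avg_cf = sum(abs(cf) for cf in rule_cfs) / len(rule_cfs)
    return coverage * avg_cf

def weakest_first(models):
    """Order the model set from weakest to strongest, as TEIRESIAS does."""
    return sorted(models, key=lambda m: model_strength(m["cfs"]))

models = [
    {"topic": "enterococcus",           "cfs": [0.3, 0.4]},
    {"topic": "pseudomonas-aeruginosa", "cfs": [0.7, 0.6, 0.8, 0.5, 0.9]},
]
```

Cycling through `weakest_first(models)` would then present the weakest topics first, as in Figure 9-9.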
This makes possible a new option for the expert. Instead of adding a
rule to fix a bug, the expert can ask the system what it would like to learn.
(See Figure 9-9.) The system cycles through the rule models in order,
indicating the weakest topics first. This is, of course, a first-order solution
to the problem of giving the system an indication of its areas of ignorance.
A better solution would supply an indication of how much the system
knows about a subject, compared with how much there is to know. There
surely are subjects for which three or four rules exhaust the available
knowledge, while for others a hundred or more rules may not suffice. The
issue is related to work described by Carbonell and Collins (1973) on closed
versus open sets. That paper offers some interesting strategies for allowing
a program to decide when it is ignorant and how it might reason in the
face of the inability to store every fact about a given topic.
There appear to be no easy ways to deduce the incompleteness of the
knowledge base using only the information stored in it. It is not valid to
say, for instance, that there ought to be even a single rule for every attribute
(how could a patient's name be deduced?). Nor is there a well-defined set
of attributes for which no rules are likely to exist. Nor is it clear what sort

** WHAT WOULD YOU LIKE TO KNOW?
Answer
 -Y for yes,
 -N for no, or
 -Q to quit this.
There are only two rules for concluding that the identity of an
organism might be enterococcus. Would you like to give me a
rule for that?
++** N
There are only 2 rules for concluding that an infection might
be infective-endocarditis. Would you like to offer a rule for
that?
++** Y
The new rule will be called RULE384
if (1)

FIGURE 9-9 "Knowing what you don't know."

of information would allow the incompleteness to be deduced. The issue
is a significant one, since a good solution to the problem would not only
give TEIRESIAS a better grasp of where the performance program was
weak but also provide several important capabilities to the performance
programitself. It would, for example, permit the use of the "if it were true
I would know" heuristic of Carbonell and Collins (1973). Roughly restated,
this says "If I know a great deal about subject S, and fact F concerns an
important aspect of S, then if I don't already know that F is true, it's
probably false." Thus in certain circumstances a lack of knowledge about
the truth of a statement can plausibly be used as evidence suggesting that
the statement is false. This is another useful form of meta-level knowledge.
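The restated heuristic can be sketched as a simple predicate. Everything here — the coverage and importance scores, their thresholds, and the fact names — is hypothetical, since the text gives only the informal statement of the rule.

```python
# Hypothetical sketch of the Carbonell-Collins "if it were true I would
# know" heuristic as restated above: lack of knowledge about an important
# fact in a well-covered subject is weak evidence that the fact is false.
# All names and thresholds are illustrative, not from TEIRESIAS.

def plausibly_false(fact, subject, known_facts, coverage, importance,
                    coverage_threshold=0.8, importance_threshold=0.7):
    """True if the heuristic suggests the unknown fact is probably false."""
    well_known = coverage.get(subject, 0.0) >= coverage_threshold
    important = importance.get(fact, 0.0) >= importance_threshold
    return well_known and important and fact not in known_facts

coverage = {"infectious-disease": 0.9}      # how much we know about S
importance = {"organism-is-airborne": 0.8}  # how central F is to S
```

With these values, `plausibly_false("organism-is-airborne", "infectious-disease", set(), coverage, importance)` holds, but it stops holding as soon as the fact enters the set of known facts.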

9.8 Assumptions and Limitations

The work reported here can be evaluated with respect to both the utility
of its approach to knowledge acquisition and its success in implementing
that approach.

9.8.1 The Approach

As noted, our approach involves knowledge transfer that is interactive, that
is set in the context of a shortcoming in the knowledge base, and that
transfers a single rule at a time. Each of these has implications about
TEIRESIAS's range of applicability.
Interactive knowledge transfer seems best suited to task domains in-
volving problem solving that is entirely or primarily a high-level cognitive
task, with a number of distinct, specifiable principles. Consultations in
medicine or financial investments seem to be appropriate domains, but the
approach would not seem well suited to those parts of, say, speech under-
standing or scene recognition in which low-level signal processing plays a
significant role.
The transfer of expertise approach presents a useful technique for
task domains that do not permit the use of programs (like those noted in
Section 9.2) that autonomously induce new knowledge from test data. The
autonomous mode may most commonly be inapplicable because the data
for a domain simply don't exist yet. In quantitative domains [such as mass
spectrum analysis (Buchanan and Feigenbaum, 1978)] or synthesized
("toy") domains [such as the line drawings in Hayes-Roth and McDermott
(1977)], a large body of data points is easily assembled. This is not currently
true for many domains; consequently induction techniques cannot be used.9
In such cases interactive transfer of expertise offers a useful alternative.
Knowledge acquisition in context appears to offer useful guidance
wherever knowledge of the domain is as yet ill-specified. The context of
the interaction need not be a shortcoming in the knowledge base uncovered
during a consultation, however, as it was here. Our recent experience sug-
gests that an effective context is also provided by examining certain subsets
of rules in the knowledge base and using them as a framework for speci-
fying additional rules. The overall concept is limited, however, to systems
that already have at least some minimal amount of information in their
knowledge bases. Prior to this, there may be insufficient information to
provide any context for the acquisition process.
Finally, the rule-at-a-time approach is a limiting factor. The example
given earlier works well, of course, because the bug was manufactured by
removing a single rule. In general, acquiring a single rule at a time seems
well suited to the later stages of knowledge base construction, in which
bugs may indeed be caused by the absence of one or a few rules. We need
not be as lucky as in the example, in which one rule repaired three bugs;
the approach will also work if three independent bugs arise in a consul-
tation. But early in knowledge base construction, when large subareas of
a domain are not yet specified, it appears more useful to deal with groups
of rules or, more generally, with larger segments of the basic task [as in
Waterman (1978)].
In general then, the interactive transfer of expertise approach seems
well suited to the later stages of knowledge base construction for systems
performing high-level tasks, and offers a useful technique for domains
where extensive sets of data points are not available.

9Where the autonomous induction technique can be used, it offers the interesting advantage
that the knowledge we expect the system to acquire need not be specified ahead of time,
indeed not even known. Induction programs are in theory capable of inducing new infor-
mation (i.e., information unknown to their author) from their set of examples. Clearly, the
interactive transfer of expertise approach requires that the expert know and be able to specify
precisely what it is the program is to learn.

9.8.2 The Program

Several difficult problems remained unsolved in the final implementation
of the program. There is, for instance, the weakness of the technique of
natural language understanding. There is also an issue with the technique
used to generate the rule models. Model generation could be made more
effective even without using a different approach to concept formation.
Although an early design criterion suggested keeping the models trans-
parent to the expert, making the process interactive would allow the expert
to evaluate new patterns as they were discovered by TEIRESIAS. This
might make it possible to distinguish accidental correlations from valid
interrelations and might increase the utility and sophistication of
TEIRESIAS's second-guessing ability. Alternatively, more sophisticated concept
formation techniques might be borrowed from existing work.
There is also a potential problem in the way the models are used. Their
effectiveness both in guiding the parsing of the new rule and in second-
guessing its content is dependent on the assumption that the present
knowledge base is both correct and a good basis for predicting the content
of future rules. Either of these can at times be false, and the system may
then tend to continue stubbornly down the wrong path.
There is also the difficult problem of determining the impact of any
new or changed rule on the rest of the knowledge base, as discussed in
Chapter 8, which we have considered only briefly. One difficulty (avoided
in the work described in Chapter 8) involves establishing a formal defini-
tion of inconsistency for inexact logics, such as CFs (see Chapter 11), since,
except for obvious cases (e.g., two identical rules with different strengths),
it is not clear what constitutes an inconsistency. Once the definition is es-
tablished, we would also require routines capable of uncovering them in a
large knowledge base. This can be attacked by using an incremental ap-
proach (i.e., by checking every rule as it is added, the knowledge base is
kept consistent and each consistency check is a smaller task), but the prob-
lem is substantial.

9.9 Conclusions
Each of the ideas reviewed above offers some contribution toward achiev-
ing the two goals set out at the beginning of this chapter: the development
of a methodology of knowledge base construction via transfer of expertise,
and the creation of an intelligent assistant to aid in knowledge acquisition.
These ideas provide a set of tools and ideas to aid in the construction of
knowledge-based programs and represent some new empirical techniques
of knowledge engineering. Their contribution here may arise from their

potential utility as case studies in the development of a methodology for
this discipline.
Knowledge acquisition in the context of a shortcoming in the knowledge base,
for instance, has proved to be a useful technique for achieving transfer of
expertise, offering advantages to both the expert and TEIRESIAS. It of-
fers the expert a framework for the explication of a new chunk of domain
knowledge. By providing a specific example of the performance program's
operation and forcing the expert to be specific in his or her criticism, it
encourages the formalization of previously implicit knowledge. It also en-
ables TEIRESIAS to form a number of expectations about the knowledge
it is going to acquire and makes possible several checks on the content of
that knowledge to ensure that it would in fact fix the bug. In addition,
because the system has a model of its own knowledge, it is able to determine
whether a newly added piece of knowledge "fits into" its existing knowledge
base.
A second contribution of the ideas reviewed above lies in their ability
to support a number of intelligent actions on the part of the assistant.
While those actions have been demonstrated for a single task and system,
it should be clear that none of the underlying ideas are limited to this
particular task or to associative triples or rules as a knowledge represen-
tation. The foundation for many of these ideas is the concept of meta-level
knowledge, which has made possible a program with a limited form of
introspection.
The idea of model-based understanding, for instance, found a novel ap-
plication in the fact that TEIRESIAS has a model of the knowledge base
and uses this to guide acquisition by interpreting the model as predictions
about the information it expects to receive.
The idea of biasing the set of models to be considered offers a specific
mechanism for the general notion of program-generated expectations and
makes possible an assistant whose understanding of the dialogue is more
effective.
TEIRESIAS is able to second-guess the expert with respect to the
content of the new knowledge by using its models to see how well the new
piece of knowledge "fits into" what it already knows. An incomplete match be-
tween the new knowledge and the system's model of its knowledge prompts
it to make a suggestion to the expert. With this approach, learning becomes
more than simply adding the new information to the knowledge base;
TEIRESIAS examines as well the relationship between the new and exist-
ing knowledge.
The concept of meta-level knowledge makes possible multiple uses of
the knowledge in the system: information in the knowledge base is not only
used directly (during the consultation) but also examined and abstracted
to form the rule models.
TEIRESIAS also represents a synthesis of the ideas of model-based
understanding and learning by experience. Although both of these have
been developed independently in previous AI research, their combination

produces a novel sort of feedback loop (Figure 9-10). Rule acquisition relies
on the set of rule models to effect the model-based understanding process.
This results in the addition of a new rule to the knowledge base, which in
turn prompts the recomputation of the relevant rule model(s).1
This loop has a number of interesting implications. First, performance
on the acquisition of the next rule may be better because the system's
"picture" of its knowledge base has improved--the rule models are now
computed from a larger set of instances, and their generalizations are more
likely to be valid. Second, since the relevant rule models are recomputed
each time a change is made to the knowledge base, the picture they supply
is kept constantly up to date, and they will at all times be an accurate
reflection of the shifting patterns in the knowledge base. This is true as
well for the trees into which the rule models are organized: they too grow
(and shrink) to reflect the changes in the knowledge base.
Finally, and perhaps most interesting, the models are not handcrafted
by the system architect or specified by the expert. They are instead formed
by the system itself, and formed as a result of its experience in acquiring
rules from the expert. Thus, despite its reliance on a set of models as a
basis for understanding, TEIRESIAS's abilities are not restricted by the
existing set of models. As its store of knowledge grows, old models can
become more accurate, new models will be formed, and the system's stock
of knowledge about its knowledge will continue to expand. This appears
to be a novel capability for a model-based system.

1The models are recomputed when any change is made to the knowledge base, including
rule deletion or modification, as well as addition.
[FIGURE 9-10: diagram of the rule acquisition feedback loop]
PART FOUR

Reasoning Under
Uncertainty
10
Uncertainty and Evidential
Support

As we began developing the first few rules for MYCIN, it became clear
that the rules we were obtaining from our collaborating experts differed
from DENDRALs situation-action rules in an important way--the infer-
ences described were often uncertain. Cohen and Axline used words such
as "suggests" or "lends credence to" in describing the effect of a set of
observations on the corresponding conclusion. It seemed clear that we
needed to handle probabilistic statements in our rules and to develop a
mechanism for gathering evidence for and against a hypothesis when two
or more relevant rules were successfully executed.
It is interesting to speculate on why this problem did not arise in the
DENDRAL domain. In retrospect, we suspect it is related to the inherent
complexity of biological as opposed to artificial systems. In the case of
DENDRAL we viewed our task as hypothesis generation guided by rule-
based constraints. The rules were uniformly categorical (nonprobabilistic)
and were nested in such a way as to assure that contradictory evidence was
never an issue.1 In MYCIN, however, an overall strategy for nesting cate-
gorical rules never emerged; the problem was simply too ill-structured. It
was possible to tease out individual inference rules from the experts work-
ing with us, but the program was expected to select relevant rules during
a consultation and to accumulate probabilistic evidence regarding the com-
peting hypotheses.
In response to these observations we changed the evolving system in
two ways. First, we modified the rule structure to permit a conclusion to
be drawn with varying degrees of certainty or belief. Our initial intent was
to represent uncertainty with probabilistic weights on a 0-to-1 scale. Sec-
ond, we modified the data structures for storing information. Rather than
simply recording attribute-object-value triples, we added a fourth element
to represent the extent to which a specific value was believed to be true.
This meant that the attribute of an object could be associated with multiple
competing values, each associated with its own certainty weight.
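A minimal sketch of this second change, with illustrative names: the dynamic database maps an (object, attribute) pair to several competing values, each carrying its own certainty weight. (Updating a weight in the real system goes through the evidence-combining machinery discussed below; here a conclusion simply records a weight.)

```python
# Sketch of the quadruple storage described above: attribute-object-value
# triples extended with a fourth element, the certainty weight, so one
# attribute of an object can hold several competing values at once.
# Class and method names are illustrative.

from collections import defaultdict

class DynamicDB:
    def __init__(self):
        # (object, attribute) -> {value: certainty weight}
        self.facts = defaultdict(dict)

    def conclude(self, obj, attr, value, cf):
        self.facts[(obj, attr)][value] = cf

    def values(self, obj, attr):
        """Return the competing values, strongest belief first."""
        vals = self.facts[(obj, attr)]
        return sorted(vals.items(), key=lambda kv: -kv[1])

db = DynamicDB()
db.conclude("ORGANISM-1", "identity", "e.coli", 0.6)
db.conclude("ORGANISM-1", "identity", "pseudomonas", 0.3)
```

Both hypotheses for the organism's identity now coexist, each with its own weight, rather than one value overwriting the other.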

1In the model of mass spectrometry used by DENDRAL, the statistical nature of events is
largely ignored in favor of binary decisions about occurrence or nonoccurrence of events.


It was logical to turn to probability theory in our initial efforts to define
the meaning of these certainty values. Bayes' Rule (or Bayes' Theorem),
the traditional evidence-combining technique used in most medical diag-
nosis programs, provided a model for how the weights could be manipu-
lated if they were interpreted as probabilities. For reasons that are dis-
cussed in detail in the next chapter, we were gradually led to consider
other interpretations of the numerical weights and to reject a purely prob-
abilistic interpretation of their meaning.
Shortliffe was encouraged by Buchanan, as well as by Professors Pa-
trick Suppes and Byron Brown, who were on his thesis committee, to at-
tempt to formalize the numerical weights rather than to define and com-
bine them in a purely ad hoc fashion. There ensued many months of
reading the literature of statistics and the philosophy of science, focusing
on the theory of confirmation and attempting to understand the psycho-
logical issues underlying the assignment of certainty weights. Chapter 11,
originally published in 1975, summarizes the formal model that ultimately
emerged from these studies. The concept of certainty factors (CFs) was
implemented and tested in MYCIN and became a central element of other
EMYCIN systems that have been developed in the ensuing years.
Another source of uncertainty in a knowledge base is the imprecision
in language. Even though the vocabulary of medicine is technical, it is not
without ambiguity. For example, one question asks whether the dosage of
a drug given previously was "adequate." Rules use the answers given in
response to such questions with the assumption that the user and the ex-
pert who wrote the rules agree on the meanings of such terms. What do
we do to help satisfy this assumption? Rule writers are encouraged to
anticipate the ambiguities when formulating their questions. They write
the English forms of the TRANS and PROMPT values. Also, they can
supply further clarification in the REPROMPT value, which is printed
when the user types a question mark. MYCIN (and EMYCIN) provides
facilities for experts to clarify their use of terms, but cannot guarantee the
elimination of ambiguity.2

10.1 Analyses of the CF Model

Although the motives behind the CF model were largely pragmatic and
we justified the underlying assumptions by emphasizing the system's ex-
cellent performance (see, for example, Chapter 31), several theoretical ob-

2Fuzzy logic (Zadeh, 1978) quantifies the degree to which imprecise concepts are satisfied,
thus adding another level of detail to the reasoning. For our purposes, it is sufficient to ask
the user whether a concept, such as "adequateness," is satisfied--where an appropriate re-
sponse may be "Yes (0.7)." In fuzzy logic, a possibility distribution for the user's understanding
of the concept "adequate" would be matched against a corresponding distribution for the
rule writer's understanding. We believe this is an unnecessary layer of detail for the precision
we want to achieve (or feel is justified by the precision of the information).

jections to the model were subsequently raised. Professor Suppes had been
particularly influential in urging Shortliffe to relate CFs to the rules of
conventional probability theory,3 and the resulting definitions of MBs and
MDs did help us develop an intuitive sense of what our certainty measures
might mean. However, the probabilistic definitions also permitted formal
analyses of the underlying assumptions in the combining functions and of
limitations in the applicability of the definitions themselves.
For example, as we note in Chapter 11, the source of confusion be-
tween CF(h,e) and P(h|e) becomes clear when one sees that, for small values
of the prior probabilities P(h), CF(h,e) ≈ P(h|e). Our effort to ignore prior
probabilities was largely defended by observing that, in the absence of all
information, priors for a large number of competing hypotheses are uni-
formly small. For parameters such as organism identity, which is the major
diagnostic decision that MYCIN must address, the assumption of small
priors is reasonable. The same model is used, however, to deal with all
uncertain parameters in the system, including yes-no parameters for which
the prior probability of one of the values is necessarily greater than or
equal to 0.5.
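Both points can be checked numerically using the probabilistic definitions of MB, MD, and CF = MB − MD from Chapter 11:

```python
def mb(p_h_given_e, p_h):
    """Measure of increased belief in h given e (Chapter 11 definition)."""
    if p_h == 1.0:
        return 1.0
    return max(p_h_given_e - p_h, 0.0) / (1.0 - p_h)

def md(p_h_given_e, p_h):
    """Measure of increased disbelief in h given e."""
    if p_h == 0.0:
        return 1.0
    return max(p_h - p_h_given_e, 0.0) / p_h

def cf(p_h_given_e, p_h):
    """Original certainty factor: CF = MB - MD."""
    return mb(p_h_given_e, p_h) - md(p_h_given_e, p_h)

# With a small prior, CF(h,e) is close to P(h|e)...
print(cf(0.70, 0.01))   # about 0.697
# ...but for a yes-no parameter with prior 0.5 it is not:
print(cf(0.70, 0.50))   # 0.4, well below P(h|e) = 0.7
```

This is exactly the mismatch noted above: the approximation CF(h,e) ≈ P(h|e) holds for organism identities with many small-prior candidates, but not for yes-no parameters.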
The significance of the 0.2 threshold used by many of MYCIN's pred-
icates (see Chapter 5) was also a source of puzzlement to many observers
of the CF model. This discontinuity in the evaluation function is not an
intrinsic part of the CF theory (and is ignored in Chapter 11) but was
added as a heuristic for pruning the reasoning network.4 If any small
positive CF were accepted in evaluating the premise of a rule, without a
threshold, two undesirable results would occur:

1. Very weak evidence favoring a condition early in the rule premise would
be "accepted" and would lead to consideration of subsequent conditions,
possibly with resulting backward-chained reasoning. It is wasteful to
pursue these conditions, possibly with generation of additional ques-
tions to the user, if the evidence favoring the rule's premise cannot
exceed 0.2 (recall that $AND uses min in calculating the TALLY--see
Chapters 5 and 11 for further details).
2. Even if low-yield backward chaining did not occur, the rule would still
have limited impact on the value of the current subgoal since the
TALLY for the rule premise would be less than 0.2.
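A minimal sketch of the pruning heuristic described in these two points. In MYCIN the premise conditions are evaluated lazily, left to right, with backward chaining triggered as each condition's CF is needed; the sketch below takes the CFs as given and only shows how the threshold cuts evaluation short.

```python
# Sketch of $AND premise evaluation with the 0.2 pruning threshold.
THRESHOLD = 0.2

def sand_tally(condition_cfs):
    """Evaluate a conjunctive ($AND) premise left to right.

    Returns the TALLY (the min of the condition CFs), or None if any
    condition fails the 0.2 threshold -- in which case the conditions
    after it are never examined, cutting off low-yield backward
    chaining and unnecessary questions to the user.
    """
    tally = 1.0
    for cf in condition_cfs:
        if cf <= THRESHOLD:
            return None          # premise pruned; rule cannot fire
        tally = min(tally, cf)
    return tally
```

So `sand_tally([0.9, 0.6, 0.7])` yields a TALLY of 0.6, while a weak first condition, as in `sand_tally([0.15, 0.9])`, fails the premise immediately.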

3Suppes pressed us early on to state whether we were trying to model how expert physicians
do think or how they ought to think. We argued that we were doing neither. Although we were
of course influenced by information regarding the relevant cognitive processes of experts
[see, for example, the recent books by Elstein et al. (1978) and Kahneman et al. (1982)], our
goals were oriented much more toward the development of a high-performance computer
program. Thus we sought to show that the CF model allowed MYCIN to reach good decisions
comparable to those of experts and intelligible both to experts and to the intended user
community of practicing physicians.
4Duda et al. (1976) have examined this discontinuity and the relationship of CFs to their
Bayesian updating model used in the PROSPECTOR system.

Thus the 0.2 threshold was added for pragmatic reasons and should not
be viewed as central to the CF model itself. In later years questions arose
as to whether the value of the threshold should be controlled dynamically
by the individual rules or by meta-rules (rather than being permanently
bound to 0.2), but this feature was never implemented.
Another important limitation of MYCIN's control scheme was noted
in the mid-1970s but was never changed (although it would have been easy
to do so). The problem results from the requirement that the premise of
a rule be a conjunction of conditionals with disjunctions handled by mul-
tiple rules. As described in Chapter 5, A ∨ B ∨ C → D was handled by
defining three rules: A → D, B → D, and C → D. If all rules permitted
conclusions with certainty, the three rules would indeed be equivalent to a
single disjunctive rule with certain inference (CF= 1). However, with CFs
less than unity, all three rules might succeed for a given case, and then
each rule would contribute incremental evidence in favor of D. This evi-
dence would be accumulated using the CF combining function, that is,
CFCOMBINE, and might be very different from the CF that the expert
would have given if asked to assign a weight to the single disjunctive rule.
This problem could have been handled by changing the rule monitor to
allow disjunctions in a rule premise, but the change was never implemented
because a clear need never arose.
The rule interpreter does not allow rules to be written whose primary
connective is disjunction ($OR). We have encouraged splitting primary
disjunctions into separate rules for this reason. Thus
[1] ($OR A B C) → D

would be written as three separate rules:


[2] A → D
[3] B → D
[4] C → D
Conceptually this is simple and straightforward. In some cases, however,
the disjuncts are better understood as a set, and [1] would be a clearer
expression than [2], [3], and [4]. In these cases, Carli Scott has pointed out
that [1] can be rewritten as a primary conjunction with only one clause:
[5] ($AND ($OR A B C))

This uncovers a limitation on the CF model, however. While [5] should
give the same results as [2], [3], and [4] together, the resulting CFs on
conclusion D will differ. The reason is that in [5] the CF on the rule will
be multiplied by the MAX of the CFs of the disjunction A, B, or C, while
in [2], [3], and [4] the cumulative CF associated with D will be the result
of combining three products according to the combining function.5

5It is possible to force them to give the same result by adjusting the CFs either on [5] or on
[2], [3] and [4]. We would not expect a rule writer to do this, however, nor would we think
the difference would matter much in practice.
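The divergence can be made concrete with the Chapter 11 combining function for two positive CFs, CFcombine(x, y) = x + y(1 − x). The rule CF of 0.8 and the evidence CFs for A, B, and C below are made-up values chosen for the comparison.

```python
# Comparing form [5] (one disjunctive rule) with forms [2], [3], [4]
# (three separate rules) under the positive-CF combining function.

def cf_combine(x, y):
    """CFcombine for two positive CFs (Chapter 11)."""
    return x + y * (1.0 - x)

rule_cf = 0.8                               # illustrative rule strength
evidence = {"A": 0.6, "B": 0.5, "C": 0.4}   # illustrative evidence CFs

# Form [5]: ($AND ($OR A B C)) -- the rule CF times the MAX disjunct.
single_rule = rule_cf * max(evidence.values())      # = 0.48

# Forms [2], [3], [4]: each rule fires and its contribution is
# accumulated with CFcombine.
combined = 0.0
for cf in evidence.values():
    combined = cf_combine(combined, rule_cf * cf)   # ends near 0.79
```

The three separate rules push the CF on D to roughly 0.79, well above the 0.48 produced by the single disjunctive rule — the discrepancy described in the text.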


FIGURE 10-1 Family of curves showing how rapidly MYCIN's CF combining function converges for rules with the same CF.

Another limitation for some problems is the rapidity with which CFs
converge on the asymptote 1. This is easily seen by plotting the family of
curves relating the number of rules with a given CF, all providing evidence
for a hypothesis, to the resulting CF associated with the hypothesis.6 The
result of plotting these curves (Figure 10-1) is that CFCOMBINE is seen to
converge rapidly on 1 no matter how small the CFs of the individual rules
are. For some problem areas, therefore, the combining function needs to
be revised. For example, damping factors of various sorts could be devised

6This was first pointed out to us by Mitch Model, who was investigating the use of the CF
model in the context of the HASP/SIAP program (Nii et al., 1982).

(but were not) that would remedy this problem in ways that are meaningful
for various domains. In MYCIN's domain of infectious diseases, however,
this potential problem never became serious. In PROSPECTOR this prob-
lem does not arise because there is no finite upper limit to the likelihood
ratios used.
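The curves in Figure 10-1 are easy to reproduce directly: iterating CFcombine(x, y) = x + y(1 − x) with the same CF y for n rules yields 1 − (1 − y)^n, which approaches 1 for any positive y.

```python
# Cumulative CF after n confirming rules that all carry the same CF,
# using the positive-CF combining function from Chapter 11.

def cumulative_cf(cf, n):
    total = 0.0
    for _ in range(n):
        total = total + cf * (1.0 - total)   # CFcombine(total, cf)
    return total

# Even CF = 0.1 exceeds 0.65 after ten confirming rules,
# and CF = 0.3 is already near certainty (about 0.97).
print(cumulative_cf(0.1, 10))
print(cumulative_cf(0.3, 10))
```

This is the convergence behavior that motivated the suggestion of damping factors for domains where many weak rules should not add up to near-certainty.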
As we were continuing to learn about the CF model and its implica-
tions, other investigators, faced with similar problems in building medical
consultation systems, were analyzing the general issues of inexact inference
(Szolovits and Pauker, 1978) and were in some cases examining shortcom-
ings and strengths of CFs. Later, Schefe analyzed CFs and fuzzy set theory
(Schefe, 1980). Dr. Barclay Adams, a member of the research staff at the
Laboratory of Computer Science, Massachusetts General Hospital, re-
sponded to our description of the MYCIN model with a formal analysis
of its assumptions and limitations (Adams, 1976), included in this book as
Chapter 12. The observations there nicely specify the assumptions that are
necessary if the CFs in MYCIN's rules are interpreted in accordance with
the probabilistic definitions from Chapter 11. Adams correctly notes that
there may be domains where the limitations of the CF model, despite their
minimal impact on MYCIN's performance, would seriously constrain the
model's applicability and success. For example, if MYCIN had required a
single best diagnosis, rather than a clustering of leading hypotheses, there
would be reason to doubt the model's ability to select the best hypothesis
on the basis of a maximal CF.
Even before the Adams paper appeared in print, many of the same
limitations were being noted within the MYCIN project. For example, in
January of 1976 Shortliffe prepared an extensive internal memo that made
several of the same observations cited by Adams.7 He was aided in these
analyses by Dana Ludwig, a medical student who studied the CF model in
detail as a summer research project. The Shortliffe memo outlined five
alternate CF models and argued for careful consideration of one that
would require the use of a priori probabilities of hypotheses in addition to
the conventional CFs on rules. The proposed model was never imple-
mented, however, partly due to time constraints but largely because
MYCIN's decision-making performance was proving to be excellent despite
the theoretical limitations of CFs. Some of us felt that a one-number cal-
culus was preferable in this domain to a more theoretically sound calculus
that requires experts to supply estimates of two or more quantities per
rule. It is interesting to note, however, that the proposals developed bore
several similarities to the subjective Bayesian model developed at about the
same time for SRI's PROSPECTOR system (Duda et al., 1976). The
model has been used successfully in several EMYCIN systems (see Part
Five) and in the IRIS system (Trigoboff, 1978) developed at Rutgers Uni-
versity for diagnosing glaucomas.

7This is the file CF MEMO referred to by Clancey in the exchange of electronic messages at
the end of this chapter.

There is an additional element of uncertainty in rules that is also
bound up in the CFs. Besides capturing some measure of increased prob-
ability associated with the conclusion after the premises are known and
some measure of the utility associated with the conclusion, the CF also
includes some measure of how "flaky" the rule is. That is, a CF of 0.2 can
indicate that the probability increases by 20% (rather precisely) or that the
rule writer felt there was a positive association between premises and con-
clusion but was only 20% certain of it. Some rule writers would be able to
quantify their degree of doubt about the CFs (e.g., "I am about 90% certain
that this strength of association is 0.5"), but there is no provision in our
CF model for doing so. In most cases where increased precision is possible,
rule writers would have prior and posterior probabilities and would not
need a one-number calculus.
Despite the shortcomings of the CF model, it must be recognized that
the issues we were addressing reflected a somewhat groping effort to cope
with the limitations of probability theory. It has therefore been with con-
siderable interest that we have discovered in recent years the work of
Dempster and Shafer. Shafer's book, The Mathematical Theory of Evidence,
appeared in 1976 and proposed solutions to many of the same problems
being considered in the MYCINwork. Several aspects of the CF model
appear as special cases of their theory. Interestingly, Bayesian statistics is
another special case. Our recent attempt to understand the Dempster-
Shafer model and its relevance to MYCIN is described in Chapter 13. This
work, the most recent in the book, was largely done by Jean Gordon, a
mathematician who recently joined our group when she came to Stanford
as a medical student. Because of new insights regarding the topics under-
lying CFs and the relationships to probabilistic reasoning, we have chosen
to include that analysis in this volume even though we have not imple-
mented the ideas in the program.

10.2 Evolution of the CF Model

Although the model described in Chapter 11 has persisted to the present
for the MYCIN program, and for other EMYCIN systems (see Part Five),
a few revisions and additional observations have been made in the inter-
vening years. The only major change has been a redefinition of the com-
bining function by Bill van Melle. This was undertaken for two reasons:

1. the potential for a single piece of negative evidence to overwhelm several
pieces of positive evidence (or vice versa); and
2. the computational expense of storing both MBs and MDs (rather than
cumulative CFs) in order to maintain commutativity.

The second of these points is discussed briefly in Chapter 11, but the first
may require clarification. Consider, for example, eight or nine rules all
supporting a single hypothesis with CFs in the range 0.4 to 0.8. Then the
asymptotic behavior of the cumulative MB would result in a value of about
0.999. Suppose now that a single disconfirming rule were to succeed with
CF = 0.8. Then the net support for the hypothesis would be

CF = MB - MD = 0.999 - 0.8 = 0.199

This behavior was counterintuitive and occasionally led MYCIN to reach
incorrect inferences, especially in situations where the final CF after tracing
became less than 0.2. This would drop the final belief below the established
threshold. Hence a single piece of negative evidence could overwhelm and
negate the combined evidence of any number of supporting rules.
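The runaway build-up and the old subtraction rule can be sketched in a few lines of Python (a modern reconstruction for illustration; MYCIN itself was written in Interlisp, and these function names are ours):

```python
def mb_combine(mb1, mb2):
    # Incremental combination of measures of belief: each new confirming
    # rule closes a fraction of the remaining distance to certainty.
    return mb1 + mb2 * (1 - mb1)

# Eight confirming rules with CFs between 0.4 and 0.8 drive MB toward 1.0.
mb = 0.0
for cf in [0.4, 0.5, 0.6, 0.7, 0.8, 0.4, 0.5, 0.6]:
    mb = mb_combine(mb, cf)
print(round(mb, 3))        # 0.999

# Under the original definition CF = MB - MD, a single disconfirming
# rule with strength 0.8 then nearly cancels all of that support.
md = 0.8
print(round(mb - md, 3))   # 0.199
</imports>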
As a result, we changed both the definition of a CF and the corre-
sponding combining function to soften the effect:

              MB - MD
CF  =  --------------------
         1 - min(MB, MD)

                     |  X + Y(1 - X)               X, Y both > 0
                     |
                     |        X + Y
CFCOMBINE(X,Y)  =   <   -------------------        one of X, Y < 0
                     |   1 - min(|X|, |Y|)
                     |
                     |  -CFCOMBINE(-X, -Y)         X, Y both < 0

Note that the definition of CF is unchanged for any single piece of evidence
(where either MD or MB is zero by definition) and that the combining
function is unchanged when both CFs are the same sign. It is only when
combining two CFs of opposite sign that any change occurs. The reader
will note, for example, that

CFCOMBINE(0.999, -0.80) = 0.199/0.2 = 0.995

whereas

CFCOMBINE(0.55, -0.5) = 0.05/0.5 = 0.1

In addition, the change in CFCOMBINE preserved commutativity without
the need to partition evidence into positive and negative weights for later
combination. Thus, rather than storing both MB and MD for each hy-
pothesis, MYCIN simply stores the current cumulative CF value and com-
bines it with new evidence as it becomes available. Beginning in approxi-
mately 1977 these changes were incorporated into all EMYCIN systems.
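The revised function and its behavior on the two worked examples can be sketched in Python (our reconstruction, not van Melle's code):

```python
def cf_combine(x, y):
    # Revised combining function: unchanged when both CFs have the same
    # sign, softened by the denominator when the signs differ.
    if x >= 0 and y >= 0:
        return x + y * (1 - x)
    if x < 0 and y < 0:
        return -cf_combine(-x, -y)
    return (x + y) / (1 - min(abs(x), abs(y)))  # opposite signs

print(round(cf_combine(0.999, -0.80), 3))   # 0.995
print(round(cf_combine(0.55, -0.50), 2))    # 0.1

# Commutativity: folding in evidence in either order gives the same
# result, so only the cumulative CF need be stored.
a = cf_combine(cf_combine(0.6, -0.4), 0.3)
b = cf_combine(cf_combine(0.6, 0.3), -0.4)
print(abs(a - b) < 1e-9)                    # True
```
</imports>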

10.3 Assessing the CF Model

Even before the change in the combining function was effected, we had
observed generally excellent decision-making performance by the program
and therefore questioned just how sensitive MYCIN's decisions were to the
CFs on rules or to the model for evidence accumulation. Bill Clancey (then
a student on the project) undertook an analysis of the CFs and the sensi-
tivity of MYCIN's behavior to those values. The following discussion is
based in large part on his analysis and the resulting data.
The CFs in rules reflect two kinds of knowledge. In some cases, such
as a rule that correlates the cause of meningitis with the age of the patient,
the CFs are statistical and are derived from published studies on the in-
cidence of disease. However, most CFs represent a mixture of probabilistic
and cost/benefit reasoning. One criticism of MYCIN's rules has been that
utility considerations (in the decision analytic sense) are never made explicit
but are "buried" in a rule's CF. For example, the rule that suggests treating
for Pseudomonas in a burned patient leaves out several other organisms
that can also cause infection in that situation. However, Pseudomonas is a
particularly aggressive organism that often causes fatal infections and yet
is resistant to most common antibiotics. Thus its "weight" is enhanced by
rules to ensure that it is adequately considered when reaching therapy
decisions.8 Szolovits and Pauker (1978) have also provided an excellent
discussion of the issues complicating the combination of decision analytic
concepts and categorical reasoning in medical problems.
Figure 10-2 is a bar graph showing how frequently various CF values
occur in MYCIN's rules. All but about 60 of the 500 rules in the most
recent version of the system have CFs.9 The cross-hatched portion of each
bar shows the frequency of CFs in the 1975 version of MYCIN, when
there were only 200 rules dealing with bacteremia. The open portion of
each bar refers to the CFs of incremental rules since that time, most of
which deal with meningitis. The overall pattern is about the same, although
the more recent system has proportionally more small positive CFs. This
makes sense because the newer rules often deal with softer data (clinical
evidence) in contrast to the rules for bacteremia, which generally interpret

8Self-referencing rules, described in Chapter 5, were often used to deal with such utility
considerations. As mentioned in Chapter 3, they allowed dangerous organisms, initially sug-
gested with only minimal certainty, to be reconsidered and further confirmed by special
evidence. For example: if you are already considering Pseudomonas and the patient has ecthyma
gangrenosum skin lesions, then there is even greater importance to the conclusion that the
pathogen is Pseudomonas.
9The rules without CFs do not associate evidence with hypotheses but make numerical com-
putations or save a text string to be printed later. Note also that some rules, particularly
tabular rules, make many conclusions and thus account for the fact that there are more CFs
than rules.

FIGURE 10-2 Frequency of CFs in MYCIN's rules. Cross-hatched bars
indicate frequencies for the 1975 version of MYCIN. Open bars show
frequencies since then. [Bar graph: CF value vs. number of conclusions.]

more concrete laboratory results. The bimodal distribution with peaks at
0.8 and 0.2 (ignoring for a moment those rules, often definitional, that
reach conclusions with certainty) suggests that experts tend to focus on
strong associations (+0.8, a number that might seem less binding than 0.9)
and many weak associations (+0.2, the minimum CF that will allow the
inferred parameter to exceed the threshold for partial belief). In contrast
there are relatively few rules with negative CFs. We suspect this reflects
the natural tendency to state evidence in a positive way.
Analysis of MYCIN's reasoning networks suggests that the program
should not be very sensitive to changes in rule CFs. This conclusion is
based on two observations about how CFs are actually used in the program.
First, inference chains are short, and premises often pass a TALLY of 1.0
to the conclusion (see Chapter 5), so the effect of multiplying CFs from
one step in the chain to the next is minimal. Second, conclusions are fre-
quently made by only a single rule, thereby avoiding the use of CFCOMBINE
for all but a few key parameters. Observe that the first effect deals with
combination of CFs from goal to goal (by passing a value from a rule
premise to the conclusion) and the second deals with combination of evi-
dence for a single goal.
Intrigued by observations such as those outlined above, Clancey en-
listed the assistance of Greg Cooper, and in 1979 they undertook an ex-
periment to determine quantitatively how sensitive MYCIN is to changes
in rule CFs. The ten cases used in the formal evaluation of the meningitis
rule set (see Chapter 31) were used for this study. The cases were run in
batch mode using systematic variations of the CFs in MYCIN's rules. For

                     Number of cases (out of 10)
Number of    Same organisms    Different      Different organisms
intervals    and therapy       organisms      and therapy
    10            9                1                 0
     5            7                3                 0
     4            8                2                 1
     3            5                5                 1
     2            1                9                 3

FIGURE 10-3  Results of CF sensitivity experiment.

each run, rules were modified by mapping the existing rule CFs onto a
new, coarser scale. The original CF scale has 1000 intervals from 0 to
1000.10 Trials were run using ten, five, four, three, and two intervals. Thus,
when there are five intervals, all rule CFs are mapped onto 0, 200, 400,
600, 800, and 1000, rounding as necessary. When there are two intervals,
only the numbers 0, 500, and 1000 are used.
CFs were combined using the usual combining function (the revised
version that was in use by 1979). Thus intermediate conclusions mapped
onto arbitrary numbers from 0 to 1000. Clustering the final organism list
was done in the normal way (cutting off at the largest gap). Finally, negative
CFs were treated analogously, for example, mapping onto 0, -333, -666,
and -1000 when there were three intervals.
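The mapping itself is simple; a hypothetical Python rendering (the function name and rounding details are our guesses, not the original experiment code):

```python
def coarsen(cf, n_intervals):
    # Map a CF on MYCIN's internal 0-to-1000 integer scale (negative CFs
    # treated analogously) onto the nearest of n coarser interval boundaries.
    step = 1000 / n_intervals
    return round(round(cf / step) * step)

print(coarsen(850, 5))    # 800
print(coarsen(-450, 3))   # -333
print(coarsen(999, 2))    # 1000
```
</imports>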
In examining results, we are interested primarily in three possible
outcomes: (1) no change to the item list (and hence no change in therapy);
(2) different organisms, but the same therapy; and (3) new therapy (and
therefore different organisms). Figure 10-3 summarizes the data from the
ten cases run with five different CF scales.
Degradation of performance was only pronounced when the number
of intervals was changed to three (all rule CFs mapped onto 0, 333, 666,
and 1000). But even here five of the ten cases had the same organism list
and therapy. It wasn't until CFs were changed to 0, 500, and 1000 that a
dramatic change occurred; and even with nine new organism lists, we find
that seven of the ten cases had the same therapy. The fact that the organism
list did not change radically indicates that MYCIN's rule set is not "fine-
tuned" and does not need to be. The rules use CFs that can be modified
by ±0.2, showing that there are few deliberate (or necessary) interactions
in the choice of CFs. The observed stability of therapy despite changing
organism lists probably results because a single drug will cover for many
organisms, a property of the domain.

10CFs are handled internally on a 0 to 1000 scale to avoid floating-point arithmetic, which is
more expensive in Interlisp than is integer arithmetic.

10.4 Additional Uses of Certainty Factors

By the early 1980s, when much of our research was focusing on issues
other than EMYCIN systems, we still often found CFs to be useful com-
putational devices. One such example was the work of Jerry Wallis, de-
scribed in detail in Chapter 20. His research modeled causal chains with
rules and used CFs to represent the uncertainty in the causal links. Because
his system reasoned both from effects to causes and from causes to effects,
techniques were needed to prevent fruitless searching of an entire con-
nected subgraph of the network. To provide a method for search termi-
nation, the concept of a subthreshold path was defined, i.e., a path of
reasoning whose product of CFs can be shown to be below the threshold
used to reject a hypothesis as unknown. For example, if there is a linear
reasoning path of four rules (R1, R2, R3, and R4) where A can be asked
of the user and E is the goal that initiated a line of backward-chained
reasoning:

     R1       R2       R3       R4
A -----> B -----> C -----> D -----> E
   0.8      0.4      0.7      0.7

then if B were known with certainty, E would be known only with a CF of
(0.4)(0.7)(0.7) = 0.196. This is less than the conventional cutoff of 0.2
in EMYCIN systems, so the line of reasoning from B to E would be con-
sidered a subthreshold path. There is no need to invoke rule R1 and ask
question A in an effort to conclude B because the result cannot affect the
final value for the variable E. If the product of CFs is tabulated during
the backward-chaining process, the accumulated value provides a method
for limiting the search space that needs to be investigated.
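A minimal sketch of this pruning test in Python (our illustration with hypothetical names, not Wallis's code):

```python
THRESHOLD = 0.2  # conventional EMYCIN cutoff

def worth_pursuing(product_so_far, next_rule_cf):
    # Prune a line of reasoning once the running product of rule CFs
    # along the chain can no longer reach the threshold.
    return product_so_far * next_rule_cf >= THRESHOLD

# Chaining back from goal E through R4, R3, R2 toward B:
product = 1.0
for cf in (0.7, 0.7, 0.4):
    product *= cf
print(round(product, 3))             # 0.196 -- already subthreshold at B
print(worth_pursuing(product, 0.8))  # False: no need to invoke R1, ask A
```
</imports>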
In a branched reasoning tree this becomes slightly more complex. Nor-
mally, when a rule is used to conclude a value with a particular CF, that
number is stored with the parameter's value in case it is later needed by
other rules. In the example above, termination of the search from E back
to A (due to the subthreshold condition at B) would have left the value of
C "unknown" and might have left a CF of 0 stored at that node. Suppose,
though, that another rule, R5, later needed the value of C because of
consideration of goal F:
     R1       R2        R3       R4
A -----> B -----> C -----> D -----> E
   0.8      0.4    \   0.7      0.7
                    \  R5
                     -------> F
                        0.9

It would be inappropriate to use the unknown value of C stored from the
previous inference process, for now it would be appropriate to back-chain
further using R1 (the higher CF of 0.9 associated with R5, compared to
the composite CF of 0.49 associated with the chaining of rules R3 and R4,
keeps the path to A from being subthreshold this time). Thus, if one wants
to use previous results only if they are appropriate, it is necessary to store
the "vigor" with which a value was investigated along with its CF. Wallis
proposed that this be computed by multiplying the CFs from the goal (in
this case E) through the value in question. Then, when a node is investi-
gated for a second time via an alternate reasoning chain, this measure of
vigor, or investigation strength, can be used to determine whether to inves-
tigate the node further. If the stored investigation strength is greater than
the investigation strength of the new reasoning chain, the old value can be
used. Otherwise the backward-chaining process must be repeated over a
larger portion of the search space.
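Under the same hypothetical rendering, the cache test might look like this (names invented for illustration):

```python
cache = {}  # parameter -> (value, investigation_strength)

def store(node, value, strength):
    cache[node] = (value, strength)

def lookup(node, new_strength):
    # Reuse a stored value only if it was investigated at least as
    # vigorously as the new reasoning chain demands.
    if node in cache:
        value, stored_strength = cache[node]
        if stored_strength >= new_strength:
            return value
    return None  # must back-chain again over a larger search space

store("C", "unknown", 0.49)   # C abandoned via E with strength 0.7 * 0.7
print(lookup("C", 0.3))       # unknown -- a weaker chain can reuse it
print(lookup("C", 0.9))       # None -- R5 investigates harder; redo search
```
</imports>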
Although there is further complexity in these ideas developed by Wal-
lis, the brief discussion here shows some of the ways in which concepts
drawn from the CF model have been broadened in other settings. Despite
the theoretical limitations discussed above and in the subsequent chapters,
these concepts have provided an extremely useful tool for dealing with
issues of inexact inference in the expert systems that we have developed.

10.5 An Electronic Exchange Regarding CFs

We close this chapter with a series of informal electronic mail messages
that were exchanged by some members of our research group in 1976
(Carli Scott, Bruce Buchanan, Bill Clancey, Victor Yu, and Jan Aikins).
Victor was developing the meningitis rule set at the time and was having
frequent problems deciding what CFs to assign to individual rules and
how to anticipate the ramifications of any decisions made. The messages
are included in their entirety. Not only do they provide insight into the
way that our ideas about CFs evolved through a collaborative effort over
many years, but they are also representative of the kinds of dialogues that
occurred frequently among members of the project. Because many of the
ideas in this book evolved through such interchanges, we felt it was ap-
propriate to provide one verbatim transcript of a typical discussion. The
ideas expressed were fresh at the time and not fully worked out, so the
messages (and Clancey's closing memo) should be seen as examples of
project style rather than as an exposition of the "last word" on the topics
discussed.

Date: 26 Feb 1976


From: Scott
Subject: Summary of discussion
To: MYCIN gang
This is a summary of what I think came out of yesterday's meeting. Please
read it and send me comments, objections, etc.
1) Victor [Yu] has assigned certainty factors to his rules based on the
relative strengths of the evidence in these rules. While trying to find a nu-
merical scale that would work as he wanted it to with the system's 0.2 cutoff
and combining functions, he had to adjust certainty factors of various rules.
Now that this scale has been established, however, he assigns certainty factors
using this scale, and does NOT adjust certainty factors of rules if he doesn't
like the system's performance. Furthermore, he does NO combinatorial anal-
ysis before determining what CF to use; he is satisfied that using the scale he
has devised, the system's combining function, and the 0.2 cutoff, the program
will arrive at the right results for any combination of factors, and if it doesn't,
he looks for missing information to add.
2) Assuming that the parameters IDENT and COVERFOR are disam-
biguated in Victor's set of rules, Ted [Shortliffe] believes the CFs that Victor
uses in his rules, and approves of the idea of using a cutoff for COVERFOR
since this is what we've been doing with bacteremia (since it is a binary de-
cision, a cutoff makes sense for COVERFOR). Furthermore, this is quite
similar to what clinicians do: they accumulate lots of small bits of clinical
evidence, then decide if the total is enough to make them cover for a partic-
ular organism--independent of what the microbiological evidence suggests.
3) Bruce [Buchanan] and BC [Bill Clancey] still object to Victor's CFs
because they seem too precise (since he is working in the 0 to 0.2 range). My
claim is that he really isn't making numbers more precise; the difference in
CFs from one strength to the next is 0.05 (i.e., the classes of rules he has are
assigned CFs 0.05, 0.1, 0.15, 0.2, 0.25, ...). This is no finer a distinction than
we've had in the past--we have rules with CF 0.2, 0.25, 0.3, 0.35. I don't see
why the smaller absolute values of the CFs Victor uses make much differ-
ence; the rules have much smaller strengths than any rules we've had before,
so they should have smaller CFs.
4) There seems to be concern because Victor believes in his CFs, and
relies on them to combine in the right way. In the past, we never dealt with
this type of accumulation of small bits of information that would combine to
give either enough total info or not (though I believe CFs were designed to
handle just such combinations). Since Victor has defined guidelines on de-
ciding how strong the evidence must be in order for a rule to be assigned a
certain CF, and since he has tested these guidelines within the framework of
MYCIN's combining functions, he believes that it all works as it should. Fur-
thermore, he believes that he can define these "points of reference" so that
future medical people can add rules, using the same guideline that Victor
has used, and they should fit into the system and work fine with his rules.
5) I am satisfied with what Victor is doing, and would like to try Ted's
suggestion of separating COVERFOR from IDENT in Victor's system. I be-
lieve the result of this would be that the program would continue to perform
very well on meningitis patients, and Ted, Victor, (I believe) Larry [Fagan],
and I would all be happy with the results. I think points (3) and (4) above
sum up other peoples objections that might remain. If this is so, what are
suggestions from people who still aren't happy with the model? Is everyone
satisfied with everything now? Are there more objections that I missed? Have
I completely misunderstood something? Have I completely misunderstood
everything? Please let me know what you think so we can start to work out
problems that might remain.

Carli

Date: 27 Feb 1976


From: Buchanan
To: MYCIN gang

Carli,
Thanks for your summary--it appears to be correct in almost every de-
tail. I would like you to try separating COVERFOR and IDENT as soon as
possible since that is needed for bacteremia anyway and is a help in clarifying
the conceptual basis on which the program makes a recommendation. I also
think that everyone will be happy with the results, especially me if it brings
the knowledge bases into a common framework.
My concern is that I would also like you to begin working on the rerepresen-
tation of the context tree to help us with time relations and the infection-
organism link. As Ted described it, you and he have pretty well worked things
out. Because it is necessary for the FOREACH11 mechanism and is desirable
for many other reasons, I would like us not to delay it. Do you see problems
with this?
As I tried to say yesterday, my reservations with the meningitis system
stem from my uneasiness with the CF model, which we all know needs im-
proving (which Pacquerette [a visiting student from France] was starting, but
won't finish). I don't want Victor to become dependent on a particular mech-
anism for combining CFs--because we hope the mechanism will be improved
soon. I have no doubt that the rules work well now, and I don't disagree at
all with the need for firm reference points for the CFs.
As soon as COVERFOR and IDENT are separate, could you try the
meningitis patients again, enlisting whatever help you need? Then we'll be
able to decide whether that meets all our specs. After that we can be working
on the context tree and time problems while Victor continues development
on the medical side. I foresee no difficulty in mapping the CFs from existing
rules (meningitis as well as bacteremia) into whatever numbers are appro-
priate for a new CF model when we have one--with firm reference points if
at all possible.
Bruce

PS: I think a reference point for defining how strongly suggestive some
evidence is for a conclusion is easier when almost all conclusions are about
identities of organisms that should be treated for. In bacteremia the rules
conclude about so many different things that it is harder--but no less desir-

able--to be precise about what "weakly suggestive" and "strongly suggestive"
mean.

11FOREACH is a quantification primitive in rules.

Date: 27 Feb 1976


From: Clancey
To: MYCIN gang
Your summarization of the meeting was excellent. Here I will go into
more detail about the problem with Victor's choice of certainty factors.
Your claim that Victor's preciseness in selecting CFs is not different from
the distinctions made in the past ignores my wariness about the RANGE in
which he is being precise. Your examples (0.25, 0.3, 0.35) are greater than
0.2, the range in which I showed that the current system is insensitive to even
large variation of the CFs chosen. (That is, a change in the range of ±0.2
does not affect system performance (rule invocation and success), as long as
the numbers are > 0.2.) The area in which Victor is working that is both-
ersome is < 0.2 (your examples: 0.05, 0.1, 0.15, 0.2). What Bruce was saying,
I believe, was that accumulation of evidence in this area is going to affect
very much the invocation and success of rules. It is in this range that
CHANGES to the CF of a rule for purposes of adjusting system performance
violate the principle of a rule being a modular, independent chunk of knowl-
edge.
Now, first, Victor tells us that he does not make these adjustments. Rather,
he is assigning numbers according to a consistent scale about belief which he
has devised in his subdomain. I am very pleased to hear this, and am in full
agreement with his claim that such a scale is necessary and should be defined
for ALL rules in MYCIN.
What remains disturbing is the certainty factor model itself. Here we
have no sure intuition about the performance meaning of 0.05 as opposed
to 0.1, yet we are assigning them as if they were significantly different from
one another. It is clear to everyone working on the CF model, I believe, that
we need a combining function that will make use of these numerical repre-
sentations of subjective distinctions. For example, I would expect a good
model to take as many pieces of 0.1 evidence as Victor deems significant, i.e.,
makes a condition (parameter value) "true," and bumps the conglomeration
above 0.2. The problem here is that I DO NOT expect Victor or anyone to
be able to assign facts a weighting that is independent of the entire context.
That is, the 0.1 that comes from Rule 371 for CATEGORY FUNGUS may
combine (in Victor's mind) with the conclusion in Rule 372 of the same value
to give a feeling of the CATEGORY ACTUALLY BEING FUNGUS > 0.2, so
SAME succeeds. But perhaps the same CF value combination coming from
Rule 385 DOES NOT make for belief in the conclusion (NOT > 0.2). It seems
entirely conceivable in my mind that Victor would find some combination of
rule successes to be completely nonsensical. So, he would not know what to
make of it at all, and would almost certainly not make the same conclusions
as he would if he looked at each set of premise clauses independently.
I am saying here that rules that break observations into many small parts,
resulting in CFs < 0.2 intended to combine to form an accumulated obser-
vation, ignore the total perspective, which says, "Hey, wait a minute, these 6
clauses can't appear together: why was she given corticosteroids if she has
XX? This doesn't mean FUNGUS to me; no, I want to know why that pre-
scription was made." This same criticism does not apply with the same force
to many rules with CF > 0.2 because they bring together a "more significant
set of facts." They do this by capturing (often disjoint) pictures of the world
that in themselves MAKE SENSE. I do not at all understand how a rule can
be written that can at once stand on its own and yet NOT be significant truth
(i.e., believable observation, tangible conclusion). It is my suspicion that Vic-
tor has not built a system in which EVIDENCE combines plausibly, but rather
a system in which independent rules SUCCEED TOGETHER to make a
conclusion that could be expressed as a single rule, and WOULD have to be
expressed that way to have a CF > 0.2.
Now, Victor has said that he could have combined these rules to give a
body of rules in which these same small observations appear together, thus
yielding larger CFs. However, he believes that this would result in far more
rules (to allow for the cross product of occurrences), and he would not be
sure that he had covered all of the possible cases. Well, certainly, with respect
to the latter, we can tell him if the larger set covers all of the various com-
binations. The question of having far more rules is, I suppose, a valid con-
cern. But at least then we could feel sure that only the PLAUSIBLE obser-
vations had been combined.
To summarize, we talk about accumulating "lots of small bits of clinical
evidence," but I do not understand how a bit of EVIDENCE could be NOT-
KNOWN (the definition of CF <= 0.2). To me, evidence gathered by a rule
should be an all-or-nothing thing--if something more is needed to make the
parameter KNOWN [i.e., CF > 0.2], then I expect that there is something
to be made explicit in the rule. This is the only way in which I can interpret
the notion of a discrete cutoff at 0.2. Above that point I know something;
below it I know nothing (NOTKNOWN). The only plausible explanation I
have for Victor's small CFs is that they are like tags that record an observa-
tion. It would make me much happier to see each of these CFs changed to
NOTICED with definite (= 1) CFs. Then these parameters could be com-
bined with evidence garnered from lab rules.
I would be happy to hear other opinions about the 0.2 cutoff and its
meaning for rule CFs.

Bill

Date: 28 Feb 1976


From: Aikins
Subject: On Wednesday's meeting
To: MYCIN gang

There are three things that I feel we should consider in our discussions
that have not yet been mentioned. The first is a concern about knowledge
acquisition. I feel that whatever we decide, the MYCIN acquisition module
should be designed so that a recognized medical expert could, without too
much difficulty, add a new rule or other piece of knowledge to the MYCIN
data base. I wonder if a doctor in Boston would be able to add a meningitis
rule to MYCIN without hurting the performance of Victor's system. I got
the impression that Victor's system was somewhat fragile in this regard. I
doubt that he would want to give up the ability to easily add medical knowl-
edge to MYCIN. I fear that we would be doing just that. (This problem
includes the question of maintaining rule modularity.)
My second concern is that even if we can define fairly well what we mean
by 0.7, 0.5, anything above 0.2, 0.2, etc., it seems that the next problem will
be to define 0.25, 0.225, 0.175, 0.5, etc. We could continue this defining of
CFs in smaller and smaller intervals forever. However, I doubt that medical
science is exact enough for us to be able to do this.
This brings us to my third concern. In my recent meeting with Dr. Ken
Vosti [a professor in Stanford's Division of Infectious Diseases], he stated a
problem, already familiar to most of us, that even if we could reach agree-
ment among the infectious disease experts at Stanford as to the "right" CFs
to put on our rules, the infectious disease experts on the East Coast and other
places would probably not agree with us. Now let's take this one step further.
Say we are able to assign fairly straightforward meanings to our CFs. Now
we have the problem of a doctor in some other part of the country who
doesn't want to use MYCIN because our CFs don't agree with what he would
use. In other words, by defining our CFs at all rigorously, we're inviting
disagreement. So, concerns two and three are saying that we can never define
each number on the 0 to 1.0 scale, and if we could, that might not be such
a good idea anyway.
I have no solutions to offer at this time, but I hope everyone will keep
these concerns in mind. I feel that CFs are designed to give doctors who
read and write the rules a certain "commonsense referent" as to how valid
the rule might be. If CFs become more important than that, I fear we will
use too much of our medical expertise in deciding on the "right" CF for each
rule, time that could be used to add more medical knowledge to the MYCIN
data base.

Jan

Date: 29 Feb 1976


From: Yu
Subject: On Wed. meeting and Clancey
To: Clancey, Scott
cc: MYCIN gang

Bill,

1. Why is the system insensitive to CF? Certainly, this is not true for the
meningitis rules.
2. Your point about plausible situations is a good one, and deserves fur-
ther amplification and discussion. The reason I have "separated" the number
of premises that in the bacteremia rules would have been combined is that I
believe they are independent premises. I don't believe I ever said the reason
for separating them is to avoid having too many rules; the reason for sepa-
rating them is to cover a number of subtle clinical situations that would
otherwise not have been considered. More on this later.
3. Finally, I should add that the 0.2 cutoff was selected because it is the
one being used for SIGNIFICANCE and I thought it would best mesh with
the current system. I must admit that I am surprised at the furor it has
evoked; if you wish to use some other cutoff, that's fine with me--the CFs
could be easily adjusted.
4. I didn't understand a few of the points you raised, so I look forward
to the next meeting.
Finally, I should say that the system that I have proposed is not meant
in any way to replace the current bacteremia rules; it was merely a simple,
practical way to handle meningitis. I did not feel the approach used in bac-
teremia was precise enough to handle meningitis.
Victor
Date: 29 Feb 1976
From: Yu
Subject: On Wed. meeting and Aikins
To: Aikins, Scott
cc: MYCINgang

Jan,
1. You state that we are giving up the ability to "easily" add rules to
MYCIN.Certainly, it is currently "easy" to add new rules to MYCIN;
however, it is not so "easy" to rationalize, justify, and analyze these
new rules. Furthermore, it becomes "difficult" when the system starts
giving incorrect therapy after these new rules have been added.
2. I believe a doctor in Boston would have an "easier" task of adding
new meningitis rules, as compared to bacteremia rules. He now has
some reference points and definite guidelines on how a rule should
be written. Again, the rule is more likely to be compatible with the
existing system, since the new rule is written along the same guidelines
and same philosophy. This is not the case with the bacteremia rules
where it is likely and even probable that any new rule written by a
non-MYCIN person could cause the system to malfunction.
3. I have not attempted to specifically define every increment between
CFs.
4. I need not remind all of us that we are dealing directly with human
lives. If another M.D. on the East Coast disagrees with our CFs and
has data (be it strong or weak) as the basis for his disagreement, then
we had better know about it. I claim that one of the advantages of
specific criteria for CFs is that this "invites disagreement" (or to put
it another way--critical analysis of the rules by non-MYCIN experts
is possible).
5. What is this mystical "commonsense referent" that you have men-
tioned? (Likewise, Ted has stated that physicians would PROBABLY
agree fairly closely on the CFs currently in MYCIN. If this is true,
then my arguments for preciseness are invalid and unnecessary.)
6. Your last point concerning using too much time and effort on the CF
question, when we could be adding more medical knowledge--I will
merely refer you to Matthew: Chapter 7, verses 24-27.

Cheers,
Victor
228 Uncertainty and Evidential Support

Date: 1 Mar 1976
From: Clancey
Subject: More about certainty factors and a reply to your message
To: Yu
cc: MYCINgang

Thanks for commenting on my remarks on CFs. I am well aware that
my observations suffered from vagueness. As you might expect, this was just
a first-shot approach to issues that have been bothering me. I am now pre-
paring a paper that discusses rule modularity; I believe that you will find that
it clarifies my arguments from last week. Briefly, I see now that the problem
is not so much with the CFs you have proposed, but is instead a general issue
concerning all rules.
As for the furor, as far as I am concerned, your rules have the precise
property I predicted last August would not occur, namely, small CFs. What
will come of this discussion, I believe, is primarily a better understanding of
rules. More on this later in the week.
I will now briefly reply to your numbered remarks:
1. You will notice that I said the system was insensitive to variations in
CF>0.2 in so far as rule success and invocation are concerned. This
excludes calculations that use CFs in percentage cutoffs. Do you have
other sensitivities in mind?
2. It was Larry who told me that you wanted to form a large rule set
from the combinations of these rules. Perhaps this was only the gist
of a side argument that centered on allowing for all cases. I look
forward to hearing about these "subtle clinical situations that would
otherwise not be covered."
3. I have no problem with the 0.2 cutoff, per se.

Bill

Date: 3 March 1976
From: Clancey
Subject: Modularity of rules
To: Yu
cc: MYCINgang
I have completed a write-up of my understanding of what we
mean by rule independence. I consider this useful as a tutorial to those who
perhaps have not fully appreciated the significance of the constraint
P(e1 & e2|h) = P(e1|h)*P(e2|h), which is discussed in several of Ted's write-
ups on the relation of CFs to probabilities.
For those of you for whomthis is old hat by now, I would appreciate it
if you would peruse my memo and let me know if I've got it straight.
I've expanded the discussion of plausibility of rule interaction here also.
This appears to be an issue worth pursuing.
The memo is CEMODULAR on my directory. It is about 3 pages long.
Bill Clancey

<CLANCEY>CEMODULAR.1
I. Introduction
This memo arose from my desire to understand rule CFs of less than
the 0.2 threshold. How could such a rule be evidence of something? Does a
rule having a CF less than 0.2 pose any problems to the process of combining
certainty factors? What does it mean to say that a rule is modular? Must a
rule satisfy some property relating to its certainty factor to be considered
modular?
After thinking out all of these problems for myself, I re-examined our
publications in the light of my new understanding. Alas! The ideas discussed
below have long been known and were simply overlooked or undervalued by
me. Indeed, I suspect that most of us have to some degree failed to appreciate
Ted's thesis, from which I will be quoting below.

II. What Is Modularity?


The following is a restatement of one requirement for rule indepen-
dence. As Ted discusses in CEMEMO, it is a necessary assumption
for our combining functions to be consistent with probability theory, namely:
P(e1 & e2|h) = P(e1|h)*P(e2|h), and the same for ¬h (e = premise and h =
action of rule).
Let {Ri} be a subset of the UPDATED-BY rules for some parameter P,
all of which mention the same value for P in the conclusion, namely VALUE,
though perhaps with different certainty factors. (If P is a yes-no parameter,
then this set contains all of the UPDATED-BY rules.) Now let P! be the power
set of {Ri}, and for every element of P!, let PREMi designate the union of the
premises of all rules Rj in the power set element i.
Now for every PREMi that is logically consistent (no subset of premises
is unsatisfiable), it must be the case that the CF applied to the new rule
PREMi --> VALUE for P is given by the combining function applied over all rule
CFs in the power set element. If so, we can say that these original rules are
independent logically and so can contribute evidence incrementally, regard-
less of the pattern of succession or failure of the set.
This is a requirement for rule modularity. It can also be shown [working
from assumption 9 of the memo: P(e1 & e2) = P(e1)*P(e2)] that premises
must be independent "for ALL rules dealing with a clinical parameter re-
gardless of the value specified (e.g., all rules that conclude anything about
the identity of an organism). This assumption is generally avoided by Baye-
sians. I have not examined our rules closely with this assumption in mind,
but I suspect we may discover several examples of nonindependent PREM-
ISES" (Shortliffe, CEMEMO). This is a generalization of the above restric-
tion, which I believe is more intuitive.
It is worth reviewing at this time some of the related restrictions on rules
and CFs mentioned in Ted's thesis.
A. Given mutually exclusive hypotheses hi for an observation e, the sum
of their CFs, CF(hi,e), must not exceed 1. (From CEMEMO, page 7: "... often
find that this rule is broken.")
B. "We must insist that dependent pieces of evidence be grouped into
single rather than multiple rules."
C. "The rule acquisition procedure requires a screening process to see if
the new rule improperly interacts with other rules in the knowledge base."

Some of the consistency checks Ted discusses are subsumption and rule con-
tradictions.
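The independence constraint in this section can be exercised numerically. The sketch below is our own illustration (all probabilities invented): it computes P(e1|h), P(e2|h), and P(e1 & e2|h) from a toy conditional distribution and tests the product rule.

```python
# Toy conditional distribution P((e1, e2) | h) over the four joint
# outcomes; the numbers are invented for illustration only.
p_given_h = {(True, True): 0.4, (True, False): 0.1,
             (False, True): 0.1, (False, False): 0.4}

def marginal(dist, index):
    """P(ei|h): sum the joint probabilities of outcomes where ei is true."""
    return sum(p for outcome, p in dist.items() if outcome[index])

p_e1 = marginal(p_given_h, 0)       # P(e1|h) = 0.5
p_e2 = marginal(p_given_h, 1)       # P(e2|h) = 0.5
p_both = p_given_h[(True, True)]    # P(e1 & e2|h) = 0.4

# The modularity requirement demands P(e1 & e2|h) = P(e1|h)*P(e2|h);
# here 0.4 != 0.25, so these two pieces of evidence are NOT
# conditionally independent given h.
independent = abs(p_both - p_e1 * p_e2) < 1e-9
```

The same check run on a product distribution (all four outcomes equally likely, say) would come out independent; rule sets whose premises fail this test are exactly the ones the memo argues cannot be treated as modular.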

III. Understanding Modularity


I did not fully appreciate these problems, even after several readings
over the past year, until I worked out an example containing nonindependent
rules.
Example: Consider the following rules having CFs that I believe to be
valid. The rules would be used in a consultation system for deciding whether
or not to carry an umbrella.
Rule A: If the weatherman said that there is a 20% chance of rain today,
then I expect it to rain today (0.1).
Rule B: If it is summer, then I expect it to rain today (-0.9).
Rule C: If there are many clouds, then I expect it to rain today (0.1).
Now let these rules succeed in various combinations:

Power set element    Computed CF    Preferred CF    Evaluation

A & B & C               -0.71           0.5          wrong
A & B                   -0.8            0.21         wrong
A & C                    0.19           ?            okay
B & C                   -0.8            ?            wrong

These rules are not modular--the combined CF does not correspond to
what I believe when I form the combination of the premises in my mind.
Specifically, I give far more weight to clouds and the weatherman's prediction of
rain in the summer (when I expect neither) than in the winter (when clouds
and 20% chance are common).
Using Webster's definition of belief, "the degree of mental acceptance of
an idea or conclusion," I think that it would be fair to say that I DO NOT
believe the conclusion of "rain today," given premises A, C, A & C, or B &
C. As far as MYCIN's operation is concerned, this corresponds to a CF<0.2.
The CF combining function has not worked above because my rules are not
independent. (It is also possible for independent rules to combine improperly
because the combining function is wrong--more on this later.)
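The "Computed CF" column above follows the evidence-combining scheme described in Ted's thesis: positive rule CFs accumulate into a measure of belief MB, negative ones into a measure of disbelief MD, and the net certainty factor is CF = MB - MD. A minimal sketch (function names are ours, not MYCIN's):

```python
def accumulate(m1, m2):
    # Incremental accumulation of belief (or, symmetrically, disbelief):
    # MB[h, e1 & e2] = MB1 + MB2*(1 - MB1)
    return m1 + m2 * (1 - m1)

def combined_cf(rule_cfs):
    """Combine the CFs of several succeeding rules for one hypothesis:
    positive CFs feed the measure of belief MB, negative CFs feed the
    measure of disbelief MD, and the net result is CF = MB - MD."""
    mb = md = 0.0
    for cf in rule_cfs:
        if cf >= 0:
            mb = accumulate(mb, cf)
        else:
            md = accumulate(md, -cf)
    return mb - md

# Reproducing the table, with A = 0.1, B = -0.9, C = 0.1:
combined_cf([0.1, -0.9, 0.1])   # A & B & C -> -0.71
combined_cf([0.1, -0.9])        # A & B     -> -0.8
combined_cf([0.1, 0.1])         # A & C     ->  0.19
```

Note that two weakly positive rules combine to +0.19, still below the 0.2 threshold, which is why the A & C row is judged "okay."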
Looking again at the rules I wrote above, I feel that Rule A in particular
is a bad rule. It takes a mere fragment of an argument and tries to draw a
conclusion. Now admittedly we know something, given that 20% was pre-
dicted, but we are being logically naive to think that this fact alone is worth
isolating. It depends radically on other information for its usefulness. More-
over, the context in which it is true will radically determine the conclusion
we draw from it. We saw above that in summer I am far more inclined to
give it weight than in winter. The only thing I AM willing to say given just
this clause is that it probably won't be fair (0.3). (Like Wittgenstein, I ask
myself, "What do I know now?")

IV. Implications for 0.2 Rules


I see now that the problem I was anticipating in my earlier message will
hold if the rules are not modular. My fear was that a rule having a CF<0.2
was more likely to have a premise that was incomplete than was a rule of
CF>0.2. I understand now that a 0.2 rule, like any other rule, is acceptable
if there is no known argument that involves its premise with that of another

rule, other than one that simply adds the evidence together incrementally
according to the combining function. A new argument that is built from the
evidence mentioned in the other rules is proof that the individual rules are
not modular. (Subsumption is an explicit form of this.) Thus, Victor's claim
that he wants to allow for all combinations MUST rest on the inherent in-
dependence of his premise sets. Again, no conclusion whatsoever should be
drawn from the coincidence of any combination of premise sets, other than
that arrived at by the CF combining function. Moreover, every conclusion
collected incrementally by the combining function must be one Victor would
reach with the same strength, given that union of premise clauses (cf. B &
C above). In fact, I am willing to believe now that a rule having a CF<0.2 is
perhaps MORE likely to be independent because it wouldn't have been given
such a small CF unless the author saw it as minimally useful. That is, it stands
on its own as a very weak observation having no other inferential value (I
am still wary of calling it "evidence"). If it had a higher CF, it would almost
certainly be useful in combination with other observations. Based on Victor's
decision to separate meningitis clinical and lab rules, I conclude that doctors
do not have the ability to relate the two. Is this correct? I believe that Ted
has also questioned Victor's rules in this respect.

V. Plausibility
The problem of plausible combination of rules is difficult to anticipate
because it is precisely the unanticipated coincidence of rule success that we
are most likely to find objectionable. Suppose that we do find two rules D
and E that we can't imagine ever succeeding at the same time, yet there is no
logical reason for this not to occur (i.e., the rules are not mutually exclusive;
not always easy to determine since all rules that cause these rules to be in-
voked must be examined). In this case we should try to define a new param-
eter that explains the connection between these two parameters, which we
do not as yet understand. (A method of theory formation: ask yourself, "What
would I think if these two pieces of evidence were true?" Perhaps the actions
are in conflict--why? Perhaps the premises never appear together (usually
aren't both true)--why not? Do this for the power set of all evidence under
consideration.)

VI. What Does This Say About MYCIN's Rule Set?


(1) The rules must be disjoint (mutually exclusive) within an UPDATED-BY
subset, or (2) the parameters in the premises of rules that succeed together
must be logically noninteracting. This means that there must be nothing
significant about their coincidence. Their contribution separately must be the
same as an inference that considers them together. [In pseudochemical terms,
the rule CF is a measure of (logical) force, which binds together the clauses
of the premise in a single rule.]
Taking my example, I should rewrite the rules and form a new set in-
cluding A & B & C and A & B. Rules A and C are incomplete. They say
nothing here because they say something when a context is added. Leaving
them separate led to a nonsensical result (B & C), which CF theory claims
should make sense. This is an example of where plausibility of rule interac-
tion must be made at rule acquisition time. Indeed, I believe now that unless
we require our rules to be disjoint within an UPDATED-BY set, it will be very
difficult to say whether or not a rule is modular. For too long I have assumed

that because a rule looks like a discrete object it is necessarily modular. I have
assumed that it is sufficient to have a CF combining function that models
adequately the process of incrementally collecting evidence, forgetting that
this evidence MUST be discrete for the function to be valid. Otherwise, a
FUNCTION is replacing a logical argument, which a rule unifying the prem-
ises would represent.
VII. Making Rules Modular
It remains to detect if MYCIN's rules are modular. We must look for
premises that are still "charged" with inference potential, as measured relative
to clauses in other rules. Victor has said that his rules are modular (at least
the ones having CF<0.2). If so, there is no problem, though we should be
wary about the 0.05/0.15 distinctions. (How is it that "evidence" that is too
weak to yield an acceptable conclusion nevertheless is definite enough to be
put in one of three CF categories: 0.05, 0.10 and 0.15?)
One method for detecting rule modularity is as follows. Given, for ex-
ample, three rules A, B, and C, where B and C have the same CF (all three
mention VALUE for P), then if A & B and A & C are determined to have different
certainty factors (where & denotes the process of combining the rules into a
single rule), then the rules A, B, and C aren't modular.
On the other hand, given two rules A and B known to be modular (our
knowledge of the domain cannot yield an argument that combines the prem-
ises), then A & B must have a CF given by the combining function (obviously
true for disjoint rules). This gives us a way for evaluating a combining func-
tion.
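The evaluation method just described can be phrased as a simple test: for two rules known to be modular, the combining function's prediction for A & B should match the CF an expert assigns directly to the merged rule. A hypothetical sketch (the tolerance and all the numbers are invented):

```python
def combine_positive(cf1, cf2):
    # Candidate combining function for two positive rule CFs:
    # CF(A & B) = x + y*(1 - x)
    return cf1 + cf2 * (1 - cf1)

def combining_function_ok(cf_a, cf_b, judged_cf_ab, tol=0.05):
    """Compare the combining function's prediction for A & B against the
    CF an expert assigns directly to the merged rule; a mismatch on rules
    known to be modular indicts the combining function itself."""
    return abs(combine_positive(cf_a, cf_b) - judged_cf_ab) <= tol

# Two modular rules with CF 0.3 each; suppose an expert judges the
# merged rule at 0.5, close to the predicted 0.3 + 0.3*0.7 = 0.51.
combining_function_ok(0.3, 0.3, 0.5)    # agrees within tolerance
```

A judged CF far from the prediction (say 0.9) would fail the test, signaling either nonmodular rules or an inadequate combining function.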
11
A Model of Inexact
Reasoning in Medicine

Edward H. Shortliffe and Bruce G. Buchanan

Inexact reasoning is common in the sciences. It is characterized by such
phrases as "the art of good guessing," the "softer aspects of physics" (or
chemistry, or any other science), and "good scientific judgment." By defi-
nition, inexact reasoning defies analysis as applications of sets of inference
rules that are expressed in the predicate logic. Yet it need not defy all
analysis. In this chapter we examine a model of inexact reasoning applied
to a subdomain of medicine. Helmer and Rescher (1960) assert that the
traditional concept of "exact" versus "inexact" science, with the social sci-
ences accounting for the second class, has relied on a false distinction
usually reflecting the presence or absence of mathematical notation. They
point out that only a small portion of natural science can be termed exact--
areas such as pure mathematics and subfields of physics in which some of
the exactness "has even been put to the ultimate test of formal axiomati-
zation." In several areas of applied natural science, on the other hand,
decisions, predictions, and explanations are made only after exact proce-
dures are mingled with unformalized expertise. The general awareness
regarding these observations is reflected in the common references to the
"artistic" components in the "science of medicine."
During the years since computers were first introduced into the med-
ical arena, researchers have sought to develop techniques for modeling
clinical decision making. Such efforts have had a dual motivation. Not only
has their potential clinical significance been apparent, but the design of
such programs has required an analytical approach to medical reasoning,
which has in turn led to distillation of decision criteria that in some cases

This chapter is a shortened and edited version of a paper appearing in Mathematical Biosciences
23:351-379 (1975). Copyright 1975 by Mathematical Biosciences. All rights reserved. Used
with permission.

233
234 A Model of Inexact Reasoning in Medicine

had never been explicitly stated. It is both fascinating and educational for
experts to reflect on the inference rules that they use when providing
clinical consultations.
Several programs have successfully modeled the diagnostic process.
Many of these have relied on statistical decision theory as reflected in the
use of Bayes' Theorem for manipulation of conditional probabilities. Use
of the theorem, however, requires either large amounts of valid back-
ground data or numerous approximations and assumptions. The success
of Gorry and Barnett's early work (Gorry and Barnett, 1968) and of a
similar study by Warner and coworkers using the same data (Warner et al.,
1964) depended to a large extent on the availability of good data regarding
several hundred individuals with congenital heart disease.
Although conditional probability provides useful results in areas of
medical decision making such as those we have mentioned, vast portions
of medical experience suffer from having so few data and so much im-
perfect knowledge that a rigorous probabilistic analysis, the ideal standard
by which to judge the rationality of a physician's decisions, is not possible.
It is nevertheless instructive to examine models for the less formal aspects
of decision making. Physicians seem to use an ill-defined mechanism for
reaching decisions despite a lack of formal knowledge regarding the in-
terrelationships of all the variables that they are considering. This mech-
anism is often adequate, in well-trained or experienced individuals, to lead
to sound conclusions on the basis of a limited set of observations.1
The purpose of this chapter is to examine the nature of such non-
probabilistic and unformalized reasoning processes and to propose a model
by means of which such incomplete "artistic" knowledge might be quan-
tified. We have developed this model in response to the needs of a com-
puter program that will permit the opinions of experts to become more
generally available to nonexperts. The model is, in effect, an approxima-
tion to conditional probability. Although conceived with medical decision
making in mind, it is potentially applicable to any problem area in which
real-world knowledge must be combined with expertise before an informed
opinion can be obtained to explain observations or to suggest a course of
action.
We begin with a brief discussion of Bayes' Theorem as it has been
utilized by other workers in this field. The theorem will serve as a focus
for discussion of the clinical problems that we would like to solve by using
computer models. The potential applicability of the proposed decision
model is then introduced in the context of the MYCIN system. Once the
problem has been defined in this fashion, the criteria and numerical char-
acteristics of a quantification scheme will be proposed. We conclude with
a discussion of how the model is used by MYCIN when it offers opinions
to physicians regarding antimicrobial therapy selection.

1Intuition may also lead to unsound conclusions, as noted by Schwartz et al. (1973).
Formulation of the Problem 235

11.1 Formulation of the Problem

The medical diagnostic problem can be viewed as the assignment of prob-
abilities to specific diagnoses after analyzing all relevant data. If the sum
of the relevant data (or evidence) is represented by e, and di is the ith
diagnosis (or "disease") under consideration, then P(di|e) is the conditional
probability that the patient has disease i in light of the evidence e. Diag-
nostic programs have traditionally sought to find a set of evidence that
allows P(di|e) to exceed some threshold, say 0.95, for one of the possible
diagnoses. Under these circumstances the second-ranked diagnosis is suf-
ficiently less likely (<0.05) that the user is content to accept disease i as the
diagnosis requiring therapeutic attention.2
Bayes' Theorem is useful in these applications because it allows P(di|e)
to be calculated from the component conditional probabilities:

                P(di) P(e|di)
P(di|e) = ------------------------
            Σj P(dj) P(e|dj)

In this representation of the theorem, di is one of n disjoint diagnoses,
P(di) is simply the a priori probability that the patient has disease i before
any evidence has been gathered, and P(e|di) is the probability that a patient
will have the complex of symptoms and signs represented by e, given that
he or she has disease di.
Wehave so far ignored the complex problem of identifying the "rel-
evant" data that should be gathered in order to diagnose the patients
disease. Evidence is actually acquired piece by piece, the necessary addi-
tional data being identified on the basis of the likely diagnoses at any given
time. Diagnostic programs that mimic the process of analyzing evidence
incrementally often use a modified version of Bayes' Theorem that is ap-
propriate for sequential diagnosis (Gorry and Barnett, 1968):

Let e1 be the set of all observations to date, and s1 be some new
piece of data. Furthermore, let e be the new set of observations
once s1 has been added to e1. Then:

                P(s1|di & e1) P(di|e1)
P(di|e) = ---------------------------------
            Σj P(s1|dj & e1) P(dj|e1)
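Both forms of the theorem are easy to exercise numerically. The sketch below (all probabilities invented) folds two findings into the priors one at a time and confirms that, when s1 and s2 are conditionally independent given each diagnosis, sequential updating agrees with a single batch application of Bayes' Theorem:

```python
def bayes_step(current, likelihoods):
    """One application of Bayes' Theorem over disjoint diagnoses:
    multiply each current P(dj|e1) by P(s1|dj & e1), then renormalize."""
    joint = [p * l for p, l in zip(current, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Invented numbers for three disjoint diagnoses d1, d2, d3.
priors  = [0.6, 0.3, 0.1]   # P(di)
like_s1 = [0.2, 0.5, 0.9]   # P(s1|di)
like_s2 = [0.7, 0.4, 0.1]   # P(s2|di), assumed independent of s1 given di

# Sequential diagnosis: fold in s1, then s2.
sequential = bayes_step(bayes_step(priors, like_s1), like_s2)

# Batch: P(e|di) = P(s1|di)*P(s2|di) under conditional independence.
batch = bayes_step(priors, [a * b for a, b in zip(like_s1, like_s2)])

# The two posteriors agree (up to rounding error), which is exactly why
# the independence assumptions stressed in the text matter: without
# P(s2|di & s1) = P(s2|di), the shortcut would be wrong.
```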

The successful programs that use Bayes' Theorem in this form require
huge amounts of statistical data, not only P(sk|dj) for each of the pieces of

2Several programs have also included utility considerations in their analyses. For example,
an unlikely but lethal disease that responds well to treatment may merit therapeutic attention
because P(di|e) is nonzero (although very small).

data, sk, in e, but also the interrelationships of the sk within each disease
dj.3 The congenital heart disease programs (Gorry and Barnett, 1968; War-
ner et al., 1964) were able to acquire all the necessary conditional proba-
bilities from a survey of several hundred patients with confirmed diagnoses
and thus had nonjudgmental data on which to base their Bayesian analyses.
Edwards (1972, pp. 139-140) has summarized the kinds of problems
that can arise when an attempt is made to gather the kinds of data needed
for rigorous analysis:

My friends who are expert about medical records tell me that to attempt
to dig out from even the most sophisticated hospital's records the frequency
of association between any particular symptom and any particular diagnosis
is next to impossible--and when I raise the question of complexes of symp-
toms, they stop speaking to me. For another thing, doctors keep telling me
that diseases change, that this year's flu is different from last year's flu, so
that symptom-disease records extending far back in time are of very limited
usefulness. Moreover, the observation of symptoms is well-supplied with er-
ror, and the diagnosis of diseases is even more so; both kinds of errors will
ordinarily be frozen permanently into symptom-disease statistics. Finally,
even if diseases didn't change, doctors would. The usefulness of disease cat-
egories is so much a function of available treatments that these categories
themselves change as treatments change--a fact hard to incorporate into
symptom-disease statistics.
All these arguments against symptom-disease statistics are perhaps some-
what overstated. Where such statistics can be obtained and believed, obviously
they should be used. But I argue that usually they cannot be obtained, and
even in those instances where they have been obtained, they may not deserve
belief.

An alternative to exhaustive data collection is to use the knowledge that
an expert has about the disease--partly based on experience and partly on
general principles--to reason about diagnoses. In the case of this judg-
mental knowledge acquired from experts, the conditional probabilities and
their complex interrelationships cannot be acquired in an exhaustive man-
ner. Opinions can be sought and attempts made to quantify them, but the
extent to which the resulting numbers can be manipulated as probabilities
is not clear. We shall explain this last point more fully as we proceed. First,
let us examine some of the reasons that it might be desirable to construct
a model that allows us to avoid the inherent problems of explicitly relating
the conditional probabilities to one another.
A conditional probability statement is, in effect, a statement of a de-
cision criterion or rule. For example, the expression P(di|sk) = x can be read
as a statement that there is a 100x% chance that a patient observed to have
symptom sk has disease di. Stated in rule form, it would be

3For example, although s1 and s2 are independent over all diseases, it may be true that s1 and
s2 are closely linked for patients with disease di. Thus relationships must be known within
each of the dj; overall relationships are not sufficient.
MYCIN's Rule-Based Approach 237

IF: The patient has sign or symptom sk
THEN: Conclude that he has disease di with probability x

We shall often refer to statements of conditional probability as decision
rules or decision criteria in the diagnostic context. The value of x for such
rules may not be obvious (e.g., "y strongly suggests that z is true" is difficult
to quantify), but an expert may be able to offer an estimate of this number
based on clinical experience and general knowledge, even when such num-
bers are not readily available otherwise.
A large set of such rules obtained from textbooks and experts would
clearly contain a large amount of medical knowledge. It is conceivable that
a computer program could be designed to consider all such general rules
and to generate a final probability of each di based on data regarding a
specific patient. Bayes' Theorem would only be appropriate for such a
program, however, if values for P(s1|di) and P(s1|di & s2) could be obtained.
As has been noted, these requirements become unworkable, even if the
subjective probabilities of experts are used, in cases where a large number
of diagnoses (hypotheses) must be considered. The first requires acquiring
the inverse of every rule, and the second requires obtaining explicit state-
ments regarding the interrelationships of all rules in the system.
In short, we would like to devise an approximate method that allows
us to compute a value for P(di|e) solely in terms of P(di|sk), where e is the
composite of all the observed sk. Such a technique will not be exact, but
since the conditional probabilities reflect judgmental (and thus highly sub-
jective) knowledge, a rigorous application of Bayes' Theorem will not nec-
essarily produce accurate cumulative probabilities either. Instead, we look
for ways to handle decision rules as discrete packets of knowledge and for
a quantification scheme that permits accumulation of evidence in a manner
that adequately reflects the reasoning process of an expert using the same
or similar rules.

11.2 MYCIN's Rule-Based Approach

As has been discussed, MYCIN's principal task is to determine the likely
identity of pathogens in patients with infections and to assist in the selec-
tion of a therapeutic regimen appropriate for treating the organisms under
consideration. We have explained how MYCIN models the consultation
process, utilizing judgmental knowledge acquired from experts in con-
junction with certain statistical data that are available from the clinical
microbiology laboratory and from patient records.
It is useful to consider the advantages provided by a rule-based system
for computer use of judgmental knowledge. It should be emphasized that
we see these advantages as being sufficiently strong in certain environments
that we have devised an alternative and approximate approach that par-
allels the results available using Bayes' Theorem. We do not argue against
the use of Bayes' Theorem in those medical environments in which suffi-
cient data are available to permit its adequate use.
cient data are available to permit its adequate use.
The advantages of rule-based systems for diagnostic consultations in-
clude:

1. the use of general knowledge (from textbooks or experts) for consid-
eration of a specific patient (even well-indexed books may be difficult
for a nonexpert to use when considering a patient whose problem is
not quite the same as those of patients discussed in the text);
2. the use of judgmental knowledge for consideration of very small classes
of patients with rare diseases about which good statistical data are not
available;
3. ease of modification (since the rules are not explicitly related to one
another and there need be no prestructured decision tree for such a
system, rule modifications and the addition of new rules need not re-
quire complex considerations regarding interactions with the remainder
of the systems knowledge);
4. facilitated search for potential inconsistencies and contradictions in the
knowledge base (criteria stored explicitly in packets such as rules can
be searched and compared without major difficulty);
5. straightforward mechanisms for explaining decisions to a user by iden-
tifying and communicating the relevant rules;
6. an augmented instructional capability (a system user may be educated
regarding system knowledge in a selective fashion; i.e., only those por-
tions of the decision process that are puzzling need be examined).

We shall use the following rule for illustrative purposes throughout this
chapter:
IF: 1) The stain of the organism is gram positive, and
    2) The morphology of the organism is coccus, and
    3) The growth conformation of the organism is chains
THEN: There is suggestive evidence (.7) that the identity
      of the organism is streptococcus

This rule reflects our collaborating expert's belief that gram-positive cocci
growing in chains are apt to be streptococci. When asked to weight his
belief in this conclusion,4 he indicated a 70% belief that the conclusion was
valid. Translated to the notation of conditional probability, this rule seems

4In the English-language version of the rules, the program uses phrases such as "suggestive
evidence," as in the above example. However, the numbers following these terms, indicating
degrees of certainty, are all that is used in the model. The English phrases are not given by
the expert and then quantified; they are, in effect, "canned phrases" used only for translating
rules into English representations. The prompt used for acquiring the certainty measure
from the expert is as follows: "On a scale of 1 to 10, how much certainty do you affix to this
conclusion?"
to say P(h₁|s₁ & s₂ & s₃) = 0.7, where h₁ is the hypothesis that the organism
is a Streptococcus, s₁ is the observation that the organism is gram-positive,
s₂ that it is a coccus, and s₃ that it grows in chains. Questioning of the
expert gradually reveals, however, that despite the apparent similarity to
a statement regarding a conditional probability, the number 0.7 differs
significantly from a probability. The expert may well agree that
P(h₁|s₁ & s₂ & s₃) = 0.7, but he becomes uneasy when he attempts to follow
the logical conclusion that therefore P(¬h₁|s₁ & s₂ & s₃) = 0.3. He claims
that the three observations are evidence (to degree 0.7) in favor of the
conclusion that the organism is a Streptococcus and should not be construed
as evidence (to degree 0.3) against Streptococcus. We shall refer to this problem
as Paradox 1 and return to it later in the exposition, after the
interpretation of the 0.7 in the rule above has been introduced.
It is tempting to conclude that the expert is irrational if he is unwilling
to follow the implications of his probabilistic statements to their logical
conclusions. Another interpretation, however, is that the numbers he has
given should not be construed as probabilities at all, that they are
judgmental measures that reflect a level of belief. The nature of such numbers
and the very existence of such concepts have interested philosophers of
science for the last half-century. We shall therefore digress temporarily to
examine some of these theoretical issues. We then proceed to a detailed
presentation of the quantitative model we propose. In the last section of
this chapter, we shall show how the model has been implemented for
ongoing use by the MYCIN program.

11.3 Philosophical Background

The familiar P-function⁵ of traditional probability theory is a straightforward
concept from elementary statistics. However, because of imperfect
knowledge and the dependence of decisions on individual judgments, the
P-function no longer seems entirely appropriate for modeling some of the
decision processes in medical diagnosis. This problem with the P-function
has been well recognized and has generated several philosophical treatises

⁵The P-function may be defined in a variety of ways. Emanuel Parzen (1960) suggests a set-theoretical
definition: Given a random situation, which is described by a sample description
space s, probability is a function P that to every event e assigns a nonnegative real number,
denoted by P(e) and called the probability of the event e. The probability function must satisfy
three axioms:

Axiom 1: P(e) ≥ 0 for every event e;

Axiom 2: P(s) = 1 for the certain event s;

Axiom 3: P(e ∪ f) = P(e) + P(f) if ef = 0, or, in words, the probability of the union of
two mutually exclusive events is the sum of their probabilities.
240 A Model of Inexact Reasoning in Medicine

during the last 30 years. One difficulty with these analyses is that they are,
in general, more theoretical than practical in orientation. They have
characterized the problem well but have offered few quantitative or theoretical
techniques that lend themselves to computer simulation of related reasoning
processes. It is useful to examine these writings, however, in order to
avoid recognized pitfalls.
This section therefore summarizes some of the theory that should be
considered when analyzing the decision problem that we have described.
We discuss several interpretations of probability itself, the theory on which
Bayes' Theorem relies. The difficulties met when trying to use the P-function
during the modeling of medical decision making are reiterated. Then
we discuss the theory of confirmation, an approach to the interpretation
of evidence. Our discussion argues that confirmation provides a natural
environment in which to model certain aspects of medical reasoning. We
then briefly summarize some other approaches to the problem, each of
which has arisen in response to the inadequacies of applied probability.
Although each of these alternate approaches is potentially useful in the
problem area that concerns us, we have chosen to develop a quantification
scheme based on the concept of confirmation.

11.3.1 Probability

Swinburne (1973) provides a useful classification of the theories of
probability proposed over the last 200 years. The first of these, the Classical
Theory of Probability, asserts that if the probability of an event is said to
be P, then "there are integers m and n such that P = m/n ... such that n
exclusive and exhaustive alternatives must occur, m of which constitute the
occurrence of s." This theory, like the second and third to be described, is
called "statistical probability" by Swinburne. These interpretations are
typified by statements of the form "the probability of an A being a B is P."
The second probability theory cited by Swinburne, the Propensity Theory,
asserts that probability propositions "make claims" about a propensity
or "would-be" or tendency in things. If an atom is said to have a probability
of 0.9 of disintegrating within the next minute, a statement has been made
about its propensity to do so.
The Frequency Theory is based on the familiar claim that propositions
about probability are propositions about proportions or relative frequencies
as observed in the past. This interpretation provides the basis for the
statistical data collection used by most of the Bayesian diagnostic programs.
Harré (1970) observes that statistical probability seems to differ
syntactically from the sense of probability used in inference problems such as
medical diagnosis. He points out that the traditional concept of probability
refers to what is likely to turn out to be true (in the future), whereas the
other variety of probability examines what has already turned out to be
true but cannot be determined directly. Although these two kinds of problems
may be approached on the basis of identical observations, the occurrence
or nonoccurrence of future events is subject to the probabilistic analysis
of statistics, whereas the verification of a belief, hypothesis, or
conjecture concerning a truth in the present requires a "process" of analysis
commonly referred to as confirmation. This distinction on the basis of tense
may seem somewhat artificial at first, but it does serve a useful purpose as
we attempt to develop a framework for analysis of the diagnosis problem.
Swinburne also discusses two more theories of probability, each of
which bears more direct relation to the problem at hand. One is the
Subjective Theory originally put forward by Ramsey (1931) and developed in
particular by Savage (1974) and de Finetti (1972). In their view, statements
of probability regarding an event are propositions regarding people's actual
belief in the occurrence (present or future) of the event in question.
Although this approach fails as an explanation of statistical probability
(where beliefs that may be irrational have no bearing on the calculated
probability of, say, a six being rolled on the next toss of a die), it is alluring
for our purposes because it attempts to recognize the dependence of
decisions, in certain problem areas, on both the weight of evidence and its
interpretation as based on the expertise (beliefs) of the individual making
the decision. In fact, de Finetti (1972, p. 4) has stated part of our problem
explicitly:

On many occasions decision-makers make use of expert opinion. Such
opinions cannot possibly take the form of advice bearing directly on the
decision; .... Occasionally, [the expert] is required to state a probability, but
it is not easy to find a convenient form in which he can express it.

Furthermore, the goals of the subjective probabilists seem very similar to
those which we have also delineated (de Finetti, 1972, p. 144):

We hold it to be chimerical for anyone to arrive at beliefs, opinions, or
determinations without the intervention of his personal judgment. We strive
to make such judgments as dispassionate, reflective, and wise as possible by
a doctrine which shows where and how they intervene and lays bare possible
inconsistencies among judgments.

One way to acquire the subjective probabilities of experts is suggested
by Savage and described by a geological analyst as follows (Grayson, 1960,
p. 256):

The simplest [way] is to ask the geologist .... The geologist looks at the
evidence, thinks, and then gives a figure such as 1 in 5 or 50-50. Admittedly
this is difficult .... Thus, several ways have been proposed to help the
geologist make his probability estimate explicit .... The leading proponent of
personal [i.e., subjective] probabilities, Savage, proposes what seems to be the
most workable method. One can, namely, ask the person not how he feels
but what he would do in such and such a situation. Accordingly, a geologist
would be confronted with a choice-making situation.

There is one principal problem to be faced, however, in attempting to
adopt the subjectivist model for our computer program--namely, the
subjectivists' criticism of those who avoid a Bayesian approach. Subjectivists
assert that the conditional and initial probabilities needed for use of Bayes'
Theorem may simply be acquired by asking the opinion of an expert. We
must reject this approach when the number of decision criteria becomes
large, however, because it would require that experts be asked to quantify
an unmanageably large number of interrelationships.⁶
A final point to be made regarding subjectivist theory is that the
probabilities so obtained are meant to be utilized by the P-function of statistical
probability so that inconsistencies among the judgments offered by the
experts may be discovered. Despite apparently irrational beliefs that may
be revealed in this way ("irrational" here means that the subjective
probabilities are inconsistent with the axioms of the P-function), the expert
opinions provide useful criteria, which may lead to sound decisions if it is
accepted that the numbers offered are not necessarily probabilities in the
traditional sense. It is our assertion that a new quantitative system should
therefore be devised in order to utilize the experts' criteria effectively.
Let us return now to the fifth and final category in Swinburne's list of
probability theories (Swinburne, 1973). This is the Logical Theory, which
gained its classical exposition in J. M. Keynes' A Treatise on Probability
(1962). Since that time, its most notable proponent has been Rudolf Carnap.
In the Logical Theory, probability is said to be a logical relation
between statements of evidence and hypotheses. Carnap describes this
and the frequency interpretation of probability as follows (Carnap, 1950,
p. 19):

(i) Probability₁ is the degree of confirmation of a hypothesis h with
respect to an evidence statement e; e.g., an observational report. This is a
logical semantical concept. A sentence about this concept is based, not on
observation of facts, but on logical analysis ....
(ii) Probability₂ is the relative frequency (in the long run) of one property
of events or things with respect to another. A sentence about this concept is
factual, empirical.

In order to avoid confusion regarding which concept of probability is
being discussed, the term probability will hereafter be reserved for
probability₂, i.e., the P-function of statistical probability. Probability₁, or
epistemic probability as Swinburne (1973) describes it, will be called degree
of confirmation in keeping with Carnap's terminology.

⁶It would also complicate the addition of new decision criteria since they would no longer be
modular and would thus require itemization of all possible interactions with preexisting
criteria.
11.3.2 Confirmation

Carnap's interpretation of confirmation rests upon strict logical entailment.
Several authors, however, have viewed the subject in a broader context,
such as our application requires. For example, just as the observation of a
black raven would logically "confirm" the hypothesis that "all ravens are
black" (where "confirm" means "lends credence to"), we also want the fact
that an organism is gram-positive to "confirm" the hypothesis that it is a
Streptococcus, even though the conclusion is based on world knowledge and
not on logical analysis.
Carnap (1950) makes a useful distinction among three forms of
confirmation, which we should consider when trying to characterize the needs
of our decision model. He calls these classificatory, comparative, and
quantitative uses of the concept of confirmation. These are easily understood
by example:

a. classificatory: "the evidence e confirms the hypothesis h"
b. comparative: "e₁ confirms h more strongly than e₂ confirms h" or "e
   confirms h₁ more strongly than e confirms h₂"
c. quantitative: "e confirms h with strength x"

In MYCIN's task domain, we need to use a semiquantitative approach
in order to reach a comparative goal. Thus, although our individual
decision criteria might be quantitative (e.g., "gram-positive suggests
Streptococcus with strength 0.1"), the effort is merely aimed at singling out two or
three identities of organisms that are approximately equally likely and that
are "comparatively" much more likely than any others. There is no need
to quote a number that reflects the consulting expert's degree of certainty
regarding his or her decisions.
When quantitative uses of confirmation are discussed, the degree of
confirmation of hypothesis h on the basis of evidence e is written as C[h,e].
This form roughly parallels the familiar P-function notation for conditional
probability, P(h|e). Carnap has addressed the question of whether it
is reasonable to quantify degree of confirmation (Carnap, 1950). He notes
that, although the concept is familiar to us all, we attempt to use it for
comparisons of relative likelihood rather than in a strict numerical sense.
In his classic work on the subject, however, he suggested that we all know
how to use confirmation as a quantitative concept in contexts such as
"predictions of results of games of chance [where] we can determine which
numerical value [others] implicitly attribute to probability₁, even if they do
not state it explicitly, by observing their reactions to betting proposals."
The reason for our reliance on the opinions of experts is reflected in his
observation that individuals with experience are inclined to offer theoretical
arguments to defend their viewpoint regarding a hypothesis; "this
shows that they regard probability₁ as an objective concept." However, he
was willing to admit the subjective nature of such concepts some years later
when, in discussing the nature of inductive reasoning, he wrote (Carnap,
1962, p. 317):

I would think that inductive reasoning should lead, not to acceptance or
rejection [of a proposition], but to the assignment of a number to the
proposition, viz., its value (credibility value) .... This rational subjective
probability ... is sufficient for determining first the rational subjective value of any
act, and then a rational decision.

As mentioned above, quantifying confirmation and then manipulating
the numbers as though they were probabilities quickly leads to apparent
inconsistencies or paradoxes. Carl Hempel presented an early analysis of
confirmation (Hempel, 1965), pointing out as we have that C[h,e] is a very
different concept from P(h|e). His famous Paradox of the Ravens was
presented early in his discussion of the logic of confirmation. Let h₁ be the
statement that "all ravens are black" and h₂ the statement that "all nonblack
things are nonravens." Clearly h₁ is logically equivalent to h₂. If one were
to draw an analogy with conditional probability, it might at first seem valid,
therefore, to assert that C[h₁,e] = C[h₂,e] for all e. However, it appears
counterintuitive to state that the observation of a green vase supports h₁, even
though the observation does seem to support h₂. C[h,e] is therefore different
from P(h|e), for it seems somehow wrong that an observation of a vase
could logically support an assertion about ravens.
Another characteristic of a quantitative approach to confirmation that
distinguishes the concept from probability was well recognized by Carnap
(1950) and discussed by Barker (1957) and Harré (1970). They note that
it is counterintuitive to suggest that the confirmation of the negation of a
hypothesis is equal to one minus the confirmation of the hypothesis, i.e.,
C[h,e] is not 1 - C[¬h,e]. The streptococcal decision rule asserted that a
gram-positive coccus growing in chains is a Streptococcus with a measure of
support specified as 7 out of 10. This translates to C[h,e] = 0.7 where h is
"the organism is a Streptococcus" and e is the information that "the organism
is a gram-positive coccus growing in chains." As discussed above, an expert
does not necessarily believe that C[¬h,e] = 0.3. The evidence is said to be
supportive of the contention that the organism is a Streptococcus and can
therefore hardly also support the contention that the organism is not a
Streptococcus.
Since we believe that C[h,e] does not equal 1 - C[¬h,e], we recognize
that disconfirmation is somehow separate from confirmation and must be
dealt with differently. As Harré (1970) puts it, "we need an independently
introduced D-function, for disconfirmation, because, as we have already
noticed, to confirm something to ever so slight a degree is not to disconfirm
it at all, since the favorable evidence for some hypothesis gives no support
whatever to the contrary supposition in many cases." Our decision model
must therefore reflect this distinction between confirmation and
disconfirmation (i.e., confirmatory and disconfirmatory evidence).
The logic of confirmation has several other curious properties that
have puzzled philosophers of science (Salmon, 1973). Salmon's earlier
analysis of the confirmation of scientific hypotheses (Salmon, 1966) led to the
conclusion that the structure of such procedures is best expressed by Bayes'
Theorem and a frequency interpretation of probability. Such an assertion
is appealing because, as Salmon expresses the point, "it is through this
interpretation, I believe, that we can keep our natural sciences empirical
and objective." However, our model is not offered as a solution to the
theoretical issues with which Salmon is centrally concerned. We have had
to abandon Bayes' Theorem and the P-function simply because there are
large areas of expert knowledge and intuition that, although amenable in
theory to the frequency analysis of statistical probability, defy rigorous
analysis because of insufficient data and, in a practical sense, because
experts resist expressing their reasoning processes in coherent probabilistic
terms.

11.3.3 Other Approaches

There are additional approaches to this problem area that bear mentioning,
even though they are peripheral to confirmation and probability as
we have described them. One is the theory of fuzzy sets first proposed by
Zadeh (1965) and further developed by Goguen (1968). The theory
attempts to analyze and explain an ancient paradox paraphrased by Goguen
as follows:

If you add one stone to a small heap, it remains small. A heap containing
one stone is small. Therefore (by induction) every heap is small.

The term fuzzy set refers to the analogy with set theory whereby, for
example, the set of tall people contains all 7-foot individuals but may or
may not contain a man who is 5 feet 10 inches tall. The "tallness" of a man
in that height range is subject to interpretation; i.e., the edge of the set is
fuzzy. Thus, membership in a set is not binary-valued (true or false) but
expressed along a continuum from 0 to 1, where 0 means "not in the set,"
1 means "in the set," and 0.5 means "equally likely to be in or out of the
set." These numbers hint of statistical probability in much the same way
that degrees of confirmation do. However, like confirmation, the theory of
fuzzy sets leads to results that defy numerical manipulation in accordance
with the axioms of the P-function. Although an analogy between our
diagnostic problem and fuzzy set theory can be made, the statement of
diagnostic decision criteria in terms of set membership does not appear to
be a natural concept for the experts who must formulate our rules.
Furthermore, the quantification of Zadeh's "linguistic variables" and the
mechanisms for combining them are as yet poorly defined. Fuzzy sets have
therefore been mentioned here primarily as an example of another
semistatistical field in which classic probability theory fails.
There is also a large body of literature discussing the theory of choice,
an approach to decision making that has been reviewed by Luce and
Suppes (1965). The theory deals with the way in which personal preferences
and the possible outcomes of an action are considered by an individual
who must select among several alternatives. Tversky describes an
approach based on "elimination by aspects" (Tversky, 1972), a method in
which alternatives are ruled out on the basis of either their undesirable
characteristics (aspects) or the desirable characteristics they lack. The theory
thus combines preference (utility) with a probabilistic approach. Shackle
suggests a similar approach (Shackle, 1952; 1955), but utilizes different
terminology and focuses on the field of economics. He describes "expectation"
as the act of "creating imaginary situations, of associating them with
named future dates, and of assigning to each of the hypotheses thus
formed a place on a scale measuring the degree of belief that a specified
course of action on our own part will make this hypothesis come true"
(Shackle, 1952). Selections among alternatives are made not only on the
basis of likely outcomes but also on the basis of uncertainty regarding
expected outcomes (hence his term the "logic of surprise").
Note that the theory of choice differs significantly from confirmation
theory in that the former considers selection among mutually exclusive
actions on the basis of their potential (future) outcomes and personal
preferences regarding those outcomes, whereas confirmation considers
selection among mutually exclusive hypotheses on the basis of evidence
observed and interpreted in the present. Confirmation does not involve
personal utilities, although, as we have noted, interpretation of evidence
may differ widely on the basis of personal experience and knowledge. Thus
we would argue that the theory of choice might be appropriately applied
to the selection of therapy once a diagnosis is known, a problem area in
which personal preferences regarding possible outcomes clearly play an
important role, but that the formation of the diagnosis itself more closely
parallels the kind of decision task that engendered the theory of
confirmation.
We return, then, to confirmation theory as the most useful way to think
about the medical decision-making problem that we have described.
Swinburne suggests several criteria for choosing among the various
confirmation theories that have been proposed (Swinburne, 1970), but his reasons
are based more on theoretical considerations than on the pragmatics of
our real-world application. We will therefore propose a technique that,
although it draws closely on the theory of confirmation described above,
is based on desiderata derived intuitively from the problem at hand and
not from a formal list of acceptability criteria.
11.4 The Proposed Model of Evidential Strength

This section introduces our quantification scheme for modeling inexact
medical reasoning. It begins by defining the notation that we use and
describing the terminology. A formal definition of the quantification
function is then presented. The remainder of the section discusses the
characteristics of the defined functions.
Although the proposed model has several similarities to a confirmation
function such as those mentioned above, we shall introduce new terms for
the measurement of evidential strength. This convention will allow us to
clarify from the outset that we seek only to devise a system that captures
enough of the flavor of confirmation theory that it can be used for
accomplishing our computer-based task. We have chosen belief and disbelief as our
units of measurement, but these terms should not be confused with their
formalisms from epistemology. The need for two measures was introduced
above in our discussion of a disconfirmation measure as an adjunct to a
measure for degree of confirmation. The notation will be as follows:

MB[h,e] = x means "the measure of increased belief in the hypothesis
h, based on the evidence e, is x"

MD[h,e] = y means "the measure of increased disbelief in the hypothesis
h, based on the evidence e, is y"

The evidence e need not be an observed event, but may be a hypothesis
(itself subject to confirmation). Thus one may write MB[h₁,h₂] to indicate
the measure of increased belief in the hypothesis h₁ given that the hypothesis
h₂ is true. Similarly, MD[h₁,h₂] is the measure of increased disbelief in
hypothesis h₁ if hypothesis h₂ is true.
To illustrate in the context of the sample rule from MYCIN, consider
e = "the organism is a gram-positive coccus growing in chains" and h =
"the organism is a Streptococcus." Then MB[h,e] = 0.7 according to the
sample rule given us by the expert. The relationship of the number 0.7 to
probability will be explained as we proceed. For now, let us simply state
that the number 0.7 reflects the extent to which the expert's belief that h
is true is increased by the knowledge that e is true. On the other hand,
MD[h,e] = 0 for this example; i.e., the expert has no reason to increase
his or her disbelief in h on the basis of e.
In accordance with subjective probability theory, it may be argued that
the expert's personal probability P(h) reflects his or her belief in h at any
given time. Thus 1 - P(h) can be viewed as an estimate of the expert's
disbelief regarding the truth of h. If P(h|e) is greater than P(h), the
observation of e increases the expert's belief in h while decreasing his or her
disbelief regarding the truth of h. In fact, the proportionate decrease in
disbelief is given by the following ratio:

    [P(h|e) - P(h)] / [1 - P(h)]

This ratio is called the measure of increased belief in h resulting from the
observation of e, i.e., MB[h,e].
Suppose, on the other hand, that P(h|e) were less than P(h). Then the
observation of e would decrease the expert's belief in h while increasing his
or her disbelief regarding the truth of h. The proportionate decrease in
belief in this case is given by the following ratio:

    [P(h) - P(h|e)] / P(h)

We call this ratio the measure of increased disbelief in h resulting from the
observation of e, i.e., MD[h,e].
To summarize these results in words, we consider the measure of
increased belief, MB[h,e], to be the proportionate decrease in disbelief
regarding the hypothesis h that results from the observation e. Similarly,
the measure of increased disbelief, MD[h,e], is the proportionate decrease
in belief regarding the hypothesis h that results from the observation e,
where belief is estimated by P(h) at any given time and disbelief is estimated
by 1 - P(h). These definitions correspond closely to the intuitive concepts
of confirmation and disconfirmation that we have discussed above. Note
that since one piece of evidence cannot both favor and disfavor a single
hypothesis, when MB[h,e] > 0, MD[h,e] = 0, and when MD[h,e] > 0,
MB[h,e] = 0. Furthermore, when P(h|e) = P(h), the evidence is independent
of the hypothesis (neither confirms nor disconfirms) and MB[h,e] =
MD[h,e] = 0.
The above definitions may now be specified formally in terms of
conditional and a priori probabilities:

    MB[h,e] = 1                                                   if P(h) = 1
            = (max[P(h|e), P(h)] - P(h)) / (max[1,0] - P(h))      otherwise

    MD[h,e] = 1                                                   if P(h) = 0
            = (min[P(h|e), P(h)] - P(h)) / (min[1,0] - P(h))      otherwise
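The two formal definitions translate directly into code. The sketch below is our illustration, not MYCIN's implementation (the function and argument names are ours); note that max[1,0] - P(h) reduces to 1 - P(h) and min[1,0] - P(h) to -P(h):

```python
def mb(p_h, p_h_given_e):
    """MB[h,e]: proportionate decrease in disbelief, where disbelief
    is estimated by 1 - P(h). Arguments are P(h) and P(h|e)."""
    if p_h == 1:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)

def md(p_h, p_h_given_e):
    """MD[h,e]: proportionate decrease in belief, estimated by P(h)."""
    if p_h == 0:
        return 1.0
    return (p_h - min(p_h_given_e, p_h)) / p_h

# Streptococcal rule with an illustrative prior P(h) = 0.1 and P(h|e) = 0.7:
# confirming evidence drives only MB, and MD stays 0.
print(mb(0.1, 0.7), md(0.1, 0.7))
```

For disconfirming evidence (P(h|e) < P(h)) the roles reverse: MB is 0 and MD becomes positive, matching the one-sidedness noted in the text.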

Examination of these expressions will reveal that they are identical to the
definitions introduced above. The formal definition is introduced, however,
to demonstrate the symmetry between the two measures. In addition,
we define a third measure, termed a certainty factor (CF), that combines the
MB and MD in accordance with the following definition:
CF[h,e] = MB[h,e] - MD[h,e]

The certainty factor is an artifact for combining degrees of belief and
disbelief into a single number. Such a number is needed in order to
facilitate comparisons of the evidential strength of competing hypotheses. The
use of this composite number will be described below in greater detail. The
following observations help to clarify the characteristics of the three
measures that we have defined (MB, MD, CF):

Characteristics of the Belief Measures

1. Range of degrees:
   a. 0 ≤ MB[h,e] ≤ 1
   b. 0 ≤ MD[h,e] ≤ 1
   c. -1 ≤ CF[h,e] ≤ +1

2. Evidential strength and mutually exclusive hypotheses:
   If h is shown to be certain [P(h|e) = 1]:
   a. MB[h,e] = (1 - P(h)) / (1 - P(h)) = 1
   b. MD[h,e] = 0
   c. CF[h,e] = 1
   If the negation of h is shown to be certain [P(¬h|e) = 1]:
   a. MB[h,e] = 0
   b. MD[h,e] = (0 - P(h)) / (0 - P(h)) = 1
   c. CF[h,e] = -1

Note that this gives MB[¬h,e] = 1 if and only if MD[h,e] = 1, in accordance
with the definitions of MB and MD above. Furthermore, the number
1 represents absolute belief (or disbelief) for MB (or MD). Thus if
MB[h₁,e] = 1 and h₁ and h₂ are mutually exclusive, MD[h₂,e] = 1.⁷

⁷There is a special case of Characteristic 2 that should be mentioned. This is the case of
logical truth or falsity, where P(h|e) = 1 or P(h|e) = 0 regardless of e. Popper has also
suggested a quantification scheme for confirmation (Popper, 1959) in which he uses -1 ≤
C[h,e] ≤ +1, defining his limits as:
    -1 = C[¬h,h] ≤ C[h,e] ≤ C[h,h] = +1
This proposal led one observer (Harré, 1970) to assert that Popper's numbering scheme
"obliges one to identify the truth of a self-contradiction with the falsity of a disconfirmed
general hypothesis and the truth of a tautology with the confirmation of a confirmed
existential hypothesis, both of which are not only question begging but absurd." As we shall
demonstrate, we avoid Popper's problem by introducing mechanisms for approaching
certainty asymptotically as items of confirmatory evidence are discovered.
3. Lack of evidence:
   a. MB[h,e] = 0 if h is not confirmed by e (i.e., e and h are independent
      or e disconfirms h)
   b. MD[h,e] = 0 if h is not disconfirmed by e (i.e., e and h are independent
      or e confirms h)
   c. CF[h,e] = 0 if e neither confirms nor disconfirms h (i.e., e and h are
      independent)

We are now in a position to examine Paradox 1, the expert's concern
that although evidence may support a hypothesis with degree x, it does
not support the negation of the hypothesis with degree 1 - x. In terms of
our proposed model, this reduces to the assertion that, when e confirms h:

    CF[h,e] + CF[¬h,e] ≠ 1

This intuitive impression is verified by the following analysis for e
confirming h:

    CF[¬h,e] = MB[¬h,e] - MD[¬h,e]
             = 0 - (P(¬h|e) - P(¬h)) / (0 - P(¬h))
             = ([1 - P(h|e)] - [1 - P(h)]) / (1 - P(h))
             = (P(h) - P(h|e)) / (1 - P(h))

    CF[h,e] = MB[h,e] - MD[h,e]
            = (P(h|e) - P(h)) / (1 - P(h)) - 0

Thus

    CF[h,e] + CF[¬h,e] = (P(h|e) - P(h)) / (1 - P(h)) + (P(h) - P(h|e)) / (1 - P(h))
                       = 0

Clearly, this result occurs because (for any h and any e) MB[h,e] =
MD[¬h,e]. This conclusion is intuitively appealing since it states that evidence
that supports a hypothesis disfavors the negation of the hypothesis
to an equal extent.
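The cancellation is easy to check numerically. In the sketch below (ours; the prior 0.1 is purely illustrative, and MYCIN never actually computes these probabilities), P(¬h) = 1 - P(h) and P(¬h|e) = 1 - P(h|e), and the two certainty factors sum to zero:

```python
def cf(p_h, p_h_given_e):
    # CF[h,e] = MB[h,e] - MD[h,e], using the formal definitions above
    mb = 1.0 if p_h == 1 else (max(p_h_given_e, p_h) - p_h) / (1.0 - p_h)
    md = 1.0 if p_h == 0 else (p_h - min(p_h_given_e, p_h)) / p_h
    return mb - md

# Streptococcal rule with an illustrative prior: P(h) = 0.1, P(h|e) = 0.7.
p_h, p_h_e = 0.1, 0.7
total = cf(p_h, p_h_e) + cf(1 - p_h, 1 - p_h_e)  # CF[h,e] + CF[¬h,e]
assert abs(total) < 1e-9                         # the two factors cancel
```

The check succeeds for any prior and any conditional probability, since MB[h,e] and MD[¬h,e] are the same quantity term by term.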
We noted earlier that experts are often willing to state degrees of belief
in terms of conditional probabilities but they refuse to follow the assertions
to their logical conclusions (e.g., Paradox 1 above). It is perhaps revealing
to note, therefore, that when the a priori belief in a hypothesis is small (i.e.,
The Proposed Model of Evidential Strength 251

P(h) is close to zero), the CF of a hypothesis confirmed by evidence is


approximately equal to its conditional probability on that evidence:

P(hle) - P(h)
CF[h,e] = MB[h,e] - MD[h,e] - 0) ~ P(hle
1 - P(h)

whereas, as shownabove, CF[-qh, e] = -P(hle ) in this case. This observation


suggests that confirmation, to the extent that it is adequately represented
by CFs, is close to conditional probability (in certain cases), although it still
defies analysis as a probability measure.
We believe, then, that the proposed model is a plausible representation for the numbers an expert gives when asked to quantify the strength of his or her judgmental rules. The expert gives a positive number (CF > 0) if the hypothesis is confirmed by observed evidence, suggests a negative number (CF < 0) if the evidence lends credence to the negation of the hypothesis, and says there is no evidence at all (CF = 0) if the observation is independent of the hypothesis under consideration. The CF combines knowledge of both P(h) and P(h|e). Since the expert often has trouble stating P(h) and P(h|e) in quantitative terms, there is reason to believe that a CF that weights both the numbers into a single measure is actually a more natural intuitive concept (e.g., "I don't know what the probability is that all ravens are black, but I do know that every time you show me an additional black raven my belief is increased by x that all ravens are black.").
If we therefore accept CFs rather than probabilities from experts, it is natural to ask under what conditions the physician's behavior based on CFs is irrational.⁸ We know from probability theory, for example, that if there are n mutually exclusive hypotheses hi, at least one of which must be true, then Σi P(hi|e) = 1 for all e. In the case of certainty factors, we can also show that there are limits on the sums of CFs of mutually exclusive hypotheses. Judgmental rules acquired from experts must respect these limits or else the rules will reflect irrational quantitative assignments.
Sums of CFs of mutually exclusive hypotheses have two limits--a lower limit for disconfirmed hypotheses and an upper limit for confirmed hypotheses. The lower limit is the obvious value that results because CF[h,e] ≥ -1 and because more than one hypothesis may have CF = -1. Note first that a single piece of evidence may absolutely disconfirm several of the competing hypotheses. For example, if there are n colors in the universe and Ci is the ith color, then ARCi may be used as an informal notation to denote the hypothesis that all ravens have color Ci. If we add the hypothesis ARC0 that some ravens have different colors from others, we know Σi P(ARCi) = 1. Consider now the observation e that there is a raven of color Cn. This single observation allows us to conclude that CF[ARCi,e] = -1 for 1 ≤ i ≤ n - 1. Thus, since these n - 1 hypotheses are absolutely disconfirmed by the observation e, Σi CF[ARCi,e] = -(n - 1). This analysis leads to the general statement that, if k mutually exclusive hypotheses hi are disconfirmed by an observation e:

Σi CF[hi,e] ≥ -k    [for hi disconfirmed by e]

⁸We assert that behavior is irrational if actions taken or decisions made contradict the result that would be obtained under a probabilistic analysis of the behavior.

In the colored raven example, the observation of a raven with color Cn still left two hypotheses in contention, namely ARCn and ARC0. What, then, are CF[ARCn,e], CF[ARC0,e], and the sum of CF[ARCn,e] and CF[ARC0,e]? It can be shown that, if k mutually exclusive hypotheses hi are confirmed by an observation e, the sum of their CFs does not have an upper limit of k but rather:

Σi CF[hi,e] ≤ 1    [for hi confirmed by e]

In fact, Σi CF[hi,e] is equal to 1 if and only if k = 1 and e implies h1 with certainty, but the sum can get arbitrarily close to 1 for small k and large n. The analyses that lead to these conclusions are available elsewhere (Shortliffe, 1974).
The last result allows us to analyze critically new decision rules given by experts. Suppose, for example, we are given the following rules: CF[h1,e] = 0.7 and CF[h2,e] = 0.4, where h1 is "the organism is a Streptococcus," h2 is "the organism is a Staphylococcus," and e is "the organism is a gram-positive coccus growing in chains." Since h1 and h2 are mutually exclusive, the observation that Σ CF[hi,e] > 1 tells us that the suggested certainty factors are inappropriate. The expert must either adjust the weightings, or we must normalize them so that their sum does not exceed 1. Because behavior based on these rules would be irrational, we must change the rules.
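A consistency check of this kind is mechanical. The sketch below (a hypothetical helper, not MYCIN code) applies the upper limit just derived to the CFs that a single observation assigns to mutually exclusive hypotheses:

```python
def exclusive_cfs_consistent(cfs):
    """Return True if the CFs one observation assigns to mutually exclusive
    hypotheses respect the limit: confirming CFs must sum to at most 1."""
    return sum(c for c in cfs if c > 0) <= 1.0

# The Streptococcus/Staphylococcus rules from the text: 0.7 + 0.4 > 1,
# so the weightings must be adjusted or normalized.
print(exclusive_cfs_consistent([0.7, 0.4]))   # False
print(exclusive_cfs_consistent([0.7, 0.2]))   # True
```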

11.5 The Model as an Approximation Technique

Certainty factors provide a useful way to think about confirmation and the quantification of degrees of belief. However, we have not yet described how the CF model can be usefully applied to the medical diagnosis problem. The remainder of this chapter will explain conventions that we have introduced in order to use the certainty factor model. Our starting assumption is that the numbers given us by experts who are asked to quantify their degree of belief in decision criteria are adequate approximations to the numbers that would be calculated in accordance with the definitions of MB and MD if the requisite probabilities were known.
When we discussed Bayes' Theorem earlier, we explained that we would like to devise a method that allows us to approximate the value for P(di|e) solely from the P(di|sk), where di is the ith possible diagnosis, sk is the

kth clinical observation, and e is the composite of all the observed sk. This goal can be rephrased in terms of certainty factors as follows:

Suppose that MB[di,sk] is known for each sk, MD[di,sk] is known for each sk, and e represents the conjunction of all the sk. Then our goal is to calculate CF[di,e] from the MBs and MDs known for the individual sk's.

Suppose that e = s1 & s2 and that e confirms di. Then:

CF[di,e] = MB[di,e] - 0 = [P(di|e) - P(di)] / [1 - P(di)]
         = [P(di|s1 & s2) - P(di)] / [1 - P(di)]

There is no exact representation of CF[di,s1 & s2] purely in terms of CF[di,s1] and CF[di,s2]; the relationship of s1 to s2, within di and all other diagnoses, needs to be known in order to calculate P(di|s1 & s2). Furthermore, the CF scheme adds one complexity not present with Bayes' Theorem because we are forced to keep MBs and MDs isolated from one another. Suppose s1 confirms di (MB > 0) but s2 disconfirms di (MD > 0). Then consider CF[di,s1 & s2]. In this case, CF[di,s1 & s2] must reflect both the disconfirming nature of s2 and the confirming nature of s1. Although these measures are reflected in the component CFs (it is intuitive in this case, for example, that CF[di,s2] ≤ CF[di,s1 & s2] ≤ CF[di,s1]), we shall demonstrate that it is important to handle component MBs and MDs separately in order to preserve commutativity (see Item 3 of the list of defining criteria below). We have therefore developed an approximation technique for handling the net evidential strength of incrementally acquired observations. The combining convention must satisfy the following criteria (where e+ represents all confirming evidence acquired to date, and e- represents all disconfirming evidence acquired to date):

Defining Criteria
1. Limits:
a. MB[h,e+] increases toward 1 as confirming evidence is found, equaling 1 if and only if a piece of evidence logically implies h with certainty
b. MD[h,e-] increases toward 1 as disconfirming evidence is found, equaling 1 if and only if a piece of evidence logically implies ¬h with certainty
c. CF[h,e-] ≤ CF[h,e- & e+] ≤ CF[h,e+]



These criteria reflect our desire to have the measure of belief approach certainty asymptotically as partially confirming evidence is acquired, and to have the measure of disbelief approach certainty asymptotically as partially disconfirming evidence is acquired.
2. Absolute confirmation or disconfirmation:
a. If MB[h,e+] = 1, then MD[h,e-] = 0 regardless of the disconfirming evidence in e-; i.e., CF[h,e+] = 1
b. If MD[h,e-] = 1, then MB[h,e+] = 0 regardless of the confirming evidence in e+; i.e., CF[h,e-] = -1
c. The case where MB[h,e+] = MD[h,e-] = 1 is contradictory and hence the CF is undefined

3. Commutativity:
If s1 & s2 indicates an ordered observation of evidence, first s1 and then s2:
a. MB[h,s1 & s2] = MB[h,s2 & s1]
b. MD[h,s1 & s2] = MD[h,s2 & s1]
c. CF[h,s1 & s2] = CF[h,s2 & s1]
The order in which pieces of evidence are discovered should not affect the level of belief or disbelief in a hypothesis. These criteria assure that the order of discovery will not matter.

4. Missing information:
If s? denotes a piece of potential evidence, the truth or falsity of which is unknown:
a. MB[h,s1 & s?] = MB[h,s1]
b. MD[h,s1 & s?] = MD[h,s1]
c. CF[h,s1 & s?] = CF[h,s1]
The decision model should function by simply disregarding rules of the form CF[h,s2] = x if the truth or falsity of s2 cannot be determined.

A number of observations follow from these criteria. For example, Items 1 and 2 indicate that the MB of a hypothesis never decreases unless its MD goes to 1. Similarly, the MD never decreases unless the MB goes to 1. As evidence is acquired sequentially, both the MB and MD may become nonzero. Thus CF = MB - MD is an important indicator of the net belief in a hypothesis in light of current evidence. Furthermore, a certainty factor of zero may indicate either the absence of both confirming and disconfirming evidence (MB = MD = 0) or the observation of pieces of evidence that are equally confirming and disconfirming (MB = MD, where each is nonzero). Negative CFs indicate that there is more reason to disbelieve the hypothesis than to believe it. Positive CFs indicate that the hypothesis is more strongly confirmed than disconfirmed.
It is important also to note that, if e = e+ & e-, then CF[h,e] represents the certainty factor for a complex new rule that could be given us by an expert. CF[h,e], however, would be a highly specific rule customized for the few patients satisfying all the conditions specified in e+ and e-. Since the expert gives us only the component rules, we seek to devise a mechanism whereby a calculated cumulative CF[h,e], based on MB[h,e+] and MD[h,e-], gives a number close to the CF[h,e] that would be calculated if all the necessary conditional probabilities were known.
The first of the following four combining functions satisfies the criteria that we have outlined. The other three functions are necessary conventions for implementation of the model.

Combining Functions

1. Incrementally acquired evidence:

MB[h,s1 & s2] = 0 if MD[h,s1 & s2] = 1
              = MB[h,s1] + MB[h,s2](1 - MB[h,s1]) otherwise

MD[h,s1 & s2] = 0 if MB[h,s1 & s2] = 1
              = MD[h,s1] + MD[h,s2](1 - MD[h,s1]) otherwise

2. Conjunctions of hypotheses:

MB[h1 & h2,e] = min(MB[h1,e], MB[h2,e])
MD[h1 & h2,e] = max(MD[h1,e], MD[h2,e])

3. Disjunctions of hypotheses:

MB[h1 or h2,e] = max(MB[h1,e], MB[h2,e])
MD[h1 or h2,e] = min(MD[h1,e], MD[h2,e])

4. Strength of evidence:
If the truth or falsity of a piece of evidence s1 is not known with certainty, but a CF (based on prior evidence e) is known reflecting the degree of belief in s1, then if MB′[h,s1] and MD′[h,s1] are the degrees of belief and disbelief in h when s1 is known to be true with certainty (i.e., these are the decision rules acquired from the expert), then the actual degrees of belief and disbelief are given by:

MB[h,s1] = MB′[h,s1] · max(0, CF[s1,e])
MD[h,s1] = MD′[h,s1] · max(0, CF[s1,e])

This criterion relates to our previous statement that evidence in favor of a hypothesis may itself be a hypothesis subject to confirmation. Suppose, for instance, you are in a darkened room when testing the generalization that all ravens are black. Then the observation of a raven that you think is black, but that may be navy blue or purple, is less strong evidence in favor of the hypothesis that all ravens are black than if the sampled raven were known with certainty to be black. Here the hypothesis being tested is "all ravens are black," and the evidence is itself a hypothesis, namely the uncertain observation "this raven is black."

Combining Function 1 simply states that, since an MB (or MD) represents a proportionate decrease in disbelief (or belief), the MB (or MD) of a newly acquired piece of evidence should be applied proportionately to the disbelief (or belief) still remaining. Combining Function 2a indicates that the measure of belief in the conjunction of two hypotheses is only as good as the belief in the hypothesis that is believed less strongly, whereas Combining Function 2b indicates that the measure of disbelief in such a conjunction is as strong as the disbelief in the most strongly disconfirmed. Combining Function 3 yields complementary results for disjunctions of hypotheses. The corresponding CFs are merely calculated using the definition CF = MB - MD. Readers are left to satisfy themselves that Combining Function 1 satisfies the defining criteria.⁹
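One way to satisfy oneself is by direct computation. The following sketch (Python; the evidence weights are hypothetical) implements Combining Function 1 and checks the commutativity criterion and the asymptotic approach to 1:

```python
from functools import reduce
from itertools import permutations

def combine(prev, new):
    # Combining Function 1 (same form for MBs and MDs): the new weight is
    # applied proportionately to the belief (or disbelief) still remaining.
    return prev + new * (1.0 - prev)

# Criterion 3 (commutativity): every ordering of the evidence gives the same MB.
weights = [0.4, 0.3, 0.6]                       # hypothetical MB[h,s_k] values
totals = {round(reduce(combine, p, 0.0), 9) for p in permutations(weights)}
print(totals)                                   # a single value: 0.832

# Criterion 1 (limits): repeated partial confirmation approaches 1 asymptotically.
mb = 0.0
for _ in range(20):
    mb = combine(mb, 0.5)
print(mb)                                       # close to, but never reaching, 1
```

Because combine(x, y) = x + y - xy is symmetric and associative, any ordering of the evidence yields the same cumulative value, which is exactly Criterion 3.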
Combining Functions 2 and 3 are needed in the use of Combining Function 4. Consider, for example, a rule such as:

CF[h,s1 & s2 & (s3 or s4)] = x

Then, by Combining Function 4:

CF[h,s1 & s2 & (s3 or s4)] = x · max(0, CF[s1 & s2 & (s3 or s4),e])
                           = x · max(0, MB[s1 & s2 & (s3 or s4),e] - MD[s1 & s2 & (s3 or s4),e])

⁹Note that MB[h,s?] = MD[h,s?] = 0 when examining Criterion 4.



Thus we use Combining Functions 2 and 3 to calculate:

MB[s1 & s2 & (s3 or s4),e] = min(MB[s1,e], MB[s2,e], MB[s3 or s4,e])
                           = min(MB[s1,e], MB[s2,e], max(MB[s3,e], MB[s4,e]))

MD[s1 & s2 & (s3 or s4),e] is calculated similarly.
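As a sketch (Python; the evidence names and MB values are hypothetical), the MB of such a compound premise can be computed recursively, taking min over conjuncts and max over disjuncts per Combining Functions 2 and 3:

```python
def premise_mb(expr, mb_of):
    # Evaluate MB[premise, e] for a nested premise such as
    # ("and", "s1", "s2", ("or", "s3", "s4")); mb_of maps atoms to MB[s,e].
    if isinstance(expr, tuple):
        op, *args = expr
        vals = [premise_mb(a, mb_of) for a in args]
        return min(vals) if op == "and" else max(vals)
    return mb_of[expr]

mbs = {"s1": 0.9, "s2": 0.8, "s3": 0.4, "s4": 0.7}   # hypothetical MB[s,e] values
print(premise_mb(("and", "s1", "s2", ("or", "s3", "s4")), mbs))
# min(0.9, 0.8, max(0.4, 0.7)) = 0.7
```

The MD of the same premise would swap min and max, as the text notes.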


An analysis of Combining Function 1 in light of the probabilistic definitions of MB and MD does not prove to be particularly enlightening. The assumptions implicit in this function include more than an acceptance of the independence of s1 and s2. The function was conceived purely on intuitive grounds in that it satisfied the four defining criteria listed. However, some obvious problems are present. For example, the function always causes the MB or MD to increase, regardless of the relationship between new and prior evidence. Yet Salmon has discussed an example from subparticle physics (Salmon, 1973) in which either of two observations taken alone confirms a given hypothesis, but their conjunction disproves the hypothesis absolutely! Our model assumes the absence of such aberrant situations in the field of application for which it is designed. The problem of formulating a more general quantitative system for measuring confirmation is well recognized and referred to by Harré (1970): "The syntax of confirmation has nothing to do with the logic of probability in the numerical sense, and it seems very doubtful if any single, general notion of confirmation can be found which can be used in all or even most scientific contexts." Although we have suggested that perhaps there is a numerical relationship between confirmation and probability, we agree that the challenge for a confirmation quantification scheme is to demonstrate its usefulness within a given context, preferably without sacrificing human intuition regarding what the quantitative nature of confirmation should be.
Our challenge with Combining Function 1, then, is to demonstrate that it is a close enough approximation for our purposes. We have attempted to do so in two ways. First, we have implemented the function as part of the MYCIN system (Section 11.6) and have demonstrated that the technique models the conclusions of the expert from whom the rules were acquired. Second, we have written a program that allows us to compare CFs computed both from simulated real data and by using Combining Function 1. Our notation for the following discussion will be as follows:

CF*[h,e] = the computed CF using the definition of CF from Section 11.4 (i.e., "perfect knowledge" since P(h|e) and P(h) are known)

CF[h,e] = the computed CF using Combining Function 1 and the known MBs and MDs for each sk, where e is the composite of the sk's (i.e., P(h|e) not known, but P(h|sk) and P(h) known for calculation of MB[h,sk] and MD[h,sk])

[Figure 11-1 appeared here: a scatter plot of CF[h,e] against CF*[h,e] with a 45-degree reference line.]

FIGURE 11-1 Chart demonstrating the degree of agreement between CF and CF* for a sample data base. CF is an approximation of CF*. The terms are defined in the text.

The program was run on sample data simulating several hundred patients. The question to be asked was whether CF[h,e] is a good approximation to CF*[h,e]. Figure 11-1 is a graph summarizing our results. For the vast majority of cases, the approximation does not produce a CF[h,e] radically different from the true CF*[h,e]. In general, the discrepancy is greatest when Combining Function 1 has been applied several times (i.e., several pieces of evidence have been combined). The most aberrant points, however, are those that represent cases in which pieces of evidence were strongly interrelated for the hypothesis under consideration (termed conditional nonindependence). This result is expected because it reflects precisely the issue that makes it difficult to use Bayes' Theorem for our purposes.
Thus we should emphasize that we have not avoided many of the problems inherent with the use of Bayes' Theorem in its exact form. We have introduced a new quantification scheme, which, although it makes many assumptions similar to those made by subjective Bayesian analysis, permits us to use criteria as rules and to manipulate them to the advantages described earlier. In particular, the quantification scheme allows us to consider confirmation separately from probability and thus to overcome some of the inherent problems that accompany an attempt to put judgmental knowledge into a probabilistic format. Just as Bayesians who use their theory wisely must insist that events be chosen so that they are independent (unless the requisite conditional probabilities are known), we must insist that dependent pieces of evidence be grouped into single rather than multiple rules. As Edwards (1972) has pointed out, a similar strategy must be used by Bayesians who are unable to acquire all the necessary data:

An approximation technique is the one now most commonly used. It is simply to combine conditionally non-independent symptoms into one grand symptom, and obtain [quantitative] estimates for that larger more complex symptom.

The system therefore becomes unworkable for applications in which large numbers of observations must be grouped in the premise of a single rule in order to ensure independence of the decision criteria. In addition, we must recognize logical subsumption when examining or acquiring rules and thus avoid counting evidence more than once. For example, if s1 implies s2, then CF[h,s1 & s2] = CF[h,s1] regardless of the value of CF[h,s2]. Function 1 does not "know" this. Rules must therefore be acquired and utilized with care. The justification for our approach therefore rests not with a claim of improving on Bayes' Theorem but rather with the development of a mechanism whereby judgmental knowledge can be efficiently represented and utilized for the modeling of medical decision making, especially in contexts where (a) statistical data are lacking, (b) inverse probabilities are not known, and (c) conditional independence can be assumed in most cases.

11.6 MYCIN's Use of the Model

Formal quantification of the probabilities associated with medical decision making can become so frustrating that some investigators have looked for ways to dispense with probabilistic information altogether (Ledley, 1973). Diagnosis is not a deterministic process, however, and we believe that it should be possible to develop a quantification technique that approximates probability and Bayesian analysis and that is appropriate for use in those cases where formal analysis is difficult to achieve. The certainty factor model that we have introduced is such a scheme. The MYCIN program uses certainty factors to accumulate evidence and to decide on likely identities for organisms causing disease in patients with bacterial infections. A therapeutic regimen is then determined--one that is appropriate to cover for the organisms requiring therapy.
MYCIN remembers the alternate hypotheses that are confirmed or disconfirmed by the rules for inferring an organism's identity. With each hypothesis is stored its MB and MD, both of which are initially zero. When a rule for inferring identity is found to be true for the patient under consideration, the action portion of the rule allows either the MB or the MD of the relevant hypothesis to be updated using Combining Function 1. When all applicable rules have been executed, the final CF may be calculated, for each hypothesis, using the definition CF = MB - MD. These alternate hypotheses may then be compared on the basis of their cumulative certainty factors. Hypotheses that are most highly confirmed thus become the basis of the program's therapeutic recommendation.
Suppose, for example, that the hypothesis h1 that the organism is a Streptococcus has been confirmed by a single rule with a CF = 0.3. Then, if e represents all evidence to date, MB[h1,e] = 0.3 and MD[h1,e] = 0. If a new rule is now encountered that has CF = 0.2 in support of h1, and if e is updated to include the evidence in the premise of the rule, we now have MB[h1,e] = 0.44 and MD[h1,e] = 0. Suppose a final rule is encountered for which CF = -0.1. Then if e is once again updated to include all current evidence, we use Function 1 to obtain MB[h1,e] = 0.44 and MD[h1,e] = 0.1. If no further system knowledge allows conclusions to be made regarding the possibility that the organism is a Streptococcus, we calculate a final result, CF[h1,e] = 0.44 - 0.1 = 0.34. This number becomes the basis for comparison between h1 and all the other possible hypotheses regarding the identity of the organism.
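The arithmetic of this running example can be reproduced in a few lines (a Python sketch of the bookkeeping just described, not MYCIN's actual code):

```python
def apply_rule(mb, md, rule_cf):
    # Update a hypothesis's stored (MB, MD) pair with one rule's CF,
    # using Combining Function 1 on the appropriate component.
    if rule_cf >= 0:
        mb = mb + rule_cf * (1.0 - mb)
    else:
        md = md + (-rule_cf) * (1.0 - md)
    return mb, md

mb, md = 0.0, 0.0                      # both initially zero
for rule_cf in (0.3, 0.2, -0.1):       # the three rules from the example
    mb, md = apply_rule(mb, md, rule_cf)

print(round(mb, 6), round(md, 6), round(mb - md, 6))   # 0.44 0.1 0.34
```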
It should be emphasized that this same mechanism is used for evaluating all knowledge about the patient, not just the identity of pathogens. When a user answers a system-generated question, the associated certainty factor is assumed to be +1 unless he or she explicitly modifies the response with a CF (multiplied by ten) enclosed in parentheses. Thus, for example, the following interaction might occur (MYCIN's question is in lower-case letters):
14) Did the organism grow in clumps, chains, or pairs?
** CHAINS (6) PAIRS (3) CLUMPS (-8)

This capability allows the system automatically to incorporate the user's uncertainties into its decision processes. A rule that referenced the growth conformation of the organism would in this case find:

MB[chains,e] = 0.6    MD[chains,e] = 0
MB[pairs,e] = 0.3     MD[pairs,e] = 0
MB[clumps,e] = 0      MD[clumps,e] = 0.8

Consider, then, the sample rule:

CF[h1,s1 & s2 & s3] = 0.7

where h1 is the hypothesis that the organism is a Streptococcus, s1 is the observation that the organism is gram-positive, s2 that it is a coccus, and s3 that it grows in chains. Suppose gram stain and morphology were known to the user with certainty, so that MYCIN has recorded:

CF[s1,e] = 1    CF[s2,e] = 1

In the case above, however, MYCIN would find that

CF[chains,e] = CF[s3,e] = 0.6 - 0 = 0.6

Thus it is no longer appropriate to use the rule in question with its full confirmatory strength of 0.7. That CF was assigned by the expert on the assumption that all three conditions in the premise would be true with certainty. The modified CF is calculated using Combining Function 4:

CF[h1,s1 & s2 & s3] = MB[h1,s1 & s2 & s3] - MD[h1,s1 & s2 & s3]
                    = 0.7 · max(0, CF[s1 & s2 & s3,e]) - 0

Calculating CF[s1 & s2 & s3,e] using Combining Function 2 gives:

CF[h1,s1 & s2 & s3] = (0.7)(0.6) - 0 = 0.42

i.e., MB[h1,s1 & s2 & s3] = 0.42 and MD[h1,s1 & s2 & s3] = 0

Thus the strength of the rule is reduced to reflect the uncertainty regarding s3. Combining Function 1 is now used to combine 0.42 (i.e., MB[h1,s1 & s2 & s3]) with the previous MB for the hypothesis that the organism is a Streptococcus.
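Combining Functions 2 and 4 for this example can be sketched as follows (Python; the evidence labels for s1, s2, s3 are hypothetical names of ours):

```python
# CF[s_k, e] for the three premise clauses, taken from the dialogue above.
cf_evidence = {"gram_positive": 1.0, "coccus": 1.0, "chains": 0.6}

rule_mb = 0.7                                   # expert's MB', premise assumed certain
premise_cf = min(cf_evidence.values())          # Combining Function 2: min over conjuncts
effective_mb = rule_mb * max(0.0, premise_cf)   # Combining Function 4

print(effective_mb)                             # 0.42 (up to floating-point rounding)
```

The 0.42 produced here is then fed to Combining Function 1 along with the hypothesis's previous MB, exactly as the text describes.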
We have shown that the numbers thus calculated are approximations at best. Hence it is not justifiable simply to accept as correct the hypothesis with the highest CF after all relevant rules have been tried. Therapy is therefore chosen to cover for all identities of organisms that account for a sufficiently high proportion of the possible hypotheses on the basis of their CFs. This is accomplished by ordering them from highest to lowest and selecting all those on the list until the sum of their CFs exceeds z (where z is equal to 0.9 times the sum of the CFs for all confirmed hypotheses). This ad hoc technique therefore uses a semiquantitative approach in order to attain a comparative goal.
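That selection rule can be sketched as follows (Python; the organism names and CF values are hypothetical):

```python
def select_for_therapy(cf_by_hypothesis):
    # Order confirmed hypotheses by CF, highest first, and keep taking them
    # until the running sum exceeds z = 0.9 * (sum of all confirmed CFs).
    confirmed = sorted(((name, c) for name, c in cf_by_hypothesis.items() if c > 0),
                       key=lambda item: -item[1])
    z = 0.9 * sum(c for _, c in confirmed)
    chosen, running = [], 0.0
    for name, c in confirmed:
        chosen.append(name)
        running += c
        if running > z:
            break
    return chosen

cfs = {"Streptococcus": 0.6, "Staphylococcus": 0.3, "Diplococcus": 0.05}
print(select_for_therapy(cfs))   # ['Streptococcus', 'Staphylococcus']
```

With these numbers, the third, weakly confirmed hypothesis falls below the 0.9 cutoff and is not covered.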
Finally, it should be noted that our definition of CFs allows us to validate those of our rules for which frequency data become available. This would become increasingly important if the program becomes a working tool in the clinical setting where it can actually be used to gather the statistical data needed for its own validation. Otherwise, validation necessarily involves the comments of recognized infectious disease experts who are asked to evaluate the program's decisions and advice. Evaluations of MYCIN have shown that the program can give advice similar to that suggested by infectious disease experts (see Part Ten). Studies such as these have allowed us to gain confidence that the certainty factor approach is robust enough for use in a decision-making domain such as antimicrobial selection.
12
Probabilistic Reasoning and
Certainty Factors

J. Barclay Adams

The development of automated assistance for medical diagnosis and decision making is an area of both theoretical and practical interest. Of methods for utilizing evidence to select diagnoses or decisions, probability theory has the firmest appeal. Probability theory in the form of Bayes' Theorem has been used by a number of workers (Ross, 1972). Notable among recent developments are those of de Dombal and coworkers (de Dombal, 1973; de Dombal et al., 1974; 1975) and Pipberger and coworkers (Pipberger et al., 1975). The usefulness of Bayes' Theorem is limited by practical difficulties, principally the lack of data adequate to estimate accurately the a priori and conditional probabilities used in the theorem. One attempt to mitigate this problem has been to assume statistical independence among various pieces of evidence. How seriously this approximation affects results is often unclear, and correction mechanisms have been explored (Ross, 1972; Norusis and Jacquez, 1975a; 1975b). Even the independence assumption requires an unmanageable number of estimates of probabilities for most applications with realistic complexity. To circumvent this problem, some have tried to elicit estimates of probabilities directly from experienced physicians (Gorry, 1973; Ginsberg, 1971; Gustafson et al., 1971), while others have turned from the use of Bayes' Theorem and probability theory to the use of discriminant analysis (Ross, 1972) and nonprobabilistic methods (Scheinok and Rinaldo, 1971; Cumberbatch and Heaps, 1973; Cumberbatch et al., 1974; Glesser and Collen, 1972).
Shortliffe and Buchanan (1975) have offered a model of inexact reasoning in medicine used in the MYCIN system (Chapter 11). Their model

This chapter is a shortened and edited version of a paper appearing in Mathematical Biosciences 32: 177-186 (1976). Copyright 1976 by Mathematical Biosciences. All rights reserved. Used with permission.


uses estimates provided by expert physicians that reflect the tendency of a piece of evidence to prove or disprove a given hypothesis. Because of the highly promising nature of the MYCIN system, this model deserves examination. Shortliffe and Buchanan conceived their system purely on intuitive grounds and assert that it is an alternative to probability theory. I shall show below that a substantial part of this model can be derived from and is equivalent to probability theory with the assumption of statistical independence. In Section 12.1 I first review a simple probability model and discuss some of its limitations.

12.1 A Simple Probability Model

Consider a finite population of n members. Members of the population may possess one or more of several properties that define subpopulations or sets. Properties of interest might be e1 or e2, which might be evidence for or against a disease, and h, a certain disease state or other hypothesis about an individual. The number of individuals with a certain property, say e, will be denoted n(e), and the number with both of two properties e1 and e2 will be denoted n(e1 & e2). Probabilities are taken as ratios of numbers of individuals. From the observation that:

[n(e & h)/n(e)] [n/n(h)] = [n(e & h)/n(h)] [n/n(e)]

a convenient form of Bayes' Theorem follows immediately:

P(h|e)/P(h) = P(e|h)/P(e)

Now consider the case in which two pieces of evidence e1 and e2 bear on a hypothesis or disease state h. Let us make the assumptions that these pieces of evidence are independent both in the population as a whole and in the subpopulation with h; that is:

n(e1 & e2)/n = [n(e1)/n] [n(e2)/n]    (1)

and

n(e1 & e2 & h)/n(h) = [n(e1 & h)/n(h)] [n(e2 & h)/n(h)]    (2)

or

P(e1 & e2) = P(e1)P(e2)    (3)

and

P(e1 & e2|h) = P(e1|h)P(e2|h)    (4)

With these the right-hand side of Bayes' Theorem becomes

P(e1 & e2|h)/P(e1 & e2) = [P(e1|h)/P(e1)] [P(e2|h)/P(e2)]    (5)

and, because of this factoring, the right-hand side is computationally simple.
Now, because of the dearth of empirical data to estimate probabilities, suppose we were to ask experts to estimate the probabilities subjectively. We could ask for estimates of the ratios P(ei|h)/P(ei) and P(h), and from these compute P(h|e1 & e2 & ... & en). The ratios P(ei|h)/P(ei) must be in the range [0, 1/P(h)]. Most physicians are not accustomed to thinking of diseases and evidence in terms of probability ratios. They would more willingly attempt to quantitate their intuition by first deciding whether a piece of evidence tends to prove or disprove a hypothesis and then assigning a parameter on a scale of 0 to 10 as a measure of the weight or strength of the evidence. One way to translate this parameterization into an "estimate" of a probability ratio is the following. Divide the intuitive parameter by 10, yielding a new parameter, which for evidence favoring the hypothesis will be called MB, the physician's measure of belief, and for evidence against the hypothesis will be called MD, the physician's measure of disbelief. Both MB and MD are in the range [0,1] and have the value 0 when the evidence has no bearing on the hypothesis. The value 1 for MB[h,e] means that all individuals with e have h. The value 1 for MD[h,e] means that no individual with e has h. From these physician-estimated parameters we derive the corresponding probability ratios in the following way. For evidence against the hypothesis we simply take

P(e|h)/P(e) = 1 - MD[h,e]    (6)

For evidence favoring the hypothesis we use a similar construct by taking
the evidence as against the negation of the hypothesis, i.e., by considering
the subpopulation of individuals who do not have h, denoted ¬h. So we
construct the ratio of probabilities using MB:

P(e|¬h)/P(e) = 1 - MB[h,e]   (7)
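As a concrete illustration of Equations (6) and (7), the translation from the physician's 0-to-10 weight to MB or MD and then to a probability ratio can be sketched in Python (the function names here are ours, purely illustrative):

```python
def md_from_scale(weight_0_to_10):
    """Divide the intuitive 0-to-10 weight by 10 to obtain MD (or MB)."""
    return weight_0_to_10 / 10.0

def ratio_given_h(md):
    """Equation (6): P(e|h)/P(e) = 1 - MD[h,e], for evidence against h."""
    return 1.0 - md

def ratio_given_not_h(mb):
    """Equation (7): P(e|not-h)/P(e) = 1 - MB[h,e], for evidence favoring h."""
    return 1.0 - mb
```

The same division by 10 yields either parameter; which ratio it feeds depends on whether the evidence favors or opposes the hypothesis.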
266 Probabilistic Reasoning and Certainty Factors

Now, to continue the parallel, we write Bayes' Theorem for two pieces of
evidence favoring a hypothesis:

P(¬h|e1 & e2) = P(e1 & e2|¬h) P(¬h) / P(e1 & e2)   (8)

with

P(e1 & e2|¬h)/P(e1 & e2) = [P(e1|¬h)/P(e1)] [P(e2|¬h)/P(e2)]   (9)

where independence of e1 and e2 in ¬h is assumed.


By using the identities

P(h) + P(¬h) = 1   (10)

P(h|e) + P(¬h|e) = 1   (11)

one then has a computationally simple way of serially adjusting the prob-
ability of a hypothesis with new evidence against the hypothesis:

P(h|e") = [P(ei|h)/P(ei)] P(h|e)   (12)

or new evidence favoring the hypothesis:

P(h|e") = 1 - [P(ei|¬h)/P(ei)] [1 - P(h|e)]   (13)

where ei is the new evidence, e" is the total evidence after the introduction
of ei, and e is the evidence before the new evidence is introduced [note
that P(h|e) = P(h) before any evidence is introduced]. Alternatively, one
could combine all elements of evidence against a hypothesis simply by
using independence as in Equation (5) and separately combine all elements
of evidence favoring a hypothesis by using Equation (9), and then use
Equations (12) and (13) once.
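The serial adjustment of Equations (12) and (13) is easy to state in code. The sketch below (our own helper names, with the MB/MD ratios substituted in) folds evidence into P(h|e) one piece at a time:

```python
def adjust_against(p, md):
    """Equation (12): evidence against h multiplies P(h|e) by
    P(e_i|h)/P(e_i) = 1 - MD[h,e_i]."""
    return (1.0 - md) * p

def adjust_for(p, mb):
    """Equation (13): evidence favoring h gives
    P(h|e'') = 1 - [P(e_i|not-h)/P(e_i)] [1 - P(h|e)],
    with P(e_i|not-h)/P(e_i) = 1 - MB[h,e_i]."""
    return 1.0 - (1.0 - mb) * (1.0 - p)

p = 0.2                      # prior P(h), before any evidence
p = adjust_for(p, 0.5)       # favoring evidence with MB = 0.5
p = adjust_against(p, 0.3)   # disconfirming evidence with MD = 0.3
```

Because each update is a single multiplication on a running value, order of arrival does not matter, mirroring the factored form of Equation (5).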
The attractive computational simplicity of this scheme is vitiated by
the restrictive nature of the independence assumptions made in deriving
it. The MBs and MDs for different pieces of evidence cannot be chosen
arbitrarily and independently. This can be clearly seen in the following
simple theorem. If e1 and e2 are independent both in the whole population
and in the subpopulation with property h, then

P(h|e1) P(h|e2) = P(h|e1 & e2) P(h)   (14)



This follows from dividing Equation (2) by Equation (1). The nature of the
restrictions placed on the probabilities can be seen from the limiting case
in which all members of e1 are in h. In that case, P(h|e1) = P(h|e1 & e2) = 1,
so P(h|e2) = P(h); that is, if some piece of evidence is absolutely diagnostic
of an illness, then any evidence that is independent can have no diagnostic
value. This special case of the theorem was noted in a paper of Warner et
al. (1961). The restrictions this forces on the MBs can be further demonstrated
by the following example. We write Bayes' Theorem with the independence
assumption as follows:

P(h|e1 & e2) = P(h) [P(e1|h)/P(e1)] [P(e2|h)/P(e2)]   (15)

Consider the case of two pieces of evidence that favor the hypothesis. Using
Equations (7), (10), and (11), one can express P(e|h)/P(e) in terms of MB
as follows:

P(e|h)/P(e) = 1 + [1/P(h) - 1] MB[h,e]   (16)

Using this form and the fact that P(h|e1 & e2) ≤ 1, we get from Equation
(15)

[1 + (1/P(h) - 1) MB[h,e1]] [1 + (1/P(h) - 1) MB[h,e2]] ≤ 1/P(h)   (17)

This is not satisfied for all values of the MBs; e.g., if P(h) = 1/11 and
MB[h,e1] = 0.7, then we must choose the narrow range MB[h,e2] ≤ 0.0375
to satisfy the inequality. Most workers in this field assume that elements of
evidence are statistically independent only within each of a complete set
of mutually exclusive subpopulations and not in the population as a whole;
thus the properties of (14) and (15) do not hold. Occasionally, writers have
implicitly made the stronger assumption of independence in the whole
space (Slovic et al., 1971).
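The narrowness of this restriction can be checked numerically. A small sketch (the helper function is ours) solves the inequality derived from Equation (15) for the largest admissible MB[h,e2]:

```python
def max_mb2(p_h, mb1):
    """Largest MB[h,e2] consistent with
    [1 + (1/P(h) - 1) MB1] [1 + (1/P(h) - 1) MB2] <= 1/P(h)."""
    r = 1.0 / p_h - 1.0          # the factor (1/P(h) - 1)
    factor1 = 1.0 + r * mb1      # contribution of the first piece of evidence
    return (1.0 / (p_h * factor1) - 1.0) / r

# the example from the text: P(h) = 1/11 and MB[h,e1] = 0.7
bound = max_mb2(1.0 / 11.0, 0.7)
```

Here the bound evaluates to 0.0375, so the second piece of favoring evidence is confined to a very narrow range indeed.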

12.2 The MYCIN Model

The model developed by Shortliffe and Buchanan is in part equivalent to


that in Section 12.1. They introduce quantities MB[h,e] and MD[h,e], which
are identical to those we have defined above (and were the reason for
selecting our choice of parameterization). They postulate rules for combining
MB[h,e1] with MB[h,e2] to yield MB[h,e1 & e2] and similar rules for
MD. With one exception discussed below, these rules need not be postu-
lated because they are equivalent to, and can be derived from, the method
of combining probability ratios under the assumption of independence
used in the previous section. For example, the rule for MDs is derived as
follows by using Equation (5):

1 - MD[h,e1 & e2] = P(e1 & e2|h)/P(e1 & e2) = [P(e1|h)/P(e1)] [P(e2|h)/P(e2)]   (18)

or

1 - MD[h,e1 & e2] = (1 - MD[h,e1]) (1 - MD[h,e2])   (19)

which is an algebraic rearrangement of the rule postulated in their paper.


A similar construct holds for MB. The exceptional case in the MYCIN
model is one in which a piece of evidence proves a hypothesis (all with e1
have h). As noted in the previous section, this case excludes the possibility
of other independent diagnostically meaningful evidence. In the MYCIN
model, if e proves h, then one sets MD equal to zero for the combined
evidence. A similar assumption is introduced for the case that evidence
disproves a hypothesis. To maintain internal consistency the MBs and MDs
must be subject to the restrictions discussed in Section 12.1. This important
fact is not noted in the work of Shortliffe and Buchanan.
Two other properties are assumed for the MBs and MDs by Shortliffe
and Buchanan. The extent or importance of the use of these assumptions
in the employment of their model is not clear, but does not seem great.
One concerns the conjunction of hypotheses h1 and h2, for which they
assume

MB[h1 & h2,e] = min(MB[h1,e], MB[h2,e])   (20)

MD[h1 & h2,e] = max(MD[h1,e], MD[h2,e])   (21)

Unstated are strong restrictive assumptions about the relationship of h1
and h2. As an extreme example, suppose that h1 and h2 are mutually ex-
clusive; then the conjunction h1 & h2 is false (has probability zero) no matter
what the evidence, and the assumptions on the conjunction of hypotheses
would be unreasonable. In the context of the probability model of Section
12.1, one can derive a relationship

P(h1 & h2|e)/P(h1 & h2) = [P(h1|e)/P(h1)] [P(h2|e)/P(h2)]   (22)

only by making strong assumptions on the independence of h1 and h2.



A pair of further assumptions made by Shortliffe and Buchanan con-
cerns the disjunction of two hypotheses, denoted h1 ∨ h2. These are

MB[h1 ∨ h2,e] = max(MB[h1,e], MB[h2,e])   (23)

MD[h1 ∨ h2,e] = min(MD[h1,e], MD[h2,e])   (24)

Again these contain unstated assumptions about the relationship of h1 and
h2. If, for example, h1 and h2 are mutually exclusive and each has a prob-
ability of being true, then the disjunction h1 ∨ h2 should be more likely or
probable or confirmed than either h1 or h2. Expressions for P(e|h1 ∨ h2)/
P(e) can be derived in probability theory, but they have no compact or
perspicuous form.
The MYCIN model combines separately all evidence favoring a
hypothesis to give MB[h,ef], where ef = ef1 & ef2 & ... & efn, the intersection
of all elements of evidence favoring hypothesis h. Similarly, all elements
against a hypothesis are combined to give MD[h,ea]. By Bayes' Theorem
these provide measures of P(h|ef)/P(h) and P(h|ea)/P(h). These could be
combined using the probability theory outlined in Section 12.1 to give
P(h|ef & ea)/P(h), an estimate of the change of the probability due to the
evidence. However, it is at this point that the MYCIN model departs from
standard probability theory. Shortliffe and Buchanan combine the MB with
the MD by defining a certainty factor to be

CF[h,ef & ea] = MB[h,ef] - MD[h,ea]   (25)

The certainty factor is used in two ways. One is to rank hypotheses to select
those for further action. The other is as a weighting factor for the credi-
bility of a hypothesis h, which is supported by an intermediate hypothesis
i, which in turn is supported by evidence e. The appropriateness of CF for
each of these roles will be examined.
One of the uses of CF is to rank hypotheses. Because CF[h,e] does not
correspond to the probability of h given e, it is not difficult to give examples
in which, of two hypotheses, the one with the lower probability would have
the higher certainty factor, or CF. For example, consider two hypotheses
h1 and h2 and some body of evidence e that tends to confirm both
hypotheses. Suppose that the a priori probabilities were such that P(h1) >
P(h2) and P(h1|e) > P(h2|e); it is possible that CF[h1,e] < CF[h2,e]. For exam-
ple, if P(h1) = 0.8, P(h2) = 0.2, P(h1|e) = 0.9, P(h2|e) = 0.8, then
CF[h1,e] = 0.5 and CF[h2,e] = 0.75. This failure to rank according to
probabilities is an undesirable feature of CF. It would be possible to avoid
it if it were assumed that all a priori probabilities were equal.
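The ranking counterexample is easy to reproduce. For purely confirming evidence the CF equals the MB, which in probabilistic terms is (P(h|e) - P(h))/(1 - P(h)); a quick sketch (helper name ours):

```python
def cf_confirming(prior, posterior):
    """CF for purely confirming evidence (CF = MB):
    (P(h|e) - P(h)) / (1 - P(h))."""
    return (posterior - prior) / (1.0 - prior)

cf_h1 = cf_confirming(0.8, 0.9)   # h1: the more probable hypothesis
cf_h2 = cf_confirming(0.2, 0.8)   # h2: less probable, yet higher CF
```

The more probable hypothesis h1 receives the smaller certainty factor, exactly the ranking reversal described in the text.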
The weighting role for CF is suggested by the intuitive notion that in
a chain of reasoning, if e implies i with probability P(i|e), and i, if true,
implies h with probability P(h|i), then

P(h|e) = P(h|i) P(i|e)   (26)



This is not true in general; however, a set of assumptions can be identified
under which it will be true. Suppose the population with property h is
contained in the set with i, and the set with i is contained in the set with
e. This may be expressed as

n(h & i) = n(h),   n(i & e) = n(i),   n(h & e) = n(h)   (27)

These allow us to write

n(h & e)/n(e) = [n(h & i)/n(i)] [n(i & e)/n(e)]   (28)

which is the desired result in numerical form. The proposal of Shortliffe
and Buchanan, which may be written as

MB[h,e] = MB[h,i] max(0, CF[i,e])   (29)

MD[h,e] = MD[h,i] max(0, CF[i,e])   (30)

is not true in general under the assumptions of (27) or any other natural
set, as may be demonstrated by substitution into these relationships of the
definitions of MB, MD, and CF.

12.3 Conclusions
The simple model of Section 12.1 is attractive because it is computationally
simple and apparently lends itself to convenient estimation of parameters
by experts. The weakness of the system is the non-obvious interdependence
restriction placed on the estimation of parameters by the assumptions of
independence. The MYCIN model is equivalent in part to the simple prob-
ability model presented and suffers from the same subtle restrictions on
parameter estimation if it is to remain internally consistent.
The ultimate measure of success in models of medical reasoning of
this sort, which attempt to mimic physicians, is the closeness of their ap-
proach to perfect imitation of experts in the field. The empirical success
of MYCIN using the model of Shortliffe and Buchanan stands in spite of
theoretical objections of the types discussed in the preceding sections. It is
probable that the model does not founder on the difficulties pointed out
because in actual use the chains of reasoning are short and the hypotheses
simple. However, there are many fields in which, because of its shortcom-
ings, this model could not enjoy comparable success.
The fact that in trying to create an alternative to probability theory for
reasoning Shortliffe and Buchanan duplicated the use of standard theory
demonstrates the difficulty of creating a useful and internally consistent
system that is not isomorphic to a portion of probability theory. In pro-
posing such a system, a careful delineation of its relationship to conven-
tional probability theory can contribute to an understanding and clear
exposition of its assumptions and approximations. It thereby allows tests
of whether these are satisfied in the proposed field of use.
13
The Dempster-Shafer
Theory of Evidence

Jean Gordon and Edward H. Shortliffe

The drawbacks of pure probabilistic methods and of the certainty factor
model have led us in recent years to consider alternate approaches. Par-
ticularly appealing is the mathematical theory of evidence developed by
Arthur Dempster. We are convinced it merits careful study and interpre-
tation in the context of expert systems. This theory was first set forth by
Dempster in the 1960s and subsequently extended by Glenn Shafer. In
1976, the year after the first description of CFs appeared, Shafer published
A Mathematical Theory of Evidence (Shafer, 1976). Its relevance to the issues
addressed in the CF model was not immediately recognized, but recently
researchers have begun to investigate applications of the theory to expert
systems (Barnett, 1981; Friedman, 1981; Garvey et al., 1981).
We believe that the advantage of the Dempster-Shafer theory over
previous approaches is its ability to model the narrowing of the hypothesis
set with the accumulation of evidence, a process that characterizes diagnostic
reasoning in medicine and expert reasoning in general. An expert
uses evidence that, instead of bearing on a single hypothesis in the original
hypothesis set, often bears on a larger subset of this set. The functions and
combining rule of the Dempster-Shafer theory are well suited to represent
this type of evidence and its aggregation.
For example, in the search for the identity of an infecting organism,
a smear showing gram-negative organisms narrows the hypothesis set of
all possible organisms to a proper subset. This subset can also be thought
of as a new hypothesis: the organism is one of the gram-negative organisms.
However, this piece of evidence gives no information concerning
the relative likelihoods of the organisms in the subset. Bayesians might
assume equal priors and distribute the weight of this evidence equally
among the gram-negative organisms, but, as Shafer points out, they would
thus fail to distinguish between uncertainty, or lack of knowledge, and
equal certainty. Because he attributes belief to subsets, as well as to indi-
vidual elements of the hypothesis set, we believe that Shafer more accu-
rately reflects the evidence-gathering process.
A second distinct piece of evidence, such as morphology of the orga-
nism, narrows the original hypothesis set to a different subset. How does
the Dempster-Shafer theory pool these two pieces of evidence? Each is
represented by a belief function, and two belief functions are merged via
a combination rule to yield a new function. The combination rule, like the
Bayesian and CF combining functions, is independent of the order in
which evidence is gathered and requires that the hypotheses under con-
sideration be mutually exclusive and exhaustive. In fact, the Dempster-
Shafer combination rule includes the Bayesian and CF functions as special
cases.
Another consequence of the generality of the Dempster-Shafer belief
functions is avoidance of the Bayesian restriction that commitment of belief
to a hypothesis implies commitment of the remaining belief to its negation,
i.e., that P(h) = 1 - P(¬h). The concept that, in many situations, evidence
partially in favor of a hypothesis should not be construed as evidence
partially against the same hypothesis (i.e., in favor of its negation) was one
of the desiderata in the development of the CF model, as discussed in
Chapter 11. As in the CF model, the beliefs in each hypothesis in the
original set need not sum to 1 but may sum to a number less than or equal
to 1; some of the belief can be allotted to subsets of the original hypothesis
set.
Thus the Dempster-Shafer model includes many of the features of the
CF model but is based on a firm mathematical foundation. This is a clear
advantage over the ad hoc nature of CFs. In the next sections, we motivate
the exposition of the theory with a medical example and then discuss the
relevance of the theory to MYCIN.

13.1 Basics of the Dempster-Shafer Theory

13.1.1 A Simple Example of Medical Reasoning

Suppose a physician is considering a case of cholestatic jaundice for which
there is a diagnostic hypothesis set of hepatitis (hep), cirrhosis (cirr), gall-
stone (gall), and pancreatic cancer (pan). There are, of course, more than
four causes of jaundice, but we have simplified the example here for illus-
trative purposes. In the Dempster-Shafer theory, this set is called a frame
of discernment, denoted O. As noted earlier, the hypotheses in O are as-
sumed mutually exclusive and exhaustive.
One piece of evidence considered by the physician might support the
diagnosis of intrahepatic cholestasis, which is defined for this example as

{hep, cirr, gall, pan}

{hep, cirr, gall} {hep, cirr, pan} {hep, gall, pan} {cirr, gall, pan}

{hep, cirr} {hep, gall} {cirr, gall} {hep, pan} {cirr, pan} {gall, pan}

{hep} {ci rr} {gall} {pan}

FIGURE 13-1 The subsets of the set of causes of cholestasis.

the two-element subset of O, {hep, cirr}, also represented by the hypothesis
HEP-OR-CIRR. Similarly, the hypothesis extrahepatic cholestasis corre-
sponds to {gall, pan}. Evidence confirming intrahepatic cholestasis to some
degree will cause the physician to allot a proportional amount of belief to
that subset.
A new piece of evidence might help the physician exclude hepatitis to
some degree. Evidence disconfirming hepatitis (HEP) is equivalent to evi-
dence confirming the hypothesis NOT-HEP, which corresponds to the hy-
pothesis CIRR-OR-GALL-OR-PAN or the subset {cirr, gall, pan}. Thus
evidence disconfirming hepatitis to some degree will cause the physician
to allot a proportional amount of belief to this three-element subset.
As illustrated above, a subset of hypotheses in O gives rise to a new
hypothesis, which is equivalent to the disjunction of the hypotheses in the
subset. Each hypothesis in O corresponds to a one-element subset (called
a singleton). By considering all possible subsets of O, denoted 2^O, the set of
hypotheses to which belief can be allotted is enlarged. Henceforth, we use
the term hypothesis in this enlarged sense to denote any subset of the orig-
inal hypotheses in O.
A pictorial representation of 2^O is given in Figure 13-1. Note that a
set of size n has 2^n subsets. (The empty set, Ø, is one of these subsets, but
corresponds to a hypothesis known to be false and is not shown in Figure
13-1.)
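The enlargement from O to its power set is mechanical; a brief sketch in Python (frozensets as subsets is our representation, not the book's):

```python
from itertools import chain, combinations

def power_set(frame):
    """All 2**n subsets of a frame of discernment, empty set included."""
    items = sorted(frame)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

theta = {"hep", "cirr", "gall", "pan"}
subsets = power_set(theta)   # 2**4 = 16 subsets, as laid out in Figure 13-1
```

Each frozenset stands for one node of the lattice, from the empty set up to the whole frame.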
In a given domain, only some subsets in 2^O will be of diagnostic inter-
est. Evidence often bears on certain disease categories as well as on specific
disease entities. In the case of cholestatic jaundice, evidence available to

Cholestatic Jaundice

Intrahepatic Cholestasis Extrahepatic Cholestasis

{hep} {cirr} {gall} {pan}

FIGURE 13-2 The subsets of clinical interest in cholestatic jaundice.

the physician tends to support either intrahepatic cholestasis, extra-
hepatic cholestasis, or the singleton hypotheses. The tree of Figure
13-1 can thus be pruned to that of Figure 13-2, which summarizes the
hierarchical relations of clinical interest. In at least one medical artificial
intelligence system, the causes of jaundice have been usefully structured
in this way for the diagnostic task (Chandrasekharan et al., 1979).

13.1.2 Basic Probability Assignments

The Dempster-Shafer theory uses a number in the range [0,1] to indicate
belief in a hypothesis given a piece of evidence. This number is the degree
to which the evidence supports the hypothesis. Recall that evidence against
a hypothesis is regarded as evidence for the negation of the hypothesis.
Thus, unlike the CF model, the Dempster-Shafer model avoids the use of
negative numbers.
The impact of each distinct piece of evidence on the subsets of O is
represented by a function called a basic probability assignment (bpa). A bpa
is a generalization of the traditional probability density function; the latter
assigns a number in the range [0,1] to every singleton of O such that the
numbers sum to 1. Using 2^O, the enlarged domain of all subsets of O, a
bpa denoted m assigns a number in [0,1] to every subset of O such that
the numbers sum to 1. (By definition, the number 0 must be assigned to
the empty set, since this set corresponds to a false hypothesis. It is false
because the hypotheses in O are assumed exhaustive.) Thus m allows assign-
ment of a quantity of belief to every element in the tree of Figure
13-1, not just to those elements on the bottom row, as is the case for a
probability density function.
The quantity m(A) is a measure of that portion of the total belief com-
mitted exactly to A, where A is an element of 2^O and the total belief is 1.
This portion of belief cannot be further subdivided among the subsets of
A and does not include portions of belief committed to subsets of A. Since

belief in a subset certainly entails belief in subsets containing that subset
(i.e., nodes "higher" in the network of Figure 13-1), it would be useful to
define a function that computes a total amount of belief in A. This quantity
would include not only belief committed exactly to A but belief committed
to all subsets of A. Such a function, called a belief function, is defined in the
next section.
The quantity m(O) is a measure of that portion of the total belief that
remains unassigned after commitment of belief to various proper subsets
of O. For example, evidence favoring a single subset A need not say any-
thing about belief in the other subsets. If m(A) = s and m assigns no belief
to other subsets of O, then m(O) = 1 - s. Thus the remaining belief is
assigned to O and not to the negation of the hypothesis (equivalent to A^c,
the set-theoretic complement of A), as would be required in the Bayesian
model.

Examples

Example 1. Suppose that there is no evidence concerning the specific
diagnosis in a patient with known cholestatic jaundice. The bpa repre-
senting ignorance, called the vacuous bpa, assigns 1 to O = {hep, cirr, gall,
pan} and 0 to every other subset of O. Bayesians might attempt to represent
ignorance by a function assigning 0.25 to each singleton, assuming no prior
information. As remarked before, such a function would imply more in-
formation given by the evidence than is truly the case.

Example 2. Suppose that the evidence supports, or confirms, the diag-
nosis of intrahepatic cholestasis to the degree 0.6, but does not support
a choice between cirrhosis and hepatitis. The remaining belief, 1 - 0.6 =
0.4, is assigned to O. The hypothesis corresponding to O is known to
be true under the assumption of exhaustiveness. Bayesians would
assign the remaining belief to extrahepatic cholestasis, the negation of
intrahepatic cholestasis. Such an assignment would be an example of
Paradox 1, discussed in Chapter 11. Thus m({hep, cirr}) = 0.6,
m(O) = m({hep, cirr, gall, pan}) = 0.4, and the value of m for every other
subset of O is 0.

Example 3. Suppose that the evidence disconfirms the diagnosis of
hepatitis to the degree 0.7. This is equivalent to confirming that of NOT-
HEP to the degree 0.7. Thus m({cirr, gall, pan}) = 0.7, m(O) = 0.3, and the
value of m for every other subset of O is 0.

Example 4. Suppose that the evidence confirms the diagnosis of hep-
atitis to the degree 0.8. Then m({hep}) = 0.8, m(O) = 0.2, and m is 0 else-
where.
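Each of the bpas in Examples 1-4 is simply a map from subsets of O to masses. A sketch of one possible representation and its defining checks (the dictionary-of-frozensets encoding is our choice, not the book's):

```python
THETA = frozenset({"hep", "cirr", "gall", "pan"})

def is_bpa(m):
    """A bpa assigns [0,1] masses that sum to 1, with zero mass on
    the empty set (a hypothesis known to be false)."""
    return (all(0.0 <= v <= 1.0 for v in m.values())
            and m.get(frozenset(), 0.0) == 0.0
            and abs(sum(m.values()) - 1.0) < 1e-9)

vacuous = {THETA: 1.0}                                      # Example 1
m_ex2 = {frozenset({"hep", "cirr"}): 0.6, THETA: 0.4}       # Example 2
m_ex3 = {frozenset({"cirr", "gall", "pan"}): 0.7,
         THETA: 0.3}                                        # Example 3
m_ex4 = {frozenset({"hep"}): 0.8, THETA: 0.2}               # Example 4
```

Subsets absent from a dictionary implicitly carry mass 0, matching the phrase "m is 0 elsewhere."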

13.1.3 Belief Functions

A belief function, denoted Bel, corresponding to a specific bpa, m, assigns
to every subset A of O the sum of the beliefs committed exactly to every
subset of A by m. For example,

Bel({hep, cirr, pan}) = m({hep, cirr, pan}) + m({hep, cirr})
+ m({hep, pan}) + m({cirr, pan})
+ m({hep}) + m({cirr}) + m({pan})

Thus, Bel(A) is a measure of the total amount of belief in A and not of the
amount committed precisely to A by the evidence giving rise to m.
Referring to Figure 13-1, Bel and m are equal for singletons, but
Bel(A), where A is any other subset of O, is the sum of the values of m for
every subset in the subtree formed by using A as the root. Bel(O) is always
equal to 1, since Bel(O) is the sum of the values of m for every subset of O.
This sum must be 1 by definition of a bpa. Clearly, the total amount of
belief in O should be equal to the total amount of belief, 1, since the
singletons are exhaustive.
To illustrate, the belief function corresponding to the bpa of Example
2 is given by Bel(O) = 1, Bel(A) = 0.6, where A is any proper subset of O
containing {hep, cirr}, and the value of Bel for every other subset of O is
0.
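Computing Bel from m is a single sum over contained subsets; a sketch under our frozenset-dictionary representation:

```python
def bel(m, a):
    """Bel(A) = sum of m(B) over every subset B of A."""
    return sum(v for b, v in m.items() if b <= a)   # b <= a tests subset

theta = frozenset({"hep", "cirr", "gall", "pan"})
m = {frozenset({"hep", "cirr"}): 0.6, theta: 0.4}   # the bpa of Example 2
```

Applied to the bpa of Example 2, Bel is 0.6 on every proper subset containing {hep, cirr}, 1 on O, and 0 elsewhere, as stated above.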

13.1.4 Combination of Belief Functions

As discussed in Chapter 11, the evidence-gathering process in medical
diagnosis requires a method for combining the support for a hypothesis,
or for its negation, based on multiple, accumulated observations. The
Dempster-Shafer model also recognizes this requirement and provides a
formal proposal for its management. Given two belief functions, based on
two observations, but with the same frame of discernment, Dempster's
combination rule, shown below, computes a new belief function that rep-
resents the impact of the combined evidence.
Concerning the validity of this rule, Shafer (1976) writes that although
he can provide "no conclusive a priori argument, ... it does seem to reflect
the pooling of evidence." In the special case of a frame of discernment
containing two elements, Dempster's rule can be found in Johann Heinrich
Lambert's book, Neues Organon, published in 1764. In another special case
where the two bpas give support to exactly one and the same hypothesis,
the rule reduces to that found in the MYCIN CF model and in Ars Con-
jectandi, the work of the mathematician Jakob Bernoulli in 1713.
The Dempster combination rule differs from the MYCIN combining
function in the pooling of evidence supporting mutually exclusive hy-
potheses. For example, evidence supporting hepatitis reduces belief in each
of the singleton hypotheses (CIRR, GALL, and PAN) and in any dis-
junction not containing HEP, e.g., CIRR-OR-GALL-OR-PAN, NOT-HEP,
CIRR-OR-PAN, etc. As we discuss later, if the Dempster-Shafer model
were adapted for use in MYCIN, each new piece of evidence would have
a wider impact on other hypotheses than it does in the CF model. The
Dempster combination rule also gives rise to a very different result re-
garding belief in a hypothesis when confirming and disconfirming evi-
dence is pooled.
Let Bel1 and Bel2 and m1 and m2 denote two belief functions and their
respective bpas. Dempster's rule computes a new bpa, denoted m1 ⊕ m2,
which represents the combined effect of m1 and m2. The corresponding
belief function, denoted Bel1 ⊕ Bel2, is then easily computed from m1 ⊕ m2
by the definition of a belief function.
If we sum all products of the form m1(X)m2(Y), where X and Y run
over all subsets of O, the result is 1 by elementary algebra and the definition
of a bpa:

ΣΣ m1(X) m2(Y) = Σ m1(X) · Σ m2(Y) = 1 · 1 = 1   (1)
The bpa representing the combination of m1 and m2 apportions this num-
ber 1, the total amount of belief, among the subsets of O by assigning
m1(X)m2(Y) to the intersection of X and Y. Note that there are typically
several different pairs of subsets of O whose intersection equals that of X and Y.
Thus, for every subset A of O, Dempster's rule defines m1 ⊕ m2(A) to be
the sum of all products of the form m1(X)m2(Y), where X and Y run over
all pairs of subsets whose intersection is A. The commutativity of multiplication
ensures that the rule yields the same value regardless of the order in which
the functions are combined. This is an important property since evidence
aggregation should be independent of the order of its gathering. The
following two examples illustrate the combination rule.

Example 5. As in Examples 2 and 3, suppose that for a given patient
one observation supports intrahepatic cholestasis to degree 0.6 (m1),
whereas another disconfirms hepatitis (i.e., confirms {cirr, gall, pan}) to
degree 0.7 (m2). Then our net belief based on both observations is given
by m1 ⊕ m2. For computational purposes, an "intersection tableau" with
values of m1 and m2 along the rows and columns, respectively, is a helpful
device. Only nonzero values of m1 and m2 need be considered, since if
m1(X) and/or m2(Y) is 0, then the product m1(X)m2(Y) contributes 0 to
m1 ⊕ m2(A), where A is the intersection of X and Y. Entry i,j in the tableau
is the intersection of the subsets in row i and column j. Clearly, some of
these entries may be the same subset. The product of the bpa values is in
parentheses next to the subset. The value of m1 ⊕ m2(A) is computed by
summing all products in the tableau adjacent to A.

                                      m2
                       {cirr, gall, pan} (0.7)        O (0.3)

m1   {hep, cirr} (0.6)   {cirr} (0.42)                {hep, cirr} (0.18)
     O (0.4)             {cirr, gall, pan} (0.28)     O (0.12)

In this example, a subset appears only once in the tableau and m1 ⊕ m2 is
easily computed:

m1 ⊕ m2({cirr}) = 0.42
m1 ⊕ m2({hep, cirr}) = 0.18
m1 ⊕ m2({cirr, gall, pan}) = 0.28
m1 ⊕ m2(O) = 0.12
m1 ⊕ m2 is 0 for all other subsets of O

Since Bel1 ⊕ Bel2 is fairly complex, we give only a few sample values:

Bel1 ⊕ Bel2({hep, cirr}) = m1 ⊕ m2({hep, cirr}) + m1 ⊕ m2({hep})
                         + m1 ⊕ m2({cirr})
                         = 0.18 + 0 + 0.42
                         = 0.60

Bel1 ⊕ Bel2({cirr, gall, pan}) = m1 ⊕ m2({cirr, gall, pan})
                               + m1 ⊕ m2({cirr, gall})
                               + m1 ⊕ m2({cirr, pan})
                               + m1 ⊕ m2({gall, pan}) + m1 ⊕ m2({cirr})
                               + m1 ⊕ m2({gall}) + m1 ⊕ m2({pan})
                               = 0.28 + 0 + 0 + 0 + 0.42 + 0 + 0
                               = 0.70

Bel1 ⊕ Bel2({hep, cirr, pan}) = Bel1 ⊕ Bel2({hep, cirr}) = 0.60

since

m1 ⊕ m2({hep, cirr, pan}) = m1 ⊕ m2({hep, pan}) = m1 ⊕ m2({cirr, pan}) = 0

In this example, the reader should note that m1 ⊕ m2 satisfies the def-
inition of a bpa: Σ m1 ⊕ m2(X) = 1, where X runs over all subsets of O, and
m1 ⊕ m2(Ø) = 0. Equation (1) shows that the first condition in the definition
is always fulfilled. However, the second condition is problematic in cases
where the "intersection tableau" contains null entries. This situation did
not occur in Example 5 because every two sets with nonzero bpa values
always had at least one element in common. In general, nonzero products
of the form m1(X)m2(Y) may be assigned to Ø when X and Y have an empty
intersection.
Dempster deals with this problem by normalizing the assigned values
so that m1 ⊕ m2(Ø) = 0 and all values of the new bpa lie between 0 and 1.
This is accomplished by defining K as the sum of all nonzero values as-
signed to Ø in a given case (K = 0 in Example 5). Dempster then assigns 0
to m1 ⊕ m2(Ø) and divides all other values of m1 ⊕ m2 by 1 - K.¹

Example 6. Suppose now that, for the same patient as in Example 5,
a third observation (m3) confirms the diagnosis of hepatitis to the degree
0.8 (cf. Example 4). We now need to compute m3 ⊕ m4, where m4 = m1 ⊕ m2
of Example 5.

                                       m4 = m1 ⊕ m2
                {cirr} (0.42)    {hep, cirr} (0.18)      {cirr, gall, pan} (0.28)      O (0.12)

m3  {hep} (0.8)  Ø (0.336)       {hep} (0.144)           Ø (0.224)                     {hep} (0.096)
    O (0.2)      {cirr} (0.084)  {hep, cirr} (0.036)     {cirr, gall, pan} (0.056)     O (0.024)

In this example, there are two null entries in the tableau, one assigned
the value 0.336 and the other 0.224. Thus

K = 0.336 + 0.224 = 0.56   and   1 - K = 0.44

m3 ⊕ m4({hep}) = (0.144 + 0.096)/0.44 = 0.545
m3 ⊕ m4({cirr}) = 0.084/0.44 = 0.191
m3 ⊕ m4({hep, cirr}) = 0.036/0.44 = 0.082
m3 ⊕ m4({cirr, gall, pan}) = 0.056/0.44 = 0.127
m3 ⊕ m4(O) = 0.024/0.44 = 0.055
m3 ⊕ m4 is 0 for all other subsets of O

Note that Σ m3 ⊕ m4(X) = 1, as is required by the definition of a bpa.
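The whole tableau-and-normalize procedure fits in a few lines of code. The sketch below (our frozenset-dictionary representation) reproduces the numbers of Example 6:

```python
def combine(m1, m2):
    """Dempster's rule: assign m1(X)*m2(Y) to X & Y, then remove the
    mass K that lands on the empty set and divide the rest by 1 - K."""
    raw = {}
    for x, vx in m1.items():
        for y, vy in m2.items():
            key = x & y                         # set intersection
            raw[key] = raw.get(key, 0.0) + vx * vy
    k = raw.pop(frozenset(), 0.0)               # total conflict mass
    return {s: v / (1.0 - k) for s, v in raw.items()}

theta = frozenset({"hep", "cirr", "gall", "pan"})
m4 = {frozenset({"cirr"}): 0.42, frozenset({"hep", "cirr"}): 0.18,
      frozenset({"cirr", "gall", "pan"}): 0.28, theta: 0.12}  # Example 5 result
m3 = {frozenset({"hep"}): 0.8, theta: 0.2}                    # Example 4
m34 = combine(m3, m4)
```

The commutativity of the double loop makes the result independent of the order in which the bpas are supplied.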

13.1.5 Belief Intervals

After all bpas with the same frame of discernment have been combined
and the belief function Bel defined by this new bpa has been computed,
how should the information given by Bel be used? Bel(A) gives the total

¹Note that the revised values will still sum to 1 and hence satisfy that condition in the defi-
nition of a bpa: if a + b + c = 1, then (a + b)/(1 - c) = 1 and a/(1 - c) + b/(1 - c) = 1.

amount of belief committed to the subset A after all evidence bearing on
A has been pooled. However, the function Bel contains additional infor-
mation about A, namely, Bel(A^c), the extent to which the evidence supports
the negation of A, i.e., A^c. The quantity 1 - Bel(A^c) expresses the plausibility
of A, i.e., the extent to which the evidence allows one to fail to doubt A.
The information contained in Bel concerning a given subset A may be
conveniently expressed by the interval

[Bel(A), 1 - Bel(A^c)]

It is not difficult to see that the right endpoint is always greater than or
equal to the left: 1 - Bel(A^c) ≥ Bel(A) or, equivalently, Bel(A) + Bel(A^c) ≤ 1. Since
Bel(A) and Bel(A^c) are the sums of all values of m for subsets of A and A^c,
respectively, and since A and A^c have no subsets in common, Bel(A) +
Bel(A^c) ≤ Σ m(X) = 1, where X ranges over all subsets of O.
In the Bayesian situation, in which Bel(A) + Bel(A~) = 1, the two
endpoints of the belief interval are equal and the width of the interval
1 - BeI(A~) - Bel(A) is 0. In the Dempster-Shafer model, however, the
width is usually not 0 and is a measure of the belief that, although not
committedto A, is also not committedto Ac. It is easily seen that the width
is the sum of belief committed exactly to subsets of @that intersect A but
that are not subsets ofA. IfA is a singleton, all such subsets are supersets
of A, but this is not true for a nonsingleton A. To illustrate, let A = {hep}:

1 - Bel(Aᶜ) - Bel(A) = 1 - Bel({cirr, gall, pan}) - Bel({hep})
= 1 - [m({cirr, gall, pan}) + m({cirr, gall})
       + m({cirr, pan}) + m({gall, pan}) + m({cirr})
       + m({gall}) + m({pan})] - m({hep})
= m({hep, cirr}) + m({hep, gall})
  + m({hep, pan}) + m({hep, cirr, gall})
  + m({hep, cirr, pan})
  + m({hep, gall, pan}) + m(Θ)

Belief committed to a superset of {hep} might, on further refinement
of the evidence, result in belief committed to {hep}. Thus the width of the
belief interval is a measure of that portion of the total belief, 1, that could
be added to that committed to {hep} by a physician willing to ignore all but
the disconfirming effects of the evidence.
The width of a belief interval can also be regarded as the amount of
uncertainty with respect to a hypothesis, given the evidence. It is belief
that is committed by the evidence to neither the hypothesis nor the negation
of the hypothesis. The vacuous belief function results in width 1
for all belief intervals, and Bayesian functions result in width 0. Most evidence
leads to belief functions with intervals of varying widths, where the
widths are numbers between 0 and 1.
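These definitions can be sketched directly, reusing the combined bpa m3 ⊕ m4 from Example 6; the function names `bel` and `belief_interval` are ours, not MYCIN's.

```python
def bel(m, a):
    """Bel(A): total mass committed to subsets of A."""
    return sum(v for s, v in m.items() if s <= a)   # <= is subset test

def belief_interval(m, theta, a):
    """[Bel(A), 1 - Bel(A^c)]: support and plausibility of A."""
    return bel(m, a), 1.0 - bel(m, theta - a)

THETA = frozenset({"hep", "cirr", "gall", "pan"})
# the combined bpa m3 (+) m4 computed in Example 6
m = {frozenset({"hep"}): 0.545,
     frozenset({"cirr"}): 0.191,
     frozenset({"hep", "cirr"}): 0.082,
     frozenset({"cirr", "gall", "pan"}): 0.127,
     THETA: 0.055}

lo, hi = belief_interval(m, THETA, frozenset({"hep"}))
# width hi - lo = m({hep,cirr}) + m(Θ) = 0.082 + 0.055 = 0.137,
# exactly the mass on sets that intersect {hep} without being subsets of it
```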

13.2The Dempster-Shafer Theory and MYCIN

MYCIN is well suited for implementation of the Dempster-Shafer theory.
First, mutual exclusivity of singletons in a frame of discernment is satisfied
by the sets of hypotheses in MYCIN constituting the frames of discernment
(single-valued parameters; see Chapter 5). This condition may be a stumbling
block to the model's implementation in other expert systems where
mutual exclusivity cannot be assumed. Second, the belief functions that
represent evidence in MYCIN are of a particularly simple form and thus
reduce the combination rule to an easily managed computational scheme.
Third, the variables and functions already used to define CFs can be
adapted and modified for belief function values. These features will now
be discussed and illustrated with examples from MYCIN. It should be
noted that we have not yet implemented the model in MYCIN.

13.2.1 Frames of Discernment in MYCIN

How should the frames of discernment in MYCIN be chosen? Shafer
(1976, p. 36) points out:

It should not be thought that the possibilities that comprise Θ will be
determined and meaningful independently of our knowledge. Quite to the
contrary: Θ will acquire its meaning from what we know or think we know;
the distinctions that it embodies will be embedded within the matrix of our
language and its associated conceptual structures and will depend on those
structures for whatever accuracy and meaningfulness they possess.

The "conceptual structures" in MYCIN are the associative triples
found in the conclusions of the rules, which have the form (object attribute
value).² Such a triple gives rise to a singleton hypothesis of the form "the
attribute of object is value." A frame of discernment would then consist of
all triples with the same object and attribute. Thus the number of triples,
or hypotheses in Θ, will equal the number of possible values that the object
may assume for the attribute in question. The theory requires that these
values be mutually exclusive, as they are for single-valued parameters in
MYCIN.
For example, one frame of discernment is generated by the set of all
triples of the form (Organism-1 Identity X), where X ranges over all possible
identities of organisms known to MYCIN: Klebsiella, E. coli, Pseudomonas,
etc. Another frame is generated by replacing Organism-1 with Organism-2.
A third frame is the set of all triples of the form (Organism-1 Morphology

²Also referred to as (context parameter value); see Chapter 5.


X), where X ranges over all known morphologies: coccus, rod, bacillus,
etc.³
Although it is true that a patient may be infected by more than one
organism, organisms are represented as separate contexts in MYCIN (not
as separate values of the same parameter). Thus MYCIN's representation
scheme is particularly well suited to the mutual exclusivity demand of the
Dempster-Shafer theory. Many other expert systems meet this demand less
easily. Consider, for example, how the theory might be applicable in a
system that gathers and pools evidence concerning the identity of a patient's
disease. Then there is often the problem of multiple, coexistent
diseases; i.e., the hypotheses in the frame of discernment may not be mutually
exclusive. One way to overcome this difficulty is to choose Θ to be
the set of all subsets of all possible diseases. The computational implications
of this choice are harrowing, since if there are 600 possible diseases (the
approximate scope of the INTERNIST knowledge base), then

|Θ| = 2^600 and |2^Θ| = 2^(2^600)

However, since the evidence may actually focus on a small subset of 2^Θ,
the computations need not be intractable. A second, more reasonable alternative
would be to apply the Dempster-Shafer theory after partitioning
the set of diseases into groups of mutually exclusive diseases and considering
each group as a separate frame of discernment. The latter approach
would be similar to that used in INTERNIST-1 (Miller et al., 1982), where
scoring and comparison of hypotheses are undertaken only after a special
partitioning algorithm has separated evoked hypotheses into subsets of
mutually exclusive diagnoses.

13.2.2 Rules as Belief Functions

In the most general situation, a given piece of evidence supports many of
the subsets of Θ, each to varying degrees. The simplest situation is that in
which the evidence supports only one subset to a certain degree and the
remaining belief is assigned to Θ. Because of the modular way in which
knowledge is captured and encoded in MYCIN, this latter situation applies
in the case of MYCIN rules.
If the premises confirm the conclusion of a rule with degree s, where
s is above the threshold value, then the rule's effect on belief in the subsets of
³The objection may be raised that in some cases all triples with the same object and attribute
are not mutually exclusive. For example, both (Patient-1 Allergy Penicillin) and (Patient-1
Allergy Ampicillin) may be true. In MYCIN, however, these triples tend not to have partial
degrees of belief associated with them; they are usually true-false propositions ascertained
by simple questioning of the user by the system. Thus it is seldom necessary to combine
evidence regarding these multi-valued parameters (see Chapter 5), and these hypotheses need
not be treated by the Dempster-Shafer theory.

Θ can be represented by a bpa. This bpa assigns s to the singleton corresponding
to the hypothesis in the conclusion of the rule, call it A, and
assigns 1 - s to Θ. In the language of MYCIN, the CF associated with this
conclusion is s. If the premise disconfirms the conclusion with degree s,
then the bpa assigns s to the subset corresponding to the negation of the
conclusion, Aᶜ, and assigns 1 - s to Θ. The CF associated with this conclusion
is -s. Thus we are arguing that the CFs associated with rules
in MYCIN and other EMYCIN systems can be viewed as bpas in the
Dempster-Shafer sense and need not be changed in order to implement
and test the Dempster-Shafer model.
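Under this view, the CF-to-bpa mapping can be sketched in a few lines; the function name and the set encoding are our own illustration, not MYCIN code.

```python
def rule_to_bpa(value, cf, theta):
    """A rule concluding `value` with certainty factor `cf` yields a
    simple bpa: mass |cf| on the singleton (cf > 0) or on its
    complement (cf < 0), and the remaining 1 - |cf| on the frame."""
    singleton = frozenset({value})
    focal = singleton if cf >= 0 else theta - singleton
    return {focal: abs(cf), theta: 1.0 - abs(cf)}

THETA = frozenset({"Pseu", "Staph", "Strep"})
m = rule_to_bpa("Pseu", -0.6, THETA)   # a rule disconfirming {Pseu}
# mass 0.6 on {Staph, Strep} = {Pseu}^c, mass 0.4 on Θ
```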

13.2.3 Types of Evidence Combination in MYCIN

The revised quantification scheme we propose for modeling inexact inference
in MYCIN is the replacement of the previous CF combining function
with the Dempster combination rule applied to belief functions arising
from the triggering of domain rules. The combination of such functions
is computationally simple, especially when compared to that of two general
belief functions.
To illustrate, we consider a frame of discernment, Θ, consisting of all
associative triples of the form (Organism-1 Identity X), where X ranges
over all identities of organisms known to MYCIN. The triggering of two
rules affecting belief in these triples can be categorized in one of the three
following ways.

Category 1. Two rules are both confirming or both disconfirming of
the same triple, or conclusion. For example, both rules confirm Pseudomonas
(Pseu), one to degree 0.4 and the other to degree 0.7. The effect of triggering
the rules is represented by bpas m1 and m2, where m1({Pseu}) = 0.4,
m1(Θ) = 0.6, and m2({Pseu}) = 0.7, m2(Θ) = 0.3. The combined effect on
belief is given by m1 ⊕ m2, computed using the following tableau:

                         m2
                   {Pseu} (0.7)    Θ (0.3)
m1  {Pseu} (0.4)   {Pseu} (0.28)   {Pseu} (0.12)
    Θ (0.6)        {Pseu} (0.42)   Θ (0.18)

Note that K = 0 in this example, so no normalization is required (i.e.,
1 - K = 1).

m1 ⊕ m2({Pseu}) = 0.28 + 0.12 + 0.42 = 0.82
m1 ⊕ m2(Θ) = 0.18

Note that m1 ⊕ m2 is a bpa that, like m1 and m2, assigns some belief to a
certain subset of Θ, {Pseu}, and the remaining belief to Θ. For two confirming
rules, the subset is a singleton; for two disconfirming rules, the subset
is a set of size n - 1, where n is the size of Θ.

This category demonstrates that the original MYCIN CF combining
function is a special case of the Dempster function (MYCIN would also
combine 0.4 and 0.7 to get 0.82). From earlier definitions, it can easily be
shown, using the Dempster-Shafer model to derive a new bpa corresponding
to the combination of two CFs of the same sign, that

m1 ⊕ m2(A) = s1s2 + s1(1 - s2) + s2(1 - s1), where si = mi(A), i = 1, 2
           = s1 + s2(1 - s1)
           = s2 + s1(1 - s2)
           = 1 - (1 - s1)(1 - s2)
           = 1 - m1 ⊕ m2(Θ)
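The closed form is easy to check numerically (the helper name is ours):

```python
def combine_confirming(s1, s2):
    """Two rules confirming the same singleton (Category 1): the
    combined support is 1 - (1 - s1)(1 - s2), which is also what
    MYCIN's CF combining function gives for two positive CFs."""
    return 1.0 - (1.0 - s1) * (1.0 - s2)

both = combine_confirming(0.4, 0.7)    # 0.82, as in the tableau above
```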

Category 2. One rule is confirming and the other disconfirming of the
same singleton hypothesis. For example, one rule confirms {Pseu} to degree
0.4, and the other disconfirms {Pseu} to degree 0.8. The effect of triggering
these two rules is represented by bpas m1 and m3, where m1 is defined in
the example from Category 1 and m3({Pseu}ᶜ) = 0.8, m3(Θ) = 0.2. The combined
effect on belief is given by m1 ⊕ m3.

                         m3
                   {Pseu}ᶜ (0.8)    Θ (0.2)
m1  {Pseu} (0.4)   ∅ (0.32)         {Pseu} (0.08)
    Θ (0.6)        {Pseu}ᶜ (0.48)   Θ (0.12)

Here K = 0.32 and 1 - K = 0.68.

m1 ⊕ m3({Pseu}) = 0.08/0.68 = 0.118
m1 ⊕ m3({Pseu}ᶜ) = 0.48/0.68 = 0.706
m1 ⊕ m3(Θ) = 0.12/0.68 = 0.176
m1 ⊕ m3 is 0 for all other subsets of Θ

Given m1 above, the belief interval of {Pseu} is initially [Bel1({Pseu}),
1 - Bel1({Pseu}ᶜ)] = [0.4, 1]. After combination with m3, it becomes
[0.118, 0.294]. Similarly, given m3 alone, the belief interval of {Pseu} is
[0, 0.2]. After combination with m1, it becomes [0.118, 0.294].

As is illustrated in this category of evidence aggregation, an essential
aspect of the Dempster combination rule is the reducing effect of evidence
supporting a subset of Θ on belief in subsets disjoint from this subset. Thus
evidence confirming {Pseu}ᶜ will reduce the effect of evidence confirming
{Pseu}; in this case the degree of support for {Pseu}, 0.4, is reduced to
0.118. Conversely, evidence confirming {Pseu} will reduce the effect of
evidence confirming {Pseu}ᶜ; 0.8 is reduced to 0.706. These two effects are
reflected in the modification of the belief interval of {Pseu} from [0.4, 1]
to [0.118, 0.294], where 0.294 = 1 - Bel({Pseu}ᶜ) = 1 - 0.706.
If A = {Pseu}, s1 = m1(A), and s3 = m3(Aᶜ), we can examine this modification
of belief quantitatively:

m1 ⊕ m3(A) = s1(1 - s3)/(1 - s1s3), where K = s1s3
m1 ⊕ m3(Aᶜ) = s3(1 - s1)/(1 - s1s3)
m1 ⊕ m3(Θ) = (1 - s1)(1 - s3)/(1 - s1s3)

Thus s1 is multiplied by the factor (1 - s3)/(1 - s1s3), and s3 is multiplied by
(1 - s1)/(1 - s1s3). Each of these factors is less than or equal to 1.⁴ Thus
combination of confirming and disconfirming evidence reduces the support
provided by each before combination.
Consider the application of the MYCIN CF combining function to this
situation. If CFp is the positive (confirming) CF for {Pseu} and CFn is the
negative (disconfirming) CF:⁵

CF_COMBINE[CFp, CFn] = (CFp + CFn)/(1 - min{|CFp|, |CFn|})
                     = (s1 - s3)/(1 - min{s1, s3})
                     = (0.4 - 0.8)/(1 - 0.4)
                     = -0.667

When this CF is translated into the language of Dempster-Shafer, the result
of the MYCIN combining function is belief in {Pseu} and {Pseu}ᶜ to the
degrees 0 and 0.667, respectively. The larger disconfirming evidence of
0.8 essentially negates the smaller confirming evidence of 0.4. The confirming
evidence reduces the effect of the disconfirming from 0.8 to 0.667.
By examining CF_COMBINE, it is easily seen that its application to CFs
of opposite sign results in a CF whose sign is that of the CF of greater
magnitude. Thus support for A and Aᶜ is combined into reduced support
for one or the other. In contrast, the Dempster function results in reduced
support for both A and Aᶜ. The Dempster function seems to us a more
realistic reflection of the competing effects of conflicting pieces of evidence.
Looking more closely at the value of 0.667 computed by the MYCIN
function, we observe that its magnitude is less than that of the corresponding
value of 0.706 computed by the Dempster function. It can be shown
that the MYCIN function always results in greater reductions. To summarize,
if s1 and s3 represent support for A and Aᶜ, respectively, with
s1 ≥ s3, and if s1′ and s3′ represent support after Dempster combination,
then the MYCIN function results in support for only A, where this support
is less than s1′. Similarly, if s3 ≥ s1, the MYCIN function results in support
for only Aᶜ, where the magnitude of this support is less than s3′.

⁴s1s3 ≤ si implies 1 - s1s3 ≥ 1 - si, which implies (1 - si)/(1 - s1s3) ≤ 1 for i = 1, 3.

⁵See Section 10.2 for a discussion of this modified version of the original CF combining
function, which was defined and defended in Chapter 11.
The difference in the two approaches is most evident in the case of
aggregation of two pieces of evidence, one confirming A to degree s and
the other disconfirming A to the same degree. MYCIN's function yields
CF = 0, whereas the Dempster rule yields belief of s(1 - s)/(1 - s²) = s/(1 + s)
in each of A and Aᶜ. These results are clearly very different, and again the
Dempster rule seems preferable on the grounds that the effect of confirming
and disconfirming evidence of the same weight should be different
from that of no evidence at all.
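The contrast between the two combining functions can be checked numerically; `dempster_conflict` and `cf_combine` are illustrative names of ours, applied to the Category 2 numbers from above.

```python
def dempster_conflict(s1, s3):
    """Category 2: s1 confirms A, s3 confirms A^c.  Returns the
    combined masses on A and on A^c; K = s1*s3 is the conflict."""
    k = s1 * s3
    return s1 * (1 - s3) / (1 - k), s3 * (1 - s1) / (1 - k)

def cf_combine(cf_p, cf_n):
    """MYCIN's combining function for CFs of opposite sign."""
    return (cf_p + cf_n) / (1 - min(abs(cf_p), abs(cf_n)))

on_a, on_ac = dempster_conflict(0.4, 0.8)   # ≈ 0.118 and 0.706
cf = cf_combine(0.4, -0.8)                  # ≈ -0.667
```

The Dempster rule keeps (reduced) support on both sides, while the CF function collapses the conflict onto the stronger side only.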
We now examine the effect on belief of combination of two pieces of
evidence supporting mutually exclusive singleton hypotheses. The MYCIN
combining function results in no effect and differs most significantly from
the Dempster rule in this case.

Category 3. The rules involve different hypotheses in the same frame
of discernment. For example, one rule confirms {Pseu} to degree 0.4, and
the other disconfirms {Strep} to degree 0.7. The triggering of the second
rule gives rise to m4, defined by m4({Strep}ᶜ) = 0.7, m4(Θ) = 0.3. The combined
effect on belief is given by m1 ⊕ m4.

                         m4
                   {Strep}ᶜ (0.7)    Θ (0.3)
m1  {Pseu} (0.4)   {Pseu} (0.28)     {Pseu} (0.12)
    Θ (0.6)        {Strep}ᶜ (0.42)   Θ (0.18)

In this case, K = 0.

m1 ⊕ m4({Pseu}) = 0.28 + 0.12 = 0.40
m1 ⊕ m4({Strep}ᶜ) = 0.42
m1 ⊕ m4(Θ) = 0.18
m1 ⊕ m4 is 0 for all other subsets of Θ

Bel1 ⊕ Bel4({Pseu}) = 0.40
Bel1 ⊕ Bel4({Strep}ᶜ) = m1 ⊕ m4({Strep}ᶜ) + m1 ⊕ m4({Pseu})
                      = 0.42 + 0.40
                      = 0.82
Bel1 ⊕ Bel4({Pseu}ᶜ) = Bel1 ⊕ Bel4({Strep}) = 0

Before combination, the belief intervals for {Pseu} and {Strep}ᶜ are
[0.4, 1] and [0.7, 1], respectively. After combination, they are [0.4, 1] and
[0.82, 1], respectively. Note that evidence confirming {Pseu} has also confirmed
{Strep}ᶜ, a superset of {Pseu}, but that evidence confirming {Strep}ᶜ
has had no effect on belief in {Pseu}, a subset of {Strep}ᶜ.

13.2.4 Evidence Combination Scheme

We now propose an implementation in MYCIN of the Dempster-Shafer
method, which minimizes computational complexity. Barnett (1981) claims
that direct translation of the theory, without attention to the order in which
the belief functions representing rules are combined, results in exponential
increases in the time for computations. This is due to the need to enumerate
all subsets or supersets of a given set. Barnett's scheme reduces the
computations to linear time by combining the functions in a simplifying
order. We outline his scheme adapted to MYCIN.

Step 1. For each triple (i.e., singleton hypothesis), combine all bpas
representing rules confirming that value of the parameter. If s1, s2, ..., sk
represent different degrees of support derived from the triggering of k
rules confirming a given singleton, then the combined support is

1 - (1 - s1)(1 - s2) ... (1 - sk)

(Refer to Category 1 combinations above if this is not obvious.) Similarly,
for each singleton, combine all bpas representing rules disconfirming that
singleton. Thus all evidence confirming a singleton is pooled and represented
by a bpa, and all evidence disconfirming the singleton (confirming
the hypothesis corresponding to the set complement of the singleton) is
pooled and represented by another bpa. We thus have 2n bpas, where n
is the size of Θ. These functions all have the same form as the original
functions. This step is identical to the original approach for gathering
confirming and disconfirming evidence into MBs and MDs, respectively.

Step 2. For each triple, combine the two bpas computed in Step 1.
Such a computation is a Category 2 combination and has been illustrated.
We now have n bpas, which are denoted Evi1, Evi2, ..., Evin.

Step 3. Combine the bpas computed in Step 2 in one computation,
using formulae developed by Barnett (1981), to obtain a final belief function
Bel. A belief interval for each singleton hypothesis can then be computed.
The form of the required computation is shown here without proof.
See Barnett (1981) for a complete derivation.
Let {i} represent the ith of n singleton hypotheses in Θ and let

Evi_i({i}) = p_i
Evi_i({i}ᶜ) = c_i
Evi_i(Θ) = r_i

Since p_i + c_i + r_i = 1, r_i = 1 - p_i - c_i. Let d_i = c_i + r_i. Then it can be
shown that the function Bel resulting from combination of Evi_1, ..., Evi_n
is given by

Bel({i}) = K[p_i ∏(j≠i) d_j + r_i ∏(j≠i) c_j]

For a subset A of Θ with |A| > 1,

Bel(A) = K([∏(all j) d_j][Σ(j∈A) p_j/d_j] + [∏(j∉A) c_j][∏(j∈A) d_j] - ∏(all j) c_j)

where

K⁻¹ = [∏(all j) d_j][1 + Σ(all j) p_j/d_j] - ∏(all j) c_j

as long as p_j ≠ 1 for all j.
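The singleton formulas translate directly into a short routine. This is our own sketch (function name and the list-of-pairs encoding are assumptions, not Barnett's notation); it is exercised below with the Evi values that Example 7 derives.

```python
from math import prod

def barnett_bel(evi):
    """Linear-time combination (after Barnett, 1981) of n simple
    bpas, one per singleton.  `evi` is a list of (p_i, c_i) pairs:
    the pooled confirming and disconfirming support for singleton i.
    Returns K and the list of Bel({i}) values."""
    p = [pi for pi, ci in evi]
    c = [ci for pi, ci in evi]
    d = [1.0 - pi for pi in p]          # d_i = c_i + r_i = 1 - p_i
    k_inv = prod(d) * (1.0 + sum(pi / di for pi, di in zip(p, d))) - prod(c)
    K = 1.0 / k_inv
    bels = []
    for i in range(len(evi)):
        prod_d = prod(d[j] for j in range(len(d)) if j != i)
        prod_c = prod(c[j] for j in range(len(c)) if j != i)
        r_i = d[i] - c[i]
        bels.append(K * (p[i] * prod_d + r_i * prod_c))
    return K, bels

# (p_i, c_i) for Pseu, Staph, Strep, as computed in Example 7
K, bels = barnett_bel([(0.064, 0.786), (0.318, 0.545), (0.58, 0.0)])
```

With these inputs, `bels` comes out near [0.023, 0.160, 0.704], matching the Bel values derived by hand in Example 7.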


An Example

The complex formulation for combining belief functions shown above is
computationally straightforward for limited numbers of competing hypotheses
such as are routinely encountered in medical domains. As we
noted earlier, the INTERNIST program (Miller et al., 1982) partitions its
extensive set of possible diagnoses into a limited subset of likely diseases
that could be seen as the current frame of discernment. There are likely
to be knowledge-based heuristics that can limit the search space in other
domains and thereby make calculations of a composite belief function tenable.

Example 7. Consider, for example, the net effect of the following set
of rules regarding the diagnosis of the infecting organism. Assume that all
other rules failed and that the final conclusion about the beliefs in competing
hypotheses will be based on the following successful rules:

R1: disconfirms {Pseu} to the degree 0.6
R2: disconfirms {Pseu} to the degree 0.2
R3: confirms {Strep} to the degree 0.4
R4: disconfirms {Staph} to the degree 0.8
R5: confirms {Strep} to the degree 0.3
R6: disconfirms {Pseu} to the degree 0.5
R7: confirms {Pseu} to the degree 0.3
R8: confirms {Staph} to the degree 0.7

Note, here, that Θ = {Staph, Strep, Pseu} and that for this example
we are making the implicit assumption that the patient has an infection
with one of these organisms.

Step 1. Considering first confirming and then disconfirming evidence
for each organism, we obtain:

{Pseu} confirmed to the degree s1 = 0.3, disconfirmed to the degree
s1′ = 1 - (1 - 0.6)(1 - 0.2)(1 - 0.5) = 0.84
{Staph} confirmed to the degree s2 = 0.7, disconfirmed to the degree
s2′ = 0.8
{Strep} confirmed to the degree s3 = 1 - (1 - 0.4)(1 - 0.3) = 0.58,
disconfirmed to the degree s3′ = 0

Step 2. Combining the confirming and disconfirming evidence for
each organism, we obtain:

Evi1({Pseu}) = 0.3(1 - 0.84)/(1 - (0.3)(0.84)) = 0.064 = p1
Evi1({Pseu}ᶜ) = 0.84(1 - 0.3)/(1 - (0.3)(0.84)) = 0.786 = c1

Thus r1 = 0.15 and d1 = 0.786 + 0.15 = 0.936.

Evi2({Staph}) = 0.7(1 - 0.8)/(1 - (0.7)(0.8)) = 0.318 = p2
Evi2({Staph}ᶜ) = 0.8(1 - 0.7)/(1 - (0.7)(0.8)) = 0.545 = c2

Thus r2 = 0.137 and d2 = 0.545 + 0.137 = 0.682.

Evi3({Strep}) = 0.58 = p3
Evi3({Strep}ᶜ) = 0 = c3

Thus r3 = 0.42 and d3 = 0.42.

Step 3. Assessing the effects of belief in the various organisms on each
other, we obtain:

K⁻¹ = d1d2d3(1 + p1/d1 + p2/d2 + p3/d3) - c1c2c3
    = (0.936)(0.682)(0.42)(1 + 0.064/0.936 + 0.318/0.682
      + 0.58/0.42) - (0.786)(0.545)(0)
    = 0.268(1 + 0.068 + 0.466 + 1.381)
    = 0.781
K = 1.28

Bel({Pseu}) = K(p1d2d3 + r1c2c3)
            = 1.28((0.064)(0.682)(0.42) + (0.15)(0.545)(0))
            = 0.023
Bel({Staph}) = K(p2d1d3 + r2c1c3)
             = 1.28((0.318)(0.936)(0.42) + (0.137)(0.786)(0))
             = 0.160
Bel({Strep}) = K(p3d1d2 + r3c1c2)
             = 1.28((0.58)(0.936)(0.682) + (0.42)(0.786)(0.545))
             = 0.704

Bel({Pseu}ᶜ) = K(d1d2d3(p2/d2 + p3/d3) + c1d2d3 - c1c2c3)
             = 1.28(0.268(0.466 + 1.381) + (0.786)(0.682)(0.42))
             = 0.922
Bel({Staph}ᶜ) = K(d1d2d3(p1/d1 + p3/d3) + c2d1d3 - c1c2c3)
              = 1.28(0.268(0.068 + 1.381) + (0.545)(0.936)(0.42))
              = 0.771
Bel({Strep}ᶜ) = K(d1d2d3(p1/d1 + p2/d2) + c3d1d2 - c1c2c3)
              = 1.28(0.268(0.068 + 0.466) + 0)
              = 0.184

The final belief intervals are therefore:

Pseu: [0.023, 0.078]   Staph: [0.160, 0.229]   Strep: [0.704, 0.816]

13.3 Conclusion
The Dempster-Shafer theory is particularly appealing in its potential for
handling evidence bearing on categories of diseases as well as on specific
disease entities. It facilitates the aggregation of evidence gathered at vary-
ing levels of detail or specificity. Thus collaborating experts could specify
rules that refer to semantic concepts at whatever level in the domain hi-
erarchy is most natural and appropriate. They would not be limited to the
most specific level--the singleton hypotheses of their frame of discern-
ment--but would be free to use more unifying concepts.
In a system in which all evidence either confirms or disconfirms singleton
hypotheses, the combination of evidence via the Dempster scheme
is computationally simple if ordered appropriately. Due to its present rule
format, MYCIN provides an excellent setting in which to implement the
theory. Claims by others that MYCIN is ill-suited to this implementation
due to failure to satisfy the mutual exclusivity requirement (Barnett, 1981)
reflect a misunderstanding of the program's representation and control
mechanisms. Multiple diseases are handled by instantiating each as a separate
context; within a given context, the requirements of single-valued
parameters maintain mutual exclusivity.
In retrospect, however, we recognize that the hierarchical relationships
that exist in the MYCIN domain are not adequately represented. For ex-
ample, evidence suggesting Enterobacteriaceae (a family of gram-negative
rods) could have explicitly stated that relationship rather than depending
on rules in which an observation supported a list of gram-negative orga-
nisms with varying CFs based more on guesswork than on solid data. The
evidence really supported the higher-level concept, Enterobacteriaceae, and
further breakdown may have been unrealistic. In actual practice, decisions
about treatment are often made on the basis of high-level categories rather
than specific organism identities (e.g., "I'm pretty sure this is an enteric
organism, and would therefore treat with an aminoglycoside and a
cephalosporin, but I have no idea which of the enteric organisms is causing the
disease").
If the MYCIN knowledge base were restructured in a hierarchical
fashion so as to allow reasoning about unifying high-level concepts as well
as about the competing singleton hypotheses, then the computations of
the Dempster-Shafer theory would increase exponentially in complexity.
The challenge is therefore to make these computations tractable, either by
a modification of the theory or by restricting the evidence domain in a
reasonable way. Further work should be directed to this end.
PART FIVE

Generalizing MYCIN
14
Use of the MYCIN
Inference Engine

One of the reasons for undertaking the original MYCIN experiment was
to test the hypothesis that domain-specific knowledge could successfully be
kept separate from the inference procedures. We felt we had done just
that in the original implementation; specifically, we believed that knowledge
of a new domain, when encoded in rules, could be substituted for
MYCIN's knowledge of infectious diseases and that no changes to the inference
procedures were required to produce MYCIN-like consultations.
In the fall of 1974 Bill van Melle began to investigate our claim seriously.
He wrote (van Melle, 1974):

The MYCIN program for infectious disease diagnosis claims to be general.
One ought to be able to take out the clinical knowledge and plug in
knowledge about some other domain. The domain we had in mind was the
diagnosis of failures in machines. We had available a 1975 Pontiac Service
Manual, containing a wealth of diagnostic information, mostly in decision
tree form, with branching on the results of specific mechanical tests. Since
MYCIN's rule base can be viewed as an implicit decision tree, with judgments
based on laboratory test results, it at least seemed plausible that rules could
be written to represent these diagnostic procedures. Because of the need to
understand a system in order to write rules for diagnosing it, a fairly simple
system, the horn circuit, was selected for investigation.

After some consideration, van Melle decided that the problem required
only a degenerate context tree, with "the horn" as the only context,
and that all relevant rules in the Pontiac manual could be written as definitional
rules with no uncertainty. Two rules of his fifteen-rule system are
shown in Figure 14-1.
Much of MYCIN's elaborate mechanism for gathering and weighing
evidence was unnecessary for this simple problem. Nevertheless, the project
provided support for our belief that MYCIN's diagnostic procedures

RULE002
IF:   1) The horn is inoperative is a symptom of the horn, and
      2) The relay does click when the horn button is depressed, and
      3) The test lamp does not light when one end is grounded and the other
         connected to the green wire terminal of the relay while the horn
         button is depressed
THEN: It is definite (1.0) that a diagnosis of the horn is: replace the relay

[HORNRULES]

RULE003
IF:   1) The horn is inoperative is a symptom of the horn, and
      2) The relay does not click when the horn button is depressed, and
      3) The test lamp does light when one end is grounded and the other
         is touched to the black wire terminal of the relay
THEN: It is definite (1.0) that there is an open between the black wire
      terminal of the relay and ground

[HORNRULES]

FIGURE 14-1 English versions of two rules from the first
nonmedical knowledge base for EMYCIN.

were general enough to allow substitutions of new knowledge bases.¹ As a
result, we began the project described in Chapter 15, under the name
EMYCIN.² In Chapter 16 we describe two applications of EMYCIN and
discuss the extent to which building those two systems was easier because
of the framework provided. Remember, too, that the MYCIN system itself
was successfully reimplemented as another instantiation of EMYCIN.
The flexibility needed by MYCIN to extend or modify its knowledge
base was exploited in EMYCIN. Neither the syntax of rules nor the basic
ideas underlying the context tree and inference mechanism were changed.
The main components of an EMYCIN consultation system are described
in Chapter 5, specifically for the original MYCIN program. These are as
follows:

¹It also revealed several places in the code where shortcuts had been taken in keeping medical
knowledge separate. For example, the term organism was used in the code occasionally as
being synonymous with cause.
²We are indebted to Joshua Lederberg for suggesting the phrase Essential MYCIN, i.e.,
MYCIN stripped of its domain knowledge. EMYCIN is written in Interlisp, a programming
environment for a particular dialect of the LISP language, and runs on a DEC PDP-10 or
-20 under the TENEX or TOPS20 operating systems. The current implementation of EMYCIN
uses about 45K words of resident memory and an additional 80K of swapped code
space. The version of Interlisp in which it is embedded occupies about 130K of resident
memory, leaving approximately 80K free for the domain knowledge base and the dynamic
data structures built up during a consultation. A manual detailing the operation of the system
for the prospective system designer is available (van Melle et al., 1981).

Contexts     Objects of interest, organized hierarchically in a tree,
             called the context tree
Parameters   The attributes of objects about which the system reasons
Rules        Associations among object-attribute-value triples

While these concepts were generalized and access to them made simpler,
they are much the same in EMYCIN as they were in the original system.
The major conceptual shift in generalizing MYCIN to EMYCIN was
to focus primarily on the persons who build new systems rather than on
the persons who use them. Much of the interface to users remains unchanged.
The interface to system builders, however, became easier and
more transparent. We were attempting to reduce the time it takes to create
an expert system by reducing the effort of a knowledge engineer in helping
an expert. As discussed in Chapter 16, we believe the experiment was
successful in this respect.
Much of the TEIRESIAS system (discussed in Chapter 9) has been
incorporated in EMYCIN. Thus the debugging facilities are very similar.
In addition, EMYCIN allows rules to be entered in the Abbreviated Rule
Language, called ARL, that simplifies the expression of new relations. For
example, the rule premise

($AND (SAME CNTXT SITE BLOOD)
      (GREATERP* (VAL1 CNTXT SICKDEGREE) 2)
      ($OR (NOTSAME CNTXT BURNED)
           (LESSEQ* (PLUS (VAL1 CNTXT NUMCULS)
                          (VAL1 CNTXT NUMPOS))
                    3)))

might have been entered as either

(SITE = BLOOD AND SICKDEGREE > 2 AND
 -BURNED OR NUMCULS + NUMPOS LE 3)

or

(SITE = BLOOD, SICKDEGREE > 2, -BURNED OR NUMCULS + NUMPOS LE 3)

In the sample EMYCIN dialogue shown in Figure 14-2, EMYCIN can
be seen to ask about contexts, parameters, and rules for a hypothetical
consultation system about business problems. EMYCIN keeps track of the
properties, such as TRANS (for the appropriate English translation of the
concept name), and does the bookkeeping necessary to create a new knowledge
base. Then it helps the user debug the knowledge base in the context
of sample cases.

@<EMYCIN>EMYCIN                  [Run EMYCIN from the appropriate directory.]
EMYCIN 12-Dec-80 ...

Do you wish to create a new knowledge base? Yes


Enter a word or phrase to describe your domain, i.e. fill in the blank in
"this program performs a .... consultation". (e.g. infectious disease, structural analysis).
** business problem
Enter a one-word name for the root of your context tree, the central "object" with which the consultation is
concerned (e.g. PATIENT, STRUCTURE):
** COMPANY                       [The "root" of the context tree]
Please enter a word or phrase as "translation" of COMPANY:
** company -- Okay? Yes          [EMYCIN makes the obvious suggestion; we accept it.
                                  The translation might have been fancier,
                                  say "municipal agency."]
And the plural form is:
** companys -- Okay? No          [EMYCIN doesn't know some spelling rules.]
** companies                     [So spell it right.]
Parameter group: COMPANY-PARMS--Okay? Yes
COMPANY
PRINTID: COMPANY--Okay? Yes [i.e., there will bea COMPANY-I,
COMPANY-2, in consultations]
Entera list of parameters whichwill alwaysbe askedto the user whena COMPANY is created:
COMPANY
INTIALDATA: (CNAMEPROBLEM EMPLOYEES)
Andnowa list of parameters to infer whenever a COMPANY is created:
COMPANY
GOALS:(TOOLS)
Doyouwantthe programto automaticallydisplay the valuesconcludedfor someor all of the GOALS? Yes
[Automaticallyprint the valuesof
TOOLS at endof eachconsultation.]
Nowpleasedefine eachof the parameters listed above.Eachis a member
of COMPANY-PARMS.
Whatkind of parmis CNAME--Yes/No, Singlevalued,Multivalued, or
Ask-All? Slnglevalued
CNAME
EXPECT:ANY
CNAME
TRANS:(THE NAME OF *)
CNAME
PROMPT:(WHATCOMPANY IS HAVINGA PROBLEM?)

Whatkind of parmis PROBLEM--Yes/No,Singlevalued,


Multivalued,or Ask-All?Slnglevalued
PROBLEM
EXPECT:(PAYROLLINVENTORY)
PROBLEM
TRANS:(THE TYPEOF PROBLEM)

FIGURE 14-2 Sample dialogue with EMYCIN to create a new
consultation program for business problems. (User's input is in
boldface, and <cr> indicates that the user typed a carriage re-
turn. Comments are in italics.) [This sample is taken from The
EMYCIN Manual (van Melle et al., 1981).]

PROBLEM PROMPT: (IS THE PROBLEM WITH PAYROLL OR INVENTORY?)

What kind of parm is EMPLOYEES--Yes/No, Singlevalued, Multivalued, or Ask-All? Singlevalued
EMPLOYEES EXPECT: POSNUMB
EMPLOYEES UNITS: <cr>                    [<cr> here gives the property a value of NIL.]
EMPLOYEES RANGE: <cr>
EMPLOYEES TRANS: (THE NUMBER OF EMPLOYEES OF *)
EMPLOYEES PROMPT: (HOW MANY PEOPLE DOES * EMPLOY?)

What kind of parm is TOOLS--Yes/No, Singlevalued, Multivalued, or Ask-All? Multivalued
TOOLS LEGALVALS: TEXT                    [Values produced by CONCLUDETEXT, the results to be printed.]
TOOLS TRANS: (THE TOOLS TO USE IN SOLVING THE PROBLEM)

Okay, now back to COMPANY...             [Now that we've defined those parms, finish defining the context type.]
COMPANY SYN: (((CNAME) (CNAME)))--Yes, No, or Edit?
                                         [The company name (CNAME) will be used to translate a COMPANY.]
Creating rule group COMPANYRULES to apply to COMPANY contexts...

. Autosave ...
Please give a one-word identifier for your knowledge base files:
** BUSINESS
<EMYCIN>CHANGES.BUSINESS;1
Are there any descendants of COMPANY in the context tree? No

Rules, Parms, Go, etc.? Rules

Author of any new rules, if not yourself: <cr>
Will you be entering any of the rule information properties? No
                                         [This is asked upon the first entrance to the rule editor.]
Rule # or NEW: NEW                       [Now enter rules to deduce each of the GOALS defined above;
                                         in this case, just TOOLS.]
RULE001 PREMISE: (PROBLEM = PAYROLL AND EMPLOYEES > 1000)
RULE001 ACTION: (TOOLS = "a large computer")

Translate, No further change, or prop name: TRANSLATE

RULE001
[This rule applies to companies, and is tried in order to find out
about the tools to use in solving the problem]

If: 1) The type of problem is payroll, and
    2) The number of employees of the company is greater than 1000
Then: It is definite (1.0) that the following is one of the tools to use in solving the problem:
      a large computer

Translate, No further change, or prop name: <cr>

Rule # or NEW: <cr>                      [Finished entering rules.]

Rules, Parms, Go, etc.? Save             [Save the knowledge base.]
<EMYCIN>CHANGES.BUSINESS;2
Rules, Parms, Go, etc.? Go               [Run a consultation to test the knowledge base.]
Special options (type ? for help):
** <cr>                                  [No options needed.]

20-Oct-79 14:16:48

-------- COMPANY-1 --------
1) What company is having a problem?
** IBM
2) Is the problem with payroll or inventory?
** PAYROLL
3) What is the number of employees of ibm?
** 10000000
Conclusions: the tools to use in solving the problem are as follows: a large computer.

Enter Debug/review, Rules, Parms, Go, etc.? Parameters

Parameter name: cname                    [A small parameter change--we noted that ibm was not
                                         capitalized. Setting the PROPERNOUN property will fix the problem.]
Property: PROPERNOUN
CNAME PROPERNOUN: T
Property: <cr>

Parameter name: <cr>                     [Finished entering parameters.]

Rules, Parms, Go, etc.? Save             [Save these changes to the knowledge base.]
<EMYCIN>CHANGES.BUSINESS;3

Rules, Parms, Go, etc.? Quit
@

[Some time later ...]
@<EMYCIN>EMYCIN
EMYCIN 12-Dec-80 ...

Hi.

Should I load <EMYCIN>CHANGES.BUSINESS;3? Yes
File created 25-Sep-81 10:49:24
CHANGESCOMS

FIGURE 14-2 continued



(<EMYCIN>CHANGES.BUSINESS;3)

Do you want to enter Rules, Parms, Go, etc. (? for help)? New consultation
[confirm] <cr>
Special options (type ? for help):
** <cr>

23-Feb-81 10:28:37

-------- COMPANY-1 --------
1) What company is having a problem?
** STANFORD
2) Is the problem with payroll or inventory?
** INVENTORY
3) How many people does Stanford employ?
** 10000

I was unable to make any conclusion about the tools to use in solving the problem.
                                         [No rules have yet been entered for making
                                         conclusions about inventory problems.]

Enter Debug/review phase, or other option (? for help)? Quit

FIGURE 14-2 continued


15
EMYCIN: A Knowledge Engineer's Tool for
Constructing Rule-Based Expert Systems

William van Melle, Edward H. Shortliffe, and
Bruce G. Buchanan

Much current work in artificial intelligence focuses on computer programs
that aid scientists with complex reasoning tasks. Recent work has indicated
that one key to the creation of intelligent systems is the incorporation of
large amounts of task-specific knowledge. Building knowledge-based, or
expert, systems from scratch can be very time-consuming, however. This
suggests the need for general tools to aid in the construction of knowledge-
based systems.
This chapter describes an effective domain-independent framework
for constructing one class of expert programs: rule-based consultants. The
system, called EMYCIN, is based on the domain-independent core of the
MYCIN program. We have reimplemented MYCIN as one of the consul-
tation systems that run under EMYCIN.

15.1 The Task

EMYCIN is used to construct a consultation program, by which we mean a
program that offers advice on problems within its domain of expertise.
The consultation program elicits information relevant to the case by asking

This chapter is a shortened and edited version of a paper appearing in Pergamon-Infotech state
of the art report on machine intelligence, pp. 249-263. Maidenhead, Berkshire, U.K.: Infotech
Ltd., 1981.

[Figure 15-1 is a block diagram: the system designer supplies expertise to, and receives debugging feedback from, EMYCIN's knowledge base construction aids, which build the domain knowledge base; EMYCIN's consultation driver then applies that knowledge base to case data to produce advice.]

FIGURE 15-1 The major roles of EMYCIN: acquiring a
knowledge base from the system designer, and interpreting that
knowledge base to provide advice to a client.

questions. It then applies its knowledge to the specific facts of the case and
informs the user of its conclusions. The user is free to ask the program
questions about its reasoning in order to better understand or validate the
advice given.
There are really two "users" of EMYCIN, as depicted in Figure 15-1.
The system designer, or expert, interacts with EMYCIN to produce a knowledge
base for the domain. EMYCIN then interprets this knowledge base to pro-
vide advice to the client, or consultation user. Thus the combination of EMY-
CIN and a specific knowledge base of domain expertise is a new consultation
program. Some instances of such consultation programs are described be-
low.

15.2 Background

Some of the earliest work in artificial intelligence attempted to create gen-
eralized problem solvers. Programs such as GPS (Newell and Simon, 1972)

and theorem provers (Nilsson, 1971), for instance, were inspired by the
apparent generality of human intelligence and motivated by the desire to
develop a single program applicable to many problems. While this early
work demonstrated the utility of many general-purpose techniques (such
as problem decomposition into subgoals and heuristic search in its many
forms), these techniques alone did not offer sufficient power for high per-
formance in complex domains.
Recent work has instead focused on the incorporation of large
amounts of task-specific knowledge in what have been called knowledge-
based systems. Such systems have emphasized high performance based on
the accumulation of large amounts of knowledge about a single domain
rather than on nonspecific problem-solving power. Some examples to date
include efforts at symbolic manipulation of algebraic expressions (Moses,
1971), chemical inference (Lindsay et al., 1980), and medical consultations
(Pople, 1977; Shortliffe, 1976). Although these systems display an expert
level of performance, each is powerful in only a very narrow domain. In
addition, assembling the knowledge base and constructing a working pro-
gram for such domains is a difficult, continuous task that has often ex-
tended over several years. However, because MYCIN included in its design
the goal of keeping the domain knowledge well separated from the pro-
gram that manipulates the knowledge, the basic rule methodology pro-
vided a foundation for a more general rule-based system.
With the development of EMYCIN we have now come full circle to
GPS's philosophy of separating the deductive mechanism from the prob-
lem-specific knowledge; however, EMYCIN's extensive user facilities make
it a much more accessible environment for producing expert systems than
were the earlier programs.¹ Like MYCIN's, EMYCIN's representation of
facts is in attribute-object-value triples, with an associated certainty factor.
Facts are associated in production rules. Rules of the same form are shown
throughout this book. Figures 16-2 and 16-5 in the next chapter show rules
from two different consultation systems constructed in EMYCIN.
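Evidence for a fact accumulated from successive rules is pooled with MYCIN's certainty-factor combining function, described in Part Four of this book. The following minimal sketch restates that combining rule; the Python function name is ours, but the arithmetic is the standard MYCIN formula.

```python
def cf_combine(x, y):
    """Combine two certainty factors in [-1, 1] using MYCIN's CF calculus."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)       # two pieces of confirming evidence
    if x <= 0 and y <= 0:
        return x + y * (1 + x)       # two pieces of disconfirming evidence
    # Conflicting evidence: scale the sum by the weaker belief.
    return (x + y) / (1 - min(abs(x), abs(y)))
```

For instance, two rules each concluding the same value with CF 0.6 yield a combined CF of 0.6 + 0.6(1 - 0.6) = 0.84, stronger than either rule alone.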

15.2.1 Application of Rules---The Rule Interpreter

The control structure is primarily MYCIN's goal-directed backward chain-
ing of rules. At any given time, EMYCIN is working toward the goal of
establishing the value of some parameter of a context; this operation is
termed tracing the parameter. To this end, the system retrieves the (pre-
computed) list of rules whose conclusions bear on the goal. SACON's Rule
50 (see Figures 15-2 and 16-2) would be one of several rules retrieved in
an attempt to determine the stress of a substructure. Then for each rule

¹Even so, it is still not an appropriate tool for building certain kinds of application systems
because some of its power comes from the specificity of the rule-based representation and
backward-chaining inference structure. See Section 15.5 for a discussion of these limitations.

in the list, EMYCIN evaluates the premise; if true, it makes the conclusion
indicated in the action. The order of the rules in the list is assumed to be
arbitrary, and all the rules are applied unless one of them succeeds and
concludes the value of the parameter with certainty (in which case the
remaining rules are superfluous).
This control structure was also designed to be able to deal gracefully
with incomplete information. If the user is unable to supply some piece of
data, the rules that need the data will fail and make no conclusions. The
system will thus make conclusions, if possible, based on less information.
Similarly, if the system has inadequate rules (or none at all) for concluding
some parameter, it may ask the user for the value. When too many items
of information are missing, of course, the system will be unable to offer
sound advice.
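The tracing loop just described (retrieve the rules concluding the goal parameter, establish each premise parameter by rules where possible and otherwise by asking, and stop early once a conclusion is certain) can be sketched as follows. The data structures and names here are illustrative stand-ins, not EMYCIN's actual LISP internals, and certainty-factor bookkeeping is reduced to a single `cf` field.

```python
def trace(param, facts, rules, ask):
    """Establish the value of `param` by backward chaining (a sketch)."""
    if param in facts:
        return facts[param]
    concluders = [r for r in rules if r["concludes"] == param]
    for rule in concluders:
        for needed in rule["needs"]:              # trace premise parameters first
            if needed not in facts:
                if any(r["concludes"] == needed for r in rules):
                    trace(needed, facts, rules, ask)
                else:
                    facts[needed] = ask(needed)   # no rules bear on it: ask the user
        # A rule whose data are still missing simply fails, without complaint.
        if all(p in facts for p in rule["needs"]) and rule["test"](facts):
            facts[param] = rule["value"]
            if rule["cf"] >= 1.0:                 # concluded with certainty:
                break                             # remaining rules are superfluous
    return facts.get(param)
```

Run against a one-rule knowledge base resembling RULE001 of Figure 14-2, `trace("tools", {}, rules, answers.get)` asks for the problem type and number of employees, then concludes "a large computer" when the premise holds and nothing otherwise.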

15.2.2 More on the Rule Representation

There are many advantages to having rules as the primary representation
of knowledge. Since each rule is intended to be a single "chunk" of infor-
mation, the knowledge base is inherently modular, making it relatively easy
to update. Individual rules can be added, deleted, or modified without
drastically affecting the overall performance of the system. The rules are
also a convenient unit for explanation purposes, since a single step in the
reasoning process can be meaningfully explained by citing the English
translation of the rule used.
While the syntax of rules permits the use of any LISP functions as
matching predicates in the premises of rules, or as special action functions
in the conclusions of rules, there is a small set of standard functions that
are most frequently used. The system contains information about the use
of these predicates and functions in the form of function templates. For
example, the predicate SAME is described as follows:

(a) function template: (SAME CNTXT PARM VALUE)

(b) sample function call: (SAME CNTXT SITE BLOOD)

The system can use these templates to "read" its own rules. For example,
the template shown here contains the standard symbols CNTXT, PARM,
and VALUE, indicating the components of the associative triple that SAME
tests. If clause (b) above appears in the premise of a given rule, the system
can determine that the rule needs to know the site of the culture and, in
particular, that it tests whether the culture site is (i.e., is the same as) blood.
When asked to display rules that are relevant to blood cultures, the system
will know that this rule should be selected. The most common matching
predicates and conclusion functions are those used in MYCIN (see Chapter
5): SAME, NOTSAME, KNOWN, NOTKNOWN, DEFINITE, NOT-
DEFINITE, etc.
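This template-driven "reading" of rules can be sketched as follows. Python tuples stand in for EMYCIN's LISP lists, and the function names are ours; only the SAME template from the example above is included.

```python
TEMPLATES = {"SAME": ("SAME", "CNTXT", "PARM", "VALUE")}

def read_clause(clause):
    """Align a premise clause with its function's template to recover
    the (parameter, value) pair it tests, without evaluating the predicate."""
    template = TEMPLATES[clause[0]]
    slots = dict(zip(template[1:], clause[1:]))   # e.g. {"PARM": "SITE", ...}
    return slots.get("PARM"), slots.get("VALUE")

def rules_mentioning(param, value, rules):
    """Select rules relevant to a parameter/value pair, as the explanation
    facility does when asked, say, to display rules about blood cultures."""
    return [name for name, premise in rules
            if any(read_clause(c) == (param, value) for c in premise)]
```

Given a rule whose premise contains `("SAME", "CNTXT", "SITE", "BLOOD")`, `rules_mentioning("SITE", "BLOOD", ...)` selects it, just as the text describes.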

15.2.3 Explanation Capability

As will be described in Part Six, EMYCIN's explanation program allows the
user of a consultation program to interrogate the system's knowledge,
either to find out about inferences made (or not made) during a particular
consultation or to examine the static knowledge base in general, indepen-
dently of any specific consultation.
During the consultation, EMYCIN can offer explanations of the cur-
rent, past, and likely future lines of reasoning. If the motivation for any
question that the program asks is unclear, the client may temporarily put
off answering and instead inquire why the information is needed. Since
each question is asked in an attempt to evaluate some rule, a first approx-
imation to an explanation is simply to display the rule currently under
consideration. The program can also explain what reasoning led to the
current point and what use might later be made of the information being
requested. This is made possible by examining records left by the rule
interpreter and by reading the rules in the knowledge base to determine
which are relevant. This form of explanation requires no language under-
standing by the program; it is invoked by simple commands from the client
(WHY and HOW).
Another form of explanation is available via the Question-Answering
(QA) Module, which is automatically invoked after the consultation has
ended, and which can also be entered during the consultation to answer
questions other than those handled by the specialized WHY and HOW
commands mentioned above. The QA Module accepts simple English-lan-
guage questions (a) dealing with any conclusion drawn during the consul-
tation, or (b) about the domain in general. Explanations are again based
on the rules; they should be comprehensible to anyone familiar with the
domain, even if that person is not familiar with the intricacies of the EMY-
CIN system. The questions are parsed by pattern matching and keyword
look-up, using a dictionary that defines the vocabulary of the domain.
EMYCIN automatically constructs the dictionary from the English phrases
used in defining the contexts and parameters of the domain; the system
designer may refine this preliminary dictionary to add synonyms or to fine-
tune QA's parsing.

15.3 The System-Building Environment

The system designer's principal task is entering and debugging a knowl-
edge base, viz., the rules and the object-attribute structures on which they
operate. The level at which the dialogue between system and expert takes
place is an important consideration for speed and efficiency of acquisition.

IF:   Composition = (LISTOF METALS) and
      Error < 5 and
      Nd-stress > .5 and
      Cycles > 10000
THEN: Ss-stress = fatigue

FIGURE 15-2 Example of ARL format for SACON's Rule 50.

The knowledge base must eventually reside in the internal LISP format
that the system manipulates to run the consultation and to answer ques-
tions. At the very basic level, one could imagine a programmer using the
LISP editor to create the necessary data structures totally by hand;² here
the entire translation from the expert's conceptual rule to LISP data struc-
tures is performed by the programmer. At the other extreme, the expert
would enter rules in English, with the entire burden of understanding
placed on the program.
The actual means used in EMYCIN is at a point between these ex-
tremes. Entering rules at the base LISP level is too error-prone, and re-
quires greater facility with LISP on the part of the system designer than
is desirable. On the other hand, understanding English rules is far too
difficult for a program, especially in a new domain where the vocabulary
has not even been identified and organized for the program's use. (Just
recognizing new parameters in free English text is a major obstacle.³) EMY-
CIN instead provides a terse, stylized, but easily understood, language for
writing rules and a high-level knowledge base editor for the knowledge
structures in the system. The knowledge base editor performs extensive
checks to catch common input errors, such as misspellings, and handles all
necessary bookkeeping chores. This allows the system builder to try out
new ideas quickly and thereby to get some idea of the feasibility of any
particular formulation of the domain knowledge into rules.

15.3.1 Entering Rules

The Abbreviated Rule Language (ARL) constitutes an intermediate form
between English and pure LISP. ARL is a simplified ALGOL-like language
that uses the names of the parameters and their values as operands; the
operators correspond to EMYCIN predicates. For example, SACON's Rule
50 could have been entered or printed as shown in Figure 15-2.
ARL resembles a shorthand form derived from an ad hoc notation that
we have seen several of our domain experts use to sketch out sets of rules.

²This is the way the extensive knowledge base for the initial MYCIN system was originally
created.
³The task of building an assistant for designers of new EMYCIN systems is the subject of
current research by James Bennett (Bennett, 1983). The name of the program is ROGET.

The parameter names are simply the labels that the expert uses in defining
the parameters of the domain. Thus they are familiar to the expert. The
conciseness of ARL makes it much easier to enter than English or LISP,
which is an important consideration when entering a large body of rules.
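The clause-level mapping from ARL to the internal form can be illustrated roughly: a comparison such as SITE = BLOOD becomes a call on the corresponding EMYCIN predicate. The sketch below handles only single clauses with a two-entry operator table; it is an illustration, not ARL's actual grammar, which also covers AND/OR combinations, arithmetic, negation, and certainty factors.

```python
# Map ARL comparison operators to the predicates they abbreviate
# (pairings taken from the example premise shown in Chapter 14).
OPERATORS = {"=": "SAME", ">": "GREATERP*"}

def translate_clause(arl):
    """Translate one ARL clause, e.g. 'SITE = BLOOD', into a LISP-style
    premise clause, e.g. ('SAME', 'CNTXT', 'SITE', 'BLOOD')."""
    parm, op, value = arl.split()
    pred = OPERATORS[op]
    if pred == "SAME":
        return (pred, "CNTXT", parm, value)
    # Numeric predicates compare the parameter's value against a number.
    return (pred, ("VAL1", "CNTXT", parm), float(value))
```

So `translate_clause("SICKDEGREE > 2")` yields the GREATERP* clause corresponding to the second conjunct of the sample rule premise.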

Rule Checking

As each rule is entered or edited, it is checked for syntactic validity to catch
common input errors. By syntactic, we mean issues of rule form--whether
terms are spelled correctly, values are legal for the parameters with which
they are associated, etc.--rather than the actual information content (i.e.,
semantic considerations as to whether the rule "makes sense"). Performing
the syntactic check at acquisition time reduces the likelihood that the con-
sultation program will fail due to "obvious" errors, thus freeing the expert
to concentrate on debugging logical errors and omissions. These issues are
also discussed in Chapter 8.
EMYCIN's purely syntactic check is made by comparing each clause
with the corresponding function template and seeing that, for example,
each PARM slot is filled by a valid parameter and that its VALUE slot holds
a legal value for the parameter. If an unknown parameter is found, the
checker tries to correct it with the Interlisp spelling corrector, using a
spelling list of all parameters in the system. If that fails, it asks if this is a
new (previously unmentioned) parameter. If so, it defines the new param-
eter and, in a brief diversion, prompts the system builder to describe it.
Similar action is also taken if an illegal value for a parameter is found.
A limited semantic check is also performed: each new or changed rule
is compared with any existing rules that conclude about the same param-
eter to make sure it does not directly contradict or subsume any of them.
A contradiction occurs when two rules with the same set of premise clauses
make conflicting conclusions (contradictory values or CFs for the same
parameter); subsumption occurs when one rule's premise is a subset of
another's, so that the first rule succeeds whenever the second one does (i.e.,
the second rule is more specific), and both conclude about the same values.
In either case, the interaction is reported to the expert, who may then
examine or edit any of the offending rules.

15.3.2 Describing Parameters

Information characterizing the parameters and contexts of the domain is
stored as properties of each context or parameter being described. When a
new entity is defined, the acquisition routines automatically prompt for the
properties that are always needed (e.g., EXPECT, the list of values expected
for this parameter); the designer may also enter optional properties (those

needed to support special EMYCIN features). The properties are all
checked for validity, in a fashion similar to that employed by the rule
checker.

15.3.3 System Maintenance

While the system designer builds up the domain knowledge base as de-
scribed above, EMYCIN automatically keeps track of the changes that have
been made (new or changed rules, parameters, etc.). The accumulated
changes can be saved on a file by the system builder either explicitly with
a simple command or automatically by the system every n changes (the
frequency of automatic saving can be set by the system builder). When
EMYCIN is started in a subsequent session, the system looks for this file
of changes and loads it in to restore the knowledge base to its previous
state.

15.3.4 Human Engineering

Although the discussion so far has concentrated on the acquisition of the
knowledge base, it is also important that the resulting consultation program
be pleasing in appearance to the user. EMYCIN's existing human-engi-
neering features relieve the system builder of many of the tedious cosmetic
concerns of producing a usable program. Since the main mode of inter-
action between the consultation program and the client is in the program's
questions and explanations, most of the features concentrate on making
that interface as comfortable as possible. A main feature in this category
that has already been described is the explanation program--the client can
readily find out why a question is being asked, or how the program arrived
at its conclusions. The designer can also control, by optionally specifying
the PROMPT property for each parameter that is asked for, the manner
in which questions are phrased. More detail can be specified, for example,
than would appear in a simple prompt generated by the system from the
parameter's translation.
EMYCIN supplies a uniform input facility that allows the normal in-
put-editing functions---character, word, and line deletions---and on display
terminals allows more elegant editing capabilities (insertion or deletion in
the middle of the line, for example) in the style of screen-oriented text
editors.⁴ It performs spelling correction and TENEX-style completion
from a list of possible answers; most commonly this list is the list of legal

⁴After the user types ESCAPE or ALTMODE, EMYCIN fills out the rest of the phrase if the
part the user has typed is unambiguous. For example, when EMYCIN expects the name of
an organism, PSEU is unambiguous for PSEUDOMONAS-AERUGINOSA. Thus the auto-
matic completion of input can save considerable effort and frustration.

values for the parameter being asked about, as supplied by the system
designer.
In most places where EMYCIN prompts for input, the client may type
a question mark to obtain help concerning the options available. When the
program asks for the value of a parameter, EMYCIN can provide simple
help by listing the legal answers to the question. The system designer can
also include more substantial help by giving rephrasings of or elaborations
on the original question; these are simply entered via the data base editor
as an additional property of the parameter in question. This capability
provides for both streamlined questions for experienced clients and more
detailed explanations of what is being requested for those who are new to
the consultation program.

15.3.5 Debugging the Knowledge Base

There is more to building a knowledge base than just entering rules and
associated data structures. Any errors or omissions in the initial knowledge
base must be corrected in the debugging process. In EMYCIN the principal
method of debugging is to run sample consultations; i.e., the expert plays
the role of a client seeking advice from the system and checks that the
correct conclusions are made. As the expert discovers errors, he or she
uses the knowledge acquisition facilities described above to modify existing
rules or add new ones.
Although the explanation program was designed to allow the consul-
tation user to view the program's reasoning, it is also a helpful high-level
debugging aid for the system designer. Without having to resort to LISP-
level manipulations, it is possible to examine any inferences that were
made, find out why others failed, and thereby locate errors or omissions
in the knowledge base. The TEIRESIAS program developed the WHY/
HOW capability used in EMYCIN for this very task (see Chapter 9).
EMYCIN provides a debugger based on a portion of the TEIRESIAS
program. The debugger actively guides the expert through the program's
reasoning chain and locates faulty (or missing) rules. It starts with a con-
clusion that the expert has indicated is incorrect and follows the inference
chain back to locate the error.
The rule interpreter also has a debugging mode, in which it prints out
assorted information about what it is doing: which rules it tries, which ones
succeed (and what conclusions they make), which ones fail (and for what
reason), etc. If the printout indicates that a rule succeeded that should
have failed, or vice versa, the expert can interrupt immediately, rather than
waiting for the end of the consultation to do the more formal TEIRESIAS-
style review.
In either case, once the problem is corrected, the expert can then
restart and try again, with the consultation automatically replayed using
the new or modified rules.

Case Library

EMYCIN has facilities for maintaining a library of sample cases. These can
be used for testing a complete system, or for debugging a growing one.
The answers given by the consultation user to all the questions asked dur-
ing the consultation are simply stored away, indexed by their context and
parameter. When a library case is rerun, answers to questions that were
previously asked are looked up and automatically supplied; any new ques-
tions resulting from changes in the rule base are asked in the normal
fashion. This makes it easy to check the performance of a new set of rules
on a "standard" case. It is especially useful during an intensive debugging
session, since the expert can make changes to the knowledge base and,
with a minimum of extra typing, test those changes---effectively reducing
the "turnaround time" between modifying a rule and receiving consulta-
tion feedback.
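This replay mechanism amounts to a lookup table keyed on (context, parameter), wrapped around the normal question-asking routine: stored answers are supplied silently, and only genuinely new questions reach the user. A sketch under that reading (the class and method names are ours):

```python
class CaseLibrary:
    """Record a consultation's answers and replay them on later runs."""
    def __init__(self):
        self.answers = {}            # (context, parameter) -> stored answer

    def ask(self, context, parameter, ask_user):
        key = (context, parameter)
        if key in self.answers:      # previously answered: supply silently
            return self.answers[key]
        value = ask_user(parameter)  # a new question, e.g. after a rule change
        self.answers[key] = value    # save it so future reruns need not ask
        return value
```

On the first run every question goes to the user; on a rerun after editing the rule base, only questions introduced by the new rules are actually asked.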

The BATCH Program

A problem common to most large systems is that new knowledge entered
to fix one set of problems often introduces new bugs, affecting cases that
once ran successfully. To simplify the task of keeping the knowledge base
consistent with cases that are known to be correctly solved, EMYCIN's
BATCH program permits the system designer to run any or all cases in
the library in background mode. BATCH reports the occurrence of any
changes in the results of the consultation and invokes the QA Module to
explain why the changes occurred. Of course, the system builder must first
indicate to the system which parameters represent the results or the most
important intermediate steps by which the correctness of the consultation
is to be judged. The use of the BATCH program could be viewed as a
form of additional semantic checking to supplement the checking routinely
performed at the time of rule acquisition.

15.3.6 The Rule Compiler

To improve efficiency in a running consultation program, EMYCIN pro-
vides a rule compiler that transforms the system's production rules into a
decision tree, eliminating the redundant computation inherent in a rule
interpreter. The rule compiler then compiles the resulting tree into ma-
chine code. The consultation program can thereby use an efficient deduc-
tive mechanism for running the actual consultation, while the flexible rule
format remains available for acquisition, explanation, and debugging. For
details about the rule compiler see van Melle (1980).
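The saving comes from testing each shared premise condition once rather than once per rule. As a toy illustration only (van Melle's compiler builds a genuine decision tree and emits machine code, which this sketch does not attempt), rules that dispatch on a parameter's value can be folded into a single table lookup:

```python
def compile_rules(rules):
    """Fold rules of the form (param, value, conclusion) into one dispatch
    table, so each parameter is examined once, not once per rule."""
    tree = {}
    for param, value, conclusion in rules:
        tree.setdefault(param, {})[value] = conclusion

    def run(facts):
        conclusions = []
        for param, branches in tree.items():   # one test per parameter
            if facts.get(param) in branches:
                conclusions.append(branches[facts[param]])
        return conclusions

    return run
```

An interpreter would re-examine the parameter for every rule mentioning it; the "compiled" form examines it once per consultation, which is the essence of the redundancy being eliminated.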

15.4 Applications

Several consultation systems have been written using EMYCIN. The orig-
inal MYCIN program provides advice on diagnosis and therapy for infec-
tious diseases. MYCIN is now implemented in EMYCIN, but its knowledge
base was largely constructed before EMYCIN was developed as a separate
system. SACON and CLOT (described in Chapter 16), PUFF (Aikins et
al., 1983), HEADMED (Heiser et al., 1978), LITHO (Bonnet, 1981), DART
(Bennett and Hollander, 1981), BLUEBOX (Mulsant and Servan-
Schreiber, 1983), and several other demonstration systems have been suc-
cessfully built in EMYCIN. All have clearly shown the power of starting
with a well-developed framework and concentrating on the knowledge
base. For example, to bring the SACON program to its present level of
performance, about two person-months of the experts' time were required
to explicate their task as consultants and to formulate the knowledge base,
and about the same amount of time was required to implement and test
the rules in a preliminary version of EMYCIN. CLOT was constructed as
a joint effort by an experienced EMYCIN programmer and a collaborating
medical student. Following approximately ten hours of discussion about
the contents of the knowledge base, they entered and debugged in another
ten hours a preliminary knowledge base of some 60 rules using EMYCIN.
Both knowledge bases would need considerable refinement before the pro-
grams would be ready for general use. The important point, however, is
that starting with a framework like EMYCIN allows system builders to
focus quickly on the expertise necessary for high performance because the
underlying framework is ready to accept it.

15.5 Range of Applicability

EMYCIN is designed to help build and run programs that provide con-
sultative advice. The resulting consultation system takes as input a body of
measurements or other information pertinent to a case and produces as
output some form of recommendation or analysis of the case. The frame-
work seems well suited for many diagnostic or analytic problems, notably
some classes of fault diagnosis, where several input measurements (symp-
toms, laboratory tests) are available and the solution space of possible di-
agnoses can be enumerated. It is less well suited for "formation" problems,
where the task is to piece together existing structures according to specified
constraints to generate a solution.
EMYCIN was not designed to be a general-purpose representation
language. It is thus wholly unsuited for some problems. The limitations

derive largely from the fact that EMYCIN has chosen one basic, readily understood representation for the knowledge in a domain: production rules that are applied by a backward-chaining control structure and that operate on data in the form of associative triples. The representation, at least as implemented in EMYCIN, is unsuitable for problems of constraint satisfaction, or those requiring iterative techniques.⁵ Among other classes of problems that EMYCIN does not attempt to handle are simulation tasks and tasks involving planning with stepwise refinement. One useful heuristic in thinking about the suitability of EMYCIN for a problem is that the consultation system should work with a "snapshot" of information about a case. Good advice should not depend on analyzing a continued stream of data over a time interval.
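A minimal sketch may make this representation concrete: production rules over associative (context, parameter, value) triples, driven by a backward-chaining interpreter that asks the user only when no rule can establish a value. The rule, parameter names, and canned answers below are invented for illustration; EMYCIN's actual Interlisp interpreter also handles certainty factors, prompting text, and a context tree.

```python
# Minimal backward chainer over (context, parameter, value) triples.
# Rule and parameter names are hypothetical, not from SACON or CLOT.

RULES = [
    # IF the material is a metal AND the stress is high THEN behavior is FATIGUE
    {"if": [("SUB-1", "MATERIAL", "METAL"), ("SUB-1", "HIGH-STRESS", "YES")],
     "then": ("SUB-1", "STRESS-BEHAVIOR", "FATIGUE")},
]

def findout(facts, triple, ask):
    """Establish a (context, parameter, value) triple: look it up,
    backward-chain on rules that conclude it, or finally ask the user."""
    ctx, param, val = triple
    if (ctx, param) in facts:                      # already known
        return facts[(ctx, param)] == val
    for rule in RULES:                             # try rules concluding this parameter
        r_ctx, r_param, r_val = rule["then"]
        if (r_ctx, r_param) == (ctx, param):
            if all(findout(facts, t, ask) for t in rule["if"]):
                facts[(ctx, param)] = r_val
                return r_val == val
    facts[(ctx, param)] = ask(ctx, param)          # no rule applies: ask
    return facts[(ctx, param)] == val

facts = {}
answers = {"MATERIAL": "METAL", "HIGH-STRESS": "YES"}   # canned user replies
ok = findout(facts, ("SUB-1", "STRESS-BEHAVIOR", "FATIGUE"),
             lambda ctx, p: answers[p])
print(ok, facts[("SUB-1", "STRESS-BEHAVIOR")])          # True FATIGUE
```

Note how the goal drives the questioning: MATERIAL and HIGH-STRESS are requested only because a rule concluding STRESS-BEHAVIOR needs them, which is the order-of-questions behavior users of EMYCIN consultants observe.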
Even those domains that have been successfully implemented have demonstrated some of the inadequacies of EMYCIN. In addition to representational difficulties, other problems noted have been the lack of user control over the consultation dialogue (e.g., the order of questions) and the amount of time a user must spend supplying information. These limitations are discussed further in subsequent chapters.

⁵The VM program (Chapter 22), however, has shown that production rules can be used to provide advice in a dynamic setting where iterative monitoring is required. Greatly influenced by EMYCIN design issues, VM deals with the management of patients receiving assisted ventilation after cardiac surgery.
16
Experience Using EMYCIN

James S. Bennett and Robert S. Engelmore

The development of expert systems is plagued with a well-known and crucial bottleneck: in order for these systems to perform at all, the domain-specific knowledge must be engineered into a form that can be embedded in the program. Advances in understanding and overcoming this knowledge acquisition bottleneck rest on an analysis of both the process and the product of our current, rather informal interactions with experts. To this end the purpose and structure of two quite dissimilar rule-based systems are reviewed. Both systems were constructed using the EMYCIN system after interviewing an expert. The first, SACON (Bennett et al., 1978), is meant to assist an engineer in selecting a method to perform a structural analysis; the second, CLOT (Bennett and Goldman, 1980), is meant to assist a physician in determining the presence of a blood clotting disorder.
The presentation of the details of these two systems is meant to accomplish two functions. The first is to provide an indication of the scope and content of these rule-based systems. The reader need not have any knowledge of the specific application domain; the chapter will present the major steps and types of inferences drawn by these consultants. This conceptual framework, what we term the inference structure, forms the basis for the expert's organization of the domain expertise and, hence, the basis for successful acquisition of the knowledge base and its continued maintenance. The second purpose of this chapter is to indicate the general form and function of these inference structures.
We first present the motivations and major concepts of both the SACON and CLOT systems. A final section then summarizes a number of observations about the knowledge acquisition process and the applicability of EMYCIN to these tasks. This chapter thus shows how the knowledge acquisition ideas from Chapter 9 and the EMYCIN framework from Chapter 15 have been used in domains other than infectious disease.

This chapter is a shortened and edited version of a paper appearing in Pergamon-Infotech state of the art report on machine intelligence. Maidenhead, Berkshire, U.K.: Infotech Ltd., 1981.

16.1 SACON: A Consultant for Structural Analysis

SACON (Structural Analysis CONsultant) was developed to advise nonexpert engineers in the use of a general-purpose computer program for structural analysis. The automated consultant was constructed using the EMYCIN system. Through a substitution of structural engineering knowledge for the medical knowledge, the program was converted easily from the domain of infectious diseases to the domain of structural analysis.

The purpose of a SACON consultation is to provide advice to a structural engineer regarding the use of a structural analysis program called MARC (MARC Corporation, 1976). The MARC program uses finite-element analysis techniques to simulate the mechanical behavior of objects, for example, the metal fatigue of an airplane wing. Engineers typically know what they want the MARC program to do--e.g., examine the behavior of a specific structure under expected loading conditions--but do not know how the simulation program should be set up to do it. The MARC program offers a large (and, to the novice, bewildering) choice of analysis methods, material properties, and geometries that may be used to model the structure of interest. From these options the user must learn to select an appropriate subset of methods that will simulate the correct physical behavior, preserve the desired accuracy, and minimize the (typically large) computational cost. A year of experience with the program is required to learn how to use all of MARC's options proficiently. The goal of the automated consultant is to bridge this "what-to-how" gap, by recommending an analysis strategy. This advice can then be used to direct the MARC user in the choice of specific input data--e.g., numerical methods and material properties. Typical structures that can be analyzed by both SACON and MARC include aircraft wings, reactor pressure vessels, rocket motor casings, bridges, and buildings.

16.1.1 The SACON Knowledge Base

The objective of a SACON consultation is to identify an analysis strategy for a particular structural analysis problem. The engineer can then implement this strategy, using the MARC program, to simulate the behavior of the structure. This section introduces the mathematical and physical concepts used by the consultant when characterizing the structure and recommending an analysis strategy.

An analysis strategy consists of an analysis class and a number of associated analysis recommendations. Analysis classes characterize the complexity of modeling the structure and the ability to analyze the material behaviors of the structure. Currently, 36 analysis classes are considered;

among them are Nonlinear Geometry Crack Growth, Nonlinear Geometry Stress Margin, Bifurcation, Material Instability, Inelastic Stiffness Degradation, Linear Analysis, and No Analysis. The analysis recommendations advise the engineer on specific features of the MARC program that should be activated when performing the actual structural analysis. (The example consultation in Figure 16-3 concludes with nine such recommendations.)

To determine the appropriate analysis strategy, SACON infers the critical material stress and deflection behaviors of a structure under a number of loading conditions. Among the material stress behaviors inferred by SACON are Yielding Collapse, Cracking Potential, Fatigue, and Material Instabilities; material deflection behaviors inferred by SACON are Excessive Deflection, Flexibility Changes, Incremental Strain Failure, Buckling, and Load Path Bifurcation.
Using SACON, the engineer decomposes the structure into one or more substructures and provides the data describing the materials, the general geometries, and the boundary conditions for each of these substructures. A substructure is a geometrically contiguous region of the structure composed of a single material, such as high-strength aluminum or structural steel, and having a specified set of kinematic boundary conditions. A structure may be subdivided by the structural engineer in a number of different ways; the decomposition is chosen that best reveals the worst-case material behaviors of the structure.

For each substructure, SACON estimates a numeric total loading from one or more loadings. Each loading applied to a substructure represents one of the typical mechanical forces on the substructure during its working life. Loadings might, for example, include loadings experienced during various maneuvers, such as braking and banking for planes, or, for buildings, loadings caused by natural phenomena, such as earthquakes and windstorms. Each loading is in turn composed of a number of point or distributed load components.

Given the descriptions of the component substructures and the descriptions of the loadings applied to each substructure, the consultant estimates stresses and deflections for each substructure using a number of simple mathematical models. The behaviors of the complete structure are found by determining the sum of the peak relative stress and deflection behaviors of all the substructures. Based on these peak responses (essentially the worst-case behaviors exhibited by the structure), its knowledge of available analysis types, and the tolerable analysis error, SACON recommends an analysis strategy. Figure 16-1 illustrates the basic types of inferences drawn by SACON during a consultation.
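The flavor of this peak-response computation can be sketched as follows. The numbers, the class thresholds other than the 5% "no analysis" cutoff mentioned later in the text, and the use of a simple maximum for aggregation are illustrative assumptions, not SACON's actual mathematical models.

```python
# Worst-case aggregation of substructure responses (illustrative only).
# "Relative stress" here means an estimated stress divided by the
# material's critical stress; the substructures and values are invented.

substructures = {
    "total-wing": {"loadings": {"flight": 0.95, "landing": 0.60}},
    "tail":       {"loadings": {"flight": 0.30}},
}

def peak_relative_stress(sub):
    """Peak response of one substructure over all of its loadings."""
    return max(sub["loadings"].values())

peaks = {name: peak_relative_stress(s) for name, s in substructures.items()}
worst = max(peaks.values())           # worst case over all substructures

if worst < 0.05:                      # below 5% of critical: no analysis
    strategy = "no-analysis"
elif worst < 1.0:                     # assumed threshold for illustration
    strategy = "linear-analysis"
else:
    strategy = "nonlinear-analysis"

print(peaks, strategy)
```

The point of the sketch is the shape of the inference in Figure 16-1: numeric estimates per load component roll up into per-substructure peaks, which roll up into a single worst-case figure that selects the analysis class.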
Judgmental knowledge for the domain, and about the structural analysis task in particular, is represented in EMYCIN in the form of production rules. An example of a rule, which provides the transition from simple numeric estimates of stress magnitudes to symbolic characterizations of stress behaviors for a substructure, is illustrated in Figure 16-2.
One major feature of EMYCIN that was not used in this task was the

Analysis Strategy of the Structure
          ↑
Worst-Case Stress and Deflection Behaviors of the Structure
          ↑
Symbolic Stress and Deflection Behaviors of Each Substructure
          ↑
Composite Numeric Stress and Deflection Estimations of Each Loading
          ↑
Numeric Stress and Deflection Magnitudes of Each Load Component

FIGURE 16-1 Inference structure during a SACON consultation. The user specifies loading and substructure descriptions that the system uses to infer material behaviors and, finally, an analysis strategy.

certainty factor mechanism--i.e., the ability to draw inferences using uncertain knowledge. The present consultation strategy and the associated mathematical models were designed to estimate extreme loading conditions, from which SACON concludes the appropriate analysis class. Consequently, by using a conservative model, the rules, though inexact, are sufficiently accurate for predicting response bounds with certainty.

The existing knowledge base is able to select from among 36 nonlinear analysis strategies. If nonlinear analysis is not indicated by the response estimates, the consultation recommends linear analysis. In addition, if relative stress and displacement estimates are low (less than 5% of critical values), the consultation indicates that no analysis is required. The knowledge base consists of 170 rules and about 140 consultation parameters. A

RULE050
IF:   1) The material composing the sub-structure is one of: the metals, and
      2) The analysis error (in percent) that is tolerable is between 5 and 30, and
      3) The non-dimensional stress of the sub-structure is greater than .9, and
      4) The number of cycles the loading is to be applied is between 1000 and 10000
THEN: It is definite (1.0) that fatigue is one of the stress behavior phenomena in the sub-structure

PREMISE: ($AND (SAME CNTXT MATERIAL (LISTOF METALS))
               (BETWEEN* CNTXT ERROR 5 30)
               (GREATER* CNTXT ND-STRESS .9)
               (BETWEEN* CNTXT CYCLES 1000 10000))
ACTION:  (CONCLUDE CNTXT SS-STRESS FATIGUE TALLY 1.0)

FIGURE 16-2 A sample rule from SACON, showing both the English and LISP forms.

typical consultation (2 substructures, 3 loadings, 3 load components) requires about 25 minutes in an interactive session.

The transcript in Figure 16-3 illustrates SACON's interactions with an engineer and SACON's recommendation of an analysis strategy. The engineer's responses to questions follow the prompt (**) and are printed in bold capitals.

July 26, 1978 10:54AM in <SACON>MARC.SAV;80709 by SACON>

Hello Dr. Melosh,

This consultation concerns the 747 wing. The purpose of the consultation is to demonstrate the general features of a structural analysis consultation on a simple problem. Thus the consultation involves only one substructure: the total wing.

26-Jul-78 10:54:44 [consultation of 13-Jul-78 1:55PM].
........ STRUCTURE-28 ........
1) What is the name of STRUCTURE-28?
** BOEING 747 WING
2) Assuming that your characterization of the boeing 747 wing in terms of its geometry, material characteristics, and boundary conditions are accurate, how much error (in percent) is tolerable for the analysis?
** 10
3) Do you want to examine the responses of the boeing 747 wing, to evaluate its instability, or both?
** EXAMINE-RESPONSES
4) Does the boeing 747 wing have any time dependent terms in its equations of equilibrium?
** NO
5) Is heating or cooling of interest in determining structure responses?
** NO
6) Have you decomposed the boeing 747 wing into a number of potential sub-structures which are relevant to this analysis?
** YES
........ SUB-STRUCTURE-1 ........
7) What is the name of SUB-STRUCTURE-1?
** TOTAL WING
8) What is the material composing most of the total wing (SUB-STRUCTURE-1)?
** HIGH-STRENGTH-ALUMINUM
9) Enter the average longest dimension of the total wing (SUB-STRUCTURE-1) in inches (or <number> CMS):
** 1420
10) Enter the weight of the total wing (SUB-STRUCTURE-1) in pounds (or <number> KGS):
** 46000
11) Is the total wing (SUB-STRUCTURE-1) thin-walled or solid?
** THIN-WALLED
12) What is the configuration of support for the total wing (SUB-STRUCTURE-1)?
** ONE-SIDE
13) Is the total wing (SUB-STRUCTURE-1) to be modelled as a 1, 2, or 3 dimensional system?
** 2
14) Would you describe the construction of the total wing (SUB-STRUCTURE-1) to be truss and/or frame elements (NETWORK), sheets and/or blocks of material (CONTINUUM), or a combination of both?
** CONTINUUM
15) Is the overall geometry of the total wing (SUB-STRUCTURE-1) curved or planar?
** PLANAR

FIGURE 16-3 A SACON consultation.



The first loading applied to the total wing (SUB-STRUCTURE-1) will be called:
........ LOADING-1 ........
16) What is the name of LOADING-1?
** FLIGHT
17) Enter the number of loading cycles to be applied:
** 20000
The first load component associated with LOADING-1 will be called:
........ LOAD-COMPONENT-1 ........
18) Would you describe LOAD-COMPONENT-1 as being DISTRIBUTED over most of the sub-structure or as acting at a POINT of the sub-structure?
** DISTRIBUTED
19) Which surface of the sub-structure does LOAD-COMPONENT-1 act NORMAL to? (If more than one surface, you should consider the loading as two or more component loadings)
** WIDTH-LENGTH
20) Enter the average wall thickness of the total wing (SUB-STRUCTURE-1) in inches (or <number> CMS):
** .31
21) Enter the average depth of the total wing (SUB-STRUCTURE-1) in inches (or <number> CMS):
** 41
22) Enter the magnitude of the distributed load (in psi):

23) Are there any other load components associated with LOADING-1?
** NO
24) Are there any other loading conditions associated with the total wing (SUB-STRUCTURE-1)?
** NO
25) Are there any other sub-structures of the boeing 747 wing relevant to this analysis?
** NO
26) Do the supports of the boeing 747 wing involve Coulomb friction, nonlinear springs, and/or gapping?
** NO
The following analysis classes are relevant to the analysis of your structure:
1) general-inelastic
The following recommendations apply to this case:
Activate incremental stress--incremental strain analysis.
Model nonlinear stress-strain relation of the material.
Solution will be based on a mix of gradient and Newton methods.
Logic to scan peak stress at each step and evaluate fatigue integrity should be used.
Logic to scan stresses, smooth, and compare with allowable stresses (with appropriate safety factors) should be used.
Logic to scan deflections, calculate relative values, and compare with code limits, should be called upon.
Cumulative strain damage should be calculated.
Analysis should include two or more load cycles (if cyclic) with extrapolation for strain accumulation.
Shakedown extrapolation logic should be used.
A single cycle of loading is sufficient for the analysis.

Do you wish advice on another structure?
** NO

16.2 CLOT: A Consultant for Bleeding Disorders

In a different, and in some ways more standard, application of EMYCIN, we have recently developed a prototype of a consultant called CLOT, which advises physicians on the presence and types of disorders of the human coagulation system. CLOT was constructed by augmenting the EMYCIN system with domain-specific knowledge about bleeding disorders encoded as production rules. Section 16.3 describes the general structure of the CLOT knowledge base.

Our primary intent in constructing CLOT was to explore knowledge acquisition techniques that might be useful during the initial phases of knowledge base specification. Thus we sought to determine the primary inference structures and preliminary medical concepts that a consultant might require. We acquired the initial medical expertise for CLOT from a third-year medical student within a brief amount of time. This expertise has not yet been refined by an acknowledged expert physician. We conjecture that with these structures now in place the arduous task of detailing the knowledge required for truly expert performance can proceed at a more rapid pace. However, we have not had the opportunity to test this conjecture (cf. Mulsant and Servan-Schreiber, 1984).

16.3 The CLOT Knowledge Base

The primary objective of a CLOT consultation is to identify the presence and type of bleeding defect in a patient. If a defect is diagnosed, the consultant attempts to refine its diagnosis by identifying the specific conditions or syndromes in the patient and their plausible causes. These refined diagnoses can then be used by the physician to evaluate the patient's clinical status and to suggest possible therapies. At present, CLOT makes no attempt to recommend such therapies. This section briefly introduces the physiological basis and inference structure used by the consultant when characterizing the bleeding defect of the patient.

There are two major types of bleeding disorders, corresponding to defects in the two component subsystems of the human coagulation system. The first subsystem, termed the platelet-vascular system, is composed of the blood vessels and a component of the blood, the platelets. Upon sustaining an injury, the blood vessels constrict, reducing the flow of blood to the injured area. This vasoconstriction in turn activates the platelets, causing them to adhere to one another and form a simple, temporary "plug,"

or thrombus. This thrombus is at last reinforced by fibrin, a protein resulting from a complex, multienzyme pathway, the second component subsystem of the coagulation system. Fibrin converts the initial platelet plug into the more permanent clot with which most people are familiar. A defect in either the platelet-vascular or the coagulation (enzymatic) subsystem can cause prolonged and uncontrollable bleeds. For example, the familiar "bleeder's" disease (hemophilia) is the result of a missing or altered enzyme in the coagulation system, which inhibits the formation of fibrin and hence of the final clot.
CLOT was designed to be used eventually by a physician attending a patient with a potential bleeding problem. The system assumes that the physician has access to the necessary laboratory tests and the patient's medical history. CLOT attempts to diagnose the bleeding defect by identifying which of the two coagulation subsystems might be defective. This inference is based first on clinical evidence and then, independently, on the laboratory findings. Finally, if these independent conclusions are mutually consistent, an overall estimation of the defect is deduced and reported.
The consultation begins with the collection of standard demographic data about the patient (name, age, sex, and race) followed by a review of the clinical, qualitative evidence for a bleeding disorder. The physician is asked to describe an episode of bleeding in terms of its location, whether its onset was immediate or prolonged, and whether the physician feels the amount of bleeding was disproportionate for its type. Other factors such as the spontaneity of the bleeding, its response to applied pressure, and its persistence (duration) are also requested. These data are supplemented with facts from the patient's background and medical history to provide an estimate of the significance of the episode. These factors are then used to provide suggestive, but not definitive, evidence for the presence of a bleeding defect. This suggestive, rather than diagnostic, expertise was encoded using EMYCIN's certainty factor mechanism. Each rule mentions a key clinical parameter whose presence or absence contributes to the final, overall certainty of a particular bleeding disorder. (See Figure 16-4.)
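The way several suggestive rules accumulate into an overall certainty can be illustrated with the standard EMYCIN combining function for two positive certainty factors; the individual CF values below are invented for the example.

```python
# EMYCIN-style combination of positive certainty factors from
# independent rules: CF = cf1 + cf2 * (1 - cf1). The parameter CFs
# here are invented; they stand for separate suggestive rules each
# contributing partial evidence for the same disorder.

def combine_positive(cf1, cf2):
    """Combine two positive certainty factors (0 <= cf <= 1)."""
    return cf1 + cf2 * (1 - cf1)

evidence_for_defect = [0.4, 0.6, 0.3]   # CFs from three separate rules
cf = 0.0
for piece in evidence_for_defect:
    cf = combine_positive(cf, piece)

print(round(cf, 3))   # 0.832
```

Each additional rule moves the combined certainty toward, but never past, 1.0, which is why individually weak clinical clues can still yield a strongly suggested diagnosis.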
The clinical description of the bleeding episode is followed by a report of the coagulation-screen test results. These six standard, quantitative measurements made of the patient's blood sample are used to determine if the blood clots abnormally. If the patient's blood does clot abnormally, CLOT attempts to infer what segment of the enzymatic pathway might be impaired and what platelet dysfunction might be present.

Finally, if clinical and laboratory evidence independently produce a mutually consistent estimation of the defect type, the case data and the intermediate inferences about the significance and possible causes of the bleed combine to produce a refined diagnosis for the patient. Currently, for patients experiencing a significant bleed, these conclusions include specific enzyme deficiencies, von Willebrand's syndrome, Kallikrein defects, thrombocytopenia, and thrombocytosis.

RULE025
IF:   1) Bleeding-history is one of the reasons for this consultation,
      2) There is an episode of significant bleeding in the patient,
      3) Coagulation-defect is one of the bleeding disorders in the patient,
      4) The defective coagulation pathway of the patient is intrinsic, and
      5) There are not factors which interfere with the patient's normal bleeding
THEN: It is definite (1.0) that the following is one of the bleeding diagnoses of the patient: The patient has one or more of the following conditions: Hemophilia A, von Willebrand's syndrome, an IX, XI, or XII deficiency, or a high molecular weight Kallikrein defect.

PREMISE: ($AND (SAME CNTXT REASON BLEEDING-HISTORY)
               (SAME CNTXT SIGBLD)
               (SAME CNTXT FINALDEF COAGULATION-DEFECT)
               (SAME CNTXT DEFPATH INTRINSIC)
               (NOTSAME CNTXT INTERFERENCE))
ACTION:  (CONCLUDETEXT CNTXT DX (TEXT DXHEMOPHILIA) TALLY 1000)

FIGURE 16-4 A sample rule from CLOT. RULE025 partly determines the final diagnosis.

Figure 16-5 indicates the major inferences drawn by CLOT during a consultation.

The current knowledge base is able to select from among 13 refined diagnoses and can indicate that the patient is a normal bleeder. CLOT is also able to detect possibly false negative test results due to various interfering factors such as recent exercise, pregnancy, presence of chronic illnesses, and the patient's use of drugs known to alter the clotting process. The knowledge base consists of 63 rules and 50 consultation parameters. A typical consultation requires about five minutes of interaction. A sample consultation is shown in Figure 16-6. The transcript in Figure 16-6 illustrates CLOT's interactions with a physician about a patient and CLOT's final diagnosis. The physician's responses follow the prompt (**) and are printed in bold capitals.

Final Diagnosis
          ↑
Final, Consistent Estimation of the Bleeding Defect
          ↑
Estimation of Enzymatic and Platelet Defects on the Basis of the Coagulation Battery
          ↑
Initial Estimation of Bleeding Defect Type Based on Clinical Evidence

FIGURE 16-5 Inference structure during a CLOT consultation.

........ PATIENT-110 ........
1) Name:
** PT110
2) Age:
** 90 YEARS
3) Sex:
** FEMALE
4) Race:
** CAUCASIAN
5) Please indicate your reason(s) for requesting this consultation:
** BLEEDING-HISTORY
6) What type of bleeding describes Pt110's most recent episode of bleeding?
** HEMARTHROSIS
7) Is there a history of a genetic bleeding disorder in Pt110's family?
** YES
8) Was the onset of the bleed immediate or delayed?
** DELAYED
9) BT:
** 5 MINUTES
10) PT:
** 13
11) PTT:
** 50
12) TT:
** 15
13) FSF:
** NORMAL
14) Has Pt110 recently exercised?
** NO
15) Is Pt110 currently receiving any of the following drugs: ASA, Heparin, Coumarin, oral-contraceptives, Ephedrine, Epinephrine, ADH?
** NO
16) Is Pt110 diagnosed as having cirrhosis, collagen disease, cancer, or any chronic disease?
** NO
Conclusions: the blood disorders of Pt110 are as follows:
COAGULATION-DEFECT (.97)
Conclusions: the statements about the consistency of the case data and CLOT's interpretation are as follows:
Both clinical and lab data are internally consistent and there is overall, consistent interpretation of the blood disorder.
Conclusions: the bleeding diagnoses of Pt110 are as follows:
The patient has one or more of the following conditions: Hemophilia A, von Willebrand's syndrome, an IX, XI, or XII deficiency, or a high molecular weight Kallikrein defect. (.97)

FIGURE 16-6 Transcript of a CLOT consultation.

16.4 EMYCIN as a Knowledge Representation Vehicle

We did not find the representation formalism of EMYCIN to be a hindrance to either the formulation of the knowledge by the expert or its eventual implementation in either program. In fact, the simplicity of using

and explaining both EMYCIN's rule-based formalism and its backward-chaining control structure actually facilitated the rapid development of the knowledge base during the early stages of the consultants' design. Moreover, the control structure, like the rule-based formalism, seemed to impose a salutary discipline on the expert during the formulation of the knowledge base.

The development of SACON was a major test of the domain-independence of the EMYCIN system. Previous applications using EMYCIN had been primarily medical, with the consultations focusing on the diagnosis and prescription of therapy for a patient. Structural analysis, with its emphasis on structures and loadings, allowed us to detect the small number of places where this medical bias had unduly influenced the system design, notably in the text strings used for prompting and giving advice.

Both the MARC expert and the medical student found that their knowledge was easily cast into the rule-based formalism and that the existing predicate functions and context-tree mechanism provided sufficient expressive power to capture the task of advising their respective clients. The existing interactive facilities for performing explanation, question answering, and consultation were found to be well developed and were used directly by our application. None of these features required any significant reprogramming.
EMYCIN provides many tools to aid the knowledge engineer during the process of embedding the expertise into the system. During the construction of CLOT we found that the knowledge acquisition tools in EMYCIN had substantially improved since the construction of SACON. These facilities now perform a large amount of useful checking and default specification when specifying an initial knowledge base. In particular, a new facility had been implemented that provides assistance during the specification of the context tree. This facility eliminates a substantial amount of user effort by setting up the multitude of data structures for each context and ensuring their mutual consistency. Furthermore, the facility for acquiring clinical parameters of a context now performs a significant amount of prompting and value checking on the basis of a simple parameter classification scheme; we found these facilities very useful.
We made extensive use of the ARL (Abbreviated Rule Language) facility when acquiring the rules for CLOT. Designed to capitalize on the stereotypically terse expression of rule clauses by experts, ARL reduces the amount of typing time and, again, ensures that the correct forms are used when specifying both the antecedent and consequent parts of a rule. For example, when specifying the CLOT rule shown in Figure 16-4, the medical student engaged in the interaction shown in Figure 16-7. The user's input follows a colon or a question mark.

In addition to ARL, EMYCIN's rule-subsumption checker also proved very useful during the specification of larger rule sets in the system. This checker analyzes each new rule for possible syntactic subsumptions, or equivalences with the premise clauses of the other rules. We found that,

Enter Parms, Rules, Save changes, or Go? Rules
Rule number of NEW: NEW
RULE025 PREMISE: (REASON = BLEEDING, SIGBLD, FINALDEF = COAGULATION, DEFPATH = INTRINSIC - INTERFERENCE)
RULE025 ACTION: (DX = DXHEMOPHILIA)
BLEEDING → BLEEDING-HISTORY? Yes
COAGULATION → COAGULATION-DEFECT? Yes
Translate, No further changes, or propname:

FIGURE 16-7 Interaction with EMYCIN, using the Abbreviated Rule Language (ARL) to specify the CLOT rule shown in Figure 16-4.

for the larger rule sets, the checker detected these inconsistencies, due to either typing mistakes or actual errors in the rule base logic, and provided a graceful method for dealing with them. Together these facilities contributed to the ease and remarkable rapidity of construction of this consultant. For further details on the design and operation of these aids, see van Melle (1980).
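The kind of syntactic check the subsumption checker performs can be sketched as follows. The clause representation and the two rules are hypothetical, and EMYCIN's checker operates over its own LISP rule forms rather than Python tuples.

```python
# Simple syntactic subsumption check, in the spirit of EMYCIN's rule
# checker: if every premise clause of one rule appears among the premise
# clauses of another, the broader rule subsumes the narrower one, which
# usually signals a typing mistake or a logic error in the rule base.

def subsumes(rule_a, rule_b):
    """rule_a subsumes rule_b when rule_a's premises are a subset of
    rule_b's: rule_a succeeds whenever rule_b does."""
    return set(rule_a["if"]) <= set(rule_b["if"])

r1 = {"name": "RULE-A", "if": (("DEFPATH", "INTRINSIC"),)}
r2 = {"name": "RULE-B", "if": (("DEFPATH", "INTRINSIC"),
                               ("REASON", "BLEEDING-HISTORY"))}

for a in (r1, r2):
    for b in (r1, r2):
        if a is not b and subsumes(a, b):
            print(f"{a['name']} subsumes {b['name']}")   # RULE-A subsumes RULE-B
```

A real checker would also normalize clause forms before comparing, so that syntactically different but equivalent clauses are caught as well.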

16.5 Observations About Knowledge Acquisition

To bring the SACON program to its present level of performance, we estimate that two person-months of the expert's time were required to explicate the consultation task and formulate the knowledge base, and about the same amount of time was required to implement and test the rules. This estimate does not include the time devoted to meetings, problem formulation, demonstrations, and report writing. For the first 170 rules in the knowledge base, we estimate the average time for formulating and implementing a rule was about four hours. The marginal time for a new rule is about two hours.¹
The construction of CLOT required approximately three days, divided as follows. The first day was spent discussing the major medical concepts, clinical setting, and diagnostic strategies that were appropriate for this consultant. At the end of this period, the major subtasks of the consultant had been sketched, and a large portion of the clinical parameters the consultant would request of the physician had been mentioned. The following

¹These estimates represent a simple average that held during the initial construction of these projects. They do not reflect the wide variation in the amount of effort spent defining rules versus the other knowledge base development tasks that occurred over that time period.

two days were spent detailing aspects of the parameters and rules that the
EMYCINsystem required (i.e., specifying expected values, allowable
ranges on numeric parameters, question formats, etc.) and entering these
details into the system itself. We may approximate the average cost of
formulating and implementing a rule in such a system based on the num-
ber of person-hours spent in construction versus the number of rules spec-
ified. CLOTrequired about 60 person-hours to specify 60 rules yielding
a rate of 1 person-hour per rule. The marginal cost for a new rule is
expected to be similar.
Our experience explicating these rule bases provided an opportunity
to make some observations about the process of knowledge acquisition for
consultation systems. Although these observations were made with respect
to the development of SACONand CLOT, other knowledge-based con-
sultation systems have demonstrated similar processes and interactions.
Our principal observation is that the knowledge acquisition process is
composed of three major phases. These phases are characterized strongly
by the types of interaction that occur between expert and knowledge en-
gineer and by the type of knowledge that is being explicated and trans-
ferred between the participants during these interactions. At present only
a small fraction of these interactions can be held directly with the knowl-
edge-based system itself (Davis, 1976; 1977), and research continues to
expand the knowledge acquisition expertise of these systems.

16.5.1 The Beginning Phase

The beginning phase of the knowledge formalization process is character-
ized by the expert's ignorance of knowledge-based systems and unfamil-
iarity with the process of explicitly describing exactly what he or she knows
and does. At the same time, the knowledge engineers are notably ignorant
about the application domain and clumsily seek, by analogy, to characterize
the possible consultation tasks that could be performed (i.e., "Well, in MY-
CIN we did this . . .").
During the initial weeks of effort, the domain expert learns what tools
are available for representing the knowledge, and the knowledge engineer
becomes familiar with the important concepts of the domain. During this
period, the two formulate a taxonomy of the potential consultation areas
for the application of the domain and the types of advice that could be
given. Typically, a small fragment of the complete spectrum of consultation
tasks is selected to be developed during the following phases of the knowl-
edge acquisition effort. For example, the MYCIN project began by limiting
the domain of expertise to the diagnosis and prescription of therapy for
bacteremia (blood infections); SACON is currently restricted to determin-
ing analysis strategies for structures exhibiting nonlinear, nonthermal,
time-independent material behaviors.
Having decided on the subdomain that is to be developed and the type
Observations About Knowledge Acquisition 327

of advice that is to be tendered, the team next identifies the major factors
(parameters) and reasoning steps (rules) that will be used to characterize
the object of the consultation (be it patient or airplane wing) and to rec-
ommend any advice. This forms the inference structure of the consultant.

16.5.2 The Middle Phase

After this initial conceptual groundwork is laid, work proceeds to detailing
the reasoning chains and developing the major rule sets in the system.
During the development of these rule sets, the amount of domain vocab-
ulary, expressed as contexts, parameters, and values, increases substantially.
Enough knowledge is explicated during this middle phase to advise a large
number of common cases.
While developing these systems, we profited by "hand-simulating" any
proposed rules and parameter additions. In particular, major advances in
building the structural analysis knowledge base came when the knowledge
engineer would "play EMYCIN" with the expert. During the sessions the
knowledge engineer would prompt the expert for tasks that needed to be
performed. By simulating the backward-chaining manner of EMYCIN, we
asked, as was necessary, for rules to infer the parameter values, "fired"
these rules, and thus defined a large amount of the parameter, object, and
rule space used during the present consultations. This process of simulat-
ing the EMYCIN system also helped the expert learn how the program
worked in detail, which in turn helped him develop more rules and pa-
rameters.
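This hand simulation follows EMYCIN's backward-chaining control regime: to establish a parameter, try each rule that concludes it, recursively tracing the rules' premises, and ask the user only when no rule applies. A minimal sketch of that loop — the rules, parameter names, and canned answers below are invented for illustration, not drawn from SACON's actual knowledge base:

```python
# Backward-chaining sketch in the style of EMYCIN: to find a parameter's
# value, try every rule that concludes it, recursively tracing each rule's
# premises; if no rule concludes it, "ask the user" (a canned answer table).

RULES = [
    # (premises, conclusion): all premises must hold to draw the conclusion.
    ({"material-behavior": "nonlinear"}, ("analysis-class", "advanced")),
    ({"analysis-class": "advanced", "load": "cyclic"},
     ("strategy", "incremental-analysis")),
]

USER_ANSWERS = {"material-behavior": "nonlinear", "load": "cyclic"}

def find_value(param, known, trace):
    if param in known:
        return known[param]
    for premises, (concl_param, concl_val) in RULES:
        if concl_param != param:
            continue
        trace.append(("try-rule", concl_param))
        if all(find_value(p, known, trace) == v for p, v in premises.items()):
            known[param] = concl_val
            trace.append(("conclude", param, concl_val))
            return concl_val
    # No rule concludes this parameter: prompt the user for it.
    known[param] = USER_ANSWERS.get(param)
    trace.append(("ask-user", param))
    return known[param]

known, trace = {}, []
result = find_value("strategy", known, trace)
```

Replaying the `trace` by hand is essentially what "playing EMYCIN" with the expert amounted to: each `ask-user` entry is a question the consultant would pose, and each `conclude` entry is an intermediate deduction.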

16.5.3 The Final Phase

Finally, when the knowledge base is substantially complete, the system de-
signers concentrate on debugging the existing rule base. This process typi-
cally involves the addition of single rules to handle obscure cases and might
involve the introduction of new parameters. However, the major structure
of the knowledge base remains intact (at least for this subdomain), and
interactions with the expert involve relatively small changes. (Chapters
and 9 describe debugging and refining a knowledge base that is nearly
complete.)
The initial development of the knowledge base is greatly facilitated
when the knowledge engineering team elicits a well-specified consultation
goal for the system as well as an inference structure such as that depicted
in Figure 16-1. Without these conceptual structures to give direction to the
knowledge explication process, a confused and unusable web of facts typ-
ically issues from the expert. We speculate that the value of these organi-
zational structures is not restricted to the production system methodology.
They seem to be employed whenever human experts attempt to formalize

their knowledge in any representation formalism, be it production rules,
predicate calculus, frames, etc. Indeed, when difficulties arise in building
a usable knowledge base, we suspect that the trouble is as likely to come
from a poor choice of inference structure as from the choice of any par-
ticular representation scheme.
The inference structure is a form of meta-knowledge, i.e., knowledge
about the structure and use of the domain expertise (see Part Nine). Our
experience shows that this meta-knowledge should be elicited and dis-
cussed early in the knowledge acquisition process, in order to insure that
a sufficient knowledge base is acquired to complete a line of reasoning,
and to reduce the time and cost of system development. Also, Chapter 29
discusses the need to explain such meta-level knowledge.
Making the inference structure an explicit part of the program would
assist the explanation, tutoring, and further acquisition of the knowledge
base. Several researchers, including Swartout (1981) and Clancey (1979b),
have employed portions of the inference structure to guide both the design
and tutoring of a knowledge-based system. The success of this work sup-
ports the hypothesis that the inference structure will play a critical role in
the development of new knowledge-based consultation systems.
PART SIX

Explaining the Reasoning

17
Explanation as a Topic of AI Research

In describing MYCIN's design considerations in Chapter 3, we pointed out
that an ability of the program to explain its reasoning and defend its advice
was an early major performance goal. It would be misleading, however, to
suggest that explanation was a primary focus in the original conception.
As was true for many elements of the system, the concept of system trans-
parency evolved gradually during the early years. In reflecting on that
period, we now find it impossible to recall exactly when the idea was first
articulated. The SCHOLAR program (Carbonell, 1970a) was our working
model of an interactive system, and we were trying to develop ways to use
that model for both training and consultation. Thus, with hindsight, we
can say that the issue of making knowledge understandable was in our
model, although it was not explicitly recognized at first as a research issue
of importance.

17.1 The Early Explanation Work

When the first journal article on MYCIN appeared in 1973 (Shortliffe et
al., 1973), it included examples of the program's first rudimentary expla-
nation capabilities. The basic representation and control strategies were
relatively well developed at that time, and it was therefore true that any
time the program asked a question some domain rule under consideration
had generated the inquiry. To aid with system debugging, Shortliffe had
added a RULE command that asked MYCIN to display (in LISP) the rule
currently under consideration. At the weekly research meetings it was ac-
knowledged that if the rules were displayed in English, rather than in LISP,
they would provide a partial justification of the question for the user and
thereby be useful to a physician obtaining a consultation. We then devised
the translation mechanism (described in Chapter 5), assigning the TRANS


property to all clinical parameters, predicate functions, and other key data
structures used in rules. Thus, when a user typed "RULE" in response to
a question from MYCIN, a translation of the current rule was displayed
as an explanation. This was the extent of MYCIN's explanation capability
when the 1973 paper was prepared.
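The TRANS mechanism can be pictured as an English template attached to each parameter; displaying a rule then amounts to instantiating the templates of its premise and action clauses. A rough sketch — the templates, parameter names, and rule below are hypothetical stand-ins, not MYCIN's actual TRANS strings or rule format:

```python
# Each parameter carries a TRANS-like English template; translating a rule
# means instantiating the templates of its premise and action clauses.

TRANS = {
    "site": "the site of the culture is {}",
    "gram": "the gram stain of the organism is {}",
    "identity": "there is evidence that the identity of the organism is {}",
}

def translate(rule):
    # Render each premise clause, join them, then render the action clause.
    prem = " and ".join(TRANS[p].format(v) for p, v in rule["if"])
    act_param, act_val = rule["then"]
    return f"IF {prem}, THEN {TRANS[act_param].format(act_val)}."

rule = {"if": [("site", "blood"), ("gram", "negative")],
        "then": ("identity", "e.coli")}
english = translate(rule)
```

Because the English is generated from the rule's internal form, the displayed explanation can never drift out of sync with what the interpreter actually executes — the property that made the RULE command a genuine partial justification rather than hand-written documentation.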
At approximately the same time as that first article appeared, Gorry
published a paper that influenced us greatly (Gorry, 1973). In retrospect,
we believe that this is a landmark essay in the evolution of medical AI. In
it he reviewed the experience of the M.I.T. group in developing a program
that used decision analysis techniques to give advice regarding the diag-
nosis of acute renal failure (Gorry et al., 1973). Despite the successful
decision-making performance of that program, he was concerned by its
obvious limitations (p. 50):

Decision analysis is a useful tool when the problem has been reduced to
a small, well-defined task of action selection. [However,] it cannot be the sole
basis of a program to assist clinicians in an area such as renal disease.

He proceeded to describe the M.I.T. group's nascent work on an AI system
that used "experimental knowledge" as the basis for understanding renal
diseases1 and expressed excitement about the potential of the symbolic
reasoning techniques he had recently discovered (p. 50):

The new technology [AI] . . . has greatly facilitated the development [of
the prototype system] and it seems likely that a much improved program can
be implemented. The real question is whether sufficient improvement can
be realized to make the program useful. At present, we cannot answer the
question, but I can indicate the chief problem areas to be explored: [concept
identification, language development, and explanation].

We will not dwell here on his discussion of the first two items, but regarding
the third (p. 51):

If experts are to use and improve the program directly, then it must be
able to explain the reasons for its actions. Furthermore, this explanation must
be in terms that the physician can understand. The steps in a deduction and
the facts employed must be identified for the expert so that he can correct
one or more of them if necessary. As a corollary, the user must be able to
find out easily what the program knows about a particular subject.

Gorry's discussion immediately struck a sympathetic chord for us in
our own work. The need for explanation to provide transparency and to
encourage acceptance by physicians seemed immediately intuitive, not only
for expert system builders (as Gorry discussed) but also for the eventual

1This program later became the Present Illness Program (Pauker et al., 1976).



end-users of consultation systems.2 Our early RULE command, however,
did not meet the criteria for explanation outlined by Gorry above.
During the next two years, the development of explanation facilities
for MYCIN became a major focus of the research effort. Randy Davis had
joined the project by this time, and his work on the TEIRESIAS program,
which would become his thesis, started by expanding the simple RULE
command and language translation features that Shortliffe had developed.
Davis changed the RULE command to WHY and implemented a history
tree (see Chapter 18) that enabled the user to examine the entire reasoning
chain upward to the topmost goal by asking WHY several times in succes-
sion. He also developed the HOW feature, which permitted the user to
descend alternate branches of the reasoning network. By the time the
second journal article appeared in 1975 (Shortliffe et al., 1975), explana-
tion and early knowledge acquisition work were the major topics of the
exposition.3
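The history tree behind WHY and HOW can be sketched as a tree of goals, each node recording the rule that made it relevant: repeated WHYs climb from the current question toward the top-level goal, while HOW descends into the subgoals that support a conclusion. A toy version — the goal and rule names here are invented, not MYCIN's:

```python
# A goal (history) tree: each node records a goal, the rule that made it
# relevant, and a link to its parent goal.

class Goal:
    def __init__(self, name, rule=None, parent=None):
        self.name, self.rule, self.parent = name, rule, parent
        self.children = []
        if parent:
            parent.children.append(self)

def why(node):
    """Repeated WHYs: climb from the current goal toward the top-level goal."""
    chain = []
    while node.parent:
        chain.append(f"{node.name} is needed by {node.rule} "
                     f"to establish {node.parent.name}")
        node = node.parent
    return chain

def how(node):
    """HOW: descend one level into the subgoals supporting a conclusion."""
    return [f"{node.name} was established via {c.rule} using {c.name}"
            for c in node.children]

top = Goal("therapy-recommendation")
ident = Goal("organism-identity", rule="RULE009", parent=top)
stain = Goal("gram-stain", rule="RULE037", parent=ident)

why_chain = why(stain)      # two WHYs reach the topmost goal
how_answers = how(ident)    # one HOW step down from the conclusion
```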
In addition to the RULE command, Shortliffe developed a scheme
enabling the user to ask free-text questions at the end of a session after
MYCIN had given its advice. He was influenced in this work by Dr. Ken
Colby, then at Stanford and actively involved in the development of the
PARRY program (Colby et al., 1974). Shortliffe was not interested in un-
dertaking cutting-edge research in natural language understanding (he
had taken Roger Schank's course at Stanford in computational linguistics
and realized it would be unrealistic to tackle the problem exhaustively for
a limited portion of his own dissertation work). He was therefore convinced
by Colby's suggestion to exploit existing methods, such as keyword search,
and to take advantage of the limited vocabulary used in the domain of
infectious diseases. The resulting early version of MYCIN's question-an-
swering system was described in a chapter of his dissertation (Shortliffe,
1974).
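The keyword approach can be sketched as scoring stored conclusions against the content words of a question, which is workable precisely because the domain vocabulary is limited. A toy version — the stop words, stored facts, and phrasing are invented for illustration, not taken from MYCIN's actual question answerer:

```python
# Keyword-based question answering: score each stored fact by how many
# content words it shares with the question, ignoring common stop words.

STOP = {"what", "is", "the", "of", "did", "you", "how", "a", "for"}

FACTS = [
    "the identity of the organism is e.coli",
    "the recommended therapy is gentamicin",
    "the site of the culture is blood",
]

def answer(question):
    words = set(question.lower().replace("?", "").split()) - STOP
    scored = [(len(words & set(f.split())), f) for f in FACTS]
    best = max(scored)
    return best[1] if best[0] > 0 else "no answer found"

reply = answer("What is the identity of the organism?")
```

The fragility of this scheme — a paraphrase that shares no keywords with any stored fact simply fails — is exactly why the later work on context and user intent, discussed below in Section 17.2, was needed.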
When Carli Scott first joined the project, she was completing a master's
degree in computer science and needed a project to satisfy her final re-
quirements. She was assigned the task of refining and expanding the ques-
tion-answering (QA) capability in the program. Not only did this work
complete her M.S. requirements, but she continued to devote much of her
time to explanation during her next few years with the project. She was
assisted in this work by Bill Clancey, then a Ph.D. candidate in computer
science, who joined us at about the same time. MYCIN's explanation ca-
pability was tied to its rule-based representation scheme, so Clancey was
particularly interested in how the therapy algorithm might be transferred
from LISP code into rules so that it could be made accessible to the expla-
nation routines. His work in this area is the subject of Chapter 6 in this
volume.

2Almost ten years later we undertook a formal study (described in Chapter 34) that confirmed
this early intuition. A survey of 200 physicians revealed that high-quality explanation capa-
bilities were the most important requirement for an acceptable clinical consultation system.
3This simple model of explanations still has considerable appeal. See Clark and McCabe
(1982) for a discussion of implementing WHY and HOW in PROLOG, for example.

By late 1976 the explanation features of the system had become highly
polished, and Scott, Clancey, Davis, and Shortliffe collaborated on a paper
that appeared in the American Journal of Computational Linguistics in 1977.
That paper is included here as Chapter 18. It describes MYCIN's expla-
nation capabilities in some detail. Although most of the early work de-
scribed in that chapter stressed the need to provide explanations to users,
we have also seen the value such capabilities have for system builders. As
mentioned in Chapters 9 and 20, system builders--both experts and knowl-
edge engineers--find explanations to be valuable debugging aids. The fea-
tures described in Chapter 18 were incorporated into EMYCIN and exist
there relatively unchanged to the present.

17.1.1 Explaining the Pharmacokinetic Dosing Model

By the mid-1970s much of the project time was being spent on knowledge
base refinement and enhancement. Because we needed assistance from
someone with a good knowledge of the antimicrobial agents in use, we
sought the involvement of a clinical pharmacist. Sharon Bennett, a recent
pharmacy graduate who had taken a clinical internship at the Palo Alto
Veterans Administration Hospital affiliated with Stanford, joined the proj-
ect and played a key role in knowledge base development during the mid-
to late-1970s. Among the innovations she brought to the group was an
eagerness to heighten MYCIN's utility by making it an expert at dosage
adjustment as well as drug selection. She and Carli Scott worked together
closely to identify the aspects of pharmacokinetic modeling that could be
captured in rules and to identify the elements that were so mathematical
in nature that they required encoding in special-purpose functions. By this
time, however, the need for explanation capabilities had become so obvious
to the project's members that even this specialized code was adapted so
that explanations could be provided. A paper describing the features, in-
cluding a brief discussion of explanation of dosing, was prepared for the
American Journal of Hospital Pharmacy and is included here as Chapter 19.
We include the paper here not only because it demonstrates the special-
purpose explanation features that were developed, but also because it
shows the way in which mathematical modeling techniques were integrated
into a large system that was otherwise dependent on AI representation
methods.

17.2 Recent Research in Explanation

Even after research on MYCIN terminated, the development of high-per-
formance explanation capabilities for expert systems remained a major
focus of our work. Several small projects and a few doctoral dissertations
focus of our work. Several small projects and a few doctoral dissertations

have dealt with the issue. This level of interest developed out of the MYCIN
experience and a small group seminar series held in 1979 and 1980. Sev-
eral examples of inadequate responses by MYCIN (to questions asked by
users) were examined in an effort to define the reasons for suboptimal
performance. One large area of problems related to MYCIN's lack of sup-
port knowledge, the underlying mechanistic or associational links that explain
why the action portion of a rule follows logically from its premise. This
limitation is particularly severe in a teaching setting where it is incorrect
to assume that the system user will already know most rules in the system
and merely needs to be reminded of their content. Articulation of these
points was largely due to Bill Clancey's work, and they are a central element
of his analysis of MYCIN's knowledge base in Chapter 29.
Other sources of MYCIN's explanation errors were its failure to deal
with the context in which a question was asked (i.e., it had no sense of
dialogue, so each question required full specification of the points of in-
terest without reference to earlier exchanges) and a misinterpretation of
the user's intent in asking a question. We were able to identify examples
of simple questions that could mean four or five different things depend-
ing on what the user knows, the information currently available about the
patient under consideration, or the content of earlier discussions. These
issues are inevitably intertwined with problems of natural language un-
derstanding, and they reflect back on the second of Gorry's three concerns
(language development) mentioned earlier in this chapter.
Partly as a result of work on the problem of student modeling by Bill
Clancey and Bob London in the context of GUIDON, we were especially
interested in how modeling the user's knowledge might be used to guide
the generation of explanations. Jerry Wallis began working on this problem
in 1980 and developed a prototype system that emphasized causal reason-
ing chains. The system associated measures of complexity with both rules
and concepts and measures of importance with concepts. These reasoning
chains then guided the generation of explanations in accordance with a
user's level of expertise and the reasoning details that were desired. Chap-
ter 20 describes that experimental system and defines additional research
topics of ongoing interest.
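The idea of tailoring a reasoning chain to the user can be sketched as a filter: suppress steps whose complexity exceeds the user's expertise unless their importance demands showing them anyway. The chain, measures, and thresholds below are invented placeholders for illustration, not Wallis's actual data or algorithm:

```python
# Tailor an explanation chain to a user: keep a step if it is simple enough
# for the user's level, or important enough that it must be shown anyway.

CHAIN = [
    # (statement, complexity, importance) on invented 1-10 scales
    ("low serum sodium observed", 2, 9),
    ("osmoreceptor-mediated ADH response", 8, 4),
    ("water retention dilutes sodium", 5, 8),
]

def explain_for(user_level, chain, importance_floor=7):
    return [step for step, complexity, importance in chain
            if complexity <= user_level or importance >= importance_floor]

novice = explain_for(3, CHAIN)   # intermediate mechanism is summarized away
expert = explain_for(9, CHAIN)   # full causal chain is shown
```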
Our research group continues to explore solutions to the problems of
explanation in expert systems. John Kunz has developed a program called
AI/MM (Kunz, 1983), which combines simple mathematical models, phys-
iologic principles, and AI representation techniques to analyze abnormal-
ities in fluids and electrolyte balance. The resulting system can use causal
links and general laws of nature to explain physiologic observations by
reasoning from first principles. The program generates English text to
explain these observations.
Greg Cooper has developed a system, known as NESTOR, that cri-
tiques diagnostic hypotheses in the area of calcium metabolism. In order
to critique a user's hypotheses, his system utilizes powerful explanation
capabilities. Similarly, the work of Curt Langlotz, who has adapted ON-
COCIN to critique a physician's therapy plan (see Chapter 32), requires

the program to explain the basis for any disagreements that occur. Langlotz
has developed a technique known as hierarchical plan analysis (Langlotz
and Shortliffe, 1983), which controls the comparison of two therapy plans
and guides the resulting explanatory interaction. Langlotz is also pursuing
a new line of investigation that we did not consider feasible during the
MYCIN era: the use of graphics capabilities to facilitate explanations and
to minimize the need for either typing or natural language understanding.
Professional workstations and graphics languages have recently reduced
the cost of high-resolution graphics systems (and the cost of programming
them) enough that we expect considerably more work in this area.
Bill Clancey's NEOMYCIN research (Clancey and Letsinger, 1981),
mentioned briefly in Chapter 21 and developed partially in response to his
analysis of MYCIN in Chapter 29, also has provided a fertile arena for
explanation research. Diane Warner Hasling has worked with Clancey to
develop an explanation feature for NEOMYCIN (Hasling et al., 1983)
similar to the HOWs and WHYs of MYCIN (Chapter 18). Because NEO-
MYCIN is largely guided by domain-independent meta-rules, however,
useful explanations cannot be generated simply by translating rules into
English. NEOMYCIN is raising provocative questions about how strategic
knowledge should be capsulized and instantiated in the domain for expla-
nation purposes.
Finally, we should mention the work of Randy Teach, an educational
psychologist who became fascinated by the problem of explanation, in part
because of the dearth of published information on the subject. Teach
joined the project in 1980, discovered the issue while working on the survey
of physicians' attitudes toward computer-based consultants reported in
Chapter 34, and undertook a rather complex psychological experiment in
an attempt to understand how physicians explain their reasoning to one
another (Teach, 1984). We mention the work because it reflects the way in
which the legacy of MYCIN has broadened to involve a diverse group of
investigators from several disciplines. We believe that explanation contin-
ues to provide a particularly challenging set of issues for researchers from
computer science, education, psychology, linguistics, philosophy, and the
domains of potential application.

17.3 Current Perspective

We believe now that there are several overlapping reasons for wanting an
expert system to explain its reasoning. These are

understanding
debugging

education
acceptance
persuasion

Understanding the contents of the knowledge base and the line of
reasoning is a major goal of work on explanation. Both the system builder
and the user need to understand the knowledge in the system in order to
maintain it and use it effectively. The system can sometimes take the ini-
tiative to inform users of its line of reasoning, such as when MYCIN prints
intermediate conclusions about the type of infection or the likely identities
of organisms causing a problem. More often, however, we think of a system
providing explanations in response to specific requests.
The debugging rationale is important, especially because knowledge
bases are built incrementally. As mentioned, this was one of Shortliffe's
original motivations for displaying the rule under consideration. This line
of research continues in work to provide monitoring tools within program-
ming environments so that a system builder can watch what a system is
doing while it is running. Mitch Model's Ph.D. research (Model, 1979) used
MYCIN as one example for the monitoring tools he designed. His work
shows the power of describing a reasoning system's activities along several
different dimensions and the power of displaying those activities in dif-
ferent windows on a display screen.
Education is another important reason to provide insights into a
knowledge base. Users who feel they learn something by interacting with
an expert system are likely to use it again. As discussed in Part Eight,
educating users can become as complex as providing good advice. In any
case, making the knowledge base and line of reasoning understandable is
a necessary step in educating users. This line of research continues in
Clancey's work on NEOMYCIN (Clancey and Letsinger, 1981).
Acceptance and persuasion are closely linked. Part of making an
expert system acceptable is convincing potential users and managers that
its conclusions are reasonable. That is, if they understand how a system
reaches conclusions on several test cases and believe that process is reason-
able, they will be more likely to trust its conclusions on new cases. For the
same reason, it is also important to show that the system is responsive to
differences between cases.
Persuading users that a system's conclusions are correct also requires
the same kind of window into the knowledge base and line of reasoning.
When using a consultant program, a person is expected to understand the
conclusions (and the basis for them) well enough to accept responsibility
for acting on them. In medicine, for example, physicians have a moral and
legal responsibility for the consequences of their actions, so they must
understand why--and sometimes be persuaded that--a consultant's rec-
ommendations are appropriate.
18
Methods for Generating
Explanations

A. Carlisle Scott, William J. Clancey,
Randall Davis, and Edward H. Shortliffe

A computer program that models an expert in a given domain is more
likely to be accepted by experts in that domain, and by nonexperts seeking
its advice, if the system can explain its actions. This chapter discusses the
general characteristics of explanation capabilities for rule-based systems:
what types of explanations they should be able to give, what types of knowl-
edge they will need in order to give these explanations, and how this knowl-
edge might be organized (Figure 18-1). The explanation facility in MYCIN
is discussed to illustrate how the various problems can be approached.
A consultative rule-based system need not be a psychological model,
imitating a human's reasoning process. The important point is that the
system and a human expert use the same (or similar) knowledge about the
domain to arrive at the same (or similar) answers to a given problem. The
system's knowledge base contains the domain-specific knowledge of an expert
as well as facts about a particular problem under consideration. When a
rule is used, its actions make changes to the internal data base, which
contains the system's decisions or deductions.
The process of trying rules and taking actions can be compared to
reasoning, and explanations require displays of how the rules use the in-
formation provided by the user to make various intermediate deductions
and finally to arrive at the answer. If the information contained in these
rules adequately shows why an action was taken (without getting into pro-
gramming details), an explanation can simply entail printing each rule or
its free-text translation.

This chapter is a revised version of a paper originally appearing in American Journal of Com-
putational Linguistics, Microfiche 62, 1977. Copyright 1977 by American Society for Com-
putational Linguistics. All rights reserved. Used with permission.


[Figure 18-1 diagram. A DATA BASE contains static knowledge (general factual knowledge of the domain and judgmental knowledge about the domain) and dynamic knowledge (facts about the problem entered by the user and deductions made by the system). An EXPLANATION CAPABILITY draws on the data base to produce explanations alongside the consultative advice.]
FIGURE 18-1 A rule-based consultation system with expla-
nation capability. The three components of a rule-based system
(a rule interpreter, a set of production rules, and a data base)
are augmented by an explanation capability. The data base is
made up of general facts about the system's domain of expertise,
facts that the user enters about a specific problem, and deduc-
tions made about the problem by the system's rules. These de-
ductions form the basis of the system's consultative advice. The
explanation capability makes use of the system's knowledge
base to give the user explanations. This knowledge base is made
up of static domain-specific knowledge (both factual and judg-
mental) and dynamic knowledge specific to a particular prob-
lem.

Performance Characteristics of an Explanation Capability

The purpose of an explanation capability (EC) is to give the user access
to as much of the system's knowledge as possible. Ideally, it should be easy
for a user to get a complete, understandable answer to any sort of question
about the system's knowledge and operation--both in general terms and

with reference to a particular consultation. This implies three major goals
in the development of an explanation capability:

1. It is important to ensure that the EC can handle questions about all
relevant aspects of the system's knowledge and actions. It should be
capable of giving several basic types of explanation, for example,

how it made a certain decision
how it used a piece of information
what decision it made about some subproblem
why it did not use a certain piece of information
why it failed to make a certain decision
why it required a certain piece of information
why it did not require a certain piece of information
how it will find out a certain piece of information (while the consul-
tation is in progress)
what the system is currently doing (while the consultation is in prog-
ress)

2. It is important to enable the user to get an explanation that answers
the question completely and comprehensively.
3. Finally, it is also necessary to make the EC easy to use. A novice should
be able to use the EC without first spending a large amount of time
learning how to request explanations.

We will distinguish two functions for an EC: the reasoning status checker
(RSC) to be used during the consultation, and the general question an-
swerer (GQA) to be used during the consultation or after the system has
printed its results. An RSC answers questions asked during a consultation
about the status of the system's reasoning process. A few simple commands
often suffice to handle the questions that the RSC is expected to answer.
A GQA answers questions about the current state of the system's knowl-
edge base, including both static domain knowledge and facts accumulated
during the consultation. It must recognize a wide range of question types
about many aspects of the system's knowledge. For this reason, a few simple
commands that are easy to learn but still cover all the possible questions
that might be asked may be difficult to define. Consequently, natural lan-
guage processing may be important for a useful GQA.
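The two functions can be contrasted in miniature: an RSC inspects the live goal stack of the reasoning process, while a GQA looks up what the knowledge base currently records. A sketch with invented goals and parameters (not the actual EMYCIN data structures):

```python
# RSC vs. GQA in miniature: the RSC reports on the live goal stack;
# the GQA answers from the accumulated knowledge base.

goal_stack = ["therapy-recommendation", "organism-identity", "gram-stain"]
knowledge_base = {"site": "blood", "gram-stain": "negative"}

def rsc_status():
    """RSC: what is the system doing right now, and in service of what?"""
    current, *outer = reversed(goal_stack)
    return f"determining {current} in order to establish {', then '.join(outer)}"

def gqa(parameter):
    """GQA: what does the system currently believe about a parameter?"""
    return knowledge_base.get(parameter, "not yet determined")

status = rsc_status()
belief = gqa("gram-stain")
```

The asymmetry noted in the text shows up even here: the RSC needs only the stack and a fixed phrasing, while a realistic GQA would have to map many differently worded questions onto lookups like `gqa`.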
In an interactive consultation, the system periodically requests infor-
mation about the problem. This offers the user an opportunity to request
explanations while the consultation is in progress. In noninteractive con-
sultations, the user has no opportunity to interact with the system until
after it has printed its conclusions. Unless there is a mechanism for inter-
rupting the reasoning process and asking questions, the EC for such a

[Figure 18-2 diagram. The knowledge base of the consultation system (static and dynamic knowledge) is supplemented by: historical knowledge of the consultation (a record of all deductions made during the consultation); procedural knowledge about the consultation system (knowledge of the production rules and of the rule interpreter); and miscellaneous domain-independent knowledge (e.g., set theory, arithmetic).]
FIGURE 18-2 Knowledge requirements for an explanation ca-
pability (EC). Access to the consultation system's knowledge
base is a prerequisite for adequate performance of the EC. Other
types of knowledge may be added to the system to enable the
EC to answer a wider range of questions.

system will be limited to questions about the system's final knowledge state.
It will have no RSC.
An EC must know what is in the system's knowledge base and how it
is organized (Figure 18-2). In order to give explanations of the system's
actions, an EC also needs to understand how the system's rule interpreter
works: when rules will be tried, how they can fail, and what causes the
interpreter to try one rule but not another. This general "schema" for how
or why certain rules are used, together with a comprehensive record of
the specific actions taken during a particular consultation, can be used as
a basis for explaining the results of that consultation.
An RSC will need a record of what the system has done in order to
explain how it arrived at the current step. General knowledge of how the
rule interpreter works is necessary to explain where the current step will
lead. The ability to understand individual rules is necessary to the extent
that the content of a rule may explain why it was necessary to use that rule
or may affect which rules will be tried in the future.
A GQA will need more information about the system since the scope
of its explanations is much broader. It must know how the system stores
knowledge about its area of expertise (the static knowledge with which it
starts each consultation), how it stores facts gathered during a particular
consultation (its dynamic knowledge), and how the dynamic knowledge
was obtained or inferred. Thus the GQA must have access to all the
information that the RSC uses: a detailed record of the consultation, an
understanding of the rule interpreter, and the ability to understand rules.

18.1 Design Considerations

To complement the preceding discussion of an EC, we must describe
relevant design considerations for the parent consultation system. This
discussion is not meant to define the "correct" way of representing or
organizing knowledge, but rather to mention factors that should be taken into
account when deciding what representation or organization will be best
for a developing system.
The first step is to decide what basic types of questions the system
should be able to answer. This will have a direct influence on how the EC
is implemented. It is important, however, to make the initial design flexible
enough to accommodate possible future additions; if the basic forms are
sufficiently diverse, limited natural language understanding may be
necessary, depending on the level of performance expected of the EC.
The format and organization of the consultation system's knowledge
base will also affect the design of an EC because both static and dynamic
knowledge must be readily accessible. The more disorganized the knowledge
base, the more difficult will be the task of the EC because more
complicated routines will be needed to access the desired information.
Similarly, when the ordering of events is important, the dynamic record
must reflect that ordering as well as the reasons why each event occurred.
The EC often needs to understand the underlying semantics of
individual rules. This requirement can be met by having the system's
knowledge base include a description of what each rule means, encoded in a
form that is of use to the EC. If the format of the system's rules is highly
stylized and well defined, however, it is possible instead to implement a
mechanism for "reading" the rules and describing their meaning in natural
language. This can be achieved through a high-level description of the
individual components of the rules, one that tells what each element
means. If the rule set consists of a large number of rules, and they are
composed entirely of a relatively small number of primitive elements, this
second approach has the advantage that less information needs to be
stored--a description of each of the primitive components, as opposed to
a description of each rule. When new rules are added to the system, the
first approach requires that descriptions of these rules must be added. With
the second approach, provided that the new rules are constructed from
the standard rule components, no additional descriptive information is
needed.
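As a concrete illustration of the second approach, translation from per-primitive templates might be sketched as follows. This is a simplified sketch in modern Python, not MYCIN's actual INTERLISP; the template strings and the rule encoding are illustrative stand-ins for the real ones described in Chapters 5 and 9.

```python
# Sketch: English is generated from a description of each primitive rule
# component, not from a stored description of each rule. The templates
# and the sample action below are illustrative, not MYCIN's actual ones.

TEMPLATES = {
    "SAME":     "the {parm} of the {cntxt} is {value}",
    "CONCLUDE": "there is evidence ({cf}) that the {parm} of the {cntxt} is {value}",
}

def translate(expr):
    """Translate one (PRIMITIVE cntxt parm value ...) form via its template."""
    op, cntxt, parm, value, *rest = expr
    cf = rest[-1] / 1000 if rest else None   # certainty factors stored x1000
    return TEMPLATES[op].format(cntxt=cntxt.lower(), parm=parm.lower(),
                                value=value.lower(), cf=cf)

# A new rule built only from standard components needs no extra description:
action = ("CONCLUDE", "ORGANISM", "IDENTITY", "NEISSERIA", "TALLY", 800)
print(translate(action))
# there is evidence (0.8) that the identity of the organism is neisseria
```

Adding a new primitive costs one new template; adding a new rule built from existing primitives costs nothing, which is the trade-off the paragraph above describes.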
As well as understanding rules in the knowledge base, an EC must
also be able to "read" the interpreter or have access to some stored
description of how the interpreter works. A third option is to build knowledge
of how the interpreter works directly into the EC; the information need
not be stated explicitly but can be used implicitly by the programmer in
writing the actual EC code. The EC can then function as a set of "specialists,"
each capable of giving a single type of explanation.
Finally, the GQA generally must be able to make deductions from facts
in the knowledge base. If logic is needed only to determine the answers to
questions of a certain type, it may be possible to build the necessary
deductions into the specialist for answering that type of question. On the
other hand, the GQA will often need to be expanded to do more than
simply give explanations of the system's actions or query its data base--it
will be expected to answer questions involving inferences (e.g., to check
for equality or set membership, to make arithmetical comparisons, or to
make logical deductions). Information of this type can often be embodied
in a new kind of specialist that deals with logical deduction or comparison.

18.2 An Example--MYCIN

MYCIN's domain of expertise, its mechanisms for knowledge
representation, and its inference mechanisms have been discussed in detail
earlier in this book. We will not repeat those points here except to emphasize
issues that relate directly to this discussion.

18.2.1 Organization of Knowledge in MYCIN

As we have discussed, an EC must have access to all components of the
system's knowledge base. MYCIN's knowledge base consists of static medical
knowledge plus dynamic knowledge about a specific consultation. Static
knowledge is further classified as factual or judgmental. Factual knowledge
consists of facts that are medically valid, by definition and with certainty,
independent of the particular case. Judgmental knowledge, on the other
hand, is composed of the rules acquired from experts. Although this
knowledge is also assumed to be medically valid, the indicated inferences
are often drawn with less than complete certainty and are seldom
definitional. The conventions for storing both dynamic and static knowledge,
including attribute-object-value triples, tables, lists, and rules themselves,
are described in detail in Chapter 5.

Knowledge of Rule Structure

Each of MYCIN's rules is composed of a small number of conceptual
primitives drawn from a library of 60 such primitives that make up the language
in which rules are written. This design has facilitated the implementation
of a mechanism for translating rules into English (described in Chapter 5).
Each primitive function has a template (Chapter 9) with blanks to be filled
in using translations of the function's arguments. A large part of MYCIN's
explanation capability depends on this ability to translate rules into a form
that the user can understand.
In order to understand rules, the system's various specialists use a
small amount of knowledge about rules in general, together with
descriptions or templates of each of the rule components. As an example, the
following rule (shown in LISP and its English translation) is composed of
the units $AND, SAME, and CONCLUDE:

RULE009
PREMISE: ($AND (SAME CNTXT GRAM GRAMNEG)
               (SAME CNTXT MORPH COCCUS))
ACTION:  (CONCLUDE CNTXT IDENTITY NEISSERIA TALLY 800)

IF:   1) The gram stain of the organism is gramneg, and
      2) The morphology of the organism is coccus
THEN: There is strongly suggestive evidence (.8) that the identity
      of the organism is Neisseria
When the rule is used, the LISP atom CNTXT is bound to some object,
the context to which the rule is applied; see Chapter 5. The template for
CONCLUDE is shown below. This describes each of the arguments to the
function: first, an object (context); second, an attribute (clinical parameter);
third, a value for this parameter; fourth, the tally, or degree of certainty,
of the premise; and last, the certainty factor, a measure of how strong our
belief in this conclusion would be if the premise of the rule were definitely
true.

Template for CONCLUDE: (CNTXT PARM VALU TALLY CF)

Having a small number of rule components also facilitates examination
of rules to see which might be applicable to the explanation at hand.
MYCIN's knowledge of rules, therefore, takes the form of a general
mechanism for "reading" them. On the other hand, no attempt has been made
to read the code of the rule interpreter. Procedural knowledge about the
interpreter is embodied in "specialists," each capable of answering a single
type of question. Each specialist knows how the relevant part of the control
structure works and what pieces of knowledge it uses.
To understand how a specialist might use a template such as that
shown above, consider an explanation that involves finding all rules that
can conclude that the identity of an organism is Neisseria. The appropriate
specialist would start with those rules used by the system to conclude values
for the parameter IDENTITY. Using templates of the various action
functions that appear in each of these rules, the specialist picks out only those
(like Rule 009) that have NEISSERIA in their VALU slot.
This also illustrates the sort of knowledge that can be built into a
specialist. The specialist knows that the control structure uses stored lists
telling which rules can be used to determine the value of each parameter.
Furthermore, it knows that it is necessary to look only at the rules' actions
since it is the action that concludes facts, while the premise uses facts.
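Such a rule-retrieval specialist might look roughly like this. This is a Python sketch with a toy rule set; the UPDATED_BY index, the rule encoding, and the rule contents are hypothetical stand-ins for the system's stored lists.

```python
# Sketch: find rules that can conclude a given value for a parameter by
# looking only at their actions, using the CONCLUDE template
# (CNTXT PARM VALU TALLY CF) to locate the VALU slot.
# The rules and the UPDATED_BY index below are illustrative.

RULES = {
    "RULE009": {"action": ("CONCLUDE", "CNTXT", "IDENTITY", "NEISSERIA", "TALLY", 800)},
    "RULE003": {"action": ("CONCLUDE", "CNTXT", "IDENTITY", "ECOLI", "TALLY", 400)},
    "RULE037": {"action": ("CONCLUDE", "CNTXT", "CATEGORY", "ENTEROBACTERIACEAE", "TALLY", 800)},
}

# Index kept by the control structure: which rules conclude each parameter.
UPDATED_BY = {"IDENTITY": ["RULE009", "RULE003"], "CATEGORY": ["RULE037"]}

# Slot positions for each action primitive's template.
TEMPLATE = {"CONCLUDE": ("CNTXT", "PARM", "VALU", "TALLY", "CF")}

def rules_concluding(parm, value):
    """Return names of rules whose action puts `value` in its VALU slot."""
    found = []
    for name in UPDATED_BY.get(parm, []):
        action = RULES[name]["action"]
        slots = TEMPLATE[action[0]]
        if action[1 + slots.index("VALU")] == value:
            found.append(name)
    return found
```

With this toy data, `rules_concluding("IDENTITY", "NEISSERIA")` returns only RULE009, mirroring the example in the text: the specialist consults the stored index first, then reads only the actions of the candidate rules.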

The History Tree

Many of the EC's specialists need a record of the interaction with the user.
This record is built during the consultation and is organized into a tree
structure called the history tree, which reflects MYCIN's goal-directed
approach. Each node in the tree represents a goal and contains information
about how the system tried to accomplish this goal (by asking the user or
by trying rules). Associated with each rule is a record of whether or not
the rule succeeded, and if not, why it failed. If evaluating the premise of
a rule causes the system to trace a new parameter, thereby setting up a
new subgoal, the node for this subgoal is the offspring of the node
containing the rule that caused the tracing. Figure 18-3 shows part of a
representative history tree. In this example, Rule 003 caused the tracing of
the parameter CATEGORY, which is used in the premise of this rule.
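The history tree might be represented along these lines. This is a minimal Python sketch whose field names are inferred from the description above, not taken from MYCIN's code; the nodes built at the bottom rebuild the fragment shown in Figure 18-3.

```python
# Sketch of a history-tree node: each node is a goal (a parameter traced
# for some context) recording how the goal was pursued and, for each rule
# tried, whether it succeeded or why it failed. Field names are illustrative.

class HistoryNode:
    def __init__(self, goal, asked=None, parent=None):
        self.goal = goal            # e.g. ("IDENTITY", "ORGANISM-1")
        self.asked = asked          # question number, if the user was asked
        self.rules = []             # [(rule_name, outcome), ...]
        self.children = []          # subgoals set up while trying rules
        if parent is not None:
            parent.children.append(self)

    def record_rule(self, rule, outcome):
        self.rules.append((rule, outcome))

# Rebuilding part of Figure 18-3:
root = HistoryNode(("IDENTITY", "ORGANISM-1"), asked=7)
root.record_rule("RULE009", "failed, clause 1")
root.record_rule("RULE003", "succeeded")
gram = HistoryNode(("GRAM", "ORGANISM-1"), asked=11, parent=root)
category = HistoryNode(("CATEGORY", "ORGANISM-1"), parent=root)
category.record_rule("RULE037", "succeeded")
HistoryNode(("HOSPITAL-ACQUIRED", "ORGANISM-1"), asked=15, parent=category)
```

Because each subgoal node hangs off the node whose rule caused the tracing, walking upward from a node recovers exactly the goal/rule chain that the WHY command displays.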

Other Domain-Independent Knowledge

MYCIN's question-answering ability is limited to describing the system's
actions and explaining what facts the system knows. The system also has
capabilities for the use of specialized logic. For example, to explain why a
particular decision was not made, MYCIN recognizes that a reasonable
response is to explain what prevented the system from using rules that would
have made that decision. For situations such as this, the necessary logic is
built into the appropriate specialist; there is no general representation of
knowledge about logic, arithmetic, or set theory. To find out if ORGANISM-1
and ORGANISM-2 have the same identity, for example, it is necessary
for the user to ask separately for the identity of each organism, then
to compare the answers to these questions.

[Figure 18-3 shows a portion of a history tree. The root node is the goal
IDENTITY of ORGANISM-1, pursued by asking question 7 and by trying
rules, among them RULE009 (failed, clause 1) and RULE003 (succeeded).
Its offspring are the subgoals GRAM of ORGANISM-1 (settled by asking
question 11; no rules) and CATEGORY of ORGANISM-1 (pursued with rules,
among them RULE037, which succeeded). CATEGORY in turn has the
offspring subgoal HOSPITAL-ACQUIRED of ORGANISM-1 (settled by
asking question 15; no rules).]

FIGURE 18-3 Portion of a history tree. (Rule 009 is shown in
the text; see Figure 18-4 for Rule 003 and Rule 037.)

18.2.2 Scope of MYCIN's Explanation Capability (EC)

Because we wish to allow the user to see how MYCIN makes all its decisions,
we have tried to anticipate all types of questions a user might ask and to
make every part of the system's knowledge base and reasoning process
accessible. The EC consists of several specialists, each capable of giving one
type of explanation. These specialists are grouped into three sets: one for
explaining what the system is doing at a given time, one for answering
questions about the system's static knowledge base, and one for answering
questions about the dynamic knowledge base. The first set forms MYCIN's
reasoning status checker; the second and third together make up the
system's general question answerer.

MYCIN's Reasoning Status Checker (RSC)

Whenever MYCIN asks a question, the user is allowed to examine the
current reasoning chain by asking WHY the piece of information being
sought by the system is important. As explained above, the system asks a
question in order to find out about its current goal. Consider the partial
history tree shown in Figure 18-3. HOSPITAL-ACQUIRED is one subgoal,
CATEGORY is another at the next level up, and Rule 037 links them. The
"reason" for asking if the infection was hospital-acquired, then, is the
program's use of Rule 037 in an effort to determine the value of the higher
subgoal CATEGORY. The answer to WHY at this point is thus a lucid
display of the goals and the rules linking them. Since any WHY could

[preceded by the first 14 questions in the consultation]

15) Is the patient's illness with ORGANISM-1 a hospital-acquired infection?
** WHY
[i.e. WHY is it important to determine whether or not the infection with
ORGANISM-1 was acquired while the patient was hospitalized?]
[1.0] This will aid in determining the category of ORGANISM-1.
   It has already been established that
   [1.1] the gram stain of ORGANISM-1 is gramneg, and
   [1.2] the morphology of ORGANISM-1 is rod, and
   [1.3] the aerobicity of ORGANISM-1 is facultative
   Therefore, if
   [1.4] the infection with ORGANISM-1 was not acquired while the
         patient was hospitalized
   then
      there is strongly suggestive evidence (.8) that the category of
      ORGANISM-1 is enterobacteriaceae
   ALSO: there is weakly suggestive evidence (.1) that the identity of
      ORGANISM-1 is pseudomonas-aeruginosa
   [RULE037]
** WHY
[i.e. WHY is it important to determine the category of ORGANISM-1?]
[2.0] ... in order to determine the identity of ORGANISM-1.
   It has already been established that
   [2.1] this blood culture was taken from a sterile source
   Therefore, if
   [2.2] this current organism and at least one of the list of members
         associated with the category of the organism agree with
         respect to the following properties: air conformation
   then
      there is strongly suggestive evidence (.9) that each of them is the
      identity of ORGANISM-1
   [RULE003]
** HOW 1.3
[i.e. HOW was it established that the aerobicity of ORGANISM-1 is
facultative?]
The following were used:
   [3.1] RULE027 indicated there is strongly suggestive evidence (.8) that the
         aerobicity of ORGANISM-1 is facultative
   [3.2] RULE027 indicated there is weakly suggestive evidence (.2) that the
         aerobicity of ORGANISM-1 is anaerobic
Since this gave a cumulative CF of (.8) for facultative, and (.2) for
anaerobic, it has been established that the aerobicity of ORGANISM-1
is facultative
** HOW 1.1
[i.e. HOW was it established that the gram stain of ORGANISM-1 is gramneg?]
You said so [question 11].

FIGURE 18-4 MYCIN's reasoning status checker (user entries
follow the double asterisks).

potentially have several interpretations, and we wanted to avoid extensive
natural language processing in this component of MYCIN's EC, MYCIN
first restates its understanding of the meaning of the user's WHY (Figure
18-4). The higher goal is then presented, followed by the rule. The clauses
in the premise of the rule are divided into those already established and
those yet to be determined. Finally, since rules may have multiple
conclusions about different clinical parameters, the relevant conclusion is
presented first and all others follow.

IS BLOOD A STERILE SITE?
WHAT ARE THE NONSTERILE SITES?
WHAT ORGANISMS ARE LIKELY TO BE FOUND IN THE THROAT?
IS BACTEROIDES AEROBIC?
WHAT METHODS OF COLLECTING SPUTUM CULTURES DO YOU CONSIDER?
WHAT DOSAGE OF STREPTOMYCIN DO YOU GENERALLY RECOMMEND?
HOW DO YOU DECIDE THAT AN ORGANISM MIGHT BE STREPTOCOCCUS?
WHY DO YOU ASK WHETHER THE PATIENT HAS A FEVER OF UNKNOWN ORIGIN?
WHAT DRUGS WOULD YOU CONSIDER TO TREAT E.COLI?
HOW DO YOU USE THE SITE OF THE CULTURE TO DECIDE AN ORGANISM'S IDENTITY?

FIGURE 18-5 Sample questions about MYCIN's static knowledge.
As Figure 18-4 illustrates, additional links in the reasoning chain can
be examined by repeating the WHY command. For any of the subgoals
mentioned in answer to a WHY, the user may ask HOW this goal was (or
will be) achieved. MYCIN's reasoning status checker is described in more
detail by Shortliffe et al. (1975) and Davis et al. (1977).
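In outline, each successive WHY climbs one link of the recorded goal/rule chain. The following is a hedged Python sketch of that climb; the hand-built chain and the answer wording only mimic the Figure 18-4 dialogue and are not MYCIN's implementation.

```python
# Sketch: each WHY climbs one level of the goal/rule chain recorded during
# the consultation. The chain below is a hand-built stand-in for walking a
# real history tree upward from the current goal.

# Links: (subgoal, rule linking it upward, higher goal)
CHAIN = [
    ("HOSPITAL-ACQUIRED of ORGANISM-1", "RULE037", "CATEGORY of ORGANISM-1"),
    ("CATEGORY of ORGANISM-1", "RULE003", "IDENTITY of ORGANISM-1"),
]

def why(current_goal):
    """One WHY: name the higher goal and the rule linking the two levels."""
    for sub, rule, parent in CHAIN:
        if sub == current_goal:
            return (f"This will aid in determining {parent} [{rule}]", parent)
    return ("Because it was a goal of the consultation itself.", None)

answer1, goal = why("HOSPITAL-ACQUIRED of ORGANISM-1")  # first WHY
answer2, goal = why(goal)                               # repeated WHY
```

Repeating `why` on the goal it returns reproduces the dialogue pattern of Figure 18-4: RULE037 links HOSPITAL-ACQUIRED up to CATEGORY, and RULE003 links CATEGORY up to IDENTITY.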

MYCIN's General Question Answerer (GQA)

The question-answering part of the system has natural language routines
for analyzing the user's input. The system recognizes questions phrased in
a number of ways, thereby making the question-answering facility easier
to use. Questions about the static knowledge base may deal with judgmental
knowledge (e.g., rules used to conclude a certain piece of information), or
they may ask about factual knowledge (e.g., entries in tables and lists). Some
questions about static knowledge are shown in Figure 18-5.
Perhaps the more important part of the question-answering system is
its ability to answer questions about a particular consultation. While some
users may be interested in checking the extent of MYCIN's static knowledge,
most questions will ask for a justification of, or for the rationale
behind, particular decisions that were made during the consultation. Listed
in Figure 18-6 are the types of questions about dynamic knowledge that
can be handled at present. A few examples of each type are given. The
slot <cntxt> indicates some context that was discussed in the consultation;
<parm> is some clinical parameter of this context; <rule> is one of the
system's decision rules. Before a question can be answered, it must be
classified as belonging to one of these groups. As Figure 18-6 illustrates,
each question type may be asked in a variety of ways, some specifying the
parameter's value, some phrased in the negative, and so forth. MYCIN's
natural language processor must classify the questions, then determine
what specific clinical parameters, rules, etc., are being referenced.
1. What is <parm> of <cntxt>?

TO WHAT CLASS DOES ORGANISM-1 BELONG?
IS ORGANISM-1 CORYNEBACTERIUM-NON-DIPHTHERIAE?

2. How do you know the value of <parm> of <cntxt>?

HOW DO YOU KNOW THAT CULTURE-1 WAS FROM A STERILE SOURCE?
DID YOU CONSIDER THAT ORGANISM-1 MIGHT BE A BACTEROIDES?
WHY DON'T YOU THINK THAT THE SITE OF CULTURE-1 IS URINE?
WHY DID YOU RULE OUT STREPTOCOCCUS AS A POSSIBILITY FOR ORGANISM-1?

3. How did you use <parm> of <cntxt>?

DID YOU CONSIDER THE FACT THAT PATIENT-1 IS A COMPROMISED HOST?
HOW DID YOU USE THE AEROBICITY OF ORGANISM-1?

4. Why didn't you find out about <parm> of <cntxt>?

DID YOU FIND OUT ABOUT THE CBC ASSOCIATED WITH CULTURE-1?
WHY DIDN'T YOU NEED TO KNOW WHETHER ORGANISM-1 IS A CONTAMINANT?

5. What did <rule> tell you about <cntxt>?

HOW WAS RULE178 HELPFUL WHEN YOU WERE CONSIDERING ORGANISM-1?
DID RULE116 TELL YOU ANYTHING ABOUT INFECTION-1?
WHY DIDN'T YOU USE RULE189 FOR ORGANISM-2?

FIGURE 18-6 Types of questions about a consultation, with
examples.

18.2.3 Understanding the Question

The main emphasis in the development of MYCIN has been the creation
of a system that can provide sound diagnostic and therapeutic advice in
the field of infectious diseases. The explanation system was included in the
system's original design in order to make the consultation program's
decisions acceptable, justifiable, and instructive. Since the question-answering
facility was not the primary focus of the research, it is not designed to be
a sophisticated natural language understander. Instead, it uses crude
techniques, relying strongly on the very specific vocabulary of the domain, to
"understand" what information is being requested (Figure 18-7).
The analysis of a question is broken into three phases (Steps 1-3 of
Figure 18-7): the first creates a list of terminal, or root, words; the second
determines what type of question is being asked (see the classification of
questions above); and the last determines what particular parameters, lists,
etc., are relevant to the question. In the first and third steps, the system
dictionary is important. The dictionary contains approximately 1400 words
that are commonly used in the domain of infectious diseases. It includes
all words that are acceptable values for a parameter, common synonyms

1. The question is reduced to a list of terminal words.
2. Pattern matching classifies the question as a rule-retrieval question, and divides
   it into a premise part and an action part.
3. Dictionary properties of the terminal words are used to determine which
   parameters (and their values) are relevant to each part of the question. These
   vocabulary clues are listed in the form (<parm> (<values>) weight), where
   weight is used by the scoring mechanism to determine which parameters should
   be eliminated from consideration.
4. After selecting only the most strongly indicated parameters, the final translation
   tells what rules can answer the question: there are no restrictions on the premise,
   and the action must contain the parameter CONTAMINANT with any value.
5. The answer consists of finding all rules that meet these restrictions, and printing
   those that the user wants to see.

FIGURE 18-7 Major steps in understanding a question, finding
rules, and printing an answer. See Figure 18-8 for an example.

of these words, and words used elsewhere by the system in describing the
parameter (e.g., when translating a rule into English or requesting the
value of the parameter).
We now briefly describe how MYCIN achieves each of the five tasks
outlined in Figure 18-7. An example analysis is shown in Figure 18-8.

Step 1: Reducing the Question to Terminal Words

Each word in the dictionary has a synonym pointer to its terminal word
(terminal words point to themselves). For the purpose of analyzing the
question, a nonterminal word is considered to be equivalent to its (terminal)
synonym. Terminal words have associated with them a set of properties or
descriptors (Table 18-1) that are useful in determining the meaning of a
question that uses a terminal word or one of its synonyms. A given word
may be modified by more than one of these properties.
The first three properties of terminal words are actually inverse pointers,
generated automatically from attributes of the clinical parameters.
Specifically, a word receives the "acceptable value" pointer to a clinical
parameter (Property 1 in Table 18-1) if it appears in the parameter's list of
acceptable values--a list that is used during the consultation to check the
user's response to a request for the parameter's value (see EXPECT
attribute, Chapter 5).
Also, each clinical parameter, list, and table has an associated list of
keywords that are commonly used when talking about that parameter, list,
or table. These words are divided according to how sure we can be that a
doctor is referring to this parameter, list, or table when the particular word

** WHEN DO YOU DECIDE THAT AN ORGANISM IS A CONTAMINANT?

[1] Terminal words: WHEN DO YOU CONCLUDE THAT A ORGANISM IS A CONTAMINANT

[2] Question type: Rule retrieval
    Premise part: (WHEN DO YOU CONCLUDE)
    Action part:  (THAT A ORGANISM IS A CONTAMINANT)

[3] Vocab. clues (Premise): (WHENINFECT (ANY) 1) (WHENSTOP (ANY) 1)
                            (WHENSTART (ANY) 1) (DURATION (ANY) 1)
    Vocab. clues (Action):  (CONTAMINANT (ANY) 4) (FORM (ANY) 1)
                            (SAMEBUG (ANY) 1) (COVERFOR (ANY) 1)

[4] Final translation:
    Premise: ANY
    Action:  (CONTAMINANT ANY)

[5] The rules listed below conclude about:
    whether the organism is a contaminant
    6, 31, 351, 39, 41, 42, 44, 347, 49, 106
    Which do you wish to see?

RULE006
IF:   1) The culture was taken from a sterile source, and
      2) It is definite that the identity of the organism is one of:
         staphylococcus-coag-neg bacillus-subtilis
         corynebacterium-non-diphtheriae
THEN: There is strongly suggestive evidence (.8) that the organism
      is a contaminant

FIGURE 18-8 Sample of MYCIN's analysis of a general question.
(User input follows the double asterisks. Steps 1 through
4 are usually not shown to the user. See Figure 18-7 for a
description of what is occurring in each of the five steps.)

TABLE 18-1 Properties of Terminal Words

1. The word is an acceptable value for some clinical parameter(s).


2. The word always implicates a certain clinical parameter, system list, or table (e.g.,
the word "identity" always implicates the parameter IDENTITY, which means
the identity of an organism).
3. The word might implicate a certain parameter, system list, or table (e.g., the
word "positive" might implicate the parameter NUMPOS, which means the
number of positive cultures in a series).
4. The word is part of a phrase that can be thought of as a single word (examples
of such phrases are "transtracheal aspiration," "how long," and "not sterile").
is used in a question. It is from this list that terminal words' "implication"
pointers (Properties 2 and 3 in Table 18-1) are generated.
During the first phase of parsing, each word in the original text is
replaced by its terminal word. For words not found in the dictionary, the
system uses Winograd's root-extraction algorithm (Winograd, 1972) to see
if the word's lexical root is in the dictionary (e.g., the root of "decision" is
"decide"). If so, the word is replaced by the terminal word for its root.
Words still unrecognized after root extraction are left unchanged.
The resulting list of terminal and unrecognized words is then passed
to a function that recognizes phrases. Using Property 4 (Table 18-1), the
function identifies a phrase and replaces it with a single synonymous
terminal word (whose dictionary properties may be important in determining
the meaning of the question).
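Step 1 might be sketched as follows. This is toy Python: the dictionary, the suffix rule, and the phrase table are tiny illustrative stand-ins for MYCIN's 1400-word dictionary and Winograd's root-extraction algorithm.

```python
# Sketch of Step 1: map each word to its terminal synonym, fall back on a
# crude root extraction, then collapse known phrases into single terminals.
# All entries below are illustrative, not MYCIN's dictionary.

SYNONYM = {"when": "when", "an": "a", "a": "a", "organism": "organism",
           "decide": "conclude", "conclude": "conclude", "you": "you",
           "contaminant": "contaminant", "that": "that", "is": "is"}
ROOTS = [("sion", "de")]                 # toy rule: "decision" -> "decide"
PHRASES = {("how", "long"): "duration"}  # phrases acting as single words

def terminal(word):
    w = word.lower().strip("?")
    if w in SYNONYM:
        return SYNONYM[w]
    for suffix, repl in ROOTS:           # try the word's lexical root
        if w.endswith(suffix) and w[:-len(suffix)] + repl in SYNONYM:
            return SYNONYM[w[:-len(suffix)] + repl]
    return w                             # unrecognized words stay unchanged

def reduce_question(question):
    words = [terminal(w) for w in question.split()]
    out, i = [], 0
    while i < len(words):                # collapse multiword phrases
        pair = tuple(words[i:i + 2])
        if pair in PHRASES:
            out.append(PHRASES[pair]); i += 2
        else:
            out.append(words[i]); i += 1
    return out
```

On "When do you decide that an organism is a contaminant?" this yields the terminal-word list of step [1] in Figure 18-8 (modulo case): "decide" maps to its terminal CONCLUDE and "an" to A, while unrecognized words pass through unchanged.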

Step 2: Classifying the Question

The next step is to classify the question so that the program can tell which
specialist should answer it. Since all questions about the consultation must
be about some specific context, the system requires that the name of the
context (e.g., ORGANISM-1) be stated explicitly. This provides an easy
mechanism to separate general questions about the knowledge base from
questions about a particular consultation.
Further classification is done through a pattern-matching approach
similar to that used by Colby et al. (1974). The list of words created by the
first phase is tested against a number of patterns (about 50 at present).
Each pattern has a list of actions to be taken if the pattern is matched.
These actions set flags that indicate what type of question was asked. In
the case of questions about judgmental knowledge (called rule-retrieval
questions), pattern matching also divides the question into the part referring to
the rule's premise and the part referring to its action. For example, in
"How do you decide that an organism is streptococcus?" there is no premise
part, and the action part is "an organism is streptococcus"; in "Do you ever
use the site of the culture to determine an organism's identity?" the premise
part is "the site of the culture" and the action part is "an organism's
identity."
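A toy version of this pattern matcher can be sketched with Python regular expressions standing in for MYCIN's roughly 50 patterns and their flag-setting actions; the two patterns below are invented for the two example questions and are not the system's actual patterns.

```python
# Sketch of Step 2: the terminal-word list is matched against patterns;
# a match sets the question type and, for rule-retrieval questions,
# splits out the premise and action parts. The patterns are illustrative.

import re

PATTERNS = [
    (r"^do you (ever )?use (?P<premise>.+) to (determine|decide) (?P<action>.+)$",
     "rule-retrieval"),
    (r"^(how|when) do you (conclude|decide) (that )?(?P<action>.+)$",
     "rule-retrieval"),
]

def classify(words):
    text = " ".join(w.lower().strip("?") for w in words)
    for pattern, qtype in PATTERNS:
        m = re.match(pattern, text)
        if m:
            groups = m.groupdict()
            return {"type": qtype,
                    "premise": groups.get("premise") or "",
                    "action": groups.get("action") or ""}
    return {"type": "unclassified", "premise": "", "action": ""}
```

The first pattern captures both a premise and an action part; the second matches questions with no premise part, reproducing the split described for the two example questions above.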

Steps 3 and 4: Determining What Pieces of Knowledge Are Relevant

The classification of a question guides its further analysis. Each question
type has an associated template with blanks to be filled in from the question.
The different blanks and the techniques for filling them in are listed
in Table 18-2. With the question correctly classified, the general question

TABLE 18-2 Mechanisms for Analyzing a Question

Slot      Analysis cues for filling a slot

<cntxt>   The context must be mentioned by name, e.g., ORGANISM-2.

<rule>    Either a rule's name (RULE047) will be mentioned or the word
          "rule" will appear, together with the rule's number (47).

<value>   One of the terminal words in the question has a dictionary property
          indicating that it is a legal value for the parameter (Property 1, Table
          18-1), e.g., THROAT is a legal value for the parameter SITE.

<parm>    All of the words in the list are examined to see if they implicate any
          clinical parameters. Strong implications come from words with
          properties showing that the word is an acceptable value of the parameter,
          or that the word always implicates that parameter (Properties 1 and
          2, Table 18-1). Weak implications come from words with properties
          showing that they might implicate the parameter (Property 3, Table
          18-1). The system uses an empirical scoring mechanism for picking
          out only the most likely parameters.

          Associated with certain parameters are words or patterns that must
          appear in the question in order for the parameter to be implicated.
          This scheme allows the system to distinguish among related
          parameters that may be implicated by the same keywords in the first pass.
          For example, the word "PMN" implicates the parameters CSFPOLY (the
          percent of PMNs in the CSF) and PMN (the percent of PMNs in
          the complete blood count). These are distinguished by requiring that
          the word "CSF" be present in a question in order for CSFPOLY to
          be implicated.

<list>    System lists are indicated in a manner similar to that for parameters,
          except that scoring is not done. Lists, like parameters, may have
          associated patterns that must be present in the question. Furthermore,
          lists have properties telling which other system lists are their
          subsets. If a question implicates both a list and a subset of that list,
          the more general (larger) list is discarded. As an example, the question
          "Which drugs are aminoglycosides?" implicates two lists: the list
          of all drugs, and the list of drugs that are aminoglycosides. The
          system only considers the more specific list of aminoglycosides when
          answering the question.

<table>   Tables are indicated in a manner similar to that for lists except that
          an entry in the table must also be present in the question. For
          example, the word "organism" may indicate two tables: one containing
          a classification of organisms, and the other containing normal flora
          of various portals. The question "What organisms are considered to
          be subtypes of Pseudomonas?" will correctly implicate the former table,
          and "What are the organisms likely to be found in the throat?"
          will implicate the latter, because PSEUDOMONAS is in the first table
          and THROAT is in the second.
answerer can tell which specialist should answer it. Filling in all blanks in
the template gives the specialist all the information needed to find the
answer.

Step 5: Answering the Question

Corresponding to each question type, there are a number of possible
answer templates. For example, for questions of the form "How do you know
the value of <parm> of <cntxt>?" two of the answer templates are

    I used <rule> to conclude that <parm> of <cntxt> is <value>.
    This gave a cumulative CF of <certainty factor>.
    The last question asked before the conclusion was made
    was <question number>.

    In answer to question <question number> you said that <parm> of
    <cntxt> is <value>.

The specialist for answering questions of a given type has to check the
history tree or the system's knowledge base in order to determine which
of the answer templates is appropriate for a particular question. Some
blanks in the answer template are filled in by the same items as are used
to fill blanks in the question template. The remainder are filled by the
specialist with the information that answers the question. In the above
example, the slots <parm>, <cntxt>, and possibly <value> would be
filled in from the question, and the other slots would be filled from the
history tree.

18.2.4 Questions About MYCIN's Static Knowledge

General questions about MYCIN's factual knowledge (Figure 18-9) are the
easiest to answer. The specialist that provides answers to such questions
must look up the desired information in the system's static knowledge base.
Generally, an answer consists of simply displaying this information. The
most processing occurs when checking the retrieved information against
some item in the question for equality or set membership.
As was discussed in the previous section, the specialist that answers
questions about judgmental knowledge is more complicated (Figure 18-8).
Answering such questions (Figure 18-10) involves built-in knowledge about
the rule set, plus the ability to "read" the rules. By the time the question
has been analyzed, the specialist knows which parameters must appear in
the premise and which must appear in the action of any rule that answers
the question. Values may be specified for any of the parameters. To answer
the question, the rule-retrieval specialist must first find every rule whose
An Example--MYCIN 355

** IS BLOOD A STERILE SITE?
Yes. Blood is one of those sites that are normally sterile.
** WHAT ARE THE NONSTERILE SITES?
Those sites that are normally nonsterile are: CERVIX,
CUTANEOUS-ULCER, LOCHIA, NOSE, SKIN, STOOL, THROAT, URETHRA,
VAGINA.
** WHAT ORGANISMS ARE LIKELY TO BE FOUND IN THE THROAT?
The likely pathogens associated with throat are: STREPTOCOCCUS-
PNEUMONIAE, STREPTOCOCCUS-GROUP-A, NEISSERIA-MENINGITIDIS.
** IS BACTEROIDES AEROBIC?
No, the aerobicity of bacteroides is anaerobic.
** WHAT DOSAGE OF STREPTOMYCIN DO YOU GENERALLY RECOMMEND?
The suggested dosage and route of streptomycin is 7.5 mg/kg q12h
IM (or IV).
** HOW DO YOU TREAT MENINGOCOCCAL BACTEREMIA?
For treatment of NEISSERIA-MENINGITIDIS in PRIMARY-BACTEREMIA
Select therapy from among the following drugs (ranked according
to relative efficacy, assuming uniform sensitivity):
  PENICILLIN (.91)
  AMPICILLIN (.9)
  SULFISOXAZOLE (.8)
  CHLORAMPHENICOL (.7)
  ERYTHROMYCIN (.5)
  TETRACYCLINE (.4)

FIGURE 18-9 Questions about factual knowledge.

premise and action satisfy these constraints. It accordingly needs to know
that there are two special lists associated with each parameter: one
containing every rule that uses the parameter in its premise, and the other
containing every rule that concludes about the parameter in its action
(LOOKAHEAD and UPDATED-BY; see Chapter 5). Using these lists for
the various parameters mentioned in the question, the specialist can find
those rules that might answer the question. If no values were specified,
the job is done and the relevant rules can be displayed without further
analysis; otherwise, it is necessary to read each of the rules in the list and
to eliminate those that do not mention the correct values for the parameter.
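The retrieval step described in this paragraph can be sketched as follows. The rule representation (a dictionary of concluded values per rule) is a simplified stand-in, but the use of the two per-parameter lists follows the text:

```python
def find_rules(premise_parm, action_parm, lookahead, updated_by,
               rules, value=None):
    """Intersect the LOOKAHEAD list of the premise parameter with the
    UPDATED-BY list of the action parameter.  If no value was
    specified, the job is done; otherwise, read each candidate rule
    and keep only those mentioning the value."""
    candidates = set(lookahead.get(premise_parm, ())) \
               & set(updated_by.get(action_parm, ()))
    if value is None:
        return sorted(candidates)
    return sorted(r for r in candidates if value in rules[r]["values"])
```

For the question in Figure 18-10, for example, the intersection of the two lists would yield rules 156, 163, and 190 without any rule needing to be read.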
The rule-retrieval specialist also makes use of a piece of MYCIN's
knowledge discussed in Chapter 9. The system contains models of its own
knowledge (called rule models), which are used primarily during acquisition
of new medical knowledge from an expert (Davis, 1976). These models,
however, can be put to many uses; one is to explain general patterns in
decision making. The rule models are abstract descriptions of subsets of
rules and are generated automatically by reading the rules. For example,
the model for IDENT IS PSEUDOMONAS tells what features are common
to the majority of rules that conclude that the identity of an organism is
Pseudomonas.
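A much-simplified version of generating such a model by reading the rules can be sketched as follows; the rule representation and the simple majority criterion are assumptions for illustration (the real rule models also capture relationships among parameters, as Figure 18-11 shows):

```python
from collections import Counter

def rule_model(rule_set):
    """Abstract a set of rules (all concluding the same value) into
    the premise parameters used by more than half of them."""
    counts = Counter(p for rule in rule_set
                     for p in set(rule["premise_parms"]))
    return sorted(p for p, n in counts.items() if n > len(rule_set) / 2)
```

Applied to the Pseudomonas rules of Figure 18-11, such a summary would surface features like the site of the culture, the gram stain, and the morphology.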
If a model exists describing the rules about which the question is ask-

** WHY DO YOU ASK WHETHER THE PATIENT HAS A FEVER OF UNKNOWN ORIGIN?
The rules listed below use:
  whether the patient has a true fever of unknown origin
RULE180
IF: 1) The site of the culture is blood, and
    2) The number of cultures in the series including the
       culture is greater than or equal to 1, and
    3) The number of cultures in this series which were
       positive for the organism is less than or equal to 1, and
    4) The patient has a true fever of unknown origin, and
    5) Cardiac-surgery is a relevant item from the history of
       the patient
THEN: There is suggestive evidence (.6) that the infection is
      infective-endocarditis

** WHAT DOES GU MANIPULATION TELL YOU ABOUT THE IDENTITY OF AN ORGANISM?
The rules listed below use:
  whether the patient has had a genito-urinary manipulative procedure
to conclude about:
  the identity of the organism
156, 163, 190
Which do you wish to see?
** 156
RULE156
IF: 1) The site of the culture is blood, and
    2) The gram stain of the organism is gramneg, and
    3) The morphology of the organism is rod, and
    4) The portal of entry of the organism is urine, and
    5) The patient has not had a genito-urinary manipulative
       procedure, and
    6) Cystitis is not a problem for which the patient has
       been treated
THEN: There is suggestive evidence (.6) that the identity of
      the organism is e.coli

FIGURE 18-10 Rule-retrieval questions.

ing, the rule-retrieval specialist incorporates this model's information into
its answer (Figure 18-11). Thus the question-answering facility is able to
give some information about strategies for achieving some of the system's
goals, as well as giving the individual rules that use the strategies.

18.2.5 Consultation-Specific Questions

Some of the simplest questions about a specific consultation are those that
inquire about the value of a particular parameter (Figure 18-12). The
specialist that answers these questions must know how to retrieve this in-

** HOW DO YOU DECIDE THAT AN ORGANISM MIGHT BE PSEUDOMONAS AERUGINOSA?
Rules which conclude that the identity of the organism is
pseudomonas-aeruginosa generally use one or more of the
following pieces of information:
  the site of the culture
  the gram stain of the organism
  the morphology of the organism
Furthermore, the following relationships hold:
  The gram stain of the organism, and the morphology of the
  organism tend to appear together in these rules.
RULE184, RULE116, RULE047, RULE085, RULE040 conclude that
the identity of the organism is pseudomonas-aeruginosa.
Which of these do you wish to see?
** 184
RULE184
IF: 1) The category of the organism is not known, and
    2) The gram stain of the organism is gramneg, and
    3) The morphology of the organism is rod, and
    4) The aerobicity of the organism is facul
THEN: There is weakly suggestive evidence (.1) that the
      identity of the organism is pseudomonas-aeruginosa

FIGURE 18-11 Response that uses rule models.

formation. In some cases it will also be necessary to make comparisons to
see if a value specified in the question is one of the values that was inferred
for that parameter.
Answering other types of consultation-specific questions involves the
use of knowledge about the control structure:

how the system acquires information,
why the system tries to find out about parameters,
how information is used once it is acquired,
what causes a rule to be tried, and
what causes a rule to fail.

** TO WHAT CLASS DOES ORGANISM-1 BELONG?
The category of ORGANISM-1 is enterobacteriaceae (.9)
** IS ORGANISM-1 A STREPTOCOCCUS?
No. The identity of ORGANISM-1 is e.coli (.74) bacteroides (.7)
klebsiella-pneumoniae (.4) pseudomonas-aeruginosa (.22)
enterobacter (.21) proteus-non-mirabilis (.19) serratia (.03)
salmonella (.02)

FIGURE 18-12 Questions about a parameter's value.



** HOW DO YOU KNOW THAT CULTURE-1 WAS FROM A STERILE SOURCE?
I used rule 189 to conclude that this blood culture was taken from
a sterile source. This gave a cumulative CF of (1.0). The
last question asked before the conclusion was made was 7.
** DID YOU CONSIDER BACTEROIDES AS A POSSIBILITY FOR ORGANISM-1?
Yes. I used rule 095 to conclude that the identity of ORGANISM-1
is bacteroides. This gave a cumulative CF of (.7). The
last question asked before the conclusion was made was 20.
** HOW DO YOU KNOW THAT ORGANISM-1 IS E.COLI?
The following rules made conclusions about whether the identity
of ORGANISM-1 is e.coli:
          cumulative certainty    last question asked
Rule        Yes       No          before conclusion was made
RULE021    (.47)                  20
RULE084    (.55)                  22
RULE003    (.74)                  24
In answer to question 9 you said that the identity of ORGANISM-1
is e.coli (.3)

FIGURE 18-13 Questions regarding how a conclusion was made.

The specialist for answering questions like "How do you know the
value of <parm> of <cntxt>?" knows that the value of a parameter can
come from two sources: it can be deduced by rules, or the user can give it
in response to a question. The history tree will show which (possibly both)
of these sources provided the particular information mentioned in the
question (Figure 18-13).
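Collecting those sources from the history tree can be sketched as below; the event encoding is an assumption made for this sketch, but the two source kinds (rule conclusions and user answers, possibly both, as in the third exchange of Figure 18-13) come from the text:

```python
def value_sources(parm, cntxt, history):
    """Scan the history tree for every event that set this parameter:
    zero or more rule conclusions plus, possibly, a user answer."""
    events = history.get((parm, cntxt), [])
    rules = [e["rule"] for e in events if e["kind"] == "rule"]
    asked = [e["qnum"] for e in events if e["kind"] == "user"]
    return rules, asked
```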
If the question is phrased in the negative, it is necessary first to find
all the ways the conclusion could have been made (this is a simple task of
rule retrieval), then to explain why it wasn't made in this consultation (Fig-
ure 18-14). The specialist for answering these questions must know what
situations can prevent conclusions from being made. The second question
in Figure 18-14 illustrates how the answer to one question might cause
another question to be asked.
The specialist for answering questions of the form "How did you use
<parm> of <cntxt>?" needs to know not only how to find the specific
rules that might use a parameter, but also how a parameter can cause a
rule to fail and how one parameter can prevent another from being used.
The history tree can be checked to see which of the relevant rules used
the parameter, which failed because of the parameter, and which failed
for some other reason, preventing the parameter from being used (Figure
18-15).
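The three-way partition reported in Figure 18-15 can be sketched as follows; the history-record format is an invented encoding for illustration:

```python
def usage_report(parm, history):
    """Partition the rules recorded in the history tree: rules that
    used the parameter, rules that failed because of it, and rules
    that failed earlier for some other reason (so the parameter was
    never used)."""
    used, failed_on, failed_other = [], [], []
    for rule, rec in sorted(history.items()):
        if parm not in rec["premise_parms"]:
            continue
        if rec["outcome"] == "succeeded":
            used.append(rule)
        elif rec["failed_on"] == parm:
            failed_on.append(rule)
        else:
            failed_other.append(rule)  # failed before parm was examined
    return used, failed_on, failed_other
```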
For questions of the form "Why didn't you find out about <parm>
of <cntxt>?" general knowledge of MYCIN's control structure tells the
conditions under which it would have been necessary to find out some
piece of information. The record of the consultation can be used to de-

** WHY DID YOU RULE OUT STREPTOCOCCUS AS A POSSIBILITY FOR ORGANISM-1?
The following rules could have been used to determine that the
identity of ORGANISM-1 is streptococcus: RULE033. However,
none of these succeeded in the context of ORGANISM-1.
If you would like an explanation for why any of these
rules failed, please enter their numbers:
** 33
Clause 2 of RULE033 ["the morphology of the organism is coccus"]
was already known to be false for ORGANISM-1, so the rule
was never tried.

** WHY DON'T YOU THINK THAT THE MORPHOLOGY OF ORGANISM-1 IS COCCUS?
It is definite that the morphology of ORGANISM-1 is rod. Knowing
this with certainty rules out all other values for the
morphology of ORGANISM-1, including coccus.

FIGURE 18-14 Questions regarding why a conclusion was not made.

** HOW DID YOU USE THE AEROBICITY OF ORGANISM-1?
The aerobicity of ORGANISM-1 was used in the following rules:
RULE003, RULE005, RULE084.
The aerobicity of ORGANISM-1 caused the following rules to fail:
RULE035, RULE051, RULE052, RULE053, RULE111.
The aerobicity of ORGANISM-1 also would have been used in:
RULE037, RULE050, RULE058, RULE086, RULE110, RULE184,
RULE203, RULE204, RULE205. However, none of these
succeeded in the context of ORGANISM-1. If you would
like an explanation for why any of these rules failed,
please enter their numbers:
** 37
RULE037 was tried in the context of ORGANISM-1, but it failed
due to clause 1 ["the category of the organism is not
known"].

** DID YOU CONSIDER THE FACT THAT PATIENT-1 IS NOT A COMPROMISED HOST?
The fact that Fred Jones is not a compromised host caused the
following rules to fail: RULE085, RULE106.
The fact that Fred Jones is not a compromised host also would
have been used in: RULE109. However, none of these
succeeded in the context of PATIENT-1. If you would
like an explanation for why any of these rules failed,
please enter their numbers:
** NONE

FIGURE 18-15 Questions regarding how information was used.

termine why these conditions never arose for the particular parameter in
question (Figure 18-16). Figure 18-16 also illustrates that MYCIN's general
question answerer allows a user to get as much information as is desired.
The first answer given was not really complete in itself, but it led the user
to ask another question to get more information. Then another question
was asked to determine why clause 1 of Rule 159 was false. The answers
to the first two questions both mentioned rules, which could be printed if
the user wanted to examine them. The special command PR (Print Rule)
is for the user's convenience. It requires no natural language processing
and thus can be understood and answered immediately ("What is Rule
109?" or "Print Rule 109" also would be understood).
In questions that ask about the application of a rule to a context there
are three possibilities: the rule told us something about the context; the
rule failed when applied to that context; or the rule was never tried in that
context. The history tree tells which of these is the case. Furthermore, if a
rule succeeded, there is a record of all the conclusions it made, and if it
failed, the reason for failure is recorded. As Figure 18-17 illustrates, rules
are only applicable in certain contexts. If the rule could not apply to the
context mentioned in the question but might be relevant when applied to
some other context (which is an ancestor or descendent in the context tree),
then the question is answered as if it had mentioned the relevant context.
There are interactions among the different types of questions, but they
weren't mentioned earlier for the sake of clarity. Sometimes a question
may indicate an incorrect assumption about the state of MYCIN's knowl-
edge. For example, a question could be "Why don't you think that OR-
GANISM-1 is E. coli?" when, in fact, the system has concluded that the
organism is E. coli. To answer this question, the explanation system would
explain how it did make such a decision, despite the phrasing of the ques-
tion. Similarly, when asked how the system used (or concluded about) some
parameter that was never traced, the system explains why it didn't need to
find out about the parameter, regardless of how the question was phrased.

18.3 Conclusions
Consultation systems form a class of artificial intelligence programs that
can provide useful solutions to real-world problems. The utility of such
systems, however, depends on their acceptability to human users. One fea-
ture that can increase their acceptability is a mechanism whereby they can
explain or justify their advice.
The development of an explanation mechanism for a consultation sys-
tem is very much related to the problems of representing knowledge and
of making use of different sources of knowledge. Since the use of inference
rules provides a unified way to represent modular pieces of knowledge,

** DID YOU FIND OUT ABOUT THE CBC ASSOCIATED WITH CULTURE-1?
During the preceding consultation, I never had reason to try to find out
whether the results of the CBC from the time period of this blood
culture are available. It would have been necessary to know this
information in order to use any of the following: RULE109. However,
none of these succeeded in the context of this blood culture. If you
would like an explanation for why any of these rules failed, please
enter their numbers:
** 109
RULE109 would be relevant to this blood culture when applied to: ORGANISM-1.
I never had reason to try to apply RULE109 to ORGANISM-1. I would have
needed to use RULE109 in order to find out whether ORGANISM-1 is a
contaminant. However, during the preceding consultation, I never had
reason to try to find out whether ORGANISM-1 is a contaminant.

** WHY DIDN'T YOU NEED TO KNOW WHETHER ORGANISM-1 IS A CONTAMINANT?
During the preceding consultation, I never had reason to try to find out
whether ORGANISM-1 is a contaminant. It would have been necessary to
know this information in order to use any of the following: RULE006,
RULE007, RULE106, RULE108, RULE109, RULE159. However, none of these
succeeded in the context of ORGANISM-1. If you would like an
explanation for why any of these rules failed, please enter their
numbers:
** 159
RULE159 was tried in the context of ORGANISM-1, but it failed due to clause
1 ["it is suspected that the identity of the organism is
corynebacterium-non-diphtheriae"]

** IS ORGANISM-1 CORYNEBACTERIUM-NON-DIPHTHERIAE?
No. The identity of ORGANISM-1 is e.coli (.74) bacteroides (.7)
klebsiella-pneumoniae (.4) pseudomonas-aeruginosa (.22) enterobacter
(.21) proteus-non-mirabilis (.19) serratia (.03) salmonella (.02)

** PR 109
RULE109
IF: 1) The culture was taken from a sterile source, and
    2) It is not known whether the organism is a contaminant, and
    3) The patient is not a compromised host, and
    4) The patient was febrile due to the infection, and
    5) The results of the CBC from the time period of the culture are
       available, and
    6) A - The white count from the CBC is greater than 10.5, or
       B - The percent of PMNs in the CBC at the time the culture was
           obtained is greater than 78, or
       C - The percent of WBCs which were bands in the CBC at the
           time the culture was obtained is greater than 10
THEN: There is strongly suggestive evidence (.8) that the organism is
      not a contaminant

FIGURE 18-16 Questions about why a parameter was not traced.

** DID RULE116 TELL YOU ANYTHING ABOUT INFECTION-1?
RULE116 would be relevant to the primary-bacteremia when applied
to: ORGANISM-1.
Clause 5 of RULE116 ["the infection is meningitis"] was already
known to be false for ORGANISM-1, so the rule was never tried.

** HOW WAS RULE178 HELPFUL WHEN YOU WERE CONSIDERING ORGANISM-1?
RULE178 would be relevant to ORGANISM-1 when applied to:
CULTURE-1.
I never had reason to try to apply RULE178 to this blood culture.
I would have needed to use RULE178 in order to find out
whether this blood culture was taken from a sterile source.
However, I was already certain that this blood culture was
taken from a sterile source.

** WHAT DID RULE295 TELL YOU ABOUT ORGANISM-1?
I used RULE295 to conclude that the identity of ORGANISM-1 is
hemophilus-influenzae. This gave a cumulative CF of (.25).
The last question asked before the conclusion was made was 36.

** WHY DIDN'T YOU USE RULE112 TO FIND OUT ABOUT ORGANISM-1?
RULE112 was not executed because it would have caused circular
reasoning when applied to ORGANISM-1. Would you like to
see the chain of rules and parameters which makes up this
circle?
** YES
I wanted to know about the identity of ORGANISM-1 because I
try to find out the identity of the organism for all
current organisms of the patient.
To find out about the identity of ORGANISM-1, I tried to use
RULE021. Before I could use RULE021, I needed to know about
a prior organism with possibly the same identity as
ORGANISM-1.
To find out about a prior organism with possibly the same
identity as ORGANISM-1, I tried to use RULE005. Before I
could use RULE005, I needed to know about the aerobicity of
ORGANISM-1.
To find out about the aerobicity of ORGANISM-1, I tried to use
RULE031. Before I could use RULE031, I needed to know about
the category of ORGANISM-1.
To find out about the category of ORGANISM-1, I tried to use
RULE112. Before I could use RULE112, I needed to know about
the identity of ORGANISM-1.
But this is the unknown parameter I sought originally.

FIGURE 18-17 Questions regarding the application of rules.
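The circular chain shown in Figure 18-17 is the kind of loop a back-chainer can detect by carrying a stack of pending goals. A toy sketch, in which `needed_parms` is an invented stand-in for the real rule interpreter (each parameter maps to the premise parameters of a rule that concludes it):

```python
def find_circle(parm, needed_parms, goal_stack=()):
    """Back-chain on a parameter, carrying the stack of pending
    goals; reaching a parameter already on the stack yields the
    circular chain."""
    if parm in goal_stack:
        return list(goal_stack) + [parm]
    for sub in needed_parms.get(parm, ()):
        circle = find_circle(sub, needed_parms, goal_stack + (parm,))
        if circle:
            return circle
    return None

# Abbreviated version of the chain in Figure 18-17:
NEEDED = {"identity": ["prior-organism"],      # via RULE021
          "prior-organism": ["aerobicity"],    # via RULE005
          "aerobicity": ["category"],          # via RULE031
          "category": ["identity"]}            # via RULE112
```

Tracing "identity" with this map reproduces the circle that the explanation in the figure walks through.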

the task of designing an explanation capability is simplified for rule-based
consultation systems. The example of MYCIN shows how this can be done
and illustrates further that a system designed for a single domain with a
small, technical vocabulary can give comprehensive answers to a wide range
of questions without sophisticated natural language processing.
19
Specialized Explanations
for Dosage Selection

Sharon Wraith Bennett and A. Carlisle Scott

In this chapter we describe specialized routines that MYCIN uses to eval-
uate and explain appropriate drug dosing. The processes that the program
uses in its selection of antimicrobials and subsequent dosage calculations
have been refined to take into account a variety of patient- and drug-
specific factors. Originally, all dosage recommendations were based on nor-
mal adult doses. However, it was soon recognized that the program needed
to be able to recommend optimal therapy by considering information about
the patient, such as age and renal function, as well as pharmacokinetic
variables of the drugs. The addition of an ability to customize doses ex-
panded the capabilities of the consultation program.
Earlier chapters have described the way in which MYCIN uses clinical
and laboratory data to establish the presence of an infection and the likely
identity of the infecting organism(s). If positive laboratory identification is
not available, MYCIN ranks possible pathogens in order of likelihood.
Antimicrobials are then chosen to treat effectively all likely organisms. In
order to select drugs to which the organisms are usually sensitive, MYCIN
uses susceptibility data from the Stanford bacteriology laboratory. The pro-
gram also considers the fact that the patient's previous antimicrobial treat-
ment may influence an organism's susceptibility. MYCIN disfavors a drug
that the patient is receiving at the time a positive culture was obtained.
Drug-specific factors are then considered before therapy is chosen.
Some drugs, such as many of the cephalosporins, are not recommended
for patients with meningitis because they do not adequately cross the blood-

This chapter is an abridged version of a paper, some of which was originally presented by
Sharon Wraith Bennett at the 12th Annual Midyear Clinical Meeting of the American Society
of Hospital Pharmacists, Atlanta, Georgia, December 8, 1977, and which appeared in American
Journal of Hospital Pharmacy 37:523-529 (1980). Copyright 1980 by American Journal of
Hospital Pharmacy. All rights reserved. Used with permission.

364 Specialized Explanations for Dosage Selection

brain barrier and may lead to the development of resistance (Fisher et al.,
1975). One antimicrobial may be selected over another, similar drug be-
cause it causes fewer or less severe side effects. For example, nafcillin is
generally preferred over methicillin for treatment of staphylococcal infec-
tions because of the reported interstitial nephritis associated with methi-
cillin (Ditlove et al., 1977). MYCIN's knowledge base therefore requires
continual updating with new indications or adverse reactions as they are
reported in the medical literature.
Several patient-specific factors may further limit the list of acceptable
antimicrobials. Tetracycline, for example, is not recommended for children
(Conchie et al., 1970) or pregnant (Anthony, 1970) or breast-feeding
(O'Brien, 1974) women. Also, prior adverse reactions to antimicrobials
must be considered by the program.

19.1 Customizing Doses

Efficacious treatment of infectious diseases begins with selection of an an-
timicrobial; however, it is likely that the patient will not be cured unless
the dose and the route of administration of the drug are appropriate.
MYCIN takes into account the site of the infection, the age of the patient,
and the patient's renal status in determining the dosage regimen for each
drug.
Consider, for example, the importance of patient age in therapy plan-
ning. The half-life of some drugs may be longer in neonates than in adults
because of the immaturity of the former's microsomal enzyme system and
kidneys (Weiss et al., 1960). Therefore, the doses of these drugs, in mg/kg
amounts, should be lower in neonates than in adults. On the other
hand, some antimicrobials, such as gentamicin, may require a higher rel-
ative dose in children than in adults, possibly because of a larger vol-
ume of distribution (Siber et al., 1975). MYCIN therefore uses different
calculations when appropriate for determining doses for neonates, infants,
children, and adults.
Most antimicrobials are fully or partially excreted by the kidneys; for
this reason, it is necessary to consider the patient's renal function in order
to determine a safe and effective regimen. The program uses the patient's
creatinine clearance as an indicator of the degree of renal impairment.
Doses are adjusted in patients over six months of age if the creatinine
clearance falls below 80 ml/min/1.73 m² and if more than 15% of the drug
is excreted unchanged in the urine. In children between one week and six
months of age, the dose is changed if the creatinine clearance is less than
60 ml/min/1.73 m². A creatinine clearance of 30 ml/min/1.73 m² indicates

47) Does John Jones have a clinically significant allergic reaction to any
    antimicrobial agent?
** NO
48) Do you have reason to suspect that John Jones may have impaired renal function?
** YES
49) What is the most recent creatinine clearance in ml/min which you feel is a
    true representation of John Jones's renal function?
** UNKNOWN
50) What is the most recent serum creatinine of John Jones (in mg/100ml)?
** 1.9
51) What is the previous serum creatinine of John Jones (in mg/100ml)?
** 1.8
52) Number of days between John Jones's two most recent serum creatinines:
** 2
53) John Jones's weight in kilograms (or <number> POUNDS):
** 70
54) John Jones's height in centimeters (or <number> INCHES):
** 175
Estimated body surface area is 1.9 sq. meters.
Dosage adjustments will be based on the calculated creatinine
clearance of 42.7 ml/min/1.73 sq. meters (adjusted to
average body surface area).

FIGURE 19-1 The patient's creatinine clearance is used as an
indicator of the degree of renal function. When the creatinine
clearance is not known, it is estimated from the age, sex, weight,
and serum creatinine of the patient and adjusted to average
body surface area before the dose is calculated.

renal impairment in infants¹ between one day and one week old (Edel-
mann and Barnett, 1971). Because of the passage of maternal creatinine
into the infant's serum at birth, no estimate of renal function is attempted
if the newborn is less than one day old. For infants younger than six
months, MYCIN warns the user of the large degree of possible error in
the estimated doses because of the changing renal function and a poor
relationship between glomerular filtration rate and body surface area
(Rubin et al., 1949).
When the creatinine clearance of an adult patient is not known, it is
estimated from the age, sex, weight, and serum creatinine (Jelliffe and
Jelliffe, 1972) (Figure 19-1). For children less than 12 years of age, the
height and serum creatinine are used to estimate the creatinine clearance
(Schwartz et al., 1976). If two consecutive serum creatinines indicate rap-

¹The creatinine clearance in infants is less than in adults because of decreased glomerular
membrane permeability, smaller glomerular filtering surface, lower perfusion pressure, and
lower renal plasma flow.

[REC-1] My preferred therapy recommendation is:
Give the following in combination:
1) CHLORAMPHENICOL
   Give: 1.75g (17.6 ml) q6h IV [calculated on basis of 25 mg/kg]
2) GENTAMICIN
   After a loading dose of: 112 mg (2.8 ml, 80mg/2ml ampule) IV
   [calculated on basis of 1.6 mg/kg],
   give: 70 mg (1.8 ml, 80mg/2ml ampule) q8h IV [calculated on
   basis of 1.0 mg/kg] plus consider giving 5 mg q24h
   Intrathecal
   Or, after a loading dose of: 140 mg (3.6 ml, 80mg/2ml ampule)
   IV [calculated on basis of 2.0 mg/kg],
   give: 119 mg (3.0 ml, 80mg/2ml ampule) q14h IV [calculated on
   basis of 1.7 mg/kg] plus consider giving 5 mg q24h
   Intrathecal
   [normal dose is 1.7 mg/kg q8h IV]
Would you like to enter a new dosing interval?
** YES
Please enter the number of hours.
** 12
After a loading dose of: 133 mg (3.4 ml, 80mg/2ml ampule) IV
[calculated on basis of 1.9 mg/kg],
give: 105 mg (2.6 ml, 80mg/2ml ampule) q12h IV [calculated on
basis of 1.4 mg/kg] plus consider giving 5 mg q24h
Intrathecal

FIGURE 19-2 MYCIN provides three different dosage regi-
mens for each antimicrobial whose dose must be adjusted in
renal failure. One method changes the dose, another alters the
interval, and the third calculates a new dose given any interval.

idly changing renal function, a warning is printed, which recommends
repeating the assessment of the degree of impairment before an accurate
dosage regimen can be determined. The creatinine clearance is adjusted
to average body surface area (Boyd, 1935) before the program calculates
a dose (Figure 19-2). If the creatinine clearance indicates renal failure,
MYCIN calculates doses based on the first-order pharmacokinetic prop-
erties of the antimicrobials (see Figure 19-2) and the patient's creatinine
clearance. (A description of the formulas is included in Figure 19-4.)
The program provides three different dosage regimens for each an-
timicrobial whose dose must be adjusted. One method changes the dose,
another alters the dosing interval, while the third calculates a new dose
given any interval. This last option allows the physician to select a dosing
interval that is convenient for the staff to follow and a dose that is a rea-
sonable volume to administer. A loading dose is calculated for each regi-
men so that an effective blood level can be reached as soon as possible.
The dose is provided in both a mg/kg amount and the number of milliliters,
capsules, or tablets required (Figure 19-2).
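The three regimens can be related through the elimination-rate fraction Q described in Figure 19-4 (new dose = old dose × Q; new interval = old interval / Q; and, for a chosen interval, a dose rescaled in proportion). The sketch below assumes those relations; the Q value in the example is back-calculated from the gentamicin doses in Figure 19-2 rather than taken from MYCIN itself:

```python
def adjusted_regimens(dose_mg, interval_h, q, chosen_interval_h=None):
    """Three renal-failure adjustments, assuming the Q relations
    above: scale the dose, stretch the interval, or fit a dose to a
    user-chosen interval (as for the q12h request in Figure 19-2)."""
    if chosen_interval_h is not None:
        return (round(dose_mg * q * chosen_interval_h / interval_h),
                chosen_interval_h)
    return ((round(dose_mg * q), interval_h),   # modified dose
            (dose_mg, round(interval_h / q)))   # modified interval

# Normal gentamicin regimen 119 mg q8h with Q ~ 0.588:
print(adjusted_regimens(119, 8, 0.588))        # -> ((70, 8), (119, 14))
print(adjusted_regimens(119, 8, 0.588, 12))    # -> (105, 12)
```

With that assumed Q, the three results match the 70 mg q8h, 119 mg q14h, and 105 mg q12h regimens shown in Figure 19-2.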
If a patient's renal function changes during therapy, the physician can
obtain a new dosage recommendation without repeating the entire infec-

tious disease consultation. A shortened version of the consultation will
recalculate the doses on the basis of the patient's current renal function.
The program will request only the information necessary for determining
the new doses, such as the most recent creatinine clearance (or serum
creatinine).

19.2 Selection of Dosage Regimen

Although it is widely debated which dosage regimen is best, it is generally
recognized that the blood level of antimicrobials used to treat bacteremias
should exceed the minimum inhibitory concentration (MIC) while remain-
ing below toxic levels. The health professional must decide between allow-
ing the drug level to fluctuate above and below the MIC and consistently
maintaining the drug level above the MIC through more frequent dosing.
This decision is based on a variety of factors including the organism iden-
tity and the drug under consideration. To aid the prescriber in selecting
the most appropriate regimen, MYCIN generates a graph for each regi-
men showing the predicted steady-state blood levels over time (Figure
19-3) (Gibaldi and Perrier, 1975). The MIC of the organism and the toxic
level of the drug (when they are available) are also included on the graph.
The graph provides a rough estimate of the blood levels and the time of
peak concentration in the patient. It is provided to improve the initial
selection of a dosage regimen, not to replace the measurement of blood
levels. Monitoring blood levels whenever they are available is strongly rec-
ommended.
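The predicted steady-state curve for repeated dosing follows from standard one-compartment, first-order pharmacokinetics (the textbook superposition result found in Gibaldi and Perrier, 1975). The function below is that textbook formula, not MYCIN's actual plotting code, and the parameter values in the example are arbitrary:

```python
import math

def steady_state_level(dose_mg, v_d_litres, k_per_h, tau_h, t_h):
    """Concentration at time t (0 <= t <= tau) after a dose at steady
    state, for repeated IV bolus dosing with first-order elimination:
    C(t) = (D / Vd) * exp(-k*t) / (1 - exp(-k*tau))."""
    return (dose_mg / v_d_litres) * math.exp(-k_per_h * t_h) \
           / (1.0 - math.exp(-k_per_h * tau_h))
```

The level peaks just after each dose and is lowest just before the next; stretching the interval tau widens the peak-to-trough swing, which is exactly the trade-off regimen I exhibits in Figure 19-3.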

19.3 Explanation of Recommendations

At the conclusion of the consultation, the physician can ask MYCIN simple
questions to obtain assurance that the diagnosis and treatment are reason-
able. These questions may refer to the current consultation or they may
be general, regarding any of the system's knowledge. The program pro-
vides a justification for the therapy selection, which includes the reasons
for selecting one antimicrobial instead of another. Also available is an ex-
planation of the calculations used to decide on a dose (Figure 19-4). The
physician can also ask to see relevant parts of MYCIN's knowledge base,
including justification and literature references (Figure 19-5).

[Figure 19-3 is a character-plotted graph: blood level of GENTAMICIN
(mcg/ml, 2.0 to 7.0 across the page) versus time in hours (0 to 20.00
down the page), showing curves D, I, and B and a vertical line at the
MIC.]

KEY: D - drug level for modified dose [70.0 mg Q8h]
     I - drug level for modified interval [119 mg Q14h]
     B - drug level for modified interval and dose [105 mg Q12h]
     MIC - minimum inhibitory concentration [mcg/ml]
     * - indicates overlapping curves

FIGURE 19-3 Graph of blood levels of a drug (gentamicin)
over time for different doses and dose intervals. Note that reg-
imen I, with the longest dosing interval, reaches the highest
blood levels episodically but also leaves the patient with blood
levels below the MIC for the longest period of time. The some-
what atypical orientation of this graph, designed for terminal
output, displays time vertically, blood level horizontally.

** HOW DID YOU CALCULATE THE DOSE FOR GENTAMICIN?
When GENTAMICIN is given for MENINGITIS, the recommended dosage is:
if age is > 2 yrs then 1.7 mg/kg q8h IV plus consider giving 5 mg q24h IT,
else 2.3 mg/kg q8h IV plus consider giving 2.5-4 mg/day IT.
The normal dose for John Jones is: 119 mg (3.0 ml, 80mg/2ml ampule) q8h
[calculated on the basis of 1.7 mg/kg] plus consider giving 5 mg q24h IT
GENTAMICIN is excreted by the kidneys, so its dosage must be modified in renal failure.
The following table shows how the patient's renal function was determined:
Identifier  Value  Definition
SCR1        1.9    the most recent serum creatinine (mg/100 ml)
SCR2        1.8    the previous serum creatinine (mg/100 ml)
CCr(f)      42.7   estimated creatinine clearance, adjusted for normal
                   body surface area (ml/min/1.73 sq. meters)
CCr(n)      100    the average normal creatinine clearance for
                   John Jones's age group
CCr(min)    80     the minimum normal creatinine clearance for
                   John Jones's age group
Since CCr(f) < CCr(min), John Jones has renal impairment.
To modify the dosage of GENTAMICIN, the following pharmacokinetic parameters were used:
Identifier  Value  Definition
a           .007   the fraction hourly loss of a drug due to non-renal
                   processes
b           .0022  the slope of the plot of the elimination rate
                   constant of a drug versus the patient's
                   creatinine clearance
Q           .57    the drug's elimination rate fraction: elimination
                   rate of the drug as a fraction of the drug's
                   "normal" elimination rate.
Q was calculated using the formula:
    Q = [a + b * CCr(f)]/[a + b * CCr(n)]
My first dosage recommendation decreased the amount of GENTAMICIN and left
the dosing interval unchanged. The new amount was calculated by
multiplying the old amount by Q.
The second recommendation increased the dosing interval and left the amount
of GENTAMICIN unchanged. The new interval was calculated by dividing
the old interval by Q.

FIGURE 19-4 The physician can ask MYCIN questions to obtain assurance that the diagnosis, treatment, and dose are reasonable.

19.4 Summary and Conclusions

MYCIN is able to customize therapy recommendations by using its knowledge base in conjunction with patient-specific information. The large knowledge base contains facts about organism susceptibilities to drugs, ep-

** HOW DO YOU DETERMINE BODY SURFACE AREA?
RULE 104
[This rule applies to any patient, and is tried in order to find out
about the patient's estimated body surface area]
IF: 1) The weight of the patient is known, and
    2) The height of the patient is known
THEN: Use Boyd's algorithm to calculate body surface area from
      weight and height [Boyd, E. The Growth of the Surface
      Area of the Human Body, 1935, pp 1-133.]
Author: Wraith
Justification: A comparative evaluation with statistical analysis of
the Boyd method with the DuBois formula demonstrated that the
Boyd formula is more accurate for abnormal body types.
[Sendroy J. et al., Determination of human body surface area from
height and weight, Applied Physiology, 7(1):1-12, July 1954.]

FIGURE 19-5 The person obtaining a MYCIN consultation can see the decision rules that the program uses, including justification and literature references.

idemiology of infectious diseases, and pharmacokinetics and toxicity of various antimicrobials. This knowledge enables the program to determine appropriate treatment specific to a patient's age, renal function, and prior drug reactions. MYCIN's explanation capability allows the user to analyze the process by which the program arrived at a therapy recommendation. This capability may also play an educational role by reminding the physician of critical factors to consider when prescribing therapy for other patients.
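Rule 104 in Figure 19-5 invokes Boyd's algorithm. The sketch below uses one commonly cited statement of the Boyd formula (surface area in square meters from weight in grams and height in centimeters); the constants are taken from the pharmacology literature, not from this chapter, and should be treated as an assumption.

```python
import math

def boyd_bsa(weight_kg, height_cm):
    """Body surface area (m^2) by Boyd's formula.

    Commonly cited form (assumed here, not quoted from the chapter):
    BSA = 0.0003207 * H^0.3 * W^(0.7285 - 0.0188 * log10(W)),
    with W in grams and H in centimeters.
    """
    weight_g = weight_kg * 1000.0
    exponent = 0.7285 - 0.0188 * math.log10(weight_g)
    return 0.0003207 * (height_cm ** 0.3) * (weight_g ** exponent)
```

For a 70-kg, 180-cm adult this gives a value near 1.9 square meters, in line with other BSA formulas such as DuBois's.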
Increasing evidence of inappropriate antimicrobial therapy indicates
a need for assistance in the expanding area of infectious disease therapy
selection (Neu and Howrey, 1975). There is a recognized need for contin-
uing education as well as for computational assistance with dosage adjust-
ments in renal failure. This is not surprising when one recognizes all of
the factors that must be considered in a therapy decision. One response to
the problem of antimicrobial misuse is to increase the availability of con-
sultations with infectious diseases experts. A consultation not only provides
assistance in determining the appropriate therapy for the patient under
consideration but also is an educational experience for the physician requesting it. Computer-based consultation programs such as MYCIN can
provide medical professionals with clinical advice and educational infor-
mation when human consultants are not available.
20
Customized Explanations
Using Causal Knowledge

Jerold W. Wallis and Edward H. Shortliffe

Developers of expert systems have increasingly recognized the importance of explanation capabilities to the acceptance of their programs; such capabilities are also critical in medical consultation system development (Gorry, 1973; Shortliffe, 1980). Good explanations serve four functions in a consultation system: (1) they provide a method for examining the program's reasoning if errors arise when the system is being built; (2) they
assure users that the reasoning is logical, thereby increasing user accep-
tance of the system; (3) they may persuade users that unexpected advice
is appropriate; and (4) they can educate users in areas where users' knowledge may be weak. These diverse roles impose several requirements on the
system. For example, the explanations must adequately represent the rea-
soning processes of the program, and they should allow the user to ex-
amine the reasoning history or underlying knowledge at various levels of
detail. In addition, although the program's approach to a problem need not be identical to an expert's approach, the program's overall strategy and
reasoning steps must be understandable and seem logical, regardless of
the user's level of expertise. This means that the system must have the
capability to tailor its explanations to the varying needs and characteristics
of its users.
In this chapter we describe recent experiments in the design and im-
plementation of a prototype explanation program. Our past work in ex-
planation for consultation systems, described in Chapter 18, dealt primar-
ily with the ability to cite the rules involved in a particular decision.
Although MYCIN's responses provide an accurate description of a portion

This chapter is based on a paper originally appearing in Methods of Information in Medicine 21:127-136 (July 1982). Copyright by Springer-Verlag, Inc. All rights reserved. Used with permission.


** DO YOU TREAT PSEUDOMONAS-AERUGINOSA BACTEREMIA WITH CARBENICILLIN?
Yes, for treatment of PSEUDOMONAS-AERUGINOSA in BACTEREMIA, the drugs
of choice (ranked by relative efficacy, assuming sensitivity) are:
1st choice:
    CARBENICILLIN-AND-TOBRAMYCIN
    CARBENICILLIN-AND-GENTAMICIN
    AMIKACIN-AND-CARBENICILLIN
2nd choice:
    TOBRAMYCIN
    GENTAMICIN
    AMIKACIN

FIGURE 20-1 An example of an interaction with MYCIN's explanation program. Note that the rule for selecting a drug to cover Pseudomonas aeruginosa is adequate for allowing MYCIN to reach the correct conclusion but that the underlying reason for combining two drugs is unclear.

of its reasoning, to understand the overall reasoning scheme a user needs


to request a display of all rules that are used. Additionally, rules such as
that shown in the interaction of Figure 20-1 are designed largely for effi-
ciency and therefore frequently omit underlying causal mechanisms that
are known to experts but that a novice may need in order to understand
a decision. The rule guiding the choice of carbenicillin with an aminogly-
coside, for example, does not mention the synergism of the two drugs when
combined in the treatment of serious Pseudomonas aeruginosa infections.
Finally, while MYCIN does have a limited sense of discourse (viz., an ability
to modify responses based on the topic under discussion), its explanations
are not customized to the questioner's objectives or characteristics.
MYCIN's explanation capabilities were expanded by Clancey in his
work on the GUIDONtutorial system (Chapter 26). In order to use
MYCIN's knowledge base and patient cases for tutorial purposes, Clancey
found it necessary to incorporate knowledge about teaching. This knowl-
edge, expressed as tutoring rules, and a four-tiered measure of the baseline
knowledge of the student (beginner, advanced, practitioner, or expert),
enhanced the ability of a student to learn efficiently from MYCINsknowl-
edge base. Clancey also noted problems arising from the frequent lack of
underlying "support" knowledge, which is needed to explain the relevance
and utility of a domain rule (Chapter 29).
More recently, Swartout has developed a system that generates expla-
nations from a record of the development decisions made during the writ-
ing of a consultation program to advise on digitalis dosing (Swartout,
1981). The domain expert provides information to a "writer" subprogram,
which in turn constructs the advising system. The traces left by the writer,
a set of domain principles, and a domain model are utilized to produce
explanations. Thus both the knowledge acquisition process and automatic

programming techniques are intrinsic to the explanations generated by


Swartout's system. Responses to questions are customized for different
kinds of users by keeping track of what class is likely to be interested in a
given piece of code.
Whereas MYCIN generates explanations that are usually based on a single rule,1 Weiner has described a system named BLAH (Weiner, 1980) that can summarize an entire reasoning chain in a single explanatory statement. The approach developed for BLAH was based on a series of psycholinguistic studies (Linde, 1978; Linde and Goguen, 1978; Weiner, 1979) that analyzed the ways in which human beings explain decisions, choices, and plans to one another. For example, BLAH structures an explanation so that the differences among alternatives are given before the similarities (a practice that was noted during the analysis of human explanations).
The tasks of interpreting questions and generating explanations are
confounded by the problems inherent in natural language understanding
and text generation. A consultation program must be able to distinguish
general questions from case-specific ones and questions relating to specific
reasoning steps from those involving the overall reasoning strategy. As
previously mentioned, it is also important to tailor the explanation to the
user, giving appropriate supporting causal and empirical relationships. It
is to this last task that our recent research has been aimed. We have deferred confronting problems of natural language understanding for the
present, concentrating instead on representation and control mechanisms
that permit the generation of explanations customized to the knowledge
and experience of either physician or student users.

20.1 Design Considerations: The User Model

For a system to produce customized explanations, it must be able to model the user's knowledge and motivation for using the system. At the simplest level, such a model can be represented by a single measure of what the user knows in this domain and how much he or she wants to know (i.e., to what level of detail the user wishes to have things explained). One approach is to record a single rating of a user's expertise, similar to the four categories mentioned above for GUIDON. The model could be extended to permit the program to distinguish subareas of a user's expertise in different portions of the knowledge base. For example, the measures could be dynamically updated as the program responds to questions and explains segments
1Although MYCIN's WHY command has a limited ability to integrate several rules into a single explanation (Shortliffe et al., 1975), the user wishing a high-level summary must specifically augment the WHY with a number that indicates the level of detail desired. We have found that the feature is therefore seldom used. It would, of course, be preferable if the system "knew" on its own when such a summary is appropriate.

of its knowledge. If the user demonstrates familiarity with one portion of


the knowledge base, then he or she probably also knows about related
portions (e.g., if physicians are familiar with the detailed biochemistry of
one part of the endocrine system, they are likely to know the biochemistry
of other parts of the endocrine system as well). This information can be
represented in a manner similar to Goldstein's rule pointers, which link
analogous rules, rule specializations, and rule refinements (Goldstein,
1978). In addition, the model should ideally incorporate a sense of dia-
logue to facilitate user interactions. Finally, it must be self-correcting (e.g.,
if the user unexpectedly requests information on a topic the program had
assumed he or she knew, the program should correct its model prior to
giving the explanation). In our recent experiments we have concentrated
on the ability to give an explanation appropriate to the user's level of knowledge and have deemphasized dialogue and model correction.

20.2 Knowledge Representation

20.2.1 Form of the Conceptual Network

We have found it useful to describe the knowledge representation for our


prototype system in terms of a semantic network (Figure 20-2).2 It is similar
to other network representations used in the development of expert sys-
tems (Duda et al., 1978b; Weiss et al., 1978) and has also been influenced
by Rieger's work on the representation and use of causal relationships
(Rieger, 1976). A network provides a particularly rich structure for enter-
ing detailed relationships and descriptors in the domain model. Object nodes
are arranged hierarchically, with links to the possible attributes (parameters)
associated with each object. The parameter nodes, in turn, are linked to the
possible value nodes, and rules are themselves represented as nodes with
links that connect them to value nodes. These relationships are summa-
rized in Table 20-1.
The certainty factor (CF) associated with each value and rule node (Table
20-1) refers to the belief model developed for the MYCIN system (Chapter
11). The property askfirst/last controls whether or not the value of a pa-
rameter is to be requested from the user before an attempt is made to
compute it using inference rules from the knowledge base (see LABDATA,
Chapter 5). The text justification of a rule is provided when the system
builder has decided not to break the reasoning step into further compo-

2The descriptive power of a semantic network provides clarity when describing this work. However, other representation techniques used in artificial intelligence research could also have captured the attributes of our prototype system.

[Figure: a network of object, parameter, value, and rule nodes connected by part-of and precondition-of links; the diagram itself is not reproducible in this transcription.]

FIGURE 20-2 Sample section of network showing object, parameter, value, and rule nodes. Dashed lines indicate the following rule:

IF: PARAMETER-1 of OBJECT-1 is VALUE-1, and
    PARAMETER-2 of OBJECT-1 is VALUE-4
THEN: Conclude that PARAMETER-4 of OBJECT-3 is VALUE-7

nent parts but wishes to provide a brief summary of the knowledge un-
derlying that rule. Complexity, importance, and rule type are described in more
detail below.

20.2.2 Rules and Their Use

In the network (Figure 20-2) rules connect value nodes with other value
nodes. This contrasts with the MYCIN system in which rules are function-
ally associated with an object-parameter pair and succeed or fail only after

TABLE 20-1

Type of Node     Static Information                   Dynamic Information
                 (associated with node)               (consultation-specific)

object node      part-of link (hierarchic);           (none)
                 parameter list

parameter node   object link; value-node list;        (none)
                 default value; text definition

value node       parameter-node link;                 contexts for which this value
                 precondition-rule list;              is true; certainty factor;
                 conclusion-rule list;                explanation data; ask state
                 importance; complexity;
                 ask first/last

rule node        precondition list (boolean);         explanation data
                 conclusion; certainty factor;
                 rule type; complexity;
                 text justification

completion of an exhaustive search for all possible values associated with


that pair. To make this clear, consider a rule of the following form:

IF: DISEASE-STATE of the LIVER is ALCOHOLIC-CIRRHOSIS
THEN: It is likely (.7) that the SIZE of ESOPHAGEAL-VEINS is INCREASED

When evaluating the premise of this rule to decide whether it applies in a


specific case, a MYCIN-like system would attempt to determine the cer-
tainty of all possible values of the DISEASE-STATEof the LIVER, pro-
ducing a list of values and their associated certainty factors. Our experi-
mental system, on the other hand, would only investigate rules that could
contribute information specifically about ALCOHOLIC-CIRRHOSIS. In
either case, however, rules are joined by backward chaining.
Because our system reasons backwards from single values rather than
from parameters, it saves time in reasoning in most cases. However, there
are occasions when this approach is not sufficient. For example, if a value
is concluded with absolute certainty (CF = 1) for a parameter with a mu-
tually exclusive set of values, this necessarily forces the other values to be
false (CF = -1). Lines of reasoning that result in conclusions of absolute
certainty (i.e., reasoning chains in which all rules make conclusions with

CF = 1) have been termed unity paths (see Chapter 3). In cases of mutually
exclusive values of parameters, complete investigation of one value re-
quires consideration of any other value that could be reached by a unity
path. Thus the representation must allow quick access to such paths.
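The unity-path behavior described above can be sketched as follows. This is a minimal illustration with hypothetical value names and our own data structures; the real system stores these relationships as links in the network.

```python
def conclude(cfs, parameter_values, value, cf):
    """Record a conclusion about one value of a parameter; when a value
    in a mutually exclusive set is established with CF = 1, the remaining
    values are immediately forced to CF = -1 (a unity path)."""
    cfs[value] = cf
    if cf == 1:
        for other in parameter_values:
            if other != value:
                cfs[other] = -1  # ruled out without further backward chaining
    return cfs

# Hypothetical mutually exclusive values for DISEASE-STATE of the LIVER
liver_states = ["ALCOHOLIC-CIRRHOSIS", "HEPATITIS", "NORMAL"]
cfs = conclude({}, liver_states, "ALCOHOLIC-CIRRHOSIS", 1)
```

The point of the sketch is the side effect: once one value reaches certainty, no rules need be tried for its siblings.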
When reasoning by elimination, similar problems arise if a system fo-
cuses on a single value. One needs the ability to conclude a value by ruling
out all other possible values for that parameter; this entails a slight mod-
ification of the organizational and reasoning scheme. One strategy is to use
this elimination method in cases of mutually exclusive options only after
the normal backward-chaining process fails (provided that the possibilities
represented in the knowledge base are known to span all potential values).

20.2.3 Complexity and Importance

The design considerations for adequate explanations require additions to


the representation scheme described above. To provide customized expla-
nations, appropriate for different levels of expertise, we have found it
useful to associate a measure of complexity, both with the inference rules
and with the concepts about which they are concluding. Because some
concepts are key ideas in a reasoning chain and should be mentioned
regardless of their complexity, a measure of importance associated with con-
cepts is useful as well. Currently, both measures are specified at the time
knowledge is added to the system, but a dynamic modification of these
initial values would improve the flexibility of the approach.
Even though complexity and importance are related, one cannot nec-
essarily be predicted from the other. For example, biochemical details of
the endocrine system are complex but are not important to an understanding
of endocrine abnormalities, yet the same complexity of biochemical detail is
important for understanding the glycogen storage diseases. A measure of a
fact's importance was also used by Carbonell (1970a) in the form of "re-
levancy tags," supplemented by "distance" in a semantic network, but he
did not distinguish between the two concepts discussed here.

20.3 Explanation Capabilities

20.3.1 Tailored Explanations

The measurements of complexity and importance described above facili-


tate the generation of tailored explanations. Consider a linear causal chain
representing a simplified causal mechanism for the existence of kidney

stones (Figure 20-3). A sample interaction based on this reasoning chain might be as follows:3

Sample Interaction 1:
**EXPERTISE 3 [the user indicates his or her level of expertise on a scale of 1 to 10 where 10 = expert]
**HOW DOES HYPERPARATHYROIDISM CAUSE RENAL STONES?
Renal stones are caused by hypercalcemia
Hypercalcemia is caused by hyperparathyroidism
**MORE DETAIL
Renal stones are caused by increased urinary calcium
Increased urinary calcium is caused by hypercalcemia
Hypercalcemia is caused by increased bone breakdown
Increased bone breakdown is caused by hyperparathyroidism
**EXPERTISE 7 [the user indicates that he or she is more expert here than above]
**HOW DOES HYPERPARATHYROIDISM LEAD TO INCREASED BONE BREAKDOWN?
Bone breakdown is caused by increased osteoclast activity
Increased osteoclast activity is caused by increased cyclic-AMP
Increased cyclic-AMP is caused by hyperparathyroidism

This sample dialogue demonstrates: (1) the user's ability to specify his or her level of expertise, (2) the program's ability to employ the user's expertise to adjust the amount of detail it offers, and (3) the user's option to request more detailed information about the topic under discussion.
Two user-specific variables are used to guide the generation of explanations:4

EXPERTISE: A number representing the user's current level of knowledge. As is discussed below, reasoning chains that involve simpler concepts as intermediates are collapsed to avoid the display of information that might be obvious to the user.

DETAIL: A number representing the level of detail desired by the user when receiving explanations (by default a fixed increment added to the EXPERTISE value). A series of steps that is excessively detailed can be collapsed into a single step to avoid flooding the user with information. However, if the user wants more detailed information, he or she can request it.

As shown in Figure 20-3, a measure of complexity is associated with


each value node. Whenever an explanation is produced, the concepts in

3Our program functions as shown except that the user input requires a constrained format rather than free text. We have simplified that interaction here for illustrative purposes. The program actually has no English interface.
4Another variable we have discussed but not implemented is a focusing parameter that would put a ceiling on the number of steps in the chain to trace when formulating an explanation. A highly focused explanation would result in a discussion of only a small part of the reasoning tree. In such cases, it would be appropriate to increase the detail level as well.

VALUES (Comp = complexity, Imp = importance)    RULES (CF, rule type)

Hyperparathyroidism (Comp 3, Imp 8)
    |  (CF .9, cause-effect)
    v
Elevated cyclic-AMP (Comp 9, Imp 1)
    |  (CF 1, cause-effect)
    v
Increased osteoclast activity (Comp 8, Imp 1)
    |  (CF .9, cause-effect)
    v
Bone breakdown (Comp 6, Imp 3)
    |  (CF .6, cause-effect)
    v
Hypercalcemia (Comp 3, Imp 8)
    |  (CF .9, cause-effect)
    v
Increased urinary calcium (Comp 7, Imp 4)
    |  (CF .5, cause-effect)
    v
Calcium-based renal stones (Comp 2, Imp 3)
    |  (CF 1, definitional)
    v
Renal stones (Comp 1, Imp 6)

FIGURE 20-3 An example of a small section of a causal knowledge base, with measures of the complexity (Comp) and importance (Imp) given for the value nodes (concepts). A highly simplified causal chain is provided for illustrative purposes only. For example, the effect of parathormone on the kidney (promoting retention of calcium) is not mentioned, but would have an opposite causal impact on urinary calcium. This reasoning chain is linear (each value has only one cause) and contains only cause-effect and definitional rules. Sample Interactions 1 and 2 (see text) are based on this reasoning chain.

Reasoning sequence:

    A --r1--> B --r2--> C --r3--> D --r4--> E --r5--> F

[Plot: each concept A through F is positioned by its complexity on a scale of 1 to 10; dashed horizontal lines mark the user's expertise and detail levels. The graphic is garbled in this transcription.]

FIGURE 20-4 Diagram showing the determination of which concepts (parameter values) to explain to a user with a given expertise and detail setting. The letters A through F represent the concepts (values of parameters) that are linked by the inference rules r1 through r5. Only those concepts whose complexity falls in the range between the dashed lines (including the lines themselves) will be mentioned in an explanation dialogue. Explanatory rules to bridge the intermediate concepts lying outside this range are generated by the system.

the reasoning chain are selected for exposition on the basis of their complexity; those concepts with complexity lying between the user's expertise level and the calculated detail level are used.5 Consider, for example, the
five-rule reasoning chain linking six concepts shown in Figure 20-4. When
intermediate concepts lie outside the desired range (concepts B and E in
this case), broader inference statements are generated to bridge the nodes
that are appropriate for the discussion (e.g., the statement that A leads to
C would be generated in Figure 20-4). Terminal concepts in a chain are
always mentioned, even if their complexity lies outside the desired range
(as is true for concept F in the example). This approach preserves the

5The default value for DETAIL in our system is the EXPERTISE value incremented by 2. When the user requests more detail, the detail measure is incremented by 2 once again. Thus, for the three interchanges in Sample Interaction 1, the expertise-detail ranges are 3-5, 3-7, and 7-9 respectively. Sample Interaction 2 demonstrates how this scheme is modified by the importance measure for a concept.

Reasoning sequence:

    A --r1--> B --r2--> C --r3--> D --r4--> E --r5--> F

[Plot: each rule r1 through r5 is positioned by its complexity on a scale of 1 to 10; dashed horizontal lines mark the user's expertise and detail levels. The graphic is garbled in this transcription.]

FIGURE 20-5 Diagram showing the determination of which rules to explain further for a user with a given expertise and detail setting. When a rule is mentioned because of the associated concepts, but the rule itself is too complex, further text associated with the rule is displayed.

logical flow of the explanation without introducing concepts of inappro-


priate complexity.
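The selection scheme of Figure 20-4 can be sketched directly: a concept is mentioned if its complexity lies in the inclusive [expertise, detail] range or if it is a terminal concept of the chain, and bridging statements then link adjacent mentioned concepts. The function and the complexity values below are ours, invented for illustration, since the plot's actual numbers are not legible here.

```python
def concepts_to_mention(chain, complexity, expertise, detail):
    """chain: ordered concept names; complexity: name -> measure.
    Returns the concepts to cite in the explanation; intermediate
    concepts outside the inclusive [expertise, detail] range are
    skipped, so the generated text bridges over them."""
    mentioned = []
    for i, concept in enumerate(chain):
        terminal = i == 0 or i == len(chain) - 1
        if terminal or expertise <= complexity[concept] <= detail:
            mentioned.append(concept)
    return mentioned

# Illustrative values: B and E fall outside the 4-6 range and are bridged;
# F is kept despite its complexity because it is a terminal concept.
chain = ["A", "B", "C", "D", "E", "F"]
complexity = {"A": 4, "B": 2, "C": 5, "D": 6, "E": 9, "F": 8}
```

With expertise 4 and detail 6 this selects A, C, D, and F, so the explanation would state that A leads to C, C leads to D, and D leads to F.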
We have also found it useful to associate a complexity measure with
each inference rule to handle circumstances in which simple concepts (low
complexity) are linked by a complicated rule (high complexity). 6 This sit-
uation typically occurs when a detailed mechanism, one that explains the
association between the premise and conclusion of a rule, consists of several
intermediate concepts that the system builder has chosen not to encode
explicitly.7 When building a knowledge base, it is always necessary to limit
the detail at which mechanisms are outlined, either because the precise
mechanisms are unknown or because minute details of mechanisms are
not particularly useful for problem solving or explanation. Thus it is useful
to add to the knowledge base a brief text justification (Table 20-1) of the
mechanism underlying each rule.
Consider, for example, the case in Figure 20-5, which corresponds to

6The opposite situation does not occur; rules of low complexity do not link concepts of higher
complexity.
7Patil has dealt with this problem by explicitly representing causal relationships concerning acid-base disorders at a variety of different levels of detail (Patil et al., 1981).

the reasoning chain represented in Figure 20-4. Although rule r3 links two
concepts (C and D) that are within the complexity-detail range for the user,
the relationship mentioned in rule r3 is itself considered to be outside this
range. When generating the explanation for this reasoning chain, the pro-
gram mentions concepts C and D, and therefore mentions rule r3 despite
its complexity measure. Since the rule is considered too complex for the
user, however, the additional explanatory text associated with the rule is
needed in this case. If the rule had fallen within the complexity-detail range
of the user, on the other hand, the text justification for the rule would not
have been required.8
Further modulation of rule and concept selection is accomplished us-
ing the importance measure associated with parameters. A high impor-
tance forces the inclusion of a reasoning step in an explanation, thereby
overriding the complexity considerations that were shown in Figures 20-4
and 20-5. Whenthe importance level of a concept is two or more points
above the expertise of the user, the item is included in the explanation.
Consider, for example, the following dialogue, which demonstrates the way
in which the importance measure is used:

Sample Interaction 2:
**EXPERTISE 6
**HOW DOES HYPERPARATHYROIDISM CAUSE RENAL STONES?
Renal stones are caused by increased urinary calcium
Increased urinary calcium is caused by hypercalcemia
Hypercalcemia is caused by increased bone breakdown
Bone breakdown is caused by increased osteoclast activity
Increased osteoclast activity is caused by hyperparathyroidism

Note that this example shows a response to the same question asked in
Sample Interaction 1. This time, however, the expertise level is 6 rather
than 3. Hypercalcemia is therefore mentioned only because its importance
level of 8 (see Figure 20-3) is 2 points higher than the expertise of the
user; the complexity level of hypercalcemia does not fall within the exper-
tise-detail range of the user and thus would not have been included in the
explanation if it were not for its high importance. The other items men-
tioned are either terminal concepts in the chain (renal stones and hyper-
parathyroidism) or have a complexity measure lying within the user's expertise-detail range of 6-8.
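Combining the complexity range, the terminal-concept rule, and the importance override yields an inclusion test like the following sketch, which is our own formulation of the rules stated in the text. With Figure 20-3's values for hypercalcemia (complexity 3, importance 8) and an expertise of 6, the concept is included even though its complexity falls below the 6-8 range.

```python
def include_concept(complexity, importance, expertise, detail=None, terminal=False):
    """Mention a concept if it is a terminal concept of the chain, if its
    complexity lies in the inclusive [expertise, detail] range, or if its
    importance is at least two points above the user's expertise."""
    if detail is None:
        detail = expertise + 2  # the default DETAIL setting (footnote 5)
    return (terminal
            or expertise <= complexity <= detail
            or importance >= expertise + 2)

# Hypercalcemia (Comp 3, Imp 8) at expertise 6: included by importance alone.
# Elevated cyclic-AMP (Comp 9, Imp 1) at expertise 6: excluded, as in
# Sample Interaction 2.
```

Osteoclast activity (complexity 8, importance 1) is included by the complexity test alone, matching the dialogue above.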
Many reasoning chains are not as simple as those shown in Figures
20-3, 20-4, and 20-5. When explaining a branched reasoning chain, for example, the explanation system can set aside the branches of the chain
and mention them only when it is appropriate to the level of detail required
by the user. This feature provides users with an overview of the reasoning
process to help them decide whether it is necessary to examine the more
detailed steps. This capability is illustrated in the following dialogue, which

8An example of this approach is included in Sample Interaction 4 in Section 20.3.2.



involves a patient with hypercalcemia and a possible malignancy who has


undergone prolonged bed rest:

Sample Interaction 3:
**WHY DOES THE PATIENT HAVE INCREASED SERUM CALCIUM?
Increased serum calcium is suggested by immobilization and malignancy
**MORE DETAIL
Increased serum calcium is implied by increased bone breakdown
Increased bone breakdown is suggested by 2 paths of reasoning:
    Increased bone breakdown is implied by increased osteoclast activity
    Increased osteoclast activity is implied by prolonged immobilization
    Increased bone breakdown is also implied by malignant bone invasion

20.3.2 Types of Rules

Our refinement of the rule types presented by Clancey (Chapter 29) yields
five types of rules9 that are relevant to explanation strategies:

definitional: the conclusion is a restatement of the precondition in different terms

cause-effect: the conclusion follows from the precondition by some mechanism, the details of which may not be known

associational: the conclusion and the precondition are related, but the causal direction (if any) is not known

effect-cause: the presence of certain effects is used to conclude about a cause with some degree of certainty

self-referencing: the current state of knowledge about a value is used to update that value further10

The importance of distinguishing between cause-effect and effect-cause rules is shown in Figure 20-6, which considers a simplified network concerning possible fetal Rh incompatibility in a pregnant patient. Reasoning backwards from the goal question "Is there a fetal-problem?" one traverses three steps that lead to the question of whether the parents are Rh incompatible; these three steps use cause-effect and definitional links only. However, in order to use the laboratory data concerning the amniotic fluid to form a conclusion about the presence of fetal hemolysis, effect-cause links must be used.
The sample interactions in Section 20.3.1 employed only cause-effect

9Rules considered here deal with domain knowledge, to be distinguished from strategic or meta-level rules (Davis and Buchanan, 1977).
10In many cases self-referencing rules can be replaced by strategy rules (e.g., "If you have tried to conclude a value for this parameter and have failed to do so, then use the default value for the parameter").

[Figure 20-6 depicts a causal network: RH INCOMPATIBILITY leads (cause-effect) to FETAL HEMOLYSIS; FETAL HEMOLYSIS and other causes lead (cause-effect) to INCREASED BILIRUBIN IN AMNIOTIC FLUID, which points back to FETAL HEMOLYSIS by an effect-cause link; FETAL HEMOLYSIS also leads (cause-effect) to IMPAIRED FETAL OXYGEN TRANSPORT, which is linked definitionally to FETAL PROBLEM.]

FIGURE 20-6 A simple causal network showing the difference in reasoning between effect-cause and cause-effect rules in the medical setting. The number beside a link indicates the certainty factor (CF) associated with the rule. Note that an actual rule network for this domain would be more complex, with representation of intermediate steps, associated medical concepts, default values, and definitions.

and definitional rules. An explanation for an effect-cause rule, on the other hand, requires a discussion of the inverse cause-effect rule (or chain of rules) and a brief mention of other possibilities to explain the certainty measure associated with the rule. As discussed above, the expertise of a user may also require that the program display a text justification for the causal relationships cited in a cause-effect rule. Consider, for example, an interaction in which an explanation of the effect-cause rule in Figure 20-6 is produced:

Sample Interaction 4:
**WHY DO INCREASED BILIRUBIN COMPOUNDS IN THE AMNIOTIC FLUID IMPLY FETAL HEMOLYSIS?
Fetal hemolysis leads to bilirubin compounds in the fetal circulation;
equilibration then takes place between the fetal plasma and the amniotic
fluid, leading to increased bilirubin compounds in the amniotic fluid
While the relationship in this direction is nearly certain, the inverse
relationship is less certain because of the following other possible
causes of increased bilirubin compounds in the amniotic fluid:
  Maternal blood in the amniotic fluid from trauma
  Maternal blood in the amniotic fluid from prior amniocentesis

The response regarding the equilibration of fetal plasma and amniotic fluid is the stored text justification of the cause-effect rule that leads from "fetal hemolysis" to "increased bilirubin in amniotic fluid." The individual steps could themselves have been represented in causal rules if the system builder had preferred to enter rule-based knowledge about the nature of hemolysis and bilirubin release into the circulation. The second component of the response, on the other hand, is generated from the other cause-effect rules that can lead to "increased bilirubin in amniotic fluid."
The other types of rules require minor modifications of the explanation strategy. Definitional rules are usually omitted for the expert user on the basis of their low complexity and importance values. An explanation of an associational rule indicates the lack of known causal information and describes the degree of association. Self-referencing rules frequently have underlying reasons that are not adequately represented by a causal network; separate support knowledge associated with the rule (Chapter 29), similar to the text justification shown in Sample Interaction 4, may need to be displayed for the user when explaining them.

20.4 Causal Links and Statistical Reasoning

We have focused this discussion on the utility of representing causal knowledge in an expert system. In addition to facilitating the generation of tailored explanations, the use of causal relationships strengthens the reasoning power of a consultation program and can facilitate the acquisition of new knowledge from experts. However, an attempt to reason from causal information faces many of the same problems that have been encountered by those who have used statistical approaches for modeling diagnostic reasoning. It is possible to generate an effect-cause rule, and to suggest its corresponding probability or certainty, only if the information given in the corresponding cause-effect rule is accompanied by additional statistical information. For example, Bayes Theorem may be used to determine the probability of the ith of k possible "causes" (e.g., diseases), given a specific observation ("effect"):

                            P(effect|causei) P(causei)
P(causei|effect) = ----------------------------------------
                    Σ(j=1..k) P(effect|causej) P(causej)

This computation of the probability that the ith possible cause is present given that the specific effect is observed, P(causei|effect), requires knowledge of the a priori frequencies P(causei) for each of the possible causes (cause1, cause2, ..., causek) of the effect. These data are not usually available for medical problems and are dependent on locale and prescreening of the patient population (Shortliffe et al., 1979; Szolovits and Pauker, 1978). The formula also requires the value of P(effect|causei) for all cause-effect rules leading to the effect, not just the one for the rule leading from causei to the effect. In Figure 20-6, for example, the effect-cause rule leading from "increased bilirubin in amniotic fluid" to "fetal hemolysis" could be derived from the cause-effect rule leading in the opposite direction only if all additional cause-effect rules leading to "increased bilirubin in amniotic fluid" were known (the "other causes" indicated in the figure) and if the relative frequencies of the various possible causes of "increased bilirubin in amniotic fluid" were also available. A more realistic approach is to obtain the inference weighting for the effect-cause rule directly from the expert who is building the knowledge base. Although such subjective estimates are fraught with danger in a purely Bayesian model (Leaper et al., 1972), they appear to be adequate (see Chapter 31) when the numerical weights are supported by a rich semantic structure (Shortliffe et al., 1979).
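The dependence of the effect-cause weighting on the priors and on all competing cause-effect rules can be seen with a few lines of arithmetic. The sketch below applies the formula above to the bilirubin example of Figure 20-6; all of the probabilities are invented for illustration.

```python
# Hypothetical numbers illustrating Bayes Theorem for inverting
# cause-effect rules into an effect-cause weighting.
priors = {            # P(cause_i): a priori frequency of each cause
    "fetal hemolysis": 0.02,
    "maternal blood from trauma": 0.01,
    "maternal blood from prior amniocentesis": 0.01,
}
likelihoods = {       # P(effect | cause_i): the cause-effect rule weights
    "fetal hemolysis": 0.95,
    "maternal blood from trauma": 0.60,
    "maternal blood from prior amniocentesis": 0.70,
}

def posterior(cause: str) -> float:
    """P(cause | effect), computed over the enumerated causes of the effect."""
    denom = sum(likelihoods[c] * priors[c] for c in priors)
    return likelihoods[cause] * priors[cause] / denom

p = posterior("fetal hemolysis")
print(round(p, 4))
```

Note that even a nearly certain cause-effect link (0.95 here) yields an effect-cause weighting of only about 0.59, precisely because the priors and the competing causes enter the denominator; change either and the inverse weighting changes with them.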
Similarly, problems are encountered in attempting to produce the inverse of rules that have Boolean preconditions. For example, consider the following rule:

IF: (A and (B or C))
THEN: Conclude D

Here D is known to imply A (with a certainty dependent on the other possible causes of D and their relative frequencies) only if B or C is present. While the inverse rule could be generated using Bayes Theorem given the a priori probabilities, one would not know the certainty to ascribe to cases where both B and C are present. This problem of conditional independence tends to force assumptions or simplifications when applying Bayes Theorem. Dependency information can be obtained from data banks or from an expert, but cannot be derived directly from the causal network.
It is instructive to note how the Present Illness Program (PIP) and CADUCEUS, two recent medical reasoning programs, deal with the task of representing both cause-effect and effect-cause information. CADUCEUS (Pople, 1982) has two numbers for each manifestation of disease, an "evoking strength" (the likelihood that an observed manifestation is caused by the disease) and a "frequency" (the likelihood that a patient with a disease will display a given manifestation). These are analogous to the inference weightings on effect-cause rules and cause-effect rules, respectively. However, the first version of the CADUCEUS program (INTERNIST-1) did not allow for combinations of manifestations that give higher (or lower) weighting than the sum of the separate manifestations,11 nor did it provide a way to explain the inference paths involved (Miller et al., 1982).
PIP (Pauker et al., 1976; Szolovits and Pauker, 1978) handles the implication of diseases by manifestations by using "triggers" for particular disease frames. No weighting is assigned at the time of frame invocation; instead PIP uses a scoring criterion that does not distinguish between cause-effect and effect-cause relationships in assigning a numerical value for a disease frame. While the information needed to explain the program's reasoning is present, the underlying causal information is not.12
In our experimental system, the inclusion of both cause-effect rules and effect-cause rules with explicit certainties, along with the ability to group manifestations into rules, allows flexibility in constructing the network. Although causal information taken alone is insufficient for the construction of a comprehensive knowledge base, the causal knowledge can be used to propose effect-cause relationships for modification by the system-builder. It can similarly be used to help generate explanations for such relationships when effect-cause rules are entered.
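The last point, using causal knowledge to propose effect-cause relationships for the system builder to review, amounts to a simple inversion pass over the cause-effect rules. The sketch below is our own illustration of that pass, not the system's actual representation; the proposed certainty is deliberately left blank for the expert to fill in, and the competing causes are listed to justify the hedge.

```python
# Hypothetical sketch: invert each cause-effect rule into a candidate
# effect-cause rule, listing competing causes so the expert can judge
# what certainty (if any) to assign to the inverse direction.
cause_effect_rules = [
    ("fetal hemolysis", "increased bilirubin in amniotic fluid"),
    ("maternal blood from trauma", "increased bilirubin in amniotic fluid"),
    ("rh incompatibility", "fetal hemolysis"),
]

def propose_effect_cause(rules):
    proposals = []
    for cause, effect in rules:
        others = [c for c, e in rules if e == effect and c != cause]
        proposals.append({
            "if": effect,
            "then": cause,
            "certainty": None,           # to be supplied by the expert
            "competing_causes": others,  # shown to explain why CF < 1
        })
    return proposals

for p in propose_effect_cause(cause_effect_rules):
    print(p["if"], "->", p["then"], "| competing:", p["competing_causes"])
```

As the chapter argues, the network alone cannot supply the missing certainty; the value of the pass is only that it surfaces each candidate inverse rule together with its competitors for the expert's judgment.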

20.5 Conclusion

We have argued that a need exists for better explanations in medical consultation systems and that this need can be partially met by incorporating a user model and an augmented causal representation of the domain knowledge. The causal network can function as an integral part of the reasoning system and may be used to guide the generation of tailored explanations and the acquisition of new domain knowledge. Causal information is useful but not sufficient for problem solving in most medical domains. However, when it is linked with information regarding the complexity and importance of the concepts and causal links, a powerful tool for explanation emerges.
Our prototype system has been a useful vehicle for studying the techniques we have discussed. Topics for future research include: (1) the development of methods for dynamically determining complexity and importance (based on the semantics of the network rather than on numbers provided by the system builder); (2) the discovery of improved techniques for using the context of a dialogue to guide the formation of an explanation; (3) the use of linguistic or psychological methods for determining the reason a user has asked a question so that a customized response can be generated; and (4) the development of techniques for managing the various levels of complexity and detail inherent in the mechanistic relationships underlying physiological processes. The recent work of Patil, Szolovits, and Schwartz (1981), who have separated such relationships into multiple levels of detail, has provided a promising approach to the solution of the last of these problems.

11This problem is one of the reasons for the move from INTERNIST-1 to the new approaches used in CADUCEUS (Pople, 1982).
12Recently the ABEL program, a descendant of PIP, has focused on detailed modeling of causal relationships (Patil et al., 1981).
PART SEVEN

Using Other
Representations
21
Other Representation
Frameworks

Representing knowledge in an AI program means choosing a set of conventions for describing objects, relations, and processes in the world. One first chooses a conceptual framework for thinking about the world--symbolically or numerically, statically or dynamically, centered around objects or around processes, and so forth. Then one needs to choose conventions within a given computer language for implementing the concepts. The former is difficult and important; the latter is both less difficult and less important because good programmers can find ways of working with almost any concept within almost any programming language.
In one respect finding a representation for knowledge is like choosing a set of data structures for a program to work with. Tables of data, for example, are often conveniently represented as arrays. But manipulating knowledge structures imposes additional requirements. Because some of an expert's knowledge is inferential, conventions are needed for a program to interpret the structures. And, as we have emphasized, an expert (or knowledge engineer) needs to be able to edit knowledge structures quickly and easily in order to refine the program's knowledge base iteratively. Some programming conventions facilitate editing and interpreting knowledge; others throw up road blocks.
The question of how to represent knowledge for intelligent use by programs is one of two major questions motivating research in AI. (The other major theme over the last 25 years is how to use the knowledge for intelligent problem solving.) Although we were not developing new representations in MYCIN, we were experimenting with the power of one representation, modified production rules, for reasoning in a detailed and ill-structured domain, medicine. Chapters 1 and 3 have described much of the historical context of our work with rules. As should be obvious from Chapters 3 through 6, we added many embellishments to the basic production rule representation in order to cope with the demands of the problem and of physicians. We stumbled over many items of medical knowledge that were difficult to encode or use in the simple formalism


with which we started. Our choice of rules and fact triples, with CFs, has been explained in Part Two. As summarized at the end of Chapter 3, we were under no illusion that we were creating a "pure" production system. We had taken many liberties with the formalism in order to make it more flexible and understandable. However, we still felt that the stylized condition-action form of knowledge brought many advantages because of its simplicity. For example, creating English translations from the LISP rules and translating stylized English rules into LISP were both somewhat simplified because of the restricted syntax. Similarly, creating explanations of a line of reasoning was simplified as well, because of the simple backward-chaining control structure that links rules together dynamically.
Representing knowledge in procedures was one alternative we were trying hard to avoid. Our experience with DENDRAL and with the therapy algorithm in MYCIN (Chapter 6) showed how inflexible and opaque a set of procedures could be for an expert maintaining a knowledge base. And, as mentioned in previous chapters, we saw that production rules offered some opportunity for making a knowledge base easier to understand and modify.
We were aware of predicate calculus as a possibility for representing MYCIN's knowledge. We were working in a period in AI research when logic and resolution-based theorem provers were being recommended for many problems. We did not seriously entertain the idea of using logic, however, largely because we felt that inexact reasoning was undeveloped in theorem-proving systems.
We had initially experimented with a semantic network representation, as mentioned in Chapter 3. Although we felt we could store medical knowledge in that form, we felt it was difficult to focus a dialogue in which gaps in the knowledge were filled both by inference and by the user's answers to questions. Minsky's paper on frames (Minsky, 1975) did not appear until after this work was well underway. Even so, we were looking for a more structured representation, specifically rules, to build editors and parsers for, to modify and explain, and to reason with in an understandable line of reasoning.
In this part we describe three experiments with alternative representations and control structures in programs called VM, CENTAUR, and WHEEZE. The first two programs were written for Ph.D. requirements, the last as a class project. All are programs that work on medical problems, although in areas outside of infectious diseases. Another experiment with representations is described in Chapter 20 in the context of explanation. There MYCIN's rules are rewritten in an inference net (cf. Duda et al., 1978b) in order to facilitate explaining the inferences at different levels of detail.
The VM program discussed in Chapter 22 was selected by Professor E. Feigenbaum, H. Penny Nii, and Dr. John Osborn and worked on primarily by Larry Fagan for his Ph.D. dissertation. Feigenbaum and Nii had

been developing the SU/X program1 (Nii and Feigenbaum, 1978) for interpretation of multisensor data. Feigenbaum was a friend of Osborn's, knew of Osborn's pioneering work on computer monitoring in intensive care, and saw this as a possible domain in which to explore further the problems in multisensor signal understanding involving signals for which the time course is important to the interpretation. Osborn agreed to be the expert collaborator. Fagan had been working on MYCIN and had contributed to the code as well as to the knowledge base of meningitis rules. (In Feigenbaum's words, Fagan had become "MYCINized.") So it was natural that his initial thinking about the ICU data interpretation problem was in MYCIN's terms. Fagan quickly found, however, that the MYCIN model was not appropriate for a problem of monitoring data continuously over time. MYCIN was much too oriented toward a "snapshot" of data about a patient at a fixed time (although some elements of data in the "snapshot" name historical parameters, such as dates of prior infections). The only obvious mechanism for making MYCIN work with a stream of data in the ICU was to restart the program at frequent time intervals to reason about each new "snapshot" of data gathered during each 2-5 minute time period. This is inelegant and completely misses any sense of continuity or the changing context in which data are being gathered. Thus VM was designed to remedy this deficiency.
The other two programs in Part Seven were designed as alternatives to a rule-based representation, varying the representation of one program, called PUFF. Although desirable, it is difficult in AI to experiment with programs by varying one parameter at a time while holding everything else fixed. Of course, not everything else could remain fixed for such a gross experiment. Both CENTAUR and WHEEZE, discussed in Chapters 23 and 24, were deliberate attempts to alter the representation and control of the PUFF program (while leaving the knowledge base unchanged) in order to examine advantages and disadvantages of alternatives.
PUFF is a program that diagnoses pulmonary (lung) diseases. The problem was suggested to Feigenbaum and Nii by Osborn at the time VM was being formulated, and appeared to be appropriate for a MYCIN-like approach. It was initially programmed using EMYCIN (see Part Five), in collaboration with Drs. R. Fallat and J. Osborn at Pacific Medical Center in San Francisco (Aikins et al., 1983). About 50-60 rules were added to EMYCIN [in a much shorter time than expected (Feigenbaum, 1978)] to interpret the type and severity of pulmonary disorders.2 The primary data are mostly from an instrument known as a spirometer that measures flows and volumes of a patient's inhalation and exhalation. The conclusions are diagnoses that account for the spirometer data, the patient history data, and the physician's observations.

1Later known as HASP (Nii et al., 1982).
2These handled obstructive airways disease. Many other rules were later added to handle other classes of pulmonary disease. The system now contains about 250 rules.

EMYCIN-PUFF (Aikins and Nii--see Chapter 14)
CENTAUR (Aikins--see Chapter 23)
WHEEZE (Smith and Clayton--see Chapter 24)
BASIC-PUFF (Pacific Medical Center--see Aikins et al., 1983)
AGE-PUFF (Nii and Aiello--see Aiello and Nii, 1981)

FIGURE 21-1 Five implementations of PUFF.

PUFF has been a convenient vehicle for experimentation because it is a small system. Figure 21-1 lists five different implementations of essentially the same knowledge base.
In developing CENTAUR, Aikins focused on the problem of making control knowledge explicit and understandable. She recognized the awkwardness of explanations of rules or rule clauses that were primarily controlling MYCIN's inferences as opposed to making substantive inferences. For example, many of the so-called self-referencing rules are awkward to explain:

If A & B & C, then A

In these rules, one intent of mentioning parameter A in both conclusion and premise is to screen the rule and keep it from forcing questions about parameters B and C if there is not already evidence for A. This is largely an issue of control, and the kind of problem that CENTAUR is meant to remedy. The solution is to use frames to represent the context and control information and MYCIN-like rules to represent the substantive medical relations. Thus there is a frame for A to represent the context in which a set of rules should be invoked, one of which would be:

B & C → A

This is much more natural to explain than trying to say why, or in what sense, A can be evidence for itself. CENTAUR was demonstrated using the same knowledge as in the EMYCIN version of PUFF (Aikins, 1983).
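The separation CENTAUR makes can be mimicked in a few lines: a frame carries the control condition (is there already evidence for A?), and the rules it owns stay purely substantive. This is a schematic sketch in Python under invented names, not CENTAUR's actual frame language, and the evidence combination is a crude max rather than MYCIN's CF calculus.

```python
# Hypothetical sketch of the CENTAUR idea: control lives in the frame,
# substance lives in plain rules, so no rule need mention its own
# conclusion in its premise.
evidence = {"A": 0.3, "B": 0.8, "C": 0.9}  # current certainty of each parameter

frame_A = {
    "context": lambda ev: ev.get("A", 0) > 0,  # invoke only if A is already suspected
    "rules": [
        # B & C -> A : a substantive rule, easy to explain on its own.
        (lambda ev: min(ev.get("B", 0), ev.get("C", 0)), "A"),
    ],
}

def run_frame(frame, ev):
    if not frame["context"](ev):
        return ev          # screening is handled by the frame, not by the rule
    for strength, param in frame["rules"]:
        ev[param] = max(ev[param], strength(ev))  # crude evidence combination
    return ev

run_frame(frame_A, evidence)
```

The explanatory gain is visible in the structure: asked why it believes A, such a system can cite the plain rule B & C → A, and asked why it considered A at all, it can cite the frame's invocation condition, with neither answer entangled in the other.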
David Smith and Jan Clayton developed WHEEZE as a further experiment with frames. They asked, in effect, if all the knowledge in PUFF could be represented in frames and what benefits would follow from doing so. In a short time (as a one-term class project) they reimplemented PUFF with a frame-based representation. Chapter 24 is a summary of their results.
The version of PUFF written in BASIC (BASIC-PUFF) is a simplified version of the EMYCIN rule interpreter with the medical knowledge built into the code (Aikins et al., 1983). It was redesigned to run efficiently on a PDP-11 in the pulmonary laboratory at Pacific Medical Center. Its knowledge has been more finely tuned than it was in the original version, but is largely the same. BASIC-PUFF is directly coupled to the spirometer in the pulmonary function lab and automatically provides interpretations of the test results. Thus it turns the spirometer into a "smart instrument" instead of simply a data-collecting and recording device. Its interpretations are printed immediately, reviewed by a physician, and inserted into the permanent record with the physician's signature. In the majority of cases, the physician makes no additions or corrections to the conclusions; in some, however, additional notes are made to clarify the program's suggestions. BASIC-PUFF provides one model of technology transfer for expert systems: first implement a prototype with "off-the-shelf" tools such as EMYCIN, then rewrite the system to run efficiently on a small computer.
Another experiment in which the PUFF knowledge base was recast into a different formalism is the AGE-PUFF version (Aiello and Nii, 1981). The intent was to use this small, easily managed knowledge base to experiment with control issues, more specifically to explore the adequacy of the BLACKBOARD model, with event-driven control (Erman et al., 1980). Further experiments with AGE-PUFF are reported by Aiello (1983).
One of the difficulties with a production rule formalism is in representing control information. For example, if we want rules R3, R5, and R7 to be executed in that order, then we have to arrange for the LHS of R7 not to match any current data base until after R3 and R5 have fired. Often this is accomplished by defining a flag that is set when and only when R3 fires and that is checked by R5, and another that is set by R5 and checked by R7, as described in Chapter 2. The authors of MYCIN's rules have only a few means available to influence the system's backward chaining, one of which is to define "dummy" parameters that act as flags. To the best of our knowledge, this was not done in MYCIN (in fact, it was explicitly avoided), but it has been done by others using EMYCIN.
Another means of influencing the control is to order the clauses in premises of rules. This was done much of the time as a way of keeping MYCIN from pursuing minutiae before the more general context that motivates asking about minute details was established. Since MYCIN evaluates the premise clauses from first to last, in order,3 putting more general, context-setting clauses at the beginning of the premise assures that the more specific clauses will not be asked about, or even considered, unless the context is appropriate. Using the order of premise clauses for this kind of screening permits the system builder to use early clauses to ensure that some parameters are traced first. For example, the predicate KNOWN is often used to cause a parameter to be traced.
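The screening effect of clause ordering can be seen in a toy evaluator: because premise clauses are evaluated left to right and evaluation stops at the first false clause, a general context-setting clause placed first prevents any question about the specific parameters that follow it. The interpreter below is our own miniature sketch, not MYCIN's; "asking the user" is simulated by recording the parameter and answering "no".

```python
# Hypothetical toy showing why premise-clause order matters in a
# backward-chaining, ask-if-unknown evaluator.
questions_asked = []

def known(facts, param):
    """Trace a parameter, asking the user (recorded here) if it is unknown."""
    if param not in facts:
        questions_asked.append(param)
        facts[param] = False      # pretend the user answered "no"
    return facts[param]

def evaluate(premise, facts):
    # Clauses are evaluated first to last; a false clause stops evaluation,
    # so later (more specific) parameters are never asked about.
    return all(known(facts, p) for p in premise)

facts = {"general-context": False}
evaluate(["general-context", "minute-detail-1", "minute-detail-2"], facts)
print(questions_asked)
```

With the context clause first, no question about the details is ever generated; reverse the clause order and the evaluator would pester the user about "minute-detail-1" before the general context had been established.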
Still another means of representing controlling information in the rule-based formalism is via meta-rules, described in Chapter 28. Another similar approach is via strategy rules, as described in Chapter 29. The unity path mechanism (Chapter 3) also affects the order of rule invocation.

3An exception is the preview mechanism described earlier.
ONCOCIN (discussed in Chapters 32 and 35) incorporates many of the ideas from these experiments, most notably the framelike representation of control knowledge and the description of changing contexts over time. It builds on other results presented in this book as well, so its design is described later. ONCOCIN clearly shows the influence of the evolution of our thinking presented in this section.
One piece of recent research not included in this volume is the rerepresentation of MYCIN's knowledge along the lines described in Chapter 29. The new program, called NEOMYCIN (Clancey and Letsinger, 1981), carries much of its medical knowledge in rules. But it also represents (a) the taxonomy of diseases as a separate hierarchy, (b) strategy knowledge as meta-rules, (c) causal knowledge as links in a network, and (d) knowledge about disease processes in the form of frames characterizing location and temporal properties. One main motivation for the reconceptualization was to provide improved underpinnings for the tutorial program described in Chapter 26. Because of the richer knowledge structures in NEOMYCIN, informative explanations can be given regarding the program's diagnostic strategies, as well as the medical rules.
NEOMYCIN, along with other recent work, emphasizes the desirability of augmenting MYCIN's homogeneous set of rules with a classification of types of knowledge and additional knowledge of each type. In MYCIN's rule set, the causal mechanisms, the taxonomic structure of the domain, and the problem-solving strategies are all lumped together. An augmented knowledge base should separate these different types of knowledge to facilitate explanation and maintenance of the knowledge base, and perhaps to enhance performance as well. Causal mechanisms have been represented and used in several domains, including medicine (Patil et al., 1981) and electronics debugging (Davis, 1983). Mathematical models have been merged with symbolic causal models in AI/MM (Kunz, 1983). As a result of this recent work, considerably richer alternatives than MYCIN's homogeneous rule set can be found.
Finally, it should be noted that the chapters in this part describe rather
fundamental viewpoints on representation. Within a rule-based or frame-
based (or mixed) framework there are still numerous details of represent-
ing uncertainty, quantified variables, strategies, temporal sequences, book-
keeping information, and other concepts mentioned throughout the book.
22
Extensions to the
Rule-Based Formalism
for a Monitoring Task

Lawrence M. Fagan, John C. Kunz,


Edward A. Feigenbaum, and John J. Osborn

The Ventilator Manager (VM) program is an experiment in expert system development that builds on our experience with rules in the MYCIN system. VM is designed to interpret on-line quantitative data in the intensive care unit (ICU) of a hospital. After a major cardiovascular operation, a patient often needs mechanical assistance with breathing and is put in the ICU so that many parameters can be monitored. Many of those data are relevant to helping physicians decide whether the patient is having difficulty with the breathing apparatus (the ventilator) or is breathing adequately enough to remove the mechanical assistance. The VM program interprets these data to aid in managing postoperative patients receiving mechanical ventilatory assistance.
VM was strongly influenced by the MYCIN architecture, but the program was redesigned to allow for the description of events that change over time. VM is an extension of a physiologic monitoring system1 and is designed to perform five specialized tasks in the ICU:

This chapter is a longer and extensively revised version of a paper originally appearing in Proceedings of the Sixth IJCAI (1979, pp. 260-262). Used by permission of International Joint Conferences on Artificial Intelligence, Inc.; copies of the Proceedings are available from William Kaufmann, Inc., 95 First Street, Los Altos, CA 94022.
1VM was developed as a collaborative research project between Stanford University and Pacific Medical Center (PMC) in San Francisco. It was tested with patient information acquired from a physiologic monitoring system implemented in the cardiac surgery ICU at PMC and developed by Dr. John Osborn and his colleagues (Osborn et al., 1969).


1. detect possible errors in measurement,
2. recognize untoward events in the patient/machine system and suggest corrective action,
3. summarize the patient's physiologic status,
4. suggest adjustments to therapy based on the patient's status over time and long-term therapeutic goals, and
5. maintain a set of case-specific expectations and goals for future evaluation by the program.

VM differs from MYCIN in two major respects. It interprets measurements over time, and it uses a state-transition model of intensive care therapies in addition to clinical knowledge about the diagnostic implications of data. Most medical decision-making programs, including MYCIN, have based their advice on the data available at one particular time. In actual practice, the clinician receives additional information from tests and observations over time and reevaluates the diagnosis and prognosis of the patient. Both the progression of the disease and the response to previous therapy are important for assessing the patient's situation.
Data are collected in different therapeutic situations, or contexts. In order to interpret the data properly, VM includes a model of the stages that a patient follows from ICU admission through the end of the critical monitoring phase. The correct interpretation of physiologic measurements depends on knowing which stage the patient is in. The goals for intensive care are also stated in terms of these clinical contexts. The program maintains descriptions of the current and optimal ventilatory therapies for any given time. Details of the VM system are given by Fagan (1980).
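The two departures from MYCIN, measurements over time and interpretation relative to a therapeutic context, can be sketched together: each reading in the stream is checked against expectation ranges that depend on the current stage of therapy. The stages, ranges, and readings below are invented for illustration and are not VM's actual knowledge.

```python
# Hypothetical sketch: context-dependent interpretation of a measurement
# stream, in the spirit of VM (contexts and ranges are illustrative only).
expectations = {
    # acceptable heart-rate range per therapeutic context
    "controlled-mandatory-ventilation": (60, 110),
    "t-piece": (60, 120),
}

def interpret(stream):
    """stream: list of (minutes, context, heart_rate) tuples in time order."""
    findings = []
    for t, context, hr in stream:
        lo, hi = expectations[context]
        if not lo <= hr <= hi:
            findings.append((t, f"heart rate {hr} outside {lo}-{hi} for {context}"))
    return findings

stream = [(0, "controlled-mandatory-ventilation", 95),
          (5, "controlled-mandatory-ventilation", 115),   # flagged
          (10, "t-piece", 115)]                           # acceptable in this context
print(interpret(stream))
```

The same reading (115) is flagged in one context and accepted in another, which is the essential point: a snapshot interpreter with a single set of thresholds could not make that distinction.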

22.1 The Application

The intensive care unit monitoring system at Pacific Medical Center (Osborn et al., 1969) was designed to aid in the care of patients in the period immediately following cardiac surgery. The basic monitoring system has proven to be useful in caring for patients with marked cardiovascular instability or severe respiratory malfunction (Hilberman et al., 1975). Most of these patients are given breathing assistance with a mechanical ventilator until the immediate effects of anesthesia, surgery, and heart-lung bypass have subsided. The ventilator is essential to survival for many of these patients. Electrocardiogram leads are always attached, and patients usually have indwelling arterial catheters to assure adequate monitoring of blood pressure and to provide for the collection of arterial blood for gas analysis. The ventilator-to-patient airway is monitored to collect respiratory flows, rates, and pressures. Oxygen and carbon dioxide concentrations are also

measured. All of these measurements are available at the bedside through the use of specialized computer terminals.
The mechanical ventilator provides total or partial breathing assistance
(or ventilation) for seriously ill patients. Most ventilator therapy is with a
type of machine that delivers a fixed volume of air with every breath, but
a second type of machine delivers air at each breath until a fixed pressure
is attained. Both the type and settings of the ventilator are adjusted to
match the patient's intrinsic breathing ability. The "volume" mechanical
ventilator provides a fixed volume of air under pressure through a tube
to the patient. The ventilator can be adjusted to provide breaths at fixed
intervals, which is called controlled mandatory ventilation (CMV), or in re-
sponse to sucking by the patient, which is known as assist mode. Adjustments
to the output volume or the respiration rate of the ventilator are made to
ensure an adequate minute volume to the patient. When the patient's status
improves, the mechanical ventilator is disconnected and replaced by a T-
piece that connects an oxygen supply with the tube to the patient's lungs.
If the patient can demonstrate adequate ventilation, the tube is removed
(extubation). Often many of these clinical transitions must be repeated until
the patient can breathe without assistance.
Three types of problems can occur in managing the patient on the
mechanical ventilator:

1. changes in the patients recovery process, requiring modifications to the


life support equipment,
2. malfunctions of the life support equipment, requiring replacement or
adjustment of the ventilator, and
3. failures of the patient to respond to therapeutic interventions within
the expectations of the clinicians in charge.

22.2 Overview of the Ventilator Manager Program

The complete system (diagrammed in Figure 22-1) includes the patient
monitoring sensors in the ICU, the basic monitoring system running on
IBM 1800 and PDP-11 computers at the Pacific Medical Center, and the
VM measurement interpretation program running on the SUMEX-AIM
PDP-10 computer located at Stanford University Medical Center. Patient
measurements are collected by the monitoring system for VM at two- or
ten-minute intervals. Summary information, suggestions to the clinicians,
and requests for additional information are generated at SUMEX for eval-
uation by research clinicians. The program's outputs are in the form of
periodic graphical summaries of the major conclusions of the program and
short suggestions for the clinician (as shown in Figures 22-2 and 22-3).
400 Extensions to the Rule-Based Formalism for a Monitoring Task

FIGURE 22-1 VM system configuration. Physiological mea-
surements are gathered automatically by the monitoring system
and provided to the interpretation program. The summary in-
formation and therapeutic suggestions are sent back to the ICU
for consideration by clinicians.

Summary generated at time 15:40

All conclusions:
                      12    13    14    15
BRADYCARDIA [PRESENT]
HEMODYNAMICS [STABLE]
HYPERVENTILATION [PRESENT]
HYPOTENSION [PRESENT]
Goal Location      CCCCCCCCCCCCC / AAAAAAAAAAA
Patient Location   V/CCCCCCCCCCCCCCCCCCCCCCC
                      12    13    14    15

FIGURE 22-2 Summary of conclusions drawn by VM based on four hours of patient data. Current and optimal patient therapy stages are represented by their first letter: V = VOLUME, A = ASSIST, C = controlled mandatory ventilation, / = changing. A double bar (=) is printed for each ten-minute interval in which the conclusion on the left is made.

.. 1640 ..
** SUGGEST CONSIDER PLACING PATIENT ON T-PIECE IF
** PaO2 > 70 ON FIO2 <= .4 [measure of blood gas status]
** PATIENT AWAKE AND TRIGGERING VENTILATOR
** ECG IS STABLE

.. 1650 .. .. 1700 .. .. 1710 .. .. 1720 .. .. 1730 .. .. 1740 .. .. 1750 ..

.. 1800 ..
** HYPERVENTILATION
** PATIENT HYPERVENTILATING.
** SUGGEST REDUCING EFFECTIVE ALVEOLAR VENTILATION.
** TO REDUCE ALVEOLAR VENTILATION, REDUCE TIDAL VOLUME,
** REDUCE RESPIRATION RATE, OR
** INCREASE DISTAL DEAD SPACE TUBING VOLUME

.. 1810 ..
SYSTEM ASSUMES PATIENT STARTING T-PIECE

.. 1813 .. .. 1815 .. .. 1817 ..
HYPOVENTILATION

.. 1819 .. .. 1822 ..
HYPOVENTILATION

FIGURE 22-3 Trace of program output. Format is ".. <time of day> .." followed by suggestions for clinicians. Comments are in brackets.

22.2.1 Measurement Interpretation

Knowledge is represented in VM by production rules of the form shown
in Figure 22-4.
The historical relations in the premise of a rule cause the program to
check values of parameters for a period of time; e.g., HYPERVENTILATION
is PRESENT for more than ten minutes. Conclusions made in the
action part of the rule assert that a parameter has had a particular value
during the time instance when the rule was examined. Suggestions are text
statements, printed out for clinicians, that state important conclusions and
a possible list of remedies. Expectations assert that specific measurements

IF: Historical relations about one or more parameters hold
THEN: 1) Make a conclusion based on these facts;
      2) Make appropriate suggestions to clinicians; and
      3) Create new expectations about the future values of parameters.

FIGURE 22-4 Format for rules in VM. Not every rule's action part includes conclusions, suggestions, and expectations.
402 Extensions to the Rule-BasedFormalismfor a MonitoringTask

should be within the specified ranges at some point in the future. Thus a
rule examines the current and historical data to interpret what is happen-
ing at the present and to predict events in the future.
Additional information associated with each rule includes the symbolic
name (e.g., STABLE-HEMODYNAMICS), the rule group (e.g., rules about
instrument faults), the main concept (definition) of the rule, and all of the
therapeutic states in which it makes sense to apply the rule. The list of
states is used to focus the program on the set of rules that are applicable
at a particular point in time. Figure 22-5 shows a sample rule for deter-
mining hemodynamic stability (i.e., a measure of the overall status of the
cardiovascular system).2

STATUS RULE: STABLE-HEMODYNAMICS
DEFINITION: Defines stable hemodynamics based on blood pressures and heart rate
APPLIES TO: patients on VOLUME, CMV, ASSIST, T-PIECE
COMMENT: Look at mean arterial pressure for changes in blood pressure and systolic blood pressure for maximum pressures.
IF
  HEART RATE is ACCEPTABLE
  PULSE RATE does NOT CHANGE by 20 beats/minute in 15 minutes
  MEAN ARTERIAL PRESSURE is ACCEPTABLE
  MEAN ARTERIAL PRESSURE does NOT CHANGE by 15 torr in 15 minutes
  SYSTOLIC BLOOD PRESSURE is ACCEPTABLE
THEN
  The HEMODYNAMICS are STABLE

FIGURE 22-5 Sample VM rule.

The VM knowledge base includes rules to support five reasoning steps that are evaluated at the start of each new time segment:

1. characterize measured data as reasonable or spurious;
2. determine the therapeutic state of the patient (currently the mode of
ventilation);
3. adjust expectations of future values of measured variables when the
patient state changes;
4. check physiological status, including cardiac rate, hemodynamics, ven-
tilation, and oxygenation; and
5. check compliance with long-term therapeutic goals.

Each reasoning step is associated with a collection of rules, and each rule
is classified by the type of conclusions made in its action portion; e.g., all
rules that determine the validity of the data are classed together.
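The per-segment cycle over these rule groups can be sketched as an ordered pass in which each group's conclusions feed the later groups. The group names follow the five steps above, but the rule contents and data layout here are illustrative only (VM itself was written in Interlisp):

```python
# Illustrative sketch of VM's per-time-segment cycle: rule groups are
# evaluated in a fixed order, so later steps (e.g., status checking) can
# rely on conclusions drawn by earlier steps (e.g., data validation).
# The rules and fact names below are invented for illustration.

REASONING_STEPS = ["validate-data", "determine-state", "adjust-expectations",
                   "check-status", "check-therapy-goals"]

def run_cycle(rule_groups, facts):
    """Apply each group's rules in order; rule actions may add new facts."""
    for step in REASONING_STEPS:
        for premise, action in rule_groups.get(step, []):
            if premise(facts):
                action(facts)
    return facts

# A toy rule base: flag a spurious heart rate, then conclude on hemodynamics.
rules = {
    "validate-data": [
        (lambda f: f["heart_rate"] > 250,            # physiologically impossible
         lambda f: f.update(hr_valid=False)),
        (lambda f: f["heart_rate"] <= 250,
         lambda f: f.update(hr_valid=True)),
    ],
    "check-status": [
        (lambda f: f.get("hr_valid") and 60 <= f["heart_rate"] <= 110,
         lambda f: f.update(hemodynamics="STABLE")),
    ],
}

facts = run_cycle(rules, {"heart_rate": 85})
print(facts["hemodynamics"])   # -> STABLE
```

Because the validation group runs first, a spurious reading (say, a heart rate of 300) would mark the datum invalid and the status rule would simply not fire.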

2The complete rule set, from which this rule was selected, is included in the dissertation by Fagan (1980), which is available from University Microfilms, #AAD80-24651.

22.2.2 Treating Measurement Ranges Symbolically

Most of the rules represent the measurement values symbolically, using the
terms ACCEPTABLE or IDEAL to characterize the appropriate ranges.
The actual meaning of ACCEPTABLE changes as the patient moves from
state to state, but the statement of the relation between physiological mea-
surements remains constant. For example, the rule shown in Figure 22-5
checks to see if the patient's heart rate is ACCEPTABLE. In the different
clinical states, or stages of mechanical assistance, the definition of AC-
CEPTABLE changes. Immediately after cardiac surgery a patient's heart
rate is not expected to be in the same range as it is when he or she is moved
out of the ICU. Mentioning the symbolic value ACCEPTABLE in a rule,
rather than the state-dependent numerical range, thus reduces the number
of rules needed to describe the diagnostic situation.
The meaning of the symbolic range is determined by other rules that
establish expectations about the values of measured data. For example,
whena patient is taken off the ventilator, the upper limit of acceptability
for the expired carbon dioxide measurement is raised. (Physiologically,
patient will not be able to exhale all the CO,) produced by his or her system,
and so CO2 will accumulate.) The actual numeric calculation of EXPIRED
pCO2 HIGH in the premise of any rule will change when the context
switches (removal from ventilatory support), but the statement of the rules
remains the same. A sample rule that creates these expectations is shown
in Figure 22-6.

22.2.3 Therapy Rules

Therapy rules can be divided into two classes: the long-term therapy as-
sessment (e.g., when to put the patient on the T-piece), and the determi-
nation of response to a clinical problem, such as hyperventilation or hy-
pertension. The two rules shown in Figure 22-7, for selecting T-piece
therapy and for responding to a hyperventilation problem, demonstrate
several key factors in the design of the rule base:

use of a hierarchy of physiological states,
use of the program's determination of the patient's clinical state,
generation of conditional suggestions.

The abstracted hierarchy of states, such as hemodynamic stability, is im-
portant because it makes the rules more understandable. Since the defi-
nition of stability changes with transition to different clinical stages, as
described above, rules about stability are clearer if they mention the con-
cept rather than the context-specific definition. It is important for the
program to determine what state the patient is in, since the program is

INITIALIZING RULE: INITIALIZE-CMV
DEFINITION: Initialize expectations for patients on controlled mandatory ventilation (CMV) therapy
APPLIES TO: all patients on CMV
IF ONE OF:
  PATIENT TRANSITIONED FROM VOLUME TO CMV
  PATIENT TRANSITIONED FROM ASSIST TO CMV
THEN EXPECT THE FOLLOWING

                 [ .......... acceptable range .......... ]
                         [ ........ ideal ........ ]
                 very                               very
                 low     low    min    max   high   high
MEAN PRESSURE    60      75     80     95    110    120
HEART RATE       60                                 110
EXPIRED pCO2     22      28     30     35    42     50

FIGURE 22-6 Portion of an initializing rule. This type of rule establishes initial expectations of acceptable and ideal ranges of variables after state changes. Not all ranges are defined for each measurement. EXPIRED pCO2 is a measure of the percentage of carbon dioxide in expired air measured at the mouth.
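The idea of state-dependent symbolic ranges can be sketched as a table keyed by therapeutic state: a status rule mentions only the symbol ACCEPTABLE, and an initializing rule installs the numeric bounds for the new state. The CMV bounds below follow Figure 22-6; the T-PIECE bound is an invented illustration of the raised pCO2 limit described in the text:

```python
# Sketch of symbolic ranges that are redefined on each state transition.
# Rules mention only ACCEPTABLE; initializing rules install the numeric
# bounds for the current state. T-PIECE numbers are illustrative only.

expectations = {}  # (state, parameter) -> (low, high) bounds for ACCEPTABLE

def initialize_state(state, ranges):
    """Initializing rule: set acceptable ranges after a state change."""
    for param, bounds in ranges.items():
        expectations[(state, param)] = bounds

def acceptable(state, param, value):
    """The test a premise clause performs; its meaning depends on state."""
    low, high = expectations[(state, param)]
    return low <= value <= high

# Acceptable ranges for CMV, taken from Figure 22-6.
initialize_state("CMV", {"MEAN PRESSURE": (60, 120),
                         "HEART RATE": (60, 110),
                         "EXPIRED pCO2": (22, 50)})
# On the T-piece the upper pCO2 limit is raised (illustrative number).
initialize_state("T-PIECE", {"EXPIRED pCO2": (22, 60)})

print(acceptable("CMV", "EXPIRED pCO2", 55))      # False
print(acceptable("T-PIECE", "EXPIRED pCO2", 55))  # True
```

The same reading of 55 is thus out of range on CMV but acceptable on the T-piece, without any change to the rule that mentions ACCEPTABLE.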

designed to avoid interrupting the activities in the ICU to ask questions of
the physicians or nurses. Its design is thus different from the design of a
one-shot consultation system such as MYCIN. A physician will change the
mode of assistance from CMV (where the machine does all the work of
breathing) to ASSIST (where the machine responds to a patient's attempts
to breathe). The VM program has to know that this transition is normal
and to determine when it occurs in order to avoid drawing inappropriate
conclusions. The advice that VM offers is often conditional. Unlike other
consultation programs such as MYCIN, VM attempts to avoid a dialogue
with the clinician. When the appropriateness of a suggestion depends on
facts not known to VM, it creates a conditional suggestion. The clinician
can check those additional facts and make an independent determination
of the appropriateness of the suggestion.

22.2.4 Selecting Optimal Therapy

The stages of ventilatory therapy are represented in VM by a finite state
graph (see Figure 22-8). The boxed nodes of the graph represent the
values associated with the parameters "PatientLocation," specifying the cur-
rent state, and "GoalLocation," specifying alternative therapies. The arcs
of the graph represent transition rules and therapy rules. Thus goals are
expressed as "moves" away from the current therapeutic setting, and each

THERAPY-RULE: THERAPY.A-T
DEFINITION: DEFINES READINESS TO TRANSITION FROM ASSIST MODE TO T-PIECE
COMMENT: If patient has stable hemodynamics, ventilation is acceptable, and patient has been awake and alert enough to interact with the ventilator for a period of time, then transition to T-piece is indicated.
APPLIES TO: ASSIST
IF
  HEMODYNAMICS ARE STABLE
  HYPOVENTILATION NOT PRESENT
  RESPIRATION RATE ACCEPTABLE
  PATIENT IN ASSIST FOR > 30 MINUTES
THEN
  THE GOAL IS FOR THE PATIENT TO BE ON THE T-PIECE
  SUGGEST CONSIDER PLACING PATIENT ON T-PIECE IF
    PaO2 > 70 on FIO2 <= 0.4
    PATIENT AWAKE AND TRIGGERING VENTILATOR
    ECG IS STABLE

THERAPY-RULE: THERAPY.VENTILATOR-ADJUSTMENT-FOR-HYPERVENTILATION
DEFINITION: MANAGE HYPERVENTILATION
APPLIES TO: VOLUME ASSIST CMV
IF
  HYPERVENTILATION PRESENT for > 8 minutes
    COMMENT: wait a short while to see if hyperventilation persists
  VO2 not low
THEN
  SUGGEST PATIENT HYPERVENTILATING.
  SUGGEST REDUCING EFFECTIVE ALVEOLAR VENTILATION.
    TO REDUCE ALVEOLAR VENTILATION, REDUCE TV BY 15%, REDUCE RR, OR
    INCREASE DISTAL DEAD SPACE TUBING VOLUME

FIGURE 22-7 Two therapy rules. The first (THERAPY.A-T) suggests a T-piece trial; the second resolves a hyperventilation problem.

possible move corresponds to a decision rule. The overall clinical goal, of
course, is to make the patient self-sufficient, specifically, to remove the
mechanical breathing assistance (extubate) as soon as is practical for each
patient. The knowledge base is linked to the graph through the APPLIES
TO statement specified in the introductory portion of each rule.
The mechanism for deriving and representing therapy decisions in
VM takes into account the relationship between VM's suggestions and ac-
tual therapy changes. Computer-generated suggestions about therapy
changes are decoupled from actual changes due to: (1) additional infor-
mation to the clinician suggesting modification to or disagreement with
VM's suggestion; (2) sociologic factors that delay the implementation of
the therapy decisions (e.g., T-piece trials have been delayed due to concern
about disturbing a patient in the next bed); or (3) variation of criteria
among clinicians for making therapy decisions. Because of the discrepancy
between computer-generated goals and actual therapy, VM cannot assume
that the patient is actually in the stage that the program has determined

[Diagram: therapy state graph with boxed nodes NOT MONITORED, VOLUME VENTILATION (containing CMV and ASSIST), T-PIECE, and EXTUBATE, connected by transition and therapy arcs]

FIGURE 22-8 Therapy state graph.

is optimal. Transition rules in VM allow the program to notice changes in
a patient's state. They reset the description of the context, then, so that the
data will be interpreted correctly. However, when the therapy rules are
evaluated, the program may determine that the previous state is still more
appropriate.
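The state graph of Figure 22-8 can be sketched as an adjacency table in which goals are legal "moves" along arcs; the arc set below is a simplification of the figure, with VOLUME standing in for the VOLUME VENTILATION box that contains CMV and ASSIST:

```python
# Sketch of the therapy state graph: each therapy goal must be a "move"
# along an arc from the current stage. The arc set is simplified from
# Figure 22-8 for illustration.

TRANSITIONS = {
    "NOT MONITORED": {"VOLUME"},
    "VOLUME": {"CMV", "ASSIST"},
    "CMV": {"ASSIST", "VOLUME"},
    "ASSIST": {"CMV", "T-PIECE", "VOLUME"},
    "T-PIECE": {"ASSIST", "CMV", "EXTUBATE"},
    "EXTUBATE": set(),
}

def legal_move(current, goal):
    """A therapy rule may only propose a goal adjacent to the current state."""
    return goal in TRANSITIONS[current]

print(legal_move("ASSIST", "T-PIECE"))   # True
print(legal_move("CMV", "EXTUBATE"))     # False: must wean via the T-piece
```

Linking each rule's APPLIES TO states to this table is what keeps the program from, say, suggesting extubation while the patient is still on controlled ventilation.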
Two models can be created for representing the period of time be-
tween the suggestion of therapy (a new goal) and its implementation. The
first model is that therapy goals are the same as last stated, unless explicitly
changed. It assumes that once a new therapeutic goal is established, the
goal should persist until either the therapy is initiated or the goal is negated
by a rule. This model is based on the common clinical practice of contin-
uing recently initiated therapy even if the situation has changed. This
clinical practice, which might be termed hysteresis,3 is used to avoid frequent
changes in treatment strategy--i.e., avoid oscillation in the decision-making
process. While clinicians acknowledge this behavior, they find it hard to
verbalize rules for rescinding previous therapy goals. This hysteresis has
also been evident in the formulation of some of the therapy rules. The
rule that suggests a switch from assist mode to T-piece is stated in terms
of ACCEPTABLE limits; the rule for aborting T-piece trials (back to assist
mode or CMV) is stated in terms of VERY HIGH or VERY LOW limits.
This leaves a "grey area" between the two decision points and precludes
fluctuating between decisions.
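The two-threshold "grey area" can be sketched directly: starting a trial requires the tighter ACCEPTABLE limits, while aborting one requires the looser VERY limits, so a value between the two leaves the goal unchanged. The numeric limits below are invented for illustration:

```python
# Sketch of hysteresis in therapy decisions: the rule that starts a
# T-piece trial tests ACCEPTABLE limits, while the rule that aborts it
# tests VERY HIGH / VERY LOW limits. In between, neither rule fires and
# the current goal persists. All numbers are illustrative.

ACCEPTABLE = (8, 25)    # respiration rate range required to *start* a trial
VERY_LIMITS = (5, 35)   # range outside which a running trial is *aborted*

def update_goal(goal, resp_rate):
    if goal != "T-PIECE" and ACCEPTABLE[0] <= resp_rate <= ACCEPTABLE[1]:
        return "T-PIECE"                       # suggest the trial
    if goal == "T-PIECE" and not (VERY_LIMITS[0] <= resp_rate <= VERY_LIMITS[1]):
        return "ASSIST"                        # abort the trial
    return goal                                # grey area: no change

goal = update_goal("ASSIST", 20)   # rate acceptable: goal becomes "T-PIECE"
goal = update_goal(goal, 30)       # grey area: trial continues
goal = update_goal(goal, 40)       # VERY HIGH: back to "ASSIST"
print(goal)                        # -> ASSIST
```

A rate of 30 would not have started a trial, but it does not abort one already under way, which is exactly the oscillation-damping behavior described above.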

3Hysteresis is "a lag of effect when the forces acting on a body are changed" (Webster's New World Dictionary, 1976).

The second model for representing therapeutic goals requires that the
appropriate goal be asserted each time the rule set is evaluated. If no
therapy rules succeed in setting a new goal, the goal is asserted to be the
current therapy. This scheme ignores the apparent practices of clinicians,
but represents a more "conservative" approach that is consistent with the
rule-writing strategy used by our experts. This model is potentially sensi-
tive to minor perturbations in the patient measurements, but such sensi-
tivity implies that a borderline therapy decision was originally made.

22.3 Details of VM

22.3.1 Parameters

The knowledge in VM is based on relationships among the various parameters of the patient, such as respiration rate, sex, and hyperventilation. The
program assigns values to each of these parameters as it applies its knowl-
edge to the patient data: the respiration rate is high, the sex of the patient
is male, and hyperventilation is present. In a changing domain, the values
associated with each parameter may vary with time, for example, "hyper-
ventilation was present for one-half hour, starting two hours ago." Not all
parameters have the same propensity to change over time; a classification
is given in Figure 22-9.
Parameters are represented internally by using the property list no-
tation of LISP. The property list contains both static elements (e.g., the list
of rules that use the parameter in the premise) and dynamic elements (e.g.,
the time when the parameter was last updated). The static elements are
input when the parameter is described or calculated from the contents of
the rule set. The dynamic elements are computed as the program inter-
prets patient data. Figure 22-10 lists the properties associated with parameters (although not every parameter has every property).
Figure 22-11 shows a "snapshot" of the parameters RR (respiration
rate) and HEMODYNAMICS taken after 120 minutes of data have been
processed. Associated with values assigned to parameters (e.g., RR LOW
or HEMODYNAMICS STABLE) are lists of intervals when those conclu-
sions were made. Each interval is calculated in terms of the elapsed time
since patient data first became available. Thus, in the example, the hemo-
dynamics were stable from 2-8 minutes into the program, momentarily at
82 minutes, and in the interval of 99-110 minutes of elapsed time.
The properties USED-IN, CONCLUDED-IN, and EXPECTED-IN
are used to specify how the parameters are formed into a network of rules.
These pointers can be used to guide various strategies for examining the
rules--e.g., find and evaluate each rule that uses respiration rate or each
rule that concludes hemodynamic status.
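The dynamic side of a parameter record can be sketched as a set of (start, end) intervals of elapsed minutes, as in the STABLE property of Figure 22-11. The class and method names here are ours, not VM's:

```python
# Sketch of a parameter whose concluded values are stored as lists of
# time intervals. conclude() records that the parameter held a value
# over an interval; value_at() answers "what was the value at minute t?".

class Parameter:
    def __init__(self, name):
        self.name = name
        self.intervals = {}        # value -> [(start, end), ...]
        self.updated_at = None     # the UPDATED-AT property

    def conclude(self, value, start, end):
        self.intervals.setdefault(value, []).append((start, end))
        self.updated_at = end

    def value_at(self, t):
        for value, spans in self.intervals.items():
            if any(s <= t <= e for s, e in spans):
                return value
        return None

hemo = Parameter("HEMODYNAMICS")
for start, end in [(2, 8), (82, 82), (99, 110)]:   # intervals from Figure 22-11
    hemo.conclude("STABLE", start, end)

print(hemo.value_at(100))   # STABLE: within the 99-110 interval
print(hemo.value_at(50))    # None: no conclusion held at minute 50
```

Interval storage is what lets premise functions ask historical questions such as "has HYPERVENTILATION been PRESENT for more than ten minutes?" without keeping every raw datum.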

Constant
Examples: surgery type, sex
Input: once
Reliability: value is good until replaced
Continuous
Examples: heart rate, blood pressure
Input: at regular intervals (6-20 times/hour)
Reliability: presumed good unless input data are missing or artifactual
Volunteered
Examples: temperature, blood gases
Input: at irregular intervals (2-10 times/day)
Reliability: good for a period of time, possibly a function of the current
situation
Deduced
Examples: hyperventilation, hemodynamic status
Input: calculated whenever new data are available
Reliability: a function of the reliability of each of the component
parameters.

FIGURE22-9 Classification of parameters.

The UPDATED-AT and GOOD-FOR properties are used to determine the status of the parameter over time, when it was last given a value and the time period during which a conclusion made about this parameter can reasonably be used for making future conclusions. The GOOD-FOR property can also be a pointer to a context-dependent rule.

DEFINITION: free form text describing the parameter
USED-IN: a list of the names of rules that use this parameter to make conclusions
CONCLUDED-IN: names of rules where this parameter is concluded
EXPECTED-IN: names of rules where expectations about this parameter are made
GOOD-FOR: length of time that a measurement can be assumed to be valid; if missing, then must be recomputed, input if possible, or assumed unknown
UPDATED-AT: the last time any conclusion was made about this parameter

FIGURE 22-10 Properties associated with parameters.



RR

DEFINITION: (RESPIRATION RATE)
USED-IN: (TRANSITION.V-CMV TRANSITION.V-A TRANSITION.A-CMV
  TRANSITION.CMV-A STATUS.BREATHING-EFFORT/T THERAPY.A-CMV
  THERAPY.A-T THERAPY.T-V ABNORMAL-EC02)
EXPECTED-IN: (INITIALIZE.V INITIALIZE.CMV INITIALIZE.V-RETURN
  INITIALIZE.A INITIALIZE.T-PIECE)
GOOD-FOR: 15 [information is good for 15 minutes]
UPDATED-AT: 82 [last updated at 82 minutes after start]
LOW: ((72 . 82) (52 . 59)) [concluded to be LOW from 52-59 minutes and 72-82 minutes after start]

HEMODYNAMICS

DEFINITION: (HEMODYNAMICS)
CONCLUDED-IN: (STATUS.STABLE-HEMODYN/V,A,CMV)
USED-IN: (THERAPY.CMV-A THERAPY.A-T THERAPY.T-PIECE-TO-EXTUBATE)
GOOD-FOR: NIL [this is a derived parameter so reliability is based on other parameters]
UPDATED-AT: 110 [last updated at 110 minutes after start]
STABLE: ((99 . 110) (82 . 82) (2 . 8))

FIGURE 22-11 "Snapshot" of parameters RR (respiration rate) and HEMODYNAMICS after 120 minutes of elapsed time.

22.3.2 Measurements

Over 30 measurements are provided to VM every 2 to 10 minutes.4 The
interval is dependent on the situation; shorter intervals are used at critical
times as specified by the clinician at the bedside. It is not appropriate to
store this information using the interval notation above, since most mea-
surements change with every new collection of data. Predefined intervals,
e.g., respiration rate from 5-10, 10-15, and 15-20 breaths/minute, could
be used to classify the data, but meaningful ranges change with time. In-
stead, symbolic ranges such as HIGHand LOWare calculated from the
measurements as appropriate. A large quantity of data is presented to the
program, in contrast to typical knowledge-based medical systems. About
5000 measurement values per patient are collected each day (30 measure-
ments per collection with 6-8 data collections/hour). Patients are moni-
tored from a few hours to a few weeks, with the average about 1.5 days.
While this amount of information could be stored in a large-scale program
such as VM, only the most recent information is used to make conclusions.
The program stores in memory about one hour's worth of data, independent of the time interval between measurement collections (the remainder

4Clinicians can select the default sample rate: fast (2 minutes) or slow (10 minutes). An extra data sample can be taken immediately on request.

of the data are available on secondary storage). Technically, this storage is
accomplished by maintaining a queue of arrays that contain the entire
collection of measurements that vary over time. The length of the queue
is adjusted to maintain an hour's worth of data. Schematically, the measurement storage might be represented as follows:

Elapsed time               69     59     58   ...    09
Respiration rate            9      9     10   ...     9
Systolic blood pressure   141    154    153   ...   150
Clock time               1230   1220   1219   ...  1130
                       [current time]

Throwing away old measurements does not limit the ability of the program
to utilize historical data. The conclusions based on the original data, which
are stored much more compactly, are maintained throughout the patient
run. Thus the numerical measurement values are replaced by symbolic
abstractions over time.
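The one-hour store can be sketched as a queue trimmed by elapsed time rather than by a fixed sample count, so it holds the same hour of data whether collections arrive every 2 or every 10 minutes. The class and method names are ours:

```python
# Sketch of VM's measurement storage: a queue of timestamped sample
# collections, trimmed so only the most recent hour stays in memory
# regardless of the sampling interval. Older samples would go to
# secondary storage; here they are simply dropped.
from collections import deque

WINDOW_MINUTES = 60

class MeasurementStore:
    def __init__(self):
        self.samples = deque()     # (elapsed_minutes, {name: value})

    def add(self, elapsed, values):
        self.samples.append((elapsed, values))
        while self.samples and self.samples[0][0] < elapsed - WINDOW_MINUTES:
            self.samples.popleft()

    def recent(self, name, since):
        """Values of one measurement from `since` minutes to now."""
        return [v[name] for t, v in self.samples if t >= since and name in v]

store = MeasurementStore()
for t in range(0, 130, 10):                 # 10-minute samples for ~2 hours
    store.add(t, {"RR": 10, "SYS": 150})
print(len(store.samples))                   # 7 collections: minutes 60-120
```

Only the symbolic conclusions drawn from the dropped samples would survive, mirroring the replacement of numeric values by abstractions described above.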
One current limitation is the program's inability to reevaluate past
conclusions, especially when measurements are taken but are not reported
until some time later. One example of this is the interpretation of blood
gas measurements. It takes about 20-30 minutes for the laboratory to
process blood gas samples, but by that time the context may have changed.
The program cannot then back up to the time that the blood gases were
taken and proceed forward in time, reevaluating the intervening measure-
ments in light of the new data. The resolution of conflicts between expec-
tations and actual events may also require modification of old conclusions.
This is especially true when forthcoming events are used to imply that an
alternative cause provides a better explanation of observed phenomena.

22.3.3 Rules

Rules used in VM have a fixed structure. The premise of a rule is constructed from the conjunction or disjunction of a set of clauses. Each clause
checks relationships about one or more of the parameters known to the
program. Each of these relationships, such as "the respiration rate is be-
tween an upper and lower limit," will be tested to determine if the premise
is satisfied. If the clauses are combined conjunctively and each clause is
true, or combined disjunctively and at least one clause is true, then the
rule is said to "succeed." As explained in Section 22.3.6 on uncertainty in
VM, no probabilistic weighting scheme is currently used in the rule eval-
uation (although the mechanism is built into the program).

When a rule succeeds, the action part of the rule is activated. The
action portion of each rule is divided into three sections: conclusions (or
interpretations), suggestions, and expectations. The only requirement is
that at least one statement (of any of the three types) is made in the action
part of the rule. The first section of the action of the rule is composed of
the conclusions that can be drawn from the premise of the rule. These
conclusions (in the form of a parameter assuming a value) are asserted by
the program to exist at the current time and are stored away for producing
summaries and to provide new facts for use by other rules. When the same
conclusion is also asserted in the most recent time when data are available
to the program, then the new conclusion is considered a continuation of the
old one. The time interval associated with the conclusion is then extended
to include the current time. This extension presumes that the time period
between successive conclusions is short enough that continuity can be
asserted.
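The continuation mechanism can be sketched as interval extension: if the same value was concluded in the most recent data interval and the gap is short, the old interval's end is moved forward instead of opening a new interval. The gap threshold here is our assumption, not VM's:

```python
# Sketch of conclusion "continuation": extend the previous interval when
# the same value recurs within a short gap; otherwise start a new one.
# MAX_GAP is an assumed threshold for illustration.

MAX_GAP = 10   # minutes; beyond this, continuity is not presumed

def record(intervals, value, now, last_data_time):
    """intervals: value -> list of [start, end] (end is mutable)."""
    spans = intervals.setdefault(value, [])
    if spans and spans[-1][1] >= last_data_time and now - spans[-1][1] <= MAX_GAP:
        spans[-1][1] = now          # continuation of the old conclusion
    else:
        spans.append([now, now])    # a new, separate conclusion
    return intervals

iv = {}
record(iv, "HYPERVENTILATION", 30, 30)
record(iv, "HYPERVENTILATION", 34, 30)   # extends [30, 30] to [30, 34]
record(iv, "HYPERVENTILATION", 60, 34)   # gap too long: new interval
print(iv["HYPERVENTILATION"])            # -> [[30, 34], [60, 60]]
```

Stored this way, a premise clause such as "HYPERVENTILATION PRESENT for > 8 minutes" reduces to checking the length of the latest interval.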
The second section of the action is a list of suggestions that are printed
for the clinician. Each suggestion is a text string to be printed that sum-
marizes the conclusions made by the rule.5 Often this list of suggestions
includes additional factors to check that cannot be verified by the pro-
gram--e.g., the alertness of the patient. By presenting the suggestions as
conditional statements, the need to interact with the user to determine the
current situation is minimized. The disadvantage of this method is that the
program maintains a more nebulous view of the patient's situation, unless
it can be ascertained later that one of the suggestions was carried out.
The last section of the action part of the rule is the generation of new
expectations about the ranges of measurements for the future. Expecta-
tions are created to help the program interpret future data. For example,
when a patient is first moved from assist mode to the T-piece, many pa-
rameters can be expected to change drastically because of the stress as well
as the altered mode of breathing. When the measurements are taken, then,
the program is able to interpret them correctly. New upper and lower
bounds are defined for the acceptable range of values for heart rate, for
example, for the duration of time specified. The duration might be spec-
ified in minutes or in terms of a context (e.g., "while the patient is on the
T-piece").
MYCIN does not place any constraints on the types of conclusions
made in the action part of the rule, although most rules use the CONCLUDE
function in their right-hand sides. For example, MYCIN calls a
program to compute the optimal therapy as an action part of a rule (Chap-
ter 6). The basic motivation behind imposing some structure on rules was
to act as a mnemonic device during rule acquisition. The same advantage
is found in framelike systems with explicit component names--e.g.,
CAUSED-BY, MUST-HAVE, and TRIGGERS in the Present Illness Pro-
gram (Szolovits and Pauker, 1978).

5Not every conclusion has a corresponding suggestion, particularly when the conclusion denotes a "normal" status--e.g., hemodynamic stability.

A rule is represented internally by a property list with a fixed set of properties attached to the name of the rule:

RULEGROUP      Defines type or class of the rule; in this case, the rules that deduce the status of the patient
DEFINITION     Free text that defines the main idea of the rule
COMMENT        The collected comments from the external form of the rule
NODE           All of the contexts for which this rule makes sense (currently limited to the values associated with the patient's therapeutic setting)
EVAL           Specifies the methods of evaluation; ALLOF for conjunction, ONEOF for disjunction, X% for requirement of a fixed percentage of verified premise clauses
ORIGLHS        A copy of the external notation of the premise of the rule, used in explanations and tracing
FILELOCATION   The description of the location on a file of the original text of the rule
M              The translated premise of the rule, a list of calls to premise functions (M stands for match)
I              The list of interpretations (conclusions) to be made
S              The list of suggestions to be printed out
E              The list of expectations to be made

The actual processing of a rule is carried out by a series of functions
that test conditions, make interpretations, make suggestions, or create ex-
pectations. Each of these functions has a well-defined semantic interpre-
tation and provides the primitives for encoding the knowledge base.
The translation between an external format, e.g., RESPIRATION
RATE > 30, and the corresponding internal format, (MCOMP RR > 30),
is made by the same parsing program used in EMYCIN.6 The MCOMP
function is given a parameter name (RR), a relation (less than, greater
than, or equal to), and a number with which to compare it. The execution
of the MCOMP function returns a numerical representation of TRUE,
FALSE or UNKNOWN, based on the current value of respiration rate.
Figure 22-12 demonstrates the external and internal representations of a
typical rule.

22.3.4 Premise Functions

One goal of the VMimplementation is to create a simple set of premise


functions that are able to test for conditions across time. Manyof the static
premise functions have been adapted from the MYCINprogram; e.g.,

6The parsing program was written by James Bennett, based on work by Hendrix (1977).



STATUSRULE: STATUS.STABLE-HEMODYNAMICS
DEFINITION: Defines stable hemodynamics based
on bloodpressuresandheart rate
APPLIESTO: patients on VOLUME, CMV,ASSIST,
T-PIECE
COMMENT: Look at meanarterial pressurefor
changes in bloodpressureandsystolic
blood pressurefor maximum pressures.
IF
HEARTRATEis ACCEPTABLE
PULSERATEdoes NOTCHANGE by 20 beats/minute
in 15minutes
MEANARTERIALPRESSURE is ACCEPTABLE
MEANARTERIALPRESSURE does NOTCHANGE by 15
torr in 15minutes
SYSTOLICBLOODPRESSURE is ACCEPTABLE
THEN
The HEMODYNAMICS are STABLE

RULEGROUP: STATUS-RULE
DEFINITION: ((DEFINES STABLE HEMODYNAMICS BASED)
    (ON BLOOD PRESSURES AND HEART RATE))
COMMENT: ((LOOK AT MEAN ARTERIAL PRESSURE FOR)
    (CHANGES IN BLOOD PRESSURE AND SYSTOLIC)
    (BLOOD PRESSURE FOR MAXIMUM PRESSURES))
NODE: (VOLUME CMV ASSIST T-PIECE)
EVAL: (ALL OF)
ORIGLHS: ((HEART RATE IS ACCEPTABLE)
    (PULSE RATE DOES NOT CHANGE BY 20 BEATS/MINUTE IN 15
        MINUTES)
    (MEAN ARTERIAL PRESSURE IS ACCEPTABLE)
    (MEAN ARTERIAL PRESSURE DOES NOT CHANGE BY 15 TORR
        IN 15 MINUTES)
    (SYSTOLIC BLOOD PRESSURE IS ACCEPTABLE))
FILELOCATION: (<puffNM>VM.RULES;18 12538 13143)
M: ((MSIMP HR ACCEPTABLE NIL)
    (FLUCT PR CHANGE 20 (0.0 15) NOT)
    (MSIMP MAP ACCEPTABLE NIL)
    (FLUCT MAP CHANGE 15 (0.0 15) NOT)
    (MSIMP SYS ACCEPTABLE NIL))
I: ((INTERP HEMODYNAMICS = STABLE NIL))

FIGURE 22-12 External and internal representations for a rule in VM.

MCOMP encompasses the functions of GREATERP, LESSP, numeric EQUAL, and their negations in MYCIN. Most of the functions listed below test the value of a parameter within a time interval and return TRUE, FALSE, or UNKNOWN. As mentioned earlier, they reference concepts, such as HIGH value or STABLE value, that are defined by rules at each stage. Each function is composed of the following program steps: (a) find out the value of the parameter in the time period mentioned (otherwise, use the current time), (b) make the appropriate tests, and negate the answer, if required. Table 22-1 lists the premise functions.
[TABLE 22-1 The premise functions in VM]
416 Extensions to the Rule-Based Formalism for a Monitoring Task

22.3.5 Control Structure

A simple control structure is used to apply the knowledge base to the patient data. This method starts by the execution of the goal rule, which in turn evaluates a set of rules corresponding to each level of abstraction in order: first, data validation, followed by context checking and expectation setting, determination of physiological status, and finally, therapeutic response, if necessary. From the group of rules at each level of reasoning, each rule is selected in turn. The current context as determined by the program is compared against the list of applicable contexts for each rule (the NODE property). The premise portions of acceptable rules are examined. If the parameter mentioned in a premise clause has not yet been fully evaluated, an indexing scheme is used to select the rules within this rule set that can make that conclusion. Using this method avoids the necessity of putting the rules in a specific order. The rule is added to a list of "used" rules, and the next unexamined rule is studied. The list of evaluated rules is erased each time the rule set is evaluated. When a rule succeeds, the action part of the rule is used to make interpretations, print suggestions, and set expectations.
Most rules attempt to explain the interpretation of measurements that have "violated" their expectations. Thus, for the portion of the rules that mention an "out-of-range" measurement value in their premise or that are based on the conclusions of these rules, the following strategy could be used: compare all measurements against the current expectations, and forward chain only those measurements with values that require explanation. This method is not useful when the rule specifies that several normal measurements imply a normal situation, e.g., determining hemodynamic stability. These "normal" rules would have to be separated and forward- or backward-chained as appropriate.
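The selection-and-evaluation loop described above can be sketched as follows. The dict-based rule format and function names are invented for illustration; VM's actual rules were Lisp structures with NODE, premise, and action parts.

```python
# Sketch of the per-level rule interpreter: rules are filtered by context
# (the NODE property), and an index of concluding rules is used to evaluate
# a premise parameter's rules on demand, so rule order does not matter.

def evaluate(rule, state, rules_by_conclusion, used):
    """Evaluate one rule, first evaluating any rules in this rule set
    that can conclude a premise parameter with no value yet."""
    if id(rule) in used:          # already on the "used" list this pass
        return
    used.add(id(rule))
    for param in rule["premise_params"]:
        if param not in state:    # not yet fully evaluated:
            for r in rules_by_conclusion.get(param, []):  # indexed lookup
                evaluate(r, state, rules_by_conclusion, used)
    if rule["premise"](state):    # all premise clauses hold
        rule["action"](state)     # interpret / suggest / set expectations

def run_level(rules, context, state, rules_by_conclusion):
    """Apply one level's rule group; the used list is erased each pass."""
    used = set()
    for rule in rules:
        if context in rule["node"]:  # applicable in the current context?
            evaluate(rule, state, rules_by_conclusion, used)
```

Because unconcluded parameters are resolved through the index, a rule whose premise depends on another rule's conclusion need not appear after it in the rule set.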

22.3.6 Uncertainty in VM

Although the MYCIN certainty factor mechanism (Chapter 11) is incorporated into the VM structure, it has not been used. Most of the representation of uncertainty has been encoded symbolically in the contents of each rule. Rules conclude that measurement values can be spurious (under specified conditions), and the interpreter prohibits using such aberrant values for further inferences. Any value associated with a measured parameter that was concluded too long ago is considered to be unknown and, therefore, no longer useful in the reasoning mechanism. This is meant to be a first approximation to our intuition that confidence in an interpretation decays over time unless it is reinforced by new observations.
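The aging behavior just described might be sketched as follows; the validity periods and function names here are invented for illustration, not taken from VM.

```python
# Sketch of per-parameter validity periods: a concluded value lapses to
# unknown once its interval has passed. The periods below are illustrative.

VALIDITY_MINUTES = {"MAP": 10, "HEMODYNAMICS": 30}  # hypothetical periods

def current_value(history, param, now):
    """Return the most recent value of `param`, or None (unknown) if the
    last conclusion is too old to support further inferences."""
    entries = history.get(param)
    if not entries:
        return None
    time, value = entries[-1]          # latest (time, value) conclusion
    if now - time > VALIDITY_MINUTES[param]:
        return None                    # concluded too long ago
    return value
```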
Uncertainty has been implicitly incorporated in the VM knowledge base in the formulation of some rules. In order to make conclusions with a higher level of certainty, premise clauses were added to rules that correlated strongly with existing premise clauses--e.g., using both mean and systolic blood pressures. The choice of measurement ranges in several therapy rules also took into account the element of uncertainty. Although the experts wanted four or five parameters within the IDEAL limit prior to suggesting the transition to the next optimal therapy state, they often used the ACCEPTABLE limits. In fact, it would be unlikely that all measurements would simultaneously fall into the IDEAL range. Therefore, incorporating these "grey areas" into the definition of the symbolic ranges was appropriate. There are at least two possible explanations for the lack of certainty factors in the VM rule base: (1) on the wards, it is only worthwhile to make an inference if one strongly believes and can support the conclusion; and (2) the measurements available from the monitoring system were chosen because of their high correlation with patients' conditions.
As mentioned elsewhere, the PUFF and SACON systems also did not use the certainty factor mechanism. The main goal of these systems was to classify or categorize a small number of conclusions as opposed to making fine distinctions between competing hypotheses. This view of uncertainty is consistent with the intuitions of other researchers in the field (Szolovits and Pauker, 1978, p. 142):

If possible, a carefully chosen categorical reasoning mechanism which is based on some simple model of the problem domain should be used for decision making. Many such mechanisms may interact in a large diagnostic system, with each being limited to its small subdomain.... When the complex problems need to be addressed--which treatment should be selected, how much of the drug should be given, etc.--then causal or probabilistic models are necessary. The essential key to their correct use is that they must be applied in a limited problem domain where their assumptions can be accepted with confidence. Thus, it is the role of categorical methods to discover what the central problem is and to limit it as strongly as possible; only then are probabilistic techniques appropriate for its solution.

22.3.7 Representation of Expectations in VM

Representing expectations about the course of patient measurements is a major design issue in VM. In the ICU situation, most of the expectations are about the typical ranges (bounds) associated with each physiological measurement. Interpreting the relationship between measurement values and their expectations is complicated, particularly at the discontinuities caused by setting numeric boundaries. For example, on a scale of possible blood pressure values ranging from 50 to 150, how much difference can there be between measurement values of 119 and 121, in spite of some boundary at 120? However, the practice of setting specific limits and then treating values symbolically (e.g., TOO HIGH) appears to be a common educational and clinical technique. The ill effects on decision making of

Symbolic value        Interpretation
IDEAL                 The desired level or range of a measurement
ACCEPTABLE            The limits of acceptable values beyond which
                      corrective action is necessary--bounds are high
                      and low (similar for rate)
VERY UNACCEPTABLE     Limit at which data are extremely out of range--
                      e.g., on which the definition of severe hypotension
                      is based
IMPOSSIBLE            Outside the limits that are physiologically possible

FIGURE 22-13 Representing expectations using symbolic bounds.

setting specific limits are minimized by the practice of using multiple measurements in coming to specific conclusions. One alternative to using symbolic ranges would be to express values as a percentage of some predefined norm. This has the same problems as discrete numeric values, however, when the percentage is used to draw conclusions. When it was important clinically to differentiate how much an expectation was exceeded, the notion of alternate ranges (e.g., VERY HIGH) was utilized. For the physiological parameters, several types of bounds on expectations have been established, as shown in Figure 22-13. In VM these limits are not static; they are adapted to the patient situation. Currently, the majority of the expectation changes are associated with changes in ventilator support. These expectations are established on recognition of the changes in therapy and remain in effect until another therapy context is recognized. A more global type of expectation can be specified that persists for the entire time patient data are collected. A third expectation type corresponds to a perturbation, or local disturbance in the patient's situation. An example of this is the suctioning maneuver, where a vacuum device is put in the patient's airway. This disturbance has a characteristic effect on almost every measurement but only persists for a short period of time, usually 10-15 minutes. After this time, the patient's state is similar to what it was in the period just preceding the suction maneuver. It is possible to build a hierarchy out of these expectation types based on their expected duration; i.e., assume the global expectation unless a specific contextual expectation is set, provided a local perturbation has not recently taken place.
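The duration-based hierarchy of expectation types can be sketched as a simple precedence rule. All names, the bound values, and the 15-minute perturbation window below are illustrative assumptions, not VM's actual data.

```python
# Sketch of the three expectation types and their precedence: a recent
# local perturbation wins, then the current therapy context, then the
# global default that persists for the whole monitoring session.

PERTURBATION_MINUTES = 15   # e.g., suctioning effects last about 10-15 min

def effective_expectation(param, now, global_exp, context_exp, perturbations):
    """Pick the bounds used to judge `param` at time `now`."""
    for time, bounds in perturbations.get(param, []):
        if now - time <= PERTURBATION_MINUTES:
            return bounds               # local disturbance still in effect
    if param in context_exp:
        return context_exp[param]       # set when the therapy context changed
    return global_exp[param]            # session-wide default
```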
Knowledge about the patient could be used to "customize" the expectation limits for the individual patient. The first possibility is the use of historical information to establish a priori expectations based on type of surgery, age, length of time on the heart/lung machine, and presurgical pulmonary and hemodynamic status. The second type of customization could be based on the observation that patient measurements tend to run within tighter bands than the a priori expectations. The third type of expectation based on transient events can be used to adjust for the effects of temporary intervention by clinicians. This requires expert knowledge about the side effects of each intervention and about the variation between different classes of patients to these temporary changes.

22.3.8 Summary Reports

Summary reports are also provided at fixed intervals of time, established at the beginning of the program. Summaries include: (1) a description of current conclusions (e.g., PATIENT HYPERVENTILATING FOR 45 MINUTES); (2) a graph with time on one axis (up to six hours) and recent conclusions on the other; and (3) a similar graph with time versus measurements that are beyond the expected limits. (Figure 22-2 shows a portion of a sample summary report.)
The summary report is based on several lists generated by the program. The first list is composed of parameter-value pairs concluded by the program. This list is extended by the INTERP function called from the action portion of rules. The second list includes pairs of measurement types and symbolic ranges (e.g., RESPIRATION RATE--HIGH). This list is augmented during the process of comparing measurement values to expected ranges. These lists are built up from the start of the program and are not reset during new time intervals. The graphs are created by sorting the lists alphabetically, and then collecting the time intervals associated with each parameter-value pair. The conclusion and expectation graphs cover the period from six hours ago until the current time, with a double bar (=) plotted for each ten-minute period that the conclusion was made.

The number of items in each graph is controlled by the number of currently active pathophysiological conclusions, subject to a static list of parameters and values that are omitted (for example, some intermediate conclusions are not plotted). When the rule base is extended into other problem areas of ICU data interpretation, new sets of rules may have to be created to select which of the current conclusions should be graphed. The graph of "violated" expectations presents a concise display of the combination of measurements that are simultaneously out of range. Most of these conclusions have been fed into rules that determine the status of the patient. Patterns that occur often, but fail to trigger rules about the status of the patient, become candidates for the development of new rules.
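A rough sketch of how such a graph could be assembled from timed conclusions is shown below. The layout details and function names are invented; VM's actual report format appears in Figure 22-2.

```python
# Sketch of the conclusion graph: the parameter-value pairs are sorted,
# and a bar is plotted for each ten-minute period in which the
# conclusion was made, covering the last several hours.

def conclusion_graph(conclusions, now, hours=6):
    """conclusions: {(param, value): [times concluded]} -> text graph."""
    start = now - hours * 60
    slots = (now - start) // 10                    # ten-minute columns
    lines = []
    for (param, value), times in sorted(conclusions.items()):
        row = ["  "] * slots
        for t in times:
            if start <= t < now:
                row[(t - start) // 10] = "= "      # mark that period
        lines.append(f"{param}-{value:<12}{''.join(row)}")
    return "\n".join(lines)
```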

22.4 Summary and Conclusions

VM uses a simple data-directed interpreter to apply a knowledge base of rules to data about patients in an intensive care unit. These rules are arranged according to a set of levels ranging from measurement validation to therapy planning, and are currently formulated as a categorical system. Interactive facilities exist to examine the evaluation of rules while VM is monitoring data from a patient, and to input additional test results to the system for interpretation.

22.4.1 Representing Knowledge About Dynamic Clinical Settings

In VM we have begun to experiment with mechanisms for providing MYCIN-like systems with the ability to represent the dynamic nature of the diagnosis and therapy process. As mentioned in the introduction, MYCIN was designed to produce therapeutic decisions for one critical moment in a patient's hospital course. This was extended with a "restart mechanism" that allows for selectively updating those parameters that might change in the interval between consultations. MYCIN can start a new consultation with the updated information, but the results of the original consultation are lost. In VM, three requirements are necessary to support the processing of new time frames: (1) examining the values of historical data and conclusions, (2) determining the validity of those data, and (3) combining conclusions with previous conclusions.
New premise functions, which define the relationships about parameters that can be tested when a rule is checked for validity, were created to examine the historical data. Premise functions used in MYCIN include tests to see if: (a) any value has been determined for a parameter, (b) the value associated with a parameter is in a particular numerical range, or (c) there is a particular value associated with a parameter. VM includes a series of time-related premise functions. One function examines trends in input data over time--e.g., THE MEAN ARTERIAL PRESSURE DOES NOT RISE BY 15 TORR IN 15 MINUTES. A second function determines the stability of a series of measurements, by examining the variation of measurements over a specific time period. Other functions examine previously deduced conclusions, as in THE PATIENT HAS BEEN ON THE T-PIECE FOR GREATER THAN 30 MINUTES or THE PATIENT HAS NEVER BEEN ON THE T-PIECE. Functions also exist for determining changes in the state of the patient--e.g., THE PATIENT HAS TRANSITIONED FROM ASSIST MODE TO THE T-PIECE. When VM is required to check if a parameter has a particular value, it must also check to see if the value is "recent" enough to be useful.
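A trend test of this kind might look as follows; the sampling representation and the exact sense of "rise within a window" are assumptions made for illustration, not VM's code.

```python
# Sketch of a windowed trend test, in the spirit of
# "MEAN ARTERIAL PRESSURE DOES NOT RISE BY 15 TORR IN 15 MINUTES".

def rises_by(samples, amount, minutes, now):
    """samples: list of (time, value). True if the latest value in the
    window rose by at least `amount` over the window's minimum;
    None (unknown) when too few recent samples exist."""
    window = [v for t, v in samples if now - t <= minutes]
    if len(window) < 2:
        return None
    return window[-1] - min(window) >= amount

def does_not_rise_by(samples, amount, minutes, now):
    """Negated form, preserving the unknown (None) case."""
    result = rises_by(samples, amount, minutes, now)
    return None if result is None else not result
```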
The notion that data are reliable for only a given period of time is also used in the representation of conclusions made by the program. When the same conclusion is made in contiguous time periods (two successive evaluations of the rule set), then the conclusions are coalesced. The result is a series of intervals that specify when a parameter assumed a particular value. In the MYCIN system this information is stored as several different parameters. For example, the period when a drug was given is represented by a pair of parameters corresponding to the starting and ending times of administration. In MYCIN, if a drug was again started and stopped, a new entity--DRUG-2--would have to be created. The effect of the VM representation is to aggregate individual conclusions into states whose persistence denotes a meaningful interpretation of the status of the patient.
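The coalescing step can be sketched as follows, using evaluation-cycle numbers as a stand-in for contiguous time periods; the representation is an illustrative simplification.

```python
# Sketch of coalescing: the same conclusion made in contiguous
# evaluations of the rule set is merged into a single interval.

def coalesce(conclusions):
    """conclusions: sorted list of (cycle, param, value) ->
    list of (param, value, first_cycle, last_cycle) intervals."""
    intervals = []
    for cycle, param, value in conclusions:
        if (intervals and intervals[-1][0] == param and
                intervals[-1][1] == value and intervals[-1][3] == cycle - 1):
            p, v, start, _ = intervals[-1]
            intervals[-1] = (p, v, start, cycle)   # extend the interval
        else:
            intervals.append((param, value, cycle, cycle))
    return intervals
```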

22.4.2 Building a Symbolic Model

A sequence of states recognized by the program represents a segmentation of a time line. Specifying the possible sequences of states in a dynamic setting constitutes a symbolic model of that setting. The VM knowledge base contains a model of the stages involved in ventilatory therapies. This model is used in three ways by the program: (1) to limit the number of rules examined by the program, (2) to provide a basis for comparing actual therapy with potential therapies, and (3) to provide the basis for the adjustment of expectations used to interpret the incoming data.

Attached to each rule in VM is a list of the clinical situations in which the rule makes sense. When rules are selected for evaluation, this list is examined to determine if the rule is applicable. This provides a convenient filter to increase the speed of the program. A set of rules is used to specify the conditions for suggesting alternative therapeutic contexts. Since these rules are examined every few minutes, they serve both to suggest when the patient's condition has changed sufficiently for an adjustment in ventilatory therapy and to provide commentary concerning clinical maneuvers that have been performed but are not consistent with the embedded knowledge for making therapeutic decisions. The model also provides mechanisms for defining expectations about reasonable values for the measured data. Much of the knowledge in VM is stated in terms of these expectations, and they can be varied in response to changes in the patient's situation.

22.4.3 Comparison of MYCIN and VM Design Goals

MYCIN was designed to serve on a hospital ward as an expert consultant for antimicrobial therapy selection. A typical interaction might take place after the patient has been diagnosed and preliminary cultures have been drawn but before very much microbiological data are available. In critical situations, a tentative decision about therapy must often be made on partial information about cultures. In return for assistance, the clinician is asked to provide answers to questions during a consultation.

The intensive care unit is quite different from the static situation addressed by MYCIN, however. Continuous monitoring and evaluation of the patient's status are required. The problem is one of making therapeutic adjustments, many of which are minor, such as adjusting the respiratory rate on the ventilator, over a long period of time. The main reasons for using VM are to monitor status or to investigate an unusual event. The program must therefore be able to interpret measurements with minimal human participation. When an interaction does take place, e.g., when an unexpected event is noted, the program must be concise in its warning. VM's environment differs from MYCIN's in that natural language is an unlikely mode of communication.

This difference in the timing and style of the user-machine interaction has considerable impact on system design. For example, the VM system must be able to:

1. reach effective decisions on the presumption that input from a clinician will be brief,
2. use historical data to determine a clinical situation,
3. provide advice at any point during the patient's hospital stay,
4. follow up on the outcomes of previous therapeutic decisions, and
5. summarize conclusions made over time.

A consultation program should also be able to model the changing medical environment so that the program can interpret the available data in context. Areas such as that of infectious disease require an assessment of clinical problems in a variety of changing clinical situations, e.g., "patients who are severely ill but lack culture results," "patients after culture data are available," "patients after partial or complete therapy," or "patients with acquired superinfection."

It is also necessary that VM contain knowledge that can be used to follow a case over a period of time. This is complicated by the fact that the user of the system may not follow the therapy recommended. VM then has to determine what actions were taken and adjust its knowledge of the patient accordingly. Also, if the patient does not react as expected to the given therapy, then the program has to determine what alternative therapeutic steps may be required.
During the implementation of the VM program, we observed many types of clinical behavior that represent a challenge to symbolic modeling. One such behavior is the reluctance of clinicians to change therapies frequently. After a patient meets the criteria for switching from therapy A to therapy B, e.g., assist mode to T-piece, clinicians tend to allow the patient's status to drop below optimal criteria before returning to therapy A. This was represented in the knowledge base by pairs of therapy selection rules (A to B, B to A) with a grey zone between the two criteria. For example, ACCEPTABLE limits might be used to suggest going from therapy A to therapy B, whereas VERY HIGH or VERY LOW limits would be used for going from B to A. If the same limit were used for going in each direction, a small fluctuation of one measurement near a cutoff value would provide very erratic therapy suggestions. A more robust approach makes decisions in such situations based on the length of time a patient has been in a given state and on the patient's previous therapy or therapies.
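The paired-rule grey zone described above amounts to hysteresis, and can be sketched on a single measurement; the limits and the use of mean arterial pressure here are illustrative, not VM's actual criteria.

```python
# Sketch of the paired-rule "grey zone": the A -> B criterion uses the
# ACCEPTABLE limits, while B -> A fires only at the VERY limits, so a
# small fluctuation near one cutoff cannot flip the suggestion back
# and forth.

ACCEPTABLE = (60, 110)    # illustrative MAP limits (torr)
VERY_LIMITS = (45, 130)   # illustrative VERY LOW / VERY HIGH limits

def suggest_transition(current_therapy, mean_arterial_pressure):
    lo, hi = ACCEPTABLE
    vlo, vhi = VERY_LIMITS
    if current_therapy == "ASSIST" and lo <= mean_arterial_pressure <= hi:
        return "T-PIECE"              # A -> B at ACCEPTABLE limits
    if current_therapy == "T-PIECE" and not (vlo <= mean_arterial_pressure <= vhi):
        return "ASSIST"               # B -> A only at VERY limits
    return current_therapy            # grey-zone values: no change
```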
The VM program has been used as a test-bed to investigate methods for increasing the capabilities of symbolic processing approaches by extending the production rule methodology. The main area of investigation has been in the representation of knowledge about dynamic clinical settings. There are two components of representing a situation that changes over time: (1) providing the mechanism for accessing and evaluating data in each new time frame, and (2) building a symbolic model to represent the ongoing processes and transitions in the medical environment.
23
A Representation Scheme Using Both Frames and Rules

Janice S. Aikins

Much of artificial intelligence research has focused on determining the appropriate knowledge representations to use in order to achieve high performance from knowledge-based systems. The principal hypothesis being explored in this chapter is that there are many advantages to a system that uses both framelike structures and rules to solve problems in knowledge-intensive domains. These advantages can be grouped into two broad categories: those dealing with the knowledge base representation itself, and those dealing with the system's reasoning and performance. In order to test this hypothesis, a knowledge representation was designed that uses a combination of frames and rules in a data structure called a prototype. The domain chosen was that of pulmonary physiology. The task was to interpret a set of pulmonary function test results, producing a set of interpretation statements and a diagnosis of pulmonary disease in the patient.1 Initially, a MYCIN-like production rule system called PUFF (Kunz et al., 1978) was written to perform pulmonary function test interpretations. Problems with the production rule formalism in PUFF and similar rule-based systems motivated the creation of a prototype-directed system, called CENTAUR. See Aikins (1980; 1983) for more detailed discussions of this system.

CENTAUR uses prototypes that characterize the typical features of each pulmonary disease. Each feature is called a component of the pro-

This chapter is based on a technical memo (HPP-79-10) from the Heuristic Programming Project, Department of Computer Science, Stanford University. Used with permission.
1It should be noted, however, that the methodology used is not domain-specific; the task that was chosen is not important for the comparisons made between various knowledge representation schemes.


PUFF
    NORMAL
    RESTRICTIVE LUNG DISEASE
    OBSTRUCTIVE AIRWAYS DISEASE (OAD)
        MILD OAD
        MODERATE OAD
        MODERATELY SEVERE OAD
        SEVERE OAD
        ASTHMA
        BRONCHITIS
        EMPHYSEMA
    DIFFUSION DEFECT
    NEUROMUSCULAR DISEASE

FIGURE 23-1 A portion of the prototype network.

totype. Associated with each component are rules used to deduce a value for the component. The prototypes focus the search for new information by guiding the invocation of the rules and eliciting the most relevant information from the user. These prototypes are linked together in a network in which the links specify the relationships between the prototypes. For example, the obstructive airways disease prototype is linked to the asthma prototype with a SUBTYPE link, because asthma is a subtype of obstructive airways disease (see Figure 23-1).

This chapter discusses the problems of a purely rule-based system and the advantages afforded by using a combination of rules and frames in the prototype-directed system. A complementary piece of research (Aikins, 1979), not discussed here, deals with the problems of a frame-based system. Previous research efforts have discussed systems using frames [see, for example, Minsky (1975) and Pauker and Szolovits (1977)] and systems using a pure rule-based approach to representation (Chapter 2). Still other systems have used alternate knowledge representations to perform large knowledge-based problem-solving tasks. For example, INTERNIST (Pople, 1977) represents its knowledge using a framelike association of diseases with manifestations. Each manifestation, in turn, is associated with the list of diseases in which the manifestation is known to occur. In PROSPECTOR (Duda et al., 1978a), the framelike data structures have been replaced by a semantic network. Few researchers, however, have used both frames and production rules or have attempted to draw comparisons between these knowledge representation methodologies. CENTAUR offers an appropriate mechanism with which to experiment with these representation issues.

This paper presents an example of the CENTAUR system performing an interpretation of a set of pulmonary function test results and focuses on CENTAUR's knowledge representation and control structure. In addition, some advantages of the prototype-directed system over the rule-based approach for this problem are suggested.

23.1 The CENTAUR System

CENTAUR is a consultation system that produces an interpretation of data and a diagnosis based on a set of test results. The inputs to the system are the pulmonary function test results and a set of patient data including the patient's name, sex, age, and a referral diagnosis. The output consists of both a set of interpretation statements that serve to explain or comment on the pulmonary function test results and a final diagnosis of pulmonary disease in the patient.

CENTAUR uses a hypothesis-directed approach to problem solving where the hypotheses are represented by the prototypes. The goal of the system is to confirm that one or more of the prototypes in the prototype network match the data in an actual case. The final set of confirmed prototypes is the system's solution for classifying the data in that case. The prototypes represent the various pulmonary diseases, their severity, and their subtypes, with the result that the set of confirmed prototypes represents the diagnosis of pulmonary disease in the patient.
The system begins by accepting the test and patient data. Data entered in the system suggest or "trigger" one or more of the prototypes. The triggered prototypes are placed on a hypothesis list and are ordered according to how closely they match the data. The prototype that matches the data most closely is selected to be the current prototype, the system's current best hypothesis about how to classify the data in the case.
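The triggering and ordering step can be sketched as follows. The trigger table and the rule for combining multiple certainty measures (taking the maximum) are invented for illustration; CENTAUR's actual combination scheme is not described in this passage.

```python
# Sketch of CENTAUR-style triggering: data values trigger prototypes with
# a certainty measure (CM), and the hypothesis list is ordered by CM.
# The trigger table below is hypothetical.

TRIGGERS = {
    ("RDX", "ASTHMA"):  [("ASTHMA", 900)],   # referral diagnosis trigger
    ("MMF", "LOW"):     [("OAD", 900)],
    ("DLCO", "NORMAL"): [("NORMAL", 700)],
}

def hypothesis_list(case_data):
    """case_data: list of (parameter, value) -> prototypes ordered by CM."""
    cms = {}
    for datum in case_data:
        for prototype, cm in TRIGGERS.get(datum, []):
            cms[prototype] = max(cms.get(prototype, -1000), cm)
    return sorted(cms.items(), key=lambda pc: -pc[1])

# The first entry would become the current prototype, the best hypothesis.
```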
In the example in Figure 23-2, the prototype that represents a pulmonary function consultation (PUFF) has been selected as the initial current prototype.2 Initial data are requested and the user's responses (in boldface and following the asterisks) are recorded. The system attempts to fill in values for the components of a prototype, which may cause rules to be invoked, or, if no rules are associated with a component, the system will ask the user for the value. When all of the prototype components have values, the system decides whether the given data values are sufficiently close to those expected for the prototype to confirm that the prototype matches the data.3 Another prototype is then selected as the current pro-

2Just as the pulmonary disease prototypes represent typical ranges of values for the pulmonary function tests for patients with that disease, the pulmonary function prototype states some of the typical features of a pulmonary function consultation. For example, for any pulmonary function consultation, an initial set of test and patient data is required, and both a final interpretation and pulmonary diagnosis are generated. Similarly, the prototype network of the CENTAUR system includes a prototype called MYCIN, which states typical features of a MYCIN infectious disease consultation. Above both of these prototypes is a third prototype, CONSULTATION, which states some domain-independent features of any consultation. For example, the CONSULTATION prototype contains a component called STRATEGY, which allows the user to specify whether a confirmation strategy (to confirm the most likely hypothesis) or an elimination strategy (to disprove the least likely hypothesis) is desired.
3The system maintains a confirmed list of prototypes that have been shown to match the data in the case and a disproved list of prototypes that have been proved not to match the data.

totype, and the process repeats. The system moves through the prototype network confirming or disproving disease prototypes. The attempt to match data and prototypes continues until each datum has been explained by some confirmed prototype or until the system has concluded that it cannot account for any more of the data. A portion of the prototype network for the pulmonary function application is given in Figure 23-1. Details of the knowledge representation and control structure for the CENTAUR system are given in Section 23.2 and Section 23.3.

Figure 23-2 is an example of an interpretation of a set of pulmonary function test results for one patient. Comments are in italics. Many additional lines of trace are printed to show what CENTAUR is doing between questions.

CENTAUR
14-Jan-79 13:54:07

CURRENT PROTOTYPE: PUFF

The current hypothesis is that an interpretation of the pulmonary function tests is desired.

[Control slot of PUFF prototype being executed ...]

........ PATIENT-7446 ........
(The initial data given by the user.)
1) Patient's identifying number:
** 9007
2) referral diagnosis:
** ASTHMA
[Trigger for ASTHMA and CM 900]
(Prototype ASTHMA is triggered by the value ASTHMA for the referral diagnosis. The certainty measure (CM) indicates on a numerical scale the degree of certainty with which the prototype is indicated by the data.)

3) RV/RV-predicted:
** 261
4) TLC (body box) observed/predicted:
** 139
5) FVC/FVC-predicted:
** 81
[Trigger for NORMAL and CM 500]
(The questioning continues and other prototypes are triggered by the data values.)

FIGURE 23-2 An example of the interpretation of a set of patient test results. Debugging information (not routinely provided) is shown to help demonstrate the process of prototype invocation. (Key: ALS = amyotrophic lateral sclerosis; FEV1 = forced expiratory volume at one minute; FVC = forced vital capacity; MMF = maximal midexpiratory flow; OAD = obstructive airways disease; RDX = referral diagnosis; RLD = restrictive lung disease; RV = residual volume; TLC = total lung capacity.)

6) FEV1/FVC ratio:
** 40
7) the DLCO/DLCO-predicted:
** 117
[Trigger for NORMAL and CM 700]
8) Change in FEV1 post-dilation - pre-dilation:
** 31
9) MMF/MMF-predicted:
** 12
[Trigger for OAD and CM 900]
10) The slope (F50-obs - F25-obs)/FVC-obs:
** 9
[Trigger for OAD and CM 900]

TRIGGERED PROTOTYPES
PROTOTYPE: ASTHMA, CM: 900, REASON: RDX was ASTHMA
PROTOTYPE: NORMAL, CM: 500, REASON: FVC was 81
PROTOTYPE: NORMAL, CM: 700, REASON: DLCO was 117
PROTOTYPE: OAD, CM: 900, REASON: MMF was 12
PROTOTYPE: OAD, CM: 900, REASON: F5025 was 9
(A list of the prototypes that have been triggered is given. The CM and the value that caused the trigger are also listed.)
-----------------------------------

Confirmed List: PUFF

It is confirmed that an interpretation of the pulmonary function tests is desired.

(The primary purpose of the pulmonary function prototype is to acquire the initial data from the user. It has no components and is thus confirmed when it has completed its task.)

More Specific Prototypes chosen: NORMAL OAD

(Although there are five possible, more specific disease prototypes for PUFF, only the two that were triggered by the initial data are selected as possibilities to pursue.)

[New prototypes being filled in ... NORMAL OAD]

(These prototypes are filled in with the data values that are already known in the case.)

! Surprise Value ! 261 for RV in NORMAL, CM: 700
! Surprise Value ! 139 for TLC in NORMAL, CM: 400
! Surprise Value ! 40 for FEV1/FVC in NORMAL, CM: -166
! Surprise Value ! 12 for MMF in NORMAL, CM: -499
! Surprise Value ! 9 for F5025 in NORMAL, CM: -699

(Any data values that are not consistent with the values expected for that disease prototype are noted as surprise values, and the CM for that prototype is lowered. In this case, five of the data values are not consistent with the NORMAL pulmonary function prototype.)

Hypothesis List: (OAD 990) (NORMAL -699)

(The hypothesis list of triggered prototypes is then ordered according to the CM of the prototypes and a new current prototype is chosen.)

CURRENT PROTOTYPE: OAD

The current hypothesis is that there is an interpretation of Obstructive Airways Disease.

Components of OAD chosen to trace: F25D RV/TLC

(In order to instantiate the OAD prototype, two more components must have values. These are asked of the user if there are no rules associated with the components that can be used to deduce their values.)

11) The flow F25:
** UNKNOWN
12) RV/TLC Observed-Predicted:
** 25

Confirmed List: OAD PUFF

It is confirmed that there is an interpretation of Obstructive Airways Disease.
(The OAD prototype is confirmed. Control information associated with the prototype specifies that the degree of OAD should be determined next, followed by the subtype of OAD.)

More Specific Prototypes chosen: MILD-OAD MODERATE-OAD MODERATELY-SEVERE-OAD SEVERE-OAD
(No degree prototypes were triggered by the data values, so all of them are selected as possible hypotheses to be filled in along with the data values in the case.)

[New prototypes being filled in ... MILD-OAD MODERATE-OAD MODERATELY-SEVERE-OAD SEVERE-OAD]

(More surprise values are noted, and the hypothesis list is ordered, leading to a current prototype of SEVERE-OAD.)

CURRENT PROTOTYPE: SEVERE-OAD

The current hypothesis is that there is an interpretation of Severe Obstructive Airways Disease.

Components of SEVERE-OAD chosen to trace: FEV1

13) FEV1
** 42
(The consultation continues with new components being asked and classified.)

Confirmed List: SEVERE-OAD OAD PUFF

It is confirmed that there is an interpretation of Severe Obstructive Airways Disease.

More Specific Prototypes chosen: ASTHMA

Hypothesis List: (ASTHMA 900)

(Next the subtypes of OAD are explored. ASTHMA is chosen from three possible subtypes because it was triggered by the initial data.)

CURRENT PROTOTYPE: ASTHMA

The current hypothesis is that there is an interpretation of Asthma.

Components of ASTHMA chosen to trace: DEG-REV

14) The change in resistance pre-dilation - post-dilation:
** 20

Confirmed List: ASTHMA SEVERE-OAD OAD PUFF

It is confirmed that there is an interpretation of Asthma.

[Facts marked Accounted For by ASTHMA, SEVERE-OAD, OAD]
(There are no further degrees or subtypes to be explored for this case. Thus data values that can be accounted for by one of the confirmed prototypes are marked. If there are data values remaining that cannot be accounted for by the confirmed prototypes, the system will attempt to determine if there are multiple diseases in the patient.)

[Action slot of OAD prototype being executed ...]
(At this point the system begins executing any actions associated with the confirmed prototypes. There are no actions for ASTHMA or SEVERE-OAD, so the OAD action is the first to be executed.)

OAD Action Clause
Display the findings about the diagnosis of obstructive airways disease
Conclusions: the findings about the diagnosis of obstructive airways disease are as follows:
Elevated lung volumes indicate overinflation.
The RV/TLC ratio is increased, suggesting a SEVERE degree of air trapping.
Forced Vital Capacity is normal but the FEV1/FVC ratio is reduced, suggesting airway obstruction of a SEVERE degree.
Low mid-expiratory flow is consistent with severe airway obstruction.
Obstruction is indicated by curvature of the flow-volume loop which is of SEVERE degree.
Reversibility of airway obstruction is confirmed by improvement in airway resistance following bronchodilation.

[Action slot of PUFF prototype being executed ...]

PUFF Action Clause
Display the conclusion statements about this interpretation
Conclusions: the conclusion statements about this interpretation are as follows:
Smoking probably exacerbates the severity of the patient's airway obstruction.
Discontinuation of smoking should help relieve the symptoms.
Good response to bronchodilators is consistent with an asthmatic condition, and their continued use is indicated.

PUFF Action Clause
Display the summary statements about this interpretation
Conclusions: the summary statements about this interpretation are as follows:

SEVERE Obstructive Airways Disease, Subtype ASTHMA

Do you wish advice on another patient?
** NO

FIGURE 23-2 continued



23.2 Knowledge Representation in CENTAUR

Knowledge is represented in CENTAUR by both rules and prototypes. Each prototype contains two kinds of information: domain-specific components that express the substantive characteristics of each prototype, and domain-independent slots that specify information used in running the system. Each component may, in turn, have slots of information associated with it, including a RULES slot that links the component to rules that determine values of the component. Thus the outline of a prototype can be viewed as shown in Figure 23-3.

PROTOTYPE
    SLOT
    SLOT          (domain-independent information)
    SLOT
    COMPONENT
        SLOT
        SLOT      (domain-specific information)
    COMPONENT
        SLOT
        SLOT

FIGURE 23-3 Prototype outline.

The rules consist of one or more premise clauses followed by one or more action clauses. An example is given in Figure 23-4.⁴ In general, the premise clauses specify a set of value ranges for some of a prototype's components, and the action clauses make conclusions about the values of other components. Besides these static data structures, there are also data structures that give information about the actual data values obtained during the consultation. These are called facts and are discussed in Section 23.2.3.

23.2.1 Prototypes and Components

Most of CENTAUR's prototypes represent the characteristic features of some pulmonary disease. For example, there is a prototype for obstructive airways disease (OAD), a portion of which is shown in Figure 23-5. In the

⁴As in MYCIN, the rule is stored internally in the Interlisp form shown; the English translation is generated from that.

RULE013

PREMISE: ($AND ($OR ($AND (LESSP* (VAL1 CNTXT MMF) 20)
                          (GREATERP* (VAL1 CNTXT FVC) 80))
                    ($AND (LESSP* (VAL1 CNTXT MMF) 15)
                          (LESSP* (VAL1 CNTXT FVC) 80)]
ACTION: (DO-ALL (CONCLUDE CNTXT DEG<-MMF SEVERE TALLY 900)
                (CONCLUDETEXT CNTXT FINDINGS<-OAD (TEXT $MMF) TALLY 1000))

RULE013
[This rule applies to any patient, and is tried in order to find out about the degree of obstructive airways disease as indicated by the MMF or the findings about the diagnosis of obstructive airways disease.]

If: 1) A: The MMF/MMF-predicted ratio is less than 20, and
       B: The FVC/FVC-predicted ratio is greater than 80, or
    2) A: The MMF/MMF-predicted ratio is less than 15, and
       B: The FVC/FVC-predicted ratio is less than 80
Then: 1) There is strongly suggestive evidence (.9) that the degree of obstructive airways disease as indicated by the MMF is severe, and
      2) It is definite (1.0) that the following is one of the findings about the diagnosis of obstructive airways disease: Low midexpiratory flow is consistent with severe airway obstruction.

FIGURE 23-4 A sample rule in CENTAUR in both Interlisp and English versions.

OAD prototype, there are components for many of the pulmonary function tests that are useful in characterizing a patient with OAD; two of these are shown in the figure. For example, the total lung capacity of a patient with OAD is typically higher than that of a person with normal pulmonary function. Thus there is a component, TOTAL LUNG CAPACITY, with a range of plausible values that are characteristic of a person with OAD.
In addition to a set of plausible values, that is, values consistent with the hypothesis represented by the prototype, the components may have additional information associated with them. (The ways in which this information is used are discussed in Section 23.3.) There may be one or more possible error values, that is, values that are inconsistent with the prototype or that might have been specified by the expert to check what he or she considers to be a measurement error. Generally, both a reason for the error and a possible fix for the error are specified. For example, the expert may specify that one of the pulmonary function tests be repeated to ensure accuracy. A component may also have a default value. Thus all of the components in a disease prototype, with their default values, form a picture of the typical patient with the disease. Finally, each component has an importance measure (from 0 to 5) that indicates the relative importance of a particular component in characterizing the disease.
In addition to the domain-specific components, each prototype contains slots for general information associated with it. This includes bookkeeping information (name of the prototype, its author, date on which the prototype was created, and source for the information contained there) and English phrases used in communicating with the user. There are also pointers to other prototypes in the prototype network, which are useful, for example, when either more general disease categories or more specific subtypes of disease are indicated. Some control information is represented explicitly in slots associated with the prototype (Section 23.3). This information includes what to do in order to confirm the prototype and what to do when the prototype has been confirmed or disproved. Each prototype also has associated with it a certainty measure (from -1000 to 1000) that indicates how certain the system is that the prototype matches the data in each case.

PROTOTYPE: Obstructive Airways Disease (OAD)

GENERAL INFORMATION
  --Bookkeeping information          Author: Aikins
                                     Date: 27-OCT-78
                                     Source: Dr. Fallat
  --Pointers to other prototypes     Pointers: (degree MILD-OAD)
    (link prototype)                           (degree MODERATE-OAD) ...
                                               (subtype ASTHMA) ...
  --English phrases                  Hypothesis: "There is an interpretation of OAD."

COMPONENTS                           TOTAL LUNG CAPACITY
  Plausible Values                     Plausible Values: >100
  Default Value                        Importance: 4
  Possible Error Values
  Rules                              REVERSIBILITY
  Importance of value                  Rules: 19, 21, 22, 25
    to this prototype                  Importance: 0 (value not considered)

CONTROL INFORMATION                  Deduce the degree of OAD
                                     Deduce the subtype of OAD
                                     Deduce any findings associated with OAD

ACTION INFORMATION                   Print the findings associated with OAD

FIGURE 23-5 A sample prototype showing possible slots on the left and values of those slots for OAD on the right.
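The prototype-and-slot organization described in this section can be sketched as a small data model. This is an illustrative reconstruction in modern Python, not CENTAUR's actual Interlisp structures; all class and field names are ours, and only the example values are drawn from Figure 23-5.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Component:
    """A domain-specific feature of a prototype (e.g., TOTAL LUNG CAPACITY)."""
    name: str
    plausible: Callable[[float], bool]                 # is a value consistent with this prototype?
    error_values: list = field(default_factory=list)   # values suggesting measurement error
    default: Optional[float] = None                    # part of the "typical patient" picture
    rules: list = field(default_factory=list)          # rules that conclude this component's value
    importance: int = 0                                # 0..5; 0 means the value is not considered

@dataclass
class Prototype:
    """A disease hypothesis: domain-specific components plus control slots."""
    name: str
    author: str = ""
    source: str = ""
    pointers: list = field(default_factory=list)       # links to degree/subtype prototypes
    components: list = field(default_factory=list)
    control: list = field(default_factory=list)        # how to instantiate the prototype
    if_confirmed: list = field(default_factory=list)
    if_disproved: list = field(default_factory=list)
    action: list = field(default_factory=list)         # clean-up clauses (print findings, ...)
    cm: int = 0                                        # certainty measure, -1000..1000

# The OAD fragment of Figure 23-5: TLC plausible above 100% of predicted, importance 4.
oad = Prototype(name="OAD", author="Aikins", source="Dr. Fallat",
                pointers=[("degree", "MILD-OAD"), ("subtype", "ASTHMA")])
oad.components.append(Component(name="TLC", plausible=lambda v: v > 100, importance=4))
```

Representing plausible ranges as predicates is one way to capture entries such as ">100"; any encoding that supports the plausible/error/surprise classification of Section 23.2.3 would serve.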

23.2.2 Rules

The CENTAUR knowledge base also includes rules, which are grouped into four sets according to their functions. They refer to values for components in their premise clauses and make conclusions about values of components in their action clauses. An example of one of the rules is given in Figure 23-4. The RULES slot associated with a component contains a list of all rules that make a conclusion about that component. These may be applied when a value is needed for the component.⁵
Many of the rules are classified as patient rules, rules dealing with the patient. Besides the patient rules, there are three other sets of rules. Those rules whose actions make summary statements about the results of the pulmonary function tests are classified as summary rules; rules that refer to values of components in their premises and suggest general disease categories in their actions are classified as triggering rules. These are used to "trigger" or suggest the disease prototypes. Those rules that are used in a second stage of processing, after the system has formulated lists of confirmed and disproved prototypes, are called refinement rules; they are used to refine a preliminary diagnosis, producing a final diagnosis about pulmonary disease in the patient. The refinement rules constitute a further set of domain expertise; they test the system's tentative conclusions, which may result in a modification of these conclusions. For example, if two diseases can account for a given pulmonary function test result and both have been confirmed in that case, a refinement rule may determine which disease process should account for the test result in the final interpretation.

23.2.3 Facts

In CENTAUR, each piece of case-specific data that has been acquired, either initially from the patient's pulmonary function test results or later during the interpretation process, is called a fact. Each fact has six fields of information associated with it. When a fact is first introduced into the system, its name, value, and certainty factor⁶ fields are instantiated. For example, if the user specifies that the total lung capacity of the patient is 126 with a certainty factor of 0.8, then a fact is created:

NAME: Total Lung Capacity
VALUE: 126
CERTAINTY FACTOR: .8

The fourth field associated with the fact indicates where it was obtained: from the user (this includes the initial pulmonary function test results), from the rules, or as a default value associated with a prototype component. Thus, in the fact about total lung capacity, the fourth field would have the value USER.
The fifth field of each fact becomes instantiated once fact values are classified as being plausible values, possible error values, or surprise values

⁵If no rules are associated with the component, the user will be asked for the value. If the user responds UNKNOWN and the component has a default value, that value will be used.
⁶The certainty factor is just MYCIN's CF, a number ranging from -1 to 1 that indicates the importance of the given value.

for a given prototype. Surprise values are all of those values that are neither plausible values nor possible error values. They indicate facts that cannot be accounted for by the hypothesis represented by the prototype. In the fact about total lung capacity, the fifth field might contain the classification (PV OAD) and (SV NORMAL), meaning that the value of 126 for the total lung capacity of a patient would be a plausible value if the patient had obstructive airways disease, but would be a surprise value if the patient were considered to have normal pulmonary function.
The last field associated with a fact indicates which confirmed prototypes can account for the given value. When a prototype is confirmed, all of the facts that correspond to components in the prototype and whose values are plausible values for the component are said to be "accounted for" by that prototype. When the OAD prototype is confirmed for a patient with total lung capacity of 126, for example, the last field of the sample fact for total lung capacity would be filled in with the prototype name OAD.
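The six fields of a fact and the classification step just described can be sketched as follows. Field names follow the text; the code itself, including the NORMAL range used for illustration, is our reconstruction rather than CENTAUR's.

```python
def make_fact(name, value, cf, source="USER"):
    """Create a fact with its six fields; the last two are filled in later."""
    return {"name": name, "value": value, "cf": cf, "source": source,
            "classification": [], "accounted_for": []}

def classify(fact, prototype_name, plausible, error_values=()):
    """Mark the value as plausible (PV), possible error (EV), or surprise (SV)
    with respect to one prototype, recording the tag in the fifth field."""
    if plausible(fact["value"]):
        tag = "PV"
    elif fact["value"] in error_values:
        tag = "EV"
    else:
        tag = "SV"  # neither plausible nor a known error value
    fact["classification"].append((tag, prototype_name))
    return tag

# The text's example: a TLC of 126 is plausible for OAD but a surprise for NORMAL.
# (The NORMAL range of 80..120 here is invented for illustration.)
tlc = make_fact("Total Lung Capacity", 126, 0.8)
classify(tlc, "OAD", plausible=lambda v: v > 100)
classify(tlc, "NORMAL", plausible=lambda v: 80 <= v <= 120)
```

After both classifications the fifth field holds (PV OAD) and (SV NORMAL), matching the example in the text; a confirming prototype would later append its name to the `accounted_for` field.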

23.3 Control Structure for CENTAUR

The control information used by CENTAUR is contained either in slots that are associated with the individual prototypes or in a simple interpreter. Some control strategies are specific to an individual prototype and need to be associated with it, while more general system control information is more efficiently expressed in the interpreter.
Basically, the interpreter attempts to match one or more of the prototypes with the data in an actual case. At any one time there is one current prototype that the system is attempting to match to the facts of the case. Attempting a match for this prototype entails finding values for the prototype components, i.e., instantiating the prototype. The exact method to be used in instantiating the prototype depends on the individual prototype and is expressed in one of the prototype control slots.
When all of the facts have been accounted for by some confirmed prototype, or when no prototype can account for a known fact,⁷ the system has completed the hypothesis-formation stage. The confirmed list of prototypes then represents the system's hypothesis about how to classify the facts. At this point, additional knowledge may be applied before generating the final pulmonary function interpretation and diagnosis. Some of this knowledge is represented in the refinement rules associated with the confirmed prototypes. Further information may be sought from the user at this stage. For example, further lab tests may be suggested or additional test results may be required before a final diagnosis is given.
The result of executing the refinement rules is a final set of confirmed prototypes and a list of all facts with an indication of which prototypes account for which facts. The system then executes the clauses specified in the action slot of each confirmed prototype. Typically, these clauses express a clean-up chore such as executing summary rules associated with the prototype⁸ or printing interpretation statements. The action slot of the PUFF prototype itself causes the final interpretation and pulmonary diagnosis to be printed.

⁷This statement oversimplifies the actual matching criteria used by the system. Some tolerance for a mismatch between known fact values and plausible values in the prototype is allowed.
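The hypothesis-formation loop just described can be caricatured as below. This skeleton is our reading of the control flow (the current prototype is the best-CM candidate, it is instantiated, and facts with plausible values are marked as accounted for on confirmation); it omits the mismatch tolerance mentioned in the footnote, and every identifier is ours.

```python
def interpret(hypotheses, facts, instantiate, confirmed_test):
    """hypotheses: list of (prototype, cm) pairs; returns confirmed prototypes."""
    confirmed = []
    while hypotheses and any(not f["accounted_for"] for f in facts):
        hypotheses.sort(key=lambda h: h[1], reverse=True)  # best certainty first
        current, cm = hypotheses.pop(0)                    # the current prototype
        instantiate(current, facts)                        # run its CONTROL slot
        if confirmed_test(current, facts):
            confirmed.append(current)
            for f in facts:                                # mark plausible facts
                if ("PV", current["name"]) in f["classification"]:
                    f["accounted_for"].append(current["name"])
    return confirmed

# Miniature run shaped like the Figure 23-2 case: OAD outranks NORMAL on the
# hypothesis list and accounts for the one fact, so the loop stops after it.
facts = [{"name": "TLC", "classification": [("PV", "OAD")], "accounted_for": []}]
hyps = [({"name": "NORMAL"}, -699), ({"name": "OAD"}, 990)]
confirmed = interpret(hyps, facts, instantiate=lambda p, f: None,
                      confirmed_test=lambda p, f: p["name"] == "OAD")
```

The refinement-rule pass and the action slots would run after this loop returns, operating on `confirmed` and the accounted-for markings.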

23.3.1 Prototype Control Slots

Four of the slots associated with a prototype contain clauses that are executed by the system at specific times to control the consultation. Each clause expresses some action to be taken by the system at different stages: (a) in order to instantiate the prototype (CONTROL slot), (b) upon confirmation of the prototype (IF-CONFIRMED slot), (c) in the event that a prototype is disproved (IF-DISPROVED slot), and (d) in a clean-up phase after system processing has been completed (ACTION slot).
When a prototype is first selected as the current prototype, the system executes the clauses in the CONTROL slot of that prototype. The information in this slot indicates how to proceed in order to instantiate the prototype, usually specifying what data should be acquired and in what order they should be acquired. Therefore, executing these clauses will cause values to be obtained for the prototype components. The CONTROL slot can be thought of as a rule whose implicit premise is "if this prototype is selected as the current prototype" and whose action is the given set of clauses. If no CONTROL slot is associated with a prototype, the interpreter will attempt to fill in values for the prototype components in order according to their importance measures.
When all of the clauses in the CONTROL slot have been executed and the prototype has been instantiated, a decision is made as to whether the prototype should be confirmed as matching the facts of the case.⁹ The system then checks either the IF-CONFIRMED slot or the IF-DISPROVED slot to determine what should be done next. These slots can be viewed as rules whose implicit premise is either "if this prototype is confirmed as matching the data" or "if this prototype is proved not to match the data." The appropriate actions are then indicated in the set of clauses contained in the slot.
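Viewing the slots as rules with implicit premises suggests a simple event dispatch. A sketch under that reading, with the slot names taken from the text and everything else invented:

```python
# Each consultation event selects one control slot; that slot's clauses are then
# executed in order. Clauses here are plain callables acting on shared state.
SLOT_FOR_EVENT = {
    "selected": "control",        # "if this prototype is selected as current"
    "confirmed": "if_confirmed",  # "if this prototype is confirmed as matching"
    "disproved": "if_disproved",  # "if this prototype is proved not to match"
    "finished": "action",         # "if processing is complete and this is confirmed"
}

def run_slot(prototype, event, state):
    """Execute the clauses of the slot implied by the event, if the slot exists."""
    for clause in prototype.get(SLOT_FOR_EVENT[event], []):
        clause(state)

# A toy prototype with only CONTROL and IF-CONFIRMED clauses, as for OAD.
log = []
proto = {"control": [lambda s: s.append("ask FEV1")],
         "if_confirmed": [lambda s: s.append("pursue degree, then subtype")]}
run_slot(proto, "selected", log)
run_slot(proto, "confirmed", log)
```

A prototype with no CONTROL slot simply has no entry under `"control"`, corresponding to the interpreter's fallback of filling components by importance.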

⁸Recall that the premise of a summary rule typically checks the values for one or more parameters and that the action generates an appropriate summarizing statement.
⁹It would be possible to associate such a confirmation criterion with each individual prototype, but this has not been found to be necessary for the pulmonary diagnosis problem. Instead, the system uses a general algorithm, applicable to all of the prototypes, that checks the values of the components and their importance measures to determine if the prototype should be marked as confirmed.

The fourth slot specifying clauses to be executed is the ACTION slot. The implicit premise in this slot is "if the system has completed its selection of confirmed prototypes and this prototype is confirmed." Thus the clauses in the ACTION slot are the last ones to generate summary statements or print data interpretations.

23.4 Advantages of the Prototype-Directed Approach

One question addressed by this research is this: in what ways are both frames and rules superior to either alone? Comparisons can be drawn between purely rule-based systems, such as PUFF, at one end of the spectrum and purely frame-based systems at the other. This section states some of the advantages of the prototype-directed approach used in CENTAUR for the pulmonary function interpretation task, as compared to the purely rule-based approach used in PUFF. The next chapter discusses a purely frame-based approach to the same problem. These advantages can be grouped into two broad categories: those dealing with knowledge base representation, and those dealing with reasoning and performance.

23.4.1 Knowledge Representation

Specific advantages of using prototypes in the pulmonary function domain include the following:

A. Rules attached to prototypes are used to represent only medical expertise, not computational information. In the PUFF system, there are rules that guide computation by controlling the invocation of other rules. This feature can be very confusing to the medical experts since they do not know which rules are intended to represent medical expertise and which rules serve a necessary computational function. For example, a PUFF rule necessary to determine whether there is obstructive airways disease (OAD) in the patient is

If an attempt has been made to deduce the degree of OAD, and
an attempt has been made to deduce the subtype of OAD, and
an attempt has been made to deduce the findings about OAD,
then there is an interpretation of potential OAD.

This rule expresses some of the control structure of the system, namely, that when there is an interpretation of OAD, then the degree, subtype, and findings associated with the OAD should be determined. The rule is confusing because it implies that finding out the degree, subtype, and findings leads to an interpretation of OAD, which might be misinterpreted as medical expertise. In fact, this rule is executed for every case and causes all of the other OAD rules to be invoked, even when no OAD is present.
In CENTAUR, rules that guide computation have been removed from the rule base, leaving a less confusing, more uniform rule base, where each rule represents some "chunk" of medical expertise. Computation is now guided by the prototypes. For example, the CONTROL slot represents information dealing with how to instantiate the prototype. For the OAD prototype, this CONTROL slot specifies that deducing the degree, subtype, and findings of obstructive airways disease are the steps to take in instantiating that prototype.

B. Prototypes represent more clearly some of the medical expertise formerly contained in rules. In some cases, medical expertise that has been represented in the production rules is more clearly represented in the prototype. Consider, for example, the following PUFF rule:

If the degree for OAD is NONE, and the degree for OAD by the MMF is greater than or equal to MILD, then the degree for the OAD is MILD.

The medical expertise expressed in this rule is not apparent. In order to understand this rule, it is necessary to see it as one part of a group of several other rules, all of which together help to determine the degree of obstructive airways disease in the patient. The first clause of the rule, "If the degree for OAD is NONE," is partly a description of the medical context, indicating that the degree of OAD has not been established. However, it is also control information in that it requires that the degree for OAD be determined, which, in turn, invokes the other rules. Yet part of the motivation for using rules is that each rule should be a single "chunk" of knowledge, understandable in its own right. Further, what is really being said in this rule is that in determining the degree of OAD in the patient, there are several pulmonary function measurements to be considered, but, of these, the MMF measurement should be given somewhat more weight. In CENTAUR, this fact is represented explicitly in the OAD prototype by giving the MMF component an importance measure higher than those of the other measurement components.

C. Knowledge is represented explicitly by prototypes. As was indicated in paragraphs A and B above, making knowledge explicit is one of the advantages of the prototype representation. Not only is knowledge about how to instantiate the prototype represented explicitly, but knowledge about what to do if the prototype is confirmed or disproved, as well as what are appropriate clean-up actions to perform for the prototype, e.g., printing findings or summarizing data, is also represented. Other information, such as the importance measure to assign to one of the prototype components when matching prototypes to data, is also made explicit. All of this specifies to those working with the knowledge base precisely what information is represented and what role that information plays in the computation.

D. Additional knowledge is represented by prototypes. By adding a set of disease prototypes, some new knowledge about pulmonary disease can be represented. In MYCIN additional knowledge can be added as properties of rules, but it is difficult to add new knowledge about diseases. For example, plausible ranges of values for each of the pulmonary function tests for each disease, as well as the relative importance of each measurement in a particular disease prototype, can be listed.

23.4.2 Reasoning and Performance of the System

A second category of advantages deals with the way the system reasons about the problem. This is evident in part by watching the performance of the system, that is, the questions that are asked and the order in which information is acquired. Some of the advantages of a prototype-directed system are the following:

E. Consultation flow follows the physician's reasoning. The consultation begins with specific test results suggesting or "triggering" some of the prototypes. The prototypes serve as tentative hypotheses about how to classify the data in a given case. They also guide further inquiry. As new information is acquired, these hypotheses are revised, or, in CENTAUR's terms, prototypes are confirmed or disproved and new prototypes may then be suggested. The process of medical problem solving has been discussed by many researchers [e.g., Elstein et al. (1978)], and it is widely felt that this sequence of suggesting hypotheses, acquiring further information, and then revising the hypotheses is, in fact, the problem-solving process used by most physicians. Thus there is increased conceptual clarity, in that the user can understand what the program is doing. Other advantages that accrue from this approach include: (a) the knowledge base is easier to modify and extend, and (b) the system can offer the user a more intelligible explanation of its performance during the consultation. Giving the system the ability to explain its knowledge and performance has been a primary design goal of the present research efforts. Since the prototype-directed system reasons in a manner more like a human user, its behavior seems more natural and transparent and thus is more likely to be accepted by physicians.

F. The order in which questions are asked can be controlled. In a rule-based system such as PUFF, questions are asked of the user as rules are invoked that contain clauses referring to information that is not yet known. The designers of PUFF, or any EMYCIN system, control the order in which the questions are asked only by writing rules to enforce some order. As has been discussed, this procedure results in a potentially confusing rule base where some rules represent medical expertise and others guide computation. In the prototype-directed system, the expert specifies the order in which information is to be acquired for each prototype in the CONTROL slot. Thus control information is labeled explicitly as such, and the rule base remains uniformly a body of medical expertise. The expert can also specify what information must be acquired and what information is optional, using the importance measure associated with each component.¹⁰

G. Only relevant questions are asked. Another advantage of CENTAUR over the rule-based version of PUFF is that only those hypotheses suggested by the initial data are explored. For example, if the total lung capacity (TLC) for the patient is 70, then CENTAUR would begin exploring the possibility of restrictive lung disease (RLD) because a low TLC would trigger the RLD prototype.¹¹ In the PUFF program, the first disease tried is always OAD, so the PUFF program would begin asking questions dealing with OAD. These questions would seem irrelevant considering the data, and, indeed, if there were no data to indicate OAD, such questions would not be asked by CENTAUR.
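The data-driven behavior in paragraph G amounts to a triggering table consulted before any line of questioning begins. The sketch below is illustrative only: the thresholds and CM values are invented (the text says merely that a low TLC triggers RLD, a high TLC is consistent with OAD, and the trace shows a low MMF triggering OAD with CM 900).

```python
# Each entry pairs a prototype name with a predicate over the initial data and
# the certainty measure (CM) a successful trigger would contribute.
TRIGGERS = [
    ("RLD", lambda d: d.get("TLC", 100) < 80, 700),   # low TLC suggests RLD
    ("OAD", lambda d: d.get("TLC", 100) > 120, 700),  # high TLC suggests OAD
    ("OAD", lambda d: d.get("MMF", 100) < 20, 900),   # low MMF suggests OAD
]

def triggered(data):
    """Return the prototypes suggested by the initial data, strongest CM first."""
    hits = {}
    for name, test, cm in TRIGGERS:
        if test(data):
            hits[name] = max(hits.get(name, 0), cm)  # keep the strongest trigger
    return sorted(hits.items(), key=lambda kv: -kv[1])
```

Only triggered prototypes are pursued, so a case with TLC 70 leads straight to RLD questioning and a case like Figure 23-2 (TLC 139, MMF 12) leads to OAD, never the reverse.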

H. Inconsistent information is indicated. During a consultation, it is also possible to point out inconsistent or possibly erroneous data as they are entered, so that a technician can repeat a test immediately or at least decide if it is worth the time to continue analyzing the case. This feature is invoked when possible error values are detected for a component of a prototype,¹² or when no prototype can be determined to account for a given value.

23.5 Summary
CENTAUR was designed in response to problems that occurred while using a purely rule-based system. The CENTAUR system offers an appropriate environment in which to experiment with knowledge representation issues, such as determining what knowledge is most easily represented in rules and what is most easily represented in frames. In summary, much research remains to be done on this and associated knowledge representation issues. The present research is one attempt to make explicit the art of choosing the knowledge representation in AI by drawing comparisons between various approaches and by identifying the reasons for selecting one fundamental approach over another.

¹⁰Optional information is indicated by assigning a component an importance measure of 0.
¹¹A low TLC is consistent with a hypothesis of RLD; a high TLC is consistent with OAD.
¹²It is also possible that there is an overly restricted range of plausible values for a prototype component, in which case the user may extend the range to encompass the indicated value.
24
Another Look at Frames

David E. Smith and Jan E. Clayton

The success of MYCIN-like systems has demonstrated that for many diagnostic tasks expert behavior can be successfully captured in simple goal-directed production systems. However, even for this class of problems, difficulties have arisen with both the representation and control mechanisms. One such system, PUFF (Kunz et al., 1978), has established a creditable record in the domain of pulmonary function diagnosis. The representation problems in PUFF are manifest in a number of rules that have awkward premises and conclusions. The control problems are somewhat more severe. Physicians have criticized PUFF on the grounds that it asks questions that do not follow a logical line of reasoning and that it does not notice data that are atypical or erroneous for the determined diagnosis.
In the CENTAUR system, described in Chapter 23, an attempt was made to correct representational deficiencies by using prototypes (frames) to characterize some of the system's knowledge. A more complex control scheme was also introduced. It made use of triggering rules for suggesting and ordering system goals, and included an additional attention-focusing mechanism by using frames as an index into the set of relevant rules.
In an attempt to carry the work of Aikins one step further, we have constructed an experimental system for pulmonary function diagnosis, called WHEEZE. Our objectives were to provide a uniform declarative representation for the domain knowledge and to permit additional control flexibility beyond that offered by PUFF or CENTAUR. To achieve the first of these objectives, all of PUFF's rules have been translated into a frame representation (discussed in Section 24.1). The second objective, control flexibility, is achieved by using an agenda-based control scheme (discussed

This chapter is an expanded version of a paper originally appearing in Proceedings of the First
National Conference on Artificial Intelligence, Stanford, Calif., August 1980, pp. 154-156. Used
with permission of the American Association for Artificial Intelligence.


in Section 24.2). New goals for the agenda are suggested by the success or failure of other goals on the agenda. In the final section, results and the possibilities of generalization are discussed.

24.1 Representation
24.1.1 The Language

We have chosen to use a representation language called RLL (Greiner and Lenat, 1980). The language is frame-based, where a frame consists of a set of slots, or attributes. We did not rely on the special features of RLL in any fundamental way. Any of the multitude of frame-based languages would have served equally well.

24.1.2 Vocabulary

In our knowledge base, there are three different kinds of frames that
contain domain-specific diagnostic knowledge and knowledge about the
case: assertion frames, patient frames, and patient datum frames.

Assertion Frames

The majority of the diagnostic knowledge is captured in a set of frames


called assertions. Most assertions in the knowledge base are about the physiological state of the patient, e.g., "the patient's total lung capacity is high." But there are other types of assertions as well, such as "the total lung capacity measurement is erroneous." The organization of an assertion frame is shown in Figure 24-1.
An assertion may be related to other assertions in the knowledge base in several ways, as shown in Figure 24-1. The substantiating evidence for an assertion is specified in the Manifestation slot for the assertion. This slot can be thought of as a set of links to secondary assertions that contribute to the confirmation of the assertion in question. It has been necessary to allow a considerable richness of combinations of manifestations for an assertion; consequently, each entry in the slot may be an individual manifestation or a simple function of individual manifestations, such as OneOf, TwoOf, TwoOrMoreOf, SomeOf, etc. Associated with each manifestation link is a number indicating the importance of the link in suggesting belief or disbelief in the assertion. The ManifestationOf slot is the inverse

Isa Assertion

Description <commentary>

Manifestation <a list of assertions on which this assertion depends>

ManifestationOf <a list of assertions that this assertion is a manifestation of--the inverse of the Manifestation slot>

Certainty <a number between -1000 and 1000 that indicates to what degree the assertion is believed, if its manifestations are believed>

SuggestiveOf <related assertions that are worth investigating if this assertion is believed>

ComplementaryTo <related assertions that are worth investigating if this assertion is not believed>

CategorizationOf <the patient datum that this assertion is concerned with>

CategoryCriterion <the allowed range of the patient datum corresponding to this assertion>

DegreeOfBelief <a number between -1000 and 1000 that indicates to what degree the assertion is believed>

Findings <text to be reported to the user if this assertion is believed>

FIGURE 24-1 Organization of an assertion frame.

of the Manifestation slot; i.e., it contains a list of the assertions that have that assertion as a manifestation.
The Certainty slot, in WHEEZE, is an indicator of how likely an assertion is, given that its manifestations are believed. If the manifestations are strong indicators of the assertion, the Certainty slot will have a high value. The Certainty slot is a property of the knowledge rather than a statement about a particular consultation.
When an assertion is directly related to a patient datum, it is termed a categorization of that patient datum. This relationship is specified by the CategorizationOf and CategoryCriterion slots of the assertion. CategorizationOf indicates which patient datum the assertion depends on, while CategoryCriterion specifies the range in which the value must be for the assertion to be verified. For example, the assertion "the patient's TLC is greater than 110" (TLC stands for total lung capacity) would be a categorization of the TLC value, with the category criterion being value > 110.

The relationship may also be used in the reverse manner. A high-level


datum such as SeverityOfDisease could be defined as one of a disjoint set
of assertions being true (MildDisease, ModerateDisease, etc.), in which case
the categorization relationship might be used to determine the datum from
the assertions.
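To make the categorization relationship concrete, here is a rough sketch in modern notation. WHEEZE itself was written in RLL; the dictionary layout, function name, and numeric values below are our own illustration, not the system's code.

```python
# Hypothetical sketch of a WHEEZE-style assertion frame; slot names follow
# Figure 24-1, but the data structure and all values are illustrative only.

def check_categorization(assertion, patient_data):
    """Return True if the patient datum falls in the assertion's allowed range."""
    datum = patient_data[assertion["CategorizationOf"]]
    return assertion["CategoryCriterion"](datum)

# "The patient's TLC is greater than 110" as a categorization of TLC.
tlc_high = {
    "Isa": "Assertion",
    "CategorizationOf": "TLC",
    "CategoryCriterion": lambda value: value > 110,
    "Certainty": 800,          # illustrative value
    "DegreeOfBelief": None,    # filled in during a consultation
}

print(check_categorization(tlc_high, {"TLC": 125}))  # True
print(check_categorization(tlc_high, {"TLC": 90}))   # False
```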
Each assertion has a DegreeOfBelief slot associated with it, indicating to what degree the assertion is believed to be true in that particular consultation. The value of this slot can be any integer between -1000 and 1000, where 1000 indicates complete faith and -1000 means total denial of the assertion. It may also take on the value Unknown, indicating that the knowledge needed to determine the degree of belief of the assertion is not known. Note that there is a distinction made between a degree of belief that has not yet been investigated, a degree of belief that has been investigated but cannot be determined due to insufficient evidence (degree of belief Unknown), and a degree of belief that indicates equal positive and negative evidence (DegreeOfBelief = 0).
Unlike the Certainty slot, the DegreeOfBelief is determined by the system during the consultation. For an assertion that has only the categorization relationship (no manifestations), the DegreeOfBelief depends only on the Certainty of the assertion and on the patient datum being in the specified range. For assertions with manifestations, the DegreeOfBelief of the assertion can be a general function of the Certainty of the assertion, the DegreeOfBelief of each of its manifestations, and the importance attributed to each manifestation. The function used in MYCIN and PUFF is a simple thresholding mechanism, where, if the minimum of the antecedents is above some threshold (generally 200), the DegreeOfBelief is effectively set to the certainty factor. Importance measures provide additional flexibility by permitting the antecedents of a rule to be weighted.
Several different combination mechanisms have been considered:

1. Sum the products of the DegreeOfBelief slots and the importance factors for each manifestation, then use a thresholding mechanism.
2. Sum the products of the DegreeOfBelief slots and the importance factors for each manifestation, then multiply this by the certainty factor.
3. Threshold the minimum of the DegreeOfBelief/importance ratios for the manifestations.
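Assuming each manifestation is represented as a (DegreeOfBelief, importance) pair, the three candidates might be sketched as follows. The function names and exact arithmetic are illustrative assumptions of ours, though the threshold of 200 follows the MYCIN/PUFF convention mentioned above.

```python
# Illustrative sketches of the three combination mechanisms; not WHEEZE's code.

THRESHOLD = 200  # the threshold generally used in MYCIN and PUFF

def combine_1(manifestations, certainty):
    """Weighted sum of beliefs, then threshold."""
    total = sum(dob * imp for dob, imp in manifestations)
    return certainty if total > THRESHOLD else 0

def combine_2(manifestations, certainty):
    """Weighted sum of beliefs, scaled by the certainty factor."""
    total = sum(dob * imp for dob, imp in manifestations)
    return total * certainty

def combine_3(manifestations, certainty):
    """Threshold the minimum belief/importance ratio."""
    weakest = min(dob / imp for dob, imp in manifestations)
    return certainty if weakest > THRESHOLD else 0
```

Mechanisms 1 and 3 keep the all-or-nothing flavor of the MYCIN/PUFF thresholding scheme, while mechanism 2 produces a graded result.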

There are two assertion slots that indicate related assertions worth pursuing when an assertion is confirmed or denied. The SuggestiveOf slot contains a list of assertions to investigate if the current assertion is confirmed. Conversely, the ComplementaryTo slot is a list of assertions that should be pursued if the current assertion is denied. These slots function like the "triggering" rules in CENTAUR, since they suggest goals to investigate.
The Findings slot of an assertion contains text that should be printed out if the assertion is confirmed. In PUFF, this text was contained in the conclusion portions of rules.

Isa Patient

Age <the patient's age>

Sex <the patient's sex>

PackYearsSmoked <the number of cigarette-smoking years specified in number of packs per day times number of years of smoking>

TLC <the value of the total lung capacity for the patient>

RDX <the referral diagnosis>

ConfirmedAssertions <assertions that have already been confirmed for this patient>

DeniedAssertions <assertions that have a DegreeOfBelief less than 0>

Agenda <a pointer to an agenda frame containing assertions worth pursuing>

FIGURE 24-2 Organization of a patient frame.

Patient Frames

Information about the patient is kept in a frame named after that patient. In general, it contains slots for all of the patient data and for the state of the consultation. As shown in Figure 24-2, the majority of the slots in the patient frame contain the values of test data, derived data, or more general facts about the patient. Most of these values are entered directly by the physician; however, there are data that are derived or calculated from other values. The slots in the patient frame do not contain any information about obtaining the value for that slot. Instead, that information is kept in the corresponding patient datum frame (discussed below). The ConfirmedAssertions and DeniedAssertions slots keep track of the assertions that have already been tested. The Agenda slot contains a pointer to the agenda frame for the patient. It is important to note that the patient frame does not contain any heuristic knowledge about the system. Its only purpose is to hold current information about the patient.

Patient Datum Frames

In addition to patient and assertion frames, there are frames in the knowledge base for each type of patient datum (as shown in Figure 24-3). These frames indicate how a datum is obtained (whether it is requested from the physician or derived from other data), what a typical value for the datum

Isa PatientDatum

Description <commentary on this specific datum>

ToGetValue <how to get the value of this datum if it is not known>

Categorization <the set of assertions that are categorizations of this datum>

TypicalValue <the value of this datum expected for a normal patient>

FIGURE 24-3 Organization of a patient datum frame.

might be, and what categories the value may be placed in. When the value of a patient datum is requested and not yet known, the frame for that patient datum is consulted and the information about how to obtain that datum is applied. This information takes the form of a procedure in the ToGetValue slot of the frame.
For a given patient datum, there may be many low-level assertions that are categorizations of the datum. These are specified by the Categorization slot. For example, the Categorization slot of TLC (total lung capacity) might contain the assertions TLC=80to100, TLC=100to120, TLC<80, and TLC>120, indicating that there are four major categories of the values. Thus the patient datum contains heuristic knowledge about how the datum is derived and how it relates to assertions in the network.
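This demand-driven lookup can be sketched roughly as follows; the frame contents, values, and function names are hypothetical, not taken from WHEEZE.

```python
# Illustrative sketch of demand-driven patient datum lookup: if a value is
# not yet known, the ToGetValue procedure in the datum's frame supplies it.

patient = {"Weight": 70}  # values already entered (hypothetical)

datum_frames = {
    "TLC": {
        "Isa": "PatientDatum",
        "ToGetValue": lambda pt: 115,  # stands in for asking the physician
        "TypicalValue": 100,
    },
}

def get_datum(name, pt):
    """Return a patient datum, consulting its frame if it is not yet known."""
    if name not in pt:
        pt[name] = datum_frames[name]["ToGetValue"](pt)
    return pt[name]

print(get_datum("TLC", patient))  # 115, obtained via ToGetValue
print(get_datum("TLC", patient))  # 115 again, now cached in the patient frame
```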

24.1.3 Translation

The process of translating a PUFF rule into a WHEEZE assertion consists of several steps. First, an assertion must be created embodying the conclusion and findings of the rule. Next, assertions corresponding to each of the antecedents of the rule must be constructed (if they are not already present) and added to the Manifestation slot of the assertion. If a manifestation is a categorization of some patient datum, then the CategorizationOf and CategoryCriterion slots for that manifestation must be filled in accordingly, and the frame describing that patient datum must be created.
Figure 24-4 is an example of how a particular PUFF rule was translated into our representation. The conclusion of the rule corresponds to the assertion and findings. The antecedents became the manifestations of the assertion. Quite often the manifesting assertions are not already present in the knowledge base and must be created. For example, the assertion frame RDX-Asthma (meaning "referral diagnosis of asthma") had to be added to the knowledge base when the RefractoryAsthma frame was created, since it is one of the manifestations of RefractoryAsthma. The patient

PUFF Rule 42

If: 1) There are postbronchodilation test results, and
    2) The degree of reversibility of airway obstruction of the patient is less than or equal to slight, and
    3) Asthma is one of the referral diagnoses of the patient
Then: It is definite (1000) that the following is one of the conclusion statements about this interpretation: The poor response to bronchodilators is an indication of an asthmatic condition in a refractory state.

REFRACTORY-ASTHMA

Isa PhysiologicalState
Manifestation (OADBronchodilationTestResults RDX-Asthma (*OneOf OADReversibility-None OADReversibility-Slight))
Certainty 1000
DegreeOfBelief
Findings The poor response to bronchodilators is an indication of an asthmatic condition in a refractory state.
ComplementaryTo ((RefractoryAsthma-None 5))

FIGURE 24-4 PUFF rule and corresponding WHEEZE frame for refractory asthma.

datum RDX (referral diagnosis) also had to be added, since RDX-Asthma was specified as a categorization of RDX. Most of the other rules in the system were translated in an analogous fashion.
While there is not a one-to-one mapping between the representations we have used and the rules in PUFF, we can imagine automating the process. The most difficult problem in conversion is to create meaningful and consistent names for the assertions in the knowledge base. In most cases we used some combination of keywords in the conclusion of the rule we were mapping into the assertion (as in Figure 24-4).

24.2 Control Structure

Depth-first, goal-directed search is often used in production systems be-


cause questions asked by the system are focused on specific topics. Thus
the system appears to follow a coherent line of reasoning, more closely
mimicking that of human diagnosticians. There are, however, many widely
recognized limitations. No mechanism is provided for dynamically selecting or ordering the initial set of goals. Consequently, the system may explore many "red herrings" and ask irrelevant questions before encountering a good hypothesis. In addition, a startling piece of evidence (strongly suggesting a different hypothesis) cannot cause suspension of the current investigation and pursuit of the alternative.

[Figure 24-5 diagram not reproducible: an assertion network connecting Asthma, Bronchitis, and ALS through OAD and RLD to low-level data such as FEV1/FVC<80, FEV1/FVC≥80, RV<80, and RDX-ALS.]

FIGURE 24-5 A simplified portion of the WHEEZE knowledge base. The solid lines indicate Manifestation links (e.g., OAD is a manifestation of Asthma); the dashed lines represent SuggestiveOf links. The numbers represent the corresponding importance and SuggestiveOf values of the links. (Key: ALS = amyotrophic lateral sclerosis; FEV1 = forced expiratory volume at one minute; FVC = forced vital capacity; MMF = maximal midexpiratory flow; OAD = obstructive airways disease; RDX = referral diagnosis; RLD = restrictive lung disease; RV = residual volume; TLC = total lung capacity.)
For the assertion network in Figure 24-5, a depth-first, goal-directed system like PUFF would start with the goals Asthma, Bronchitis, and ALS (amyotrophic lateral sclerosis) and work backwards in a goal-directed fashion toward OAD (obstructive airways disease) and RLD (restrictive lung disease), and then toward FEV1/FVC<80, MMF≥14, etc. In contrast, the CENTAUR system would make use of triggering rules to allow primitive data (e.g., RDX-ALS and FEV1/FVC<80) to suggest whether ALS and OAD were worth investigating and the order in which to investigate them. It would then proceed in a goal-directed fashion to try to verify those goals.
Expert diagnosticians use more than simple goal-directed reasoning. They seem to work by alternately constructing and verifying hypotheses, corresponding to a mix of data- and goal-directed search. They expect expert systems to reason in an analogous manner. It is therefore necessary that the system designer have some control over the reasoning behavior of

the system. These intuitions, and the work on triggering described in Chapter 23, have led us to adopt a control mechanism that permits a combination of backward chaining and forward (data-driven) exploration, together with any search strategy ranging from pure depth-first to pure breadth-first search. This control structure is implemented by using an agenda, with each suggested assertion being placed on the agenda according to some specified priority. The control strategy is as follows:

1. Examine the top assertion on the agenda.
2. If its subassertions (manifestations) are known, the relative belief of the assertion is determined. If confirmed, any assertions of which it is suggestive are placed on the agenda according to the specified measure of suggestivity. If denied, complementary assertions are placed on the agenda according to their measures of suggestivity.
3. If it cannot be immediately verified or rejected, then its unknown manifestations are placed on the agenda according to their measures of importance and the agenda level of the original assertion.
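The three steps above can be sketched as a priority-queue loop. This is our own reconstruction: the priority arithmetic, the slot layout, and the all-manifestations-true belief rule are simplifying assumptions, not WHEEZE's actual mechanism.

```python
import heapq

# Illustrative agenda loop following steps 1-3 above. Primitive assertions
# carry a Test on the patient data; composite ones list Manifestation subgoals.

def run_agenda(assertions, data, initial_goals):
    agenda = []   # max-heap via negated priorities
    beliefs = {}  # assertion name -> True/False
    trace = []    # order in which assertions were settled

    def push(name, prio):
        heapq.heappush(agenda, (-prio, name))

    for name, prio in initial_goals:
        push(name, prio)
    while agenda:
        neg, name = heapq.heappop(agenda)            # step 1
        if name in beliefs:
            continue
        a = assertions[name]
        if "Test" in a:                              # primitive datum check
            beliefs[name] = a["Test"](data)
        else:
            unknown = [(m, imp) for m, imp in a["Manifestation"]
                       if m not in beliefs]
            if unknown:                              # step 3: subgoals first
                push(name, -neg)                     # retry once they resolve
                for m, imp in unknown:
                    push(m, -neg + imp)              # subgoals land above parent
                continue
            beliefs[name] = all(beliefs[m] for m, _ in a["Manifestation"])
        trace.append(name)                           # step 2: suggest new goals
        links = a.get("SuggestiveOf" if beliefs[name] else "ComplementaryTo", [])
        for goal, sugg in links:
            push(goal, sugg)
    return beliefs, trace

# A fragment loosely resembling Figure 24-5 (link values are made up).
net = {
    "ALS": {"Manifestation": [("RLD", 3), ("RDX-ALS", 3)]},
    "RLD": {"Manifestation": [("RV<80", 5)], "SuggestiveOf": [("ALS", 4)]},
    "RV<80": {"Test": lambda d: d["RV"] < 80},
    "RDX-ALS": {"Test": lambda d: d["RDX"] == "ALS"},
}
beliefs, trace = run_agenda(net, {"RV": 70, "RDX": "ALS"}, [("RLD", 5)])
print(trace)  # prints ['RV<80', 'RLD', 'RDX-ALS', 'ALS']
```

In the example run, confirming RV<80 settles RLD, whose SuggestiveOf link then places ALS on the agenda, matching the mixed data- and goal-directed behavior described below.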

By varying the importance factors, SuggestiveOf values, and the initial items placed on the agenda, numerous control strategies are possible. For example, if high-level goals are placed on the agenda initially and subgoals are always placed at the top of the agenda, depth-first, goal-directed behavior will result. Alternatively, if low-level data are placed on the agenda initially and assertions suggested by these data assertions are always placed below them on the agenda, breadth-first, data-driven behavior will result. More commonly, what is desired is a mixture of the two, in which assertions suggest others as being likely and goal-directed verification is employed to investigate the likely assertions. The example below illustrates how this can be done.
In the knowledge base of Figure 24-5, suppose that RDX-ALS is confirmed, suggesting RLD to the agenda at level 5 and ALS at level 4. RLD is then examined, and since its manifestations are unknown, they are placed at the specified level on the agenda. The agenda now contains FEV1/FVC≥80 at level 8, RV<80 and RLD at level 5, and ALS at level 4. FEV1/FVC≥80 is therefore selected. Suppose that it is found to be false. Its complementary assertion (FEV1/FVC<80) is placed at level 8 on the agenda and is immediately investigated. It is, of course, true, causing OAD to be placed at level 8 on the agenda. The diagnosis proceeds by investigating the manifestations of OAD; and, if OAD is confirmed, Asthma and Bronchitis are investigated.
Although many subtleties have been glossed over in this example, it is important to note that:

1. The manipulation of SuggestiveOf and importance values can change the order in which assertions are examined, therefore changing the order in which questions are asked and results are printed out. (In the example, FEV1/FVC was asked for before RV.)
2. Surprise values (data contrary to the hypothesis currently being investigated) may suggest goals to the agenda that are high enough to cause suspension of the current investigation. (The surprise FEV1/FVC value caused suspension of the RLD investigation in favor of the OAD investigation. If the suggestivity of the link from FEV1/FVC<80 to OAD were not as high, this would not have occurred.)
3. Low-level data assertions cause the suggestion of high-level goals, thus selecting and ordering goals to avoid irrelevant questions. (In the example, RLD and ALS were suggested and ordered by the low-level assertion RDX-ALS.)

24.3 Conclusions

It is no surprise that WHEEZE exhibits the same diagnostic behavior as its predecessors, PUFF and CENTAUR, on a standard set of ten patient test cases. The three systems are also roughly comparable in efficiency. WHEEZE and CENTAUR are somewhat slower than PUFF, but this may be misleading, since little effort has been expended on optimizing either of these systems.
The frame representation described in Section 24.1 has proved entirely adequate for capturing the domain knowledge of both PUFF and CENTAUR. In some cases, several rules were collapsed into a single assertion frame. In other cases, intermediate assertions, corresponding to common groups of clauses in rule premises, were added to the knowledge base. This had the effect of simplifying other assertion frames. The combination of representation and control structure also eliminated the need for many awkward interdependent rules and eliminated the need for screening clauses in others.
There are several less tangible effects of using a frame representation. Our purely subjective view is that a uniform, declarative representation is often more perspicuous. As an example, all of the interconnections between assertions about disease states are made explicit by the Manifestation and ManifestationOf slots. As a result, it is easier to find all other assertions related to a given assertion. This in turn makes it somewhat easier to understand and predict the control flow of the system.
Since the agenda-based control mechanism includes backward-chaining and goal-triggering capabilities, it has also proved adequate for capturing the control flow of PUFF and CENTAUR. In addition, the flexibility of agenda-based control was used to advantage. Suggestiveness and importance factors were used to change the order in which questions were asked and conclusions printed out. They were also used to eliminate the need to carefully order sets of antecedent assertions.
There is evidence that mixed goal-directed and data-directed control
models human diagnostic behavior much more closely than either pure
goal-directed or data-directed search (Elstein et al., 1978). The diagnostic
process is one of looking at available symptoms, allowing them to suggest
higher-level hypotheses, and then setting out to prove or disprove those
hypotheses, all the while recognizing hypotheses that might be suggested
by symptoms appearing in the verification process. Pauker and Szolovits
(1977) have noted that a physician will go to great lengths to explain data
inconsistent with a partially verified hypothesis before abandoning it. This
type of behavior is not altogether inconsistent with the strategy we have
employed, albeit for a different reason. The combination of a partially
verified hypothesis and data inconsistent with it may be enough to boost
an assertion that would explain the inconsistent data "above" an alternative
hypothesis on the agenda. Oddly enough, some of this behavior seems to
be a natural consequence of the control structure we have employed.

24.3.1 Generalizing

There is no reason to suppose that the representation and control mech-


anisms used in WHEEZE could not be used to advantage in other diag-
nostic production systems. A system similar to EMYCIN (Chapter 15), hav-
ing both knowledge acquisition and explanation capabilities, could
certainly be based on frames and agenda-based control. It also seems likely
that an analogue of the EMYCIN rule compiler could be developed to take
portions of an assertion network and produce efficient LISP code that
would perform identically to the agenda-based control scheme operating
on the assertion network.
A second class of extensions that becomes possible with a frame-based
system is the addition of other kinds of knowledge not essential to the
diagnostic process. For example, in the development of GUIDON (Chapter 26), Clancey noted that a substantial proportion of the domain knowledge
had been compiled out of the rules used by most high-performance sys-
tems. Within our framework there is no reason why this information could
not be added while still maintaining high performance. Such additional
information might also be useful for enhanced explanation of system be-
havior.

24.3.2 Some Outstanding Questions

In the discussion above, claims were made about the perspicuity of the
frame representation and about the flexibility of the agenda-based control
mechanism. Of course, the acid test would be to see how well domain

experts could adapt to the representation and to see whether or not they would become facile at tailoring control flow.
A second question that we pondered is this: how would WHEEZE be
different if we had started with a basic frame system and the agenda-based
control mechanism and worked with an expert to help build up the system
from scratch? It is entirely possible that the backward-chaining production
system paradigm had a significant effect on the vocabulary and knowledge
that make up both PUFF and CENTAUR. In other words, the medium may have influenced the "message."
To a large extent, we have only paraphrased PUFF's rules in a different
representational medium. This paraphrase may not be the most natural
way to do diagnosis in the new architecture. Unfortunately, we do not have
sufficient expertise in pulmonary function diagnosis to consider radical
reformulations of the domain knowledge. For this reason, it would be in-
teresting to see a new diagnostic system developed using the basic archi-
tecture we have proposed.
PART EIGHT

Tutoring
25
Intelligent Computer-Aided Instruction

The idea of directly teaching students "how to think" goes back at least to Polya (1957), if not to Socrates, but it reached a new stage of development in Papert's laboratory (Papert, 1970). In the LOGO lab, young students were taught AI concepts such as hierarchical decomposition, opening up a new dimension by which they could take apart a problem and reason about its solution. In part, Polya's heuristics have seemed vague and too general, too hard to follow in real problems (Newell, 1983). But progress in AI programming, particularly expert system design, has suggested a vocabulary of structural concepts that we now see must be conveyed along with the heuristics to make them intelligible (see Chapter 29).
Developing in parallel with Papert's educational experiments and capitalizing even more directly on AI technology, programs called intelligent tutoring systems (ITS) were constructed in the 1970s. In contrast with the computer-aided instruction (CAI) programs of the 1960s, these programs used new AI formalisms to separate out the subject matter they teach from the programs that control interactions with students. This is called intelligent computer-aided instruction (ICAI). This approach has several advantages: it becomes possible to keep records of what the student knows; the logic of teaching can be generalized and applied to multiple problems in multiple problem domains; and a model of student knowledge can be inferred from student behavior and used as a basis for tutoring. The well-known milestones in ITS research include:
interacting with the student in a mixed-initiative dialogue1 (Carbonell, 1970b) and tutoring by the Socratic method (Collins, 1976)

Parts of this chapter are taken from the final report to the Office of Naval Research for the first period of GUIDON research (1979-1982). That report appeared as a technical memo (HPP-82-2) written by William J. Clancey and Bruce G. Buchanan from the Heuristic Programming Project, Department of Computer Science, Stanford University.
1. In a mixed-initiative dialogue between a student and a program, either party can initiate questions and expect reasonable responses from the other party. This contrasts sharply with drill-and-practice programs or MYCIN's dialogue, in which users cannot volunteer information or direct the program's reasoning.

evaluating student hypotheses for consistency with measurements taken


(Brown et al., 1975)
enumerating bugs in causal reasoning (Stevens et al., 1978)
interpreting student behavior in terms of expert knowledge ("overlay
model") (Burton, 1979; Carr and Goldstein, 1977; Clancey, 1979b)
codifying discourse procedures for teaching (Clancey, 1979c)
constructing models of incorrect plans or procedures (Genesereth, 1981; Brown and Burton, 1978)
relating incorrect procedures to a generative theory (Brown and VanLehn, 1980)

The record of ITS research reveals a few recurring questions:

1. Nature of expertise: What is the knowledge we want to teach a student?
2. Modeling: How can we determine what the student knows?
3. Tutoring: How can we improve the student's performance?

Almost invariably, researchers have backed off from initially focusing on the last question--"How shall we teach?"--to reconsider the second question, that of building a model of the student's knowledge. This follows from the assumption that student errors are not random but reflect misconceptions about the procedure to be followed or facts in the problem domain and that the best teaching strategy is to address directly the student's misconceptions.
In order to extend the research in building models of misconceptions in well-understood domains such as subtraction to more complex domains such as physics, medicine, and electronic troubleshooting, we need a sounder understanding of the nature of knowledge and expertise. Comparison studies of experts and novices (Chi et al., 1980; Feltovich et al., 1980; Lesgold, 1983) reveal that how the expert structures a problem, the very concepts he or she uses for thinking about the problem, distinguishes an expert's reasoning from a student's often formal, bottom-up approach. These studies suggest that we might directly convey to the student the kinds of quick associations, patterns, and reasoning strategies that experts build up tediously over long exposure to many kinds of problems--the kind of knowledge that tends not to be written down in basic textbooks.
It is with this premise--that we will be better teachers by better un-
derstanding expertise--that research on expert systems becomes of keen
interest to the educator. These knowledge-based programs contain within
them a large number of facts and rulelike associations for solving problems
in restricted domains of medicine, science, and engineering. While these
programs were developed originally just for the sake of building systems
that could solve difficult problems, they have special interest to research
in cognitive science as simulation models that can be used as a "laboratory
workbench" for experimenting with knowledge structures and control

strategies. By altering the "program as a model," one can test hypotheses


about human performance [for example, see Johnson et al. (1981)].
Another natural application for expert systems in education is to use
them as the "knowledge foundation" for an intelligent tutoring system.
Brown pioneered this technology in the SOPHIE system (Brown et al.,
1974), which took a student through the paces of debugging a circuit.
Brown, Collins (1978), and Goldstein (1978) pioneered the use of produc-
tion rules to express knowledge about how to interact with a student and
how to interpret his or her behavior. The first tutor built on top of a
complex expert system was GUIDON (Clancey, 1979a), using MYCIN's 450
production rules and tables for teaching medical diagnosis by the case
method. GUIDON's teaching expertise is represented cleanly and inde-
pendently of the domain rules; it has been demonstrated for both medical
and engineering domains.²

25.1 Tutoring from MYCIN's Knowledge Base

Early in the course of building MYCIN, we observed that a program with
enough medical knowledge for consulting had high potential for educating
physicians and medical students. Physicians who seek advice from a con-
sultant--human or machine--do so because they are uncertain whether or
not they are ignoring important possibilities or making conclusions that
are correct. Along with confirmation and advice, a consultant provides
reasons, answers questions, and cites related issues. The educational com-
ponent of a computer-based consultant was too obvious for us to ignore.
MYCIN's conclusions alone would not help a physician understand the
medical context of the case he or she presents to the program. But the
dialogue with MYCIN already begins to illuminate what are the key factors
for reaching those conclusions. Because MYCIN asks whether or not the
patient has been burned, for example, a physician is reminded that this
factor is relevant in this context. This is very passive instruction, however,
and does not approach the Socratic dialogue we expect from good teachers.
MYCIN's explanation capabilities were introduced to give a physician an
opportunity to examine parts of the dialogue he or she found puzzling.
When the program asks whether or not the patient has burns, the user
can inquire why that information is relevant. As described in Part Six,
answers to such inquiries elucidate MYCIN's line of reasoning on the case
at hand and thus provide brief instructional interchanges in the course of
a consultation. Similarly, the question-answering capabilities give a physician

²In GUIDON, teaching knowledge is treated as a form of expertise. That is, GUIDON has a
knowledge base of teaching rules that is distinct from MYCIN's knowledge base of infectious
disease rules.
instructional access to the static knowledge base. Although we now under-
stand better the difference between MYCIN's explanation capabilities and
an active tutor, we enthusiastically wrote in 1974 (Shortliffe, 1974, pp. 230-
231):

As ... emphasized throughout this report, an ability to instruct the user
was an important consideration during the design of MYCIN. We believe it
is possible to learn a great deal simply by asking MYCIN for consultative
advice and taking advantage of the program's explanation capabilities. It is
quite likely, in fact, that medical students in their clinical years will comprise
a large percentage of MYCIN's regular users.

We were also aware of the need to make an instructional program
more active, as others in AI were doing. In 1974 we noted (Shortliffe,
1974, p. 231):

It would be possible ... to adapt MYCIN so that its emphasis became
primarily educational rather than consultative. This could be accomplished
in a number of ways. In one scenario, MYCIN would present a sample patient
to a student. The program would then judge the student's ability to ask
important questions and to reach valid conclusions regarding both the iden-
tity of the organism(s) and the most appropriate therapeutic regimen. By
comparing the student's questions and decisions to its own, MYCIN could
infer inadequacies in the user's knowledge and enter into a tutorial discourse
customized for the student.... We have no plans to pursue this application
in the near future.

It was within this intellectual context that Clancey began asking about
the adequacy of MYCIN's knowledge base for education. We initially be-
lieved that the rules and tables MYCIN used for diagnosing causes of
infections would be a sufficient instructional base for an ICAI program.
We felt that the only missing intelligence was pedagogical knowledge: how
to carry on a mixed-initiative dialogue, how to select and present infor-
mation, how to build and use a model of the student, and so on. Clancey
began work on a tutorial program, called GUIDON, within two years after
the material quoted above was written. The initial model of interaction
between MYCIN and GUIDON is shown schematically in Figure 25-1.
GUIDON was first conceived as an extension of the explanation system
of the MYCIN consultation program. This previous research provided the
building blocks for a teaching program:

modular representation of knowledge in production rules
English translation of the internal rule representation
a developed "history trace" facility for recording reasoning steps
representation in the system of the grammar of its rules, so they can be
parsed and reasoned about by the system itself
an explanation subsystem with a well-developed vocabulary for the log-
ical kinds of questions that can be asked about MYCIN's reasoning ("Why
didn't you ask X?" or "How did you use X to conclude about Y?")

[FIGURE 25-1: Model of interaction between MYCIN and GUIDON. MYCIN's
inference engine and medical knowledge supply diagnostic knowledge; GUIDON's
tutorial program supplies pedagogical knowledge.]

With this foundation, we constructed a tutoring program that would
take MYCIN's solution to a problem, analyze it, and use it as the basis for
a dialogue with a student trying to solve the same problem. About two
hundred tutoring rules were developed, organized into "discourse proce-
dures" for carrying on the dialogue (offering advice, deciding whether and
how to interrupt, etc.) (Clancey, 1979b). Student modeling rules were used
to interpret a student's partial problem solutions in terms of MYCIN's
knowledge, and the resulting model was used to decide how much to tell
the student and when to test his or her understanding.
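The flavor of such tutoring and modeling rules can be suggested with a small sketch. This is illustrative only: GUIDON was written in Interlisp, and the rule content, names, and threshold below are invented, not GUIDON's actual rules.

```python
# Hypothetical sketch of a GUIDON-style tutoring decision. The student
# model is an "overlay": a belief (0..1) per domain rule that the student
# knows that rule. All identifiers and the 0.5 threshold are invented.

def rule_known(student_model, rule_id):
    """Belief (0..1) that the student knows a given domain rule."""
    return student_model.get(rule_id, 0.0)

def should_interrupt(student_model, hypothesis, relevant_rules):
    """Interrupt when the student's hypothesis conflicts with MYCIN's
    conclusion and some deciding rule appears unknown to the student."""
    unknown = [r for r in relevant_rules
               if rule_known(student_model, r) < 0.5]
    return hypothesis["conflicts_with_mycin"] and len(unknown) > 0

model = {"RULE507": 0.8, "RULE535": 0.2}
hyp = {"conflicts_with_mycin": True}
print(should_interrupt(model, hyp, ["RULE507", "RULE535"]))  # True
```

The point of the sketch is the separation: the decision of *whether to interrupt* consults only the student model and dialogue state, never the medical content of the rules themselves.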
Our 1978 proposal to the Office of Naval Research (ONR) for GUIDON
research outlined investigation of both problem-solving and teaching
strategies. With the program so well developed, it was expected that early
experimentation could be done with alternative teaching approaches. How-
ever, during preliminary discussions with other researchers in this field, a
key question was repeatedly raised. To paraphrase John Brown (August 2,
1978, at Stanford University):

What is the nature of the expertise to be transmitted by this system
[GUIDON]? You are not just unfolding a chain of inferences; there is also
glue or a model of process.... What makes a rule click?

Following this lead, we began to concentrate on the nature of the
expertise to be taught. GUIDON's interactions were studied, particularly
the kind of feedback it was able to provide in response to incorrect partial
solutions. The inability of the program to provide strategical guidance--

advice about what to do next--revealed that the "glue" that was missing
had something to do with the system of rules as a whole. With over 400
rules to learn, there had to be some kind of underlying logic that made
them fit together; the idea of teaching a set of weakly structured rules was
now seriously in question. Significantly, this issue had not arisen in the
years of developing MYCIN but was now apparently critical for teaching,
and probably had important implications for MYCIN's explanation and
knowledge acquisition capabilities as well.
It soon became clear that GUIDON needed to know more than MYCIN
knows about diagnosis. MYCIN's route from goal to specific questions
is not the only acceptable line of reasoning or strategy for gathering evi-
dence. The order in which MYCIN asks for test results, for example, is
often arbitrary. Thus a student is not necessarily wrong if he or she deviates
from that order. Moreover, MYCIN's explicit knowledge about medicine is
often less complete than what a tutor needs to convey to a student. It is
associational knowledge and does not represent causal relationships ex-
plicitly. The causal models have been "compiled into" the associations.
Thus MYCIN cannot justify an inference from A to B in terms of a causal
chain, A → A1 → A2 → B. A student, therefore, is left with an incomplete,
and easily forgotten, model of the disease process. These two major short-
comings are discussed at length in Chapters 26 and 29.

25.2 Recent Work

Complementing the studies of differences between experts and novices, as
well as our own work at Stanford on systems that explain their reasoning,
our recent work has shown that expert systems must represent knowledge
in a special way if it is to be used for teaching (Chapter 29). First, the program
must convey organizations and approaches that are useful to the student; this ar-
gues for a knowledge base that reflects ways of thinking used by people
(the hypothesis formation approach). Second, various kinds of knowledge must
be separated out and made explicit so reasoning steps can be carefully articulated--
the expert's associations must be decomposed into structural and strategic
components. Under our current contract with ONR, such an expert sys-
tem, called NEOMYCIN, has been constructed (Clancey and Letsinger,
1981). It is being readied for use with students through both active devel-
opment of its knowledge base and construction of modeling programs that
will use it as a basis for interpreting student behavior.
The ultimate goal of our work in the past few years has been to use
NEOMYCIN for directly teaching diagnostic problem solving to students.
Students will have the usual classroom background but will be exposed in
this tutoring system to a way of thinking about and organizing their text-
book knowledge that is usually taught only informally in apprenticeship
settings. That is, we are beginning to capture in an expert system what we
deem to be the essential knowledge that separates the expert from the
novice and teaching it to the novice in practice sessions in which its value
for getting a handle on difficult, confusing problems will be readily ap-
parent. Empirical studies are a key part of this research.
We view our work as the logical "next step" in knowledge-based tu-
toring. Just as representing expert knowledge in a simulation program
provides a vehicle for testing hypotheses about how people reason, using
this knowledge in a tutoring system will enable us to see how the knowledge
might be explained and recognized in student behavior. The experience
with the first version of GUIDON, as detailed further in Chapter 26, il-
lustrates how the tutoring framework provides a "forcing function" that
requires us to clarify what we want to teach and how we want to teach it.
During 1979-1980 a study was undertaken to determine how an
expert remembered MYCIN's rules (the "model of process" glue) and how
he or she remembered to use them. This study utilized several common AI
methods for knowledge acquisition but built upon them significantly
through the development of an epistemological framework for character-
izing kinds of knowledge, detailed in Chapter 29. The expert's explanations
were characterized in terms of strategy, structure, inference rule, and sup-
easily focused, and experiments were devised for filling in the gaps in what
we were told.
By the end of 1980, we had formulated and implemented a new, com-
prehensive psychological model of medical diagnosis (Clancey and Letsin-
ger, 1981) based on extensive discussions with Dr. Tim Beckett. NEO-
MYCIN is a consultation program in which MYCIN's rules are
reconfigured according to our epistemological framework. That is, the
knowledge representation separates out the inference rules (simple asso-
ciations among data and hypotheses) from the structural and strategic
knowledge: we separate out what a heuristic is from when it is to be applied.
Moreover, the strategies and structure we have chosen model how an ex-
pert reasons. We have attempted to capture the expert's forward-directed
inferences, "diagnostic task structure," and the types of focusing strategies
he or she uses. This explicit formulation of diagnostic strategy in the form
of meta-rules is exactly the material that our original proposal only men-
tioned as a hopeful aside. Recently, we have been fine-tuning NEOMYCIN,
investigating its applicability to other domains, and exploiting it as the
foundation of a student model.
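The separation NEOMYCIN makes can be pictured schematically. The sketch below is not NEOMYCIN's representation (which is far richer, and in Lisp); all rule content is invented. It only illustrates the distinction between domain inference rules (data-to-hypothesis associations) and a meta-rule stating *when* rules should be applied.

```python
# Minimal sketch, in the spirit of NEOMYCIN, of keeping strategic
# meta-rules separate from domain rules. All content is invented.

# Domain rules: simple associations among data and hypotheses.
domain_rules = [
    {"id": "R1", "evidence": "headache",   "hypothesis": "meningitis"},
    {"id": "R2", "evidence": "stiff-neck", "hypothesis": "meningitis"},
    {"id": "R3", "evidence": "cough",      "hypothesis": "pneumonia"},
]

def metarule_focus(active_hypotheses, rules):
    """A strategic meta-rule: prefer domain rules bearing on hypotheses
    already under consideration (a crude 'focus the differential'
    strategy). Note it mentions no particular disease or finding."""
    return [r for r in rules if r["hypothesis"] in active_hypotheses]

differential = {"meningitis"}
print([r["id"] for r in metarule_focus(differential, domain_rules)])
# ['R1', 'R2']
```

Because the meta-rule is stated over the *structure* of rules rather than their medical content, the same strategy can in principle be reused in another domain, which is exactly the portability claim made for NEOMYCIN's diagnostic strategy.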

25.3 Multiple Uses of the Same Knowledge

From a slightly different perspective, we were also interested in exploring
the question of whether or not one knowledge base could be used for
multiple purposes. From the DENDRAL and MYCIN experiences, we

Predictive Rule for DENDRAL:

IF the molecular structure contains the subgraph
R1--C(=O)--R2 (where R1 and R2 represent any substructures)
THEN predict that the molecule will fragment in the mass spectrometer at either
side of the carbon atom, retaining the positive charge on the C=O group.

Corresponding Interpretive Rule for DENDRAL:

IF the mass spectrum shows data points at masses x1 and x2 such that the sum
of x1 and x2 is the molecular weight plus 28 mass units (the overlapping C=O
group) and at least one of the two peaks is high (because the fragmentation is
favorable)
THEN infer that the molecular structure contains the subgraph
R1--C(=O)--R2
where the masses of R1 and R2 are just (x1 - 28) and (x2 - 28).

FIGURE 25-2 Two forms of the same knowledge in DEN-
DRAL.

were painfully aware of how difficult it is for experts to build a single
knowledge base capable of supporting high performance in reasoning. Yet
there are many related reasoning tasks in any domain for which one knowl-
edge base would be important. We had been troubled, for example, by the
fact that DENDRAL's predictive rules of mass spectrometry had to be recast
to serve as interpretive rules.³ Prediction is from cause to effect; interpre-
tation depends on inferences from effects to causes. An example from
DENDRAL is shown in Figure 25-2. When we began working on MYCIN,
we were thus already sensitized to the issue of avoiding the work of re-
casting MYCIN's interpretive rules in a form suitable for teaching or other
purposes.
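The arithmetic in the interpretive rule of Figure 25-2 can be made concrete with a short sketch. This is illustrative code, not DENDRAL's implementation; the spectrum values and the "high peak" intensity threshold are invented for the example.

```python
# Sketch of the interpretive direction in Figure 25-2: reasoning from
# mass-spectrum peaks back to a ketone subgraph R1-C(=O)-R2.

CO_MASS = 28  # mass of the overlapping C=O group

def infer_ketone_fragments(peaks, molecular_weight, high_threshold=100):
    """Find peak pairs (x1, x2) with x1 + x2 == MW + 28 where at least
    one peak is high; return the inferred masses of R1 and R2."""
    inferences = []
    for x1 in sorted(peaks):
        x2 = molecular_weight + CO_MASS - x1
        if x2 in peaks and x1 <= x2:
            if peaks[x1] >= high_threshold or peaks[x2] >= high_threshold:
                inferences.append((x1 - CO_MASS, x2 - CO_MASS))
    return inferences

# peaks: mass -> intensity. For MW 114, peaks at 43 and 99 satisfy
# 43 + 99 = 142 = 114 + 28, so R1 and R2 have masses 15 and 71.
spectrum = {43: 250, 99: 40, 58: 10}
print(infer_ketone_fragments(spectrum, 114))  # [(15, 71)]
```

Note how the predictive rule's causal content (fragmentation on either side of the carbonyl) appears here only implicitly, as the "+28" constant: exactly the sense in which interpretive rules are a recasting of the predictive model.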
The GUIDON program discussed in the next chapter has at least three
important facets. First, GUIDON can be seen as an expert system in its
own right. Its expertise is in pedagogy, but it obviously needs a knowledge
base of medicine to teach from as well as a knowledge base about pedagogy.
Second, we had hoped that GUIDON would help us understand the prob-
lem of transfer of expertise. Webelieve there is some symmetry between
GUIDONstransferring medical knowledge to a student and an experts

³We experimented with two ways of using predictive rules for interpretation in DENDRAL:
(a) generate the interpretive rules automatically from the predictive model (Delfino et al.,
1970), and (b) simulate the behavior of a skeletal structure under all plausible substitutions
of substructures for the unnamed radicals in order to infer the structure and location of
substituents around the skeleton (Smith et al., 1972).
transferring his or her medical knowledge to MYCIN. We need to do much
more work here. And third, because professional educators cannot yet
provide a firm set of pedagogical rules and heuristics, GUIDON can also
be seen as a laboratory for experimenting with alternative teaching strat-
egies. In all three of these areas, the possibilities are exciting because of
the newness of the territory and frightening because of the expanse of
uncharted waters.
26
Use of MYCIN's Rules for Tutoring

William J. Clancey

How can we make the expertise of knowledge-based programs accessible
to students? Knowledge-based programs (Davis et al., 1977; Lenat, 1976;
Pople, 1977; Goldstein and Roberts, 1977) achieve high performance by
interpreting a specialized set of facts and domain relations in the context
of particular problems. These knowledge bases are generally built by in-
terviewing human experts to extract the knowledge they use to solve prob-
lems in their area of expertise. However, it is not clear that the organization
and level of abstraction of this performance knowledge is suitable for use
in a tutorial program.
A principal feature of MYCIN's formalism is the separation of the
knowledge base from the interpreter for applying it. This makes the knowl-
edge accessible for multiple uses, including explanation of reasoning
(Davis, 1976) and tutoring. In this chapter we explore the use of MYCIN's
knowledge base as the foundation of a tutorial system called GUIDON.
The goal of this project is to study the problem of transferring the exper-
tise of MYCIN-like systems to students. An important result of this study
is that although MYCIN-like rule-based expert systems constitute a good
basis for tutorial programs, they are not sufficient in themselves for making
knowledge accessible to students.
In GUIDON we have augmented the performance knowledge of rules
by adding two other levels: a support level to justify individual rules, and
an abstraction level to organize rules into patterns (see Section 26.3.3). The
GUIDON system also contains teaching expertise that is represented ex-
plicitly and that is independent of the contents of the knowledge base. This

This chapter is a shortened version of an article originally appearing in International Journal
of Man-Machine Studies 11:25-49 (1979). Copyright 1980 by Academic Press Inc (London)
Limited. Used with permission.


is expertise for carrying on a tutorial dialogue intended to present the
domain knowledge to a student in an organized way, over a number of
sessions. Section 26.2 describes design considerations for this tutorial dia-
logue, given the structure of the knowledge in MYCIN-like problem areas
(described in Section 26.1).
GUIDON is designed to transfer the expertise of MYCIN-like pro-
grams in an efficient, comprehensible way. In doing this, we overlap several
areas of research in intelligent computer-aided instruction (ICAI), includ-
ing means for structuring and planning a dialogue, generating teaching
material, constructing and verifying a model of what the student knows,
and explaining expert reasoning.
The nature of MYCIN-like knowledge bases makes it reasonable to
experiment with various teaching strategies. The representation of teach-
ing expertise in GUIDON is intended to provide a flexible framework for
such experimentation (Section 26.3). To illustrate the use of this framework
in the first version of GUIDON, we present in this chapter two sample
interactions and describe the domain knowledge and teaching strategies
used by the program (Section 26.4 and Section 26.5). The sample inter-
actions and rule listings were generated by the implemented program.

26.1 Description of the Knowledge Base

MYCIN's knowledge base of infectious diseases that we use for tutoring
has been built over four years through interactions with physicians. It
currently contains approximately 450 rules. In addition, there are several
hundred facts and relations stored in tables, which are referenced by the
rules. In this chapter, each precondition is called a subgoal. If all of the
subgoals in the premise can be achieved (shown to be true), then a conclu-
sion can be made about the goal in the action.
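This goal/subgoal structure can be made concrete with a toy goal-directed interpreter. The sketch below is illustrative only: MYCIN's rules carry certainty factors, its interpreter (in Interlisp) is far more elaborate, and the rule content here is invented. It does show the key architectural point of this chapter, that the rules are data kept separate from the interpreter that applies them.

```python
# Toy backward chainer over MYCIN-style rules. A goal is achieved if it
# is a known case fact, or if some rule concluding it has every premise
# subgoal achieved in turn. Rule content is invented for illustration.

rules = [
    {"premise": ["gram-negative", "rod"],
     "action": "enterobacteriaceae"},
    {"premise": ["enterobacteriaceae", "nosocomial"],
     "action": "e.coli"},
]

def achieved(goal, facts, rules):
    """Goal-directed (backward-chaining) evaluation of the rule set."""
    if goal in facts:
        return True
    return any(all(achieved(sub, facts, rules) for sub in r["premise"])
               for r in rules if r["action"] == goal)

case = {"gram-negative", "rod", "nosocomial"}
print(achieved("e.coli", case, rules))  # True
```

Because the `rules` list is plain data, the same structures can be inspected, translated to English, or traced by a tutor, which is the property GUIDON exploits.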
The tutoring system we are developing will also work with problems
and rules in another domain, assuming some parallels between the struc-
ture of the knowledge in the new domain and the structure of the existing
medical knowledge. Thus GUIDON is a multiple-domain tutorial program.
The overall configuration of this system is shown in Figure 26-1. One
advantage of this system is that a fixed set of teaching strategies can be
tried in different domains, affording an important perspective on their
generality. This method of integrating domain and teaching expertise is
quite distinct from the design of early frame-oriented computer-aided in-
struction (CAI) systems. For example, in the tutor for infectious diseases
by Feurzeig et al. (1964), medical and teaching expertise were "compiled"
together into the branching structure of the frames (dialogue/content sit-
uations). In GUIDON, domain and teaching expertise are decoupled and
stated explicitly.
[FIGURE 26-1: Modules for a multiple-domain tutorial system. An interpreter
applies the knowledge base to the case data; the resulting problem solution
and trace feed the teaching expertise module, which delivers instruction to
the student.]

26.2 Development of a Tutorial Program Based on MYCIN-like Systems

In addition to the domain knowledge of the expert program, a tutorial
program requires expertise about teaching, such as the ability to tailor the
presentation of domain knowledge to a student's competence and interests
(Brown and Goldstein, 1977). The GUIDON program, with its teaching
expertise and augmented domain knowledge, is designed to be an active,
intelligent agent that helps make the knowledge of MYCIN-like programs
accessible to students.
With the original MYCIN system, it was clear that even rudimentary
explanations of the system's reasoning could provide some instruction to
users. For example, one can ask why case data are being sought by the
program and how goals will be (were) achieved. However, we believe that
this is an inefficient way for a student to learn the contents of the knowl-
edge base. The MYCIN program is only a passive "teacher": it is necessary
for the student to ask an exhaustive series of questions in order to discover
all of the reasoning paths considered by the program. Moreover, the MY-
CIN program contains no model of the user, so program-generated ex-
planations are never tailored to his or her competence or interests. On the
other hand, GUIDON acts as an agent that keeps track of the knowledge
that has been presented to the student in previous sessions and looks for
opportunities to deepen and broaden the student's knowledge of MYCIN's
expertise. GUIDON's teaching expertise includes capabilities to measure a
student's competence and to use this measure as a basis for selecting knowl-
edge to present. Some of the basic questions involved in converting a rule-
based expert program into a tutorial program are:
What kind of dialogue might be suitable for teaching the knowledge of
MYCIN-like consultation systems?
What strategies for teaching will be useful?
Will these strategies be independent of the knowledge base content?
How will they be represented?
What additions to the performance knowledge of MYCIN-like systems
might be useful in a tutorial program?

As the first step in approaching these questions, the following sections
discuss some of the basic ways in which MYCIN's domain and formalism
have influenced design considerations for GUIDON. Section 26.2.1 de-
scribes the nature of the dialogue we have chosen for tutorial sessions.
Section 26.2.2 discusses the nature of MYCIN's performance knowledge
and argues for including additional domain knowledge in the tutorial pro-
gram. Sections 26.2.3 and 26.2.4 argue that the uncertainty of MYCIN's
knowledge and the size of its knowledge base make it desirable to have a
framework for experimenting with teaching strategies. This framework is
presented in Section 26.3.

26.2.1 A Goal-Directed Case Dialogue

In a GUIDON tutorial session, a student plays the role of a physician
consultant. A sick patient (the case) is described to the student in general
terms: age, sex, race, and lab reports about cultures taken at the site of the
infection. The student is expected to ask for other information that might
be relevant to this case. For example, did the patient become infected while
hospitalized? Did the patient ever live in the San Joaquin Valley? GUIDON
compares the student's questions to those asked by MYCIN and critiques
the student's line of reasoning. When the student draws hypotheses from
the evidence collected, GUIDON compares these conclusions to those that
MYCIN reached, given the same information about the patient. We refer
to this dialogue between the student and GUIDON as a case dialogue. Be-
cause GUIDON attempts to transfer expertise to students exclusively
through case dialogues, we call it a case method tutor.
GUIDON's purpose is to broaden the student's knowledge by pointing
out inappropriate lines of reasoning and suggesting approaches the stu-
dent did not consider. An important assumption is that the student has a
suitable background for solving the case; he or she knows the vocabulary
and the general form of the diagnostic task. The criterion for having
learned MYCIN's problem-solving methods is therefore straightforward:
when presented with novel, difficult cases, does the student seek relevant
data and draw appropriate conclusions?
Helping the student solve the case is greatly aided by placing con-
straints on the case dialogue. A goal-directed dialogue is a discussion of the
rules applied to achieve specific goals. In general, the topics of this dialogue
are precisely those "goals" that are concluded by MYCIN rules.¹ During
the dialogue, only one goal at a time is considered; data that cannot be
used in rules to achieve this goal are "irrelevant." This is a strong constraint
on the students process of asking questions and making hypotheses. A
goal-directed dialogue helps the tutor to follow the student as he or she
solves the problem, increasing the chance that timely assistance can be
provided.²
Our design of GUIDON has also been influenced by consideration of
the expected sophistication of the students using it. We assume the students
are well motivated and capable of a serious, mixed-initiative dialogue. Var-
ious features (not all described in this paper) make the program flexible,
so that students can use their judgment to control the depth and detail of
the discussion. These features include the capability to request:

descriptions of all data relevant to a particular goal
a subgoal tree for a goal
a quiz or hint relevant to the current goal
a concise summary of all evidence already discussed for a goal
discussion of a goal (of the student's choice)
conclusion of a discussion, with GUIDON finishing the collection of evi-
dence for the goal and indicating conclusions that the student might
have drawn

26.2.2 Single Form of Expertise

The problem of multiple forms of expertise has been important in ICAI
research. For example, when mechanistic reasoning is involved, qualitative
and quantitative forms of expertise may be useful to solve the problem
(Brown et al., 1976). De Kleer has found that strategies for debugging an
electronic circuit are "radically different" depending on whether one does
local mathematical analysis or uses a higher-level, functional analysis of
components (Brown et al., 1975). One might argue that a tutor for elec-
tronics should also be ready to recognize and generate arguments on both
of these levels.³
For all practical purposes, GUIDON does not need to be concerned
about multiple forms of expertise. This is primarily because reasoning in

¹A typical sequence of (nested) goals is as follows: (a) reach a diagnosis, (b) determine what
organisms might be causing the infection, (c) determine the type of infection, (d) determine
if the infection has been partially treated, etc.
²Sleeman uses a similar approach for allowing a student to explore algorithms (Sleeman,
1977).
³See Carr and Goldstein (1977) for a related discussion.
infectious disease problem solving is based on judgments about empirical
information, rather than on arguments based on causal mechanisms (Weiss
et al., 1978). MYCIN's judgments are "cookbook" responses that address
the data directly, as opposed to attempting to explain it in terms of phys-
iological mechanisms. Moreover, the expertise to solve a MYCIN case on
this level of abstraction constitutes a "closed" world (Carbonell and Collins,
1973): all of the objects, attributes, and values that are relevant to the
solution of a case are determined by a MYCIN consultation that is per-
formed before a tutorial session begins.⁴
Even though MYCIN's domain makes it possible for cases to be solved
without recourse to the level of physiological mechanisms, a student may
find it useful to know this support knowledge that lies behind the rules.
Section 26.3.3 describes the domain knowledge we have added to MYCIN's
performance knowledge in developing GUIDON.

26.2.3 Weak Model of Inquiry

Even though the MYCIN world can be considered to be closed, there is
no strong model for ordering the collection of evidence.⁵ Medical problem
solving is still an art. While there are some conventions to ensure that all
routine data are collected, physicians have no agreed-upon basis for nu-
merically optimizing the decision of what to do next.⁶ During a tutoring
session, it is not only difficult to tell a student what is the "next best" piece
of evidence to gather but also difficult to decide what to say about the
evidence-gathering strategy. For example, when offering assistance, should
the tutor suggest the domain rule that most confirms the evidence already
collected or a rule that contradicts this evidence?⁷

26.2.4 Large Number of Rules

MYCIN provides to GUIDON an AND/OR tree of goals (the OR nodes)
and rules (the AND nodes) that were pursued during a consultation on a
case. This tree constitutes a trace of the application of the knowledge base

⁴There is always the possibility that a student may present an exotic case to GUIDON that is
beyond its expertise. While MYCIN has been designed to detect simple instances of this (i.e.,
evidence of an infection other than bacteremia or meningitis), we decided to restrict GUIDON
tutorials to the physician-approved cases in the library (currently over 100 cases).
⁵In the WUMPUS program (Carr and Goldstein, 1977), for example, it is possible to rank
each legal move (analogous to seeking case data in MYCIN) and so rate the student according
to "rejected inferior moves" and "missed superior moves." The same analysis is possible in
the WEST program (Burton, 1979).
⁶See, for example, Sprosty (1963).
⁷MYCIN's rules are not based on Bayesian probabilities, so it is not possible to use optimization
techniques like those developed by Hartley et al. (1972). Arguments against using Bayes'
Theorem in expert systems can be found in Chapter 11.

to the given case.⁸ Many of the 450 rules are not tried because they con-
clude about goals that do not need to be pursued to solve the case.
Hundreds of others fail to apply because one or more preconditions are
not satisfied. Finally, 20% of the rules typically make conclusions that con-
tribute varying degrees of belief about the goals pursued.
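The three categories of rules described above can be pictured with a schematic sketch. This is not GUIDON's data structure; the rule content, the crude all-preconditions-true test, and the labels are invented to illustrate how a trace partitions the rule set.

```python
# Schematic partition of a rule set against a case trace, mirroring the
# three categories in the text: not tried (goal never pursued), failed
# (a precondition false), or contributed (premise satisfied).

def classify(rules, pursued_goals, facts):
    """Label each rule according to its role in the case solution."""
    status = {}
    for r in rules:
        if r["action"] not in pursued_goals:
            status[r["id"]] = "not tried"
        elif all(sub in facts for sub in r["premise"]):
            status[r["id"]] = "contributed"
        else:
            status[r["id"]] = "failed"
    return status

rules = [
    {"id": "R1", "premise": ["fever"], "action": "infection"},
    {"id": "R2", "premise": ["burn"],  "action": "infection"},
    {"id": "R3", "premise": ["rash"],  "action": "allergy"},
]
print(classify(rules, {"infection"}, {"fever"}))
# {'R1': 'contributed', 'R2': 'failed', 'R3': 'not tried'}
```

A tutor consulting such a partition still faces the questions raised in the following paragraph: the classification says what happened, not which of the failed or untried rules are worth discussing with a student.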
Thus MYCIN's interpreter provides the tutorial program with much
information about the case solution (see Figure 26-1). It is not clear how
to present this to a student. What should the tutor do when the student
pursues a goal that MYCIN did not pursue? (Interrupt? Wait until the
student realizes that the goal contributes no useful information?) Which
dead-end search paths pursued by MYCIN should the tutor expect the
student to consider? For many goals there are too many rules to discuss
with the student; how is the tutor to decide which to present and which to
omit? What techniques can be used to produce coherent plans for guiding
the discussion through lines of reasoning used by the program? One so-
lution is to have a framework that allows guiding the dialogue in different
ways. The rest of this paper shows how GUIDON has been given this
flexibility by viewing it as a discourse program.

26.3 A Framework for a Case Method Tutorial Program

One purpose of this tutorial project is to provide a framework for testing
teaching methods. Therefore, we have chosen an implementation that
makes it possible to vary the strategies that the tutor uses for guiding the
dialogue. Using methods similar to those used in knowledge-based
programs, we have formalized the tutorial program in rules and procedures
that codify expertise for carrying on a case dialogue.
This section is a relatively abstract discussion of the kinds of knowledge
needed to guide a discourse and the representation of that knowledge.
The reader may find it useful to consider the sample dialogues in Figures
26-6 and 26-7 before proceeding.

8Before a tutorial session, GUIDON scans each rule used by MYCIN and compiles a list of
all subgoals that needed to be achieved before the premise of the rule could be evaluated.
In the case of a rule that failed to apply, GUIDON determines all preconditions of the premise
that are false. By doing this, GUIDON's knowledge of the case is independent of the order
in which questions were asked and rules were applied by MYCIN, so topics can be easily
changed and the depth of discussion controlled flexibly by both GUIDON and the student.
This process of automatically generating a solution trace for any case can be contrasted with
SOPHIE's single, fixed, simulated circuit (Brown et al., 1976).
A Framework for a Case Method Tutorial Program 471

26.3.1 Discourse Knowledge

Our implementation of GUIDON's dialogue capabilities makes use of
knowledge obtained from studies of discourse in AI (Bobrow et al., 1977;
Bruce, 1975; Deutsch, 1974; Winograd, 1977). To quote Bruce (1975,
emphasis added):

[It is] ... useful to have a model of how social interactions typically fit
together, and thus a model of discourse structure. Such a model can be
viewed as a heuristic which suggests likely action sequences .... There are
places in a discourse where questions make sense, others where explanations
are expected. [These paradigms] ... facilitate generation and subsequent
understanding.

Based on Winograd's analysis of discourse (Winograd, 1977), it appears
desirable for a case method tutor to have the following forms of
knowledge for carrying on a dialogue:

Knowledge about dialogue patterns. Faught (1977) mentions two types of
patterns: interpretation patterns (to understand a speaker), and action
patterns (to generate utterances). GUIDON uses action patterns represented
as discourse procedures for directing and focusing the case dialogue.
These are the action sequences mentioned by Bruce. They are invoked by
tutoring rules,9 discussed in Section 26.3.2.
Forms of domain knowledge for carrying on a specific dialogue. Section
26.3.3 surveys the augmented domain knowledge available to GUIDON.
Knowledge of the communication situation. This includes the tutorial
program's understanding of the student's intentions and knowledge, as well
as the tutor's intentions for carrying on the dialogue. These components
are represented in GUIDON by an overlay student model (in which the
student's knowledge is viewed as a subset of the expert program's), a
lesson plan (a plan of topics to be discussed, created by the tutor for each
case), and a focus record (to keep track of factors in which the student
has shown interest recently) (Section 26.3.4). Knowledge of the
communication situation controls the use of dialogue patterns.

The following sections give details about these forms of knowledge.

9Because of the constraints a goal-directed dialogue imposes on the student, we have not
found it necessary to use interpretation patterns at this time. They might be useful to follow
the student's reasoning in a dialogue that is not goal-directed.

26.3.2 Dialogue Patterns: Discourse Procedures and Tutoring Rules

The sequences of actions in discourse procedures serve as an ordered list
of options--types of remarks for the program to consider making. For
example, the procedure for discussing a domain rule (hereafter, d-rule)
includes a step that indicates to "consider mentioning d-rules related to
the one just discussed." Thus a discourse procedure step specifies in a
schematic form when a type of remark might be appropriate. Whether to
take the option (e.g., is there an "interesting" d-rule to mention?) and what
to say exactly (the discourse pattern for mentioning the d-rule) will be
dynamically determined by tutoring rules (hereafter, t-rules) whose
preconditions refer to the student model, case lesson plan, and focus record
(hereafter referred to jointly as the communication model).
T-rules are generally invoked as a packet to achieve some tutorial goal.10
T-rule packets are of two types:

T-rules for accumulating belief. Updating the communication model and
determining how "interesting" a topic is are two examples.11 Generally,
a packet of t-rules of this type is applied exhaustively.
T-rules for selecting a discourse procedure to follow. Generally, a packet of
this type stops trying t-rules when the first one succeeds. The form of
t-rules of this type is shown in Figure 26-2. Knowledge referenced in
the premise part of a t-rule of this type is described in subsequent
sections. The action part of these t-rules consists of stylized code, just
like the steps of a discourse procedure.12 A step may invoke:
a. a packet of t-rules, e.g., to select a question format for presenting a
given d-rule
b. a discourse procedure, e.g., to discuss sequentially each precondition
of a d-rule
c. a primitive function, e.g., to accept a question from the student,
perform bookkeeping, etc.
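The two packet regimes can be sketched in a few lines. The Python rendering below is illustrative only: GUIDON's packets are stylized Interlisp procedures, and both function names here are invented.

```python
# Illustrative sketch of the two t-rule packet regimes described above.
# A t-rule is modeled as a (premise, action) pair of callables over the
# communication model; names are hypothetical, not GUIDON's.

def apply_exhaustively(packet, model):
    """Belief-accumulation packets: every t-rule whose premise holds
    contributes its action (e.g., an update to the communication model)."""
    results = []
    for premise, action in packet:
        if premise(model):
            results.append(action(model))
    return results

def apply_until_success(packet, model):
    """Procedure-selection packets: stop at the first t-rule that applies,
    returning the discourse procedure (or other step) it selects."""
    for premise, action in packet:
        if premise(model):
            return action(model)
    return None  # no t-rule applied; caller falls back to a default
```

A belief-accumulation packet thus behaves like a weighted vote over all applicable t-rules, while a selection packet behaves like an ordered case analysis.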

Below is an outline of the t-rules currently implemented in GUIDON.
Except where noted, examples of these t-rules are presented in discussions
of the sample tutorial dialogues in this chapter.

10Packets are implemented as stylized Interlisp procedures. This should be contrasted with
the interpreter used by the expert program that invokes d-rules directly, indexing them
according to the goal that needs to be determined.
11GUIDON uses MYCIN's certainty factors (Chapter 11) for representing the program's belief
in an assertion.
12Discourse procedure steps also contain control information (e.g., for iteration) that is not
important to this discussion.

PREMISE: Domain Knowledge Reference
         Communication Model Reference
           -- Overlay Student Model
           -- Case lesson plan
           -- Focus Record

ACTION:  Discourse Procedure
           -- T-rule Packet
           -- Discourse Procedure
           -- Primitive Function

FIGURE 26-2 Form of a tutorial rule for selecting a discourse procedure.

1. T-rules for selecting discourse patterns
   a. guiding discussion of a d-rule
   b. responding to a student hypothesis
   c. choosing question formats

2. T-rules for choosing domain knowledge
   a. providing orientation for pursuing new goals (not demonstrated in
      this paper)
   b. measuring interestingness of d-rules

3. T-rules for maintaining the communication model
   a. updating the overlay model when d-rules fire
   b. updating the overlay model during hypothesis evaluation
   c. creating a lesson plan (not implemented)

All t-rules are translated by a program directly from the Interlisp
source code, using an extension of the technique used for translating
MYCIN's rules. This accounts for some of the stilted prose in the examples
that follow.

I. META-LEVEL ABSTRACTIONS: rule models
                            rule schemata

II. PERFORMANCE: rules
                 lists and tables

III. SUPPORT: definitions
              mechanism descriptions
              justifications
              literature references

FIGURE 26-3 Organization of domain knowledge into three tiers.

26.3.3 Augmented Representation of Domain Knowledge

The representation of domain knowledge available to GUIDON can be
organized in three tiers, as shown in Figure 26-3. Subsequent subsections
briefly describe the components of each tier, starting with the middle one.

Performance Tier

The performance knowledge consists of all the rules and tables used by
MYCIN to make goal-directed conclusions about the initial case data. The
output of the consultation is passed to the tutor: an extensive AND/OR
tree of traces showing which rules were applied, their conclusions, and the
case data required to apply them. GUIDON fills in this tree by determining
which subgoals appear in the rules. In Figure 26-4 COVERFOR signifies
the goal to determine which organisms should be "covered" by a therapy
recommendation; d-rule 578, shown in Figure 26-5, concludes about this
goal; BURNED is a subgoal of this rule.

Tutorial rules make frequent reference to this data structure in order
to guide the dialogue. For example, the response to the request for help
shown in Figure 26-6 (line 17) is based first of all on the rules that were
used by MYCIN for the current goal. Similarly, the t-rules for supplying
the case data requested by the student check to see if MYCIN asked for
the same information, e.g., the WBC (white blood count) in the sample

COVERFOR
  D-RULE 578
    BURNED
    TYPE
      {rules}
        WBC
        CSF-FINDINGS

FIGURE 26-4 The portion of the AND/OR tree of goals and
rules created by the expert program that is relevant to the
dialogue shown in Figure 26-6. Figure 26-5 shows the contents of
d-rule 578.

dialogue of Figure 26-6.13 Associated documentation for d-rule 578 is also
shown in Figure 26-5.
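The trace the tutor consults can be pictured as a goal-indexed table of rule applications. The sketch below is a hypothetical Python rendering of such an AND/OR trace, not MYCIN's actual data structure; the field names are invented, and the factor names are taken from Figures 26-4 and 26-5.

```python
# A toy rendering of the AND/OR solution trace passed from MYCIN to
# GUIDON. Goal (OR) nodes map to the rules tried for them; each rule
# (AND) node lists the factors its premise references. A factor that
# itself appears as a goal in the trace is a subgoal.

def subgoals_of(trace, goal):
    """Collect every subgoal reachable from `goal` in the trace."""
    found = []
    for rule in trace.get(goal, []):
        for factor in rule["uses"]:
            if factor in trace:        # other rules conclude it: a subgoal
                found.append(factor)
                found.extend(subgoals_of(trace, factor))
    return found

# Fragment mirroring Figure 26-4: d-rule 578 concludes about COVERFOR
# and needs the subgoal TYPE, which is concluded from case data.
trace = {
    "COVERFOR": [{"rule": 578, "uses": ["TREATINF", "ORGSEEN", "TYPE", "BURNED"]}],
    "TYPE":     [{"rule": "...", "uses": ["WBC", "CSF-FINDINGS"]}],
}
```

Under this encoding, `subgoals_of(trace, "COVERFOR")` yields TYPE but not BURNED, matching the text's observation that TYPE must be pursued by further rules while BURNED is raw case data.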

Support Tier

The support tier of the knowledge base consists of annotations to the rules
and the factors used by them.14 For example, there are "canned-text"
descriptions of every laboratory test in the MYCIN domain, including, for
instance, remarks about how the test should be performed. Mechanism
descriptions provided by the domain expert are used to provide some
explanation of a rule beyond the canned text of the justification. For the
infectious disease domain of MYCIN, they indicate how a given factor leads

13Other possibilities include: the question is not relevant to the current goal; the case data
can be deduced by definition from other known data; or a d-rule indicates that the requested
data are not relevant to this case.
14Rule justifications, author, and edit date were first proposed by Davis (1976) as knowledge
base maintenance records.

Abstraction Level
RULE-SCHEMA: MENINGITIS.COVERFOR.CLINICAL
RULE-MODEL: COVERFOR-IS-MODEL
KEY-FACTOR: BURNED
DUAL: D-RULE 577

Performance Level
D-RULE 578
IF: 1) The infection which requires therapy is meningitis, and
    2) Organisms were not seen on the stain of the culture, and
    3) The type of the infection is bacterial, and
    4) The patient has been seriously burned
THEN: There is suggestive evidence (.5) that pseudomonas-aeruginosa is one of the organisms
      (other than those seen on cultures or smears) which might be causing the infection
UPDATES: COVERFOR
USES: (TREATINF ORGSEEN TYPE BURNED)

Support Level
MECHANISM-FRAME: BODY-INFECTION.WOUNDS
JUSTIFICATION:
"For a very brief period of time after a severe burn the surface of the wound is sterile. Shortly
thereafter, the area becomes colonized by a mixed flora in which gram-positive organisms predominate. By
the 3rd post-burn day this bacterial population becomes dominated by gram-negative organisms. By the
5th day these organisms have invaded tissue well beneath the surface of the burn. The organisms most
commonly isolated from burn patients are Pseudomonas, Klebsiella-Enterobacter, Staph., etc. Infection
with Pseudomonas is frequently fatal."
LITERATURE:
MacMillan BG: Ecology of Bacteria Colonizing the Burned Patient Given Topical and Systemic
Gentamicin Therapy: a five-year study, J Infect Dis 124:278-286, 1971.
AUTHOR: Dr. Victor Yu
LAST-CHANGE: Sept. 8, 1976

FIGURE 26-5 Domain rule 578 and its associated documentation.
(All information is provided by a domain expert, except
for the key factor, which is computed by the tutor from the rule
schema and contents of the particular rule. See third subsection
of Section 26.3.3.)

to a particular infection with particular organisms by stating the origin of
the organism and the favorable conditions for its growth at the site of the
infection. Thus the frame associated with the factor "a seriously burned
patient" shows that the organisms originate in the air and grow in the
exposed tissue of a burn, resulting in a frequently fatal infection.

Abstraction Tier

The abstraction tier of the knowledge base represents patterns in the
performance knowledge. For example, a rule schema is a description of a kind
of rule: a pattern of preconditions that appears in the premise, the goal
concluded, and the context of its application. The schema and a canned-text
annotation of its significance are formalized in the MYCIN knowledge
base by a physician expert. This schema is used by the tutor to "subtract
off" the rule preconditions common to all rules of the type, leaving behind
the factors that are specific to this particular rule, i.e., the key factors of this
rule. Thus the key factor of d-rule 578 (see Figure 26-5), the fact that the
patient has been seriously burned, was determined by removing the
"contextual" information of the name of the infection, whether organisms were
seen, and the type of the infection. (Examples of the use of key factors
occur throughout the hypothesis evaluation example in Figure 26-7,
particularly lines 4-9.)
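The "subtract off" computation is essentially a set difference over a rule's factors. A minimal sketch, assuming the schema supplies the shared contextual factors (the factor names are those of d-rule 578 in Figure 26-5; the function name is invented):

```python
# Sketch of key-factor computation: remove the preconditions the rule
# schema says are common to all rules of this type, keeping whatever is
# specific to the particular rule.

def key_factors(rule_uses, schema_context):
    """Factors of the rule not accounted for by its schema's context."""
    return [f for f in rule_uses if f not in schema_context]

# Schema MENINGITIS.COVERFOR.CLINICAL supplies the contextual factors
# (infection name, organisms seen, infection type), leaving BURNED.
schema_context = {"TREATINF", "ORGSEEN", "TYPE"}
assert key_factors(["TREATINF", "ORGSEEN", "TYPE", "BURNED"],
                   schema_context) == ["BURNED"]
```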
Rule models (Davis, 1976) are program-generated patterns that represent
the typical clusters of factors in the expert's rules. Unlike rule schemata,
rule models do not necessarily correspond to domain concepts, although
they do represent factors that tend to appear together in domain
arguments (rules). For example, the gram stain of an organism and its
morphology tend to appear together in rules for determining the identity
of an organism. Because rule models capture the factors that most
commonly appear in rules for pursuing a goal, they are valuable as a form of
orientation for naive students.

Use of Meta-Knowledge in Tutorial Rules

Meta-knowledge of the representation and application of d-rules plays an
important role in t-rules. For example, in the dialogue excerpt shown in
Figure 26-6 GUIDON uses function templates15 to "read" d-rule 578 and
discovers that the type of the infection is a subgoal that needs to be
completed before the d-rule can be applied. This capability to examine the
domain knowledge and reason about its use enables GUIDON to make
multiple use of any given production rule during the tutorial session. Here
are some uses we have implemented:

examine the rule (if it was tried in the consultation) and determine what
subgoals needed to be achieved before it could be applied; if the rule
failed to apply, determine all possible ways this could be determined
(perhaps more than one precondition is false)
examine the state of application of the rule during a tutorial interaction
(what more needs to be done before it can be applied?) and choose an
appropriate method of presentation
generate different questions for the student
use the rule (and variations of it) to understand a student's hypothesis
summarize arguments using the rule by extracting the key point it
addresses
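Template-driven reading of a premise can be sketched as follows. The template table and the list encoding of premise clauses are simplified stand-ins for the Interlisp forms (the actual templates are described in Chapter 28); the function and predicate names here are illustrative.

```python
# Sketch of template-driven "reading" of a d-rule premise: the template
# for each predicate function tells the reader which argument position
# names the clinical parameter, so the parameters (and hence potential
# subgoals) can be extracted without evaluating the premise.

TEMPLATES = {
    # hypothetical template: (context, parameter, value)
    "SAME": ("context", "parameter", "value"),
}

def premise_parameters(premise):
    """Return the parameter referenced by each clause of a premise."""
    params = []
    for clause in premise:
        fn, args = clause[0], clause[1:]
        template = TEMPLATES[fn]
        params.append(args[template.index("parameter")])
    return params

# Two clauses loosely modeled on d-rule 578's premise.
premise_578 = [("SAME", "CNTXT", "TREATINF", "MENINGITIS"),
               ("SAME", "CNTXT", "TYPE", "BACTERIAL")]
```

Having extracted the parameters, the tutor can check each against the solution trace to see which are still-incomplete subgoals, which is how it can decide, for example, to tell the student to pursue the type of the infection.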

15A function's template "indicates the order and generic type of the arguments in a typical
call of that function" (see Chapter 28).

The ability to use domain knowledge in multiple ways is an important
feature of a "generative" tutor like GUIDON.16 Flexible use of knowledge
permits us to write a variety of tutoring rules that select and present
teaching material in multiple ways. This is important because we want to use
the MYCIN/GUIDON system for experimenting with teaching strategies.

26.3.4 Components of the Communication Model

The components of the communication model are

1. an overlay student model,
2. a case lesson plan, and
3. a focus record.

The Overlay Student Model

The d-rules that were fired during the consultation associated with the
given case are run in a forward direction as the student is given case data.17
In this way, GUIDON knows at every moment what the expert program
would conclude based on the evidence available to the student. We make
use of knowledge about the history and competence of the student to form
hypotheses about which of the expert's conclusions are probably known to
the student. This has been termed an overlay model of the student by
Goldstein, because the student's knowledge is modeled in terms of a subset and
simple variations of the expert rule base (Goldstein, 1977). Our work was
originally motivated by the structural model used in the WEST system
(Burton and Brown, 1982).
Special t-rules for updating the overlay model are invoked whenever
the expert program successfully applies a d-rule. These t-rules must decide
whether the student has reached the same conclusion. This decision is
based on:

the inherent complexity of the d-rule (e.g., some rules are trivial
definitions, others have involved iterations),
whether the tutor believes that the student knows how to achieve the
subgoals that appear in the d-rule (factors that require the application
of rules),
background of the student (e.g., year of medical school, intern, etc.), and
evidence gathered in previous interactions with the student.
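One way to picture such a decision is as a scoring function over these considerations. The sketch below is invented for illustration: GUIDON expresses this as a packet of t-rules, and the particular weights and field names here are not GUIDON's.

```python
# Illustrative scoring of the belief that a student has reached the same
# conclusion as a fired d-rule, from the considerations listed above:
# rule complexity, mastery of the rule's subgoals, and student background.
# Weights are hypothetical.

def believes_student_knows(rule, student):
    """Certainty-factor-style belief (-1..1) that the student applied
    the rule. `rule["complexity"]` is 0 (trivial) to 1 (involved)."""
    belief = 1.0 - rule["complexity"]        # trivial rules presumed known
    for subgoal in rule["subgoals"]:
        if subgoal not in student["mastered"]:
            belief -= 0.4                    # unfamiliar subgoal
    belief += 0.1 * student["year"]          # background adjustment
    return max(-1.0, min(1.0, belief))
```

The important structural point, independent of the weights, is that the estimate is recomputed per rule and per student, so the overlay model evolves as evidence accumulates.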

16Generative CAI programs select and transform domain knowledge in order to generate
individualized teaching material. See Koffman and Blount (1973) for discussion.
17This is one application of the problem solution trace. The structure of this trace permits
the program to repetitively reconsider d-rules (indexing them by the case data referenced in
the premise part), without the high cost of reinterpreting premises from scratch.

These considerations are analogous to those used by Carr and Goldstein
for the WUMPUS tutor (Carr and Goldstein, 1977).

The Case Lesson Plan

Before a human tutor discusses a case with a student, he or she has an
idea of what should be discussed, given the constraints of time and the
student's interests and capabilities. Similarly, in later versions of GUIDON
a lesson plan will be generated before each case session.18 We'd like the
lesson plan to give GUIDON a global sense about the value of discussing
particular topics, especially since the depth of emphasis will impact on the
student's understanding of the problem's solution. The lesson plan of the
type we are proposing provides consistency and goal-directedness to the
tutor's presentations.
The lesson plan will be derived from:
The lesson plan will be derived from:

The student model: where does the student need instruction?


Professed student interests (perhaps the case was chosen because of fea-
tures the student wants to know more about)
Intrinsic importance of topics: what part does this information play in
understanding the solution of the problem?
Extrinsic importance of topics: given the universe of cases, how inter-
esting is this topic? (A datum that is rarely available is probably worth
mentioning when it is known, no matter how insignificant the evidence
it contributes.)

We believe that these considerations will also be useful for implementing
automatic selection of cases from the consultation library.

The Focus Record

The purpose of the focus record is to maintain continuity during the
dialogue. It consists of a set of global variables that are set when the student
asks about particular goals and values for goals. T-rules reference these
variables when selecting d-rules to mention or when motivating a change
in the goal being discussed. An example is provided in Section 26.4.1.

18Goldstein's "syllabus" and BIP's "Curriculum Information Network" are fixed networks that
relate skills in terms of their complexities and dependencies. The lesson plan discussed here
is a program-generated plan for guiding discussion of a particular problem with a particular
student. We believe that a skill network relating MYCIN's rules will be useful for constructing
dialogue plans.

26.4 T-Rules for Guiding Discussion of a Goal

In this section we consider an excerpt from a dialogue and some of the
discourse procedures and tutoring rules involved. Suppose that a first-year
medical student has just read about treatment for burned patients
suspected to have a meningitis infection. His microbiology text mentioned
several organisms, but it wasn't clear to him how other factors such as the
age and degree of sickness of the patient might affect diagnosis of an actual
case. GUIDON is available to him, so he decides to ask the program to
select a relevant case from the MYCIN library for a tutorial session.
The program begins by invoking the discourse procedure CASE-DISCUSSION.
One of the first steps is to choose a case. At this point the
student described the case he wanted using keywords ("burned meningitis
patient").19 GUIDON selected the case and set the scene for the student.
There is one pending culture from the CSF (cerebral spinal fluid, where
meningitis infections occur). In the excerpt in Figure 26-6, the dialogue
has reached the subtask of determining the organisms that might be
causing the infection.
When the student requested help (line 17), the program had been
following the pattern for discussing a goal. The request for help led to the
invocation of tutoring rules. The teaching strategy represented by these
t-rules is to provide help for a goal by suggesting a d-rule to the student.
The discourse procedure that provides help in this context first invokes a
packet of t-rules that will choose a d-rule to mention to the student. The
second step is to invoke a packet of t-rules that will choose a presentation
method.

26.4.1 Choosing a D-Rule to Mention in Order to Provide Help for a Goal

D-rule 578 (Figure 26-5) was chosen because it became the focus of the
discussion when the student asked about the relevance of the "burned"
factor. That is, when the student asked the question in line 8, a variable
was set to indicate that the most recent factor referred to for this goal was
"burned" (the focus topic). Then when the packet of t-rules for choosing
a d-rule to present was invoked, the following t-rule succeeded:20

19Student input to the GUIDON program is in the form of menu options and simple English
phrases that are parsed using keyword analysis and pattern-matching routines developed for
MYCIN's question-answering module (see Chapter 18).
20T-rule numbers are of the form <procedure number that invokes the rule>.<index of the
rule>. Thus t-rule 26.03 is the third rule in discourse procedure number 26.
T-Rules for Guiding Discussion of a Goal 481

You should now proceed to ask questions that will enable you
to make a hypothesis about the organisms (other than those
seen on cultures or smears) which might be causing the infection.

{Initiative is given to the student; he or she may ask for more
data, make a hypothesis, or use a special option.}

** HOW IS THE FACT THAT THE PATIENT IS BURNED
RELEVANT?

10 {One option is to ask how a piece of data is relevant to the current goal.}

The fact that patient-326 is seriously burned will be used to
conclude that pseudomonas-aeruginosa is one of the organisms which
might be present (in addition to those seen on cultures or smears)
15 [Rule 578].

** HELP
{The HELP option is a request for assistance: the student asks
"where do I go from here?"}
20
Try to determine the type of the infection: bacterial,
fungal, viral, or Tb.

{To provide help, GUIDON chose: (1) a rule to discuss (d-rule 578 was chosen
25 because the dialogue just focused on it); and (2) a method of presentation--"tell
student to finish incomplete subgoals mentioned in the rule." Thus, a new topic
is set up, "the type of the infection." Initiative is returned to the student....}

** WHAT IS THE PATIENT'S WBC?
30
The white count from the patient's peripheral CBC is 1.9 thousand.

{The tutor observes that this question is relevant to the new
topic and provides the requested data. The dialogue continues in
35 this goal-directed manner...}

FIGURE 26-6 Sample interaction: gathering data. Input from
the student follows double asterisks; annotations appear in
italics within curly brackets. Lines are numbered for reference
within the text.

T-RULE 26.03
IF: The recent context of the dialogue mentioned either a "deeper" subgoal or a factor relevant to
    the current goal
THEN: Define the focus rule to be the d-rule that mentions this focus topic

This example illustrates how the communication model guides the
session by controlling t-rules. Often there is no obvious d-rule to suggest
to the student. It is then useful for the tutor to have some measure of the
interestingness of a d-rule at this time in the discussion. The t-rules
presented below are applied to a set of d-rule candidates, ranking them by
how strongly the tutor believes that they are interesting.

Change in Belief Is Interesting

One measure of interest is the contribution the d-rule would make to what
is currently known about the goal being discussed. If the d-rule contributes
evidence that raises the certainty of the determined value of the goal to
more than 0.2, we say that the value of the goal is now significant.21 This
contribution of evidence is especially interesting because it depends on
what evidence has already been considered.
As is true for all t-rules, this determination is a heuristic, which will
benefit from experimentation. In t-rule 25.01 we have attempted to
capture the intuitive notion that, in general, change in belief is interesting:
the more drastic the change, the more interesting the effect. The numbers
in the conclusion of t-rule 25.01 are certainty factors that indicate our belief
in this interestingness.
T-RULE 25.01
IF: The effect of applying the d-rule on the current value of the goal has been determined
THEN: The "value interest" of this d-rule depends on the effect of applying the d-rule as follows:
      a. if the value contributed is still insignificant then .05
      b. if a new insignificant value is contributed then .05
      c. if a new significant value is contributed then .50
      d. if a significant value is confirmed then .70
      e. if a new strongly significant value is contributed then .75
      f. if an insignificant value becomes significant then .80
      g. if an old value is now insignificant then .85
      h. if belief in an old value is strongly contradicted then .90
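T-rule 25.01's tabulation maps the effect of applying a d-rule to a certainty factor measuring "value interest." The transcription below uses the 0.2 significance threshold given in the text; the classifier is a simplified sketch (it omits the "strongly" graded cases, and its labels and function names are invented, not GUIDON's):

```python
# Value-interest table transcribed from t-rule 25.01 (labels shortened).
SIGNIFICANT = 0.2   # threshold for a "significant" value, from the text

VALUE_INTEREST = {
    "still-insignificant":   .05,   # clause a
    "new-insignificant":     .05,   # clause b
    "new-significant":       .50,   # clause c
    "significant-confirmed": .70,   # clause d
    "became-significant":    .80,   # clause f
    "became-insignificant":  .85,   # clause g
}

def effect_of(old_cf, new_cf, is_new_value):
    """Classify the effect of a d-rule on one value of the current goal
    (simplified; omits the 'strongly' cases e and h of t-rule 25.01)."""
    if is_new_value:
        return "new-significant" if new_cf > SIGNIFICANT else "new-insignificant"
    if old_cf > SIGNIFICANT and new_cf <= SIGNIFICANT:
        return "became-insignificant"
    if old_cf <= SIGNIFICANT and new_cf > SIGNIFICANT:
        return "became-significant"
    return "significant-confirmed" if new_cf > SIGNIFICANT else "still-insignificant"
```

Note how the ordering of the table realizes the stated intuition: the more drastic the change in belief, the higher the interest.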

Use of Special Facts or Relations Is Interesting

In contrast to that in t-rule 25.01, the measure of interest in t-rule 25.06
below is static. We'd like to make sure that the student knows the
information in tables used by the expert program, so we give special
consideration to a d-rule that references a table.

T-RULE 25.06
IF: The d-rule mentions a static table in its premise
THEN: Define the "content interest" to be .50

26.4.2 Guiding Discussion of a D-Rule

Returning to our example, after selecting d-rule 578, the tutor needed to
select a method for presenting it. The following t-rule was successfully
applied:

21For example, if the goal is the "organism causing the infection" and the certainty associated
with the value "pseudomonas" is 0.3, then this value is significant.
T-Rules for Responding to a Students Hypothesis 483

T-RULE 2.04
IF: 1) The number of factors appearing in the d-rule which need to be asked by the student is
       zero, and
    2) The number of subgoals remaining to be determined before the d-rule can be applied is
       equal to 1
THEN: Substep i. Say: subgoal-suggestion
      Substep ii. Discuss the goal with the student in a goal-directed mode [Proc001]
      Substep iii. Wrap up the discussion of the rule being considered [Proc017]

The premise of this t-rule indicates that all preconditions of the d-rule can
be evaluated, save one, and this d-rule precondition requires that other
d-rules be considered. The action part of this t-rule is a sequence of actions
to be followed, i.e., a discourse pattern. In particular, substep (i) resulted
in the program printing "try to determine the type of the infection ..."
(line 22).22 The discourse procedure invoked by substep (ii) will govern
discussion of the type of the infection (in simple terms, a new context is
set up for interpreting student questions and use of options). After the
type of the infection is discussed (relevant data are collected and
hypotheses drawn), the tutor will direct the dialogue to a discussion of the
conclusion to be drawn from d-rule 578.
Other methods for suggesting a d-rule are possible and are selected
by other t-rules in the packet that contains t-rule 2.04. For example, the
program could simply tell the student the conclusion of the d-rule (if the
d-rule can be evaluated based on data currently available to the student),
or quiz the student about the d-rule, or sequentially discuss each
precondition of the d-rule, and so on.

26.5 T-Rules for Responding to a Student's Hypothesis

A student's hypothesis provides the tutor with considerable evidence about
the student's knowledge of the d-rules. It is an excellent opportunity for
the tutor to update the model of the student's understanding and use that
model to make instructive remarks. The chief strategy behind the tutorial
remarks illustrated in Figure 26-7 is to state evidence directly, or to use
simple true-false questions when the overlay model shows that the student is
probably ignorant of the rules to apply (lines 10-16), or otherwise to use
fill-in and multiple choice questions that are intended to "entrap" the stu-

22"Say <label>" designates something the program will "say" to the student. The label is
useful for debugging, because every print statement is uniquely labeled.

{After presentation of relevant case data...}

** I THINK THAT THE ORGANISMS PRESENT ARE DIPLOCOCCUS-PNEUMONIAE,
PSEUDOMONAS-AERUGINOSA, AND NEISSERIA-MENINGITIDIS.
Good. The age of the patient gives evidence for concluding that
Diplococcus-Pneumoniae might be present (Rule 507). The fact that
the patient has been seriously burned is good reason to suspect
Pseudomonas-Aeruginosa (Rule 578). Both conclusions apply because
the infection which requires therapy is meningitis, no organisms were
seen on a gram stain of the culture, and the type of the infection
is bacterial.
10 The age of the patient also leads one to suspect Neisseria-
Meningitidis (Rule 507). Do the fact that the infection was acquired
while the patient was hospitalized and information about the diseases
to which the patient has been exposed support this conclusion?
** NO
15 Right, this would be good reason to rule out Neisseria-
Meningitidis (Rule 561).
What facts about this case tell you that the organisms that might be
present are: E.Coli and Klebsiella-Pneumoniae?
** HELP
20 {The student was expected to describe each factor with a phrase, one
per line. Help is provided by listing all relevant factors for the
current goal.}
Choose from:
1. The patient does not have a TB risk factor and the
25 patient's x-ray is not abnormal
2. Otitis-media and Epiglottitis are not among the diagnoses
which are consistent with the patient's clinical history
3. The white count from the patient's peripheral CBC is
less than 2.5 thousand
30 4. The infection was acquired while the patient was hospitalized
5. The patient has been seriously burned
6. None of these

** 1 3 4 {This question was a form of "entrapment." Notice
35 that the choices all appear in Figure 26-8.}
You're right about the WBC (Rule 557) and the evidence gained from the
fact that the infection was acquired while the patient was hospitalized
(Rule 545). However, the fact that the patient does not have a TB risk
factor and the patient's x-ray is not abnormal is weak evidence that
40 Mycobacterium-TB is not one of the organisms which might be present
(Rule 160).

FIGURE 26-7 Sample interaction continued: hypothesis evaluation.

dent (lines 17-32). "Entrapment," as used here, involves forcing the student
to make a choice that will reveal some aspect of his or her
understanding.23 In this example, all choices listed (lines 24-32) actually

23Many of GUIDON's question-asking strategies resemble the "Socratic" strategies formalized by Collins (1976), probably because our production rule representation of domain knowledge makes it convenient to think in terms of "relevant factors" for determining the "value of a goal" (terms we share with Collins). However, the relation between factor and goal in MYCIN is not necessarily causal as it is in the network representation used by Collins.
T-Rules for Responding to a Student's Hypothesis 485

appear in rules applied by MYCIN (see Figure 26-8). When the student wrongly chose number 1 ("no TB risk factor and no abnormal x-ray"), GUIDON indicated how that evidence actually was used by MYCIN.

26.5.1 Updating the Overlay Student Model After a Student Hypothesis

Figure 26-8 illustrates how the overlay model is updated for the hypothesis in line 1 of Figure 26-7. T-rules are invoked to determine how strongly the tutor believes that the student has taken each of the relevant d-rules into account. That is, a packet of t-rules (packet number 6 here) is tried in the context of each d-rule. Those t-rules that succeed will modify the cumulative belief that the given d-rule was considered by the student. T-rule 6.05 succeeded when applied to d-rules 545 and 557. The student mentioned a value (PSEUDOMONAS) that they conclude (clause 1 of the t-rule) but missed others (clause 3). Moreover, the student did not mention values that can only be concluded by these d-rules (clause 2), so the overall evidence that these d-rules were considered is weak (-0.70).24
T-RULE 6.05
IF: 1) The hypothesis does include values that can be concluded by this d-rule, as well as others, and
2) The hypothesis does not include values that can only be concluded by this d-rule, and
3) Values concluded by the d-rule are missing in the hypothesis
THEN: Define the belief that the d-rule was considered to be -.70

After each of the d-rules applied by MYCIN is considered independently, a second pass is made to look for patterns. Two judgmental tutorial rules from this second rule packet are shown below. T-rule 7.01 applied to d-rule 578: of the d-rules that conclude Pseudomonas, this is the only one that is believed to have been considered, thus increasing our belief that d-rule 578 was used by the student. T-rule 7.05 applies to d-rules 545 and 561: the factor NOSOCOMIAL appears only in their premises, and they are not believed to have been considered. This is evidence that NOSOCOMIAL was not considered by the student, increasing our belief that each of the d-rules that mention it was not considered.
T-RULE 7.01
IF: You believe that this domain rule was considered, it concludes a value present in the student's hypothesis, and no other rule that mentions this value is believed to have been considered
THEN: Modify the cumulative belief that this rule was considered by .40

T-RULE 7.05
IF: This domain rule contains a factor that appears in several rules, none of which are believed to have been considered to make the hypothesis
THEN: Modify the cumulative belief that this rule was considered by -.30

24The certainty factor of -0.70 was chosen by the author. Experience with MYCIN shows that the precise value is not important, but the scale from -1 to 1 should be used consistently.
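The arithmetic behind these cumulative-belief updates can be illustrated with MYCIN's standard function for combining two certainty factors on the -1 to 1 scale. This is a minimal sketch in Python; GUIDON itself was written in Lisp, and the function name `cf_combine` is ours, not the system's:

```python
def cf_combine(x, y):
    """Combine two certainty factors, each in [-1, 1] (MYCIN's combining function)."""
    if x >= 0 and y >= 0:
        return x + y * (1 - x)          # both positive: move toward +1
    if x < 0 and y < 0:
        return x + y * (1 + x)          # both negative: move toward -1
    return (x + y) / (1 - min(abs(x), abs(y)))  # mixed signs

# Cumulative belief that d-rule 545 was considered: t-rule 6.05
# contributes -0.70, then t-rule 7.05 modifies the result by -0.30.
belief = cf_combine(-0.70, -0.30)
print(round(belief, 2))  # -0.79
```

Note how two weak negative updates reinforce each other without the total ever leaving the -1 to 1 scale, which is why the footnote's consistency requirement matters more than the precise values.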
486 Use of MYCIN's Rules for Tutoring

GOAL: COVERFOR
d-rules and key factors: age:507, tb-risk & x-ray:160, nosocomial:545, diagnoses & nosocomial:562, burned:578
[remainder of the Figure 26-8 diagram is not reproduced]
FIGURE 26-8 Interpreting a student hypothesis in terms of expert rules. Key: D-rules that conclude about organisms to cover for are shown with their key factors (see Figure 26-5). Circled values are missing from the student's hypothesis (e.g., E.coli) or wrongly stated (e.g., Neisseria). Dotted lines lead from rules the student probably did not use. Also, m = evidence link that the tutor deduced is unknown to the student; R and W = links to right and wrong values that the tutor believes are known by the student; ! = unique link, expert knows of no other evidence at this time; ? = questionable, tutor isn't certain which evidence was considered by the student. For example, R? means that the student stated this value, it is correct, and more than one d-rule supplies evidence for it.

Future improvements to this overlay model will make it possible to recognize student behavior that can be explained by simple variations of the expert's d-rules:

1. Variation in the premise of a d-rule: The student is using a d-rule that fails to apply or applies a successful d-rule prematurely (is misinformed about case data or is confused about the d-rule's premise).
2. Variation in the action of a d-rule: The student draws the wrong conclusion (wrong value and/or degree of certainty).

26.5.2 Presentation Methods for D-Rules the Student Did Not Consider

Returning to our example, after updating the overlay model, the tutor needs to deal with discrepancies between the student's hypothesis and what the expert program knows. The following t-rules are from a packet that determines how to present a d-rule that the student evidently did not consider. The tutor applies the first tutorial rule that is appropriate. In our example, t-rule 9.02 generated the question shown in lines 10-14 of Figure 26-7. T-rule 9.03 (a default rule) generated the question shown in lines 17-32.
T-RULE 9.01
IF: 1) The d-rule is not on the lesson plan for this case, and
2) Based on the overlay model, the student is ignorant about the d-rule
THEN: Affirm the conclusions made by the d-rule by simply stating the key factors and values to be concluded

T-RULE 9.02
IF: The goal currently being discussed is a true/false parameter
THEN: Generate a question about the d-rule using "facts" format in the premise part and "actual value" format in the action part

T-RULE 9.03
IF: True
THEN: Generate a question about the d-rule using "fill-in" format in the premise part and "actual value" format in the action part

26.5.3 Choosing Question Formats

When the tutor responds to a hypothesis, the context of the dialogue generally determines which question format is appropriate. However, during other dialogue situations it is not always clear which format to use (e.g., when quizzing the student about a rule that MYCIN has just applied using case data just given to the student). Our strategy is to apply special t-rules to determine which formats are logically valid for a given d-rule, and then to choose randomly from the candidates.
T-rule 3.06 is part of a packet of t-rules that chooses an appropriate format for a question based on a given d-rule. The procedure for formatting a question is to choose templates for the action part and premise part that are compatible with each other and the d-rule itself.

T-RULE 3.06
IF: 1) The action part of the question is not "wrong value," and
2) The action part of the question is not "multiple choice," and
3) Not all of the factors in the premise of the d-rule are true/false parameters
THEN: Include "multiple choice" as a possible format for the premise part of the question

T-rule 3.06 says that if the program is going to present a conclusion that differs from that in the d-rule it is quizzing about, it should not state the premise as a multiple choice. Also, it would be nonsensical to state both the premise and action in multiple-choice form. (This would be a matching question; it is treated as another question type.) Clause 3 of this t-rule is necessary because it is nonsensical to make a multiple-choice question when the only choices are true and false.
As can be seen here, the choice of a question type is based on purely logical properties of the rule and interactions among question formats. About 20 question types (combined premise/conclusion formats) are possible in the current implementation.
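The selection process described above can be sketched as a filter over candidate premise formats, with constraints like t-rule 3.06 eliminating logically invalid combinations before a random choice is made. This is an illustrative sketch only: the template names follow the text, but the data structures and function names are invented, not GUIDON's actual code:

```python
import random

def valid_premise_formats(d_rule, action_format):
    """Return premise formats compatible with the chosen action format (cf. t-rule 3.06)."""
    candidates = ["facts", "fill-in"]
    # T-rule 3.06: allow "multiple choice" only when the action part is neither
    # "wrong value" nor "multiple choice", and not all factors are true/false.
    if (action_format not in ("wrong value", "multiple choice")
            and not all(d_rule["true_false_factors"])):
        candidates.append("multiple choice")
    return candidates

# A d-rule whose premise factors are not all true/false parameters:
d_rule = {"true_false_factors": [False, True]}
choices = valid_premise_formats(d_rule, "actual value")
question_format = (random.choice(choices), "actual value")
print(choices)  # ['facts', 'fill-in', 'multiple choice']
```

The final `random.choice` mirrors the strategy of first computing the logically valid formats and only then choosing randomly from the candidates.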

26.6 Concluding Remarks

We have argued in this chapter that it is desirable to add teaching expertise and other levels of domain knowledge to MYCIN-like expert programs if they are to be used for education. Furthermore, it is advantageous to provide a flexible framework for experimenting with teaching strategies, for we do not know the best methods for presenting MYCIN-like rules to a student.
The framework of the GUIDON program includes knowledge of discourse patterns and the means for determining their applicability. The discourse patterns we have codified into procedures permit GUIDON to carry on a mixed-initiative, goal-directed case method dialogue in multiple domains. These patterns are invoked by tutoring rules, which are in turn controlled by a communication model. The components of this model are a lesson plan (topics the tutor plans to discuss), an overlay model (domain knowledge the tutor believes is being considered by the student), and a focus record (topics recently mentioned in the dialogue). Finally, we observed that meta-knowledge about the representation and use of domain rules made it possible to use these rules in a variety of ways during the dialogue. This is important because GUIDON's capability to reason flexibly about domain knowledge appears to be directly related to its capability to guide the dialogue in multiple, interesting ways.
Furthermore, we have augmented the performance knowledge of MYCIN-like systems by making use of support knowledge and meta-level abstractions in the dialogue. The problem-solving trace provided by the interpreter is augmented by GUIDON to enable it to plan dialogues (by looking ahead to see what knowledge is needed to solve the problem) and to carry on flexible dialogues (by being able to switch the discussion at any time to any portion of the AND/OR solution tree).
Early experience with this program has shown that the tutor must be selective about its choice of topics if the dialogues are not to be overly tedious and complicated. That is, it is desirable for tutorial rules to exert a great deal of control over which discourse options are taken. We believe that it is chiefly in selection of topics and emphasis of discussion that the "intelligence" of this tutor resides.
PART NINE

Augmenting the Rules


27
Additional Knowledge Structures

We have so far described MYCIN largely in terms of its knowledge base and inference mechanism, and specifically in terms of rules and a rule interpreter that allow high-performance problem solving. In Chapters 27 through 29 we describe additional knowledge structures that increase the flexibility and transparency of MYCIN's knowledge base. We refer to many of these as meta-level knowledge.
When we speak of meta-level knowledge we mean nothing more than knowledge about knowledge. In a computer program it needs to be represented and interpreted in order to be useful, but the main idea is that it can be an explicit, and flexible, element of expertise. For example, meta-level knowledge can help in modifying an existing rule and in integrating the modification into the whole rule set because it provides additional information about the existing rules to the editor.
The ideas for using meta-level knowledge in MYCIN grew out of several projects that Randy Davis was working on in the mid-1970s. In the context of knowledge acquisition, we had found that the simple rule editor needed more knowledge about the structure and contents of the rules and about the representations of objects (contexts). In the context of explanation, we found that the predicates (such as SAME) used in rules could be matched to keywords in questions much more easily if the structure of the predicates were known to MYCIN. And, in the context of controlling MYCIN's inferences, we saw that rules about MYCIN's rules could provide an element of control. Davis was working on solutions to these problems and saw that the common thread that bound these different parts of the TEIRESIAS system together was meta-level knowledge.
Our first instances of domain-independent meta-level reasoning were (a) the unity path mechanism, by which MYCIN checks for a chain of inferences known to be true with certainty (CF = 1.0) before evaluating other rules, and (b) the preview mechanism, by which MYCIN looked over the clauses of a rule before exhaustively evaluating them to see if the conjunction of premise clauses was already falsified by virtue of any clause already known to be false (or not "true enough"). In both instances, MYCIN is reasoning about its rules before executing them. The important difference between these mechanisms and the meta-knowledge that evolved from work by Davis is that the former are buried in the code of the rule interpreter and thus are not open to examination by other parts of the system, or by the user. After these initial meta-level reasoning techniques were added to the rule interpreter, however, Davis was careful to separate any additional meta-level knowledge structures from the editor, explanation generator, and interpreter, just as we had done with the (object-level) medical knowledge. As a result, the new system (MYCIN plus TEIRESIAS) contains considerably more knowledge about its own knowledge structures than did MYCIN alone. Many of these ideas have subsequently been incorporated into EMYCIN. Chapter 28 provides a summary of the knowledge structures used by TEIRESIAS for knowledge acquisition (see Chapter 9) and control of MYCIN's inferences. This was a line of development that was not anticipated in DENDRAL,1 and its systematic treatment by Davis in his dissertation was an advance for AI.
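The two mechanisms can be sketched as simple checks an interpreter makes before evaluating a rule in full. This is a sketch of the idea only, in Python rather than the Lisp of the actual interpreter; the rule content and all names here are invented for illustration:

```python
def preview(premise, known):
    """Preview: skip full evaluation if any premise clause is already known false."""
    return all(known.get(clause) is not False for clause in premise)

def unity_path(goal, rules, known):
    """Unity path: accept a goal reached by a CF = 1.0 rule whose premise is all
    known true with certainty, before any other rules are evaluated."""
    return any(concl == goal and cf == 1.0 and all(known.get(c) is True for c in prem)
               for prem, concl, cf in rules)

# One illustrative rule: (premise clauses, conclusion, certainty factor).
rules = [(("gram-negative", "rod"), "enterobacteriaceae", 1.0)]
known = {"gram-negative": True, "rod": True}
print(unity_path("enterobacteriaceae", rules, known))          # True
print(preview(("gram-negative", "anaerobic"), {"anaerobic": False}))  # False
```

In both functions the interpreter reasons about a rule (inspecting its clauses and CF) before executing it, which is the buried meta-level step the text describes.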
Bill Clancey was working on GUIDON at about the same time and was discovering that additional knowledge structures, including meta-level knowledge, were essential for tutoring. TEIRESIAS's knowledge about the form and contents of MYCIN's rules was certainly helpful in constructing GUIDON, but Clancey began focusing more on representing MYCIN's strategies. In the course of his research, he also uncovered the importance of two additional kinds of knowledge: knowledge about the structure of the domain (and thus about the structure of the rule set), and support knowledge that justifies individual rules. Chapter 29 is a careful analysis of these three types of meta-level knowledge that Clancey terms "strategic, structural and support knowledge." This analysis was written in 1981-1982 (and published in 1983) and thus is a recent critique of the structure of MYCIN's knowledge base. We were not unaware of many of the issues raised here, but Clancey provides a coherent framework for thinking about them.

27.1 The Context Tree

In the original (1974) version of MYCIN, several knowledge structures had already been added to the basic rule representation, as discussed in Chapter 5. Most notable among these was the context tree, in which we encoded knowledge about relations among the objects mentioned in rules. The discussion here is taken from the EMYCIN manual (van Melle et al., 1981) and explains this important structure in more detail.

1We used the term Meta-DENDRAL to refer to the program that inferred new knowledge for DENDRAL, but we did not have a well-developed concept of knowledge about knowledge.

As described in Chapter 15, an EMYCIN knowledge base is composed of factual knowledge about the domain and production rules that control the consultation interaction and make inferences about a case. Of all the structures the expert must specify for an EMYCIN system, the context tree is perhaps the most important, yet the least discussed. The context tree forms the backbone of the consultant, organizing both the conceptual structure of the knowledge base and the basic flow of the consultation interaction. The tree also indicates the goals for which the consultant will initially attempt to determine values. Since the principles for designing new context trees are poorly understood, this discussion provides examples from various existing EMYCIN systems.
The context tree is composed of at least one, but possibly many, context-types. A context-type corresponds to an actual or conceptual entity in the domain of the consultant, e.g., a patient, an aircraft, or an oil well. Each context-type in the context tree is very much like a record declaration in a traditional programming language. It describes the form of all of its instances created during a case. Thus there are two related but distinct aspects of the context tree mechanism: a static tree of context-types and a dynamic tree of context-instances. The static tree of types is the structure defined by the expert during system construction and forms the knowledge base "core."
The static tree is used to guide the creation of the dynamic context tree of instances during the consultation. These instances are also organized into a tree that has a form reflecting the structure of the static hierarchy. We distinguish these two structures by referring to them as the static tree and the instance tree. A moderately complex example of each of these types of trees for the SACON system is given in Figures 27-1 and 27-2. In these and later figures, the links, or relationships, among context-types are labeled to show different uses of the tree.
Each knowledge base has one main, or root, context-type for which there will be a single instance for each consultation. It corresponds to the main subject of the consultation. In MYCIN, for example, the main context-type is PATIENT, and a consultation provides advice about the disease(s) of the patient. In SACON, the main context-type is STRUCTURE, and a consultation gives advice about performing structural analysis on a structure (such as a bridge or an airplane wing).
Some domains are simple enough that no other context-types are needed. PUFF, for example, needed only attributes of the main context PATIENT. However, other systems, such as MYCIN and SACON, require the ability to discuss multiple objects. In these cases, the context-types are organized into a simple tree structure with the main context at the root.
For each context-type that is subordinate to another context-type there is
an implicit one-to-many relationship between the instances of each type created during a consultation. Thus, for SACON, there can be many SUBSTRUCTURE instances for the single STRUCTURE instance during a case, and there can be several LOADING instances for each SUBSTRUCTURE instance.

STRUCTURE
    |  composed-of
SUBSTRUCTURE
    |  applied-to
LOADING
    |  composed-of
LOAD-COMPONENT

FIGURE 27-1 SACON's static tree of context-types.

It should be noted that, except for the root-type, every


possible context-type need not be instantiated during a consultation. In
the MYCIN system, for example, the patient may or may not have had any
prior drug therapy.
The static tree is the major repository of structural and control information about the consultant. It indicates, in particular, the possible parameters of a context (its PARMGROUP) and the groups of rules that can be applied to instances of a context (its RULETYPES). Hence, the context-types must be defined before one can proceed to acquire rules and parameters, since both of these are defined with respect to the context tree. In addition, the static relationships among the context-types dictate, in large part, the basic mechanism for the propagation of the dynamic tree of instances during a consultation (see Chapter 5).
All of the rules used by the consultant to reason about the domain are written without regard to specific context-instances in an actual consultation. A rule instead refers to parameters of certain context-types, and the rule is applied to all the context-instances for which its parameter group is relevant. For example, a rule that concludes about a parameter of a LOADING, say FORCE-BOUND, will be applied to all instances of LOADING, as shown in Figure 27-2 (e.g., LOADING-1, LOADING-2) and may or may not succeed within each instance depending on whether its premise is true in that particular context. In addition, if a rule refers to a specific context-type, its premise can refer to the parameters of any direct ancestors of this context-type. Continuing with our example, the rule premise could refer to parameters of any SUBSTRUCTURE and of the STRUCTURE itself. The instance tree organization makes clear which LOADING instances are associated with which SUBSTRUCTURE instance.

FIGURE 27-2 SACON's instance tree. [figure not reproduced]
If a rule is applied to some context-instance and uses information about context-instances lower in the tree, however, an implicit iteration occurs: the rule is applied to each of the lower instances in turn. If the lower context-types have not yet been instantiated, the program digresses to ask about their creation at this time. Thus contexts are instantiated because rules need them,2 just as parameters are traced when rules need them. In fact, since the goals of the consultation usually consist of finding out something about the root of the tree, the only way that lower context-types are instantiated at all is through the application of rules that use information about lower context-types.
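The static-tree/instance-tree relationship and the implicit iteration just described can be sketched as follows. This is illustrative only: EMYCIN was a Lisp system, and these class and field names are invented for the example:

```python
class ContextType:
    """A node in the static tree (e.g., STRUCTURE, SUBSTRUCTURE, LOADING)."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent:
            parent.children.append(self)

class Instance:
    """A node in the dynamic instance tree, created during a consultation."""
    def __init__(self, ctype, parent=None):
        self.ctype, self.parent, self.children = ctype, parent, []
        if parent:
            parent.children.append(self)

# Static tree (SACON): STRUCTURE -> SUBSTRUCTURE -> LOADING
structure = ContextType("STRUCTURE")
substructure = ContextType("SUBSTRUCTURE", structure)
loading = ContextType("LOADING", substructure)

# Instance tree: one STRUCTURE, two SUBSTRUCTUREs, three LOADINGs
s = Instance(structure)
sub1, sub2 = Instance(substructure, s), Instance(substructure, s)
loads = [Instance(loading, sub1), Instance(loading, sub1), Instance(loading, sub2)]

def apply_rule(rule, instance, target_type):
    """Implicit iteration: apply the rule to every lower instance of target_type."""
    if instance.ctype is target_type:
        rule(instance)
    for child in instance.children:
        apply_rule(rule, child, target_type)

fired = []
apply_rule(lambda inst: fired.append(inst), s, loading)
print(len(fired))  # 3: the rule runs once per LOADING instance
```

A rule written once against the context-type LOADING is thus applied to every LOADING instance reachable below the instance it starts from, mirroring the one-to-many relationship in the static tree.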

27.1.1 Uses of the Context Tree

There have been a few rather stereotypic uses of the context tree. Although experience to date has by no means exhausted the possible uses, the examples shown here should help readers to understand how an expert and knowledge engineer might select appropriate context-types and organize them in a new domain.
The primary use of additional contexts has been to structure the data or evidence to be collected. Thus, in the MYCIN system, the culture contexts describe the tests performed to isolate organisms. Additional information about the patient's current and previous therapies, the cultures, and MYCIN's own estimation of the suspected infections are also represented in the tree. The current context organization for MYCIN is shown in Figure 27-3 and should be contrasted with the sample instance tree of Figure 5-1 (which reflects MYCIN's context-types as they were defined in 1974).3
The second major use of the context tree has been to organize the important components of some object. For example, in the SACON system the substructures of the main structure correspond to components or regions of the object that have some uniform property, typically a specific geometry or material. Each substructure instance is considered independently, and conclusions about individual responses to stress loadings are summarized on the structure level to provide a "global" sense of the overall response of the structure. A recent, additional example of this use of a part-whole hierarchy is found in a system called LITHO (Bonnet, 1979), which interprets data from oil wells. In this system, each well is decomposed into a number of zones that the petrologist can distinguish by depth (Figure 27-4).
A context need not correspond to some physical object but may be an abstract entity. However, the relationships among contexts are explicitly

2Contexts may also be instantiated by explicit command, but the mechanism is less convenient.

3It is instructive to compare this structure with the original context tree described in Chapter 5; the MYCIN system has undergone at least three intermediate reorganizations of its static tree. Significantly, however, the kinds of objects in the tree have not changed substantially.
FIGURE 27-3 The current context organization for MYCIN. [figure not reproduced]

FIGURE 27-4 LITHO's static tree and an instance tree. [figure not reproduced]

fixed by the tree of context-types. For this reason, physical objects, represented in this part-whole fashion, lend themselves more readily to the current context tree mechanism.
The last major use of the context tree, which is closely related to the part-whole use described above, has been to represent important events or situations that happen to an object. Thus, in the SACON system, a LOADING describes an anticipated scenario or maneuver (such as pounding or braking) to which the particular SUBSTRUCTURE is subjected. Each LOADING, in turn, is composed of a number of independent LOAD-COMPONENTS, distinguished by the direction and intensity of the applied force. Other uses of this organizational idea have been to represent individual past PREGNANCIES and current VISITS of a pregnant woman in the GRAVIDA system of Catanzarite (unpublished; see Figure 27-5) and the anticipated use of BLEEDING-EPISODES of a PATIENT in the CLOT system (Figure 27-6; see also Chapter 16).4
The primary reason for defining additional context-types in a consultant is to represent multiple instances of an entity during a case. Some users may like to define context-types that always have one instance and no more, primarily for purposes of organization, but this is often unnecessary (and even cumbersome).5 For example, one might want to write rules that use various attributes of a patient's liver, but since there is always exactly one liver for a patient there is no need to have a liver context; any attribute of the liver can simply be viewed as an attribute of the patient.
Reference to parameters of contexts in different parts of an instance tree is currently very awkward. For example, in MYCIN, a particular drug may be associated somehow with a particular organism (Figure 27-7). However, this relationship between context-instances is not one that always holds

4It should be noted that use of the context mechanism to handle sequential visits in the GRAVIDA system is experimental and required the definition of numerous additional functions for this purpose. They are not currently in EMYCIN.

5Note, however, that separating unique concepts out into single contexts may provide more understandable rule translations due to the conventions of context-name substitutions in text generation. See Chapter 18 for further discussion of this point.

between all organisms and all drugs: not all drugs are prescribed to treat
all identified organisms. This "prescribed for" relationship cannot be stated
statically, independently of the case. Special predicate and action functions
must be written to establish and manipulate these kinds of relationships
between instances. It is best to avoid these interactions between disjoint
parts of the tree during the initial design of the knowledgebase.
Summingup our experience with this mechanism and considering its
relative inflexibility, we offer this final caveat: for an initial system design,
those using EMYCINshould start small and should use only one or two
context-types. They should plan the structure of the consultants context
tree carefully before running the EMYCIN system, since restructuring a
context tree is perhaps the most difficult and time-consuming knowledge-
base construction task. Indeed, restructuring the context tree implies a
complete restructuring of the rest of the knowledge base.

27.2 Grain Size of Rules

We had noticed that MYCIN's knowledge is "shallow" in the sense that its rules encode empirical associations but not theoretical laws. MYCIN lacks explicit representations of the "deep" understanding, such as an expert has, of causal mechanisms and reasoning strategies in medicine. MYCIN's rules do include some causal relations and definitions as well as structural relations, but all these are not cleanly separated from the heuristics and "compiled knowledge" that make up most of the rule set.
When we were building the initial system, we recognized that many rules were "broad-brush" treatments of complex processes, skipping from A to E in one leap and omitting any mention of B, C, and D in a chain such as A → B → C → D → E. We were focusing on rules whose "grain size" was of clinical significance. Even though finer-grained rules were often discussed, we consciously omitted them if the finer distinctions would not improve the program's ability to suggest appropriate treatments for infections or if they would not improve the understandability of the program for clinicians.6 That is, the clinical significance of the conclusions determined the vocabulary of the rules. Thus, from the standpoint of performance, many causal mechanisms were not needed for reasoning from evidence to appropriate conclusions.
Examples of this collapsing of inference steps abound in all domains. For instance, physicians generally use a diuretic, such as furosemide, to treat edema or congestive heart failure without thinking twice about it. It is typically only when a patient fails to respond that the physician considers the mechanism of the drug's action in order to find, perhaps, another drug

6Note that physicians will be able to understand rules that medical students sometimes find confusing. See Chapter 20 for a further discussion of the grain size of rules.

to give with the first in order to produce the desired effect. Or, in a nonmedical domain, a mechanic often makes adjustments in response to manifestations of an automobile problem (e.g., adjusting the carburetor in response to stalling) and considers more detail only if the first few adjustments fail. An example from MYCIN is cited by Clancey in Chapter 29, in his discussion of the tetracycline rule: "If the patient is less than 8 years old, don't prescribe tetracycline." This rule lacks ties to the deeper understanding of drug action of which it is a consequence. Thus it is not only difficult for a student to remember, but also difficult for one to know how to modify or to know exactly how far the premise clause can be stretched safely.
We also recognized that many of the attributes mentioned in rules are not primitive observational terms in the same sense that values of laboratory tests are. For example, MYCIN asks whether a patient is getting better or worse in response to therapy, just as it asks for serum glucose levels. Obviously, there are a number of rules that could be written to infer whether the patient is better, mentioning such things as change in temperature, eating habits, and general coloring. That is, we chose a rule of the form A → B, with A as a primitive, rather than several rules in the following form:

A1 → A
A2 → A
  ...
An → A

A → B
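The difference in grain size can be sketched as two toy rule sets that reach the same conclusion. This is an illustrative sketch only, not MYCIN code: the fine-grained chain makes the intermediate steps B, C, and D explicit, while the broad-brush rule collapses them:

```python
fine_grained = {"A": "B", "B": "C", "C": "D", "D": "E"}  # four small rules
broad_brush = {"A": "E"}                                 # one clinical-grain rule

def chain(rules, fact):
    """Forward-chain from a fact, recording every intermediate conclusion."""
    steps = [fact]
    while steps[-1] in rules:
        steps.append(rules[steps[-1]])
    return steps

print(chain(fine_grained, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(chain(broad_brush, "A"))   # ['A', 'E']
```

Both rule sets derive E from A, which is why the shortcut does not hurt performance; what is lost is the visible mechanism (B, C, D) that a tutor or a puzzled user might need.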

Neither of these shortcuts is a fatal flaw in the methodology of rule-based systems. Expanding the rule set to cover the richer knowledge physicians are known to hold would be possible, but time-consuming and unnecessary for improving MYCIN's advice in consultations. The consultation program, after all, was designed for use by physicians, and it seemed reasonable to leave some of the more basic observations up to them. However, as a result, there is considerable knowledge absent from MYCIN. As mentioned in Part Eight, successful tutoring depends on deep knowledge even more than successful consulting does.

27.3 Strategic, Structural, and Support Knowledge

The missing knowledgeis of" three classes: strategic, structural, and sup-
port. Strategic knowledgeis an important part of expertise. MYCIb~sbuilt-
in strategy is cautious: gather as much evidence as possible (without de-
Strategic, Structural, and Support Knowledge 505

manding new tests) for and against likely causes and then weigh the evi-
dence. Operationally, this translates into exhaustive rule invocation
whereby (a) all (relevant) rules are tried and (b) all rules whose left-hand
sides match the case (and whose right-hand sides are relevant to problem-
solving goals) have their right-hand sides acted upon. But under different
circumstances, other strategies would be more appropriate. In emergen-
cies, for example, physicians cannot take the time to gather much history
data. Or, with recurring illness, physicians will order new tests and wait
for the results. Deciding on the most appropriate strategy depends on
medical knowledge about the context of the case. MYCIN's control structure
is not concerned with resource allocation; it assumes that there is time
to gather all available information that is relevant and time to process it.
Thus MYCIN asks 20-70 questions and processes 1-25 rules between
questions. We estimate that MYCIN executes about 50 rules per second
(exclusive of I/O wait time). With larger amounts of data or larger numbers
of rules, the control structure would need additional meta-rules that estimate
the costs of gathering data and executing rules, in order to weigh
costs against benefits. Also, in crisis situations or real-time data interpre-
tation, the control structure would need to be concerned with the allocation
of resources.⁷
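The cost/benefit weighing that a resource-limited control structure would need can be rendered schematically. This Python sketch is invented for illustration (MYCIN itself had no such mechanism, as the text notes): rules are invoked in order of estimated benefit per unit cost until a budget runs out.

```python
def select_rules(rules, budget):
    """Fire rules in order of benefit per unit cost until the budget runs out."""
    chosen = []
    for rule in sorted(rules, key=lambda r: r["benefit"] / r["cost"], reverse=True):
        if rule["cost"] <= budget:
            chosen.append(rule["name"])
            budget -= rule["cost"]
    return chosen

# Hypothetical costs and benefits, in arbitrary units.
rules = [
    {"name": "cheap-history-question", "cost": 1, "benefit": 3},
    {"name": "expensive-culture-test", "cost": 10, "benefit": 5},
    {"name": "quick-chart-lookup", "cost": 1, "benefit": 2},
]

# Under a tight budget (an emergency), only cheap, informative rules are invoked.
assert select_rules(rules, budget=2) == ["cheap-history-question", "quick-chart-lookup"]
```

With an ample budget the same procedure degenerates into MYCIN's exhaustive invocation, which is why MYCIN could ignore the issue.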
One way to make strategic knowledge explicit is by putting it in meta-
rules, as discussed in Chapter 28. They are rules of the same IF/THEN
form as the medical rules, but they are "meta" in the sense that they talk
about and reason with the medical rules. One of the interesting aspects of
the meta-rule formalism, as Davis designed it, is that the same rule inter-
preter and explanation system work for meta-rules as for object-level rules.
(Chapter 23 discussed the use of prototypes, or frames, for representing
much of the same kind of knowledge about problem solving.) Making
strategy knowledge explicit has come to be recognized as an important
design consideration for expert systems (Barnett and Erman, 1982; de
Kleer et al., 1977; Genesereth, 1981; Patil et al., 1981) because it can make
a system's reasoning more efficient and more understandable.
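A meta-rule in the spirit described here can be sketched as an ordinary IF/THEN rule whose premise and action talk about other rules rather than about the medical domain. The rule names, attributes, and ordering criterion below are invented for illustration; they are not TEIRESIAS's actual meta-rules.

```python
# Object-level rules, tagged with the data their premises mention (invented).
object_rules = [
    {"name": "RULE050", "mentions": ["blood-culture"]},
    {"name": "RULE037", "mentions": ["patient-history"]},
    {"name": "RULE095", "mentions": ["blood-culture", "patient-history"]},
]

def meta_rule(case, rules):
    """IF the case is an emergency THEN try rules that need no new laboratory
    data before rules that do.  The rule reasons about rules, not the domain."""
    if case.get("emergency"):
        return sorted(rules, key=lambda r: "blood-culture" in r["mentions"])
    return rules

ordered = meta_rule({"emergency": True}, object_rules)
assert ordered[0]["name"] == "RULE037"  # the history-only rule is promoted
```

Because the meta-rule has the same IF/THEN shape as an object-level rule, the same interpreter (and the same explanation machinery) can process both, which is the point Davis's design exploits.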
Structural knowledge in medicine includes anatomical and physiolog-
ical information about the structure and function of the body and its systems.⁸
It is part of what we believe is needed for "deeper" reasoning about
diagnosis. A structural model showing, inter alia, the normal connections
of subparts can be used for reasoning about abnormalities. In contrast,
representing this information in rules would force explicit mention of the

⁷In the AM and EURISKO programs (Lenat, 1976; 1983), Lenat has added information
about maximum amounts of time to spend on various tasks, which keeps those programs
from "overspending" computer time on difficult tasks of low importance. (EURISKO can also
decide to change those time allocations.) In PROSPECTOR (Duda et al., 1978a), attention
focused on the rules that will add the most information, i.e., that will most increase or decrease
the probability of the hypothesis being pushed. In Fox's system (Fox, 1981), the estimated cost
of evaluating premises of rules helps determine which rules to invoke.
⁸More generally, we want to talk about the structure of any system or device we want an
expert system to analyze, such as electronic circuits or automobiles.

abnormal situations and their manifestations. Thus there is a saving in the
number of items represented explicitly in a rich structural model as opposed
to an equally rich rule set. In medicine this point has been made by
the Rutgers group (Kulikowski and Weiss, 1971) in the context of the
CASNETprogram for diagnosing glaucomas. More recently, it is being
advanced by Patil et al. (1981), Kunz (1983), Pople (1982), and others. In
the domain of electronics almost everyone has noticed that a circuit diagram
and causal knowledge are powerful pieces of knowledge to have [see,
for example, Brown et al. (1974), Davis et al. (1982), Genesereth (1981),
Grinberg (1980)]. Structural knowledge also includes knowledge about the
structure of the domain, e.g., the taxonomy of important concepts. This
structure is an important reference point for guiding the problem solver
in writing strategy rules.
Support knowledge includes items of information that are relevant for
understanding a rule (or other knowledge structure). In early versions of
MYCIN, we attached extra information to rules as justification for them
or as historical traces of their evolution. For example, the literature citations
provide credibility as well as pointers to more detailed information. The
names of the persons who authored or edited a rule and the dates when it
was created or edited are important pointers to persons responsible for
the interpretation of the literature. The slot called "Justification" was created
as a repository for the author's comments about why the rule was
thought to be necessary in the first place. Additional support for a program's
knowledge comes from deeper theoretical knowledge. Quantum
chemistry, for example, could have been (but was not) referenced as support
for DENDRAL's rules of mass spectrometry; pharmacology could
have been (but was not) referenced to support MYCIN's rules of drug
therapy. In general, support knowledge further explains the facts and re-
lations of the domain knowledge. The contexts of tutoring and explanation
demonstrate the need for support knowledge better than does the context
of consultation because the additional support for rules is more relevant
to understanding them than to using them (see Part Eight).
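The support slots described above can be pictured as extra fields attached to a rule that the interpreter never consults. This is a Python sketch with invented field values (the citation in particular is hypothetical), not the actual MYCIN rule format:

```python
# A rule with support knowledge attached.  Only premise and action drive
# inference; the remaining slots exist for explanation and tutoring.
rule = {
    "premise": ["organism is gram-positive", "morphology is coccus"],
    "action": ("identity is streptococcus", 0.7),
    # Support slots (values invented for illustration):
    "author": "SHORTLIFFE",
    "last-edited": "1975-06-12",
    "citations": ["(hypothetical) Infectious Disease Reviews, 1974"],
    "justification": "Gram stain and morphology together narrow the identity.",
}

def applicable(rule, facts):
    """The interpreter reads only the premise; support slots are inert."""
    return all(p in facts for p in rule["premise"])

assert applicable(rule, ["organism is gram-positive", "morphology is coccus"])
```

Keeping the support slots out of the interpreter's path is what makes them cheap to maintain, while an explanation or tutoring module can read them freely.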
Recently, we have shifted our focus for this line of work from MYCIN
to NEOMYCIN (Clancey and Letsinger, 1981), an updated version of the
MYCIN knowledge base, representation, and control structure. In brief, it
separates the diagnostic strategies clearly from the medical rules and facts
used for diagnosing individual cases. By doing this, it can better serve as
a basis for tutoring, as discussed in Chapter 26. NEOMYCIN was under-
taken because of the issues noted in the following two chapters, but it is
still too early to draw conclusions from the work.
28
Meta-Level Knowledge

Randall Davis and Bruce G. Buchanan

This chapter explores a number of issues involving representation and use
of what we term meta-level knowledge, or knowledge about knowledge.¹ It
begins by defining the term, then exploring a few of its varieties and con-
sidering the range of capabilities it makes possible. Four specific examples
of meta-level knowledge are described, and a demonstration given of their
application to a number of problems, including interactive transfer of ex-
pertise and the "intelligent" use of knowledge. Finally, we consider the
long-term implications of the concept and its likely impact on the design
of large programs. The context of this work is the TEIRESIAS program
discussed in Chapter 9. In the earlier chapter we focused on the use of
TEIRESIAS for knowledge acquisition. Here we focus on the classification
and types of knowledge used by TEIRESIAS.
In the most general terms, meta-level knowledge is knowledge about
knowledge. Its primary use here is to enable a program to "know what it
knows," and to make multiple uses of its knowledge. As mentioned in
Chapter 9, the program is not only able to use its knowledge directly, but
may also be able to examine it, abstract it, reason about it, or direct its
application.
This chapter discusses examples of meta-level knowledge classified
along two dimensions: (i) specificity (representation-specific vs. domain-specific),
and (ii) source (user-supplied vs. derived). Representation-specific
meta-level knowledge involves supplying a program with a store of
knowledge dealing with the form of its representations, in particular, their
design and organization. Traditionally, this design and organization infor-

This chapter is an expanded and edited version of a paper originally appearing in Proceedings
of the Fifth IJCAI, 1977, pp. 920-928. Used by permission of International Joint Conferences
on Artificial Intelligence, Inc.; copies of the Proceedings are available from William Kaufmann,
Inc., 95 First Street, Los Altos, CA 94022.
¹Following standard usage, knowledge about objects and relations in a particular domain will
be referred to as object-level knowledge.


I. Knowledge about contents of rules in the knowledge base--Rule Models
II. Knowledge about syntax
    Of the representation of objects--Schemata
    Of predicate functions--Function Templates
III. Knowledge about strategies--Meta-Rules

FIGURE 28-1 Classification of meta-level knowledge in TEIRESIAS.

mation is present in a system only implicitly, for example, in the way a
particular segment of code accesses data or the way a chunk of knowledge
is encoded. Type declarations are a small step toward more explicit speci-
fication of this information, especially as they are used in extended data
types and record structures. As we discuss below, this sort of information,
along with a range of other facts about representation design, can be employed
quite usefully if it is made explicit and made available to the system.
Domain-specific meta-level knowledge contains information dealing
with the content of object-level knowledge, independent of its particular
encoding. It might involve any kind of useful information about a chunk
of knowledge, including its likely utility, range of applicability, speed or
space requirements, capabilities, and side effects. The two examples given
here deal with forms of meta-level knowledge that (i) offer information
about global patterns and trends in the content of object-level knowledge,
and (ii) provide strategic information, i.e., knowledge about how best to
use other knowledge.
The examples described below also illustrate the difference between
user-supplied and derived meta-level knowledge. The former is of course
obtained from the user; the latter is derived by the system on the basis of
information it already has. The user-supplied variety is used as a source
for knowledge that the system could not have deduced on its own; the
derived form allows the system to uncover useful characteristics of the
knowledge base and to make maximal use of knowledge it already has.
As will become clear below, meta-level knowledge makes possible a
number of interesting capabilities. The representation-specific variety sup-
ports knowledge acquisition, provides assistance on knowledge base main-
tenance, and makes possible multiple distinct uses of a single chunk of
knowledge. The domain-specific type provides a site for embedding infor-
mation about the most effective use of knowledge and can have a signifi-
cant impact on both the efficiency displayed by a system and its level of
performance. The examples also demonstrate that the source of the meta-
level knowledge has an impact on system performance. In particular, the
derived variety is shown to make possible a very simple but potentially
useful form of closed-loop behavior.
We examine below the four instances of meta-level knowledge used by
TEIRESIAS (shown in Figure 28-1) and review for each (i) the basic idea,
explaining why it is a form of meta-level knowledge; (ii) a specific instance,
detailing the information it contains; (iii) an example of how that information
is used to support knowledge base construction, maintenance, or
use; and (iv) the other capabilities it makes possible, including a limited
form of self-knowledge.

28.1 Rule Models

28.1.1 Rule Models as Empirical Abstractions of the Knowledge Base

As described in Chapter 9, a rule model is an abstract description of a
subset of rules, built from empirical generalizations about those rules. It
is used to characterize a "typical" member of the subset and is composed
of four parts. First, a list of examples indicates the subset of rules from
which this model was constructed.
Next, a description characterizes a typical member of the subset. Since
we are dealing in this case with rules composed of premise-action pairs,
the description currently implemented contains individual characteriza-
tions of a typical premise and a typical action. Then, since the current
representation scheme used in those rules is based on associative triples,
we have chosen to implement those characterizations by indicating (a)
which attributes "typically" appear in the premise (and in the action) of a
rule in this subset and (b) correlations of attributes appearing in the premise
(and in the action).² Note that the central idea is the concept of characterizing
a typical member of the subset. Naturally, that characterization looks
different for subsets of rules than it does for procedures, theorems, frames,
etc. But the main idea of characterization is widely applicable and not
restricted to any particular representational formalism.
The two remaining parts of the rule model are pointers to models
describing more general and more specific rule models covering larger or
smaller subsets of rules. The set of models is organized into a number of
tree structures, each of the general form shown in Figure 28-2. This struc-
ture determines the subsets for which models will be constructed. At the
root of each tree is the model made from all the rules that conclude about
<attribute>; below this are two models dealing with all affirmative and all
negative rules; and below this are models dealing with rules that affirm or
deny specific values of the attribute. There are several points to note here.
First, these models are not hardwired into the system, but are instead
formed by TEIRESIASon the basis of the current contents of the knowl-
edge base. Second, whereas the knowledge base contains object-level rules
about a specific domain, the rule models contain information about those

²Both of these are constructed via simple statistical thresholding operations.



<attribute>

<attribute>-is                    <attribute>-isnt

<attribute>-is-X   <attribute>-is-Y   <attribute>-isnt-X   <attribute>-isnt-Y

FIGURE 28-2 Organization of the rule models.

rules, in the form of empirical generalizations. As such, they offer a global
overview of the regularities in the rules. The rule models are thus an
example of derived, domain-specific meta-level knowledge.
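The derivation of a model's description can be sketched concretely, following the chapter's note that the characterizations come from simple statistical thresholding. The rule subset and the 0.5 threshold below are invented for illustration; TEIRESIAS's actual thresholds are not given here.

```python
from collections import Counter

def premise_model(rules, threshold=0.5):
    """List the attributes that appear in 'most' premises of the subset,
    i.e., in at least `threshold` of the rules."""
    counts = Counter(attr for rule in rules for attr in set(rule["premise"]))
    n = len(rules)
    return sorted(a for a, c in counts.items() if c / n >= threshold)

# An invented subset of rules concluding about the same attribute.
subset = [
    {"premise": ["RETURNRATE", "TIMESCALE"]},
    {"premise": ["RETURNRATE", "TREND"]},
    {"premise": ["RETURNRATE", "TIMESCALE", "BRACKET"]},
]

# RETURNRATE appears in 3/3 premises and TIMESCALE in 2/3; TREND and
# BRACKET fall below the threshold and are left out of the model.
assert premise_model(subset) == ["RETURNRATE", "TIMESCALE"]
```

Because the model is recomputed from whatever rules are currently in the knowledge base, it is derived rather than user-supplied, which is exactly the classification the text gives it.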

28.1.2 Rule Model Example

Figure 28-3 shows an example of a rule model, one that describes the
subset of rules concluding affirmatively about the area for an investment.³
(Since not all details of implementation are relevant here, this discussion
will omit some.) As indicated above, there is a list of rules from which this
will omit some.) As indicated above, there is a list of rules from which this
model was constructed, descriptions characterizing the premises and ac-
tions, and pointers to more specific and more general models. Each char-
acterization in the description is shown split into its two parts, one con-
cerning the presence of individual attributes and the other describing
correlations. The first item in the premise description, for instance, indi-
cates that "most" rules about the area of investment mention the attribute
RETURNRATE in their premises; when they do mention it, they "typi-
cally" use the predicate functions SAME and NOTSAME; and the
"strength," or reliability, of this piece of advice is 3.83.
The fourth item in the premise description indicates that when the
attribute RETURNRATE (rate of return) appears in the premise of a rule
in this subset, the attribute TIMESCALE "typically" appears as well. As
before, the predicate functions are those usually associated with the attri-
butes, and the number is an indication of reliability.

28.1.3 Use of Rule Models in Knowledge Acquisition

Use of the rule models to support knowledge acquisition occurs in several
steps. First, as noted in Chapter 9, our model of knowledge acquisition is
one of interactive transfer of expertise in the context of a shortcoming in

³These examples were generated by substituting investment terms for medical terms in
examples from TEIRESIAS using MYCIN's medical knowledge.

MODEL FOR RULES CONCLUDING AFFIRMATIVELY ABOUT INVESTMENT AREA

EXAMPLES     ((RULE116 .33) (RULE050 .70) (RULE037 .80)
              (RULE095 .90) (RULE152 1.0) (RULE140 1.0))

DESCRIPTION
  PREMISE    ((RETURNRATE SAME NOTSAME 3.83)
              (TIMESCALE SAME NOTSAME 3.83)
              (TREND SAME 2.83)

              ((RETURNRATE SAME) (TIMESCALE SAME))
              ((TIMESCALE SAME) (RETURNRATE SAME))
              ((BRACKET SAME) (FOLLOWS NOTSAME SAME) (EXPERIENCE SAME)))

  ACTION     ((INVESTMENT-AREA CONCLUDE 4.73)
              (RISK CONCLUDE 4.05)

              ((INVESTMENT-AREA CONCLUDE) (RISK CONCLUDE) 4.73))

MORE-GENL    (INVESTMENT-AREA)

MORE-SPEC    (INVESTMENT-AREA-IS-UTILITIES)

FIGURE 28-3 Example of a rule model.

the knowledge base. The process starts with the expert challenging the
system with a specific problem and observing its performance. If the expert
believes its results are incorrect, there are available a number of tools that
will allow him or her to track down the source of the error by selecting
the appropriate rule model. For instance, if the problem is a missing rule
in the knowledge base to conclude about the appropriate area for an in-
vestment, then TEIRESIAS will select the model shown in Figure 28-3 as
the appropriate one to describe the rule it is about to acquire. Note that
the selection of a specific model is in effect an expression by TEIRESIAS
of its expectations concerning the new rule, and the generalizations in the
model become predictions about the likely content of the rule.
At this point the expert types in the new rule (Figure 28-4), using the
vocabulary specific to the domain. (In all traces, computer output is in
mixed upper and lower case, while user responses are in boldface capitals.)
As mentioned in Chapter 9 and further described in Chapter 18, En-
glish text is understood by allowing keywords to suggest partial interpre-
tations and intersecting those results with the expectations provided by the
selection of a particular rule model. We thus have a data-directed process
(interpreting the text) combined with a goal-directed process (the predic-
tions made by the rule model). Each contributes to the end result, but it
is their combination that is effective. TEIRESIAS displays the results of

The new rule will be called RULE383

If:    1 - THE CLIENT'S INCOME TAX BRACKET IS 50%
 and   2 - THE CLIENT IS FOLLOWING UP ON MARKET TRENDS CAREFULLY
 and   3 -
Then:  1 - THERE IS EVIDENCE (.8) THAT THE INVESTMENT AREA SHOULD BE HIGH TECHNOLOGY
 and   2 -

This may take a few moments...

FIGURE 28-4 New rule as stated by the expert.

this initial interpretation of the rule (Figure 28-5). If there are mistakes
(as there are in this case), a rule editor is available to allow the expert to
indicate necessary changes. This is easily accomplished, since TEIRESIAS
can often make an effective second choice by determining the likely source
of error in its initial guess.
Once the expert is satisfied that TEIRESIAS has correctly understood
what was said, it is the system's turn to see if it is satisfied with the content
of the rule. The main idea is to use the rule model to see how well this
new rule "fits into" the system's model of its knowledge--i.e., does it "look
like" a typical rule of the sort expected?
If the expert agrees to the inclusion of a new clause, TEIRESIAS
attempts to create it (Figure 28-6). The system relies on the context of the
current dialogue (which indicates that the clause should deal with the
amount of the client's investment experience) and the fact that the rule
must work for this case or it won't fix the bug (it is not shown here, but
earlier in the interaction the expert indicated that the client had a moderate
amount of experience). TEIRESIAS's guess is not necessarily correct, of
course, since the desired clause may be more general, but it is at least a
plausible attempt.
It should be noted that there is nothing in this concept of "second-
guessing" that is specific to the rule models as they are currently designed,
or indeed to associative triples of rules as a knowledge representation. The
most general and fundamental point was mentioned above--testing to see
how something "fits into" the system's model of its knowledge. At this point

This is my understanding of your rule:
RULE383
IF:    1) The client's income-tax bracket is 50%, and
       2) The market has followed a upward trend recently, and
       3) The client manages his assets carefully
THEN:  There is evidence (.8) that the area of the investment
       should be high-technology

FIGURE 28-5 TEIRESIAS's first interpretation of the rule shown in Figure 28-4.

I hate to criticize, Randy, but did you know that most rules about what the area of investment
might be, that mention -
    the income-tax bracket of the client, and
    how closely the client follows the market
ALSO mention -
    [A] - the amount of investment experience of the client
Shall I try to write a clause to account for [A]?
++** Y
How about -
    [A] The amount of investment experience of the client is moderate
Ok?
++** Y

FIGURE 28-6 TEIRESIAS's suggestion of an additional clause to the new rule based on
the rule model shown in Figure 28-3.

the system might perform any kind of check for violations of any estab-
lished prejudices about what the new chunk of knowledge should look like.
Additional kinds of checks for rules might concern the strength of the
inference, the number of clauses in the premise, etc. In general, this "sec-
ond-guessing" process can involve any characteristic that the system may
have "noticed" about the particular knowledge representation in use.
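The "second-guessing" check itself reduces to comparing a new rule against the correlations recorded in the selected model, as in Figure 28-6. The following Python sketch uses invented attribute names and correlations, not TEIRESIAS's internal format:

```python
def missing_attributes(new_rule, correlations):
    """Return attributes that 'typically' accompany the rule's premise
    attributes (according to the model) but are absent from the new rule."""
    present = set(new_rule["premise"])
    expected = set()
    for trigger, companions in correlations:
        if trigger in present:
            expected |= set(companions)
    return sorted(expected - present)

# Invented model correlations: rules mentioning BRACKET also tend to
# mention FOLLOWS and EXPERIENCE, etc.
model_correlations = [
    ("BRACKET", ["FOLLOWS", "EXPERIENCE"]),
    ("RETURNRATE", ["TIMESCALE"]),
]

new_rule = {"premise": ["BRACKET", "FOLLOWS"]}
assert missing_attributes(new_rule, model_correlations) == ["EXPERIENCE"]
```

Anything flagged this way becomes a suggestion to the expert ("Shall I try to write a clause to account for [A]?"), never a hard constraint, since the model is only an empirical generalization.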
Automatic generation of rule models has several interesting implica-
tions, since it makes possible a synthesis of the ideas of model-based un-
derstanding and learning by experience. While both of these have been
developed independently in previous AI research, their combination pro-
duces a novel sort of feedback loop: rule acquisition relies on the set of
rule models to effect the model-based understanding process; this results
in the addition of a new rule to the knowledge base; and this in turn
triggers recomputation of the relevant rule model(s).
Note, first, that performance on the acquisition of a subsequent rule
may be better, because the system's "picture" of its knowledge base has
improved--the rule models are now computed from a larger set of in-
stances, and their generalizations are more likely to be valid. Second, since
the relevant rule models are recomputed each time a change is made to
the knowledge base, the picture they supply is kept constantly up to date,
and they will at all times be an accurate reflection of the shifting patterns
in the knowledge base.
Finally, and perhaps most interesting, the models are not hand-tooled
by the system architect or specified by the expert. They are instead formed
by the system itself, and formed as a result of its experience in acquiring
rules from the expert. Thus, despite its reliance on a set of models as a
basis for understanding, TEIRESIAS's abilities are not restricted by a preexisting
set of models. As its store of knowledge grows, old models can
become more accurate, new models will be formed, and the system's stock
of knowledge about its knowledge will continue to expand.
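The feedback loop just described (acquire a rule, add it, recompute the affected model) can be sketched in a few lines. This Python sketch is illustrative; the "model" here is deliberately trivial, and the rules are invented:

```python
def recompute_model(rules):
    """A deliberately trivial 'model': the set of attributes seen in any
    premise so far.  A real model would also record correlations and weights."""
    return {attr for rule in rules for attr in rule["premise"]}

knowledge_base, model = [], set()
for new_rule in [{"premise": ["RETURNRATE"]}, {"premise": ["TIMESCALE"]}]:
    knowledge_base.append(new_rule)          # rule acquired from the expert
    model = recompute_model(knowledge_base)  # affected model recomputed at once

assert model == {"RETURNRATE", "TIMESCALE"}
```

Recomputing inside the acquisition loop is what keeps the models "constantly up to date" and makes the closed-loop behavior automatic rather than a separate maintenance task.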

28.2 Schemata

28.2.1 The Need for Knowledge About Representations

As data structures go beyond the simple types available in most programming
languages to extended data types defined by the user, they typically
become rather complex. Large programs may have numerous structures
that are complex in both their internal organization and their interrela-
tionships with other data types in the system. Yet information about these
details may be scattered in comments in system code, in documents and
manuals maintained separately, and in the mind of the system architect.
This presents problems to anyone changing the system. Consider, for ex-
ample, the difficulties encountered in such a seemingly simple problem as
adding a new instance of an existing data type to a large program. Just
finding all of the necessary information can be a major task, especially for
someone unfamiliar with the system.
One particularly relevant set of examples comes from the numerous
approaches to knowledge representation that have been tried over the
years. While the emphasis in discussions of predicate calculus, semantic
nets, production rules, frames, etc., has naturally concerned their respec-
tive conceptual power, at the level of implementation each of these carries
problems of data structure management.
Our second example of meta-level knowledge, then, is of the repre-
sentation-specific variety and involves describing to a system a range of
information about the representations it employs. The main idea here is,
first, to view every knowledge representation in the system as an extended
data type and to write explicit descriptions of them. These descriptions
should include all of the information about structure and interrelations
that is often widely scattered. Next, we devise a language in which all of
this can be put in machine-comprehensible terms and write the descrip-
tions in those terms, making this store of information available to the sys-
tem. Finally, we design an interpreter for the language, so that the system
can use its new knowledge to keep track of the details of data structure
construction and maintenance.
The approach is based on the concept of a data structure schema, a device
that provides a framework in which representations can be specified. The
framework, like most, carries its own perspectives on its domain. One point
it emphasizes strongly is the detailed specification of many kinds of infor-
mation about representations. It attempts to make this specification task
easier by providing ways of organizing the information and a relatively
high-level vocabulary for expressing it.

Schema hierarchy: indicates categories of representations and their organization

Individual schema: describes structure of a single representation

Slot names: (the schema building blocks) describe implementation conventions

FIGURE 28-7 Levels of knowledge about representations.

28.2.2 Schema Example

There are three levels of organization of the information about representations
(Figure 28-7). At the highest level, a schema hierarchy links the
schemata together, indicating what categories of data structure exist in the
system and the relationships among them. At the next level of organization
are individual schemata, the basic units around which the information
about representations is organized. Each schema indicates the structure
and interrelationships of a single type of data structure. At the lowest level
are the slot names (and associated structures) from which the schemata are
built; these offer knowledge about specific conventions at the program-
ming language level. Each of these three levels supplies a different sort of
information; together they comprise an extensive body of knowledge about
the structure, organization, and implementation of the representations.
The hierarchy is a generalization hierarchy (Figure 28-8) that indicates
the global organization of the representations. It makes extensive use of
the concept of inheritance of properties, so that a particular schema need
represent only the information not yet specified by schemata above it in
the hierarchy. This distribution of information also aids in making the
network extensible.

ROOT

VALUE-SCHEMA                       ATTRIBUTE-SCHEMA

INVSATTRIB-SCHEMA   CLIENTSATTRIB-SCHEMA   MARKETSATTRIB-SCHEMA

SINGLESVAL-SCHEMA   MULTIPLESVAL-SCHEMA   TRUEFALSESVAL-SCHEMA

FIGURE 28-8 Part of the schema hierarchy.
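The inheritance of properties along the hierarchy can be made concrete with a small sketch. The schema names follow Figure 28-8 and Figure 28-9, but the FATHER links and slot contents below are invented for illustration:

```python
# A fragment of a schema hierarchy; each schema records only the slots
# not already specified by its ancestors.
schemas = {
    "ROOT":             {"father": None,           "slots": {"AUTHOR": "required"}},
    "VALUE-SCHEMA":     {"father": "ROOT",         "slots": {"SYNONYM": "optional"}},
    "STOCKNAME-SCHEMA": {"father": "VALUE-SCHEMA", "slots": {"TRADEDON": "required"}},
}

def effective_slots(name):
    """Collect slots by walking up the FATHER links; nearer schemas win."""
    slots = {}
    while name is not None:
        for slot, spec in schemas[name]["slots"].items():
            slots.setdefault(slot, spec)  # an entry set lower down takes precedence
        name = schemas[name]["father"]
    return slots

assert effective_slots("STOCKNAME-SCHEMA") == {
    "TRADEDON": "required", "SYNONYM": "optional", "AUTHOR": "required"}
```

Because each schema stores only its increment over its father, adding a new leaf to the network requires describing only what is new about it, which is the extensibility property the text points to.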



Each schema contains several different types of information:

1. the structure of its instances,
2. interrelationships with other data structures,
3. a pointer to all current instances,
4. inter-schema organizational information, and
5. bookkeeping information.

Figure 28-9 shows the schema for a stock name; information corre-
sponding to each of the categories listed above is grouped together. The
first five lines in Figure 28-9 contain structure information and indicate
some of the entries on the property list (PLIST) of the data structure that
represents a stock name. The information is a triple of the form

<slot name> <blank> <advice>

The slot name labels the "kind" of thing that fills the blank and serves as
a point around which much of the "lower-level" information in the system
is organized. The blank specifies the format of the information required,
while the advice suggests how to find it. Some of the information needed
may be domain-specific, and hence must be requested from the expert.
But some of it may concern completely internal conventions of represen-
tation, and hence should be supplied by the system itself, to insulate the
domain expert from such details. The advice provides a way of indicating
which of these situations holds in a given case.

STOCKNAME-SCHEMA
PLIST       [(INSTOF     STOCKNAME-SCHEMA                                 GIVENIT)
             (SYNONYM    (KLEENE (1 0) <ATOM>)                            ASKIT)
             (TRADEDON   (KLEENE (1 1 2) <(MARKET-INST FIRSTYEAR-INST)>)  ASKIT)
             (RISKCLASS  CLASS-INST                                       ASKIT)
             CREATEIT]
RELATIONS   ((AND* STOCKNAMELIST HILOTABLE)
             (OR* CUMVOTINGRIGHTS)
             (XOR* COMMON PFD CUMPFD PARTICPFD)
             ((OR* PFD CUMPFD PARTICPFD) PFDRATETABLE)
             ((AND* CUMPFD) OMITTEDDIVS))
INSTANCES   (AMERICAN-MOTORS AT&T ... XEROX ZOECON)
FATHER      (VALUE-SCHEMA)
OFF-SPRING  NIL
DESCR       "the STOCKNAME-SCHEMA describes the format for a stock name"
AUTHOR      DAVIS
DATE        1115
INSTOF      (SCHEMA-SCHEMA)

FIGURE 28-9 Schema for a stock name.



The next five lines in the schema (under RELATIONS) indicate its
interrelations with other data structures in the system. The main point
here is to provide the system architect with a way of making explicit all of
the data structure interrelationships on which the design depends. Ex-
pressing them in a machine-accessible form makes it possible for TEIRE-
SIAS to take over the task of maintaining them, as explained below.
The schemata also keep a list of all current instantiations of themselves
(under INSTANCES), primarily for use in maintaining the knowledge
base. If the design of a data structure requires modification, it is convenient
to have a pointer to all current instances to ensure that they are similarly
modified.
The next two lines (FATHER and OFF-SPRING) contain organiza-
tional information indicating how the stock name schema is connected to
the schema hierarchy.
Finally, there are four slots for bookkeeping information to help keep
track of a large number of data structures: each structure is tagged with
the date of creation and author, along with a free-text description supplied
by the author. In addition, each structure has a pointer to the schema of
which it is an instance (note in this case that it is the schema itself that is
the data structure being described by this information).

28.2.3 Use of Schemata in Knowledge Acquisition

Use of the schemata in knowledge acquisition relies on several ideas:

Information in the schema is viewed as a guide to creating a new instance
of the representation it describes.
That guidance is supplied by (a) the structure description information,
which is in the form of a prototype to be instantiated, and (b) the rela-
tions information, which is interpreted as pointers to a number of struc-
tures that may require updating to ensure that necessary data structure
interrelations are maintained.
It is this instantiation and interpretation process that drives the knowl-
edge transfer dialogue.
The advice present in the schema adds a level of sophistication to the
dialogue.
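The instantiation-and-interpretation process can be sketched as an interpreter walking the schema's <slot name, blank, advice> triples: ASKIT slots generate questions to the expert, while GIVENIT (and system-internal) slots are filled without bothering the expert. The slot names follow Figure 28-9, but the filling logic below is invented for illustration:

```python
# Schema fragment as <slot name, blank, advice> triples (after Figure 28-9).
schema = [
    ("INSTOF",    "STOCKNAME-SCHEMA", "GIVENIT"),
    ("TRADEDON",  "<market> <year>",  "ASKIT"),
    ("RISKCLASS", "<risk class>",     "ASKIT"),
]

def instantiate(schema, ask):
    """Build a new instance, consulting the expert only where advised."""
    instance = {}
    for slot, blank, advice in schema:
        if advice == "ASKIT":
            instance[slot] = ask(slot, blank)  # question posed to the expert
        else:
            instance[slot] = blank             # internal convention; expert insulated
    return instance

# The `ask` callback stands in for the dialogue of Figure 28-10.
answers = {"TRADEDON": "NYSE 1945", "RISKCLASS": "BLUE-CHIP"}
stock = instantiate(schema, lambda slot, blank: answers[slot])
assert stock["INSTOF"] == "STOCKNAME-SCHEMA" and stock["RISKCLASS"] == "BLUE-CHIP"
```

The blank doubles as a format specification, so the same interpreter can also validate answers (rejecting AMEX as a <risk classification>, say) before accepting them into the instance.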

For instance, suppose in the process of adding a new rule to the system
the expert mentions a stock the system hasn't heard about yet. Learning
about the new stock (i.e., creating a new instance of the concept of stock
name) becomes a subproblem in the task of acquiring the new rule. The
schema for the concept of stock name is retrieved and used as the basis
for a dialogue that requests the necessary information from the expert.
An abbreviated version of the dialogue is shown in Figure 28-10.

The new rule will be called RULE384

If    1 - THE DESIRED RATE OF RETURN ON THE INVESTMENT IS GREATER THAN 19%
      2 - THE AMOUNT OF INVESTMENT EXPERIENCE OF THE CLIENT IS MODERATE
      3 - THE AREA OF THE INVESTMENT IS NATURAL RESOURCE DEVELOPMENT
Then
      1 - THERE IS EVIDENCE (.5) THAT THE NAME OF THE STOCK TO INVEST IN IS GEORGIA PACIFIC

Sorry, but I don't understand
    THERE IS EVIDENCE (.5) THAT THE NAME OF THE STOCK TO INVEST IN IS GEORGIA PACIFIC
because these words are unknown: GEORGIA PACIFIC

{TEIRESIAS finds it cannot generate any interpretations of the text, so it reprints the original line and
specifies where the trouble is. This makes it possible for the expert to figure out what it is that is new to the
system.}

Would you care to try rephrasing that?
++** Y
(Please use *'s to mark what you think might be a new attribute or
new value of an existing attribute.)
1 - THERE IS EVIDENCE (.5) THAT THE NAME OF THE STOCK TO INVEST IN IS * GEORGIA PACIFIC *

{While the language in this example is very simple, there are more complex cases where it would be quite
hard to decide from the context what sort of new item the expert had in mind. To avoid the necessity for
more sophisticated natural language handling, the expert is asked to mark off the new word or phrase with
asterisks.}

It looks as though
    GEORGIA PACIFIC
refers to a new value of an attribute, correct?
++** Y
... is it the name of a stock?
++** Y

{... and with the new item clearly marked, the system can then do some very simple pattern matching to
decide what's going on.}

Now please tell me a few things about it...

{TEIRESIAS retrieves the schema for a stock, and "reads" it to find out what it needs to know about any
stock.}

On what markets is GEORGIA-PACIFIC traded?
The answer should be in the form of 1 or more of the following -
    <market name> <year first traded>
[Type 1 set to a line, then an empty line when done.]
++** NYSE 1945
++** PCE 1968
++**

{Each of the questions from the system is prompted by the attempt to fill in some component of the data
structure being built, according to the representation conventions indicated by the schema.}

What is the current risk classification of GEORGIA-PACIFIC?
The answer should be in the form of a
    <risk classification>
++** AMEX

FIGURE 28-10 Excerpt of dialogue in which the system learns a new rule.

Sorry, but the following are invalid -
     AMEX is not a recognized <risk classification>
Please answer again [use the same answer if you really meant it].
++** BLUE-CHIP
++**

{TEIRESIAS now begins working on updating other data structures in response to the addition of the new
stock name.}

Should this new item be added to-
     1 - the list of common stocks, or
     2 - the list of preferred stocks, or
     3 - the list of cumulative preferred stocks, or
     4 - the list of participating preferred stocks
Select one of the above
++** 3
[Adding GEORGIA-PACIFIC to CUMULATIVE-PFD]

{One of these tasks involves adding the new stock to one of four mutually exclusive lists. (Several other
necessary updates are also performed; these have been edited out here for brevity.)}

Ok, done with GEORGIA-PACIFIC now... Back to the rule.

{Having finished acquiring information about the new stock, the system returns to parsing the new rule, and
with some help from the expert gets the correct result [the parsing process is omitted here; see Davis
(1977) for a detailed example].}

This may take a few moments.
This is my understanding of your rule:
If   1 - the desired rate of return for the investment is greater than 10%
     2 - the amount of investment experience of the client is moderate
     3 - the area of investment is natural-resource-development
Then
     1 - there is evidence (.5) that the name of the stock to choose is georgia-pacific

FIGURE 28-10 continued

28.2.4 Other Uses of Schemata

The preceding subsection showed one instance of using schemata for main-
tenance of the knowledge base. They help ensure that one change to the
knowledge base (adding a new instance of a known representation) will
not violate necessary relationships between data structures. The schemata
also support other capabilities. Besides being useful in maintaining the
knowledge base, they offer a convenient mechanism for organizing and
implementing data structure access and storage functions.
One of the ideas behind the design of the schemata is to use them as
points around which to organize knowledge. The information about struc-
ture and interrelationships described above, for instance, is stored this way.
In addition, access and storage information is also organized in this fash-
ion. By generalizing the advice concept slightly, it is possible to effect all
data structure access and storage requests in the appropriate schema. That
is, code that needs to access a particular structure "sends" an access request,

and the structure "answers" by providing the requested item.4 This offers
the well-known advantage of insulating the implementation of a data structure
from its logical design. Code that refers only to the latter is far easier
to maintain in the face of modifications to data structure implementation.
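The "send a request / structure answers" idea can be sketched as follows. TEIRESIAS itself was written in Interlisp; this Python rendering, and all the names in it, are ours, purely for illustration.

```python
# Sketch of schema-mediated access: code never touches a structure's
# internals directly; it sends an access request, and the schema
# answers it by applying the accessor it stores for that component.

class Schema:
    """Describes one representation: its components and how to read them."""
    def __init__(self, name, components):
        self.name = name
        self.components = components          # component name -> accessor

    def access(self, instance, component):
        """Answer an access request using the stored accessor."""
        return self.components[component](instance)

# The stock schema knows that instances are (name, markets, risk) tuples.
# If that implementation changes, only these accessors change, not callers.
stock_schema = Schema("stock", {
    "name":    lambda inst: inst[0],
    "markets": lambda inst: inst[1],
    "risk":    lambda inst: inst[2],
})

georgia_pacific = ("GEORGIA-PACIFIC",
                   [("NYSE", 1945), ("PCE", 1968)],
                   "BLUE-CHIP")

print(stock_schema.access(georgia_pacific, "risk"))   # BLUE-CHIP
```

Callers written against `access` survive a change of the underlying tuple to, say, a record, which is exactly the insulation of implementation from logical design described above.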

28.3 Function Templates

Associated with each predicate function in the system is a template, a list
structure that resembles a simplified procedure declaration (Figure 28-11).
It is representation-specific, indicating the order and generic type of the
arguments in a typical call of that function. Templates make possible two
interesting parallel capabilities: code generation and code dissection. Tem-
plates are used as a basis for the simple form of code generation alluded
to in Chapter 9. Although details are beyond the scope of this chapter [see
Davis (1976)], code generation is essentially a process of "filling in the
blanks": processing a line of text in a new rule involves checking for key-
words that implicate a particular predicate function, and then filling in its
template on the basis of connotations suggested by other words in the text.

Function        Template
SAME            (object attribute value)

FIGURE 28-11 Template for the predicate function SAME.

Code dissection is accomplished by using the templates as a guide to
extracting any desired part of a function call. For instance, as noted earlier,
TEIRESIASforms the rule models on the basis of the current contents of
the knowledge base. To do this, it must be able to pick apart each rule to
determine the attributes to which it refers. This could have been made
possible by requiring that every predicate function use the same function
call format (i.e., the same number, type, and order of arguments), but this
would be too inflexible. Instead, we allow every function to describe its
own calling format via its template. To dissect a function call, then, we
need only retrieve the template for the relevant function and then use the
template as a guide to dissecting the remainder of the form. The template
in Figure 28-11, for instance, indicates that the attribute would be the sec-
ond item after the function name. This same technique is also used by
TEIRESIAS's explanation facility, where it permits the system to be quite
precise in the explanations it provides.
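Template-guided dissection can be sketched in a few lines. The template follows Figure 28-11; the Python rendering and the sample premise are ours, not code from the original Interlisp system.

```python
# Template-guided code dissection: to find, e.g., the attribute a rule
# clause refers to, look up the function's template and use it as a map
# onto the call, instead of assuming one fixed calling format.

TEMPLATES = {
    "SAME":    ("object", "attribute", "value"),   # per Figure 28-11
    "NOTSAME": ("object", "attribute", "value"),   # hypothetical second entry
}

def dissect(call, role):
    """Extract the part of a function call playing the given role."""
    fn, *args = call
    template = TEMPLATES[fn]              # this function's calling format
    return args[template.index(role)]     # position given by the template

premise = ("SAME", "CNTXT", "RISK-CLASSIFICATION", "BLUE-CHIP")
print(dissect(premise, "attribute"))   # RISK-CLASSIFICATION
```

Because each function carries its own template, a new predicate with a different argument order needs only a new `TEMPLATES` entry; the dissection code is untouched.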

4 This was suggested by the perspective taken in work on SMALLTALK (Goldberg and Kay,
1976) and ACTORS (Hewitt et al., 1973). This style of writing programs has come to be
known as object-oriented programming.

This approach also offers a useful degree of flexibility. The introduction
of a new predicate function, for instance, can be totally transparent
to the rest of the system, as long as its template can be written in terms of
the available set of primitives such as attribute, value, etc. The power of
this approach is limited primarily by this factor and will succeed to the
extent that code can be described by a relatively small set of such primitive
descriptors. While more complex syntax is easily accommodated (e.g., the
template can indicate nested function calls), more complex semantics are
more difficult (e.g., the appearance of multiple attributes in a function
template can cause problems).
Finally, note that the templates also offer a small contribution to system
maintenance. If it becomes necessary to modify the calling sequence of a
function, for instance, we can edit just the template and have the system
take care of effecting analogous changes to all current invocations of the
function.

28.4 Meta-Rules
28.4.1 Meta-Rules: Strategies to Guide the Use of Knowledge

A second form of domain-specific meta-level knowledge is strategy knowledge
that indicates how to use other knowledge. This discussion considers strategies
from the perspective of deciding which knowledge to invoke next in a
situation where more than one chunk of knowledge may be applicable. For
example, given a problem solvable by either heuristic search or problem
decomposition, a strategy might indicate which technique to use, based on
characteristics of the problem domain and nature of the desired solution.
If the problem decomposition technique were chosen, other strategies
might be employed to select the appropriate decomposition from among
several plausible alternatives.
This view of strategies is useful because many of the paradigms de-
veloped in AI admit (or even encourage) the possibility of having several
alternative chunks of knowledge be plausibly useful in a single situation
(e.g., production rules, logic-based languages, etc.). When a set of alternatives
is large enough (or varied enough) that exhaustive invocation is
infeasible, some decision must be made about which should be chosen.
Since the performance of a program will be strongly influenced by the
intelligence with which that decision was made, strategies offer an impor-
tant site for the embedding of knowledge in a system.
A MYCIN-like system invokes rules in a simple backward-chaining
fashion that produces an exhaustive depth-first search of an AND/OR goal
tree. If the program is attempting, for example, to determine which stock

would make a good investment, it retrieves all the rules that make a con-
clusion about that topic (i.e., they mention STOCKNAME in their action
clauses). It then invokes each one in turn, evaluating each premise to see
if the conditions specified have been met. The search is exhaustive because
the rules are inexact: even if one succeeds, it was deemed to be a wisely
conservative strategy to continue to collect all evidence about a subgoal.
The ability to use an exhaustive search is of course a luxury, and in
time the base of rules may grow large enough to make this infeasible. At
this point some choice would have to be made about which of the plausibly
useful rules should be invoked. Meta-rules were created to address this
problem. They are rules about object-level rules and provide a strategy for
pruning or reordering object-level rules before they are invoked.

28.4.2 Examples of Meta-Rules

Figure 28-12 shows four meta-rules for MYCIN (reverting to medicine
again for the moment). The first of them says, in effect, that in trying to
determine the likely identities of organisms from a nonsterile site, rules that
base their identification on other organisms from the same site are not
likely to be successful. The second indicates that when dealing with pelvic
abscess, organisms of the class Enterobacteriaceae should be considered before
gram-positive rods. The third and fourth are like the second in that they
reorder relevant rules before invoking them.
It is important to note the character of the information conveyed by
meta-rules. First, note that in all cases we have a rule that is making a
conclusion about other rules. That is, where object-level rules conclude
about the medical (or other) domain, meta-rules conclude about object-
level rules. These conclusions can (in the current implementation) be of
two forms. As in the first meta-rule, they can make deductions about the
likely utility of certain object-level rules, or as in the second, they can
indicate a partial ordering between two subsets of object-level rules.
Note also that (as in the first example) meta-rules make conclusions
about the utility of object-level rules, not about their validity. That is,
METARULE001 does not indicate circumstances under which some of the
object-level rules are invalid [or even "very likely (.9)" to be invalid]. It
merely says that they are likely not to be useful; i.e., they will probably fail,
perhaps only after requiring extensive computation to evaluate their pre-
conditions. This is important because it has an impact on the question of
distribution of knowledge. If meta-rules did comment on validity, it might
make more sense to distribute the knowledge in them, i.e., to delete the
meta-rule and just add another premise clause to each of the relevant
object-level rules. But since their conclusions concern utility, it does not
make sense to distribute the knowledge.
Adding meta-rules to the system requires only a minor addition to
MYCIN's control structure. As before, the system retrieves the entire list

METARULE001
IF   1) the culture was not obtained from a sterile source, and
     2) there are rules which mention in their premise a previous
        organism which may be the same as the current organism
THEN it is definite (1.0) that each of them is not going to be useful.

PREMISE: ($AND (NOTSAME CNTXT STERILESOURCE)
               (THEREARE OBJRULES (MENTIONS CNTXT PREMISE SAMEBUG)
                SET1))
ACTION:  (CONCLIST SET1 UTILITY NO TALLY 1.0)

METARULE002
IF   1) the infection is a pelvic-abscess, and
     2) there are rules which mention in their premise
        enterobacteriaceae, and
     3) there are rules which mention in their premise gram-positive
        rods,
THEN there is suggestive evidence (.4) that the former should be done
     before the latter.

PREMISE: ($AND (SAME CNTXT PELVIC-ABSCESS)
               (THEREARE OBJRULES (MENTIONS CNTXT PREMISE
                                   ENTEROBACTERIACEAE) SET1)
               (THEREARE OBJRULES (MENTIONS CNTXT PREMISE GRAMPOS-RODS)
                SET2))
ACTION:  (CONCLIST SET1 DOBEFORE SET2 TALLY .4)

METARULE003
IF   1) there are rules which do not mention the current goal in
        their premise,
     2) there are rules which mention the current goal in their
        premise
THEN it is definite that the former should be done before the latter.

PREMISE: ($AND (THEREARE OBJRULES ($AND (DOESNTMENTION FREEVAR
                                         ACTION CURGOAL)) SET1)
               (THEREARE OBJRULES ($AND (MENTIONS FREEVAR PREMISE
                                         CURGOAL)) SET2))
ACTION:  (CONCLIST SET1 DOBEFORE SET2 1000)

METARULE004
IF   1) there are rules which are relevant to positive cultures, and
     2) there are rules which are relevant to negative cultures
THEN it is definite that the former should be done before the latter.

PREMISE: ($AND (THEREARE OBJRULES ($AND (APPLIESTO FREEVAR POSCUL))
                SET1)
               (THEREARE OBJRULES ($AND (APPLIESTO FREEVAR NEGCUL))
                SET2))
ACTION:  (CONCLIST SET1 DOBEFORE SET2 1000)

FIGURE 28-12 Four meta-rules for MYCIN.



of rules relevant to the current goal (call the list L). But before attempting
to invoke them, it first determines if there are any meta-rules relevant to
the goal.5 If so, these are invoked first. As a result of their actions, we may
obtain a number of conclusions about the likely utility and relative ordering
of the rules in L. These conclusions are used to reorder or shorten L, and
the revised list of rules is then used. Viewed in tree-search terms, the
current implementation of meta-rules can either prune the search space
or reorder the branches of the tree.
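The effect of meta-rule conclusions on the list L can be sketched as below. The conclusion tuples, rule names, and threshold are hypothetical stand-ins for the CONCLIST forms of Figure 28-12; the original system was written in Interlisp.

```python
# Sketch of the augmented control structure: meta-rule conclusions first
# prune rules judged not useful, then impose a partial order on the rest,
# before any object-level rule is invoked.

def apply_meta_conclusions(rules, conclusions):
    """Shorten and reorder L according to meta-rule conclusions."""
    # Prune rules concluded (with sufficient certainty) not to be useful,
    # in the manner of METARULE001's UTILITY NO conclusion.
    useless = {r for (kind, r, cf) in conclusions
               if kind == "UTILITY-NO" and cf >= 0.2}
    rules = [r for r in rules if r not in useless]
    # Reorder: rules named in DOBEFORE conclusions move to the front;
    # Python's sort is stable, so the original order otherwise survives.
    first = {r for (kind, r, cf) in conclusions if kind == "DOBEFORE"}
    return sorted(rules, key=lambda r: 0 if r in first else 1)

L = ["RULE050", "RULE101", "RULE212", "RULE384"]
meta_conclusions = [
    ("UTILITY-NO", "RULE101", 1.0),   # METARULE001-style pruning
    ("DOBEFORE",   "RULE384", 0.4),   # METARULE002-style ordering
]
print(apply_meta_conclusions(L, meta_conclusions))
# ['RULE384', 'RULE050', 'RULE212']
```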

28.4.3 Guiding the Use of the Knowledge Base

There are several points to note about encoding knowledge in meta-rules.
First, the framework it presents for knowledge organization and use appears
to offer a great deal of leverage, since much can be gained by adding
to a system a store of (meta-level) knowledge about which chunk of object-
level knowledge to invoke next. Considered once again in tree terms, we
are talking about the difference between a "blind" search of the tree and
one guided by heuristics. The advantage of even a few good heuristics in
cutting down the combinatorial explosion of tree search is well known.
Thus, where earlier sections were concerned about adding more object-
level knowledge to improve performance, here we are concerned with giv-
ing the system more information about how to use what it already knows.
Consider, too, that the definition of intelligence includes appropriate use
of information. Even if a store of (object-level) information is not large, it
is important to be able to use it properly. Meta-rules provide a mechanism
for encoding strategies that can make this possible.
Second, the description given in the preceding subsection has been
simplified in several respects for the sake of clarity. It discusses the aug-
mented control structure, for example, in terms of two levels. In fact, there
can be an arbitrary number of levels, each serving to direct the use of
knowledge at the next lower level. That is, the system retrieves the list (L)
of object-level rules relevant to the current goal. Before invoking this, it
checks for a list (L′) of first-order meta-rules that can be used to reorder
or prune L, etc. Recursion stops when there is no rule set of the next
higher order, and the process unwinds, each level of strategies advising on
the use of the next lower level. We can gain leverage at this higher level
by encoding heuristics that guide the use of heuristics. That is, rather than
adding more heuristics to improve performance, we might add more in-
formation at the next higher level about effective use of existing heuristics.
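The recursive scheme can be sketched as follows. Everything here (the `refine` function, the level-indexed rule sets, the effect attributed to the hypothetical meta-rule M1) is our own illustration of the mechanism just described, not code from the original system.

```python
# Sketch of the multi-level scheme: before using the rule set at level n,
# look for a set at level n+1 to refine it; recursion stops when no
# higher-order set exists, then advice unwinds back down the levels.

def refine(goal, level, rule_sets, refiners):
    rules = rule_sets.get((goal, level))
    if rule_sets.get((goal, level + 1)) is None:
        return rules                         # no higher-order set: stop
    meta = refine(goal, level + 1, rule_sets, refiners)
    return refiners[level](rules, meta)      # meta level advises this level

rule_sets = {
    ("stock", 0): ["R1", "R2", "R3"],        # object-level rules
    ("stock", 1): ["M1"],                    # first-order meta-rules
}
# Hypothetical effect of meta-rule M1: drop R2, try R3 first.
refiners = {0: lambda rules, meta: ["R3", "R1"]}

print(refine("stock", 0, rule_sets, refiners))   # ['R3', 'R1']
```

A second-order set, if present under key `("stock", 2)`, would be consulted to refine the meta-rules themselves, with no extra machinery needed.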

5 That is, are there meta-rules directly associated with that goal? Meta-rules can also be
associated with other objects in the system, but that is beyond the scope of this chapter. The
issues of organizing and indexing meta-rules are covered in more detail elsewhere (Davis,
1976; 1978).

The judgmental character of the rules offers several interesting
capabilities. It makes it possible, for instance, to write rules that make
different conclusions about the best strategy to use and then rely on the
underlying model of confirmation (Shortliffe and Buchanan, 1975) to
weigh the evidence. That is, the strategies can "argue" about the best rule
to use next, and the strategy that "presents the best case" (as judged by the
confirmation model) will win out.
Next, recall that the basic control structure of the performance program
is a depth-first search of the AND/OR goal tree sprouted by the
unwinding of rules. The presence of meta-rules of the sort shown in Figure
28-12 means that this tree has an interesting characteristic at each node:
when the system has to choose a path, there may be information stored
that advises about the best path to take. There may therefore be available
an extensive body of knowledge to guide the search, but that knowledge
is not embedded in the code of a clever search algorithm. It is instead
organized around the specific objects that form the nodes in the tree; i.e.,
instead of a smart algorithm, we have a "smart tree."
Finally, there are several advantages associated with the use of strategies
that are goal-specific, explicit, and imbedded in a representation that
is the same as that of the object-level knowledge. The fact that strategies
are goal-specific, for instance, makes it possible to specify precise heuristics
for a given goal, without imposing any overhead on the search for any
other goals. That is, there may be a number of complex heuristics describ-
ing the best kinds of rules to use for a particular goal, but these will cause
no computational overhead except in the search for that goal.
The fact that they are explicit means a conceptually cleaner organiza-
tion of knowledge and an ease of modification of established strategies.
Consider, for instance, alternative means of achieving the sort of partial
ordering specified by the second meta-rule. There are several alternative
schemes by which this could be accomplished, involving appropriate mod-
ifications to the relevant object-level rules and slight changes to the control
structure. Such schemes, however, share several faults that can be illus-
trated by considering one such approach: an agenda with multiple priority
levels like the one proposed in Bobrow and Winograd (1977).
In an agenda-driven system, rules are put on an agenda rather than
dealt with in the form of a linear list of relevant rules in a partial ordering.
Partial ordering could be accomplished simply by setting the priority of
some rules higher than that of others; rules in subset A, for instance, might
get priority 6, while those in subset B are given priority 5. But this tech-
nique presents two problems: it is both opaque and likely to cause bugs. It
will not be apparent from looking at the code, for instance, why the rules
in A were given a higher priority than that of the rules in B. Were they
more likely to be useful, or is it desirable that those in A precede those in
B no matter how useful they may be? Consider also what happens if, before
we get a chance to invoke any of the rules in A, an event occurs that makes

it clear that their priority ought to be reduced (for reasons unrelated to


the desired partial ordering). If the priority of only the rules in A is ad-
justed, a bug arises, since the desired relative ordering may be lost.
The problem is that this approach tries to reduce a number of differ-
ent, incommensurate factors to a single number, with no record of how that
number was reached. Meta-rules offer one mechanism for making these sorts
of considerations explicit, and for leaving a record of why a set of processes
has been queued in a particular order. They also make subsequent modi-
fications easier, since all of the information is in one place--changing a
strategy can be accomplished by editing the relevant meta-rule, rather than
by searching through a program for all the places where priorities have
been set to effect that strategy.
Lastly, the use of a uniform encoding of knowledge makes the treatment
of all levels the same. For example, second-order meta-rules require no
machinery in excess of that needed for first-order meta-rules. It also means
that all the explanation and knowledge acquisition capabilities developed
for object-level rules can be extended to meta-rules as well. The first of
these (explanation) has been done and works for all levels of meta-rules.
Adding this to TEIRESIAS's explanation facility makes possible an interesting
capability: in addition to being able to explain what it did, the system
can also explain how it decided to do what it did. Knowledge in the strategies
has become accessible to the rest of the system and can be explained in
just the same fashion. We noted above that adding meta-level knowledge
to the system was quite distinct from adding more object-level knowledge,
since strategies contain information of a qualitatively different sort. Expla-
nations based on this information are thus correspondingly different as
well.

28.4.4 Broader Implications of Meta-Rules

The concept of strategies as a mechanism for deciding which chunk of
knowledge to invoke next can be applied to a number of different control
structures. We have seen how it works in a goal-directed scheme, and it
functions in much the same way with a data-directed process. In the latter
case meta-rules offer a way of controlling the depth and breadth of the
implications drawn from any new fact or conclusion. Pursuing this further,
we can imagine making the decision to use a data- or goal-directed process
itself as an issue to be decided by a collection of appropriate meta-rules.
At each point in its processing, the system might invoke one set of meta-
rules to choose a control structure, then use another set to guide that
control structure. This can be applied to many control structures, dem-
onstrating the range of applicability of the basic concept of strategies as a
device for choosing what to do next.

28.4.5 Content-Directed Invocation

If meta-rules are to be used to select from among plausibly useful object-
level rules, they must have some way of referring to the object-level rules.
The mechanism used to effect this reference has implications for the
flexibility and extensibility of the resulting system. To see this, note that the
meta-rules in Figure 28-12 refer to the object-level rules by describing them
and effect this description by direct examination of content. For instance,
METARULE001 refers to rules that mention in their premises previous organisms
that may be the same as the current organism, which is a description rather than
an equivalent list of rule names. The set of object-level rules that meet this
description is determined at execution time by examining the source code
of the rules. That is, the meta-rule "goes in and looks" for the relevant
characteristic, using the function templates as a guide to dissecting the
rules. We have termed this content-directed invocation.
Part of the utility of this approach is illustrated by its advantages over
using explicit lists of object-level rules. If such lists were used, then tasks
would require extensive amounts of bookkeeping. After an object-level rule
had been edited, for instance, we would have to check all the strategies
that name it, to be sure that each such reference was still applicable to the
revised rule. With content-directed invocation, however, these tasks require
no additional effort, since the meta-rules effect their own examination of
the object-level rules and will make their own determination of relevance.
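Content-directed invocation can be sketched as a predicate evaluated over rule source at run time. The rule bodies, clause layout, and rule names below are hypothetical illustrations of the idea; the real system dissected Interlisp forms using function templates.

```python
# Sketch of content-directed invocation: a meta-rule names no object-level
# rules; it carries a description that is evaluated against the rules'
# source at execution time, so edited or newly added rules are classified
# automatically, with no bookkeeping of explicit rule lists.

def mentions(rule, part, attribute):
    """Does the given part of the rule mention the given attribute?"""
    # Clause layout here follows the (function object attribute value)
    # template convention, so the attribute is element 2.
    return any(clause[2] == attribute for clause in rule[part])

RULES = {
    "RULE050": {"premise": [("SAME", "CNTXT", "SITE", "BLOOD")],
                "action":  [("CONCLUDE", "CNTXT", "IDENT", "E.COLI", 0.4)]},
    "RULE101": {"premise": [("SAME", "CNTXT", "SAMEBUG", "YES")],
                "action":  [("CONCLUDE", "CNTXT", "IDENT", "KLEBSIELLA", 0.3)]},
}

# METARULE001-style description: rules whose premise mentions SAMEBUG.
set1 = [name for name, rule in RULES.items()
        if mentions(rule, "premise", "SAMEBUG")]
print(set1)   # ['RULE101']
```

If RULE101 were edited so that its premise no longer mentioned SAMEBUG, the description would simply stop matching it; no stored list of rule names would need repair.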

28.5 Conclusions
We have reviewed four examples of meta-level knowledge and demon-
strated their application to the task of building and using large stores of
domain-specific knowledge. This has shown that supplying the system
with a store of information about its representations makes possible a num-
ber of useful capabilities. For example, by describing the structure of its
representations (schemata, templates), we make possible a form of transfer
of expertise, as well as a number of facilities for knowledge base mainte-
nance. By supplying strategic information (meta-rules), we make possible
a finer degree of control over use of knowledge in the system. And by
giving the system the ability to derive empirical generalizations about its
knowledge (rule models), we make possible a number of useful abilities
that aid in knowledge transfer.
The examples reviewed above illustrate a number of general ideas
about knowledge representation and use that may prove useful in building
large programs. We have, first, the notion that knowledge in programs
should be made explicit and accessible. Use of production rules to encode

the object-level knowledge is one example of this, since knowledge in them
may be more accessible than that embedded in the code of a procedure.
The schemata, templates, and meta-rules illustrate the point also, since
each of them encodes a form of information that is, typically, either omitted
entirely or at best is left implicit. By making knowledge explicit and
accessible, we make possible a number of useful abilities. The schemata and
templates, for example, support the forms of system maintenance and
knowledge acquisition described above. Meta-rules offer a means for ex-
plicit representation of the decision criteria used by the system to select its
course of action. Subsequent "playback" of those criteria can then provide
a form of explanation of the motivation for system behavior [see Davis
(1976) for examples]. That behavior is also more easily modified, since the
information on which it is based is both clear (since it is explicit) and
retrievable (since it is accessible). Finally, more of the system's knowledge
and behavior becomes open to examination, especially by the system itself.
Second, there is the idea that programs should have access to their
own representations. To put this another way, consider that over the years
numerous representation schemes have been proposed and have generated
a number of discussions of their respective strengths and weaknesses. Yet,
in all these discussions, one entity intimately concerned with the outcome
has been left uninformed: the program itself. What this suggests is that
we ought to describe to the program a range of information about the
representations it employs, including such things as their structure, orga-
nization, and use.
As noted, this is easily suggested but more difficult to do. It requires
a means of describing both representations and control structures, and the
utility of those descriptions will be strongly dependent on the power of the
language in which they are expressed. The schemata and templates are
the two main examples of the partial solutions we have developed for
describing representations, and both rely heavily on the idea of a task-
specific high-level language--a language whose conceptual primitives are
task-specific. The main reason for using this approach is to make possible
what we might call "top-down code understanding." Traditionally, efforts
at code understanding [e.g., Waldinger and Levitt (1974), Manna (1969)]
have attempted to assign meaning to the code of some standard programming
language. Rather than take on this sizable task, we have used task-
specific languages to make the problem far easier. Instead of attempting
to assign semantics to ordinary code, we assigned a "meaning" to each of
the primitives in the high-level language and represented it in one or more
informal ways. Thus, for example, ATTRIBUTE is one of the primitives
in the "language" in which templates are written; its meaning is embodied
in procedures associated with it that are used during code generation and
dissection [see Davis (1976) for details].
This convenient shortcut also implies a number of limitations. Most
importantly, the approach depends on the existence of a finite number of
"mostly independent" primitives. This means a set of primitives with only

a few, well-specified interactions between them. The number of interactions
should be far less than the total possible, and interactions that do
occur should be uncomplicated (as, for example, the interaction between
the concepts of attribute and value).
But suppose we could describe to a system its representations? What
benefits would follow? The primary thing this can provide is a way of
effecting multiple uses of the same knowledge. Consider, for instance, the
multitude of ways in which the object-level rules have been used. They are
executed as code in order to drive the consultation (see Part Two); they
are viewed as data structures, and dissected and abstracted to form the
rule models (Parts Three and Nine); they are dissected and examined in
order to produce explanations (Part Six); they are constructed during
knowledge acquisition (Part Three); and, finally, they are reasoned about
by the meta-rules (Part Nine).
It is important to note here that the feasibility of such multiplicity of
uses is based less on the notion of production rules per se than on the
availability of a representation with a small grain size and a simple syntax and
semantics. "Small" modular chunks of code written in a simple, heavily styl-
ized form (though not necessarily a situation-action form) would have done
as well, as would have any representation with simple enough internal
structure and of manageable size. The introduction of greater complexity
in the representation, or the use of a representation that encoded signifi-
cantly larger "chunks" of knowledge, would require more sophisticated
techniques for dissecting and manipulating representations than we have
developed thus far. But the key limitations are size and complexity of
structure, rather than a specific style of knowledge encoding.
Two other benefits may arise from the ability to describe representations.
We noted earlier that much of the information necessary to maintain
a system is often recorded in informal ways, if at all. If it were in fact
convenient to record this information by describing it to the program itself,
then we would have an effective and useful repository of information. We
might see information that was previously folklore or informal documen-
tation becoming more formalized and migrating into the system itself. We
have illustrated above a few of the advantages this offers in terms of main-
taining a large system.
This may in turn produce a new perspective on programs. Early scarc-
ity of hardware resources led to an emphasis on minimizing machine re-
sources consumed, for example, by reducing all numeric expressions to
their simplest form by hand. More recently, this has meant a certain style
of programming in which a programmer spends a great deal of time think-
ing about a problem first, trying to solve as much as possible by hand, and
then abstracting out only the very end product of all of that effort to be
embodied in the program. That is, the program becomes simply a way of
manipulating symbols to provide "the answer," with little indication left of
what the original problem was or, more importantly, what knowledge was
required to solve it.

But what if we reversed this trend, and instead viewed a program as


a place to store many forms of knowledge about both the problem and the
proposed solution (i.e., the program itself)? This would apply equally well
to code and data structures and could help make possible a wider range
of useful capabilities of the sort illustrated above.
One final observation. As we noted at the outset, interest in knowledge-
based systems was motivated by the belief that no single domain-indepen-
dent paradigm could produce the desired level of performance. It was
suggested instead that a large store of domain-specific (object-level) knowl-
edge was required. We might similarly suggest that this too will eventually
reach its limits and that simply adding more object-level knowledge will no
longer, by itself, guarantee increased performance. Instead, it may be nec-
essary to focus on building stores of meta-level knowledge, especially in
the form of strategies for effective use of knowledge. Such "meta-level
knowledge-based" systems may represent a profitable future direction for
research.
29
Extensions to Rules for
Explanation and Tutoring

William J. Clancey

As described in Part Eight, the success of MYCIN as a problem solver
suggested that the program's knowledge base might be a suitable source
of subject material for teaching students. This use of MYCIN was consis-
tent with the design goals that the program's explanations be educational
to naive users and that the representation be flexible enough to allow for
use of the rules outside of the consultative setting. In theory, the rules
acquired from human experts would be understandable and useful to stu-
dents. The GUIDON program discussed in Chapter 26 was developed to
push these assumptions by using the rules in a tutorial interaction with
medical students.
In attempting to "transfer back" the expert's knowledge to students
through GUIDON, we found that the expert's diagnostic approach and
understanding of rules were not explicitly represented. GUIDON cannot
justify the rules because MYCIN does not have an encoding of how the
concepts in a rule fit together. GUIDON cannot fully articulate MYCIN's
problem-solving approach because the structure of the search space and
the strategy for traversing it are implicit in the ordering of rule concepts.
Thus the seemingly straightforward task of converting a knowledge-based
system into a computer-aided instruction program has led to a detailed
reexamination of the rule base and the foundations on which rules are
constructed, an epistemological study.
In building MYCIN, rule authors did not recognize a need to record
the structured way in which they were fitting rule parts together. The rules
are more than simple associations between data and hypotheses. Sometimes
clause order counts for everything, and different orders can mean different
things. Also, some rules are present mostly to control the invocation of
others. The uniformity of the representation obscures these various
functions of clauses and rules. In looking beyond the surface of the rule
representation to make explicit the intent of the rule authors, this paper
has a purpose similar to Woods' "What's in a Link?" (1975) and Brachman's
"What's in a Concept?" (1976). We ask, "What's in a Rule?"

This chapter is an edited version of an article appearing in Artificial Intelligence 20:215-251
(1983). Copyright 1983 by Artificial Intelligence. All rights reserved. Used with permission.
In building GUIDON, we thought that we were simply being "appli-
cations engineers" by making use of MYCIN's explanation facility for a
tutorial setting. As noted in Chapter 26, it was surprising to find out how
little the explanation facility could accomplish for a student. Without a
crisp characterization of what we expected an explanation to convey, the
program was of questionable tutorial value. On the positive side, the study
of these shortcomings led to a radical change in our conception of MY-
CIN's rules and supplied a new epistemological framework for building
expert systems.
In this chapter we provide a review of MYCINsexplanatory capability
and an overview of an epistemological framework for enhancing that ca-
pability. The following two sections examine in detail the problems of jus-
tifying a rule and explaining an approach, thereby elucidating the support and
strategic aspects of the epistemological framework. Implications for per-
formance of a consultation system and modifiability are considered briefly.
Finally, in the last section, the framework is used to analyze other expert
systems.

29.1 MYCIN's Explanation Capability

As already mentioned, MYCIN was designed so that it could explain its


reasoning to the user. This means that it can describe its reasoning steps:
how a request for data is related to a goal, how one goal leads to another,
and how a goal is achieved. Figure 29-1 shows a consultation excerpt in
which the program demonstrates a line of reasoning (chained goals) that
leads backwards from the current question:

Is the patient a compromised host? (Rule 343) → Is the patient
immunosuppressed?¹ (Rule 395) → Is the patient receiving
steroids? (Rule 543) → Is E. coli causing the meningitis?

¹Read as "... compromised host (by Rule 343) answers the question 'Is the patient immunosuppressed?' ..."

23) Is J.Smith a compromised host?
** WHY
[i.e. WHY is it important to determine whether J.Smith is a compromised host?]
[3.0] This will aid in determining whether J.Smith is immunosuppressed.
      If
      [3.1] J.Smith is not a compromised host
      then
      it is definite (1.0) that J.Smith is not immunosuppressed
      [RULE343]
[back to question 23...]
** WHY
[i.e. WHY is it important to determine whether J.Smith is immunosuppressed?]
[4.0] ... in order to determine whether J.Smith has received corticosteroids.
      If
      [4.1] J.Smith is not immunosuppressed
      then
      it is definite (1.0) that J.Smith has not received corticosteroids.
      [RULE395]
[back to question 23...]
** WHY
[i.e. WHY is it important to determine whether J.Smith has received corticosteroids?]
[5.0] ... in order to determine the organisms causing the infection.
      It has already been established that
      [5.1] the infection which requires therapy is meningitis, and
      [5.2] only circumstantial evidence is available for this case, and
      [5.3] the type of the meningitis is bacterial.
      Therefore, if
      [5.4] J.Smith is receiving corticosteroids
      then
      there is evidence that the organisms causing the infection are:
      e.coli (.4), klebsiella-pneumoniae (.2), or pseudomonas-aeruginosa
      [RULE543]
[back to question 23...]
** NO

FIGURE 29-1 Consultation excerpt showing explanation capability. (User
input in boldface capitals; all other text is printed by MYCIN.)

Figure 29-2 illustrates how, in the questioning session after the consulta-
tion, one can inquire further about the program's intermediate reasoning
steps, including why it didn't ask about something. These are the expla-
nation capabilities that we sought to exploit in a teaching program.
MYCIN's explanations are entirely in terms of its rules and goals. The
question WHY means "Why do you want this information?" or "How is
this information useful?" and is translated internally as "In what rule does
this goal appear, and what goal does the rule conclude about?" Davis, who

developed the explanation facility, pointed out that MYCIN did not have
the knowledge to respond to other interpretations of a WHY question
(Davis, 1976). He mentioned specifically the lack of rule justifications and
planning knowledge addressed in this chapter.
In order to illustrate other meanings for the question WHY in
MYCIN, we illustrate the rule set as a network of goals, rules, and
hypotheses in Figure 29-3. At the top level are all of the system's goals that
it might want to pursue to solve a problem (diagnostic and therapeutic
decisions). Examples of goals, stated as questions to answer, are "What is
the shape of the organism?" and "What organism is causing the meningi-
tis?" At the second level are hypotheses or possible choices for each of the
goals. Examples of hypotheses are "The organism is a rod." and "E. coli is
causing the meningitis." At the third level are the rules that support each
hypothesis. At the fourth level appear the premises of these rules, specific
hypotheses that must be believed for the rule to apply. For example, for
Rule 543 to apply (shown in Figure 29-1) it must be the case that the
infection is meningitis, that the meningitis was caused by bacteria, that the
patient is receiving steroids, and so on.
A key aspect of MYCIN's interpreter is that, when confronted with a
hypothesis in a rule premise that it needs to confirm, it considers all related
hypotheses by pursuing the more general goal. For example, attempting
to apply Rule 543, the program will consider all rules that conclude about
the infection, rather than just those that conclude that the infection is
meningitis. Similarly, it will consider all rules that conclude about the kind
of meningitis (viral, fungal, TB, or bacterial), rather than just those that
hypothesize that the meningitis is bacterial.² These new goals deriving
from rules can now be seen conceptually as level 1 goals, and the process
recurs.
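This exhaustive pursuit of the general goal can be sketched as follows. The fragment is hypothetical Python, not MYCIN's code; the rules, parameter name, and data values are invented:

```python
# Toy sketch of MYCIN-style backward chaining: to test one hypothesis
# ("the type of meningitis is bacterial"), the interpreter pursues the
# whole parameter ("type of meningitis") by trying EVERY rule that
# concludes about it, in numeric order.

RULES = {
    "type-of-meningitis": [
        ("RULE500", lambda db: db.get("csf-glucose") == "low",
         ("bacterial", 0.6)),
        ("RULE501", lambda db: db.get("csf-cells") == "lymphocytes",
         ("viral", 0.5)),
    ],
}

def findout(param, db, conclusions):
    """Pursue a goal: try all rules concluding about param, not just
    those supporting the hypothesis that triggered the goal."""
    for name, premise, (value, cf) in RULES.get(param, []):
        if premise(db):
            conclusions.setdefault(param, []).append((value, cf, name))
    return conclusions.get(param, [])

db = {"csf-glucose": "low", "csf-cells": "lymphocytes"}
print(findout("type-of-meningitis", db, {}))
# both the bacterial and the viral rules fire; the caller then checks
# whether "bacterial" is among the supported values
```

Note that the rules are simply tried in the order they appear in the list, mirroring the numeric rule ordering discussed below.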
The links in Figure 29-3 and their ordering are points of flexibility in
the rule representation. For example, the author of a rule defines each
goal and its specific hypotheses (levels 1 and 2 and 4 and 5). Less trivially,
it is the author's choice to define rules that link hypotheses to one another.
For example, rules on level 3 link levels 2 and 4. We call the rationale
behind this link the justification of the rule. GUIDON cannot teach rule
justifications because they are not represented in MYCIN. Section 29.3
examines the nature of rule justifications and how a tutoring system can
provide them.

²This is not inefficient, given the program's exhaustive search strategy and the fact that the
other hypotheses will be referenced by other rules. Note also that some hypotheses, such as
"the patient is receiving steroids," are not generalized but are represented as goals directly.
Whether or not a hypothesis is represented as a yes/no parameter or as a "value" of a multi-
valued parameter (such as "kind of meningitis") is a decision of the author of the rule, deriving
from a pattern of hypotheses that he or she wishes to collapse for clarity into a more general
goal. By this process of abstraction, a single multi-valued parameter dealing with kinds of
surgery would replace individual yes/no parameters that specified "cardiac surgery," "neu-
rosurgery," etc. These organizational decisions have no bearing on system performance, so
the knowledge base is somewhat inconsistent in how these choices are made.

** DID YOU CONSIDER WHETHER THE PATIENT WAS RECEIVING STEROIDS?
RULE543 could have used: whether J.Smith has received corticosteroids.
Clause 4 of RULE543 ["the patient has received corticosteroids"] was
already known to be false.

** HOW DID YOU KNOW THAT THE PATIENT HAD NOT RECEIVED STEROIDS?
RULE395 was used to conclude that J.Smith has not received corticosteroids.
The last question asked before the conclusion was made was 23.

** WHAT WAS QUESTION 23?
Question 23 was asked in order to find out whether J.Smith is a compromised
host in an effort to execute RULE343.

FIGURE 29-2 Excerpt from a question/answer session. (User
input appears in boldface capitals.)

Next, the author's ordering of hypotheses in a rule's premise will affect


the order in which goals are pursued (level 5). The rationale for this choice
again lies outside of the rule network. Thus the program cannot explain
why it pursues meningitis (goal 5.1 in Figure 29-1) before determining that
the infection is bacterial (goal 5.3). Section 29.4 examines how this ordering
constitutes a strategy and how it can be made explicit.
The order in which rules for a goal are tried (level 3) also affects the
order in which hypotheses (and hence subgoals) are pursued (level 5). For
example, Rule 535 considers whether the patient is an alcoholic; so if this
rule is tried before Rule 543, alcoholism will be considered before steroids.
As these goals cause questions to be asked of the user, it is evident that the
ordering of questions is also determined by the ordering of rules as well
as by the ordering of clauses in the premise of a rule.
Here there is no implicit author rationale, for rule order lies outside
of the author's choice; it is fixed, and determined only by the order in
which rules were entered into the system. As pointed out above, MYCIN
does not decide to pursue the hypothesis "bacterial meningitis" before "viral
meningitis"--it simply picks up the bag of rules that make some conclusion
about "kind of meningitis" and tries them in numeric order. Hence rule
order is the answer to the question "Why is one hypothesis considered
before another?" And rule order is often the answer to "Why is one ques-
tion asked before another?" Focusing on a hypothesis and choosing a ques-
tion to confirm a hypothesis are not necessarily arbitrary in human rea-
soning. This raises serious questions about using MYCIN for interpreting
a student's behavior and teaching him or her how to reason, as discussed
in Section 29.4.³

³Meta-rules could have been used for ordering rules, as described in Chapter 28. The present
chapter is a rethinking of the whole question.
FIGURE 29-3 The rule set viewed as a network of goals, rules, and
hypotheses. [Diagram not reproduced.]

To summarize, we have used a rule network as a device for illustrating


aspects of MYCIN's behavior that it cannot explain. We are especially in-
terested in making explicit the knowledge that lies behind the behavior
that is not arbitrary but that cannot be explained because it is implicit in
rule design. To do this, we will need some sort of framework for charac-
terizing the knowledge involved, since the rule link itself is not sufficient.
An epistemological framework for understanding MYCIN's rules is pre-
sented in the next section.

29.2 An Epistemological Framework for Rule-Based Systems

The framework presented in this section stems from an extensive study of


MYCIN's rules. It is the basic framework that we have used for under-
standing physicians' explanations of their reasoning, as well as being a
foundation for re-representing the knowledge in MYCIN's rules. As an
illustration, we will consider in detail the steroids rule shown again in
Figure 29-4.⁴

RULE543
IF:   1) The infection which requires therapy is meningitis,
      2) Only circumstantial evidence is available for this case,
      3) The type of the infection is bacterial,
      4) The patient is receiving corticosteroids,
THEN: There is evidence that the organisms which might be causing the
      infection are e.coli (.4), klebsiella-pneumoniae (.2), or
      pseudomonas-aeruginosa

FIGURE 29-4 The steroids rule.

Figure 29-5 shows how this diagnostic heuristic is justified and incor-
porated in a problem-solving approach by relating it to strategic, structural,
and support knowledge. Recalling Section 29.1, we use the term strategy to
refer to a plan by which goals and hypotheses are ordered in problem
solving. A decision to determine "cause of the infection" before "therapy
to administer" is a strategic decision. Similarly, it is a strategic decision to
pursue the hypothesis "E. coli is causing meningitis" before "Cryptococcus is
causing meningitis." And recalling an earlier example, deliberately decid-
ing to ask the user about steroids before alcoholism would be a strategic
decision. These decisions all lie above the plane of goals and hypotheses,

⁴The English form of rules stated in this paper has been simplified for readability. Sometimes
clauses are omitted. Medical examples are for purposes of illustration only.

(STRATEGY)       ESTABLISH HYPOTHESIS SPACE:
                 CONSIDER DIFFERENTIAL-BROADENING FACTORS

(RULE MODEL)     IN BACTERIAL MENINGITIS, COMPROMISED HOST
                 RISK FACTORS SUGGEST UNUSUAL ORGANISMS

(STRUCTURE)      hierarchies of diagnoses (ANY DISORDER > INFECTION >
                 MENINGITIS > ACUTE / CHRONIC > BACTERIAL / VIRAL) and of
                 problem features (COMPROMISED HOST > CURRENT MEDICATIONS),
                 with organism classes such as UNUSUAL-CAUSES, SKIN ORGS,
                 and GRAM-NEGATIVE ROD ORGS

(INFERENCE RULE) if STEROIDS then GRAM-NEGATIVE ROD ORGS

(SUPPORT)        STEROIDS IMPAIR IMMUNO-RESPONSE, MAKING PATIENT
                 SUSCEPTIBLE TO INFECTION BY ENTEROBACTERIACEAE,
                 NORMALLY FOUND IN THE BODY

FIGURE 29-5 Augmenting a knowledge source with three kinds of meta-level
knowledge: knowledge for indexing, justifying, and invoking a MYCIN rule.

and as discussed later, they can often be stated in domain-independent


terms, e.g., "consider differential-broadening factors."
In order to make contact with the knowledge of the domain, a level
of structural knowledge is necessary. Structural knowledge consists of abstrac-
tions that are used to index the domain knowledge. For example, one can

classify causes of disease into common and unusual causes, for example,
of bacterial meningitis. These concepts provide a handle by which a strategy
can be applied, a means of referencing the domain-specific knowledge. For
example, a strategy might specify considering common causes of a disease;
the structural knowledge about bacterial meningitis allows this strategy to
be instantiated in that context. This conception of structural knowledge
follows directly from Davis' technique of content-directed invocation of knowl-
edge sources (see Chapter 28). A handle is a means of indirect reference
and is the key to abstracting reasoning in domain-independent terms. The
discussion here elaborates on the nature of handles and their role in the
explanation of reasoning.
The structural knowledge we will be considering is used to index two
kinds of hypotheses: problem features, which describe the problem at hand
(for example, whether or not the patient is receiving steroids is a problem
feature); and diagnoses, which characterize the cause of the observed prob-
lem features. For example, acute meningitis is a diagnosis. In general,
problem features appear in the premises of diagnostic rules, and diagnoses
appear in the conclusions. Thus organizations of problem features and
diagnoses provide two ways of indexing rule associations: one can use a
strategy that brings certain diagnoses to mind and consider rules that sup-
port those hypotheses; or one can use a strategy that brings certain prob-
lem features to mind, gather that information, and draw conclusions (apply
rules) in a data-directed way.
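The two indexing schemes might be sketched as follows. This is hypothetical Python; the index names and the encoding of the steroids rule are our own, not MYCIN's:

```python
# Sketch: indexing the same rule two ways via structural knowledge, so it
# can be reached hypothesis-directed (from a diagnosis) or data-directed
# (from a problem feature).

steroids_rule = {
    "name": "RULE543",
    "features": ["receiving-steroids"],            # premise problem features
    "diagnoses": ["gram-negative-rod-organisms"],  # conclusion hypotheses
}

by_diagnosis, by_feature = {}, {}
for rule in [steroids_rule]:
    for d in rule["diagnoses"]:
        by_diagnosis.setdefault(d, []).append(rule["name"])
    for f in rule["features"]:
        by_feature.setdefault(f, []).append(rule["name"])

# hypothesis-directed: which rules support this diagnosis?
print(by_diagnosis["gram-negative-rod-organisms"])   # ['RULE543']
# data-directed: which rules apply once this feature is known?
print(by_feature["receiving-steroids"])              # ['RULE543']
```

A strategy that "brings certain diagnoses to mind" would consult the first index; one that gathers problem features first would consult the second.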
Figure 29-5 shows how a rule model, or generalized rule,⁵ as a form of
structural knowledge, enables either data-directed consideration of the ste-
roids rule or hypothesis-directed consideration. Illustrated are partial hier-
archies of problem features (compromised host factors) and diagnoses
(kinds of infections, meningitis, etc.)--typical forms of structural knowl-
edge. The specific organisms of the steroids rule are replaced by the set
"gram-negative rods," a key hierarchical concept we use for understanding
this rule.
Finally, the justification of the steroids rule, a link between the problem
feature hypothesis "patient is receiving steroids" and the diagnostic hy-
pothesis "gram-negative rod organisms are causing acute bacterial infec-
tious meningitis," is based on a causal argument about steroids impairing
the body's ability to control organisms that normally reside in the body.
While this support knowledge is characteristically low-level or narrow in con-
trast with the strategic justification for considering compromised host
risk factors, it still makes interesting contact with structural terms, such as
the mention of Enterobacteriaceae, which are kinds of gram-negative rod
organisms. In the next section, we will consider the nature of rule justifi-
cations in more detail, illustrating how structural knowledge enables us to
make sense of a rule by tying it to the underlying causal process.

⁵Davis' rule models (Chapter 28), generated automatically, capture patterns, but they do not
restate rules more abstractly as we intend here.

29.3 Explaining a Rule

Here we consider the logical bases for rules: what kinds of arguments
justify the rules, and what is their relation to a mechanistic model of the
domain? We use the terms "explain" and "justify" synonymously, although
the sense of "making clear what is not understood" (explain) is intended
more than "vindicating, showing to be right or lawful" (justify).

29.3.1 Different Kinds of Justifications

There are four kinds of justifications for MYCIN's rules: identification,


cause, world fact, and domain fact. In order to explain a rule, it is first
necessary to know what kind of justification it is based on.

1. Rules that use identifying properties of an object to classify it are called


identification rules. Most of MYCIN's rules that use laboratory observa-
tions of an unknown are like this: "If the organism is a gram-negative,
anaerobic rod, its genus may be bacteroides (.6)." Thus an identification
rule is based on the properties of a class.
2. Rules whose premise and action are related by a causal argument are
called causal rules. The causality can go in either direction in MYCIN
rules: "symptom caused by disease" or, more commonly, "prior problem
causes disease." Szolovits and Pauker (1978) suggest that it is possible
to subdivide causal rules according to the scientific understanding of
the causal link:
a. empirical association (a correlation for which the process is not under-
stood),
b. complication (direction of causality is known, but the conditions of the
process are not understood), and
c. mechanism(process is well modeled).
Most of MYCIN's causal rules represent medical complications that are
not easily expressed as anatomical relations and physiological processes.
The certainty factors in MYCIN's causal rules generally represent a
mixture of probabilistic and cost/benefit judgment. Rather than simply
encoding the strength of association between symptom and cause, a
certainty factor also captures how important it is that a diagnosis be
considered in therapy selection.
3. Rules that are based on empirical, commonsense knowledge about the
world are called world fact rules. An example is "If the patient is male,
then the patient is not pregnant." Other examples are based on social
patterns of behavior, such as the fact that a young male might be a
military recruit and thus be living in a crowded environment where
disease spreads readily.

4. Domain fact rules link hypotheses on the basis of domain definitions. An


example is "If a drug was administered orally and it is poorly absorbed
in the GI tract, then the drug was not administered adequately." By
definition, to be administered adequately a drug must be present in the
body at high enough dosage levels. By using domain fact rules, the
program can relate problem features to one another, reducing the
amount of information it has to request from the user.

In summary, a rule link captures class properties, social and domain


facts, and probabilistic and cost/benefit judgments. When a definition,
property, or world fact is involved, simply saying this provides a reasonable
explanation. But causal rules, with their connection to an underlying pro-
cess of disease, require much more, so we will concentrate on them.
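One way to exploit this classification, sketched below in hypothetical Python (the tags, rule names, and explanatory phrasing are ours, not MYCIN's), is to store the kind of justification with each rule. An explainer can then state the kind directly for identification, world fact, and domain fact rules, while routing causal rules to deeper support knowledge:

```python
# Sketch: tagging rules with their kind of justification so an explainer
# can decide how much to say. All rule data here is illustrative.

EXPLAIN = {
    "identification": "based on identifying properties of the class",
    "world-fact":     "an empirical, commonsense fact about the world",
    "domain-fact":    "true by definition in the domain",
    "causal":         "requires a causal argument (see support knowledge)",
}

def justify(rule):
    """Produce a one-line justification keyed on the rule's kind."""
    return f"{rule['name']}: {EXPLAIN[rule['kind']]}"

rules = [
    {"name": "gram-neg anaerobic rod -> bacteroides", "kind": "identification"},
    {"name": "male -> not pregnant",                  "kind": "world-fact"},
    {"name": "steroids -> gram-negative rods",        "kind": "causal"},
]

for r in rules:
    print(justify(r))
```

For the first two kinds, a sentence of this form is already a reasonable explanation; only the causal case needs the support chains discussed next.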

29.3.2 Levels of Explanation--What's Not in a Rule?

In this section we consider the problem of justifying a causal rule, the


tetracycline rule:

"If the patient is less than 8 years old, don't prescribe tetracycline."

This rule simply states one of the things that MYCIN needs to know to
properly prescribe drugs for youngsters. The rule does not mention the
underlying causal process (chelation, or drug deposition in developing
bones) and the social ramifications (blackened permanent teeth) on which
it is based. From this example, it should be clear that the justifications of
MYCIN's rules lie outside of the rule base. In other words, the record of
inference steps that ties premise to action has been left out. A few questions
need to be raised here: Did the expert really leave out steps of reasoning?
What is a justification for? And what is a good justification?
Frequently, we refer to rules like MYCIN's as "compiled knowledge."
However, when we ask physicians to justify rules that they believe and
follow, they very often can't explain why the rules are correct. Or their
rationalizations are so slow in coming and so tentative that it is clear they
are not articulating reasoning steps that are consciously followed. Leaps
from data to conclusion are justified because the intermediate steps (like
the process of chelation and the social ramifications) generally remain the
same from problem to problem. There is no need to step through this
knowledge--to express it conditionally in rules. Thus, for the most part,
MYCIN's rules are not compiled in the sense that they represent a delib-
erate composition of reasoning steps by the rule authors. They are com-
piled in the sense that they are optimizations that leave out unnecessary
steps--evolved patterns of reasoning that cope with the demands of or-
dinary problems.
If an expert does not think about the reasoning steps that justify a
rule, why does a student need to be told about them? One simple reason

tetracycline in youngster
→ chelation of the drug in growing bones
→ teeth discoloration
→ undesirable body change
→ don't administer tetracycline

FIGURE 29-6 Causal knowledge underlying the tetracycline rule.

is so the student can remember the rule. A justification can even serve as a
memory aid (mnemonic) without being an accurate description of the un-
derlying phenomena. For example, medical students have long been told
to think in terms of "bacteria eating glucose" from which they can remem-
ber that low CSF (cerebrospinal fluid) glucose is a sign of a bacterial men-
ingitis (as opposed to fungal or viral meningitis). The interpretative rule
is learned by analogy to a familiar association (glucose is a food, and bac-
teria are analogous to larger organisms that eat food). This explanation
has been discredited by biological research, but it is still a useful mnemonic.
Given that an accurate causal argument is usually expected, how is a
satisfying explanation constructed? To see the difficulty here, observe that,
in expanding a rule, there is seemingly no limit to the details that might
be included. Imagine expanding the tetracycline rule by introducing three
intermediate concepts as shown in Figure 29-6. The choice of intermediate
concepts (the grain size of rules) is arbitrary, of course. For example, there
is no mention of how the chelation occurs. What are the conditions? What
molecules or ions are involved? There are arbitrarily many levels of detail
in a causal explanation. To explain a rule, we not only need to know the
intermediate steps, we also need to decide which steps in the reasoning
need to be explained. Purpose (how deep an understanding is desirable)
and prior knowledge are obviously important.
Conceptually, the support knowledge for a causal rule is a tree of rules,
where each node is a reasoning step that can theoretically be justified in
terms of finer-grained steps. The important thing to remember is that
MYCIN is a flat system of rules. It can only state its immediate reasoning
steps and cannot explain them on any level of detail.

29.3.3 Problem Features, the Hypothesis Taxonomy,


and Rule Generalizations

A tree of rules seems unwieldy. Surely most teachers cannot expand on


every reasoning step down to the level of the most detailed physical knowl-
edge known. The explanation tree for the tetracycline rule, for example,
quickly gets into chemical bonding theory. Explaining a rule (or under-

undesirable body changes
  "types": teeth discoloration, photosensitivity, diarrhea, nausea, ...
  "causes": tetracycline (teeth discoloration), drug x, drugs y, z, ...

FIGURE 29-7 Problem feature hierarchy for contraindication rules.

standing one) does not require that every detail of causality be considered.
Instead, a relatively high level of explanation is generally satisfying--most
readers probably feel satisfied by the explanation that tetracycline causes
teeth discoloration. This level of satisfaction has something to do with the
student's prior knowledge.
For an explanation to be satisfying, it must make contact with already
known concepts. We can characterize explanations by studying the kinds
of intermediate concepts they use. For example, it is significant that most
contraindication rules, reasons for not giving antibiotics, refer to "unde-
sirable body changes." This pattern is illustrated hierarchically in Figure
29-7. The first level gives types of undesirable changes; the second level
gives causes of these types of changes. Notice that this figure contains the
last step of the expanded tetracycline rule and a leap from tetracycline to
this step. The pattern connecting drugs to the idea of undesirable body
changes forms the basis of an expectation for explanations: we will be
satisfied if a particular explanation connects to this pattern. In other words,
given an effect that we can interpret as an undesirable body change, we
will understand why a drug causing that effect should not be given. We
might want to know how the effect occurs, but here again, we will rest easy
on islands of familiarity, just as we don't feel compelled to ask why people
don't want black teeth.
To summarize, key concepts in rule explanations are abstractions that
connect to a pattern of reasoning we have encountered before. This sug-
gests that one way to explain a rule, to make contact with a familiar rea-
soning pattern, is to generalize the rule. We can see this more clearly from
the viewpoint of diagnosis, which makes rich use of hierarchical abstrac-
tions.
Consider the following fragment from a rule we call the leukopenia
rule:

"If a complete blood count is available and the white blood
count is less than 2.5 units, then the following bacteria might be
causing infection: e.coli (.75), pseudomonas-aeruginosa (.5),
klebsiella-pneumoniae (.5)."

How can we explain this rule? First, we generalize the rule, as shown
in Figure 29-8. The premise concepts in the rules on the left-hand side of
levels 1 through 3 are problem features (cf. Section 29.2), organized hier-
archically by different kinds of relations. Generally, a physician speaks
loosely about the connections--referring to leukopenia both as a cause of
immunosuppression as well as a kind of immunosuppression--probably
because the various causes are thought of hierarchically.

compromised host          --"causes"-->   bacteria normally found in the
condition                                 body cause infection
   | "subtype" (e.g., pregnancy)            | "subset"
   v                                        v
immunosuppression         ------------>   gram-negative rods and
condition                                 Enterobacteriaceae
   | "causes" (e.g., steroids)              | "evidence"
   v                                        v
leukopenia                ------------>   E. coli, Pseudomonas,
   | "is a"                               and Klebsiella
   v
WBC < 2.5   (PMNS + BANDS < 1000)
   | "component of"
   v
CBC data

FIGURE 29-8 Generalizations of the leukopenia rule.

The relationships among CBC, WBC, and leukopenia reveal some


interesting facts about how MYCIN's rules are constructed. WBC is one
component of a complete blood count (CBC). If the CBC is not available,
it makes no sense to ask for any of the components. Thus the CBC clause
in the leukopenia rule is an example of a screening clause. Another example
of a screening clause is the age clause in
Explaining a Rule 545

"If ... age is greater than 17 and the patient is an alcoholic,
then ..."

Here the relation is a social fact; if the patient is not an adult, we assume
that he is not an alcoholic. The third relation we observe is a subtype, as
in

"If ... the patient has undergone surgery and the patient has
undergone neurosurgery, then ..."

All screening relations can be expressed as rules, and some are, such as

"If the patient has not undergone surgery, then the patient
has not undergone cardiac surgery."

(stated negatively, as is procedurally useful). The philosophy behind
MYCIN's rule set is inconsistent in this respect; to be economical and to make
the relationship between clauses explicit, all screening clauses should be
expressed as world fact rules or hierarchies of parameters. Indeed, the
age/alcoholic relation suggests that some of the relations are not
definitional and should be modified by certainty factors.

Viewed as a semantic network representation, MYCIN's rules are links
without labels. Even when rules explicitly link problem features, the kind
of relation is not represented because MYCIN's rule language does not allow
the link to be labeled. For example, a rule could state "If no CBC was
taken, then WBC is not available," but MYCIN allows no way of saying that
WBC is a component of CBC. Finally, when one problem feature serves as
a redefinition of another, such as the relation between leukopenia and
WBC, the more abstract problem feature tends to be left out altogether.
"Leukopenia" is not a MYCIN parameter; the rule mentions WBC directly,
another manifestation of knowledge compilation. For purposes of explanation,
we argue that problem features, their relations, and the nature of the link
should be explicit.
Returning to Figure 29-8, the action concepts, or diagnostic hypotheses
shown on the right-hand side, are part of a large hierarchy of causes that
the problem solver will cite in the final diagnosis. The links in this diagnosis
space generally specify refinement of cause, although in our example they
strictly designate subclasses. Generally, problem features are abstractions
of patient states indicated by the observable symptoms, while the diagnosis
space is made up of abstractions of causal processes that produce the
symptoms. Paralleling our observations about rule problem features, we note
that the relations among diagnostic hypotheses are not represented in
MYCIN; nowhere in the knowledge base does it explicitly state that E. coli is
a bacterium.

Now suppose that the knowledge in Figure 29-8 were available; how
would this help us to explain the leukopenia rule? The idea is that we first
restate the rule on a higher level. We point out that a low WBC indicates
leukopenia, which is a form of immunosuppression, thus tying the rule to
the familiar pattern that implicates gram-negative rods and Enterobacteriaceae.
This is directly analogous to pointing out that tetracycline causes
teeth discoloration, which is a form of undesirable body change, suggesting
that the drug should not be given.
By re-representing Figure 29-8 linearly, we see that it is an expansion
of the original rule:

WBC < 2.5 --> leukopenia
          --> immunosuppression
          --> compromised host
          --> infection by organisms found in body
          --> gram-negative rods and Enterobacteriaceae
          --> E. coli, Pseudomonas, and Klebsiella

The expansion marches up the problem feature hierarchy and then back
down the hierarchy of diagnoses. The links of this expansion involve
causality composed with identification, subtype, and subset relations. By the
hierarchical relationships, a rule on one level "explains" the rule below it.
For example, the rule on level 3 provides the detail that links
immunosuppression to the gram-negative rods. By generalizing, we have made a
connection to familiar concepts.
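The walk up one hierarchy and back down the other can be sketched in code. The following is our own illustrative Python, not MYCIN code (MYCIN itself was written in Lisp); the link tables simply transcribe Figure 29-8:

```python
# Illustrative sketch: the labeled hierarchies of Figure 29-8, and a
# procedure that "explains" the compiled leukopenia rule by marching up
# the problem-feature hierarchy and down the hierarchy of diagnoses.

# Labeled links: child -> (relation, parent).  Names transcribe Figure 29-8.
FEATURE_LINKS = {
    "WBC < 2.5": ("is a", "leukopenia"),
    "leukopenia": ("subtype", "immunosuppression"),
    "immunosuppression": ("subtype", "compromised host"),
}
DIAGNOSIS_LINKS = {
    "infection by organisms found in body":
        ("subset", "gram-negative rods and Enterobacteriaceae"),
    "gram-negative rods and Enterobacteriaceae":
        ("is a", "E. coli, Pseudomonas, and Klebsiella"),
}

def expand(evidence, top_feature, top_diagnosis):
    """Restate a compiled rule as a chain of familiar abstractions."""
    chain = [evidence]
    node = evidence
    while node != top_feature:            # march up the feature hierarchy
        _, node = FEATURE_LINKS[node]
        chain.append(node)
    node = top_diagnosis                  # cross to the diagnosis space
    chain.append(node)
    while node in DIAGNOSIS_LINKS:        # then back down the diagnoses
        _, node = DIAGNOSIS_LINKS[node]
        chain.append(node)
    return chain

print(" --> ".join(expand("WBC < 2.5", "compromised host",
                          "infection by organisms found in body")))
```

Running the sketch reproduces the linear expansion shown above, with each intermediate node available as a hook for a higher-level restatement of the rule.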
Tabular rules provide an interesting special case. The CSF protein rule
shown in Figure 29-9 appears to be quite formidable. Graphing this rule
as shown in Figure 29-10, we find a relatively simple relation that an expert
states as "If the protein value is less than 40, I think of viral infections; if
it is more than 100, I think of bacterial, fungal, or TB." This is the first
level of generalization, the principle that is implicit in the rule. The second
level elicited from the expert is "If the protein value is low, I think of an

RULE500 (The CSF Protein Rule)
IF: 1) The infection which requires therapy is meningitis,
    2) A lumbar puncture has been performed on the patient, and
    3) The CSF protein is known
THEN: The type of the infection is as follows:
  If the CSF protein is:
  a) less than 41 then: not bacterial (.5), viral (.7), not fungal (.6), not tb
  b) between 41 and 100 then: bacterial (.1), viral (.4), fungal
  c) between 100 and 200 then: bacterial (.3), fungal (.3), tb
  d) between 200 and 300 then: bacterial (.4), not viral (.5), fungal (.4), tb
  e) greater or equal to 300 then: bacterial (.4), not viral (.6), fungal (.4), tb

FIGURE 29-9 The CSF protein rule.


[FIGURE 29-10 Graph of the CSF protein rule]

acute process; if it is high, I think of a severe or long-term process."* Then,
at the highest level, the expert states, "An infection in the meninges
stimulates protein production." So in moving up abstraction hierarchies on both
the premise and action sides of the rule (acute and chronic are subtypes
of infection), we arrive at a mnemonic, just like "bacteria eat glucose."
Abstractions of both the observations and the conclusions are important
for understanding the rule.
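A tabular rule like this is essentially an interval lookup. The sketch below is our own Python rendering of Figure 29-9 together with the expert's first-level generalization; negative numbers stand in for the "not" conclusions, and entries whose certainty factors are cut off in our copy of the figure are omitted:

```python
# Illustrative sketch: the CSF protein rule as an interval table, plus the
# expert's summary principle that is implicit in it.  Thresholds and CFs
# follow Figure 29-9; negative CFs encode the "not" conclusions.

CSF_PROTEIN_TABLE = [
    # (upper bound on CSF protein, conclusions as {type: certainty factor})
    (41,           {"bacterial": -0.5, "viral": 0.7, "fungal": -0.6}),
    (100,          {"bacterial": 0.1, "viral": 0.4}),
    (200,          {"bacterial": 0.3, "fungal": 0.3}),
    (300,          {"bacterial": 0.4, "viral": -0.5, "fungal": 0.4}),
    (float("inf"), {"bacterial": 0.4, "viral": -0.6, "fungal": 0.4}),
]

def conclude(csf_protein):
    """Apply the tabular rule: find the first interval containing the value."""
    for upper, conclusions in CSF_PROTEIN_TABLE:
        if csf_protein < upper:
            return conclusions

def generalize(csf_protein):
    """The expert's first-level generalization, stated over the same data."""
    if csf_protein < 40:
        return "think of viral infections"
    if csf_protein > 100:
        return "think of bacterial, fungal, or TB"
    return "indeterminate"

print(conclude(35))      # first interval: viral favored, bacterial disfavored
print(generalize(250))   # think of bacterial, fungal, or TB
```

The two functions make the teaching point concrete: the table and the principle describe the same relation at two levels of detail.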
We might be surprised that explanations of rules provide levels of detail
by referring to more general concepts. We are accustomed to the fact that
principled theoretical explanations of, say, chemical phenomena refer to
atomic properties, finer-grained levels of causality. Why should a rule
explanation refer to concepts like "compromised host" or "organisms
normally found in the body"? The reason is that in trying to understand a
rule like the steroids rule, we are first trying to relate it to our
understanding of what an infection is at a high, almost metaphorical level. In fact,
there are lower-level "molecular" details of the mechanism that could be
explained, for example, how steroids actually change the immunological
system. But our initial focus as understanders is at the top level: to link
the problem feature (steroids) to the global process of meningitis infection.
We ask, "What makes it happen? What role do steroids play in the
infectious meningitis process?"

The concept of "compromised host" is a label for a poorly understood
causal pattern that has value because we can relate it to our understanding
of the infection process. It enables us to relate the steroids or WBC
evidence to the familiar metaphor in which infection is a war that is fought
by the body against invading organisms:

"If a patient is compromised, his or her defenses are down; he or she
is vulnerable to attack."

In general, causal rules argue that some kind of process has occurred. We
expect a top-level explanation of a causal rule to relate the premise of the
rule to our most general idea of the process being explained. This provides
a constraint for how the rule should be generalized, the subject of the next
section.

29.3.4 Tying an Explanation to a Causal Model

MYCIN's diagnostic rules are arguments that a process has occurred in a
particular way. There are many kinds of infections, which have different
characteristics, but bacterial infections tend to follow the same script: entry
of an organism into the body, passage of the organism to the site of
infection, reproduction of the organism, and causation of observable symptoms.
An explanation of a rule that concludes that an organism is causing an
infection must demonstrate that this generic process has occurred. In short,
this is the level of abstraction that the explanation must connect to.

*Bacterial meningitis is a severe, acute (short-term) problem, while fungal and TB meningitis are problems of long (chronic) duration.
A program was written to demonstrate this idea. The data parameters
in MYCIN's 40 diagnostic rules for bacterial meningitis are restated as one
or more of the steps of the infectious process script. This restatement is
then printed as the explanation of the rule. For example, the program's
explanation of the rule linking alcoholism to Diplococcus meningitis is:

The fact that the patient is an alcoholic allows access of organisms from
the throat and mouth to lungs (by reaspiration of secretions).
The fact that the patient is an alcoholic means that the patient is a
compromised host, and so susceptible to infection.

Words in italics in the first sentence constitute the pattern of "portal and
passage." We find that the premise of a rule generally supplies evidence
for only a single step of the causal process; the other steps must be inferred
by default. For example, the alcoholic rule argues for passage of the
Diplococcus to the lungs. The person reading this explanation must know that
Diplococcus is normally found in the mouth and throat of any person and
that it proceeds from the lungs to the meninges by the blood. The organism
finds conditions favorable for growth because the patient is compromised,
as stated in the explanation. In contrast, the leukopenia rule only argues
for the patient being a compromised host, so the organisms are the default
organisms, those already in the body, which can proceed to the site of
infection.7

These explanations say which steps are enabled by the data. They place
the patient on the path of an infection, so to speak, and leave it to the
understander to fill in the other steps with knowledge of how the body
normally works. This is why physicians generally refer to the premise data
as "predisposing factors." To be understood, a rule must be related to the
prior steps in a causal process, the general concepts that explain many
rules.
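The behavior of that program might be sketched as follows. This is our own rendering, not the program's actual data structures; the table of script steps and the template sentences paraphrase the examples in the text:

```python
# Illustrative sketch: restate a rule's premise parameter as the steps of
# the infectious-process script that it enables, leaving the rest as
# defaults the reader fills in.

INFECTION_SCRIPT = ["entry", "passage", "reproduction", "symptoms"]

# Which script step(s) each premise parameter argues for (our paraphrases).
SCRIPT_EVIDENCE = {
    "alcoholic": [
        ("passage", "The fact that the patient is an alcoholic allows access "
                    "of organisms from the throat and mouth to lungs "
                    "(by reaspiration of secretions)."),
        ("reproduction", "The fact that the patient is an alcoholic means "
                         "that the patient is a compromised host, and so "
                         "susceptible to infection."),
    ],
    "leukopenia": [
        ("reproduction", "The low white count means that the patient is a "
                         "compromised host, and so susceptible to infection."),
    ],
}

def explain(parameter):
    """Print the enabled script steps; the remaining steps are defaults."""
    findings = SCRIPT_EVIDENCE.get(parameter, [])
    lines = [text for _, text in findings]
    enabled = {step for step, _ in findings}
    defaults = [s for s in INFECTION_SCRIPT if s not in enabled]
    lines.append(f"(Steps assumed by default: {', '.join(defaults)}.)")
    return "\n".join(lines)

print(explain("alcoholic"))
```

Note how the sketch makes the "single step plus defaults" observation explicit: the alcoholic rule enables passage and reproduction, while entry and symptoms are left to the reader's knowledge of how the body normally works.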
The process of explanation is a bit more complicated in that causal
relations may exist between clauses in the rule. We have already seen that
one clause may screen another on the basis of world facts, multicomponent
test relations, and the subtype relation. The program described here knows
these relations and "subtracts off" screening clauses from the rule. Moreover,
as discussed in Section 29.4, some clauses describe the context in
which the rule applies. These, too, are made explicit for the explanation
program and subtracted off. In the vast majority of MYCIN rules, only
one premise clause remains, and this is related to the process of infection
in the way described above.

7As physicians would expect, alcoholism also causes infection by gram-negative rods and Enterobacteriaceae. We have omitted these for simplicity. However, this example illustrates that a MYCIN rule can have multiple conclusions reached by different causal paths.
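The "subtracting off" step can be sketched as a simple filter. The clause labels below follow the text's analysis of the alcoholic rule (Figure 29-13), though the representation itself is our own:

```python
# Illustrative sketch: label a rule's premise clauses by role, then
# "subtract off" screening and context clauses to isolate the key factor.

# Clause roles follow the text: the alcoholic rule's first three clauses
# are context clauses, the age clause is a screening clause.
ALCOHOLIC_RULE_PREMISE = [
    ("the infection which requires therapy is meningitis", "context"),
    ("only circumstantial evidence is available", "context"),
    ("the type of meningitis is bacterial", "context"),
    ("the age of the patient is greater than 17", "screening"),
    ("the patient is an alcoholic", "key"),
]

def key_factors(premise):
    """Remove screening and context clauses; what remains is explained
    against the infectious-process script."""
    return [clause for clause, role in premise
            if role not in ("screening", "context")]

print(key_factors(ALCOHOLIC_RULE_PREMISE))
# ['the patient is an alcoholic']
```

In the vast majority of rules this filter leaves a single clause; when more than one remains, the extra clauses are the restriction clauses discussed next.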
When more than one clause remains after the screening and contextual
clauses have been removed, our study shows that a causal connection
exists between the remaining clauses. We can always isolate one piece of
evidence that the rule is about (for example, WBC in the leukopenia rule);
we call this the key factor of the rule. We call the remaining clauses
restriction clauses.8 There are three kinds of relations between a restriction
clause and a key factor:

- A confirmed diagnosis explains a symptom. For example, a petechial rash
would normally be evidence for Neisseria, but if the patient has leukemia,
it may be the disease causing the rash. Therefore, the rule states, "If the
patient has a petechial rash (the key factor) and does not have leukemia
(the restriction clause), then Neisseria may be causing the meningitis."

- Two symptoms in combination suggest a different diagnosis than one taken alone.
For example, when both purpuric and petechial rashes occur, then a
virus is a more likely cause than Neisseria. Therefore, the petechial rule
also includes the restriction clause "the patient does not have a purpuric
rash."

- Weak circumstantial evidence is made irrelevant by strong circumstantial
evidence. For example, a head injury so strongly predisposes a patient to
infection by skin organisms that the age of the patient, a weak
circumstantial factor, is made irrelevant.

In summary, to explain a causal rule, a teacher must know the purposes
of the clauses and connect the rule to abstractions in the relevant process
script.

29.3.5 The Relation of Medical Heuristics to Principles

It might be argued that we must go to so much trouble to explain MYCIN's
rules because they are written on the wrong level. Now that we have a
"theory" for which intermediate parameters to include ("portal," "pathway,"
etc.), why don't we simply rewrite the rules?

The medical knowledge we are trying to codify is really on two levels
of detail: (1) principles or generalizations, and (2) empirical details or
specializations. MYCIN's rules are empirical. Cleaning them up by representing
problem feature relationships explicitly would give us the same set of
rules at a higher level. But what would happen if process concepts were
incorporated in completely new reasoning steps, for example, if the rule
set related problem features to hypotheses about the pathway the organism
took through the body? It turns out that reasoning backwards in terms of
a causal model is not always appropriate. As we discovered when explaining
the rules, not all of the causal steps of the process can be directly
confirmed; we can only assume that they have occurred. For example, rather
than providing diagnostic clues, the concept of "portal of entry and
passage" is very often deduced from the diagnosis itself.

8Restriction clauses are easy to detect when examining the rule set because they are usually stated negatively.

Teaching Problem-Solving Strategy 551
According to this view, principles are good for summarizing arguments,
and good to fall back on when you've lost grasp on the problem,
but they don't drive the process of medical reasoning. Specifically, (1) if a
symptom needs to be explained (is highly unusual), we ask what could cause it
("Strep-viridans? It is normally found in the mouth. How did it get to the
heart? Has the patient had dental work recently?"); (2) to "prove" that the
diagnosis is correct (after it has been constructed), we use a causal argument
("He has pneumonia; the bacteria obviously got into the blood from the lungs.").
Thus causal knowledge can be used to provide feedback that everything
fits.
It may be difficult or impossible to expect a set of diagnostic rules both
to serve as concise, "clincher" methods for efficiently getting to the right
data and still to represent a model of disease. Put another way, a student
may need the model if he or she is to understand new associations between
disease and manifestations, but will be an inefficient problem solver if he
or she always attempts to convert that model directly to a subgoal structure
for solving ordinary problems. Szolovits and Pauker (1978) point out that
these "first principles" used by a student are "compiled out" of an expert's
reasoning.

In meningitis diagnosis, the problem is to manage a broad, if not
incoherent, hypothesis set, rather than to pursue a single causal path. The
underlying theory recedes to the background, and the expert tends to
approach the problem simply in terms of weak associations between
observed data and bottom-line conclusions. This may have promoted a rule-
writing style that discouraged introducing intermediate concepts such as
leukopenia, even where they might have been appropriate.

29.4 Teaching Problem-Solving Strategy

A strategy is an approach for solving a problem, a plan for ordering
methods so that a goal is reached. It is well accepted that strategic knowledge
must be conveyed in teaching diagnostic problem solving. As Brown and
Goldstein (1977) say:

Without explicit awareness of the largely tacit planning and strategic
knowledge inherent in each domain, it is difficult for a person to "make sense
of" many sequences of behavior as described by a story, a set of instructions,
a problem solution, a complex system, etc .... The teacher should articulate
for that domain the higher-order planning knowledge and strategic
knowledge for formulating and revising hypotheses about what something means.

Strategic knowledge is general, much like the principles of mechanism we
discussed earlier; both relate to processes that have structure. Thus it is
not sufficient to show a student only MYCIN's solution, the surface structure
of the program; we must explain why the rules are invoked in a particular
order.

Here it is clear how teaching how to do something differs from merely
explaining how something was done: we want the student to be able to
replicate what he or she observes, to solve similar problems independently.
This is why mnemonics are useful when justifying a rule. Regarding strategy,
we must again address human foibles and preference: we must teach
a strategy that a human can follow.
The main points of this section are:

- MYCIN's strategy is different from a physician's strategy;
- MYCIN's strategic knowledge is embedded in the rules, indistinguishable
from screening and problem feature clauses;
- A domain-independent representation of strategy is useful for teaching
and for purposes of economy.

29.4.1 Surface and Deep Structure of MYCIN

A person trying to understand a MYCIN consultation observes that pieces
of data are requested by the program as shown in Figure 29-11.
Conceptually, these questions are terminals hanging below an AND node in a
subgoal tree, as shown in Figure 29-12. Following the terminology of
Brown and Goldstein (1977), a rule node is a method for achieving a goal
(e.g., "organisms that might be causing the infection") by asking questions
or pursuing a set of subgoals to achieve. Therefore, the tree of rules and
subgoals is part of a deep-structured trace that they postulate is constructed
when the understander makes sense of the surface problem solution.

It is not sufficient for a student to know all of the possible methods
he or she can bring to bear on a problem. He or she generally needs a
plan for solving the problem, that is, needs schemata for kinds of problems
that can be tackled using different approaches or lines of reasoning. A plan
sets up a rational sequence of methods that might get you closer to the
solution, but without guarantees.
The hypothetico-deductive strategy used in medical problem solving
constitutes a plan for focusing on hypotheses and selecting confirmatory
questions (Elstein et al., 1978). However, the methods selected in Figure
29-12 (Rules 511 through 578) have been applied in a fixed, arbitrary
order--not planned by the rule author. MYCIN has no "deep structure"

31) Has Pt538 ever undergone any type of surgery?
** YES
32) Does Pt538 have a history of NEUROSURGERY?
** NO
33) Does Pt538 live in a crowded environment?
** NO
34) Do you suspect recent alcoholic history in Pt538?
** NO
35) Is meningitis a hospital-acquired infection?
** YES
36) Is Pt538's clinical history consistent with EPIGLOTTITIS?
** NO
37) Is Pt538's clinical history consistent with OTITIS-MEDIA?
** NO
38) Has Pt538 ever undergone splenectomy?
** NO
39) Is Pt538 a burn patient?
** YES

FIGURE 29-11 Excerpt from a MYCIN consultation showing
requests for relevant data.

GOAL        HYPOTHESIS        METHOD       QUESTION

COVERFOR    E.COLI            (Rule 511)   Q32 NEUROSURGERY
            N.MENINGITIDIS    (Rule 533)   Q33 CROWD
            PNEUMOCOCCUS      (Rule 536)   Q34 ALCOHOLIC
                              (Rule 559)   Q38 SPLENECTOMY
            H.INFLUENZA       (Rule 545)   Q35 NOSOCOMIAL
                              (Rule 395)   Q36 EPIGLOTTITIS
                                           Q37 OTITIS-MEDIA
            PSEUDOMONAS       (Rule 578)   Q39 BURN

FIGURE 29-12 Portion of the AND/OR tree corresponding to
the questions shown in Figure 29-11 (reorganized according to
the hypothesis each rule supports).

RULE092 (The Goal Rule)
IF: 1) Gather information about cultures taken from the patient and therapy
       he is receiving,
    2) Determine if the organisms growing on cultures require therapy,
    3) Consider circumstantial evidence for additional organisms that therapy
       should cover
THEN: Determine the best therapy recommendation

RULE535 (The Alcoholic Rule)
IF: 1) The infection which requires therapy is meningitis,
    2) Only circumstantial evidence is available for this case,
    3) The type of meningitis is bacterial,
    4) The age of the patient is greater than 17 years, and
    5) The patient is an alcoholic,
THEN: There is evidence that the organisms which might be causing the infection
      are diplococcus-pneumoniae (.3) or e.coli (.2)

FIGURE 29-13 The goal rule and the alcoholic rule.

plan at this level; the program is simply applying rules (methods)
exhaustively. This lack of similarity to human reasoning severely limits the
usefulness of the system for teaching problem solving.

However, MYCIN does have a problem-solving strategy above the level
of rule application, namely the control knowledge that causes it to pursue
a goal at a certain point in the diagnosis. We can see this by examining
how rules interact in backward chaining. Figure 29-13 shows the goal rule
and a rule that it indirectly invokes. In order to evaluate the third clause
of the goal rule, MYCIN tries each of the COVERFOR rules; the alcoholic
rule is one of these (see also Figure 29-12). We call the goal rule a task rule
to distinguish it from inference rules. Clause order counts here; this is
more a procedure than a logical conjunction. The first three clauses of the
alcoholic rule, the context clauses, also control the order in which goals are
pursued, just as is true for a task rule. We can represent this hidden
structure of goals by a tree which we call the inference structure of the rule base
(produced by "hanging" the rule set from the goal rule). Figure 29-14
illustrates part of MYCIN's inference structure.9

The program's strategy comes to light when we list these goals in the
order in which the depth-first interpreter makes a final decision about
them. For example, since at least one rule that concludes "significant" (goal
4 in Figure 29-14) mentions "contaminant" (goal 3), MYCIN applies all of
the "contaminant" rules before making a final decision about "significant."
Analyzing the entire rule set in a similar way gives us the ordering (shown
in Figure 29-14):

9Some definitions of terms used in the following discussion: TREATFOR = organisms to be
treated, based on direct laboratory observation; COVERFOR = organisms to be treated,
based on circumstantial evidence; SIGNIFICANT = this organism merits therapeutic
attention, based on the patient's degree of sickness and validity of culture results;
CONTAMINANT = the finding of this organism is spurious; it was probably introduced during
sampling from the cultured site of the body, as a blood culture might include skin organisms.

10We leave out the goals REGIMEN and TREATFOR because they are just placeholders for
task rules, like subroutine names.

REGIMEN = main goal
  TREATFOR (rule 92, clause 2):
    WHAT-INF? (2)      -- INFECTION? (1)
    SIGNIFICANT? (4)   -- CONTAMINANT? (3)
    IDENTITY? (5)
  COVERFOR (rule 92, clause 3):
    MENINGITIS? (7)    -- INFECTION? (6)
    BACTERIAL? (8)

FIGURE 29-14 Portion of MYCIN's inference structure.
(Numbers give the order in which nonplaceholder goals are
achieved by the depth-first interpreter.)

1. Is there an infection?
2. Is it bacteremia, cystitis, or meningitis?
3. Are there any contaminated cultures?
4. Are there any good cultures with significant growth?
5. Is the organism identity known?
6. Is there an infection? (already done in Step 1)
7. Does the patient have meningitis? (already done in Step 2)
8. Is it bacterial?
9. Are there specific bacteria to cover for?

MYCIN's diagnostic plan is in two parts, and both proceed by top-
down refinement. This demonstrates that a combination of structural
knowledge (the taxonomy of the diagnosis space--infection, meningitis,
bacterial, Diplococcus ...) and strategic knowledge (traversing the taxonomy
from the top down) is procedurally embedded in the rules. In other
words, we could write a program that interpreted an explicit, declarative
representation of the diagnosis taxonomy and a domain-independent form
of the strategy to bring about the same effect.
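Such a program might look like the following sketch. It is our own construction, not MYCIN code; the toy taxonomy and the evidence test are hypothetical:

```python
# Illustrative sketch: a declarative diagnosis taxonomy plus a
# domain-independent top-down refinement strategy, reproducing the goal
# ordering that MYCIN's rules embed procedurally.

# Hypothetical fragment of the diagnosis-space taxonomy.
TAXONOMY = {
    "infection": ["meningitis"],
    "meningitis": ["bacterial-meningitis"],
    "bacterial-meningitis": ["diplococcus", "e.coli"],
}

def refine(node, evidence_for, plan=None):
    """Traverse the taxonomy top down, refining only supported hypotheses."""
    if plan is None:
        plan = []
    plan.append(f"consider {node}")
    if evidence_for(node):                 # refine only what the data support
        for child in TAXONOMY.get(node, []):
            refine(child, evidence_for, plan)
    return plan

# Toy evidence function: everything down to bacterial meningitis is supported.
supported = {"infection", "meningitis", "bacterial-meningitis"}
print(refine("infection", lambda h: h in supported))
```

Because the taxonomy is declarative and the traversal is domain-independent, the strategy here can be stated, and taught, separately from the medical content.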
At this level, MYCIN's diagnostic strategy is not a complete model of
how physicians think, but it could be useful to a student. As the quote from
Brown and Goldstein would indicate, and as has been confirmed in
GUIDON research, teachers do articulate both the structure of the problem

META-RULE002
IF: 1) The infection is pelvic-abscess, and
    2) There are rules which mention in their premise enterobacteriaceae, and
    3) There are rules which mention in their premise gram-positive rods,
THEN: There is suggestive evidence (.4) that the former should be done
      before the latter

FIGURE 29-15 A MYCIN meta-rule.

space and the nature of the search strategy to students. This means that
we need to represent explicitly the fact that the diagnosis space is
hierarchical and to represent strategies in a domain-independent form. If a
strategy is not in domain-independent form, it can be taught by examples, but
not explained.

29.4.2 Representing Strategic Knowledge in Meta-Rules

How might we represent domain-independent strategic knowledge in a
rule-based system? In the context of the MYCIN system, Davis pursued
the representation of strategic knowledge by using meta-rules to order and
prune methods (Chapter 28). These meta-rules are invoked just before the
object-level rules are applied to achieve a goal. An example of an infectious
disease meta-rule is shown in Figure 29-15 (see Figure 28-12 for other
examples). Observe that this is a strategy for pursuing a goal. In particular,
this meta-rule might be associated with the goal "identity of the organism."
It will be invoked to order the rules for every subgoal in the search tree
below this goal; in this simple way, the rule sets up a line of reasoning.
This mechanism causes some goals to be pursued before others, orders
the questions asked by the system, and hence changes the surface structure
of the consultation.
Although meta-rules like this can capture and implement strategic
knowledge about a domain, they have their deficiencies. Like the
performance rules we have examined, Davis's domain-dependent examples of
meta-rules leave out knowledge important for explanation. Not only do
they leave out the domain-specific support knowledge that justifies the
rules, they leave out the domain-independent strategic principles that
GUIDON should teach. In short, meta-rules provide the mechanism for
controlling the use of rules, but not the domain-independent language for
making the strategy explicit.

The implicit strategic principle that lies behind Meta-Rule 002 is that
common causes of a disorder should be considered first. The structural
knowledge that ties this strategy to the object-level diagnostic rules is an
explicit partitioning of the diagnosis space taxonomy, indicating that the
group of organisms called Enterobacteriaceae are more likely than gram-
positive rod organisms to cause pelvic infections. This is what we want to
teach the student. One can imagine different common causes for different
infection types, requiring different meta-rules. But if all meta-rules are as
specific as Meta-Rule 002, principles will be compiled into many rules
redundantly and the teaching points will be lost.
What does a domain-independent meta-rule look like, and how is it
interfaced with the object-level rules? To explore this question, we have
reconfigured the MYCIN rule base into a new system, called NEOMYCIN
(Clancey and Letsinger, 1981). Briefly, meta-rules are organized
hierarchically (again!) into tasks, such as "group and refine the hypothesis space."
These rules manage a changing hypothesis list by applying different kinds
of knowledge sources, as appropriate. Knowledge sources are essentially
the object-level rules, indexed in the taxonomy of the diagnosis space by
a domain-independent structural language.

For example, one meta-rule for achieving the task of pursuing a
hypothesis is "If there are unusual causes, then pursue them."11 Suppose that
the current hypothesis is "bacterial meningitis." The program will use the
structural label "unusual causes" to retrieve the nodes "gram-negative
rods," "enterobacteriaceae," and "listeria," add them to the hypothesis list,
and pursue them in turn. When there are no "unusual causes" indicated,
the meta-rule simply does not apply. Pursuing gram-negative rods, the
program will find that leukopenia is a relevant factor, but will first ask if
the patient is a compromised host (Figure 29-8), modeling a physician's
efficient casting of wider questions.
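The mechanism can be sketched as follows. The structural labels and taxonomy entries follow the example in the text, but the code itself is our own rendering, not NEOMYCIN's:

```python
# Illustrative sketch: a domain-independent meta-rule that uses a
# structural label ("unusual causes") to retrieve object-level hypotheses
# from the diagnosis-space taxonomy.

# Children of each hypothesis, indexed by structural label.  The entries
# under "bacterial meningitis" follow the text; "common causes" is a
# hypothetical partition added for illustration.
DIAGNOSIS_SPACE = {
    "bacterial meningitis": {
        "common causes": ["diplococcus", "neisseria", "h.influenza"],
        "unusual causes": ["gram-negative rods", "enterobacteriaceae",
                           "listeria"],
    },
}

def pursue_hypothesis(hypothesis, hypothesis_list):
    """Task: pursue a hypothesis by applying meta-rules in order.
    The common-causes rule comes first; its ordering before the
    unusual-causes rule is itself strategically significant."""
    for label in ["common causes", "unusual causes"]:
        causes = DIAGNOSIS_SPACE.get(hypothesis, {}).get(label, [])
        if not causes:
            continue          # the meta-rule simply does not apply
        hypothesis_list.extend(causes)
    return hypothesis_list

print(pursue_hypothesis("bacterial meningitis", []))
```

Because the meta-rules mention only structural labels, the same two rules apply unchanged to any disorder whose taxonomy is partitioned this way; that separation is what makes the strategy explainable rather than merely demonstrable.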
Other terms in the structural language used by NEOMYCIN's domain-
independent meta-rules are

1. process features, such as extent and location of disease;
2. the enabling step of a causal process;
3. subtype;
4. cause;
5. trigger association;
6. problem feature screen; and structural properties of the taxonomy, such
as sibling.

In effect, the layer of structural knowledge allows us to separate out
what the heuristic is from how it will be used. How domain-specific heuristics
like MYCIN's rules should be properly integrated with procedural,
strategic knowledge is an issue at the heart of the old "declarative/procedural
controversy" (Winograd, 1975). We conclude here that, for purposes of
teaching, the hierarchies of problem features and of the diagnosis space
should be represented explicitly, providing a useful means for indexing
the heuristics by both premise and action. A structural language of cause,
class, and process can connect this domain-specific knowledge to domain-
independent meta-rules, the strategy for problem solving.

11This rule appears after the rule for considering common causes, and the ordering is marked
as strategically significant. Domain-independent meta-rules have justifications, organization,
and strategies for using them. Their justification refers to properties of the search space and
the processor's capabilities.

29.4.3 Self-Referencing Rules

Self-referencing rules provide an interesting special example of how
problem-solving strategies can be embedded in MYCIN's rules. A rule is
self-referencing if the goal concluded by the action is also mentioned in the
premise.12 An example is the aerobicity rule shown in Figure 29-16.

RULE086
IF: 1) The aerobicity of the organism is not known, and
    2) The culture was obtained more than 2 days ago,
THEN: There is evidence that the aerobicity of the organism is obligate-aerob
      (.5) or facultative (.5)

FIGURE 29-16 The aerobicity rule.

This rule is tried only after all of the non-self-referencing rules have
been applied. The cumulative conclusion of the non-self-referencing rules
is held aside, then the self-referencing rules are tried, using in each rule
the tentative conclusion. Thus the first clause of Rule 86 will be true only
if none of the standard rules made a conclusion. The effect is to reconsider
a tentative conclusion. When the original conclusion is changed by the self-
referencing rules, this is a form of nonmonotonic reasoning (Winograd,
1980). We can restate MYCIN's self-referencing rules in domain-independent
terms:

- If nothing has been observed, consider situations that have no visible
manifestations. For example, the aerobicity rule: "If no organism is growing in the
culture, it may be an organism that takes a long time to grow (obligate-
aerob and facultative organisms)."
The self-referencing mechanism makes it possible to state this rule
without requiring a long premise that is logically exclusive from the
remainder of the rule set.

¹²Aerobicity refers to whether an organism can grow in the presence of oxygen. A facultative
organism can grow with or without oxygen; an anaerobic organism cannot grow with oxygen
present; and an obligate-aerob is aerobic only in a certain stage of growth. Note that the rule
is self-referencing in that aerobicity is mentioned in both the premise and the conclusion.

If unable to make a deduction, assume the most probable situation. For
example: "If the gram stain is unknown and the organism is a coccus,
then assume that it is gram-positive."

If there is evidence for two hypotheses, A and B, that tend to be confused, then
rule out B. For example: "If there is evidence for TB and fungal, and
you have hard data for fungal, rule out TB."
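The control scheme described above (run the ordinary rules for a goal first, hold the cumulative conclusion aside, then let the self-referencing rules reconsider it) can be sketched as follows. This is a hypothetical illustration, not MYCIN's Interlisp code; the rule encoding, names, and simplified single-value conclusions are invented for the example.

```python
# Hypothetical sketch (not MYCIN's code) of the control scheme for
# self-referencing rules: ordinary rules for a goal run first, then
# self-referencing rules run against the tentative conclusion,
# possibly revising it (a form of nonmonotonic reasoning).

def conclude(goal, facts, rules):
    plain = [r for r in rules if goal not in r["premise_goals"]]
    self_ref = [r for r in rules if goal in r["premise_goals"]]

    tentative = None
    for rule in plain:                        # standard rules first
        if rule["premise"](facts):
            tentative = rule["action"](facts)

    facts = dict(facts, **{goal: tentative})  # expose tentative value
    for rule in self_ref:                     # then reconsider it
        if rule["premise"](facts):
            tentative = rule["action"](facts)
    return tentative

# Toy analogue of Rule 86: if aerobicity is still unknown and the
# culture is more than 2 days old, split belief between the two values.
rule086 = {
    "premise_goals": {"aerobicity"},
    "premise": lambda f: f["aerobicity"] is None and f["culture_days"] > 2,
    "action": lambda f: {"obligate-aerob": 0.5, "facultative": 0.5},
}

print(conclude("aerobicity", {"aerobicity": None, "culture_days": 3},
               [rule086]))
# {'obligate-aerob': 0.5, 'facultative': 0.5}
```

Because the self-referencing rule sees only the tentative conclusion, it fires exactly when the standard rules concluded nothing, without needing a premise that enumerates and excludes all the standard cases.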

Like Meta-Rule 002, self-referencing rules provide a useful mechanism
for controlling the use of knowledge, but they leave out both the domain-
dependent justification and the general, domain-independent reasoning
strategy of which they are examples. These rules illustrate that strategy
involves more than a search plan; it also takes in principles for reasoning
about evidence. It is not clear that a teacher needs to state these principles
explicitly to a student. They tend to be either common sense or almost
impossible to think about independently of an example. Nevertheless, they
are yet another example of strategic knowledge that is implicit in MYCIN's
rules.

29.5 Implications for Modifiability and Performance

MYCIN achieved good problem-solving performance even without having
to reason about the structural, strategic, and support knowledge we have
been considering. However, there are situations in which knowledge of
justification and strategy allows one to be a more flexible problem solver,
to cope with novel situations, in ways that MYCIN cannot. Knowing the
basis of a rule allows you to know when not to apply it, or how to modify
it for special circumstances. For example, knowing that tetracycline won't
kill the young patient but the infection might, you may have to dismiss
social ramifications and prescribe the drug. You can deliberately break the
rule because you understand the assumptions underlying it.

There will also be problems that cannot be diagnosed using MYCIN's
rules. For example, several years ago Coccidioides meningitis strangely ap-
peared in the San Francisco Bay Area. We would say that this "violates all
the rules." To explain what was happening, one has to reason about the
underlying mechanisms. The organisms were traveling from the San Joa-
quin Valley to the Bay Area by "freak southeastern winds," as the news-
papers reported. The basic mechanism of disease was not violated, but this
time the patients didn't have to travel to the Valley to come in contact with
the disease. A human expert can understand this because he or she can fit
the new situation to the model. Examples like these make us realize that

AI systems like MYCIN can only perform some of the functions of an
expert.
Regarding modifiability, the process of reconfiguring MYCIN's rules
in NEOMYCIN's terms required many hours of consultation with the orig-
inal rule authors in order to unravel the rules. As shown in this paper, the
lack of recorded principles for using the representation makes it difficult
to interpret the purposes of clauses and rules. The strategy and overall
design of the program have to be deduced by drawing diagrams like Figure
29-14. Imagine the difficulty any physician new to MYCIN would have
modifying the CSF protein table (Figure 29-9); clearly, he or she would
first need an explanation from the program of why it is correct.

We also need a principled representation to avoid a problem we call
concept broadening. When intermediate problem abstractions are omitted,
use of goals becomes generalized and weakened. This happened in MYCIN
as the meaning of "significance" grew to include both "evidence of
infection" and "noncontaminated cultures." As long as the rule author
makes an association between the data and some parameter he or she wants
to influence, it doesn't matter for correct performance that the rule is
vague. But vague rules are difficult to understand and modify.

A rule base is built and extended like any other program. Extensive
documentation and a well-structured design are essential, as in any engi-
neering endeavor. The framework of knowledge types and purposes that
we have described would constitute a "typed" rule language that could
make it easier for an expert to organize his or her thoughts. On the other
hand, we must realize that this meta-level analysis may impose an extra
burden by turning the expert into a taxonomist of his or her own knowl-
edge--a task that may require considerable assistance, patience, and tools.

29.6 Application of the Framework to Other Systems

To illustrate further the idea of the strategy, structure, and support frame-
work and to demonstrate its usefulness for explaining how a program
reasons, several knowledge-based programs are described below in terms
of the framework. For generality, we will call inference associations such
as MYCIN's rules knowledge sources (KSs). We will not be concerned here
with the representational notation used in a program, whether it be frames,
production rules, or something else. Instead, we are trying to establish an
understanding of the knowledge contained in the system: what kinds of
inferences are made at the KS level, how these KSs are structured explicitly
in the system, and how this structure is used by strategies for invoking
KSs. This is described in Table 29-1.
[Table 29-1]

29.6.1 The Character of Structural Knowledge

One product of this study is a characterization of different ways of struc-
turing KSs for different strategical purposes. In all cases, the effect of the
structural knowledge is to provide a handle for separating out what the
KS is from when it is to be applied.¹³

The different ways of structuring KSs are summarized here according
to the processing rationale:

Organize KSs hierarchically by hypothesis for consistency in data-directed inter-
pretation. In DENDRAL, if a functional group is ruled out, more specific
members of the family are not considered during forward-directed, pre-
liminary interpretation of spectral peaks. Without this organization of
KSs, earlier versions of DENDRAL could generate a subgroup as a plau-
sible interpretation while ruling out a more general form of the
subgroup, as if to say "This is an ethyl ketone but not a ketone" (Bu-
chanan et al., 1970).
Organize KSs hierarchically by hypothesis to eliminate redundant effort in hy-
pothesis-directed refinement. In DENDRAL, the family trees prevent the
exhaustive structure generator from generating subgroups whose more
general forms have been ruled out. The same principle is basic to most
medical diagnosis systems that organize diagnoses in a taxonomy and
use a top-down refinement strategy, such as CENTAUR and NEOMY-
CIN.
Organize KSs by multiple hypothesis hierarchies for efficient grouping (hypoth-
esis-space splitting). Besides using the hierarchy of generic disease pro-
cesses (infectious, cancerous, toxic, traumatic, psychosomatic, etc.), NEO-
MYCIN groups the same diseases by multiple hierarchies according to
disease process features (organ system involved, spread in the system,
progression over time, etc.). When hypotheses are under consideration
that do not fall into one confirmed subtree of the primary etiological
hierarchy, the group and differentiate strategy is invoked to find a pro-
cess feature dimension along which two or more current hypotheses
differ. A question will then be asked, or a hypothesis pursued, to dif-
ferentiate among the hypotheses on this dimension.
Organize KSs for each hypothesis on the basis of how KS data relates to the
hypothesis, for focusing on problem features. In NEOMYCIN, additional re-
lations make explicit special kinds of connections between data and hy-
potheses, such as "this problem feature is the enabling causal step for
this diagnostic process," and meta-rules order the selection of questions
(invocation of KSs) by indexing them indirectly through these relations.
For example, "If an enabling causal step is known for the hypothesis to
be confirmed, try to confirm that problem feature." The meta-rules that

¹³In this section, the term hypothesis generally refers to a diagnostic or explanatory interpre-
tation made by a KS (in terms of some model), although it can also be a hypothesis that a
particular problem feature is present.

reference these different relations ("enabling step," "trigger," "most likely
manifestation") are ordered arbitrarily. Meta-meta-rules don't order the
meta-rules because we currently have no theoretical basis for relating
the first-order relations to one another.
Organize KSs into data/hypothesis levels for opportunistic triggering at multiple
levels of interpretation. HEARSAY's blackboard levels (sentence, word se-
quence, word, etc.) organize KSs by the level of analysis they use for
data, each level supplying data for the hypothesis level above it. When
new results are posted on a given level, KSs that "care about" that level
of analysis are polled to see if they should be given processing time.
Policy KSs give coherence to this opportunistic invocation by affecting
which levels will be given preference. CRYSALIS (Engelmore and Terry,
1979) (a program that constructs a three-dimensional crystal structure
interpretation of x-ray crystallographic data) takes the idea a step further
by having multiple planes of blackboards; one abstracts problem fea-
tures, and the other abstracts interpretations.
Organize KSs into a task hierarchy for planning. In MOLGEN, laboratory
operators are referenced indirectly through tasks that are steps in an
abstract plan. For example, the planning-level design decision to refine the
abstract plan step MERGE is accomplished by indexing laboratory op-
erators by the MERGE task (e.g., MERGE could be refined to using
ligase to connect DNA structures, mixing solutions, or causing a vector
to be absorbed by an organism). Thus tasks in planning are analogous
to hypotheses in interpretation problems.
Organize KSs into a context specialization hierarchy for determining task rele-
vance. In AM, relevant heuristics for a task are inherited from all con-
cepts that appear above it in the specialization hierarchy. Thus AM goes
a step beyond most other systems by showing that policy KSs must be
selected on the basis of the kind of problem being solved. Lenat's work
suggests that this might be simply a hierarchical relationship among
kinds of problems.
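The first two organizations above amount to a simple check during generation: a hypothesis (and the KSs indexed under it) is considered only if none of its ancestors in the hierarchy has been ruled out. A minimal sketch, with a hierarchy and names invented purely for illustration of the DENDRAL-style pruning:

```python
# Illustrative sketch (names invented) of organizing hypotheses in a
# hierarchy so that ruling out a general form prunes all of its
# subtypes, as in DENDRAL's family trees and top-down refinement.

PARENT = {
    "ketone": None,
    "ethyl-ketone": "ketone",   # subtype of ketone
    "ester": None,
}

def considered(hypothesis, ruled_out):
    """A hypothesis survives only if no ancestor has been ruled out."""
    h = hypothesis
    while h is not None:
        if h in ruled_out:
            return False
        h = PARENT[h]
    return True

candidates = ["ethyl-ketone", "ester"]
survivors = [h for h in candidates if considered(h, {"ketone"})]
print(survivors)  # ['ester']: "an ethyl ketone but not a ketone" cannot arise
```

With the hierarchy explicit, the incoherent conclusion "this is an ethyl ketone but not a ketone" is blocked structurally rather than by ad hoc clauses in individual rules.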

The above characterizations of different organizations for knowledge are
a first step toward a vocabulary or language for talking about indirect
reference of KSs. It is clear that strategy and structure are intimately re-
lated; to make this clearer, we return to the earlier topic of explanation.
Teaching a strategy might boil down to saying "think in terms of such-
and-such a structural vocabulary in order to get this strategical task
done"--where the vocabulary is the indexing scheme for calling KSs to
mind. So we might say, "Think in terms of families of functional subgroups
in order to rule out interpretations of the spectral peaks." Or, "Consider
process features when diseases of different etiologies are possible." That
is, teaching a strategy involves in part the teaching of a perspective for relating
KSs hierarchically (e.g., "families of functional subgroups" or "disease proc-
ess features") and then showing how these relations provide leverage for man-
aging a large amount of data or a large number of hypotheses. The explanation

of the sought-after leverage must be in terms of some task for carrying
the problem forward, thus tying the structuring scheme to the overall pro-
cess of what the problem solver is trying to do. Thus we say "to rule out
interpretations" or "to narrow down the problem to one etiological process"
or (recalling Figure 29-4) "to broaden the spectrum of possibilities." In this
way, we give the student a meta-rule that specifies what kind of vocabulary
to consider for a given strategical task.
Davis's study of meta-rules (Chapter 28) suggested a need for a vocab-
ulary of meta-rule knowledge. His examples suggested just a few concep-
tual primitives for describing refinement (ordering and utility of KSs) and
a few primitives for describing object-level knowledge (KS input and out-
put). All of the strategies in our examples deal with ordering and utility
criteria for KSs; so we have nothing to add there. All of the examples
given here reference KSs by the data they act upon, the hypotheses they
support, or the tasks they accomplish, except for AM, which references
KSs by their scope or domain of applicability. What is novel about the
analysis here is the focus on relations among hypotheses and among data.
From our domain-independent perspective, strategical knowledge
selects KSs on the basis of the causal, subtype, process, or scoping relation
they bear to hypotheses or data currently thought to be relevant to the
problem at hand. Thus our meta-rules make statements like these:

1. "Consider KSs that would demonstrate a prior cause for the best
hypothesis."
2. "Don't consider KSs that are subtypes of ruled-out hypotheses."
3. "Consider KSs that abstract known data."
4. "Consider KSs that distinguish between two competing kinds of
processes."
5. "Consider KSs relevant to the current problem domain."
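Such meta-rules can be read as predicates over the structural labels attached to KSs, so the meta-rules themselves stay domain-independent while the labels supply the domain content. The sketch below is a hypothetical illustration of statements 1 and 3; the relation names, KS entries, and label scheme are all invented, not drawn from MYCIN or NEOMYCIN.

```python
# Hedged sketch (invented labels) of domain-independent meta-rules as
# filters over the structural relation each KS bears to a hypothesis
# or datum. The same meta-rules apply in any domain that supplies
# these labels.

knowledge_sources = [
    {"name": "ks1", "relation": "prior-cause", "of": "meningitis"},
    {"name": "ks2", "relation": "abstracts", "of": "wbc-count"},
    {"name": "ks3", "relation": "prior-cause", "of": "tb"},
]

def prior_cause_of(best_hypothesis):
    """Statement 1: consider KSs that would demonstrate a prior
    cause for the best hypothesis."""
    return [ks for ks in knowledge_sources
            if ks["relation"] == "prior-cause"
            and ks["of"] == best_hypothesis]

def abstracting(known_data):
    """Statement 3: consider KSs that abstract known data."""
    return [ks for ks in knowledge_sources
            if ks["relation"] == "abstracts" and ks["of"] in known_data]

print([ks["name"] for ks in prior_cause_of("meningitis")])  # ['ks1']
print([ks["name"] for ks in abstracting({"wbc-count"})])    # ['ks2']
```

Because the selection functions mention only relations ("prior-cause," "abstracts"), not diseases or spectra, the strategy can be stated once and reused across domains, which is exactly the leverage sought for explanation and teaching.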

To summarize, the structural knowledge we have been studying con-
sists of relations that hierarchically abstract data and hypotheses. These
relations constitute the vocabulary by which domain-independent meta-
rules invoke KSs. The key to our analysis is our insistence on domain-
independent statement of meta-rules--a motivation deriving from our in-
terest in explanation and teaching.

29.6.2 Explicitness of Strategical Knowledge

Another consideration for explanation is whether or not the strategy for
invoking KSs is explicit. To some extent, system designers are not generally
interested in representing high-level strategies that are always in effect and
never reasoned about by the program. Instead, they are satisfied if their
system can be programmed in the primitives of their representation lan-
guage to bring about the high-level effect they are seeking. For example,

top-down refinement is "compiled into" CENTAUR's hierarchy itself by
the control steps that specify on each level what to do next (e.g., "After
confirming obstructive airways disease, determine the subtype of obstruc-
tive airways disease."). By separating control steps from disease inferences,
Aikins improved the explanation facility, one of the goals of CENTAUR.
However, the rationale for these control steps is not represented--it is just
as implicit as it was in PUFF's contextual clauses. In contrast, NEOMYCIN's
"explore and refine" task clearly implements top-down refinement through
domain-independent meta-rules. However, these meta-rules are ordered
to give preference to siblings before descendants--an example of an im-
plicit strategy.
One common way of selecting KSs is on the basis of numerical mea-
sures of priority, utility, interestingness, etc. For example, CENTAUR, like
many medical programs, will first request the data that give the most weight
for the disease under consideration. Thus the weight given to a KS is
another form of indexing by which a strategy can be applied. If we wish
to explain these weights, we should ideally replace them by descriptors that
"generate" them, and then have the strategy give preference to KSs having
certain descriptors. NEOMYCIN's meta-rules for requesting data (de-
scribed above) are a step in this direction.
MOLGEN's "least-commitment" meta-strategy is a good example of
implicit encoding by priority assignment. The ordering of tasks specified
by least commitment is "Look first for differences, then use them to sketch
out an abstract plan, and finally refine that plan .... " This ordering of
tasks is implicit in the numerical priorities that Stefik has assigned to the
design operators in MOLGEN. Therefore, an explanation system for
MOLGEN could not explain the least-commitment strategy but could only
say that the program performed one task before another because the prior-
ity was higher for the former.

29.6.3 Absence of Support Knowledge

We have little to say about support knowledge in these systems because
none of them represents it. That is, the causal or mathematical models,
statistical studies, or world knowledge that justifies the KSs is not used
during reasoning. As discussed in Section 29.5, this limitation calls into
question the problem-solving flexibility or "creativeness" of these pro-
grams. In any case, the knowledge is not available for explanation.

29.6.4 Summary

The strategy/structure/support framework can be applied to any knowl-
edge-based system by asking certain questions: What are the KSs in the
system, i.e., what kinds of recognition or construction operations are per-
formed? How are the KSs labeled or organized, by data/constraint or by
hypothesis/operation? Is this indexing used by the interpreter or by explicit
strategical KSs, or is it just an aid for the knowledge engineer? What
theoretical considerations justify the KSs? Is this knowledge represented?
With this kind of analysis, it should be clear how the knowledge repre-
sented needs to be augmented or decomposed if an explanation facility is
to be built for the system. Quite possibly, as in MYCIN, the representational
notation will need to be modified as well.

29.7 Conclusions

The production rule formalism is often chosen by expert system designers
because it is thought to provide a perspicuous, modular representation.
But we have discovered that there are points of flexibility in the represen-
tation that can be easily exploited to embed structural and strategic knowl-
edge in task rules, context clauses, and screening clauses. Arguing from a
teacher's perspective, we showed that hierarchies of problem features and
diagnoses, in addition to a domain-independent statement of strategy, are
useful to justify rules and teach approaches for using them. Also, when a
rule is causal, satisfactory explanations generalize the rule in terms of an
underlying process model. This same knowledge should be made explicit
for purposes of explanation, ease of modification, and potential improve-
ment of problem-solving ability.
Characterizing knowledge in three categories, we concluded that MY-
CIN's rules were used as a programming language to embed strategic and
structural principles. However, while context and screening clauses are
devices that don't precisely capture the paths of expert reasoning, the basic
connection between data and hypothesis is a psychologically valid associ-
ation. As such, the "core rules" represent the expert's knowledge of causal
processes in proceduralized form. Their knowledge is not necessarily com-
piled into this form, but may be compiled with respect to causal models
that may be incomplete or never even learned. For this reason, support
knowledge needs to be represented in a form that is somewhat redundant
to the diagnostic associations, while structure and strategy can be directly
factored out and represented declaratively.
The lessons of this study apply to other knowledge-based programs,
including programs that do not use the production rule representation.
The first moral is that one cannot simply slap an interactive front end onto
a good AI program and expect to have an adequate teaching system. Sim-
ilarly, an explanation system may have to do more than just read back
reasoning steps and recognize questions: it may be useful to abstract the
reasoning steps, relating them to domain models and problem-solving
strategies.
Other knowledge bases could be studied as artifacts to evaluate the
expressiveness of their representation. Is the design of the inference struc-
ture explicit? Can it be reasoned about and used for explanation? Where
are the choice points in the representation and what principles for their
use have not been represented explicitly? For rule-based systems one
should ask: What is the purpose of each clause in the rule and why are
clauses ordered this way? Why is this link between premise and conclusion
justified? Under what circumstances does this association come to mind?
Finally, future knowledge engineering efforts in which the knowledge
of experts is codified could benefit from an epistemology that distinguishes
KSs from meta-level knowledge of three kinds--strategy, structure, and
support knowledge. Relative to that framework, then, it makes sense to ask
about the appropriateness of representing knowledge using rules, units,
or other notations. When the system fails to behave properly, changes to
either the epistemology or the rules should be entertained. In fact, this is
a cyclic process in which changes are made to the rules that subtly tear at
the framework, and after incorporating a series of changes, a new, better
epistemology and revised notation can be arrived at. (For example, a single
MYCIN rule might seem awkward, but a pattern such as 40 rules having
the same first 3 clauses suggests some underlying structure to the knowl-
edge.) Thus a methodology for converging on an adequate epistemology
comes in part from constant cycling and reexamining of the entire system
of rules.
The epistemology that evolved from attempts to reconfigure MYCIN's
rules is NEOMYCIN's etiological taxonomy, multiple disease process hier-
archies, data that trigger hypotheses, etc., plus the domain-independent
task hierarchy of meta-rules. In our use of terms like "problem feature,"
we have moved very far from MYCIN's too abstract concept of "clinical
parameter," which did not distinguish between data and hypotheses. Our
epistemology provides an improved basis for interpreting expert reason-
ing, a valuable foundation for knowledge engineering, as echoed by Swan-
son et al. (1977):

Three aspects of the expert's adaptation are especially important to the
design of decision support systems: the generative role of basic principles of
pathophysiology, the hierarchical structure of disease knowledge, and the
heuristics used in coping with information processing demands.

These categories of knowledge provide a framework for understanding an
expert. We ask, "What kind of knowledge is the expert describing?" This
framework enables us to focus our questions so that we can separate out
detailed descriptions of the expert's causal model from both the associa-
tions that link symptom to disorder and the strategies for using this knowl-
edge.

29.8 Postscript: How the Rule Formalism Helped

Despite some apparent shortcomings of MYCIN's rule formalism noted in
this chapter and throughout the book, we must remember that the pro-
gram has been influential because it works well. The uniformity of rep-
resentation has been an important asset. With knowledge being so easy to
encode, it was perhaps the simple parameterization of the problem that
made MYCIN successful. The program could be built and tested quickly
at a time when little was known about building expert systems. Finally, the
explicit codification of medical knowledge, now taken for granted in expert
systems, allows examination of, and improvement upon, the knowledge
structures.
PART TEN

Evaluating Performance
30
The Problem of Evaluation

Early in the development of MYCIN we felt the need to assess formally
the program's performance. By 1973 we had already run perhaps a
hundred cases of bacteremia through the system, revising the knowledge
base as needed whenever problems were discovered. At the weekly project
meetings Cohen and Axline were increasingly impressed by the validity of
the program's recommendations, and they encouraged the design of an
experiment to assess its performance on randomly selected cases of bac-
teremic patients. There was a uniform concern that it would be inadequate
to assess (or report) the work on the basis of anecdotal accolades alone--
an informal approach to evaluation for which many efforts in both AI and
medical computer science had been criticized.

30.1 Three Evaluations of MYCIN

Shortliffe accordingly designed and executed an experiment that was re-
ported as a chapter in his dissertation (Shortliffe, 1974). Five faculty and
fellows in the Stanford Division of Infectious Diseases were asked to review
and critique 15 cases for which MYCIN had offered therapy advice. Each
evaluator ran the first of the 15 cases through MYCIN himself (in order
to get a feeling for how the program operated) and was then given print-
outs showing the questions asked and the advice generated for each of the
other 14 cases. Questions were inserted at several places in the typescripts
so that we could assess a variety of features of the program:

its ability to decide whether a patient required treatment;
its ability to determine the significance of isolated organisms;
its ability to determine the identity of organisms judged significant;
its ability to select therapy to cover for the list of most likely organisms;
its overall consultative performance.

572 The Problem of Evaluation

The design inherently assumed that the opinions of recognized ex-
perts provided the "gold standard" against which the program's perfor-
mance should be assessed. For reasons outlined below, other criteria (such
as the actual organisms isolated or the patient's response to therapy) did
not seem appropriate. Despite the encouraging results of this experiment
(hereafter referred to as Study 1), several problems were discovered during
its execution:

The evaluators complained that they could not get an adequate "feel"
for the patients by merely reading a typescript of the questions MYCIN
asked (and they therefore wondered how the program could do so).

Because the evaluators knew they were assessing a computer program,
there was evidence that they were using different (and perhaps more
stringent) criteria for assessing its performance than they would use in
assessing the recommendations of a human consultant.

MYCIN's "approval rating" of just under 75% was encouraging but in-
tuitively seemed to be too low for a truly expert program; yet we had
no idea how high a rating was realistically achievable using the gold
standard of approval by experts.

The time required from evaluators was seen to be a major concern; the
faculty and fellows agreed to help with the study largely out of curiosity,
but they were all busy with other activities and some of them balked at
the time required to thoroughly consider the typescripts and treatment
plans for all 15 cases.

Questions were raised regarding the validity of a study in which the
evaluators were drawn from the same environment in which the pro-
gram was developed; because of regional differences in prescribing hab-
its and antimicrobial sensitivity patterns, some critics urged a study de-
sign in which MYCIN's performance in settings other than Stanford
could be assessed.

Many of these problems were addressed in the design of our second
study, also dealing with bacteremia, which was undertaken in the mid-
1970s and for which a published report appeared in 1979 (Yu et al., 1979a).
This time the evaluators were selected from centers around the country
(five from Stanford, five from other centers) and were paid a small hon-
orarium in an effort to encourage them to take the time required to fill
out the evaluation forms. Because the evaluators did not have an oppor-
tunity to run the MYCIN program themselves, we deemphasized the actual
appearance of a MYCIN typescript in this study (hereafter referred to as
Study 2). Instead, evaluators were provided with copies of each of the 15
patients' charts up to the time of the therapy decisions (with suitable pre-
cautions taken to preserve patient anonymity). They once again knew they
were evaluating a computer program, however. In addition, although the

forms were designed to allow evaluators to fill them out largely by using
checklists, the time required to complete them was still lengthy if the phy-
sician was careful in the work, and there were once again long delays in
getting the evaluation forms back for analysis. In fact, despite the "moti-
vating honorarium," some of the evaluators took more than 12 months to
return the booklets.
Although the MYCIN knowledge base for bacteremia had been con-
siderably refined since Study 1, we were discouraged to find that the results
of Study 2 once again showed about 75% overall approval of the program's
advice. It was clear that we needed to devise a study design that would
"blind" the evaluators to knowledge of which advice was generated by
MYCIN and that would simultaneously allow us to determine the overall
approval ratings that could be achieved by experts in the field. We began
to wonder if the 75% figure might not be an upper limit in light of the
controversy and stylistic differences among experts.
As a result, our meningitis study (hereafter referred to as Study 3)
used a greatly streamlined design to encourage rapid turnaround in eval-
uation forms while keeping evaluators unaware of what advice was pro-
posed by MYCIN (as opposed to other prescribers from Stanford). Study
3 is the subject of Chapter 31, and the reader will note that it reflects many
of the lessons from the first two studies cited above. With the improved
design we were able to demonstrate formally that MYCIN's advice was
comparable to that of infectious disease experts and that 75% is in fact
better than the degree of agreement that could generally be achieved by
Stanford faculty being assessed under the same criteria.
In the next section we summarize some guidelines derived from our
experience. We believe they are appropriate when designing experiments
for the evaluation of expert systems. Then, in the final section of this
chapter, we look at some previously unpublished analyses of the Study 3
data. These demonstrate additional lessons that can be drawn and on
which future evaluative experiments may build.

30.2 A Summary of Evaluation Considerations

The three MYCIN studies, plus the designs for ONCOCIN evaluations
that are nearing completion, have taught us many lessons about the vali-
dation of these kinds of programs. We summarize some of those points
here in an effort to provide guidelines of use to others doing this kind of
work.¹

¹Much of this discussion is based on Shortliffe's contribution to Chapter 8 of Building Expert
Systems, edited by F. Hayes-Roth, D. Waterman, and D. Lenat (Hayes-Roth, Waterman, and
Lenat, 1983).

30.2.1 Dependence on Task, System, Goals, and Stage of Development

Most computing systems are developed in response to some human need,
and it might therefore be logical to emphasize the system's response to that
need in assessing whether it is successful. Thus there are those who would
argue that the primary focus of a system evaluation should be on the task
for which it was designed and the quality of its corresponding perfor-
mance. Other aspects warranting formal evaluation are often ignored. It
must accordingly be emphasized that there are diverse components to the
evaluation process. We believe that validation is most appropriately seen
as occurring in stages as an expert system develops over time.

The MYCIN work, however, has forced us to focus our thinking on
the evaluation of systems that are ultimately designed to perform a real-
world task, typically to be used by persons who are not computer scientists.
Certainly one of our major goals has been the development of a useful
system that can have an impact on society by becoming a regularly used
tool in the community for which it is designed. Although we have shown
in earlier chapters that many basic science problems typically arise during
the development of such systems, in this section we will emphasize the
staged assessment of the developing tool (rather than techniques for mea-
suring its scientific impact as a stimulus to further research). We have
organized our discussion by looking at the "what?", "when?", and "how?"
of evaluating expert systems.

30.2.2 What to Evaluate?

As mentioned above, at any stage in the development of a computing
system several aspects of its performance could be evaluated. Some are
more appropriate than others at a particular stage. However, by the time
a system has reached completion it is likely that every aspect will have
warranted formal assessment.

Decisions/Advice/Performance

Since accurate, reliable advice is an essential component of an expert consultation
system, it is usually the area of greatest research interest and is
logically an area to emphasize in evaluation. However, the mechanisms for
deciding whether a system's advice is appropriate or adequate may be difficult
to define or defend, especially since expert systems tend to be built
precisely for those domains in which decisions are highly judgmental. It is
clear that no expert system will be accepted by its intended users if they
fail to be convinced that the decisions made and the advice given are pertinent
and reliable.
A Summary of Evaluation Considerations 575

Correct Reasoning

Not all designers of expert systems are concerned about whether their
program reaches decisions in a "correct" way, so long as the advice that it
offers is appropriate. As we have indicated, for example, MYCIN was not
intended to simulate human problem solving in any formal way. However,
there is an increasing realization that expert-level performance may require
heightened attention to the mechanisms by which human experts actually
solve the problems for which the expert systems are being built. It is with
regard to this issue that the interface between knowledge engineering and
psychology is the greatest, and, depending on the motivation of the system
designers and the eventual users of the expert program, some attention to
the mechanisms of reasoning that the program uses may be appropriate
during the evaluation process. The issue of deciding whether or not the
reasoning used by the program is "correct" will be discussed further below.

Discourse (I/O Content)

Knowledge engineers now routinely accept that parameters other than
correctness will play major roles in determining whether or not their systems
are accepted by the intended users (see Chapter 32). The nature of
the discourse between the expert system and the user is particularly important.
Here we mean such diverse issues as:

the choice of words used in the questions and responses generated by
the program;
the ability of the expert system to explain the basis for its decisions and
to customize those explanations appropriately for the level of expertise
of the user;
the ability of the system to assist the user when he or she is confused or
wants help; and
the ability of the expert system to give advice and to educate the user in
a congenial fashion so that the frequently cited psychological barriers to
computer use are avoided.

It is likely that issues such as these are as important to the ultimate success
of an expert system as is the quality of its advice. For this reason such issues
also warrant formal evaluation.

Hardware Environment (I/O Medium)

Although some users, particularly when pressed to do so, can become
comfortable with a conventional typewriter keyboard to interact with computers,
this is a new skill for other potential users and frequently not one
they are motivated to learn. For that reason we have seen the development
of light pen interfaces, touch screens, and specialized keypads, any of
which may be adequate to facilitate simple interactions between users and
systems. Details of the hardware interface often influence the design of the
system software as well. The intricacies of this interaction cannot be ignored
in system evaluation, nor can the mundane details of the user's
reaction to the terminal interface. Once again, it can be difficult to design
evaluations in which dissatisfaction with the terminal interface is isolated
as a variable, independent of discourse adequacy or decision-making performance.
As we point out below, one purpose of staged evaluations is to
eliminate some variables from consideration during the evolution of the
system.

Efficiency

Technical analyses of system behavior are generally warranted. Underutilized
CPU power or poorly designed methods for accessing disk space,
for example, may introduce resource inefficiencies that severely limit the
system's response time or cost effectiveness. Inefficiencies in small systems
are often tolerable to users, but will severely limit the potential for those
systems to grow and still remain acceptable.

Cost Effectiveness

Finally, and particularly if it is intended that an expert system become a
widely used product, some detailed evaluation of its cost effectiveness is
necessary. A system that requires an excessive time commitment by the
user, for example, may fail to be accepted even if it excels at all the other
tasks we have mentioned. Few AI systems have reached this stage in system
evolution, but there is a wealth of relevant experience in other computer
science areas. Expert systems must be prepared to embark on similar studies
once they reach an appropriate stage of development.

30.2.3 When to Evaluate?

The evaluation process is a continual one that should begin at the time of
system design, extend in an informal fashion through the early stages of
development, and become increasingly formal as a developing system
moves toward real-world implementation. It is useful to cite nine stages of
system development, which summarize the evolution of an expert system.2
They are itemized in Table 30-1 and discussed in some detail below.

2 These implementation steps are based on a discussion of expert systems in Shortliffe and
Davis (1975).

TABLE 30-1 Steps in the Implementation of an Expert System
1. Top-level design with definition of long-range goals
2. First version prototype, showing feasibility
3. System refinement in which informal test cases are run to generate feedback
from the expert and from users
4. Structured evaluation of performance
5. Structured evaluation of acceptability to users
6. Service functioning for extended period in prototype environment
7. Follow-up studies to demonstrate the system's large-scale usefulness
8. Program changes to allow wide distribution of the system
9. General release and distribution with firm plans for maintenance and updating

As mentioned above, it is important for system designers to be explicit
about their long-range goals and motives for building an expert system.
Thus the first stage of a system's development (Step 1), the initial design,
should be accompanied by explicit statements of what the measures of the
program's success will be and how failure or success will be evaluated. It
is not uncommon for system designers to ignore this issue at the outset. If
the evaluation stages and long-range goals are explicitly stated, however,
they will necessarily influence the early design of the expert system. For
example, if explanation capabilities are deemed to be crucial for the user
community in question, this will have important implications for the system's
underlying knowledge representation.
The next stage (Step 2) is a demonstration that the design is feasible.
At this stage there is no attempt to demonstrate expert-level performance.
The goal is, rather, to show that there is a representation scheme appro-
priate for the task domain and that knowledge-engineering techniques can
lead to a prototype system that shows some reasonable (if not expert) per-
formance on some subtask of that domain. An evaluation of this stage can
be very informal and may simply consist of showing that a few special cases
can be handled by the prototype system. Successful handling of the test
cases suggests that with increased knowledge and refinement of the rea-
soning structures a high-performance expert system is possible.
The third stage (Step 3) is as far as many systems ever get. This is the
period in which informal test cases are run through the developing system,
the system's performance is observed, and feedback is sought from expert
collaborators and potential end users. This feedback serves to define the
major problem areas in the system's development and guides the next
iteration in system development. This iterative process may go on for
months or years, depending on the complexity of the knowledge domain,
the flexibility of the knowledge representation, and the availability of techniques
adequate to cope with the domain's specific control or strategic
processes. One question is constantly being asked: how did this system do
on this case? Detailed analyses of strengths and weaknesses lead back to
further research; in this sense evaluation is an intrinsic part of the system
development process.

Once the system is performing well on most cases with which it is
presented, it is appropriate to turn to a more structured evaluation of its
decision-making performance. This evaluation can be performed without
assessing the program's actual utility in a potential user's environment.
Thus Step 4 is undertaken if the test cases being used in Step 3 are found
to be handled with skill and competence, and there accordingly develops
a belief that a formal randomized study will show that the system is capable
of handling almost any problem from its domain of expertise. Only a few
expert systems have reached this stage of evaluation. The principal examples
are studies of the PROSPECTOR program developed at SRI International
(Gaschnig, 1979) and the MYCIN studies described earlier in
this chapter. It should be emphasized that a formal evaluation with randomized
case selection may show that the expert system is in fact not
performing at an expert level. In this case, new research problems or
knowledge requirements are defined, and the system development returns
to Step 3 for additional refinement. A successful evaluation at Step 4 is
desirable before a program is introduced into a user environment.
The fifth stage (Step 5), then, is system evaluation in the setting where
the intended users have access to it. The principal question at this stage is
whether or not the program is acceptable to the users for whom it was
intended. Essentially no expert systems have been formally assessed at this
stage. The emphasis in Step 5 is on the discourse abilities of the program,
plus the hardware environment that is provided. If expert-level performance
has been demonstrated at Step 4, failure of the program to be
accepted at Step 5 can be assumed to be due to one of these other human
factors.
If a system is formally shown to make good decisions and to be acceptable
to users, it is appropriate to introduce it for extended periods in
some prototype environment (Step 6). This stage, called field testing, is intended
largely to gain experience with a large number of test cases and
with all the intricacies of on-site performance. Careful attention during
this stage must be directed toward problems of scale, i.e., what new difficulties
will arise when the system is made available to large numbers of
users outside of the direct control of the system developers? Careful observation
of the program's performance and the changing attitudes of
those who interact with it are important at this stage.
After field testing, it is appropriate to begin follow-up studies to demonstrate
a system's large-scale usefulness (Step 7). These formal evaluations
often require measuring pertinent parameters before and after introducing
the system into a large user community (different from the original
prototype environment). Pertinent issues are the system's efficiency, its cost
effectiveness, its acceptability to users who were not involved in its early
experimental development, and its impact on the execution of the task
with which it was designed to assist. During Step 7 new problems may be
discovered that require attention before the system can be distributed (Step
8). These may involve programming changes or modifications required to
allow the system to run on a smaller or exportable machine.
Finally, the last stage in system development is general release as a
marketable product or in-house tool (Step 9). Inherent at this stage are
firm plans for maintaining the knowledge base and keeping it current.
One might argue that the ultimate evaluation takes place at this stage when
it is determined whether or not the system can succeed in broad use. However,
a system's credibility is likely to be greater if good studies have been
done in the first eight stages so that there are solid data supporting any
claims about the quality of the program's performance.

30.2.4 How to Evaluate?

It would be folly to claim that we can begin to suggest detailed study
designs for all expert systems in a single limited discussion. There is a
wealth of information in the statistical literature, for example, regarding
the design of randomized controlled trials, and much of that experience
is relevant to the design of expert system evaluations. Our intention here,
therefore, is to concentrate on those issues that complicate the evaluation
of expert systems in particular and to suggest pitfalls that must be considered
during study design.

We also wish to distinguish between two senses of the term evaluation.
In computer science, system evaluation often is meant to imply optimization
in the technical sense (timing studies, for example). Our emphasis, on
the other hand, is on a system's performance at the specific consultation
task for which it has been designed. Unlike many conventional programs,
expert systems do not deal with deterministic problems for which there is
clearly a right or wrong answer. As a result, it is often not possible to
demonstrate in a straightforward fashion that a system is "correct" and
then to concentrate one's effort on demonstrating that it reaches the solution
to a problem in some optimal way.

Need for an Objective Standard

Evaluations require some kind of "gold standard": a generally accepted
correct answer with which the results of a new methodology can be compared.
In the assessment of new diagnostic techniques in medicine, for
example, the gold standard is often the result of an invasive procedure
that physicians hope to be able to avoid, even though it may be 100%
accurate (e.g., operative or autopsy results, or the findings on an angiogram).
The sensitivity and specificity of a new diagnostic liver test based
on a blood sample, for example, can best be assessed by comparing test
results with the results of liver biopsies from several patients who also had
the blood test; if the blood test is thereby shown to be a good predictor of
the results of the liver biopsy, it may be possible to avoid the more invasive
procedure in future patients. The parallel in expert system evaluation is
obvious; if we can demonstrate that the expert system's advice is comparable
to the gold standard for the domain in question, it may no longer
be necessary to turn to the gold standard itself if it is less convenient, less
available, or more expensive.

Can the Task Domain Provide a Standard?

In general there are two views of how to define a gold standard for an
expert system's domain: (1) what eventually turns out to be the "correct"
answer for a problem, and (2) what a human expert says is the correct
answer when presented with the same information as is available to the
program. It is unfortunate that for many kinds of problems with which
expert systems are designed to assist, the first of these questions cannot be
answered or is irrelevant. Consider, for example, the performance of MYCIN.
One might suggest that the gold standard in its domain should be
the identity of the bacteria that are ultimately isolated from the patient, or
the patient's outcome if he or she is treated in accordance with (or in
opposition to) the program's recommendation. Suppose, then, that MYCIN
suggests therapy that covers for four possibly pathogenic bacteria but that
the organism that is eventually isolated is instead a fifth rare bacterium
that was totally unexpected, even by the experts involved in the case. In
what sense should MYCIN be considered "wrong" in such an instance?
Similarly, the outcome for patients treated for serious infections is not
100% correlated with the correctness of therapy; patients treated in accordance
with the best available medical practice may still die from fulminant
infection, and occasionally patients will improve despite inappropriate
antibiotic treatment. Accordingly, we said that MYCIN performed
at an expert level and was "correct" if it agreed with the experts, even if
both MYCIN and the experts turned out to be wrong. The CADUCEUS
program has been evaluated by comparing its diagnoses against those
published on selected hard cases from the medical literature (Miller et al.,
1982).

Are Human Experts Evaluated?

When domain experts are used as the objective standard for performance
evaluation, it is useful to ask whether the decisions of the experts themselves
are subjected to rigorous evaluations. If so, such assessments of human
expertise may provide useful benchmarks against which to measure
the expertise of a developing consultation system. An advantage of this
approach is that the technique for evaluating experts is usually a well-accepted
basis for assessing expertise and thus lends credibility to an evaluation
of the computer-based approach.

Informal Standards

Typically, however, human expertise is accepted and acknowledged using
less formal criteria, such as level of training, recommendations of previous
clients, years of experience in a field, number of publications, and the like.
[Recently, Johnson et al. (1981) and Lesgold (1983) have studied measures
of human expertise that are more objective.] Testimonials regarding the
performance of a computer program have also frequently been used as a
catalyst to the system's dissemination, but it is precisely this kind of anecdotal
selling of a system against which we are arguing here. Many fields
(e.g., medicine) will not accept technological innovation without rigorous
demonstration of the breadth and depth of the new product's capabilities.
Both we and the PROSPECTOR researchers encountered this cautious
attitude in potential users and designed our evaluations largely in
response to a perceived need for rigorous demonstrations of performance.

Biasing and Blinding

In designing any evaluative study, considerations of sources of bias are of
course important. We learned this lesson when evaluating MYCIN, and,
as mentioned earlier, this explains many of the differences between the
bacteremia evaluation (Study 2) and the meningitis study (Study 3). The
comments and criticisms from Study 2 evaluators reflected biases regarding
the proper role for computers in medical settings (e.g., "I don't think the
computer has an adequate sense of how sick this patient is. You'd have to
see a patient like this in order to judge."). As a result, Study 3 mixed
MYCIN's recommendations with a set of recommendations from nine
other individuals asked to assess the case (ranging from infectious disease
faculty members to a medical student). When national experts later gave
opinions on the appropriateness of therapeutic recommendations, they did
not know which proposed therapy (if any) was MYCIN's and which came
from the faculty members. This "blinded" study design removed an important
source of potential bias, and also provided a sense of where MYCIN's
performance lay along a range of expertise from faculty to student.

Controlling Variables

As we pointed out in the discussion of when to evaluate an expert system,
one advantage of a sequential set of studies is that each can assume the
results of the experiments that preceded it. Thus, for example, if a system
has been shown to reach optimal decisions in its domain of expertise, one
can assume that the system's failure to be accepted by its intended users
in an experimental setting is a reflection of inadequacies in an aspect of
the system other than its decision-making performance. One key variable
that could account for system failure can be "removed" in this way.

Realistic Standards of Performance

Before assessing the capabilities of an expert system, it is necessary to
define the minimal standards that are acceptable for the system to be called
a success. It is ironic that in many domains it is difficult to decide what
level of performance qualifies as expert. Thus it is important to measure
the performance of human experts in a field if they are assessed by the
same standards to be used in the evaluation of the expert system. As we
noted earlier, this point was demonstrated in the MYCIN evaluations. In
Studies 1 and 2, MYCIN's performance was approved by a majority of
experts in approximately 75% of cases, a figure that seemed disappointingly
low to us. We felt that the system should be approved by a majority
in at least 90% of cases before it was made available for actual clinical use.
The blinded study design for the subsequent meningitis evaluation (Study
3), however, showed that even infectious disease faculty members received
at best a 70-80% rating from other experts in the field. Thus the 90%
figure originally sought may have been unrealistic in that it inadequately
reflected the extent of disagreement that can exist even among experts in
a field such as clinical medicine.

Sensitivity Analysis

A special kind of evaluative procedure that is pertinent for work with
expert systems is the analysis of a program's sensitivity to slight changes in
knowledge representation, inference weighting, etc. Similarly, it may be
pertinent to ask which interactive capabilities were necessary for the acceptance
of an expert consultant. One approach to assessing these issues
is to compare two versions of the system that vary the feature under consideration.
An example of studies of this kind is the set of experiments that we
did to assess the certainty factor model. As is described in Chapter 10
(Section 10.3), Clancey and Cooper showed that the decisions of MYCIN
changed minimally from those reported in the meningitis evaluation
(Chapter 31) over a wide range of possible CF intervals for the inferences
in the system. This sensitivity analysis helped us decide that the details of
the CFs associated with rules mattered less than the semantic and structural
content of the rules themselves.
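The flavor of such a sensitivity analysis can be sketched in a few lines. The combining function below is MYCIN's rule for merging two positive certainty factors; everything else (the evidence table, the organism names, the perturbation range) is invented for illustration and is not drawn from the actual Clancey and Cooper experiments.

```python
import random

def combine_cf(cf1, cf2):
    """MYCIN's combining function for two positive certainty factors."""
    return cf1 + cf2 * (1 - cf1)

def conclusion_cf(rule_cfs):
    """Combined belief in a hypothesis supported by several rules."""
    total = 0.0
    for cf in rule_cfs:
        total = combine_cf(total, cf)
    return total

def top_hypothesis(evidence):
    """Return the hypothesis with the highest combined CF."""
    return max(evidence, key=lambda h: conclusion_cf(evidence[h]))

# Illustrative (invented) evidence: rule CFs supporting each organism.
evidence = {
    "e.coli": [0.6, 0.3],
    "pseudomonas": [0.4, 0.4],
}

baseline = top_hypothesis(evidence)

# Perturb every CF by up to +/-0.1 and count how often the top-ranked
# hypothesis changes: a crude measure of the system's CF sensitivity.
random.seed(0)
changes = 0
trials = 1000
for _ in range(trials):
    perturbed = {
        h: [min(1.0, max(0.0, cf + random.uniform(-0.1, 0.1))) for cf in cfs]
        for h, cfs in evidence.items()
    }
    if top_hypothesis(perturbed) != baseline:
        changes += 1

print(f"baseline winner: {baseline}; rank changed in {changes}/{trials} trials")
```

A low change count over a wide perturbation range would suggest, as the Clancey and Cooper study found, that conclusions depend more on rule structure than on precise CF values.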

Interaction of Knowledge: Preserving Good Performance When Correcting the Bad

An important problem, discussed in Chapter 7, can be encountered when
an evaluation has revealed system deficiencies and new knowledge has been
added to the system in an effort to correct these. In complex expert systems,
the interactions of new knowledge with old can be unanticipated and
Further Comments on the Study 3 Data 583

lead to detrimental effects on problems that were once handled very well
by the system. An awareness of this potential problem is crucial as system
builders iterate from Step 3 to Step 4 and back to Step 3 (see Table 30-1).
One method for protecting against the problem is to keep a library of old
cases available on-line for batch testing of the system's decisions. Then, as
changes are made to the system in response to the Step 4 evaluations of
the program's performance, the old cases can be run through the revised
version to verify that no unanticipated knowledge interactions have been
introduced (i.e., to show that the program's performance on the old cases
does not deteriorate).
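A case library of this kind amounts to a simple batch regression harness. The sketch below is hypothetical: `consult` stands in for the expert system's actual entry point, and the case format and stored "approved" answers are invented for illustration.

```python
# Hypothetical sketch of batch regression testing against a library of
# old cases. The stored recommendation for each case is the output that
# was approved in earlier evaluations.

def consult(case):
    # Placeholder inference: recommend a therapy keyed on the culture site.
    return {"blood": "gentamicin", "csf": "ampicillin"}.get(case["site"], "none")

CASE_LIBRARY = [
    {"site": "blood", "approved": "gentamicin"},
    {"site": "csf", "approved": "ampicillin"},
]

def regression_check(library):
    """Rerun every stored case; report those whose output has drifted."""
    failures = []
    for case in library:
        result = consult(case)
        if result != case["approved"]:
            failures.append((case, result))
    return failures

if __name__ == "__main__":
    bad = regression_check(CASE_LIBRARY)
    print(f"{len(bad)} of {len(CASE_LIBRARY)} cases deteriorated")
```

Run after each knowledge base revision, a nonempty failure list flags exactly the unanticipated knowledge interactions the text warns about.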

Realistic Time Demands on Evaluators

A mundane issue that must nonetheless be considered, since it can lead to
failure of a study design or, at the very least, to unacceptable delays in
completing the program's assessment, is the time required for the evaluators
to judge the system's performance. If expert judgments are used as
the gold standard for adequate program performance, the opinions of the
experts must be gathered for the cases used in the evaluation study. A
design that picks the most pertinent two or three issues to be assessed and
concentrates on obtaining the expert opinions in as easy a manner as possible
will therefore have a much better chance of success. We have previously
mentioned the one-year delay in obtaining the evaluation booklets
back from the experts who had agreed to participate in the Study 2 bacteremia
evaluation. By focusing on fewer variables and designing a checklist
that allowed the experts to assess program performance much more
rapidly, the meningitis evaluation was completed in less than half that time
(Chapter 31).

30.3 Further Comments on the Study 3 Data

When the Study 3 data had been analyzed and published (Chapter 31),
we realized there were still several lingering questions. The journal editors
had required us to shorten the data analysis and discussion in the final
report. We also had asked ourselves several questions regarding the methodology
and felt that these warranted further study.

Accordingly, in 1979 Reed Letsinger (then a graduate student in our
group) undertook an additional analysis of the Study 3 data. What follows
is largely drawn from an internal memo that he prepared to report his
findings. The reader should be familiar with Chapter 31 before studying
the sections below.

30.3.1 Consistency of the Evaluators

The eight national evaluators in Study 3 could have demonstrated internal
inconsistency in two ways. Since each one was asked first to indicate his
own decision, he could be expected to judge as acceptable any of the prescribers'
decisions that were identical to his own. The first type of inconsistency
would occur if this expectation were violated. Among the 800
judgments in the Study 3 data (8 evaluators × 10 prescribers × 10 patients),
15 instances of this type of inconsistency occurred. Second, since
several prescribers would sometimes make the same decision regarding a
patient, another form of inconsistency would occur if an evaluator were
to mark identical treatments for the same patient differently for different
prescribers. Since the evaluators had no basis for distinguishing among the
subjects (prescribers), such discrepancies were inherently inconsistent.
Twenty-two such instances occurred in the Study 3 data set.
These numbers indicate that 37 out of the 800 data points (4.6%) could
be shown to be in need of correction on the basis of these two tests. Such
a figure tells us something about the reliability of the data, clearly pertinent
in assessing the study results. We have wondered about plausible
explanations for these kinds of inconsistencies. One is that the evaluators
were shown both the drugs recommended by the prescribers and the recommended
doses. They were asked to base their judgment of treatment
acceptability on drug selection alone, but we did ask separately for their
opinion on dosage to help us assess the adequacy of MYCIN's dosing algorithms
(see Chapter 19). It appeared in retrospect, however, that the
evaluators sometimes ignored the instructions and discriminated between
two therapy prescriptions that differed only in the doses of the recommended
drugs. These judgments are thus only inconsistent in the sense
that they reflect judgments that the evaluators were not supposed to be
making. The problem reflects the inherent tension between our wanting
to get as much information as possible from evaluators and the risks of
introducing new variables or data that may distract evaluators from the
primary focus of the study. Another methodologic point here is that such
design weaknesses may be uncovered by making some routine tests for
consistency.
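Such routine consistency tests are mechanical enough to automate. The sketch below is a hypothetical rendering of the two tests; the data layout, field names, and toy values are invented, not drawn from the actual Study 3 materials.

```python
# Judgments are indexed by (evaluator, prescriber, patient) and map to
# an accept/reject verdict; own_rx holds each evaluator's own therapy
# choice, and prescriptions holds each prescriber's therapy per patient.

def self_agreement_violations(judgments, own_rx, prescriptions):
    """Type 1: evaluator rejects a prescription identical to his own."""
    return [
        (e, p, pt)
        for (e, p, pt), accepted in judgments.items()
        if prescriptions[(p, pt)] == own_rx[(e, pt)] and not accepted
    ]

def identical_rx_violations(judgments, prescriptions):
    """Type 2: identical treatments for the same patient judged
    differently for different prescribers by the same evaluator."""
    bad = []
    for (e, p1, pt), a1 in judgments.items():
        for (e2, p2, pt2), a2 in judgments.items():
            if e == e2 and pt == pt2 and p1 < p2 and a1 != a2 \
               and prescriptions[(p1, pt)] == prescriptions[(p2, pt)]:
                bad.append((e, p1, p2, pt))
    return bad

# Toy data: one evaluator, two prescribers, one patient.
prescriptions = {("A", 1): "penicillin", ("B", 1): "penicillin"}
own_rx = {("eval1", 1): "penicillin"}
judgments = {("eval1", "A", 1): True, ("eval1", "B", 1): False}

print(self_agreement_violations(judgments, own_rx, prescriptions))
print(identical_rx_violations(judgments, prescriptions))
```

In the toy data the evaluator rejects prescriber B's therapy even though it matches both his own choice and prescriber A's, so each test flags one violation.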

30.3.2 Agreement Among Evaluators

The tendency of the experts to agree with one another has a direct impact
on the power of the study to discriminate good performance from bad.
Consider two extreme cases. At one end is the case where on the average
the evaluators agree with each other just as much as they disagree. This
means that on each case the prescribers would tend to get scores around
the midpoint; in the case of the MYCIN study, around 4 out of 8. The
cumulative scores would then cluster tightly around the midpoint of the
possible range, e.g., around 40 out of 80. The differences between the
quality of performance of the various subjects would be "washed out," the
scores would all be close to one another, and consequently, it would be
very unlikely that any of the differences between scores would be significant.
At the other extreme, if the evaluators always agreed with each other,
the only "noise" in the data would be contributed by the choice of the
sample cases. Intermediate amounts of disagreement would correspondingly
have intermediate effects on the variability of the scores, and hence
on the power of the test to distinguish the performance capabilities of the
subjects.
A rough preliminary indication of the extent of this agreement can be
derived from the MYCIN data. A judgment situation consists of a particular
prescriber paired with a particular case. Thus there are 100 judgment
situations in the present study, and each receives a score between 0 and 8,
depending on how many of the evaluators found the performance of the
subject acceptable on the case. The range between 0 and 8 is divided into
three equal subranges, 0 to 2, 3 to 5, and 6 to 8. A judgment situation
receiving a score in the first of these ranges may be said to be generally
unacceptable, while those receiving scores in the third range are generally
acceptable. The situations scoring in the middle range, however, cannot be
decided by a two-thirds majority rule, and so may be considered to be
undecided due to the evaluators' inability to agree. It turns out that 53 out
of the 100 judgment situations were undecided in this sense in the MYCIN
study.
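The two-thirds majority rule described above amounts to a simple three-way split of the 0-8 scores; the sketch below illustrates it on invented scores (the actual Study 3 score distribution is not reproduced here).

```python
# Each score is the number of the 8 evaluators who found the
# prescription acceptable in one judgment situation.

def categorize(score):
    """Two-thirds majority rule over the 0-8 range."""
    if score <= 2:
        return "generally unacceptable"
    if score >= 6:
        return "generally acceptable"
    return "undecided"

scores = [1, 4, 7, 5, 3, 8, 2, 6, 4, 5]  # invented example scores
undecided = sum(categorize(s) == "undecided" for s in scores)
print(f"{undecided} of {len(scores)} judgment situations undecided")
```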
For a more accurate indication of the level of this disagreement, the
evaluators can be paired in all possible combinations, and the percentage
of judgment situations in which they agree can be calculated. The mean
of this percentage across all pairs of evaluators reflects howoften we should
expect two experts to agree on the question of whether or not the perfor-
mance of a prescriber is acceptable (when the experts, the prescriber, and
the case are chosen from populations for which the set of evaluators, the
set of subjects, and the set of cases used in the study are representative
samples). In the MYCIN study, this mean was 0.591. Thus, if the evalua-
tors, prescribers, and cases used in this study are representative, we would
in general expect that if we choose two infectious disease experts and a
judgment situation at random on additional cases, the two experts will
disagree on the question of whether or not the recommended therapy is
acceptable 4 out of every 10 times!
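The mean pairwise agreement can be computed directly from a table of acceptability judgments. The sketch below uses hypothetical data (the actual study had 8 evaluators and 100 judgment situations); the names ratings and mean_pairwise_agreement are ours:

```python
from itertools import combinations

# ratings[e][s] = 1 if evaluator e found the performance in judgment
# situation s acceptable, else 0 (hypothetical data).
ratings = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

def mean_pairwise_agreement(ratings):
    """Mean, over all evaluator pairs, of the fraction of judgment
    situations on which the two evaluators give the same judgment."""
    n_situations = len(ratings[0])
    pair_agreements = []
    for a, b in combinations(ratings, 2):
        agree = sum(x == y for x, y in zip(a, b)) / n_situations
        pair_agreements.append(agree)
    return sum(pair_agreements) / len(pair_agreements)

print(round(mean_pairwise_agreement(ratings), 3))
```

A mean of 0.591 on such a table corresponds to the figure reported for the MYCIN study.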
Before such a number can be interpreted, more must be known about
the pattern of agreement. One question is how the disagreement was dis-
tributed across the subjects and across the cases. It turns out that the var-
iation across subjects was remarkably low for the MYCIN data, with a
standard deviation of less than 6 percentage points. The standard devia-
tion across cases was slightly higher--just under 10 percentage points. Very
little of the high level of disagreement among the graders can be attributed
to the idiosyncrasies of a few subjects or of a few cases. If it had turned
586 The Problem of Evaluation

out that a large amount of the disagreement focused on a few cases or a
few subjects, they could have been disregarded, and the power of the study
design increased.
A second question that can be raised is to what extent the disagree-
ments result from differing tolerance levels among the different evaluators
for divergent recommendations. A quick and crude measure of this tol-
erance level is simply the percentage of favorable responses the evaluators
gave. The similarity between the tolerance levels of two graders can be
measured by the difference between those percentages. It is then possible
to rank all the pairs of evaluators in terms of the degree of similarity of
their tolerance levels, just as it is possible to rank pairs of evaluators by
their agreements. The extent to which the tendency of the evaluators to
agree or disagree with one another can be explained by the variation in
their tolerance levels can be measured by the correlation between these
two rankings. With the MYCIN study, the Spearman rank correlation coef-
ficient turns out to be 0.0353 with no correction for ties. This is not sig-
nificantly greater than 0. If there had been a significant correlation, the
scores given by the evaluators could have been weighted in order to nor-
malize the effects due to different tolerance levels. The actual disagree-
ment among the evaluators would then have been reduced.
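The Spearman coefficient used here, computed without tie correction, follows the standard formula rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference between the two ranks of each pair. A small illustration with hypothetical pair data (the variable names are ours):

```python
def spearman_no_ties(x, y):
    """Spearman rank correlation without tie correction:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data: agreement of each evaluator pair, and the
# similarity of the pair's tolerance levels.
agreement = [0.62, 0.55, 0.58, 0.60, 0.57]
similarity = [0.03, 0.10, 0.08, 0.02, 0.11]
print(round(spearman_no_ties(agreement, similarity), 3))
```

A coefficient near 0, as in the study (0.0353), indicates that tolerance levels explain essentially none of the variation in agreement.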
A third possibility is that different groups of experts represent differ-
ent schools of thought on solving the type of problems represented in our
sample. If so, there should be clusters of evaluators, all of whose members
agree with each other more than usual, while members of different clusters
tend to disagree more than usual. There was some slight clustering of this
sort in the MYCIN data. Evaluators 1, 3, and 4 all agreed with each other
more often than the mean of 0.591, as did 2 and 6, and matching any
member of the first group with any member of the second gives an agree-
ment of less than the mean. However, evaluator 8 agreed with all five of
these evaluators more than 0.591. These clusterings are probably real, but
they cannot account for very much of the tendency of the evaluators to
disagree. If significant clustering had been uncovered, the data could have
been reinterpreted to treat the different "schools" of experts as additional
variables in the analysis. Within each of these "schools," the agreement
would then have been considerably increased.
In retrospect we now realize that the design of the MYCIN study would
have permitted several different kinds of patterns to be uncovered, any
one of which could have been used as a basis for increasing the agreement
among the evaluators, and hence the power of the test. Unfortunately,
none of these patterns actually appeared in the MYCIN data.

30.3.3 Collapsing the Data

The previous discussion of the tendency of the experts to agree with one
another is subject to at least one objection. Suppose that, for a particular
case, four of the ten prescribers made the same recommendation, and
Further Comments on the Study 3 Data 587

expert e1 agreed with the recommendation while expert e2 did not. Then
e1 and e2 would be counted as disagreeing four times, when in fact they
are only disagreeing over one question. If a large number of the cases lead
to only a few different responses, then it might be worth lumping together
the prescribers that made the same therapy recommendation. Then the
experts will be interpreted as judging the responses the subjects made,
rather than the subjects themselves. As is noted in the next section, this
kind of collapsing of the data is useful for other purposes as well.
Deciding whether two treatments are identical may be nontrivial.
Sometimes the responses are literally identical, but in other cases the re-
sponses will differ slightly, although not in ways that would lead a physician
with a good understanding of the problem to accept one without also
accepting the other. One plausible criterion is to lump together two therapy
recommendations for a case if no evaluator accepts one without accepting
the other. A second test is available when one of the evaluators gives a
recommendation that is identical to one of the prescribers' recommenda-
tions. Recommendations that that evaluator judged to be equivalent to his
own can then be grouped with the evaluator's recommendation, so long as
doing so does not conflict with the first criterion. In using either of these
tests, the data should first be made consistent in the manner discussed in
Section 30.3.1.
Using these tests, the ten subjects in the ten cases of the MYCIN study
reduced to an average of 4.2 different therapy recommendations for each
case, with a standard deviation of 1.55 and a range from 2 to 6. This seems
to be a large enough reduction to warrant looking at the data in this col-
lapsed form.
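The first collapsing criterion amounts to grouping recommendations whose acceptance patterns across the evaluators coincide: no evaluator accepts one without also accepting the other. A sketch with hypothetical judgments (drug names and the acceptance table are illustrative only):

```python
from collections import defaultdict

# acceptance[rec] = tuple of 0/1 judgments, one per evaluator
# (hypothetical data).
acceptance = {
    "penicillin":      (1, 1, 0, 1),
    "ampicillin":      (1, 1, 0, 1),   # same pattern -> same group
    "chloramphenicol": (0, 1, 0, 0),
}

# Group recommendations by their acceptance pattern.
groups = defaultdict(list)
for rec, pattern in acceptance.items():
    groups[pattern].append(rec)

collapsed = sorted(sorted(g) for g in groups.values())
print(collapsed)
```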

30.3.4 Judges as Subjects

With the collapsing of prescribers into therapies, it may be possible to
identify an evaluator's recommendation with one or more of the prescri-
bers' recommendations. By then eliminating that evaluator from the ranks
of judges, his recommendation can be considered judged by the other
evaluators. In this way the evaluators may be used as judges of each other,
thereby allowing comparisons with the rankings of the original prescribers.
This does not always work, since sometimes an evaluator's recommendation
cannot be identified with any of the prescribers. In Study 3, 9 out of 80
evaluator-generated therapies could not be judged as identical to any of
the prescribers' recommendations.
Measuring the evaluators' performance against each other in this man-
ner provides another indication of the extent of disagreement among
them. It also produces more scores that can be (roughly) compared to the
percentage scores of the prescribers. In Study 3, 8 more scores can be
added to the 10 assigned to the prescribers, giving a field of 18 scores. The
analysis of variance or chi-square was run on this extended population.
The new analysis showed that the mean score for the evaluators was
0.699, which is both higher than the mean agreement (0.591) and higher
than the mean of the prescribers' scores (0.585). This latter fact is to be
expected, since the subjects included people who were chosen for the study
because their level of expertise was assumed to be lower than that of the
evaluators. Nevertheless, half of the evaluators scored above the highest-
scoring prescriber (while the other half spread out evenly over the range
between the top-ranking subject and the eighth-ranking subject). The fact
that agreement between the evaluators looks higher on this measure than
it does on other measures indicates that much of the disagreement was
over therapies that none of the evaluators themselves recommended.
It is interesting to ask why the evaluators ranked higher in this analysis
than the Stanford faculty members among the prescribers, many of whom
would have qualified as experts by the criteria we used to select the national
panel. A plausible explanation is the method by which the evaluators were
asked to indicate their own preferred treatment for each of the ten cases.
As is described in Chapter 31, for each case the expert was asked to indicate
a choice of treatment on the first page of the evaluation form and then to
turn the page and rank the ten treatments that were recommended by the
prescribers. There was no way to force the evaluators to make a commit-
ment about therapy before turning the page, however. It is therefore quite
possible that the list of prescribers' recommendations served as "memory
joggers" or "filters" and accordingly influenced the evaluators' decisions
regarding optimal therapy for some of the cases. Since none of the pre-
scribers was aware of the decisions made by the other nine subjects, the
Stanford faculty members did not benefit from this possible advantage.
We suspect this may partly explain the apparent differences in ratings
among the Stanford and non-Stanford experts.

30.3.5 Summary

The discussion in this section demonstrates many of the detailed sub-
analyses that may be performed on a rich data set such as that provided
by Study 3. Information can be gathered on interscorer reliability of the
evaluation instrument, and statistical techniques are available for detecting
correlations and thereby increasing the reliability (and hence the power)
of the test.
31
An Evaluation of MYCIN's Advice

Victor L. Yu, Lawrence M. Fagan,
Sharon Wraith Bennett, William J. Clancey,
A. Carlisle Scott, John F. Hannigan, Robert L. Blum,
Bruce G. Buchanan, and Stanley N. Cohen

A number of computer programs have been developed to assist physicians
with diagnostic or treatment decisions, and many of them are potentially
very useful tools. However, few systems have undergone evaluation by
independent experts. We present here a comparison of the performance
of MYCIN with the performance of clinicians. The task evaluated was the
selection of antimicrobials for cases of acute infectious meningitis before
the causative agent was identified.
MYCIN was originally developed in the domain of bacteremias and
then expanded to include meningitis. Its task is a complicated one; it must
decide whether and how to treat a patient, often in the absence of micro-
biological evidence. It must allow for the possibility that any important
piece of information might be unknown or uncertain. In deciding which
organisms should be covered by therapy, it must take into account specific
clinical situations (e.g., trauma, neurosurgery), host factors (e.g., immu-
nosuppression, age), and the possible presence of unusual pathogens (e.g.,
F. tularensis or Candida non-albicans). In selecting optimal antimicrobial ther-
apy to cover all of the most likely organisms, the system must consider
antimicrobial factors (e.g., efficacy, organism susceptibility) and relative
contraindications (e.g., patient allergies, poor response to prior therapy).
When knowledge about a new area of infectious disease is incorpo-
rated into MYCIN's knowledge base, the system's performance is evaluated

This chapter is an edited version of an article originally appearing in Journal of the American
Medical Association 242:1279-1282 (1979). Copyright 1979 by the American Medical As-
sociation. All rights reserved. Used with permission.

590 An Evaluation of MYCIN's Advice

to show that its therapeutic regimens are as reliable as those that an infec-
tious disease specialist would recommend. An evaluation of the system's
ability to diagnose and treat patients with bacteremia yielded encouraging
results (Yu et al., 1979a). The results of that study, however, were difficult
to interpret because of the potential bias in an unblinded study and the
disagreement among the infectious disease specialists as to the optimal
therapeutic regimen for each of the test cases.
The current study design enabled us to compare MYCIN's perfor-
mance with that of clinicians in a blinded fashion. This study involved a
two-phase evaluation. In the first phase, several prescribers, including MY-
CIN, prescribed therapy for the test cases. In the second phase of the
evaluation, prominent infectious disease specialists, the evaluators, assessed
these prescriptions without knowing the identity of the prescribers or
knowing that one of them was a computer program.1

31.1 Materials and Methods

Ten patients with infectious meningitis were selected by a physician who
was not acquainted with MYCIN's methods or with its knowledge base
pertaining to meningitis. All of the patients had been hospitalized at a
county hospital affiliated with Stanford, were identified by retrospective
chart review, and were diagnostically challenging. Two criteria for case
selection ensured that the ten cases would be of diverse origin: there were
to be no more than three cases of viral meningitis, and there was to be at
least one case from each of four categories: tuberculous, fungal, viral, and
bacterial (including at least one with positive gram stain of the cerebro-
spinal fluid and at least one with negative gram stain). A detailed clinical
summary of each case was compiled. The summary included the history,
physical examination, laboratory data, and the hospital course prior to
therapeutic intervention. These summaries were used to run the MYCIN
consultations. Only the information contained in the summaries was used
as input to MYCIN, and no modifications were made to the program.
These same summaries were presented to five faculty members in the
Division of Infectious Diseases in the Departments of Medicine and Pedi-
atrics at Stanford University, to one senior postdoctoral fellow in infectious
diseases, to one senior resident in medicine, and to one senior medical
student. The resident and student had just completed a six-week rotation

1We wish to thank the following infectious diseases specialists who participated in this study:
Donald Armstrong, M.D.; John E. Bennett, M.D.; Ralph D. Feigin, M.D.; Allan Lavetter, M.D.;
Phillip J. Lerner, M.D.; George H. McCracken, Jr., M.D.; Thomas C. Merigan, M.D.; James
J. Rahal, M.D.; Jack S. Remington, M.D.; William S. Robinson, M.D.; Penelope J. Shackelford,
M.D.; Paul F. Wehrle, M.D.; and Anne S. Yeager, M.D.
Materials and Methods 591

in infectious diseases. None of these individuals was associated with the
MYCIN project. The seven Stanford physicians and the medical student
were asked to prescribe an antimicrobial therapy regimen for each case
based on the information in the summary. If they chose not to prescribe
antimicrobials, they were requested to specify which laboratory tests (if any)
they would recommend for determining the infectious etiology. There
were no restrictions concerning the use of textbooks or any other reference
materials, nor were any time limits set for completion of the prescriptions.
Ten prescriptions were compiled for each case: that actually given to
the patient by the treating physicians at the county hospital, the recom-
mendation made by MYCIN, and the recommendations of the medical
student and of the seven Stanford physicians. In the remainder of this
chapter, MYCIN, the medical student, and the eight physicians will be
referred to as prescribers.
The second phase of the evaluation involved eight infectious disease
specialists at institutions other than Stanford, hereafter referred to as eval-
uators, who had published clinical reports dealing with the managementof
infectious meningitis. They were given the clinical summary and the set of
ten prescriptions for each of the ten cases. The prescriptions were placed
in random order and in a standardized format to disguise the identities of
the individual prescribers. The evaluators were asked to make their own
recommendations for each case and then to assess the ten prescriptions.
The 100 prescriptions (10 each by 10 prescribers) were classified by each
evaluator into the following categories:

Equivalent: the recommendation was identical to or equivalent to the eval-
uator's own recommendation (e.g., treatment of one patient with naf-
cillin was judged equivalent to the use of oxacillin);
Acceptable alternative: the recommendation was different from the evalua-
tor's, but he considered it to be an acceptable alternative (e.g., the
selection of ampicillin in one case was considered to be an acceptable
alternative to penicillin);
Not acceptable: the evaluator found the recommendation unacceptable or
inappropriate (e.g., the recommendation of chloramphenicol and am-
picillin in one case was considered to be unacceptable by all evaluators
who thought the patient had tuberculosis and who prescribed antitu-
berculous therapy).

The 800 assessments (100 each by 8 evaluators) were analyzed as fol-
lows. A one-way analysis of variance (ANOVA) was used to analyze the
overall difference effects between MYCIN and the other prescribers. The
Tukey studentized range test was used to demonstrate individual differ-
ences between prescribers following attainment of significance. A similar
analysis of variance was used to measure evaluator variability.
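For readers who wish to reproduce this style of analysis, the one-way ANOVA F statistic can be computed from the raw rating groups as follows (hypothetical rating data; the study itself compared 10 prescribers, each scored by 8 evaluators):

```python
def one_way_anova_F(groups):
    """One-way ANOVA: F = (between-group mean square) /
    (within-group mean square)."""
    all_vals = [v for g in groups for v in g]
    grand_mean = sum(all_vals) / len(all_vals)
    # Between-group sum of squares, weighted by group size
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares
    ss_within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g)
                    for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical 0/1 acceptability ratings for three prescribers
groups = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1]]
print(round(one_way_anova_F(groups), 3))
```

The resulting F is compared against the F distribution with (between, within) degrees of freedom, as in the F = 3.29 result reported below.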

TABLE 31-1 Ratings of Antimicrobial Selection Based on Evaluator Rating and Etiologic Diagnosis

                             No. (%) of items       No. of cases in        No. of cases in
                             in which therapy       which therapy was      which therapy
                             was rated              rated acceptable*      failed to cover a
                             acceptable* by an      by majority of         treatable pathogen
                             evaluator (n = 80)     evaluators (n = 10)    (n = 10)
MYCIN                        52 (65)                7 (70)                 0
Faculty-1                    50 (62.5)              5 (50)                 1
Faculty-2                    48 (60)                5 (50)                 1
Infectious disease fellow    48 (60)                5 (50)                 1
Faculty-3                    46 (57.5)              4 (40)                 0
Actual therapy               46 (57.5)              7 (70)                 0
Faculty-4                    44 (55)                5 (50)                 0
Resident                     36 (45)                3 (30)                 1
Faculty-5                    34 (42.5)              3 (30)                 0
Student                      24 (30)                1 (10)                 3

*Therapy was classified as acceptable if an evaluator rated it as equivalent or as an acceptable
alternative.

31.2 Results
The evaluators' ratings of each prescriber are shown in the second column
of Table 31-1. Since there were 8 evaluators and 10 cases, each prescriber
received 80 ratings from the evaluators. Sixty-five percent of MYCIN's
prescriptions were rated as acceptable by the evaluators. The correspond-
ing mean rating for the five faculty specialists was 55.5% (range, 42.5% to
62.5%). A significant difference was found among the prescribers; the
hypothesis that each of the prescribers was rated equally by the evaluators
is rejected (standard F test, F = 3.29 with 9 and 70 df; p < 0.01).
Consensus among evaluators was measured by determining the num-
ber of cases (n = 10) in which the prescriber received a rating of acceptable
from the majority (five or more) of experts (third column of Table 31-1).
Seventy percent of MYCIN's therapies were rated as acceptable by a ma-
jority of the evaluators. The corresponding mean rating for the five fac-
ulty prescribers was 44% (range, 30% to 50%). MYCIN failed to win a
rating of acceptable from the majority of evaluators in three cases. MYCIN
prescribed penicillin for a case of meningococcal meningitis, as did four
evaluators. However, four other evaluators prescribed penicillin with chlor-
amphenicol as initial therapy before identification of the organism, and
they rated MYCIN's therapy as not acceptable. MYCIN prescribed peni-
cillin as treatment for group B Streptococcus; however, most evaluators se-
lected ampicillin and gentamicin as initial therapy. MYCIN prescribed pen-
icillin as treatment for Listeria; however, most evaluators used combinations
of two drugs.
Comment 593

There were seven instances in which prescribers selected antimicrobial
therapy that failed to cover a treatable pathogen (fourth column of Table
31-1). Five instances involved a case of tuberculous meningitis in which
ineffective antibacterials (ampicillin, penicillin, and chloramphenicol) or
no antimicrobials were prescribed. The other two instances included a case
of meningococcal meningitis where one prescriber failed to prescribe any
antimicrobial therapy and a case of cryptococcal meningitis where flucy-
tosine was prescribed in inadequate dosage as the sole therapy.

31.3 Comment
In clinical medicine it may be difficult to define precisely what constitutes
appropriate therapy. Our study used two criteria for judging the appro-
priateness of therapy. One was simply whether or not the prescribed ther-
apy would be effective against the offending pathogen, which was ulti-
mately identified (fourth column of Table 31-1). Using this criterion, five
prescribers (MYCIN, three faculty prescribers, and the actual therapy
given the patient) gave effective therapy for all ten cases. However, this
was not the sole criterion, since failure to cover other likely pathogens and
the hazards of overprescribing are not considered. The second criterion
used was the judgment of eight independent authorities with expertise in
the management of meningitis (second and third columns of Table 31-1).
Using this criterion, MYCIN received a higher rating than any of the nine
human prescribers.
This shows that MYCIN's capability in the selection of antimicrobials
for meningitis compares favorably with the Stanford infectious disease spe-
cialists, who themselves represent a high standard of excellence. Three of
the Stanford faculty physicians would have qualified as experts in the man-
agement of meningitis by the criteria used for the selection of the national
evaluators.
Of the five prescribers who never failed to cover a treatable pathogen
(fourth column of Table 31-1), MYCIN and the faculty prescribers were
relatively efficient and selective as to choice and number of antibiotics
prescribed. In contrast, while the actual therapy prescribed by the physi-
cians caring for the patient never failed to cover a treatable pathogen, their
therapeutic strategy was to prescribe several broad-spectrum antimicro-
bials. In eight cases, the physicians actually caring for the patient pre-
scribed two or three antimicrobials; in six of these eight cases, one or no
antimicrobial would have sufficed. Overprescribing of antimicrobials is not
necessarily undesirable, since redundant or ineffective antimicrobial ther-
apy can be discontinued after a pathogen has been identified. However,
an optimal clinical strategy attempts to limit the number and spectrum of
antimicrobials prescribed to minimize toxic effects of drugs and superin-

fection while selecting antimicrobials that will still cover the likely patho-
gens.
The primary limitation of our investigation is the small number of
cases studied. This was a practical necessity, since we had to consider the
time required for the evaluators to analyze 10 complex cases and rate 100
therapy recommendations. Although only 10 patient histories were used,
the selection criteria provided for diagnostically diverse and challenging
cases to evaluate MYCIN's accuracy. The selection of consecutive or ran-
dom cases of meningitis admitted to the hospital might have yielded a
limited spectrum of meningitis cases that would not have tested fully the
capabilities of either MYCIN or the Stanford physicians. In addition to
our evaluation, the program has undergone extensive testing involving
several hundred cases of retrospective patient histories, prospective patient
cases, and literature cases of meningitis. These have confirmed its com-
petence in determining the likely identity of the pathogen, selecting an
effective drug at an appropriate dosage, and recommending further di-
agnostic studies (a capability not evaluated in the current study).
Because of the diagnostic complexities of the test cases, unanimity in
all eight ratings in an individual case was difficult to achieve. For example,
in one case, although the majority of evaluators agreed with MYCIN's
selection of antituberculous drugs for initial therapy, two evaluators did
not and rated MYCIN's therapy as not acceptable. Six of the ten test cases
had negative CSF smears for any organisms, so in these cases antimicrobial
selection was made on a clinical basis. It is likely that if more routine cases
had been selected, there would have been greater consensus among eval-
uators.
The techniques used by MYCIN are derived from a subfield of com-
puter science known as artificial intelligence. It may be useful to analyze
some of the factors that contributed to the program's strong performance.
First, the knowledge base is extremely detailed and, for the domain of
meningitis, is more comprehensive than that of most physicians. The
knowledge base is derived from clinical experience of infectious disease
specialists, supplemented by information gathered from several series of
cases reported in the literature and from hundreds of actual cases in the
medical records of three hospitals.
Second, the program is systematic in its approach to diagnosis. A pop-
ular maxim among physicians is "One has to think of the disease to rec-
ognize it." This is not a problem for the program; rare diseases are never
"forgotten" once information about them has been added to the knowledge
base, and risk factors for specific meningitides are systematically analyzed.
For example, the duration of headache and other neurological symptoms
for one week before hospital admission was a subtle clue in the diagnosis
of tuberculous meningitis. The program does not overlook relevant data
but also does not require complete and exact information about the patient.
For example, in a case involving a patient with several complex medical
problems, the presence of purpura on physical examination was an im-
portant finding leading to the diagnosis of meningococcal meningitis. How-
ever, even if the purpura were absent or had been overlooked, MYCIN
would have treated empirically for meningococcal meningitis on the basis
of the patient's age and CSF analysis.
Third, since the program is based on the judgments of experienced
clinicians, it reflects their understanding of the diagnostic importance of
various findings. The program does not jump to conclusions on the basis
of an isolated finding, nor does it neglect to ask for key pieces of infor-
mation. Abnormal findings or test results are interpreted with respect to
the clinical setting.
Finally, the system is up to date; frequent additions and modifications
ensure its currentness. The meningitis knowledge base incorporates infor-
mation from the most recent journal articles and the current experience
of an infectious diseases division. Therapy selection and dosage calcula-
tions are derived from prescribing recommendations more recent than
those in any textbook. (This was a factor in a case for which, at the time
of this study, the recommendation of low-dose amphotericin B therapy
combined with flucytosine was available only in recent issues of specialty
journals.)
Because MYCIN compared favorably with infectious disease experts
in this study, we believe that it could be a valuable resource for the prac-
ticing physician whose clinical experience for specific infectious diseases
may be limited. The data demonstrate the program's reliability. However,
further investigations in a clinical environment are warranted. Questions
concerning the program's acceptability to practicing physicians and its im-
pact on patient care, as well as issues of cost and legal implications, remain
to be answered. Other capabilities of MYCIN that may assist the practicing
physician include the following:

1. Identifying each of the potential pathogens with an estimate of its like-
lihood in causing the disease (Chapter 5).
2. Recommending antimicrobial dosages, considering weight, height, sur-
face area, and renal function. Separate dosage regimens are given for
the neonate, infant, child, and adult. Intrathecal dosage regimens are
also given (Chapter 19).
3. Checking for contraindications of specific drugs, including pregnancy,
liver disease, and age (Chapter 6).
4. Graphing predicted serum concentrations for aminoglycosides with re-
lation to the expected minimal inhibitory concentration of the organism
(Chapter 19).
5. Justifying its recommendation in response to queries by the physician
(Chapter 18).

The methodology of the evaluation is of interest because it was de-
veloped in an attempt to analyze clinical decisions for which there is no
clear right or wrong choice. Since most areas of medicine are characterized
by a variety of acceptable approaches, even among experts, the technique
used here may be generally useful in assessing the quality of decision mak-
ing by other computer programs.
PART ELEVEN

Designing for Human Use
32
Human Engineering of Medical Expert Systems

Although we have frequently referred to human engineering issues
throughout this book and have considered them from the outset in our
design of MYCIN and its descendents, we have also noted that MYCIN
was never used routinely in patient-care settings. Yes, the program was able
to explain its reasoning, and this seemed likely to heighten its acceptability.
And yes, we spent much time attending to detail so that (a) user aids were
available at any time through the use of HELP and question mark com-
mands, (b) the system automatically corrected spelling errors when it was
"obvious" what the user meant, and (c) a physician could enter only the
first few characters of a response if what was entered uniquely defined the
intended answer. However, there were still significant barriers that pre-
vented us from undertaking the move to formal implementation.
Some of these barriers were unrelated to human engineering issues,
viz., the need for an enhanced knowledge base in other areas of infectious
disease at a time when both Axline and Yu were departing from Stanford,
the difficulty of obtaining funding for knowledge base enhancement when
the program itself had become both large and competent, and our own
lack of enthusiasm for implementation studies once we had come to iden-
tify some of the computer science inadequacies in MYCIN's design and
preferred to work on those in a new environment. All of these might have
been ignored, however, since MYCIN was fully operational and could have
been tested clinically with relatively little incremental effort. What dis-
suaded us from doing so was the simple fact that we knew the program was
likely to be unacceptable, for mundane reasons quite separate from its
excellent decision-making performance. Most of these issues were related
to logistical and human-engineering problems in the program's introduc-
tion. We have described these pragmatic considerations elsewhere (Short-
liffe, 1982a) and have indicated how they influenced our decision to turn
our attention to the development of a new system for clinical oncology (see
Chapter 35). We will briefly summarize these points here.
First, although there was a demonstrated need for a system like MY-
CIN (see the data on antibiotic use outlined in Chapter 1), we did not feel

600 Human Engineering of Medical Expert Systems

there was a recognized need on the part of individual practitioners. Most
physicians seem to be quite satisfied with their criteria for antibiotic selec-
tion, and we were unconvinced that they would be highly motivated to seek
advice from MYCIN, particularly in light of the other problems noted
below.
Our second concern was our inability to integrate MYCIN naturally
into the daily activities of practitioners. The program required a special
incremental effort on their part: once they had decided to consider giving
a patient an antibiotic, it would have been necessary to find an available
terminal, log on, and then respond to a series of questions (many of which
were simply transcriptions of lab results already known to be available on
other computers at Stanford). Linkage of SUMEX (MYCIN's "home" com-
puter) to Stanford lab machines was considered but rejected because of
lack of resources to do so and the realization that a research machine like
SUMEX would still have been unable to offer high-quality reliable service
to physician users. When the machine was heavily loaded, annoying pauses
between MYCIN's questions were inevitable, and a total consultation could
have required as long as 30 minutes or an hour. This was clearly unac-
ceptable and would have led to rejection of the system despite its other
strong features. Slight annoyances, such as the requirement that the phy-
sicians type their answers, would have further alienated users. Adapting
MYCINto run on its own machine was an unrealistic answer because of
the computational resources needed to run a program of that size (at that
time) and our lack of interest in trying to adapt the code for a non-Interlisp
environment. 1
Thus, as of late 1978, MYCIN became a static system, maintained on SUMEX for demonstration purposes and for student projects but no longer the subject of active research. In addition, in the subsequent five years its knowledge base has become rapidly outdated, particularly with regard to antimicrobial agents. The "third-generation" cephalosporins have been introduced in the intervening years and have had a profound effect on antibiotic selection for a number of common problems in infectious disease (because of their broad spectrum and low toxicity relative to older agents). This point emphasizes the need for knowledge base maintenance mechanisms once expert systems are introduced for routine use in dynamic environments, where knowledge may change rapidly over time.
Even though MYCIN is no longer a subject of active work, the experiments described in this book have been a productive source of new insights. In this final section to the book, we describe related pieces of work that show some of the ways in which MYCIN has influenced our research

¹The CONGEN program within DENDRAL had just been recoded from Interlisp to BCPL, and we were acutely aware of the manpower investment it took by someone intimately familiar with the design and code. This effort could only have been undertaken under the conviction that the result would be widely used.

activities in the areas of human engineering and user attitudes. Our new work on ONCOCIN, for example, has been based on underlying knowledge structures developed for MYCIN but has been augmented and revised extensively because of our desire to overcome the barriers that prevented the clinical implementation of MYCIN. Our attitude on the importance of human factors in designing and building expert systems is reflected in the title of a recent editorial we prepared on the subject: "Good Advice is Not Enough" (Shortliffe, 1982b).

32.1 The Interface Language for Physicians

It was never our intention to become enmeshed in the difficult problems of understanding unconstrained English. Work in computational linguistics achieved important results during the 1960s and 1970s, but we saw the problems as being extremely difficult and were afraid that our progress in other areas would be slowed if we became overly involved in building language capabilities for MYCIN. We did spend time ensuring that the program could express itself in English, but this was not difficult because of the stereotypic form of the rules and the power of LISP. We totally avoided any need for the program to understand natural language during the consultation (depending instead on HOW, WHY, and EXPLAIN commands as described in Chapter 18), but we did build a simple question-answering (QA) system that was available electively at the end of the advice session. Although it was possible to get answers to most questions using the QA module, the system was not very robust, and it took new users some time to learn how to express themselves so that they would be understood. Once again, the capability that was developed for question answering (which was borrowed for the TEIRESIAS work; see Chapter 9) was greatly facilitated by the highly structured and uniform techniques for knowledge representation that we had used.
It is important to note that our desire to avoid natural language processing accounts in large part for the decision to use goal-directed (backward-chained) reasoning in MYCIN. If we had simply allowed the user to start a consultation by describing a patient, it would have been necessary that MYCIN understand such text descriptions before beginning forward-chained invocation of rules. By using a backward-chained approach, MYCIN controlled the dialogue and therefore could ask specific questions that generally required one- or two-word answers.
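The effect of that choice can be sketched in a few lines. The following is only an illustration of backward chaining driving a dialogue, not MYCIN's actual Interlisp code; the rules, parameter names, and answers are invented:

```python
# Backward-chaining sketch: the program, not the user, controls the dialogue.
# A parameter is requested only when some rule needs it, so each question is
# a short, specific one. Rules and parameters here are invented examples.
RULES = [
    {"if": ["gram_stain=neg", "morphology=rod"], "then": "class=enterobacteriaceae"},
    {"if": ["class=enterobacteriaceae"], "then": "identity=e.coli"},
]

def ask(param, answers):
    """Ask the user a specific short-answer question (simulated by a dict)."""
    return answers[param]

def find_out(fact, answers, known):
    """Try to establish `fact` via rules that conclude it; else ask the user."""
    if fact in known:
        return True
    relevant = [r for r in RULES if r["then"] == fact]
    for rule in relevant:
        if all(find_out(premise, answers, known) for premise in rule["if"]):
            known.add(fact)
            return True
    if not relevant:  # primitive parameter: ask a one-word question
        param, _, _ = fact.partition("=")
        if fact == param + "=" + ask(param, answers):
            known.add(fact)
            return True
    return False

# Simulated user responses to the two questions the program ends up asking:
answers = {"gram_stain": "neg", "morphology": "rod"}
print(find_out("identity=e.coli", answers, set()))  # True
```

The top-level goal ("identity") pulls in subgoals until only primitive parameters remain, and only those generate questions; the dialogue order is therefore determined by the rule chain rather than by the user.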
From a human-engineering viewpoint, this decision was suboptimal, even though, ironically, it was made to avoid language-understanding problems that we knew would have annoyed physician users. The problem that resulted from having MYCIN control the dialogue was the inability

of the user to volunteer information, meaning that he or she had to wait for MYCIN to ask about what was known to be a crucial point. Alain Bonnet, a postdoctoral fellow from France, was fascinated by this problem when he visited our group in the mid-1970s. He decided to look for ways in which MYCIN's knowledge structures could be augmented to permit volunteered information about a patient at the beginning of a consultation session. His work on this subsystem, known as BAOBAB, is described in Chapter 33. The complexity of the issues that needed to be addressed in building such a capability is clear in that article. Fascinating though the work was, BAOBAB never functioned at a performance level sufficiently high to justify its incorporation into MYCIN.
Despite the limitations of its language capabilities, we are generally pleased with the ability of MYCIN and the EMYCIN systems to appear to converse in English through the use of rather simple techniques of text generation and understanding. This conversational appearance of the program is due to the combined efforts of several project members and to the flexibility of the underlying knowledge structures used. Issues in computational linguistics in the EMYCIN environment continue to be fruitful areas of investigation for student projects. As recently as 1980, a medical student and research assistant, Lou Sanner, added code to MYCIN that was able to generate prose summaries of patients from our library of old cases. His generalized approach to the problem was added to EMYCIN and generates prose descriptions of stored cases from any EMYCIN domain. An example of one of his MYCIN case translations is shown in Figure 32-1.

32.2 Assessing Physicians' Attitudes


As many of the early papers in this volume indicate, we proceeded through the 1970s with the firm conviction that AI techniques offered potential solutions to problems that had limited physicians' acceptance of advice-giving systems. We were especially convinced that explanation capabilities were crucial for user acceptance and that this single failing in particular largely accounted for the rejection of systems based solely on statistical approaches. As is discussed in Chapter 30, we could not prove that explanations would make a difference unless we implemented a consultation system in a clinical environment where controlled studies could be undertaken. Thus we had depended on our intuitions and appealed to others to believe in what we felt was an obvious requirement for optimal systems.
In 1980, however, a combination of events encouraged us to undertake a formal analysis of physicians' attitudes. We had toyed with the idea for several years but had been discouraged by the time and resources necessary

A summary is now being generated:

[consultation of 7-May-77 6:00PM]
Pt600 is a 33 year old Caucasian female with clinical evidence of otitis media who has neurological signs of 5 hours and symptoms of 1 day duration. She is febrile and weighs 70 kgm. She has impaired renal function. She is 4+ sick (on a scale of 4). The patient is thought to have a csf infection symptomatic for 1 day.
TEST RESULTS:
  CBC:  WBC 25K    PMNS 85%   Bands 12%
  CSF:  WBC 12500  PMNS 98%
        glucose 25 (blood glucose 140)
        protein 450
  recent serum creatinine 1
CULTURES:  When obtained:  Organisms
  csf      6 hours ago     Gram neg rod
                           Gram pos coccus in pairs
DRUGS:
  Erythromycin was started (oral) 30 hours ago.

FIGURE 32-1 Example of a MYCIN case summary.

to do such a study well. In August of 1980 Stanford hosted the annual Workshop on Artificial Intelligence in Medicine, and we organized a two-day tutorial program so that local physicians who were interested could learn about this emerging discipline. In addition, funding from the Henry J. Kaiser Family Foundation allowed us to support a questionnaire-based project to assess physicians' attitudes. Finally, a doctoral student in educational psychology, Randy Teach, joined the project that summer and brought with him much-needed skills in the areas of statistics, study design, and the use of computer-based statistical packages.
The resulting study used the physicians who were attending the AIM tutorial as subjects, with a control group of M.D.s drawn from the surrounding community. Chapter 34 summarizes the results and concludes with design recommendations derived from the data analysis. The reader is referred to that chapter for details; however, it is pertinent to reiterate here that a program's ability to give explanations for its reasoning was judged to be the single most important requirement for an advice-giving system in medicine. This observation accounts for our continued commitment to research on explanation, both in the ONCOCIN program (Langlotz and Shortliffe, 1983) and in current doctoral dissertations from the Heuristic Programming Project (Cooper, 1984; Kunz, 1984). Other results of the attitude survey reemphasize the importance of human-engineering issues (such as ease of use and access) in the design of acceptable consulting systems.

32.3 Clinical Implementation of an Expert System

It seems appropriate that we close a book about the MYCIN "experiments" with a description of ONCOCIN, MYCIN's most recent descendant. The problem domain for this program was selected precisely because it seemed to offer an excellent match between the problem-solving task involved and the set of pragmatic considerations that we outlined at the beginning of this chapter. Chapter 35 describes ONCOCIN's task domain in some detail and discusses the knowledge structures and architecture used to heighten its clinical effectiveness. However, Chapter 35 does not discuss the logistics of implementation that are among the newest lessons learned by our group. Thus what follows here is a description of our experience with ONCOCIN's implementation. Much of the discussion is drawn from a recent paper written by members of the ONCOCIN project (Bischoff et al., 1983). The reader may find it useful to study the technical description in Chapter 35 before reading this discussion of what has happened since the system was introduced for clinical use.
ONCOCIN assists physicians with the management of patients enrolled in experimental plans (called protocols) for treating cancer with chemotherapy. The system has been in limited use in the Stanford Oncology Clinic since May of 1981. The potential utility of such a system has been recognized at several major cancer treatment centers, and other groups have been developing systems to assist with similar tasks (Horwitz et al., 1980; Blum et al., 1980; Wirtschafter et al., 1980). Since the core of knowledge about oncology protocols is defined in protocol documents, the domain of cancer chemotherapy has the advantage of having a readily available source of structured knowledge of the field. The ongoing involvement of oncologists with ONCOCIN, both as research colleagues and as potential users, has provided additional expertise and highly motivated collaboration in knowledge base development. We currently have encoded the protocols for Hodgkin's disease, non-Hodgkin's lymphoma, breast cancer, and oat cell carcinoma of the lung,² and will be adding all of the other treatment protocols employed at Stanford. It should be emphasized that the resulting computer-based protocols include both the specific rules gleaned from the protocol documents and some additional judgmental expertise from our experts, who have defined the ways in which the system ought to respond to unusual or aberrant situations.³

²The oat cell protocol is the most complex protocol at Stanford. It was implemented to verify that our representation scheme would apply to essentially any of the protocols currently in use. However, it has not yet been released for routine use, pending its thorough testing.
³In order to design a program that could be operational in the short term, our initial design plan was consciously to avoid major theoretical barriers such as management of inexact reasoning and generalized methods for temporal reasoning.

32.3.1 System Design

ONCOCIN's system design is a result of the combined efforts of an interdisciplinary group of computer scientists, clinicians, statisticians, and support staff, totaling 29 individuals. System design began in July of 1979. From the outset, the logistics of how a consultation system could fit into the busy oncology clinic were a crucial design consideration; one of our first tasks was to study the flow of information within the clinic. We asked the oncology fellows about their attitudes regarding computers and asked them to assess the potential role of such technology in the oncology clinic. A Stanford industrial engineer with experience in the area of human factors was consulted during the iterative phase of interface design. Programmers would offer mock demonstrations to those with little or no computer expertise. After getting comments and suggestions on the demonstration, modifications were made, and a new mock-up was presented. This process was repeated until all felt satisfied with the interaction. Design decisions of this type were discussed at regular research meetings involving both physicians and computer scientists.
The design of the reasoning program, which is written in Interlisp and uses AI representation techniques (see Chapter 35), was affected by our desire to create a system that provides rapid response. The original ONCOCIN prototype used keyboard-oriented interactive programs borrowed from EMYCIN. As was mentioned earlier in this chapter, we knew from our previous work, however, that this type of interaction would be too tedious and time-consuming for a busy clinic physician. A physician using MYCIN often had to wait while questions were generated and rules were tried. The use of the EMYCIN interface, however, enabled us to create the program's knowledge base and to evaluate its therapy recommendations while we were concurrently deciding on the interface design. The ultimate interface incorporates a fast display program that is separate from the AI reasoning program (Gerring et al., 1982). Thus ONCOCIN is actually a set of independent programs that run in parallel and communicate with each other.
Ama, jor design goal was to have ONCOCIN used directly by physicians
at the time of a patients visit to the clinic for chemotherapy. One way to
encourage physicians inw)lvement was to make the system easily accessible
while providing a wuiety of hard-copy reports that had previously either
not existed or required manual preparation. A computer-generated sum-
mary sheet is produced in the morning for each scheduled patient enrolled
in one of the protocols handled by the computer. The summary sheet is
attached to the patients chart and serves as a reminder of the patients
diagnosis and stage, expected chemotherapy, and any recent abnormal
laboratory wdues or toxicities. A centrally located video display terminal is
used by the oncologist after the patient has been examined. The physician
interacts with ONCOCINshigh-speed data acquisition program (the In-
terviewer). While the clinician is entering data through the Interviewer, that

program is passing pertinent answers to the reasoning program (the Reasoner), which uses the current patient data, the past history, and the protocol assignment to formulate a treatment plan. By the time data entry is complete, the Reasoner has generally completed its plan formulation and has passed the results back to the Interviewer, which in turn displays the recommendation to the user. The physician can then agree with or modify the system's treatment recommendation, make adjustments to the laboratory and x-ray tests suggested for the patient by ONCOCIN, and end the session. Progress notes are produced on a printer near the ONCOCIN terminal so they can be easily removed, verified and signed by the physician, and then placed in the hospital chart. After the session the computer also generates an encounter sheet, which lists the tests to be ordered, when they should be scheduled, and when the patient should return to the clinic for his or her next visit. This information is generated on a second printer located at the front desk, where these activities are scheduled.
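The Interviewer/Reasoner division of labor amounts to two concurrent processes exchanging messages so that plan formulation overlaps data entry. A minimal sketch of that pattern follows; Python threads and queues stand in for the separate Interlisp programs, and the field names and the single rule are invented for illustration:

```python
import queue
import threading

# Sketch of the Interviewer/Reasoner split: data entry and plan formulation
# run concurrently, so the recommendation is usually ready when entry ends.
to_reasoner = queue.Queue()
to_interviewer = queue.Queue()

def reasoner():
    """Consume answers as they arrive and build a (toy) treatment plan."""
    plan = {}
    while True:
        item = to_reasoner.get()
        if item is None:                    # data entry is complete
            to_interviewer.put(plan)
            return
        field, value = item
        if field == "wbc" and value < 2.0:  # invented rule for illustration
            plan["dose_attenuation"] = 0.5

def interviewer(entries):
    """Pass each answer to the Reasoner while 'data entry' continues."""
    for field, value in entries:
        to_reasoner.put((field, value))
    to_reasoner.put(None)                   # signal end of data entry
    return to_interviewer.get()             # display the recommendation

t = threading.Thread(target=reasoner)
t.start()
plan = interviewer([("wbc", 1.5), ("platelets", 90)])
t.join()
print(plan)  # {'dose_attenuation': 0.5}
```

Because the Reasoner works on each answer as it arrives rather than waiting for the full form, the user sees little or no pause between finishing data entry and receiving the plan.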
The system design attempts to prevent the computer system from being perceived as an unwanted intrusion into the clinic. The physician/computer interaction takes the place of a task that the physician would otherwise perform by hand (the manual completion of a patient flow sheet) and requires only 5 to 7 minutes at the terminal. A training session of 30 minutes has been adequate for physicians to achieve independent use of the system, and the hard-copy reports assist the physicians with their responsibilities. Because we were eager to make the system as flexible as possible and to simulate the freedom of choice available to the physicians when they fill out the flow sheets by hand, the program leaves the users largely in control of the interaction. Except for the patient's white cell count, platelet count, and information about recent radiation therapy (key issues in determining appropriate therapy), the physicians may enter whatever information they feel is pertinent, leaving some fields blank if they wish. An important evaluative issue that we are accordingly investigating is whether ONCOCIN encourages more complete and accurate recording of the flow sheet data despite the user's ability to skip entries if he or she wishes to do so. Users may enter data into the flow sheet format in whatever order they prefer, skipping forward or backward and changing current or old answers. This approach is radically different from that used in MYCIN in that the physician decides what information to enter and the reasoning can proceed in a data-directed fashion. Data entry in a flow sheet format avoids the problems of natural language understanding that prevented this approach in MYCIN.
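The data-directed alternative can be sketched for contrast with goal-directed questioning: whatever the user chooses to enter, in any order, is matched against the rules, instead of the program requesting each item in turn. Again this is only an illustration with invented rules, not ONCOCIN's actual reasoning code:

```python
# Data-directed sketch: rules fire from whatever facts the user volunteers,
# in any order, instead of the program asking for each parameter in turn.
# Both rules and their thresholds are invented for illustration.
RULES = [
    (lambda f: f.get("wbc", 99) < 2.0, "attenuate myelosuppressive drugs"),
    (lambda f: f.get("radiation_recent") is True, "delay chemotherapy"),
]

def forward_chain(facts):
    """Return the conclusions triggered by the volunteered facts."""
    return [action for condition, action in RULES if condition(facts)]

# The user entered only two fields, in their own order, leaving others blank:
print(forward_chain({"radiation_recent": True, "wbc": 1.2}))
# ['attenuate myelosuppressive drugs', 'delay chemotherapy']
```

Fields left blank simply fail to trigger any rule, which mirrors the flow sheet's tolerance for skipped entries.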

32.3.2 Terminal Interface

The system incorporates a special terminal interface to ensure that a busy clinician can find ONCOCIN fast and easy to use, as well as simple to learn. The physician interacts with a high-speed (9600 baud) video display terminal with multiple windows, simulating the appearance of the conventional paper flow sheet. Simulation of the form makes the interaction more comfortable and familiar.

FIGURE 32-2 ONCOCIN's 21-key pad.
A customized keyboard was designed for data entry. It allows the physician to enter the flow sheet information using a 21-key pad (Figure 32-2), which is located to the right of a conventional terminal keyboard. We considered light pens and touch screens but felt that they were either too expensive or too unreliable at the present time. Furthermore, a simple key pad was adequate for our needs. The layout of the key pad is simple and self-explanatory. Ten of the keys make up a number pad, which is laid out the same way as the numbers on push-button telephones. Our human factors consultant recommended this arrangement because we could safely assume user experience with push-button telephones, while user experience with a calculator-style number pad would be likely to be more limited. The other keys on the pad are "Yes" and "No" keys, and cursor control keys. The labels on the cursor control keys suggest that the user is filling in the blanks on a paper form, for example, "Next Blank," "Clear Blank," "Jump Ahead," etc. Our human factors consultant suggested using this terminology instead of terms including the word "Field" (e.g., "Next Field"), which are information-processing terminology and not as intuitive for naive computer users. This decision reflects our general effort to avoid computer jargon in talking with physicians, printing text on the terminal screen, or communicating with them in memos.

32.3.3 Display Design

The design of the display is derived from the paper flow sheet used for many years for protocol data gathering and analysis. The display screen is divided into four sections as indicated in Figure 32-3:

a. the explanation field, which presents the justification for the recommendation indicated by the user-controlled cursor location (the black block in the figure)
b. the message field, which identifies the patient and provides a region for sending pertinent messages from ONCOCIN to the physician
c. the flow sheet, which displays a region of the conventional hard copy flow sheet; the display includes columns for past visits, and the physician enters data and receives recommendations in the right-hand column
d. the soft key identifiers, labels that indicate the special functions associated with numbered keys across the top of the terminal keyboard

Note that when the physician is entering patient data, the explanation field specifies the range of expected entries for the item with which the cursor is aligned. When the system has recommended therapy (as in Figure 32-3), the explanation field provides a brief justification of the drug dosage indicated by the cursor location.

32.3.4 Integration into the Clinic

To make ONCOCIN's integration into the clinic as smooth as possible, we scheduled clinic meetings led by the oncology members of our research team. At one early meeting to announce that the system would soon be available, we gave a system demonstration and held a discussion of our project goals. Individual training sessions were then scheduled to teach each physician how to use the system. These orientation sessions were brief and informative. They stressed that the physician is the ultimate decision
FIGURE 32-3 The ONCOCIN display screen.

maker about the patient's care, and that the computer-based consultant is intended to remind the physician about the complex details of the protocols and to collect patient data. Members of our group meet with oncology faculty and physicians occasionally to give them progress reports on our research.
We also enlisted the help of a data manager who is responsible for training sessions, ensures that on-line patient records are current, and sees that the system runs smoothly. The data manager is available whenever the system is running in the clinic and offers assistance when necessary. This role has proved to be particularly crucial. The data manager is the most visible representative of our group in the clinic (other than the collaborating oncologists themselves). The person selected for this role therefore must be responsible, personable, tactful, intelligent, aware of the system's goals and capabilities, and able to communicate effectively with the physicians. If the person in this role is unable to satisfy these qualifications, he or she can make system use seem difficult, undesirable, and imposing to the physician users.
Integration of the system into the clinic was planned as a gradual process. When the system was first released, the program handled a small number of patients and protocols. As the program became more familiar to the physicians, we added more patients to the system. We are in the process of adding new protocols, which in turn will mean additional patients being handled on the computer. ONCOCIN was initially available only three mornings per week. It is now available whenever patients who are being followed on the computer are scheduled. This plan for slow integration of the system into the clinic has made ONCOCIN's initial release less disruptive to the clinic routine than it would have been if we had attempted to incorporate a comprehensive system that handled all patients and protocols from the onset. This method of integration has also allowed us to fine-tune our system early in its development, based on responses and suggestions from our physician users.

32.3.5 Responses and Modifications to the System

After the system's initial release, the data manager and the collaborating oncologists collected comments and suggestions from the physicians who used the system. We have made numerous program changes in response to suggestions for modifications and desirable new features. We have also conducted a number of formal studies to evaluate the impact of the system on physicians' attitudes, the completeness and accuracy of data collection, and the quality of the therapeutic decisions.
We soon learned that some of our initial design decisions had failed to anticipate important physician concerns. For example, if the Reasoner needed an answer to a special question not on the regular flow sheet form,

our initial approach was to have the Interviewer interrupt data entry to request this additional information. The physicians were annoyed by these interruptions, so we modified the scheme to insert the question less obtrusively on a later section of the flow sheet, and to stop forcing the physician to answer such questions.
Another concern was that ONCOCIN was too stringent about its drug dosage recommendations, requesting justifications from the physician even for minor changes. We needed to take into account, for example, that a different pill size might decrease or increase a dose slightly and yet would be preferable for a patient's convenience. We subsequently obtained from the oncologists on our team ranges for each chemotherapeutic agent, within which any dosage modifications could be considered insignificant.⁴ Such minor modifications no longer generate requests for justification. We also modified the program to recommend the same dose that the physician prescribed during a prior visit if that recommendation is within the acceptable range calculated by the program.
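The range check described here amounts to a simple per-agent tolerance test. The following sketch shows the idea; the agent names and tolerance values are invented stand-ins, not the ranges the Stanford oncologists actually supplied:

```python
# Sketch of the "insignificant modification" check: a physician's dose is
# accepted without a request for justification when it falls within an
# agent-specific tolerance of the protocol dose. Tolerances are invented.
TOLERANCE = {"cyclophosphamide": 0.10, "vincristine": 0.05}  # +/- fraction

def needs_justification(agent, protocol_dose, prescribed_dose):
    """True if the change is large enough that the system would ask why."""
    tol = TOLERANCE.get(agent, 0.0)
    return abs(prescribed_dose - protocol_dose) > tol * protocol_dose

print(needs_justification("cyclophosphamide", 100.0, 95.0))  # False: pill-size rounding
print(needs_justification("cyclophosphamide", 100.0, 80.0))  # True: a real attenuation
```

Carrying the prior visit's dose forward, as the modified program does, is then just a matter of preferring that dose whenever this check passes for it.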
Some system users also asked whether the program could generate a progress note for the patient's visit. When we developed this feature and installed a small printer to prepare these notes in duplicate, use of the system was immediately made more desirable because this capability saved the physician the time required to dictate a note. This feature also helps to encourage the physician to enter relevant data completely and accurately because the quality of the resulting progress note is dependent on the data entry process.
When the system was first released, it was available only on the three mornings per week when the majority of lymphoma patients were seen (the computer, a DECSystem 2020, is used at other times by other members of our research community). This allowed us to provide rapid response time through an arrangement for high-priority use of the computer. Since some lymphoma protocol patients were seen at other times, however, there were continuing problems in keeping the computer-based files up to date and thus in establishing ONCOCIN's role as a reliable aid for the management of that subset of patients. In response to this problem, we have made the system available whenever a patient known to the system is seen in the clinic. When the physician initiates a consultation, the program checks to see if the computer response is likely to be slow and, if so, prints out a warning to that effect. The physician may then either abort the session or proceed with the anticipation that the interaction will take longer than usual. We have found that the physicians understand and appreciate this feature and will often continue despite the delays.

~Ct, rrent research is also investigating au adaptationof ONCOCINsrecommendationscheme


whereby it will critique tl~e physiciansowntherapyplanandgiveadviceonlywhenspecifically
requestedto do so (l.anglotz andShnrtliffe, 1983).

32.3.6 Lessons Learned

It is clear that in order for a computer-based consultant to be effective in a clinical setting, the overall system design must take into account both the needs of the intended users and the constraints under which they function. This is the central theme of the lessons that we have learned from the MYCIN and ONCOCIN experiences. The program must be designed to satisfy a need for consultation and to provide this assistance in a fast, easy-to-use, and tactful manner. It should ideally avoid an incremental time commitment or an increase in the responsibilities of its users, or they will tend to resist its use. We have found that providing extra information-processing services, such as printing progress notes for the physicians, significantly heightens the system's appeal.
For ONCOCIN to have an effective role as a physician's assistant, providing both data management functions and consultations on patient treatment, it needs to be part of the daily routine in the clinic. Because of the limited number of patients and protocols currently on the system, ONCOCIN is still an exception to the daily routine; this will change as more protocols are encoded and the system is transferred to dedicated hardware. We are planning to move ONCOCIN to a personal workstation (a LISP machine capable of handling large AI programs) so that it will be self-contained. As it becomes the principal record-keeping system in the oncology clinic and enables the oncologists to receive useful advice for essentially all of their patient encounters, ONCOCIN will become successfully integrated into the clinic setting. The next stage will be to disseminate the system, mounted on single-user workstations, into other settings outside Stanford.
Physician involvement in the design of ONCOCIN has been crucial in all aspects of the system development. The collaborating oncologists provide answers to questions that are unclear from the protocol descriptions, evaluate the program's recommendations to ensure they are reasonable, offer useful feedback during the development of the user interface, and provide advice about how the computer-based consultation system can best fit into the clinic setting. Their collaboration and that of the computer scientists, medical personnel, and others in our interdisciplinary group (all of whom are committed to the creation of a clinically useful consultation tool) have combined to create a system for which limited integration into a clinical setting has been accomplished. We expect that total integration will be feasible within the next few years.
33
Strategies for Understanding Structured English

Alain Bonnet

Psychological work on memory, in particular by Bartlett (1932), has led to the conclusion that people faced with a new situation use large amounts of highly structured knowledge acquired from previous experience. Bartlett used the word schema to refer to this phenomenon. Minsky (1975), in his famous paper, proposed the notion of a frame as a fundamental structure used in natural language understanding, as well as in scene analysis. I will use the former term in the rest of this chapter, in spite of its general connotation.
The main thesis defended by Bartlett was that the phenomena of
memorization and remembering are both constructive and selective. The
hypothesis has more recently been revived by psychologists working on
discourse structure (Collins, 1978; Bransford and Franks, 1971; Kintsch,
1976). Various experiments performed on subjects who were told stories
and then asked to describe what they remembered showed that people not
only forget facts but add some. Moreover, they are unable to distinguish
between what they have actually heard and what they have inferred. People
hearing a story make assumptions, which they might revise or refine as
more information comes in, either confirmatory or contradictory. Making
such assumptions entails building (or retrieving) models of the expected
text contents. A corollary of this process is that if the story adequately fits
the model people have in mind, the story will be understood more easily.
Although it is difficult to give a formal definition of what constitutes
a coherent text, it is an accepted notion that sentences that comprise it are

This chapter is based on a technical memo (HPP-79-25) from the Heuristic Programming
Project, Department of Computer Science, Stanford University. Used with permission.

linked by cause-effect relationships, chronological orderings, and the like.


Flashbacks are not contradictory with coherence, but they can make the
text more difficult to comprehend. Texts dealing with specific domains
seem to be structured in terms of topic. Consequently, an important prob-
lem to face is recognizing the different topics and deciding when a shift in
topic occurs.
Several frame-based languages, such as KRL (Bobrow and Winograd,
1977), the "units package" (Stefik, 1979), and FRL (Roberts and Goldstein,
1977), implement the basic concepts underlying frames, or schemata. A
schema contains slots. They can be viewed as variables that will be bound
to data. Each slot contains "facets" (FRL), "aspects" (units package), or
"descriptors" (KRL), which specify how to fill the slots, for example,
specifying the type of values acceptable (numeric, strings of characters), the
range of possible values, values to assign by default, or attached procedures
describing what to do if the slot is filled in (this is a way to make inferences).
Slots may be organized into hierarchical schemata, in which case values
may be inherited from one schema to a more specialized one. This hierarchy
and concomitant inheritance avoids any duplication of common
properties.
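These mechanics can be sketched compactly. The following is a minimal illustration, not code from KRL, FRL, or the units package; the class, schema, and slot names are invented for the sketch. It shows slots holding facets and facet values being inherited from a more general schema:

```python
# Minimal frame sketch: a schema holds slots; each slot holds facets
# (expected values, defaults, ...); facet lookup walks up the
# generalization hierarchy, so common properties are stored only once.

class Schema:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # more general schema, if any
        self.slots = {}           # slot name -> {facet name: value}

    def add_slot(self, slot, **facets):
        self.slots[slot] = facets

    def facet(self, slot, facet):
        """Return a facet value, inheriting from the parent if absent."""
        value = self.slots.get(slot, {}).get(facet)
        if value is None and self.parent is not None:
            return self.parent.facet(slot, facet)
        return value

person = Schema("PERSON")
person.add_slot("AGE", expect="POSNUMB")
person.add_slot("NAME", expect="ANY")

# PATIENT specializes PERSON and inherits the AGE and NAME facets:
patient = Schema("PATIENT", parent=person)
patient.add_slot("TEMPERATURE", expect="POSNUMB", default="98.6 F")

print(patient.facet("AGE", "expect"))           # inherited from PERSON
print(patient.facet("TEMPERATURE", "default"))  # local to PATIENT
```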
BAOBAB is an experimental natural language interface to the MYCIN
system and is based on the idea that clusters of topics can be represented
in framelike structures. Its design has been inspired by observations
regarding text coherence, shifts in topic, temporal orderings, etc. Figure
33-1 shows a short dialogue between a physician and the program. The
English that is generated is somewhat stilted, but its main purpose is to be
nonambiguous. This chapter presents the principles and methods
that permitted the implementation of a program with these capabilities.

33.1 Related Work and Goals

AI researchers have explored a variety of strategies to recognize shifts in


topic occurring during dialogues or written texts. One issue faced is the
necessity to narrow the space of possible referents of a linguistic object by
focusing at different levels of detail. Thus Grosz (1977) studied the role
of focus in the interpretation of utterances and its relation to domain
structure. She used the task structure to resolve definite noun phrases in
task-oriented dialogues. Sidner (1979) extended this work to determine
the use of focusing in the resolution of pronoun references and other kinds
of anaphora occurring in dialogues. Rosenberg (1977) has studied how
themes are linked via references in newspaper articles.
Another major reason for focusing is to avoid combinatorial explosion.
The fear of such an explosion is mainly motivated by the large number of
inferences that may be made if all possible frames are activated. In fact,

I am ready
** THIS IS A 26 YEAR OLD MALE PATIENT
My understanding is:
    The age of the patient is 26
    The sex of the patient is male
** FIVE DAYS AGO, HE HAD RESPIRATORY-TRACT SYMPTOMS
What is his name?
** JO
My understanding is:
    The name of the patient is Jo
    Respiratory-tract is one of the symptoms that the patient had
** A COUPLE OF DAYS BEFORE THE ADMISSION, HE HAD A MALAISE
Please give me the date of admission
** MARCH 12, 1979
My understanding is:
    The patient was admitted at the hospital 3 days ago
    Malaise is one of the symptoms that the patient had 5 days ago

FIGURE 33-1  Short sample dialogue. The physician's inputs
appear in capital letters after the double asterisks.

some of the frames might rule out others, thus enabling the space of
possible inferences to be pruned. This issue has also been raised by Charniak
(1978). Embodying world knowledge in frames (Minsky, 1975) or
scripts (Abelson, 1973; Schank and Abelson, 1975) led to the development
of programs that achieved a reasonably deep level of understanding, for
example, GUS (Bobrow et al., 1977), NUDGE (Goldstein and Roberts,
1977), FRUMP (DeJong, 1977), and SAM (Cullingford, 1977).
BAOBAB and the other programs mentioned so far have a common
feature: they do not interpret sentences in isolation. Rather, they interpret
in the context of an ongoing discourse and, hence, use discourse structure.
BAOBAB also explores issues of (a) what constitutes a model for structured
texts and (b) how and when topic shifts occur. However, BAOBAB is
interested neither in inferring implicit facts that might have occurred
temporally between facts explicitly described in a text nor in explaining
intentions of characters in stories (main emphases of works using scripts or
plans). Our program focuses instead on coherence of texts, which is mainly
a task of detecting anomalies, asking the user to clarify vague pieces of
information or disappointed expectations, and suggesting omissions. The
domain of application is patient medical summaries, a kind of text for
which language-processing research has mainly consisted of filling in
formatted grids without demanding any interactive behavior (Sager, 1978).
BAOBAB's objectives are to understand a summary typed in "natural
medical jargon" by a physician and to interact by asking questions or displaying


what it has understood.
The program uses a model of the typical structure of medical summaries,
which consists of a set of related schemata, described below.
BAOBAB uses both its medical knowledge and its model of the usual
description of a medical case to interpret the dialogue or the text and to
produce an internal structure usable by MYCIN. The program then uses
this information to guide a standard consultation session.
BAOBAB behaves like a clerk or a medical assistant who knows what
a physician has to describe and how a malady is ordinarily presented. It
reacts to violations of the model, such as a description that ignores symptoms
or that fails to mention results of cultures that have been drawn. It
does not attempt to use its knowledge to infer any diagnosis but, in certain
cases, can draw inferences that will facilitate MYCIN's task. BAOBAB uses
these capabilities to establish relationships between the concepts stated.
This facilitates interpretations of what is said. For example, BAOBAB
knows that "semi-coma" refers to the state of consciousness of the patient
and "hyperthyroidism" to a diagnosis. One use of the program would be
to allow the physician to volunteer information before or during the con-
sultation. This feature would respond to the common frustration expressed
by some users who object to having to wait for MYCIN to ask a key question
before they can tell it about a crucial symptom.
BAOBAB consists of (a) a parser that maps the surface input into an
internal representation, (b) a set of schemata that provide a model of the
kind of information that the program is ready to accept and of the range
of inferences that it will be able to draw, (c) episode-recognition strategies
that allow appropriate focusing on particular pieces of the texts, and (d)
an English-text generator used to display in a nonambiguous fashion what
has been understood. As described in Chapter 5, this generator was already
available in MYCIN. The main emphasis here will therefore be on the
description of schemata and schema-activation strategies. These techniques
have been successfully implemented, using Interlisp (Teitelman, 1978), in
a program connected with MYCIN's data base and running on the SUMEX
computer at Stanford.

33.2 Schemata and Their Relations

Medical summaries can be viewed as sequences of episodes that correspond


to phrases, sentences, or groups of sentences dealing with a single topic.
Each such topic may be represented by a schema. Processing and under-
standing a text consist of mapping episodes in the text onto the schemata
that constitute the model. Matching a schema can be discontinuous; that
is, two episodes referring to the same schema need not necessarily be
juxtaposed (they might be separated by an episode referring to another
schema). We will refer to this phenomenon as a temporary schema-shift.
A typical scenario is as follows. The medical case is introduced with
general information, such as the date and the reason for admission to the
hospital. Then the patient is presented (name, age, ...). Symptoms (noted
by the patient) and signs (observed by the physician) are described. A
physical exam is usually performed, and cultures are taken for which results
are pending or available. The structure of such a text can be captured
in a sequence of schemata, one of which is shown in Figure 33-2. These
texts are usually well structured. Redundancies can appear, but discrepancies
are rather rare (although they must be detected when they occur).
Expectations are usually satisfied.
A typical BAOBAB schema contains domain-specific knowledge and
resembles a frame (Minsky, 1975), a script (Schank and Abelson, 1975), or a
unit (Stefik, 1979). Relevant slots define expected values, default values,
and attached procedures. Attributes relating to the same topic are gathered
into these schemata. There is some overlap between them (such as
WEIGHT, which can occur in the identification of the patient as well as in
the results of a physical exam). Each schema contains two types of slots:
global slots (comments, creation date, author's name, how to recognize the
schema, what is the preferred position of the schema within summaries)
and individual slots (which correspond to MYCIN's clinical parameters).
Each individual slot contains facets specifying how to fill it in or what actions
to take when it has been filled in (by procedural attachment).
Global slots are mainly used to decide whether a part of the text being
analyzed suggests or confirms a schema or how the confirmation of one
schema causes another one to be abandoned. The slots CONFIRMED-BY
and SUGGESTED-BY point to lists of slots belonging to the schema. The
first defines the schema (characteristic slots), whereas the other is
nonessential for confirming the schema. The slots TERMINATED-BY and
PREF-FOLLOWED-BY specify relationships of mutual exclusion and partial
ordering between schemata. All these slots are described in more detail
in the section devoted to strategies for activating schemata. Nonglobal slots
are always attributes grouped within a schema. Each is, in turn, a schema
whose slots are the facets mentioned above (Roberts and Goldstein, 1977).

33.2.1 An Example of a Schema

In the $DESCRIPT schema (Figure 33-2), the first three global slots (AUTHOR,
CREATION-DATE, and COMMENT) are used for documentation,
whereas the next four are used to define strategies for schema-shifts
(see below). Then six individual slots (corresponding to parameter names)
define the schema. Each of them is described by subslots, or facets, some
of which (e.g., EXPECT, TRANS, LEGALVALS, CHECK, PROMPT)
already exist in the structure of MYCIN's knowledge base. Others have been

$DESCRIPT
  AUTHOR: BONNET
  CREATION-DATE: OCT-10-78
  COMMENT: Patient identification
  CONFIRMED-BY: (NAME AGE SEX RACE)
  TERMINATED-BY: ($SYMPTOM)
  SUGGESTED-BY: (WEIGHT HEIGHT)
  PREF-FOLLOWED-BY: ($SYMPTOM)

  NAME
    EXPECT: ANY
    TRANS: ("the name of" *)
    TOBEFILLED: T
    WHENFILLED: DEMONNAME
  AGE
    EXPECT: POSNUMB
    TRANS: ("the age of" *)
    CHECK: (CHECK VALU 0 100.0 (LIST "Is the patient really"
            VALU "years old?") T)
    TOBEFILLED: T
    WHENFILLED: SETSTATURE
  SEX
    EXPECT: (MALE FEMALE)
    TRANS: ("the sex of" *)
    TOBEFILLED: T
    WHENFILLED: SEXDEMON
  RACE
    EXPECT: (CAUCASIAN BLACK ASIAN INDIAN LATINO OTHER)
    TRANS: ("the race of" *)
  WEIGHT
    EXPECT: POSNUMB
    TRANS: ("the weight of" *)
    CHECK: (CHECK VALU LIGHT HEAVY (LIST "Does the patient
            really weigh" VALU "kilograms?") T)
  HEIGHT
    EXPECT: POSNUMB
    CHECK: (CHECK VALU SMALL TALL (LIST "Is the patient
            really" VALU "centimeters tall?") T)

FIGURE 33-2  Schema of a patient description.

created to allow the program to intervene during the course of the dialogue.
For example, when the slot TOBEFILLED holds the value T (true),
it means that the value of the variable must be asked for if the physician does
not provide it. The WHENFILLED feature specifies a procedure to run
as soon as the slot is filled in. This is the classic way of making inferences.
For example, SETSTATURE specifies narrower ranges of weight and
height for a patient according to his or her age.

33.2.2 Facets

Expected and legal values. EXPECT is used for single-valued parameters,
whereas LEGALVALS is used for multi-valued parameters (see
Chapter 5). They both give a list of possible values for an attribute.

Linguistic information. TRANS always contains a phrase in English
describing the parameter; it is used for generating translations of rules
and other semantic entities. PROMPT contains a question, in English, that
asks the user about the corresponding parameter. It is used, in addition to
the usual way MYCIN asks for information, to clarify a concept recognized
as "fuzzy." For example, entry of the clause "THE PATIENT DRINKS 6
CANS OF BEER EVERY MORNING" leads BAOBAB to ask "Is the patient
alcoholic?" since MYCIN has no explicit knowledge about alcoholic
beverages, but can recognize such keywords as drink or alcohol. CHECK
contains a question that can be used to request verification whenever a
value outside the normal range has been given.

TOBEFILLED. If the TOBEFILLED facet of an attribute is set to T
(true), it means that the slot has to be filled. Concretely, this means that if
the slot has not yet been filled when the schema is abandoned, the attached
request will be carried out. This does not necessarily mean that the parameter
is essential from a clinical point of view; it may be essential for
communication purposes.

33.2.3 Procedural Attachment

In BAOBAB, there are two kinds of procedural attachment. The first,
called WHENFILLED, allows associated actions to be carried out depending
on conditions local to the slot. It is analogous to the "demons" of
Selfridge (1959) or Charniak (1972). The second kind of attachment, called
PREDICATE, is used to specify how to fill a slot and is mentioned last.
These facets allow BAOBAB to:

a. Produce inferences. If the attribute of a clause that has just been built has
an attached procedure, it can trigger the building of another clause; for
example, INFERFEVER is run as soon as the temperature is known and
can lead to a clause such as "The patient is not febrile."
b. Narrow a range of expected values. Consider, for example, the weight of a
patient. This has a priori limits, by default, of 0 and 120 kilograms. This
range is narrowed according to the age of the patient as soon as the
latter is known.
c. Make predictions. An event like "a lumbar puncture" can cause predictions
about "CSF data" (not about their values, but about the fact that

they will be mentioned). These predictions will be checked, and appropriate
questions will be asked if they remain unfulfilled as the dialogue
proceeds.
d. Dynamically modify the grammar. A semantic category like <PATIENT>
can be updated by the name of the patient as soon as it is known. This
update is done by the procedure DEMONNAME as indicated in Figure
33-2.
e. Specify how to fill a slot. Sometimes a procedure expresses the most
convenient way to match a category. This kind of procedure has been called
a "servant." For example, the best way to match a <VALUE> is to know
that it points to its corresponding <ATTRIBUTE>. This is much simpler
than examining the list of 500 values in the dictionary.
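The demon side of this machinery can be sketched in a few lines. This is a toy rendering, not BAOBAB's Interlisp code: the demon names INFERFEVER and SETSTATURE follow the text, but their bodies and the triggering threshold are invented for the illustration.

```python
# Sketch of WHENFILLED-style procedural attachment: filling a slot
# triggers an attached procedure, which may build a new clause (an
# inference) or narrow the expected range of another slot.

clauses = []
expected_weight = (0, 120)   # a priori default range, in kilograms

def infer_fever(value):
    # run as soon as the temperature is known
    clauses.append("The patient is febrile" if value > 98.6
                   else "The patient is not febrile")

def set_stature(age):
    # narrow the expected weight range once the age is known
    global expected_weight
    if age < 12:             # illustrative threshold only
        expected_weight = (0, 60)

demons = {"TEMPERATURE": infer_fever, "AGE": set_stature}
facts = {}

def fill_slot(slot, value):
    facts[slot] = value
    if slot in demons:       # the procedural attachment fires here
        demons[slot](value)

fill_slot("TEMPERATURE", 101.4)
fill_slot("AGE", 8)
print(clauses)
print(expected_weight)
```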

33.2.4 Default Values

BAOBAB distinguishes among three kinds of default values:

a. Some parameters have default values that are negations of symptoms;


for example, TEMPERATURE has "98.6 F" as a default value (negation
of fever), and STATE-OF-CONSCIOUSNESS has "alert" as a default
value (negation of altered consciousness).
b. Other parameters depend on the result of a medical exam or procedure,
and in such cases the default value is simply "unknown." Pointing out
an unknown value to the physician might remind him or her that the
procedure has in fact been carried out and that a result should have
been mentioned. An example of such a default value is that for the
parameter STATE-OF-CHEST, which depends on an x-ray.
c. Finally, some parameters inherit a value from another variable; for
example, the date of a culture might reasonably be the date of admission
to the hospital (if the infection is not hospital-acquired).

Note that any default value assumed by the program is explicitly
stated. This feature allows the user to override the default value when in
disagreement with it (a mandatory feature because a default value might
be used later by the consultation program and therefore be taken into
account in the formation of the diagnosis).
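The three kinds of defaults might be applied as follows when a schema is closed. The slot names follow the text; the table layout, the "inherit" convention, and the mechanics are illustrative, not BAOBAB's implementation.

```python
# Sketch of default filling: negation-of-symptom defaults, "unknown"
# defaults for exam-dependent slots, and defaults inherited from
# another variable. Each assumed default is stated explicitly so the
# user can override it.

DEFAULTS = {
    "TEMPERATURE": "98.6 F",                  # negation of fever
    "STATE-OF-CONSCIOUSNESS": "alert",        # negation of altered consciousness
    "STATE-OF-CHEST": "unknown",              # depends on an exam (x-ray)
    "DATE-OF-CULTURE": ("inherit", "DATE-OF-ADMISSION"),  # inherited value
}

def apply_defaults(facts):
    """Fill unfilled slots, returning a statement for each assumption."""
    stated = []
    for slot, default in DEFAULTS.items():
        if slot in facts:
            continue                           # user-supplied value wins
        if isinstance(default, tuple) and default[0] == "inherit":
            value = facts.get(default[1], "unknown")
        else:
            value = default
        facts[slot] = value
        stated.append("Assuming " + slot + " = " + str(value))
    return stated

facts = {"TEMPERATURE": "101.4 F", "DATE-OF-ADMISSION": "MARCH-12-1979"}
for line in apply_defaults(facts):
    print(line)
```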

33.3 The Grammar

In a technical domain, where specialists write for specialists, terseness of


style is widespread (e.g., "T 101.4 rectal"). Thus a syntactic parsing does
not provide enough additional information to justify its use for text
comprehension. Instead, a computer program can use a semantically oriented


grammar. This grammar makes the parsing process unambiguous and
therefore efficient. Discussions of this point can be found in Burton (1976)
and Hendrix (1976).
BAOBAB's parser uses a context-free augmented grammar [cf. the
augmented transition network of Woods (1970)]. A grammar rule specifies
(1) the syntax, (2) a semantic verification of the parsed tree resulting from
the syntactic component, and (3) a response expression used to build one
or several clauses. The grammar is divided into specific and nonspecific
rules.
Specific grammar rules are associated with the slots of schemata and
describe the way these can be mentioned at the surface level. Categories
used in the rules are things such as <PATIENT>, <SIGN>, and
<DIAGNOSIS>. This link between the grammar and the schemata provides
a means to try, by priority, those grammar rules that are appropriate to
the schema under consideration. Furthermore, it provides a means to
postpone the risk of combinatorial explosion due to the large number of
grammar rules (due to the specificity of the categories used in the productions).
Nonspecific grammar rules use general concepts such as <ATTRI-
BUTE>, <OBJECT>, and <VALUE>, which are commonly used to rep-
resent knowledge in systems. This kind of rule is general enough to be
used in other domains; but once the syntax has been recognized, these
rules must undergo a semantic check in order to verify that, say, values
and attributes fit together, hence the importance of the augmentation of
the grammar mentioned above.
Specific grammar rules enable the system to recognize peculiar con-
structs. For example, "120/98" and "98 F" do not belong to well-known
syntactic classes but have to be recognized as values for blood pressure and
temperature. Grammar rules such as

<VITAL> → <BP> <HIGH/LOW>
<VITAL> → <TEMP> <TEMPNUM> | <TEMP> <NUM> (DEGREES)

are used to parse "BP 130/94" or "T 98 F." The category <TEMPNUM>
has an attached procedure, a specific piece of code that recognizes "F" as
Fahrenheit, detaches it from "98," verifies that 98 is a reasonable value for
a temperature, and finally returns "98 degrees" as the value of the
temperature.
The following are examples of the "syntax" of purely semantic rules:

<sentence> → <patient> <experience> <symptom> <time>
<symptom> → <modifier> <symptom>
<patient> → patient | <name>
<name> → (the name of the patient, usually encountered at
          the beginning of the text)
<experience> → complain of | experience | <have>
<symptom> → headache | malaise | chill | ...
<modifier> → severe | painful | ...
<have> → has | had | ...
<time> → <num> <time-unit> ago | on <date>
<time-unit> → day | week | ...
<num> → 1 | 2 | 3 | ...
<date> → a date recognized by an associated LISP function

This subset of the grammar enables the program to recognize inputs such
as the following:

1. NAPOLEON COMPLAINED OF SEVERE HEADACHE 3 DAYS AGO
2. BILL EXPERIENCED MALAISE ON SEPT-22-1978
3. JANE HAD CHILLS ON 10/10/78
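This grammar subset can be rendered as a short runnable sketch. Categories are matched directly against word lists, so a successful parse yields a semantic structure rather than a syntax tree; the control code and the handling of optional parts are illustrative, not BAOBAB's parser.

```python
# Semantic-grammar sketch for:
#   <sentence> -> <patient> <experience> <symptom> <time>
# using the vocabulary given in the rules above.

SYMPTOMS   = {"HEADACHE", "MALAISE", "CHILL", "CHILLS"}
MODIFIERS  = {"SEVERE", "PAINFUL"}
EXPERIENCE = [("COMPLAINED", "OF"), ("EXPERIENCED",), ("HAD",)]
TIME_UNITS = {"DAY", "DAYS", "WEEK", "WEEKS"}

def parse(sentence):
    words = sentence.upper().split()
    patient, i = words[0], 1                 # <patient> -> patient | <name>
    for pattern in EXPERIENCE:               # <experience>
        if tuple(words[i:i + len(pattern)]) == pattern:
            i += len(pattern)
            break
    else:
        return None                          # no experience verb: parse fails
    modifier = None                          # <symptom> -> <modifier> <symptom>
    if i < len(words) and words[i] in MODIFIERS:
        modifier, i = words[i], i + 1
    if i >= len(words) or words[i] not in SYMPTOMS:
        return None
    symptom, i = words[i], i + 1
    time, rest = None, words[i:]             # <time> -> <num> <time-unit> ago
    if (len(rest) >= 3 and rest[0].isdigit()
            and rest[1] in TIME_UNITS and rest[2] == "AGO"):
        time = " ".join(rest[:3])
    return {"patient": patient, "modifier": modifier,
            "symptom": symptom, "time": time}

print(parse("NAPOLEON COMPLAINED OF SEVERE HEADACHE 3 DAYS AGO"))
```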

Examples of purely syntactic rules are as follows:

<SENTENCE> → <NP> <VP>
<NP> → <NOUN> | <ADJ> <NOUN> | <DET> <ADJ> <NOUN> | <DET> <NOUN> | ...
<VP> → <VERB> | <VERB> <NP> | <VERB> <PREPP>
<PREPP> → <PREP> <NP>

where <NP> stands for noun phrase, <VP> for verb phrase, <DET>
for determiner, <PREPP> for prepositional phrase, and <PREP> for
preposition. This set of rules enables the system to recognize input sentence
1 above (except for the notion of time), as shown in the syntactic tree of
Figure 33-3.
When the semantic component interprets such a syntactic tree, it
checks that <NOUN> is matched by a person (whereas the direct use of
<PATIENT> would make such a verification useless). Input sentences
such as the following would thus be rejected:

4. THE BOAT COMPLAINED OF HEADACHE
5. BILL COMPLAINED OF A SEVERE LEG

<SENTENCE>
   <NP>
      <NOUN>  NAPOLEON
   <VP>
      <VERB>  COMPLAINED
      <PREPP>
         <PREP>  OF
         <NP>
            <DET>  A
            <ADJ>  SEVERE
            <NOUN>  HEADACHE

FIGURE 33-3  A conventional syntactic tree (rendered as an indented outline).

Numerous systems use a representation based on the notion of object-attribute-value
triples with an optional associated predicate-function. In
such domains, one can define grammar rules such as:

<SENTENCE> → <OBJECT/ATTRIBUTE> <PREDICATE-FUNCTION> <VALUE>

<OBJECT/ATTRIBUTE> → <ATTRIBUTE> OF <OBJECT> |
                     <OBJECT> <ATTRIBUTE>

<OBJECT> → PATIENT | CULTURE | ORGANISM | ...

<ATTRIBUTE> → ISATTRIBUTE (attached procedure
              specifying how to recognize an attribute)

<PREDICATE-FUNCTION> → <SAME> | <NOTSAME> | ...

<SAME> → IS | HAS | ...

<VALUE> → ISVALUE (attached procedure specifying how to
          recognize the value of an attribute)

Such "syntactico"-semantic rules enable the recognition of input
sentences such as:

6. THE TEMPERATURE OF THE PATIENT IS 99
7. THE MORPHOLOGY OF THE ORGANISM IS ROD

The complete form of the <SENTENCE> rule is displayed below.
The first line is the syntax, the second is the augmentation, and the third
is the response. CHECKAV (check attribute value) is a function of two
arguments, <ATTRIBUTE> and <VALUE>, that returns "true" if the
value matches the attribute, in which case the response expression is
produced; otherwise, the semantic interpretation has failed.

((<OBJECT/ATTRIBUTE> <PREDICATE-FUNCTION> <VALUE>)
 ((CHECKAV <ATTRIBUTE> <VALUE>)
  (LIST <PREDICATE-FUNCTION> <ATTRIBUTE> <VALUE>)))

It is interesting to note that the predicate function is usually a verb phrase,
and the <ATTRIBUTE> OF <OBJECT> sequence a noun phrase, as is
<VALUE>. This means that a syntactic structure is being implicitly used.
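The interpretation of such an augmented rule can be sketched as follows. The legal-value table and the predicate names are invented for the illustration; only the three-part structure (syntax matched, augmentation checked, response built) follows the rule above.

```python
# Sketch of the syntax / augmentation / response pattern: once the
# syntax has matched, CHECKAV verifies that the value fits the
# attribute; only then is the response clause built.

LEGAL = {
    "TEMPERATURE": lambda v: v.replace(".", "", 1).isdigit(),
    "MORPHOLOGY":  lambda v: v in {"ROD", "COCCUS"},
}

def checkav(attribute, value):
    """Return True if the value matches the attribute."""
    test = LEGAL.get(attribute)
    return test is not None and test(value)

def interpret(predicate, attribute, value):
    if checkav(attribute, value):             # the augmentation
        return [predicate, attribute, value]  # the response expression
    return None                               # semantic interpretation fails

print(interpret("SAME", "MORPHOLOGY", "ROD"))
print(interpret("SAME", "MORPHOLOGY", "99"))  # value does not fit attribute
```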
The interpreter progresses in a left-to-right and top-down fashion,
with backtracking. Whenever a grammar rule is satisfied but a part of the
input remains to be analyzed, the remaining part is given back to the
control structure, which then can invoke special processes; for example, a
conjunction at the head of the remaining input can trigger an attempt to
resolve it as an elliptical input. Thus in "ENGLISH PEOPLE LOVE
BLONDS AND DRINK TEA," the second part can be analyzed as "English
people drink tea." The algorithm implemented for handling elliptical inputs
has been inspired by LIFER (Hendrix, 1976). When an input fails to be
recognized, the interpreter assumes that a part of the input is missing
or implicit, and it looks at the preceding utterance. If parts of the input
match categories used in the grammar rule satisfied by the earlier input,
it then assumes that the parts that have no correspondence in the new
input can be repeated.
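The ellipsis mechanism can be sketched as an alignment of the new fragment against the category pattern satisfied by the previous input. This is a toy version of the LIFER-style idea: the vocabulary, helper names, and matching test are all invented for the illustration.

```python
# Sketch of elliptical-input resolution: walk the categories of the
# previous parse; where a fragment word matches a category, take it,
# and where the fragment has no correspondence, repeat the old value.

VOCAB = {
    "<symptom>": {"HEADACHE", "MALAISE", "CHILLS"},
    "<patient>": {"NAPOLEON", "BILL", "JANE"},
}

def matches(category, word):
    return word in VOCAB.get(category, set())

def resolve_ellipsis(prev_parse, fragment_words):
    """prev_parse: list of (category, value) pairs from the last input."""
    completed, i = [], 0
    for category, value in prev_parse:
        if i < len(fragment_words) and matches(category, fragment_words[i]):
            completed.append((category, fragment_words[i]))
            i += 1
        else:
            completed.append((category, value))  # repeat the missing part
    return completed

prev = [("<patient>", "BILL"), ("<experience>", "HAD"), ("<symptom>", "CHILLS")]
# after "BILL HAD CHILLS", the bare fragment "MALAISE" is read as
# "BILL HAD MALAISE":
print(resolve_ellipsis(prev, ["MALAISE"]))
```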

33.4 Schema-Shift Strategies

A language describing choices between schemata, and therefore schema-shift
strategies, should include an attempt to answer the following questions:
How is a schema focused, confirmed, or abandoned? What are the
links between schemata (such as exclusive or sequencing relations)?

33.4.1 Suggest vs. Confirm

Bullwinkle makes the distinction [Bullwinkle (1977); see also Sidner (1979)]
between potential and actual shifts of focus, pointing out that the cues

suggesting a new frame must be confirmed by a subsequent statement in
order to avoid making unnecessary shifts. This phenomenon is handled
in a different fashion in BAOBAB. Instead of waiting for the suggestion
to be confirmed, a qualitative distinction is made between the slots of a
frame. The ones marked as suggesting but not confirming are regarded
as weak clues and will not lead to a shift of focus, whereas the ones marked
as confirming (hence suggesting) are sufficiently strong clues to command
the shift. This distinction can be illustrated by the following two examples:

1. "The patient was found comatose. She was admitted to the hospital. A
lumbar puncture was performed. She denied syncope or diplopia..."
2. "The patient was found comatose. He was admitted to the hospital. The
protein from CSF was 58 mg%..." (CSF = cerebrospinal fluid)

In Example 1, the lumbar puncture suggests CSF results that are not given
(weak clue). In Example 2, a detail of CSF results (strong clue) is given
directly ("the protein"). In other words, the physician jumps into detail,
and the frame is directly confirmed.
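The weak-clue/strong-clue policy can be sketched as follows. The $CSF schema and its slot names are illustrative; only the rule that CONFIRMED-BY slots command a shift while SUGGESTED-BY slots merely record a suggestion follows the text.

```python
# Sketch: slots in CONFIRMED-BY are strong clues that shift focus
# immediately; slots in SUGGESTED-BY are weak clues that are only
# noted, so no shift happens until a confirming slot appears.

SCHEMATA = {
    "$CSF": {"confirmed_by": {"CSF-PROTEIN", "CSF-GLUCOSE"},
             "suggested_by": {"LUMBAR-PUNCTURE"}},
}

focus, suggested = None, set()

def observe(attribute):
    global focus
    for name, schema in SCHEMATA.items():
        if attribute in schema["confirmed_by"]:
            focus = name              # strong clue: shift focus now
        elif attribute in schema["suggested_by"]:
            suggested.add(name)       # weak clue: remember, do not shift

observe("LUMBAR-PUNCTURE")            # example 1: a suggestion only
print(focus, suggested)
observe("CSF-PROTEIN")                # example 2: the schema is confirmed
print(focus)
```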

33.4.2 Top-down vs. Bottom-up

Sometimes the schema is explicitly announced, as in "results of the culture."
This is a name-driven invocation of the schema. More often, the
instantiation of the schema is content-driven. The clues used are the attributes
associated with the schema, their expected values (if any), and other
concepts that might suggest the frame. For example, "skin" is related to "rash,"
which belongs to the physical exam frame. These are indeed very simple
indices. More sophisticated methods for recognizing the relevant
schema, such as discrimination nets, have been suggested (Charniak,
1978).

33.4.3 Termination Conditions

A simple case in which a schema can be terminated is when all of its slots
have been filled. This is an ideal situation, but it does not occur very often.
Another case is when the intervention of a schema implies that another
schema is out of focus, which could be, but is not necessarily, the result of
chronological succession. In general, this phenomenon occurs when the
speaker actually starts the plot after setting the characters of the story.
There is no standard way to decide when the setting is finished. However,
as soon as the story actually starts, the setting could be closed and possibly
completed with default values or with the answers to questions about
whatever was not clear or omitted. A TERMINATED-BY slot has been created
to define which schemata can explicitly terminate others; for example, the
$SYMPTOM schema usually closes the $DESCRIPT schema (name, age,
sex, race), as it is very unlikely that the speaker will give the sex of the
patient in the middle of the description of the symptoms. This fact is due
to the highly constrained nature of the domain.

33.4.4 Termination Actions

When a schema is terminated, the program infers all the default values of
the unfilled slots. It also checks whether the expectations set during the
story have been fulfilled. These actions can be performed only when a shift
has been detected or at the end of the dialogue; otherwise, the program
might ask too early about information that the user will give later. In the
case where a schema has been exhausted (all its slots filled), an a priori
choice with regard to the predicted next schema is made. This choice is
possible by using a PREFERABLY-FOLLOWED-BY pointer that, in the
absence of a bottom-up (data-driven) trigger for the next schema, decides
in a top-down fashion which schema is the most probable to follow at a
given point.
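Termination can be sketched as one routine: unfilled TOBEFILLED slots generate questions, defaults are inferred for the rest, and the preferred-successor pointer picks the next schema absent a data-driven trigger. The slot contents are abbreviated from Figure 33-2; the control details are illustrative.

```python
# Sketch of termination actions for a schema represented as a dict of
# slots with TOBEFILLED and DEFAULT facets.

def terminate(schema, facts):
    """Return questions to ask and the preferred next schema;
    fill remaining unfilled slots with their defaults."""
    questions = [slot for slot, facets in schema["slots"].items()
                 if slot not in facts and facets.get("TOBEFILLED")]
    for slot, facets in schema["slots"].items():
        if slot not in facts and "DEFAULT" in facets:
            facts[slot] = facets["DEFAULT"]    # stated explicitly, overridable
    return questions, schema.get("PREF-FOLLOWED-BY")

descript = {
    "PREF-FOLLOWED-BY": "$SYMPTOM",
    "slots": {"NAME": {"TOBEFILLED": True},
              "AGE":  {"TOBEFILLED": True},
              "RACE": {"DEFAULT": "unknown"}},
}

facts = {"AGE": 26}
qs, nxt = terminate(descript, facts)
print(qs, nxt)
```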

33.4.5 Schema-Grammar Links

Specific grammar rules described earlier are always associated with clinical
parameters and therefore with schemata. This link is interesting from two
points of view:

a. The interpreter takes advantage of this relationship to try specific rules
in order of decreasing probability of relevance to the schema currently
in focus. There is no quantitative notion of probability, but the preferred
sequencing causes the trial according to priority not only of grammar
rules associated with the activated schema, but also of the ones of the
preferred successor, in case an unforeseen shift occurs. Rules are
reordered whenever a schema-shift occurs, which explains why the more
disorganized presentations of a text take longer to be parsed.
b. The parser can examine the content of a schema during the semantic
interpretation of an input. For example, it can check the correspondence
of an attribute and a value. It can also trigger a question whose
answer is needed to interpret the current input. Therefore, there is a
two-way connection between schemata and the grammar. This link is
one of the key ideas underlying the interactive behavior of the program.
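The reordering described in point (a) can be sketched as a stable sort keyed on the schema currently in focus and its preferred successor. The rule names are invented for the illustration.

```python
# Sketch of rule reordering: rules tied to the schema in focus come
# first, then rules of the preferred successor (in case an unforeseen
# shift occurs), then everything else. A stable sort keeps the original
# order within each priority band.

def order_rules(rules, focus, preferred_next):
    def priority(rule):
        if rule["schema"] == focus:
            return 0
        if rule["schema"] == preferred_next:
            return 1
        return 2
    return sorted(rules, key=priority)

rules = [{"name": "R-culture", "schema": "$CULTURE"},
         {"name": "R-name",    "schema": "$DESCRIPT"},
         {"name": "R-symptom", "schema": "$SYMPTOM"}]

ordered = order_rules(rules, focus="$DESCRIPT", preferred_next="$SYMPTOM")
print([r["name"] for r in ordered])
```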

33.4.6 Comparison with Story-Grammars

Other methods have been proposed to take advantage of the coherent
structure of texts. Psychologists and linguists have attempted to draw a
parallel between the structure linking sentences within a text and the structure
linking words within sentences. The notion of story-grammars, or text-grammars,
grew out of this analogy, leading to the representation as
context-free rules of the regularities appearing in such simple texts as fables.
Rumelhart (1975) describes a story as an introduction followed by
episodes. An episode is an event followed by a reaction. A reaction is an
internal response followed by an overt response, etc. A simple observation
supporting the parallel is that two sentences in sequence usually bear some
kind of relation to each other (often implicit); otherwise, the juxtaposition
would be somewhat bizarre. Recognizing a paragraph as a sequence of
sentences "at a syntactic level" leads to building a tree structure that may
be further used by a semantic component.
The limits of the analogy between phrase structure and text structure
can be easily ascertained. Winograd (1977) underlines the limits of a gen-
erative approach by pointing out that "there are interwoven themes and
changes of scene which create a much more complex structure than can
be handled with a simple notion of constituency." Furthermore, even if one
can give an exhaustive list of words satisfying <NOUN>, it is difficult to
determine how to match a <CONSEQUENCE> or an
<OVERT-RESPONSE>. It follows that whether or not the process of a grammar rule
has been satisfied is not easy to define. Even if we can predict that a
determiner will precede an adjective or a noun, it is much more difficult to
foresee that an emotion will be followed by a reaction, or at least not with
the same regularity. It also seems that the "syntactic" category of a phrase
is strongly domain-dependent. A given sentence may be a consequence or a
reason according to the context. This phenomenon occurs less frequently
with traditionally syntactic categories.
In addition, flashbacks are commonlyused when people tell stories.
In particular, a consequence might very well precede an explanation of an
event. Chronological order is not often respected, as in "Van Gogh had
difficulties to wake up. He had drunk a lot the night before." Along the
same lines, elliptical phenomena(incomplete inputs) seem difficult to re-
solve; if" one can determine the missing part of a sentence by reference to
the syntactic structure of the preceding sentence, it is not easy to guess the
nonstated event that has caused a reaction. The "syntactic" categories of
text-grammars correspond more or less to schemas. The model defined in
BAOBAB merely defines a partial ordering, or links of a preferred order-
ing between schemata. It follows that the introduction may be absent or that
signs may precede symptoms without the text being regarded as incoherent.
Violations of the idealistic model only cause requests for clarification or
additional information. They make the comprehension process more dif-
ficult but do not halt it.
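This tolerance can be sketched as a check against a preferred partial ordering; the schema names and preference links below are invented for illustration, not BAOBAB's actual tables:

```python
# Sketch: a preferred partial ordering between schemata. A shift that
# violates a preference link yields a clarification request rather
# than a parse failure. Schema names and links are illustrative only.

PREFERRED_BEFORE = {            # "X preferably precedes Y" pairs
    ("$DESCRIPT", "$SYMPTOMS"),
    ("$SYMPTOMS", "$SIGNS"),
    ("$SIGNS", "$LABDATA"),
}

def check_shift(seen, new_schema):
    """Return a non-fatal clarification message if the shift is dispreferred."""
    for earlier in seen:
        if (new_schema, earlier) in PREFERRED_BEFORE:
            return f"Note: {new_schema} usually precedes {earlier}; please confirm."
    seen.append(new_schema)
    return None

history = []
for s in ["$DESCRIPT", "$SIGNS", "$SYMPTOMS"]:
    msg = check_shift(history, s)
    if msg:                 # signs arrived before symptoms: ask, don't halt
        print(msg)
        history.append(s)   # ...and accept the input anyway
```

The key design point is that the ordering constraint is advisory: an out-of-order schema shift produces a question for the user, never a rejection of the text.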
628 Strategies for Understanding Structured English

33.5 Directions for Future Work

The BAOBAB grammar is currently not very large (about 200 rules), and only seven schemata have been implemented. In order to be able to parse more complicated texts efficiently, ones involving symptoms that might imply different infections (with interactions between them), a computer program will need more sophisticated clues to determine which schema is the most appropriate to apply and in what priority. Furthermore, certain concepts are currently ignored because their relevance to medical knowledge is not always straightforward. For example, an infection acquired by a member of the patient's family or even the patient's occupation could give hints to the physician during the early stages of the diagnostic formulation, but these are not currently used.

33.6 Sample Sessions

The following examples demonstrate BAOBAB's capabilities as a natural language interface for describing cases to MYCIN. The physician types the input in upper case after the double asterisks; BAOBAB's responses are in lower case. An actual summary written by a physician has been chosen. It is slightly simplified, broken down into inputs, and given as a dialogue to facilitate the interactions. A few italicized comments have been inserted in parentheses to clarify points. Names of schemata start with the symbol $.

In the first example, the exposition of the summary does not satisfy the model (mainly because expectations are not fulfilled), which therefore causes interventions from the program. Questions are asked, analogous to the way in which MYCIN asks them. In the other two examples, the model is satisfied; the dialogue follows a normal course.

33.6.1 First Example

** A 19 YEAR OLD BOY WITH PETECHIAL RASH, LETHARGY AND POSITIVE BRUDZINSKI'S SIGN.
Patient's name?
(The $DESCRIPT schema has been activated by the beginning of the input. Then there is a schema-shift which generates the request for the missing name.)
** BILL
My understanding is:
The name of the patient is Bill
The age of the patient is 19 years
The sex of the patient is male
Petechial is one of the types of rash which the patient has
Lethargy is one of the symptoms that the patient had
Brudzinski is one of the signs that the patient showed
** HE HAD AN LP
My understanding is:
A lumbar puncture has been performed on the patient
** BYE
(Unfulfilled expectations trigger requests for CSF results.)
What is the total WBC count of the CSF?
** 9500
What percent of the WBCs in the CSF were polys?
** 70
What is the protein value (mg/100cc) in the CSF?
** 32
What is the glucose value (milligrams percent) in the CSF?
** 12

33.6.2 Second Example

** REASONS FOR ADMISSION: CHILLS HEADACHE MALAISE    (sets up expectations)
So I will expect chill headache malaise as values for symptoms
** DATE OF ADMISSION: DECEMBER 25 1978
My understanding is that:
the patient was admitted at the hospital 4 days ago
** HISTORY OF THE PRESENT ILLNESS    (announces the $IDENTIFICATION schema)
Ok.
** PAT IS A 26 YEAR OLD CAUCASIAN FEMALE
My understanding is that:
the age of the patient is 26 years
the race of the patient is Caucasian
the sex of the patient is female
the name of the patient is Pat
** ONE DAY PRIOR TO ADMISSION SHE EXPERIENCED CHILLS AND HEADACHE    (shift to $SYMPTOMS)
My understanding is that:
chill is one of the symptoms that the patient had 5 days ago
headache is one of the symptoms that the patient had 5 days ago
(time inferences)
** ONE DAY AFTER SHE HAD A MALAISE
My understanding is that:
malaise is one of the symptoms that the patient had 4 days ago
** A LUMBAR PUNCTURE WAS PERFORMED    (weak clue to $LABDATA)
My understanding is that:
a lumbar puncture has been performed on the patient
** SHE DENIED SYNCOPE OR PHOTOPHOBIA    ($SYMPTOMS and $SIGNS)
My understanding is that:
syncope is not one of the symptoms that the patient had
photophobia is not one of the symptoms that the patient had
** SOCIAL HISTORY: OCCASIONAL ALCOHOL USE
Do you suspect recent alcoholic history in Pat?    (clarification)
** NO
My understanding is that:
the patient is not an alcoholic
** PHYSICAL EXAM    (announces $EXAM)
Ok.
** WEIGHT 110 POUNDS
My understanding is that:
the weight of the patient is 49.9 kgms
** TEMP 101 DEGREES P 90 BP 110/75 RR 28
My understanding is that:
the temperature of the patient is 101
the pulse of the patient is 90
the blood pressure of the patient is 110/75
the respiratory rate of the patient is 28 breaths/minute
** SKIN NORMAL    (leads to a clause via inference)
My understanding is that:
the patient does not have a rash or cutaneous lesions
** NECK: MARKED NUCHAL RIGIDITY    (another inference)
My understanding is that:
meningismus is one of the signs that the patient showed
** BRUDZINSKI'S SIGN WAS POSITIVE
My understanding is that:
Brudzinski is one of the signs that the patient showed
** LAB DATA    (announces $LABDATA)
Ok.
** CSF RESULTS    (sets a context)
Ok.
** WBC 9500    (understood in context)
My understanding is that:
the WBC from the CSF is 9500
** PROTEIN 56
My understanding is that:
the CSF protein is 56
** GLUCOSE 45 MG%
My understanding is that:
the CSF glucose value is 45
** PERIPHERAL CBC    (sets another context)
Ok.
** WBC 23.5    (understood in context)
My understanding is that:
the white count from the patient's peripheral CBC (in thousands) is 23.5
** THE VALUE OF WBC IS 22.5    (corrects a mistyping)
Instead of 23.5?    (making sure that it was meant)
** YES
My new understanding is that: The white count from the patient's peripheral CBC (in thousands) is 22.5
** BYE
I have assumed the following by default:    (inferences by default)
the state of consciousness of the patient is alert
the patient's X-ray is not abnormal
the patient is not immunosuppressed
Do you agree?
** THE PATIENT IS IMMUNOSUPPRESSED    (overrides a default value)
My new understanding is that: The patient is immunosuppressed
Do you agree?
** YES
bye.
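The time inferences annotated in the second example (admission 4 days ago, "one day prior to admission" resolved to 5 days ago, "one day after" that to 4 days ago) amount to simple arithmetic on "days ago" values. A hypothetical sketch (`resolve` is an invented helper, not BAOBAB's code):

```python
# Sketch of the relative-time inference seen in the dialogue above:
# onsets phrased relative to an event are resolved against that
# event's "days ago" value. The helper below is invented.

def resolve(reference_days_ago, offset_days):
    """offset_days < 0: that many days BEFORE the reference event;
    offset_days > 0: that many days AFTER it."""
    return reference_days_ago - offset_days

admission = 4                     # "admitted at the hospital 4 days ago"
chills = resolve(admission, -1)   # "one day prior to admission"
malaise = resolve(chills, +1)     # "one day after" the chills
print(chills, malaise)            # -> 5 4, as in the dialogue
```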

33.6.3 Third Example

** REASONS FOR ADMISSION: HEADACHE, LETHARGY AND FEVER
So I will expect HEADACHE LETHARGY FEBRILE as symptoms
** DATE OF ADMISSION ON MARCH 14
That was 6 days ago.
My understanding is
the patient was admitted at the hospital 6 days ago
** BARBARA IS A 3 YAER OLD WHITE FEMALE
= YEAR    (automatic spelling correction)
My understanding is
the age of the patient is 3 years
the race of the patient is Caucasian
the sex of the patient is female
the name of the patient is Barbara
** ONE DAY PRIOR TO ADMISSION, SHE BECAME LETHARGIC
My understanding is
lethargy is one of the symptoms that the patient had 7 days ago
** SHE COMPLAINED OF HEADACHE AND HAD A TEMPERATURE OF 101 DEGREES
My understanding is
headache is one of the symptoms that the patient had
the temperature of the patient is 101
the patient is febrile
** A PHYSICIAN OBTAINED A COMPLETE BLOOD COUNT
My understanding is
the results of a recent CBC of the patient are available
** A LUMBAR PUNCTURE WAS PERFORMED
My understanding is
a lumbar puncture has been performed on the patient
** WEIGHT 39 LBS
My understanding is
the weight of the patient is 17.7 kgms
** HEIGHT 1.03 METER
Is the patient really 103.0 centimeters tall?    (asking for confirmation because that seems tall for 3 years)
(Y or N) ** YES
My understanding is
the height of the patient is 103 cms
** BP 120/70 P 140
My understanding is
the blood pressure of the patient is 120/70
the pulse of the patient is 140
** RR 28 T 101    (redundancy, the temperature was already known)
My understanding is
the respiratory rate of the patient is 28
** SKIN: PETECHIAE ON FACE AND SHOULDERS
My understanding is
petechial is one of the types of rash which the patient has
** NECK: DEFINITE MENINGISMUS
My understanding is
meningismus is one of the signs that the patient showed
** LAB DATA: CSF RESULTS    (announces lab results; no clause is built)
Ok.
** WBC 110 WITH 96% POLYS    (interpretation in the context of CSF)
My understanding is
the WBC from the CSF is 110
the percent of PMNs in the CSF is 96
** PROTEIN 25 MG% AND GLUCOSE 95
My understanding is
the CSF protein is 25
the CSF glucose value is 95
** PERIPHERAL CBC
Ok.
** WBC 12300
Is the white count from the patient's peripheral CBC (in thousands) really 12300?    (the physician forgot that a number of thousands was expected)
** NO
Please enter the right value now
** 12.3
My understanding is
the white count from the patient's peripheral CBC (in thousands) is 12.3
** WITH 76% POLYS AND 1% BANDS
My understanding is
the percent of PMNs in the CBC is 76
the percent of peripheral WBCs which are immature in the CBC is 1
** SIMULTANEOUS BLOOD GLUCOSE: 170
My understanding is
the simultaneous blood glucose value is 170
** BYE
I have assumed the following by default:
the state of consciousness of the patient is alert
the patient has not shown symptoms of mumps
it is unknown whether the patient's X-ray is abnormal
the patient does not have a rash or cutaneous lesions
the patient does not have evidence of ocular nerve dysfunction
the patient is not an alcoholic
the patient is not immunosuppressed
it is unknown whether the patient does have a TB risk factor
the patient is not a compromised host
the patient does not live in a crowded environment
the patient has not been recently exposed to a contagious disease
the patient has not undergone surgery
the infection was not acquired while the patient was hospitalized
the patient has not been seriously burned
there is not evidence of congenital malformation involving the central nervous system
the patient is not a diabetic
the patient is not a premature infant
it is unknown whether the patient has had an injury or insult to, or defect in the central nervous system
You can change any of them if you disagree
** SHE HAS UNDERGONE SURGERY
I suppose it should replace: the patient has not undergone surgery
Right?
** YES
Ok.
** GOOD-BYE

33.7 Conclusion

The strategies outlined in this chapter could be applied to a broad range of structured texts. The approach rests on the assumption that the texts' scenarios can be seen as sequences of episodes, identifiable by the program, in order to be integrated into appropriate schemata. Therefore, clustering attributes into framelike structures must make sense in the domain of application. The episodes could simultaneously refer to several schemata; that is, the associated schemata could have slots in common. Furthermore, it should be possible to define partial-ordering links between schemata. The relationships could be rather loose, but the more constrained they are, the better this feature would work.

Expert systems usually need some kind of understanding to communicate in natural jargon with their users (expert, consultant, and/or student). The technique described here--breaking the knowledge down into schemata that correspond to different pieces of texts, associating semantic grammar rules with the schemata, and using strategies for recognizing episode shifts--should be generally applicable in such domains.
34

An Analysis of Physicians' Attitudes

Randy L. Teach and Edward H. Shortliffe

Despite the promise of medical computing innovations, many health care professionals have expressed skepticism about the computer's role as an aid to clinicians. A number of barriers have been noted. For example, Friedman and Gustafson (1977) have suggested that system designers tend to develop systems that are neither convenient for physicians nor responsive to their needs. Glantz (1978) has questioned the trade-off in costs and benefits for most medical computing applications, including computer-assisted consultations. Schwartz (1970) has noted that physicians are wary of formal decision aids because they perceive such tools to be a threat to their jobs and to their professional stature. He has also suggested that physicians are concerned about their ability to learn how to use computer systems (Schwartz, 1979), but that they simultaneously fear the prospect of being "left behind" if they fail to keep current. Other observers (Eisenberg, 1974; Weizenbaum, 1976) have questioned the role of computers as clinical consultation systems, suggesting that computer-based consultants may be an inappropriate use of computing technology that will inevitably degrade and debase the human function.

Observations such as these are generally based on personal experience without benefit of formal studies of physicians' attitudes. The few available studies have sought physicians' opinions regarding computing technology in general, but have tended not to specifically examine attitudes regarding the clinical introduction of computers. One early study (Mayne et al., 1968) found little physician interest or faith in the role of computing technology.
However, Startsman and Robinson (1972) and others (Day, 1970; Resnikoff et al., 1967) have reported supportive physician attitudes. A follow-up to the Startsman and Robinson study by Melhorn and coworkers (1979) produced almost identical results, but also noted that physicians might be reluctant to accept the clinical use of computing technology.

This chapter is based on an article originally appearing in Computers in Biomedical Research 14:542-558 (December 1981). Copyright 1981 by Academic Press. All rights reserved. Used with permission.

Motivation for the Current Study

Our study was motivated by the belief that the future of research in medical computing, particularly the development of computer-based consultation systems, depends on improving our understanding of the needs, expectations and performance demands of clinicians. The previous studies had not specifically addressed these issues. Our study used a questionnaire, similar in format to the instrument developed by Startsman and Robinson (1972) but different in content. One modification was to limit the scope of our survey by focusing only on physicians' attitudes regarding clinical consultation systems. Previous studies had been more general in their focus and had surveyed a broader range of opinion. We chose this more limited focus because several research groups currently developing medical consultation systems are concentrating on physician users and have recognized the need for better information about the concerns and performance demands of clinicians. Another change was the inclusion of statements designed to ascertain the performance capabilities that physicians consider necessary for a consultation program to be clinically acceptable. Previous studies had not addressed this important aspect. We hoped that with these modifications the study would yield results from which guidelines could be formulated to help medical computing experts design more acceptable clinical consultation systems.

Relationship Between Physicians' Characteristics and Attitudes

A second objective of the study was to test the common assumption that prior experience with computers affects attitudes about the clinical use of computing technology. We therefore included measures of both computing experience and knowledge of computing concepts in the questionnaire. A number of other demographic variables were also included.

Impact of a Medical Computing Course on Attitudes

A third objective was to assess the impact of an intensive medical computing course on physicians' attitudes. The authors of both of the previous major studies (Startsman and Robinson, 1972; Melhorn et al., 1979), as well as others (Levy, 1977), had speculated that intensive educational efforts might result in increased acceptance of medical computing by physicians. Partly to test this assumption, we designed a medical computing tutorial and measured its impact on the attitudes of the physician attendees.1 The tutorial faculty consisted of 15 physicians and computer scientists who are active researchers in the development of computer-based clinical consultation systems. Presentations encompassed the researchers' work, goals, and perspective on the role of computer-assisted decision making in clinical medicine. An introductory session was included to introduce physicians to general computing concepts and terminology.

34.1 Methods

34.1.1 Instrument

A survey instrument (questionnaire) was developed to measure physicians' attitudes regarding computer-based consultation systems. Attitudes were measured by the instrument along three dimensions: (1) the acceptability of different medical computing applications; (2) expectations about the effect of computer-based consultation systems on medicine; and (3) demands regarding the performance capabilities of consultation systems. Every effort was made to include items representative of the design issues that are currently being considered by medical computing experts. We performed extensive pilot testing of the questionnaire prior to its use in the study.

Acceptance was measured by asking physicians about eight real or imagined medical computing applications. The applications ranged from computer-based medical records to the use of computers as substitutes for physicians in underserved areas (Table 34-1). The Expectation- and Demand-scales included statements about medical computing, emphasizing the potential role of computer-based consultation systems. Each statement used a Likert-type scale in which respondents were instructed to mark one of five categories: (1) strongly disagree, (2) somewhat disagree, (3) not sure, (4) somewhat agree, (5) strongly agree.

The Expectation-scale (E-scale) included 17 statements and was designed to measure physicians' opinions about how computer-based consultations are likely to affect the practice of medicine (i.e., how computers will affect medical practice).2 The Demand-scale (D-scale) of 15 statements sought physicians' opinions regarding the most desirable performance capabilities for computer-based consultation systems (i.e., what computers should be able to do).3 The possible range of ratings for statements on both the E- and D-scales is -2 to +2. On the E-scale a positive rating means that respondents felt that the stated effect is not likely to occur, and a negative rating means that they felt that the effect is likely. On the D-scale a positive rating means that the item was judged to be an important capability for computer-based clinical systems, and a negative rating means that it is judged to be unimportant.

A set of background questions was also included on the questionnaire. These included items about medical specialty, type of practice (academic medicine or private practice), number of years since receiving the M.D. degree, percentage of time devoted to research, and extent of prior experience with computers. All questions in this group contained fixed response categories. A second set of 22 questions asked respondents to indicate their (self-reported) level of knowledge about computers and computer science concepts.

1The tutorial was offered by the Departments of Medicine and Computer Science at Stanford University in August of 1980. It was organized in conjunction with the Sixth Annual Workshop on Artificial Intelligence in Medicine, which was sponsored by the Division of Research Resources of the NIH.

2The statements are shown in Table 34-3. For identification purposes in this paper, each is identified by an E followed by a number. The letter E denotes that the statement belongs to the Expectation-scale.
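The scoring just described (a Likert category mapped onto the -2 to +2 range, then averaged per statement) can be sketched as follows. The responses are invented, and the use of the population standard deviation is an illustrative assumption, not necessarily the study's computation:

```python
# Sketch of the rating scale: Likert categories (1..5) mapped onto the
# -2..+2 range used for the E- and D-scales, with a statement's mean
# and standard deviation as in Tables 34-3 and 34-5. Data are made up.
from statistics import mean, pstdev

def score(category):
    """(1) strongly disagree .. (5) strongly agree  ->  -2 .. +2"""
    assert category in (1, 2, 3, 4, 5)
    return category - 3

responses = [1, 2, 4, 5, 3, 4]        # hypothetical raters for one statement
ratings = [score(c) for c in responses]
print(round(mean(ratings), 2), round(pstdev(ratings), 2))
```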

34.1.2 Participants

Two samples of physicians were included in the study. One included registrants for the tutorial mentioned above. The 85 physicians who filled out the questionnaire represented 90% of the physicians registered for the tutorial. Twenty-nine nonphysician attendees who were engaged in either basic medical research or medical computing also returned survey forms.

By announcing that the course was appropriate for physicians with little or no knowledge of medical computing, we hoped to attract a cross section of physicians. Although continuing medical education (CME) credit was also available, we were aware that the backgrounds and attitudes of these physicians might contrast with those who chose not to attend the tutorial. Therefore, a second sample of physicians was selected from Stanford Medical School clinical faculty and from Stanford-affiliated physicians practicing in the surrounding community.

34.1.3 Procedure

The questionnaire was included in the preregistration packet that was mailed to all tutorial registrants approximately one month before the course. A cover letter asked respondents to complete and return the questionnaire as soon as possible so that the results could be used to guide the speakers' presentations. At the end of the tutorial, participants were asked to complete the same questionnaire for a second time. A respondent-selected code number facilitated matching of pretutorial and posttutorial results. To encourage open and unbiased responses, the respondents were assured of anonymity.

3The Demand-scale statements are shown in Table 34-5. Each statement is identified by a D followed by a number.
The second sample, stratified by medical specialty, was randomly selected from the roster of Stanford Medical School faculty and affiliated community physicians. These individuals, 57 faculty members and 92 affiliated physicians, received a questionnaire with a cover letter requesting their help with the research study and assuring them of anonymity. The letter also invited them to participate in the tutorial and instructed them to return the registration form instead of the questionnaire if they wished to do so. None chose to register.4 A follow-up letter was sent to the entire 149-member sample three weeks after the original mailing to maximize questionnaire return. Sixty-one questionnaires of the original 149 were eventually returned (41%).

Nonparametric Chi-square analysis was used to compare the tutorial and nontutorial samples. Reliability of the attitude scales was determined on a subsample of ten subjects (Cronbach, 1970). Internal consistency of the scales was calculated by correlating odd and even items and correcting the resulting correlations using the Spearman-Brown formula (Cronbach, 1970). Means and standard deviations were computed for each of the individual statements included on the three attitude scales. The Expectation- and Demand-scales were subjected to factor analysis to identify meaningful subgroupings of statements. Principal factoring with iteration was employed (Nie et al., 1975). Simplification of the factor structure was obtained by oblique rotation with delta set equal to zero. Analysis of variance was used to compare the attitudes of physicians with different backgrounds and knowledge of medical computing. Analysis of variance was also used to compare pretutorial and posttutorial ratings.
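The split-half reliability computation described above (correlate odd- and even-item half-scores, then apply the Spearman-Brown correction r_full = 2r / (1 + r)) can be sketched as follows. The subject data are invented and the helper is hypothetical, not the study's actual code:

```python
# Sketch of the internal-consistency computation: correlate odd- and
# even-item half-scores, then apply the Spearman-Brown correction
# r_full = 2r / (1 + r). All data below are invented.
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one list of item ratings (-2..+2) per subject."""
    odd = [sum(s[0::2]) for s in item_scores]    # items 1, 3, 5, ...
    even = [sum(s[1::2]) for s in item_scores]   # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)                       # Spearman-Brown correction

subjects = [
    [2, 1, 2, 2], [0, 0, 1, 0], [-1, -2, -1, -1], [1, 1, 0, 1], [2, 2, 1, 1],
]
print(round(split_half_reliability(subjects), 2))
```

The Spearman-Brown step corrects for the fact that each half-scale is only half as long as the full scale, which is why the reported D- and E-scale coefficients (r = .70 and r = .83) are already length-corrected estimates.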

34.2 Results

34.2.1 Characteristics of Physicians Studied

The final sample of 146 physicians included subsamples of 85 tutorial participants and 61 physicians who were associated with Stanford University Medical Center but who chose not to participate in the tutorial (control group). Of the combined sample, 43% were in medical fields (internal medicine, family practice, pediatrics, general practice), 27% were from surgical fields (general surgery, surgical subspecialties, obstetrics/gynecology, anesthesiology), and 30% were from other specialties (primarily radiology and pathology). There was no significant difference between the two subsamples (Chi-square = 5.16, p > .05).

4All recipients had also received an initial announcement for the course several weeks earlier, and none had registered in response to the initial mailing.
Of the combined sample, 44% were academicians, 45% were in private practice, and 11% were Stanford house staff.5 Differences between the subsamples (Chi-square = 6.28, p < .01) were due to the separation of the house staff group from the academic subgroup. A separate analysis of house staff responses to the questionnaire items revealed that they had response patterns almost identical to those of the academicians. Incorporation of the house staff into the academic category resulted in comparable frequencies for the attendees and controls (Chi-square = 4.93, p > .05).

Of the combined sample, 31% had fewer than 10 years of experience since graduating from medical school, 22% 10 to 20 years, and 47% more than 20 years. Differences between the attendees and controls were not significant (Chi-square = 3.24, p > .20). While 43% of subjects reported that they devoted no time to research, 27% devoted less than a third of their time, and only 30% devoted more than a third of their time to research. The difference between attendee and control groups was not significant (Chi-square = 5.73, p > .05). Finally, 46% reported no computing experience, 32% had had some experience (i.e., at least running "canned" computer programs), and 22% reported extensive experience including the design of computing systems. There was no significant difference between the tutorial attendees and the controls (Chi-square = 3.17, p > .20).

34.2.2 Acceptance Ratings

The options for the Acceptance question are shown in Table 34-1. Physicians had an average Acceptance rating of 5.5 applications out of the 8 included on the scale. The table shows that support for the 5 major applications exceeded 80% of respondents.

Medical specialty was the only characteristic that was significantly predictive of a respondent's Acceptance of computing applications. Table 34-2 shows that surgeons were less accepting of medical computing applications than either of the other two subgroups. There was no significant difference in the Acceptance rating between tutorial and nontutorial participants, private practice and academic physicians, those with several years in practice and those who had recently graduated, physicians engaged in research and those who were not, or physicians with and without computing experience.

5All house-staff subjects were tutorial attendees rather than members of the control group.
[Table 34-1: acceptance ratings for the eight medical computing applications]
TABLE 34-2 Scheffé Comparison of Acceptance Ratings for Subgroups of Medical Specialists

Specialty      Mean   Std. dev.   Significance
1. Medical     6.03   1.55        1 vs. 2 → p < .01
2. Surgical    4.35   1.82        2 vs. 3 → p < .01
3. Other       5.67   1.84
Total          5.45   1.84

34.2.3 Expectation Ratings

Table 34-3 displays the ratings and standard deviations for each statement on the Expectation-scale. The statements are listed in order of their average ratings, from those outcomes that physicians thought were the most likely to occur to those that were expected to occur less frequently. The average Expectation rating for physicians was slightly positive (mean = .42). This was comparable to that of the nonphysician sample, shown in the right-hand column. Only 3 of the 17 statements received negative ratings (i.e., were judged likely to occur), including fears about the possibility that consultation systems will increase government control of medicine, concerns that systems will increase the cost of care, and expectations that patients will blame the computer program for ineffective treatment decisions. On the other hand, physicians felt strongly that consultation systems would neither interfere with their efficiency nor force them to adapt their thinking to the reasoning process used by the computer program. They also felt that the use of consultation systems would not reduce the need for either specialists or paramedical personnel.

Subgroups of physicians displayed significant differences in their Expectations about how computer-assisted consultations will affect medical practice. The means and standard deviations for all the significant findings are summarized in Table 34-4. A significance level of .01 was used for each analysis in order to maintain an overall significance level of less than .06. The Expectations of tutorial registrants were on the average more positive than those of the nontutorial group, although neither group thought that consultation programs would adversely affect medical practice. Physicians in academic settings and those in training indicated overall positive Expectations, whereas private practice physicians tended to hold slightly negative Expectations. Young doctors expressed more positive Expectations than did physicians with 10 to 20 years of experience, although the recent graduates were no more positive than physicians with at least 20 years of experience. Experience with computers was positively related to Expectations, as was Knowledge about computing concepts.
TABLE 34-3 Means and Standard Deviations (in Parentheses) for Ratings of Expectation Statements

Statement: Physicians (n = 146) / Nonphysicians (n = 29)
E1. Will increase government control of physicians' practices: -.26 (1.23) / .15 (.95)
E2. Will be blamed by patients for errors in management: -.23 (1.15) / -.30 (1.10)
E3. Will increase the cost of care: -.14 (1.07) / .44 (1.09)
E4. Will threaten personal and professional privacy: .02 (1.41) / .50 (1.45)
E5. Will result in serious legal and ethical problems (e.g., malpractice): .32 (1.06) / -.04 (.98)
E6. Will threaten the physician's self-image: .32 (1.23) / .15 (1.01)
E7. Will be hard for physicians to learn: .34 (1.17) / .85 (.95)
E8. Will result in reliance on cookbook medicine and diminish judgment: .43 (1.34) / .92 (1.14)
E9. Will diminish the patient's image of the physician: .45 (1.16) / .74 (1.10)
E10. Will be unreliable because of computer "malfunctions": .51 (1.09) / 1.07 (.83)
E11. Will dehumanize medical practice: .53 (1.34) / 1.04 (1.09)
E12. Will depend on knowledge that cannot easily be kept up to date: .53 (1.20) / 1.00 (1.00)
E13. Will alienate physicians because of electronic gadgetry: .62 (1.03) / .41 (1.08)
E14. Will force physician to think like computer: .73 (1.15) / 1.19 (1.00)
E15. Will reduce the need for paraprofessionals: .83 (.91) / .82 (1.08)
E16. Will reduce the need for specialists: .99 (1.07) / 1.11 (1.09)
E17. Will result in less efficient use of physicians' time: 1.05 (.84) / 1.56 (.58)
Total scale: .42 / .68
TABLE 34-4 Scheffé Comparisons of Expectations for Physicians with Different Characteristics

Characteristic             Group                 Mean   Std. dev.   Significance
Totals                                           .41    .59
Professional orientation   1. Academic           .55    .58         1 vs. 2 → p < .01
                           2. Private            .22    .59         3 vs. 2 → p < .01
                           3. Training           .64    .48
Clinical experience        1. < 10 yrs.          .59    .52         1 vs. 2 → p < .01
                           2. 10 to 20 yrs.      .18    .54
                           3. > 20 yrs.          .39    .63
Computing experience       1. Little or none     .24    .62         1 vs. 3 → p < .01
                           2. Moderate           .50    .58
                           3. Extensive          .63    .47

34.2.4 Demand Ratings

Table 34-5 depicts statements on the Demand-scale, ordered from most to least important according to the average rating each received. Physicians' Demands were significantly less than those of the nonphysicians, although the ranked ordering of each Demand statement was almost the same for the two groups. A system's ability to explain its advice was thought to be its most important attribute. Second in importance was a system's ability to understand and update its own knowledge base. Improving the cost effectiveness of tests and therapies was also important. Physicians did not think that a system has to display either perfect diagnostic accuracy or perfect treatment planning to be acceptable. On the other hand, they would not accept the use of a consultation system as a standard for acceptable medical practice, nor would they recommend reducing the amount of technical knowledge that physicians have to know just because a consultation system is available. The differences found among physician subgroups on the Expectation-scale were not evident on the Demand-scale.

A test-retest reliability coefficient of r = .94 was obtained across two administrations of the three scales: Acceptance, Expectations, and Demands. The split-half reliability for the D-scale was only r = .70, and that of the E-scale was r = .83. These rather modest split-half reliabilities suggested to us that the scales were measuring more than one aspect of physicians' attitudes. In order to better understand the structure of the physicians' attitudes measured, these scales were subjected to factor analysis. Five major groups of statements (factors) were extracted from the combined scales and are described below. Correlations among them were low, ranging from .01 to .19, except for Factors 1 and 5, which correlated at .31. The factors accounted for 45% of the total variance of the combined scales.
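The split-half coefficients quoted above can be computed from raw ratings by correlating two half-scale scores and applying the Spearman-Brown step-up correction. The sketch below is a minimal illustration of that standard procedure, not the authors' actual analysis; the odd/even split and the sample data are assumptions for demonstration.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(ratings):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    `ratings` is a list of per-respondent lists of item ratings
    (hypothetical data; any consistent split of items would do).
    """
    odd = [sum(row[0::2]) for row in ratings]   # score on one half-scale
    even = [sum(row[1::2]) for row in ratings]  # score on the other half
    r_halves = pearson(odd, even)
    return 2 * r_halves / (1 + r_halves)        # Spearman-Brown step-up
```

A coefficient near 1.0 indicates a homogeneous scale; the modest values reported here (.70 and .83) are what prompted the factor analysis.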

TABLE 34-5  Mean Ratings and Standard Deviations (in Parentheses) for Demand Statements

                                                                Physicians    Nonphysicians
                                                                n = 146       n = 129
D1.  Should be able to explain their diagnostic and             1.42 (.80)    1.78 (.42)
     treatment decisions to physician users
D2.  Should be portable and flexible so that physician          1.14 (.81)    1.52 (.51)
     can access them at any time and place
D3.  Should display an understanding of their own                .99 (.94)    1.48 (.80)
     medical knowledge
D4.  Should improve the cost efficiency of tests and             .85 (.99)    1.11 (1.58)
     therapies
D5.  Should automatically learn new information when             .84 (1.02)   1.41 (.75)
     interacting with medical experts
D6.  Should display common sense                                 .75 (1.20)   1.11 (.97)
D7.  Should simulate physicians' thought processes               .64 (1.16)    .93 (1.07)
D8.  Should not reduce the need for specialists                  .46 (1.18)    .70 (1.07)
D9.  Should demand little effort from physician to learn         .35 (1.20)   1.19 (.92)
     or use
D10. Should respond to voice command and not require             .26 (1.23)    .56 (1.05)
     typing
D11. Should not reduce the need for paraprofessionals            .26 (1.06)    .85 (1.03)
D12. Should significantly reduce amount of technical            -.08 (1.34)    .00 (1.49)
     knowledge physician must learn and remember
D13. Should never make an error in treatment planning           -.25 (1.33)   -.22 (1.34)
D14. Should never make an incorrect diagnosis                   -.45 (1.31)   -.26 (1.46)
D15. Should become the standard for acceptable medical          -.80 (1.13)    .00 (1.07)
     practice
Total scale                                                      .44           .81

TABLE 34-6  Intercorrelation of Physicians' Computing Knowledge, Acceptance, Expectations, and Demands

              Demands   Expectations   Knowledge
Acceptance    .27*      .26*           .27*
Knowledge     .08       .26*
Expectations  .05

*p < .001

Factor 1 includes statements E7, E8, E11, E13, and E17 (Table 34-3). It relates to Expectations about how physicians might be personally affected by a consultation system. All of these statements received positive ratings (i.e., the outcomes were judged to be unlikely), ranging from .34 to 1.05.6 Factor loadings for the statements ranged from .43 to .59.

Factor 2 includes statements D1, D2, D3, D5, and D6 from the D-scale (Table 34-5). The factor is composed of the performance Demands thought by physicians to be the most important. Ratings of the statements ranged from .75 to 1.42. Factor loadings for the statements ranged from .41 to .65.

Factor 3 relates to Demands about system accuracy. It includes statements D13 and D14, which were rated relatively unimportant by the respondents. Factor loadings were .84 and .89, respectively.

Factor 4 includes statements from both scales and relates to physicians' attitudes regarding the effect of computing systems on the need for health care personnel. It includes statements E15, E16, D8, and D11. The factor reflects the opinion that consultation systems will not and should not affect the need for either specialists or paraprofessionals.

Factor 5 includes statements E1, E4, E5, E6, E8, E9, and E11 from the E-scale. It is similar to Factor 1 because statements E8 and E11 relate to both factors; however, its focus appears to be slightly different. Whereas Factor 1 related to the individual practitioner, Factor 5 is concerned with the effect of consultation programs on medical practice in general. Factor loadings ranged from -.70 to -.41.

Nearly the same pattern of differences among physicians was found for the factors as was found for the full-scale ratings. Individual differences in Expectations on Factors 1 and 5 were related to differences in knowledge about computer concepts, experience with computers, time in medical practice, professional orientation, and tutorial participation. Individual differences were not found on ratings of the other three factors.
Table 34-6 shows the relationship between the scale ratings and Knowledge about computers and medical computing concepts. Acceptance was moderately related to Knowledge, Expectations, and Demands. Knowledge was also related to Expectations but not to Demands, and Expectations were unrelated to Demands. These results are consistent with the differences reported above for the analyses of variance.

6 Factor loadings can range from -1.0 to +1.0 and indicate the degree of relationship between each statement and the factor.

34.2.5 Tutorial Findings

Of the tutorial participants, 50% completed the posttutorial questionnaire. The posttutorial sample did not differ from the pretutorial group on any of the sample characteristics, including medical specialty, professional orientation, years of medical experience, time devoted to research, or computing experience.

The tutorial affected physicians in two ways. First, it significantly increased their self-reported knowledge about computing concepts from a mean of 15.0 concepts to a mean of 25.5 concepts (p < .001). Second, it raised the level of their performance Demands from a mean of .44 to a mean of .72 (p < .01), although the relative importance of the individual statements did not change. Physicians' Expectations did not change overall; although Factor 1 did show a slight change in the positive direction (i.e., the outcomes were judged less likely than they had been before the course), the difference was not enough to be statistically significant. The mean posttutorial Acceptance rating of 6.0 was not significantly different from the tutorial registrants' pretutorial rating of 5.8. Also, participation in the tutorial did not alter the relatively low pretutorial Acceptance ratings of the surgical specialists.

34.3 Discussion

The study we have described had three principal goals: (1) to measure physicians' attitudes regarding consultation systems, (2) to compare the attitudes of subgroups of physicians, including those who chose to attend a medical computing tutorial and those who did not, and (3) to assess the impact of the continuing education course on the attitudes and knowledge of the physicians who enrolled. In this section, we discuss some of the results relevant to each of these goals.

34.3.1 Attitudes of Physicians

There was no significant difference in demographics or computing knowledge between the tutorial attendees and the control group. The overall analysis of physicians' attitudes was therefore based on responses from all physicians surveyed. The respondents were selective in their Acceptance of computing applications. Applications that were presented as aids to clinical practice were more readily accepted than those that involved the automation of clinical activities traditionally performed by physicians themselves. The distinction between a clinical aid and a replacement seems to be important to physicians and suggests design criteria and preferred modes for the introduction of computing innovations. This perspective is consistent with historical attitudes regarding the adoption of other kinds of technological innovation. For example, computerized axial tomography has been widely accepted largely because it functions as a remarkably useful clinical tool, providing physicians with faster and more reliable information, but it in no way infringes on the physician's patient-management role. In contrast, automated history-taking systems have not received widespread acceptance, despite their accuracy and reliability. We suspect that one reason physicians have resisted their use is because they are perceived as a threat to a traditional clinical function.

Some observers have speculated that many physicians oppose computer-based decision aids because they fear a loss of job security and prestige. The study results do not support this viewpoint. The physicians surveyed believe that consultation systems will not reduce the need for either specialists or paraprofessionals. Furthermore, they do not feel that either a physician's self-image or the respect he or she receives from patients will be reduced by the use of this kind of system. They are worried that consultation systems may increase the cost of care, although they believe that the programs should be designed to decrease costs. This Expectation may reflect past experience with new technologies that have generally increased cost, at least initially, but have eventually been accepted because of perceived improvement in patient care. In light of the generally positive Expectations of physicians, as demonstrated in this study, it is unlikely that the acceptance of a medical consultation system will depend solely on its ability to reduce the cost of care; the crucial factor, rather, is likely to be the system's ability to improve the quality of patient care or to simplify its delivery.

The results from the Demand-scale indicate, however, that for a system to improve patient care in an acceptable fashion, it must be perceived as a tool that will assist physicians with management decisions. It is clear that physicians will reject a system that dogmatically offers advice, even if it has impressive diagnostic accuracy and an ability to provide reliable treatment plans. Physicians seem to prefer the concept of a system that functions as much like a human consultant as possible.

34.3.2 Comparisons Among Subgroups

Physicians' Expectations about the effect of computer-assisted consultation systems on medical practice were generally positive, although considerable differences among physicians were noted. The finding that physicians with prior computing experience have more positive Expectations regarding the effects of consultation systems supports the belief of other investigators, although even the groups with little or no experience generally had positive attitudes. The slightly more positive Expectations of academic physicians may be a source of encouragement to medical computing researchers because this kind of system development typically depends on support from the academic community. However, the more negative Expectations of private practice physicians and of those who chose not to attend the tutorial are worrisome. These groups represent the majority of practitioners in the country and are, in particular, the physicians for whom many of the research systems are designed.7 Furthermore, although many of their concerns, such as worries about increased government control of medical practice, defy direct attention by the medical computing researcher, an increased awareness of them may lead to more sensitive design decisions and more tactful introduction of new systems.

34.3.3 Effect of the Tutorial

The tutorial experience had a small but significant effect on physicians' Demands and also produced a substantial increase in their knowledge about computing concepts. The results from the Demand-scale were of particular interest. Physicians apparently gained new insights from the tutorial into the potential use and capabilities of medical computing and increased their performance Demands accordingly. These opinions regarding the attributes of acceptable computing systems were surprisingly uniform across physician subgroups both before and after the tutorial. Our interpretation of this result is that physicians are serious about these Demands and that consultation systems are not likely to be clinically effective, regardless of the accuracy of their advice, until these capabilities have been incorporated.

On the other hand, the tutorial had no significant effect on physicians' Acceptance of computer applications or on their Expectations regarding the effect of consultation systems on medical practice. The failure of the tutorial to change the Acceptance rating is not surprising because the pretutorial ratings were already very high. It is possible that an expanded set of applications on the Acceptance scale, particularly applications that involve the automation of traditional physician functions, would have produced a different result. Similarly, the Expectations of the tutorial registrants were markedly positive prior to the tutorial and were not significantly changed as a result of the course. Before the survey we were concerned that the Expectations of the course participants might decline on the posttutorial questionnaire; it was possible that the physicians in the audience would begin to worry about the effects of certain applications after being exposed to the problems and uncertainties experienced by the medical computing researchers. Instead, the attendees apparently understood both the potential and the problems associated with designing consultation programs and took a more positive approach by increasing their Demands for more humanlike performance from the systems.

7 Although our study included physicians with different backgrounds and interests (e.g., medical specialty, time devoted to research), we cannot generalize with certainty from our results to the national community of physicians. Our self-selected tutorial participants were almost all academic or academically affiliated, and our nontutorial (control) sample was selected from a similar population.
Although physicians with positive Expectations could be distinguished from those with negative ones on the basis of their knowledge about computing concepts prior to the tutorial, increasing their knowledge about these concepts did not change their Expectations. Since physicians with negative Expectations were also the least likely to participate voluntarily in our CME program, the effectiveness of CME in increasing the acceptance of clinical computing among the most resistant physicians is questionable. However, the study results indicate that computing applications have already obtained a strong core of support among some physicians. This support may even be deeper than we had expected because, for the physicians we surveyed, it extended to the belief that medical computing should be considered an area of basic medical research, comparable to biochemistry and immunology. In response to a question on this subject included at the end of the questionnaire, 75% of the pretutorial and control group physicians agreed that medical computing should be considered an area of basic medical research, and another 14% were undecided. We believe that this uniformly positive response may have been influenced by the administration of the questionnaire, and physicians asked the same question without the context provided by the survey instrument might respond less favorably. On the other hand, even physicians with minimal computing experience seem likely to accept the fundamental research component of medical computer science if it is pointed out to them. This suggests a strong educational message that must be conveyed to the medical community regarding the research role of the discipline.

34.4 Recommendations

The results of this survey counter the common impression that physicians tend to be resistant to the introduction of clinical consultation systems. Although we have polled physicians only from the immediate vicinity of our medical center, there is no reason to assume that a nationwide survey would achieve markedly different results. We have found that a significant segment of the medical community believes that assistance from computer-based consultation systems will ultimately benefit medical practice. However, a major concern at present is whether system developers can respond adequately to physician demands for performance capabilities that extend
beyond currently available computer science techniques. In light of these results, the following recommendations may be helpful.

1. Strive to minimize changes to current clinical practices. The system should ideally replace some current clinical function, thereby avoiding the need for an additive time commitment by the physician. The system should ideally be available when and where physicians customarily make decisions.

2. Concentrate some of the research effort on enhancing the interactive capabilities of the expert system. The more natural these capabilities, the more likely it is that the system will be used. At least four features appear to be highly desirable:

   a. Explanation. The system should be able to justify its advice in terms that are understandable and persuasive. In addition, it is preferable that a system adapt its explanation to the needs and characteristics of the user (e.g., demonstrated or assumed level of background knowledge in the domain). A system that gives dogmatic advice is likely to be rejected.

   b. Common sense. The system should "seem reasonable" as it progresses through a problem-solving session. Some researchers argue that the program's operation should therefore parallel the physician's reasoning processes as much as possible. There is a growing body of knowledge about the psychological underpinnings of medical problem solving (Elstein et al., 1978), and systems that draw on these insights are likely to find an improved level of acceptance by the medical community.

   c. Knowledge representation. The knowledge in the system should be easy to bring up to date, and this often seriously constrains the format for storing information in the computer. A challenging side issue is the automatic "learning" of new knowledge of the domain, either through interaction with expert physicians or through "experience" once the system is in regular use.

   d. Usability. The system should be easy to learn and largely self-documenting. The mode of interaction may be the key to acceptability, and effective methods for understanding text or spoken language should dramatically increase the utility of clinical systems. For routine activities, it is preferable that use of the system be as easy as pressing a button.

3. Recognize that 100% accuracy is neither achievable nor expected. Physicians will accept a system that functions at the same level as a human expert, as long as the interactive capabilities noted above are a component of the consultative process.

4. Consider carefully the most appropriate criteria for assessing a clinical consultation system. Not all medical computer programs should be judged on the same basis, and cost-effectiveness may appropriately be a secondary concern when a system can be shown to significantly improve the quality of patient care or the efficiency of its delivery.

5. When designing systems, consider the concerns and demands that physicians express about consultation systems. These should be used to guide both the development and the implementation of the systems of the future. It is increasingly recognized that it takes only one shortcoming to render an otherwise well-designed system unacceptable.

The considerations outlined here place severe demands on current computing capabilities. Many of the issues that we have cited, and that were included on the Demand-scale in the survey, are capabilities that are beyond the current state of the art in computer science. They thus help delineate some of the important basic research issues for future work in medical computing.
35
An Expert System for Oncology Protocol Management

Edward H. Shortliffe, A. Carlisle Scott, Miriam B. Bischoff, A. Bruce Campbell, William van Melle, and Charlotte D. Jacobs

This chapter describes an oncology protocol management system, named ONCOCIN after its domain of expertise (cancer therapy) and its historical debt to MYCIN. The program is actually a set of interrelated subsystems,1 the primary ones being:

1. the Reasoner, a rule-based expert consultant that is the core of the system; and
2. the Interviewer, an interface program that controls a high-speed terminal and the interaction with the physicians using the system.

The Interviewer is described in some detail in Chapter 32. This chapter describes the problem domain and the representation and control techniques used by the Reasoner. We also contrast ONCOCIN with EMYCIN

This chapter is based on an article originally appearing in Proceedings of the Seventh IJCAI, 1981, pp. 876-881. Used by permission of International Joint Conferences on Artificial Intelligence, Inc.; copies of the Proceedings are available from William Kaufmann, Inc., 95 First Street, Los Altos, CA 94022.

1 Each program runs in a separate fork under the TENEX or TOPS-20 operating systems, thereby approximating a parallel processing system architecture. Another program, the Interactor, handles interprocess communication. There is also a process that provides background utility operations such as file backup. This chapter does not describe these aspects of the system design or their implementation. Details are available elsewhere (Gerring et al., 1982).


(Chapter 15), explaining why the EMYCIN formalism was inadequate for our purposes, even though it did strongly influence the system's rule-based design.

35.1 Overview of the Problem Domain

ONCOCIN is designed to assist clinical oncologists in the treatment of cancer patients. Because the optimal therapy for most cancers is not yet known, clinical oncology research is commonly based on complex formal experiments that compare the therapeutic benefits and side effects (toxicity) of proposed alternative disease treatments. "Cancer" is a general term for many diseases having different prognoses and natural histories. A treatment that is effective against one tumor may be ineffective against another. Thus a typical cancer research center may conduct many simultaneous experiments, each concerned with a different kind of cancer and its optimal therapy (i.e., the treatment plan with the best chance of cure, remission, or reduction in tumor size and the least chance of serious side effects).

Each of these experiments is termed a protocol. Patients with a given tumor must meet certain eligibility criteria before they are accepted for treatment on the protocol; ineligible patients are treated in accordance with the best state-of-the-art therapy and are therefore not part of a formal clinical experiment.2 Patients accepted for protocol treatment, on the other hand, are randomly assigned to receive one of two or more possible treatments. The experiment requires close monitoring of each patient's clinical response and treatment toxicity. These data are tallied for all patients treated under the alternate regimens, and in this way the state of the art is updated over time.

Each protocol is described in a detailed document, often 40 to 60 pages in length, which specifies the alternate therapies being compared and the data that need to be collected. Therapies may require as many as eight to ten drugs, given simultaneously or in sequence, continuously or intermittently. In addition, pharmacologic therapy may be combined with appropriate surgery or radiation therapy. No single physician is likely to remember the details in even one of these protocol documents, not to mention the 30 to 60 protocols that may be used in a major cancer center (any one of which may be used to guide treatment of the patients under the care of a single physician). Although an effort is made to have the documents available in the oncology clinics when patients are being treated for their tumors, it is often the case that a busy clinic schedule, coupled with a complex protocol description, leads a physician to rely on memory when deciding on drug doses and laboratory tests. Furthermore, solutions for all possible treatment problems cannot be spelled out in protocols. Physicians use their own judgment in treating these patients, resulting in some variability in treatment from patient to patient. Thus patients being treated on a protocol do not always receive therapy in exactly the manner that the experimental design suggests, and the data needed for formal analysis of treatment results are not always completely and accurately collected. In some cases, patients suffer undue toxicity or are undertreated simply because protocol details cannot be remembered, located, or are not explicitly defined.

The problems we have described reach far beyond the oncology clinic at Stanford Medical Center. There are now several institutions designing protocol management systems to make the details of treatment protocols readily available to oncologists and to insure that complete and accurate data are collected.3 ONCOCIN is superficially similar to some of the developing systems, but both its short- and long-term goals are unique in ways we describe below. One overriding point requires emphasis: in order to achieve its goals, ONCOCIN must be used directly by busy clinicians; the implications of this constraint have pervaded all aspects of the system design.

2 Unfortunately, for many tumors the best state-of-the-art therapy may cause intolerable toxicity or be only partially effective. That is why there is a constant search for improved therapeutic plans.

35.2 Research Objectives

The overall goals of the ONCOCIN project are

1. to demonstrate that a rule-based consultation system with explanation capabilities can be usefully applied and can gain acceptance in a busy clinical environment;
2. to improve the tools currently available, and to develop new tools, for building knowledge-based expert systems for medical consultation; and
3. to establish both an effective relationship with a specific group of physicians and a scientific foundation, which will together facilitate future research and implementation of computer-based tools for clinical decision making.

3 A memo from the M.I.T. Laboratory for Computer Science (Szolovits, 1979) describes collaboration between M.I.T. and oncologists who have been building a protocol management system at Boston University (Horwitz et al., 1980). They are planning to develop a program for designing new chemotherapy protocols. To our knowledge, this is the only other project that proposes to use AI techniques in a clinical oncology system. However, the stated goals of that effort differ from those of ONCOCIN.
Hence ONCOCIN's research aims have two parallel thrusts: to perform research into the basic scientific issues of applied artificial intelligence, and to develop a clinically useful oncology consultation tool. The AI component of the work emphasizes the following:

1. the implementation and evaluation of recently developed techniques designed to make computer technology more natural and acceptable to physicians;
2. extension of the methods of rule-based consultation systems so that they can interact with a large data base of time-oriented clinical information;
3. the design of a generalized control structure, separate from the domain knowledge, with the hope that the general system can be usefully applied in other problem areas with similar tasks;
4. continuation of basic research into mechanisms for making decisions based on data trends over time;
5. the design of a rapid, congenial interface that can bring a high-performance AI system to a group of users who are not experienced with AI or with computers in general; and
6. the development of techniques for assessing knowledge base completeness and consistency (see Chapter 8).

35.3 System Overview

The ONCOCIN system will eventually contain knowledge about most of the protocols in use at the Oncology Clinic at Stanford Medical Center. Although protocol knowledge is largely specified in a written document, many questions arise in translating the information into a computer-based format. Knowledge base development has therefore been dependent on the active collaboration of Stanford oncologists. We have started by encoding the knowledge contained in the protocols for treatment of Hodgkin's disease and the non-Hodgkin's lymphomas.4 In generating its recommendation, the system uses initial data about the patient's diagnosis, results of current laboratory tests, plus the protocol-specific information in its knowledge base. As information is acquired, it is stored on-line in files associated with the patient.

After examining a patient, the physician uses a video display terminal to interact with ONCOCIN's data-acquisition program (the Interviewer;

4 We also implemented the complex protocol for treating oat cell carcinoma of the lung. Because the oat cell protocol is the most complex at Stanford, and it took only a month to encode the relevant rules, we are hopeful that the representation scheme we have devised will be able to manage, with only minor modifications, the other protocols we plan to encode in the future.
[Figure 35-1 shows the system's static knowledge (general strategies of oncology chemotherapy; protocol-specific oncology knowledge) and dynamic data (the data base of patients), used by the Reasoner (Interlisp) and the Interviewer (SAIL), which are linked by the Interactor.]

FIGURE 35-1 Overview of ONCOCIN.

see Chapter 32), reviewing time-oriented data from the patient's previous visits to the clinic, entering information regarding the current visit, and receiving recommendations, generated by the Reasoner, of appropriate therapy and tests. The Reasoner and Interviewer are linked with one another as shown in Figure 35-1. Each is able to use a data base of prior patient data. In addition, the Reasoner has access to information regarding the execution of chemotherapy protocols (control blocks) and specific information (rules) about the chemotherapy being used to treat the patient. Before terminating an interaction, the physician can examine the explanation provided with each recommendation.5 The physician may approve or modify ONCOCIN's recommendation; any changes are noted by the system and kept available for future review. ONCOCIN also provides hard-copy backup to complement the on-line interaction and facilitate communication among clinic personnel.

5 We have chosen a representation that has also facilitated early work to allow ONCOCIN to offer a justification for any intermediary conclusions that the system makes in deriving the advice (Langlotz and Shortliffe, 1983).

35.4 The Reasoner

35.4.1 Why Not EMYCIN?

ONCOCIN's Reasoner communicates with the Interviewer during a consultation. Although EMYCIN's interactive routines provided a means for us to develop a prototype system quickly, the need to interact eventually with a specialized interface program is one of several reasons that we chose to build most of ONCOCIN from scratch rather than to implement it as a new EMYCIN system (Chapter 15). Other important differences between ONCOCIN's application and the domains for which EMYCIN systems have been built include the following:

1. ONCOCIN requires serial consideration of patients at intervals typically spread over many months. Each clinic visit is a new data point, and conventional EMYCIN context trees and case data tables do not easily accommodate multiple measurements of the same attribute over time.

2. Expert-level advice from ONCOCIN also requires inference rules based on assessment of temporal trends for a given parameter.6 Because EMYCIN assumes that a consultation is to be given at a single point in time, it does not provide a mechanism for assessing trends or interacting with a data bank of past information on a case.

3. ONCOCIN does not require many of the capabilities provided by EMYCIN. For example, the simplified interaction mediated through the Interviewer allows questions to be answered directly without dealing with the complexities of natural language understanding.

4. Because of the nature of the interaction with the Interviewer, ONCOCIN needs to operate in a data-driven mode. Although EMYCIN has a limited allowance for forward chaining of rules, it would be inconvenient to force a largely data-driven reasoning process into the EMYCIN format.

6This same point led to the development of Fagan's VM system (Chapter 22), a rule-based program that was influenced by EMYCIN but differed in its detailed implementation because of the need to follow trends in patients under treatment in an intensive care unit. The development of similar capabilities for ONCOCIN is an active area of research at present.

35.4.2 Representation

Knowledge about the oncology domain is represented using five main data structures: contexts, parameters, data blocks, rules, and control blocks.7 In addition, we use a high-level description of each of these structures to serve as a template for guiding knowledge acquisition during the definition of individual instances.8

Contexts represent concepts or entities of the domain about which the system needs static knowledge. Individual contexts are classified by type (e.g., disease, protocol, or chemotherapy) and can be arranged hierarchically. During a consultation, a list of "current" contexts is created as information is gathered. These current contexts together provide a high-level description of the patient in terms of known chemotherapeutic plans. This description serves to focus the system's recommendation process.
Parameters represent the attributes of patients, drugs, tests, etc., that are relevant for the protocol management task (e.g., white blood count, recommended dose, or whether a patient has had prior radiotherapy). Each piece of information accumulated during a consultation is represented as the value of a parameter. There are three steps in determining the value of a parameter. First, the system checks to see if the value can be determined by definition in the current context. If not, the "normal" method of finding the value is used: if the parameter corresponds to a piece of laboratory data that the user is likely to know, it is requested from the user; otherwise, rules for concluding the parameter are tried. Finally, the system may have a (possibly context-dependent) default value that is used in the event that the normal mechanism fails to produce a value, or the user may be asked to provide the answer as a last resort.9
Data blocks define logical groupings of related parameters (e.g., initial patient data or laboratory test results). A data block directs the system to treat related parameters as a unit when requesting their values from the Interviewer, storing the values on a patient's file, or retrieving previously stored values.

Rules are the familiar productions used in MYCIN and other rule-based systems; they may be invoked in either data-driven or goal-directed mode. A rule concludes a value for some parameter on the basis of values of other parameters. A rule may be designated as providing a definitional

7There are a few additional data structures designed to coordinate the interaction between the Reasoner and the Interviewer.

8The knowledge base editor is based on the similar programs designed and implemented for EMYCIN. A graphics editor has also been developed for use on the LISP machine workstations to which we intend to transfer ONCOCIN (Tsuji and Shortliffe, 1983).

9This "pure" description of ONCOCIN's technique for assigning values to parameters is actually further complicated by the free-form data entry allowed in the Interviewer. The details of how this is handled, and the corresponding relationship to control blocks, will not be described here.

value or a default value as defined above. The rules are categorized by the context in which they apply.

As in EMYCIN systems, rules are represented in a stylized format so that they may be translated from Interlisp into English for explanation purposes.10 This representation scheme more generally allows the system to "read" and manipulate the rules. It has also facilitated the development of programs to check for consistency and completeness of the rules in the knowledge base (Chapter 8).
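As an illustration of what such a stylized format buys, here is a minimal sketch (the rule structure and field names are invented, not EMYCIN's actual Interlisp forms) of translating a stored rule into English:

```python
# A rule kept as structured data can be both evaluated and translated.
# This invented format separates the premise clauses from the conclusion.
rule_078 = {
    "premises": ["the blood counts do warrant dose attenuation"],
    "action": ("the current attenuated dose is the previous dose attenuated "
               "by the minimum of the dose attenuation due to low WBC and "
               "the dose attenuation due to low platelets"),
}

def translate(rule):
    # Number the premise clauses, then state the conclusion.
    premises = "\n".join(f"  {i}) {clause}"
                         for i, clause in enumerate(rule["premises"], 1))
    return f"IF:\n{premises}\nTHEN: Conclude that {rule['action']}."

print(translate(rule_078))
```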
Below are the English translations of two ONCOCIN rules.11 Note that Rule 078 provides a default value for the parameter "attenuated dose."
RULE075
To determine the current attenuated dose for all drugs in MOPP or for all drugs in PAVe:
  IF: 1) This is the start of the first cycle after a cycle was aborted, and
      2) The blood counts do not warrant dose attenuation
  THEN: Conclude that the current attenuated dose is 75 percent of the previous dose.

RULE078
After trying all other methods to determine the current attenuated dose for all drugs:
  IF: The blood counts do warrant dose attenuation
  THEN: Conclude that the current attenuated dose is the previous dose attenuated by the minimum of the dose attenuation due to low WBC and the dose attenuation due to low platelets.

Control blocks serve as high-level descriptions of the system's methods for performing tasks. Each contains an ordered set of steps to be used for accomplishing a specific task (e.g., formulating a therapeutic regimen or calculating the correct dose of a drug). Note that this data structure allows us to separate control descriptions explicitly from decision rules, a distinction that was often unclear in EMYCIN systems. Because we wish to be able to explain any action that ONCOCIN takes, control blocks can be translated into English using the same translation mechanism that is used to translate rules, for example:
ADVISE
To make a recommendation about treating the patient:
  1) Formulate a therapeutic regimen.
  2) Determine the tests to recommend.
  3) Determine suggestions about the patient.
  4) Determine the time till the patient's next visit.

DOSE
To calculate the correct dosage of the drug:
  1) Determine the current attenuated dose.
  2) Determine the units in which the drug should be measured.
  3) Determine the maximum allowable dose of the drug.
  4) Determine the route of administration.
  5) Determine the number of days for which the drug should be given.
  6) Compute the dose based upon body surface area.

10In keeping with the philosophy reflected in other systems we have designed, ONCOCIN is able to produce natural language explanations for its recommendations. See also the critiquing work of Langlotz and Shortliffe (1983).

11PAVe and MOPP are acronyms for two of the drug combinations used to treat Hodgkin's disease.

To summarize the differences between ONCOCIN's rules and those used in MYCIN and other EMYCIN systems:

1. Control is separated from domain knowledge, although process information is still codified in a modular format using control blocks.

2. The contextual information, which defines the setting in which a rule can be applied, is separated from the main body of the rule and used for screening rules when they are invoked (see next section).

3. Rules are subclassified to distinguish the major mechanisms by which the values of parameters can be determined (definitional, normal, and default rules).

35.4.3 Control

When a user specifies the task that ONCOCIN is to perform, the corresponding control block is invoked. This simply causes the steps in the control block to be taken in sequence. These steps may entail the following:

1. Fetching a data block, either by loading previously stored data or by requesting them from the user. This causes parameter values to be set, resulting in data-directed invocation of rules that use those parameters (and that apply in the current context).

2. Determining the value of a parameter. This causes goal-directed invocation of the rules that conclude the value of the parameter (and apply in the current context). Definitional rules are applied first, then the normal rules, and if no value has been found by these means, the default rules are tried. If a rule that is invoked in a goal-directed fashion uses some parameter whose value is not yet known, that parameter's value is determined so that the rule can be evaluated. In addition, concluding the value of any parameter, either by the action of rules or when information is entered by the user, may cause data-directed invocation of other rules.

3. Invoking another control block.

4. Calling a special-purpose function (which may be domain-dependent).
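A toy interpreter conveys the flavor of this mixed regime. Everything here — fact names, rule encodings, the control block itself — is invented for illustration, not taken from ONCOCIN:

```python
# Minimal sketch of a control-block interpreter mixing goal-directed
# (backward-chained) and data-directed (forward-chained) rule invocation.

facts = {}
# Goal-directed rules, indexed by the parameter they conclude.
goal_rules = {"dose": [lambda f: f["previous_dose"] * 0.75
                       if f.get("cycle_aborted") else f.get("previous_dose")]}
# Data-directed rules, indexed by the parameter whose arrival triggers them.
data_rules = {"wbc": [lambda f: f.__setitem__("low_counts", f["wbc"] < 3500)]}

def set_fact(param, value):
    facts[param] = value
    for rule in data_rules.get(param, []):   # forward chaining on new data
        rule(facts)

def determine(param):
    if param in facts:
        return facts[param]
    for rule in goal_rules.get(param, []):   # backward chaining on a goal
        value = rule(facts)
        if value is not None:
            facts[param] = value
            return value

def run_control_block(steps):
    # A control block is simply an ordered list of steps to execute.
    for step in steps:
        step()

run_control_block([
    lambda: set_fact("wbc", 2800),            # fetching a data block
    lambda: set_fact("previous_dose", 100),
    lambda: set_fact("cycle_aborted", True),
    lambda: determine("dose"),                # determining a parameter
])
print(facts["low_counts"], facts["dose"])  # True 75.0
```

Entering the white blood count triggers a data-directed rule, while requesting the dose triggers goal-directed ones; both kinds coexist in a single consultation, as the text describes.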

The effects of this control mechanism contrast with the largely backward-chained control used in MYCIN and other EMYCIN systems. Figure 35-2 shows the goal-oriented procedure used in EMYCIN. All invocation of rules results because the value of a specific parameter is being sought. Rules used to determine the value of that parameter can be referenced in any order, although ordering is maintained for the assessment of the parameters occurring in the conditional statements in each rule's premise.

Antecedent (data-driven) rules are used when the user's response to a question, or (less commonly) the conclusion from another rule, triggers

[Figure: a goal-directed tree rooted at the starting goal, in which finding a parameter (P) causes rules (R) to be tried by backward chaining (BC) or the user to be asked (A). Key: P = Find Parameter; R = Try Rule; A = Ask User; BC = Backward Chaining; FC = Forward Chaining; arrows denote "causes."]

FIGURE 35-2 Control in EMYCIN.

one of the system's forward-chained rules. These rules can only be used as antecedent rules, they typically have single conditions in their premises, and repeated forward chaining is permitted only if one rule concludes with certainty that the premise of another is true.

In ONCOCIN (Figure 35-3), on the other hand, initial control is derived from the control block invoked in response to the task selected by the user. Forward chaining and backward chaining of rules are intermingled,12 and any rule can be used in either direction.

12The broken line in Figure 35-3 outlines the portion of the ONCOCIN control structure that is identical to that found in EMYCIN (Figure 35-2).

[Figure: control in ONCOCIN begins from a control block; control blocks (CB) invoke other control blocks, fetch data blocks (DB), and find parameter values (P), with forward chaining (FC) and backward chaining (BC) of rules (R) intermingled and the user asked (A) as needed. Key: CB = Invoke Control Block; DB = Fetch Data Block; R = Try Rule; P = Find Parameter's Value; A = Ask User; FC = Forward Chaining; BC = Backward Chaining; arrows denote "causes."]

FIGURE 35-3 Control in ONCOCIN.

35.5 Why Artificial Intelligence Techniques?

We have learned from the MYCIN experience, and in building other EMYCIN systems as well, that a major part of each development effort has been the encoding of poorly understood knowledge. Enlisting the time and enthusiasm of domain experts has often been difficult, yet progress is usually impossible without active collaboration. Thus there is great appeal to a domain in which much of the needed knowledge is already recorded in thorough, albeit lengthy and complicated, documents (viz., the protocol descriptions that are written for every cancer therapy clinical experiment). Much of the appeal of the ONCOCIN problem domain is the availability of detailed documents that we can study and use for knowledge base development.
As we noted earlier, several other centers have begun to develop protocol management systems, but none has chosen to use techniques drawn from artificial intelligence. Complicated though the chemotherapy protocols may be, they are largely algorithmic, and other groups have been able to encode much of the knowledge using less complex representation techniques. Our reasons for choosing an AI approach for encoding the knowledge of oncology chemotherapy are varied.13 It should be stressed that all protocols have important loopholes and exceptions; when an aberrant situation arises for a patient being treated, the proper management is typically left unspecified. For example, the lymphoma protocols with which we have been most involved to date include several rules of the following form:
IF: there is evidence of disease extension
THEN: refer the patient to lymphoma clinic

IF: there is significant toxicity to vincristine
THEN: consider substituting velban

As shown here, the protocols often defer to the opinions of the attending physicians without providing guidelines on which they might base their decisions. Hence there is no standardization of responses to unusual problems, and the validity of the protocol analysis in these cases is accordingly subject to question. One goal is to develop approaches to these more complex problems that characterize the management of patients being treated for cancer. It is when these issues are addressed that the need for AI techniques is most evident and the task domain begins to look similar in complexity to the decision problems in a system like MYCIN. Rules will eventually have uncertainty associated with them (we have thus far avoided the need for certainty weights in the rules in ONCOCIN), and close collaboration with experts has been required in writing new rules that are not currently recorded in chemotherapy protocols or elsewhere. In addition, however, AI representation and control techniques have already allowed us to keep the knowledge base flexible and easily modified. They have also allowed us to develop explanation capabilities and to separate kinds of knowledge explicitly in terms of their semantic categories (Langlotz and Shortliffe, 1983; Tsuji and Shortliffe, 1983).

13Because we need a high-speed interface to ensure the system's acceptance by physicians, we have been forced to design a complex system architecture with asynchronous processes. We have also wanted to allow each process to run in whatever computer language seems most appropriate for its task. ONCOCIN subprocesses are currently written in Interlisp, SAIL, and assembler (Gerring et al., 1982). We have not described the total system or our reasons for making these design decisions, but we believe the structure is necessary to achieve acceptance of the system in a clinical setting.

35.6 Conclusion
In summary, the project seeks to identify new techniques for bringing large AI programs to a clinical audience that would be intolerant of systems that are slow or difficult to use. The design of a novel interface that uses both custom hardware and efficient software has heightened the acceptability of ONCOCIN. Formal evaluations are underway to allow us to determine both the effectiveness and the acceptability of the system's clinical advice.

For the present we are trying to build a useful system to which increasingly complex decision rules can be added. We are finding, as expected, that the encoding of complex knowledge that is not already stated explicitly in protocols is arduous and requires an enthusiastic community of collaborating physicians. Hence we recognize the importance of one of our research goals noted earlier in this report: to establish an effective relationship with a specific group of physicians so as to facilitate ongoing research and implementation of advanced computer-based clinical tools.
PART TWELVE

Conclusions
36
Major Lessons from This
Work

In this book we have presented experimental evidence at many levels of detail for a diverse set of hypotheses. As indicated by the chapter and section headings, the major themes of the MYCIN work have many variations. In this final chapter we will try to summarize the most important results of the work presented. This recapitulation of the lessons learned should not be taken as a substitute for details in the sections themselves. We provide here an abstraction of the details, but hope it also constitutes a useful set of lessons on which others can build. The three main sections of this chapter will

  reiterate the main goals that provide the context for the experimental work;
  discuss the experimental results from each of the major parts of the book; and
  summarize the key questions we have been asked, or have asked ourselves, about the lessons we have learned.

If we were to try to summarize in one word why MYCIN works as well as it does, that word would be flexibility. By that we mean that the designers' choices about programming constructs and knowledge structures can be revised with relative ease and that the users' interactions with the system are not limited to a narrow range in a rigid form. While MYCIN was under construction, we tried to keep in mind that the ultimate system would be used by many doctors, that the knowledge base would be modified by several experts, and that the code itself would be programmed by several programmers.1 In hindsight, we now see many areas of inflexibility in MYCIN and EMYCIN. For example, the knowledge acquisition system in EMYCIN requires that the designer of a new system express taxonomic knowledge in a combination of rules and contexts; no facile language is provided for talking about such structures. We lose some expressive power because MYCIN's2 representation of all knowledge in rules and tables does not separate causal links from heuristics. And MYCIN's control structure forecloses the possibility of tight control over the sequence of rules and procedures that should be invoked together. Thus we are recommending that the principle of flexibility be pushed even farther than we were able to do during the last decade.
Two important ingredients of a flexible system are simplicity and modularity. We have discussed the simplicity of both the representation and control structure in MYCIN, and the modularity of the knowledge base. While simple structures are sometimes frustrating to work with, they do allow access from many other programs. For example, explanation and knowledge acquisition are greatly facilitated because the rules and backward chaining are syntactically simple (without much additional complication in their actual implementation). The semantics of the rules also appear simple, to users at least, because they have been defined that way by persons in the users' own profession.

The modularity of MYCIN's knowledge representation also contributed to its success. The rules were meant to be individual chunks of knowledge that could be used, understood, or modified independently of other rules. McCarthy, in his paper on the Advice Taker (McCarthy, 1958), set as one requirement of machine intelligence that a program be modifiable by giving it declarative statements about new facts and relations. It should not be necessary to reprogram it. That has been one of the goals of all work on knowledge programming, including our own. MYCIN's rules can be stated to the rule editor as new relations and are immediately incorporated into the definition of the system's behavior.

Modularity includes separation of individual "chunks" of knowledge from one another and from the program that interprets them. But it also implies a structuring of the knowledge that allows indexing from many perspectives. This facilitates editing, explanation, tutoring, and interpreting the individual chunks in ways that simple separation does not. In the

1As mentioned, LISP provided a good starting place for the development of a system like MYCIN because its programming constructs need not be fixed in type and size and it allows the building of data structures that are executable as code. At the time of system construction, a designer often needs to postpone making commitments about data structures, data types, sizes of lists, and so forth until experimenting with a running prototype. At the time the knowledge base for an expert system is under construction, similar degrees of flexibility are required to allow the program to improve incrementally. At the time a system is run, it needs flexibility in its I/O handling, for example, to correct mistakes and provide different assistance to different users.

2In much of this chapter, what we say about the design of MYCIN carries over to EMYCIN as well.

case of MYCIN's rule-based structure, both the elements of data in a rule's premises and the elements of the rule's conclusion are separated and indexed. However, it is now clear that more structuring of a knowledge base than MYCIN supports will allow indexing chunks of knowledge still further, for example to explain the strategies under which rules are interpreted or to explain the relationships among premise clauses.
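The double indexing described here — reaching a rule both from the parameters it uses and from the parameter it concludes — can be sketched in a few lines (the rule contents are invented for the example; MYCIN's actual rule bodies are Interlisp forms):

```python
# Sketch: index a rule set by conclusion parameter (for backward chaining)
# and by premise parameter (for explanation: "where was this value used?").
rules = [
    {"name": "RULE1", "premises": ["site", "gramstain"], "concludes": "identity"},
    {"name": "RULE2", "premises": ["identity"], "concludes": "therapy"},
]

concluded_by, used_by = {}, {}
for r in rules:
    concluded_by.setdefault(r["concludes"], []).append(r["name"])
    for p in r["premises"]:
        used_by.setdefault(p, []).append(r["name"])

# Backward chaining consults concluded_by; explanation consults used_by.
print(concluded_by["identity"], used_by["identity"])  # ['RULE1'] ['RULE2']
```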

36.1 Two Sets of Goals

It must be emphasized that the MYCIN experiments were inherently interdisciplinary, and we were thus guided by two distinct sets of issues: medical goals and artificial intelligence goals. They can be seen as two sides of the same coin. We were trying to build an AI system capable of high-performance problem solving in medicine. Yet each side made its own demands, and we were often forced to allocate resources to satisfy one or the other set of concerns.

On the medical side we wanted to demonstrate the sufficiency of symbolic inference rules in medical problems for which statistical and numerical methods had mostly been used previously. We were also trying to find methods that would allow programs to focus on therapy, as well as on diagnosis. We were explicitly trying to address recognized problems in medical practice and found considerable evidence that physicians frequently err in selecting antimicrobial agents. We were trying to develop a consultation model with which physicians would be comfortable because it mirrored their routine interactions with consultants in practice. And we were trying to develop a system that could and would be used in hospitals and private practice.

On the AI side, as we have said, the primary motivation was to explore the extent to which rules could be used to achieve expert-level problem solving. In DENDRAL, situation-action rules had been used to encode much of the program's knowledge about mass spectrometry, but considerably more knowledge resided in LISP procedures. In MYCIN, we wanted to use rules exclusively, to see if this could be done in a problem area as complex as medicine. The overriding principle guiding us was the belief that the flexibility of a program was increased by separating medical knowledge from procedures that manipulate and reason with that knowledge. We believed that by making the representation more flexible, it would be easier to build more powerful programs in domains where programs grow by accretion.

The previous chapters reflect this duality of goals. It is important to recognize the tensions this duality introduced in order to understand adequately both the descriptions of the experimental work in this book and the underlying motivations for the individual research efforts.

36.2 Experimental Results

Although we were not always explicitly aware of the hypotheses our work was testing, in retrospect a number of results can be stated as consequences of the experiments performed. The nature of experiments in AI is not well established. Yet, as we said in the preface, an experimental science grows by experimentation and analysis of results. The experiments reported here are not nearly as carefully planned as are, for example, clinical trials in medicine. However, once some uncharted territory has been explored, it is possible to review the path taken and the results achieved.

We have used the phrase "MYCIN-like system" in many places to characterize rule-based expert systems, and we have tried throughout the book to say what these are. In summary, then, let us say what we mean by rule-based systems. They are expert systems whose primary mode of representation is simple conditional sentences; they are extensions of production systems in which the concepts are closer in grain size to concepts used by experts than to psychological concepts. Rule-based systems are deductively not as powerful as logical theorem-proving programs because their only rule of inference is modus ponens and their syntax allows only a subset of logically well-formed expressions to be clauses in conditional sentences. Their primary distinction from logic-based systems is that rules define facts in the context of how they will be used, while expressions in logic-based systems are intended to define facts independently of their use.3 For example, the rule A -> B in a rule-based system asserts only that fact A is evidence for fact B.
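The restricted deductive machinery — modus ponens applied repeatedly over simple conditional sentences — can be shown in a few lines (certainty factors and the evidential reading of the rules are deliberately omitted from this sketch):

```python
# Exhaustive forward application of modus ponens over rules A -> B.
rules = [("A", "B"), ("B", "C")]
known = {"A"}

changed = True
while changed:
    changed = False
    for antecedent, consequent in rules:
        if antecedent in known and consequent not in known:
            known.add(consequent)   # modus ponens: from A and A -> B, infer B
            changed = True

print(sorted(known))  # ['A', 'B', 'C']
```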
Rule-based systems are primarily distinguished from frame-based systems by their restricted syntax. The emphasis in a rule is on the inferential relationship between facts (for example, "A is evidence for B" or "A causes B"). In a frame the emphasis is on characterizing concepts by using links of many types (including evidential relations).

Rule-based systems are sometimes characterized as "shallow" reasoning systems in which the rules encode no causal knowledge. While this is largely (but not entirely) true of MYCIN, it is not a necessary feature of rule-based systems. An expert may elucidate the causal mechanisms underlying a set of rules by "decompiling" the rules (see Section 29.3.2 for a discussion of decompiling the knowledge on which the tetracycline rule is based). The difficulties that one encounters with an expanded rule set are knowledge engineering difficulties (construction and maintenance of the knowledge base) and not primarily difficulties of representation or interpretation. However, the causal knowledge thus encoded in an expanded rule set would be usable only in the context of the inference chains in which it fits and would not be as generally available to all parts of the reasoning system as one might like. A circuit diagram and the theoretical knowledge underneath it, in contrast, can be used in many different ways.

3This way of making the distinction was pointed out by John McCarthy in a private communication.
Winston (1977) summarized the main features of MYCIN as follows:

1. MYCIN can help physicians diagnose infections.
2. MYCIN is a backward-chaining deduction system.
3. MYCIN computes certainty factors.
4. MYCIN talks with the consulting physician in English.
5. MYCIN can answer a variety of questions about its knowledge and behavior.
6. MYCIN can assimilate new knowledge interactively.

While this is a reasonable summary of what the program can do, it stops short of analyzing how the main features of MYCIN work or why they do not work better. The analysis presented here is an attempt to answer those questions. Not all of the experiments have positive results. Some of the most interesting results are negative, occasionally counter to our initial beliefs. Some experiments were conceived but never carried out. For example, although it was explicitly our initial intention to implement and test MYCIN on the hospital wards, this experiment was never undertaken. Instead the infectious disease knowledge base was laid to rest in 1978,4 despite studies demonstrating its excellent decision-making performance. This decision reflects the unanticipated lessons regarding clinical implementation (described in Part Eleven) that would not have been realized without the earlier work.

Finally, a word about the organization of this section on results. We have described the lessons mostly from the point of view of what we have learned about building an intelligent program. We were looking for ways to build a high-performance medical reasoning program, and we made many choices in the design of MYCIN to achieve that goal. For the program itself, we had to choose (1) a model of diagnostic reasoning, (2) a representation of knowledge, (3) a control structure for using that knowledge, and (4) a model of how to tolerate and propagate uncertainty. We also had to formulate (5) a methodology for building a knowledge base capable of making good judgments. Our working hypothesis, then, was that the choices we made were sufficient to build a program whose performance was demonstrably good.5 If we had failed to demonstrate expert-level performance, we would have had reason to believe that one or more of our choices had been wrong. In addition, other aspects of the program were

4Much of the MYCIN-inspired work reported in this volume was done after this date, however.

5Note that sufficiency is a weak claim. We do not claim that any choice we made is necessary, nor do we claim that our choices cannot be improved.

also tested: (6) explanation and tutoring, (7) the user interface, (8) validation, (9) generality, and (10) project organization. The following subsections review these ten aspects of the program and the environment in which it was constructed.

36.2.1 The Problem-Solving Model

From the point of view of MYCIN's reasoning, the program is best viewed as an example of the evidence-gathering paradigm. This can be seen as a form of search, in which the generator is not constructing complex hypotheses from primitive elements but is looking at items from a predefined list. For diagnosis, MYCIN has the names of 120 organisms. (Twenty-five of the possible causes are explicitly linked to evidence through rules; the rest can be reasoned about through links in tables or links to prior cultures. Properties of all of them must be known, including their sensitivities to each of the drugs.) Logically speaking, MYCIN could run down the list one at a time and test each hypothesis by asking what evidence there is for or against it. This would not produce a pleasing consultation, but it would provide the same diagnoses.
This sort of evidence gathering can be contrasted with heuristic search,
in which a generator of hypotheses defines the search space, as in
DENDRAL. It also differs from generate-and-test programs in that hypotheses
are not considered (or tested) unless there is evidence pointing to them.
Solutions to problems posed to EMYCIN systems are interpretations
of the data. EMYCIN implicitly assumes that there is no unique solution
to a problem, but that the evidence will support several plausible conclusions
from a fixed list. (This is partly because of the uncertainty in both
the data and the rules.) The size of the solution space is thus 2^N, where N
is the number of single conclusions on the fixed list. In MYCIN there are
120 organism names on the list of possible identities. However, it is unlikely
that more than a half-dozen organism identities will have sufficient evidence
to warrant covering for them. If we assume that MYCIN will cover
for the top six candidate organisms in each case, the number of possible
combinations^6 in a solution is more like (120 choose 6), or about 10^9.
Obviously, the method of evidence gathering does not generate all of them.
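The arithmetic can be checked directly (a quick sketch, not part of MYCIN itself): choosing 6 identities from the list of 120, without regard to order, gives

```python
import math

# Ways to choose 6 organism identities from the list of 120
combos = math.comb(120, 6)
print(f"{combos:,}")   # 3,652,745,460 -- on the order of 10**9
```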

^6 The number of medically meaningful conclusions is actually much fewer because certain
combinations are implausible or nearly impossible.
Experimental Results 675

We have used EMYCIN to build systems in a variety of domains of
medicine and engineering. An appropriate application of the evidence-gathering
model seems to meet most of the following criteria:

- a classification problem in which data are explained or "covered" by hypotheses from a predefined list;
- a problem that is partly defined by explaining, once, a snapshot of data (as opposed to continuous monitoring problems in which hypotheses are revised frequently as more data are collected);
- a problem of sufficient difficulty that practitioners often turn to textbooks or experts for advice;
- a problem of sufficient difficulty that experts require time for reasoning--their solutions are not instantaneous (but neither do they take dozens of hours);
- a problem of narrow enough scope that a knowledge base can be built and refined in a "reasonable" time (where the resources available and the importance of the problem partly define reasonableness);
- a problem that can be defined in a "closed world," i.e., with a vocabulary that covers the problem description space but is still bounded and "reasonably" small.

Additional characteristics of problems suitable for this kind of solution are
listed in Section 36.2.9 on the generality of the EMYCIN framework.

36.2.2 Representation

One of MYCIN's most encouraging lessons for designers of expert systems
is the extent to which good performance can be attained with the simple
syntax of fact triples and conditional rules. MYCIN's rules are augmented
with a context tree around which the dialogue is organized, but other
EMYCIN systems (e.g., PUFF) use a degenerate tree of only one kind of
object. Also, many rules were encoded in a "shorthand" form (as entries
in tables). CFs were added to the simple rule form in MYCIN, but again,
other EMYCIN systems (e.g., SACON) perform well with categorical rules
(all CFs = 1). For many problems, the simple syntax of fact triples and
conditional associations among facts is quite appropriate. In Chapter 3
(Section 3.2) we summarized many additional production system enhancements
that were developed for MYCIN.
On the other hand, our experience using EMYCIN to build several
expert systems has suggested some negative aspects to using such a simple
representation for all the knowledge. The associations that are encoded in
rules are elemental and cannot be further examined (except through the
symbolic text stored in slots such as JUSTIFICATION or AUTHOR). A
reasoning program using only homogeneous rules with no internal distinctions
among them thus fails to distinguish among:
676 Major Lessons from This Work

- Chance associations (e.g., proportionally more left-handed than right-handed persons have been infected by E. coli at our institution)
- Statistical correlations (e.g., meningococcal meningitis outbreaks are correlated with crowded living conditions)
- Heuristics based on experience rather than precise statistical studies (e.g., oral administration of drugs is less reliable in children than are injections)
- Causal associations (e.g., streptomycin can cause deafness)
- Definitions (e.g., all E. coli are gram-negative rods)
- Knowledge about structure (e.g., the mouth is connected to the pharynx)
- Taxonomic knowledge (e.g., viral meningitis is a kind of infection)

The success of MYCIN, which generally does not distinguish among
these types of associations, demonstrates that it is possible to build a high-performance
program within a sparse representation of homogeneous
rules (augmented with a few other knowledge structures). Nevertheless,
limited experience with CENTAUR, WHEEZE, NEOMYCIN, and
ONCOCIN leads us to believe that the tasks of building, maintaining, and
understanding the knowledge base will be easier if the types of knowledge
are separated. This becomes especially pertinent during knowledge acquisition
(as described in Part Three) and when teaching the knowledge base
to students (Part Eight).
Every formalism limits the kinds of things that can be expressed. From
the start we were trying to balance expressive power against simplicity and
modularity. As in DENDRAL, in MYCIN we departed from a "pure" production
rule representation by allowing complex predicates in the left-hand
sides of rules and complex actions in the right-hand sides. All of the inferential
knowledge was still kept in rules, however. Every rule was augmented
with additional information, using property lists. We used the
premise and action properties of rule names for inferential knowledge and
used the other properties for bookkeeping, literature references, and the
like.^7 Meta-rules can reference the values of any of these slots, to focus
attention within the backward-chaining flow of control, thereby making it
more sensitive to global context.
Many problems require richer distinctions or finer control than
MYCIN-like rules provide. A more general representation, such as frames,
allows a system designer to make the description of the world more complex.
In frames, for instance, it is easier to express the following:

^7 This is the major distinction between our rules and frames. Inference about inheritance of
values is not handled implicitly in MYCIN, as it would be in a frame-based system, but is
explicitly dealt with in the action parts of the rules (using the context tree). However, there
is considerable similarity in the augmented form of MYCIN's rules and frames, and in their
expressive power. Although frames are typically used to represent single concepts, whereas
rules represent inferential relationships, the structural similarities between these encoding
techniques suggest that frame-based and rule-based representations are not a strict
dichotomy.

- Procedural knowledge--sequencing tasks
- Control knowledge--when to invoke knowledge sources
- Knowledge of context--the general context in which elements of the knowledge base are relevant
- Inheritance of properties--automatic transfer of values of some slots from parent concepts to offspring
- Distinctions among types of links--parent and offspring concepts may be linked as
  o class and instance
  o whole and part
  o set and subset

The loss of simplicity in the frame representation, however, may complicate
the inference, explanation, and knowledge acquisition routines. For example,
inheritance of properties will be handled (and explained) differently
depending on the type of link between parent and offspring concepts.
There is a trade-off between simplicity and expressive power. A simpler
representation is easier to use but constrains the kinds of things a
system builder might want to say. There is also a trade-off between generality
and the power of knowledge acquisition tools. An unconstrained
representation may have the expressive power of a programming language
such as LISP or assembly language, but it can be more difficult to debug.
There is considerable overlap among the alternative representation methods,
and current work in AI is still experimenting with different ways of
making this trade-off.

36.2.3 Control of Inferences

A strong result from the MYCIN experiment is that simple backward
chaining (goal-driven reasoning) is adequate for reasoning at the level of
an expert. As with DENDRAL, it was somewhat surprising that high performance
could be achieved with a simple well-known method. The quality
of performance is the same as (and the line of reasoning logically equivalent
to) that of data-driven or other control strategies. The main virtues
of a goal-driven control strategy are simplicity and ability to focus requests
for data. It is simple enough to be explained quickly to an expert writing
rules, so that he or she has a sense of how the rules will be used. And it
allows explanations of a line of reasoning that are generally easily understood
by persons requesting advice.
Internally, backward chaining is also simple. Rules are checked for
applicability (i.e., the LHSs are matched against the case data to see if the
RHSs should be executed) if and only if the RHSs are relevant to the
subgoal under consideration. Relevance is determined by an index created

automatically at the time a rule is created, so rule invocation is highly
focused. For example, a new rule A → B will be added to the UPDATEDBY
list associated with parameter B; then when subgoal B is under consideration
only the rules on this list are tried.
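The UPDATEDBY indexing described above can be sketched in a few lines (the rules and parameter names here are illustrative, not MYCIN's actual knowledge base):

```python
# Each rule is (premises, conclusion). Adding a rule indexes it under the
# parameter its conclusion updates, so rule invocation stays focused:
# when subgoal B is traced, only the rules on UPDATED_BY[B] are tried.
UPDATED_BY = {}

def add_rule(premises, conclusion):
    UPDATED_BY.setdefault(conclusion, []).append((premises, conclusion))

def trace(goal, facts):
    """Backward chaining: consult only the rules indexed under `goal`."""
    if goal in facts:
        return facts[goal]
    for premises, conclusion in UPDATED_BY.get(goal, []):
        if all(trace(p, facts) for p in premises):
            facts[goal] = True
            return True
    facts[goal] = False
    return False

add_rule(["A"], "B")        # rule: A -> B
add_rule(["B", "C"], "D")   # rule: B & C -> D
print(trace("D", {"A": True, "C": True}))   # True
```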
We also needed to focus the dialogue, and we did it by introducing
the context tree to guide the subgoal selection.^8 In addition, we needed to
overcome some of the sensitivity to the order of clauses in a rule dictating
the order in which subgoals were pursued and questions were asked. Thus
the preview mechanism (Chapter 3) was developed to check all clauses of
a rule to see if any are known to be false before chaining backward on the
first clause. Once the preview mechanism was implemented, we found we
could avoid the appearance of stupidity by introducing antecedent rules
in order to make definitional inferences immediately upon receiving some
data, for example:

SEX OF PT IS MALE → PREGNANCY OF PT IS NO

Then, regardless of where a clause about pregnancy occurred in a rule's
premise, the above antecedent relation would keep the backward-chaining
control structure from pursuing earlier clauses needlessly for male patients.
Without the antecedent rule, however, nonpregnancy would not be
known for males until the pregnancy clause caused backward chaining and
the above relation (as a consequent rule) caused the system to check the
sex of the patient. Without the preview mechanism, earlier clauses would
have been pursued (and unnecessary lines of reasoning possibly generated)
before the relevance of the patient's sex was discovered.
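A minimal sketch of the preview idea (the clause names below are invented for illustration): before chaining backward on a rule's first clause, every clause is checked against what is already known.

```python
def preview_ok(clauses, facts):
    """Reject a rule if any clause is already known to be false,
    before any effort is spent tracing its earlier clauses."""
    return not any(facts.get(c) is False for c in clauses)

# An antecedent rule has already concluded PREGNANT = False for a male patient:
facts = {"SEX IS MALE": True, "PREGNANT": False}

# Premises of some consequent rule; without preview, FEBRILE and
# HEADACHE would be traced (and asked about) needlessly.
premises = ["FEBRILE", "HEADACHE", "PREGNANT"]
print(preview_ok(premises, facts))   # False -- skip the whole rule
```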
The main disadvantage of this control strategy is that users cannot
interrupt to steer the line of reasoning by volunteering new information.
A user can become frustrated, knowing that the system's present line of
reasoning will turn out to be fruitless as a result of data that are going to
be requested later. This human-engineering issue is discussed again in
Section 36.2.7.
We carried the idea of separating knowledge from inference procedures
a step further when we separated control strategies from the rule
invocation mechanism. One of the elegant points about this experiment is
the use of the same rule formalism to encode strategy rules as we use for
the medical rules, with attendant use of the same explanation procedures.
In Part Nine we discuss writing meta-rules for controlling inference using
the same rule formalism, interpreter, and explanation capabilities. There
is sufficient generality in this formalism to support meta-level reasoning,
as well as meta-meta-level reasoning and beyond. We needed to add some
new predicates to talk about rules and rule sets. And we needed one change
in the interpreter to check for higher-level rules before executing rules

^8 Recall that the context tree was introduced for two other reasons as well: to allow MYCIN
to keep track of multiple instances of the same kind of object, and to allow the program to
understand hierarchical relationships among entities.

applicable to a subgoal. We did not experiment enough with meta-rules to
determine how much expressive power they offer. However, both CENTAUR
and NEOMYCIN give some indication of the control and strategy
knowledge we need in medical domains, some of which appears difficult
to represent in meta-rules because we lack a rich vocabulary for talking
about sequences of tasks. Although meta-rules were designed to prune or
reorder the set of rules gathered up by the backward-chaining control
routine, their implementation is clean because they reference rules at the
next lower level by content and not by name; i.e., they do not require
specification of an explicit sequence of rules to be invoked in order (e.g.,
Rule 50, then Rule 71, then Rule 39).
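The content-directed (rather than name-directed) character of meta-rules can be sketched as follows; the rules and the age criterion are invented for illustration:

```python
# Object-level rules: (premises, conclusion, properties). A meta-rule
# examines the candidate rules for a subgoal by content and prunes or
# reorders them -- it never names Rule 50 or Rule 71 explicitly.
candidates = [
    (["fever", "stiff neck"], "meningitis", {"age_group": "adult"}),
    (["fever", "bulging fontanelle"], "meningitis", {"age_group": "infant"}),
]

def metarule_prune(rules, patient):
    """If the patient is an adult, ignore rules specific to infants."""
    if patient["age"] >= 18:
        return [r for r in rules if r[2].get("age_group") != "infant"]
    return rules

kept = metarule_prune(candidates, {"age": 42})
print(len(kept))   # 1 -- the infant-specific rule is pruned by content
```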
Meta-rules allow separation of types of knowledge in ways that are
difficult to capture in medical rules alone. Some diagnostic strategies were
initially built into the inference procedure, such as exhaustive invocation
of rules--an inherently cautious strategy that is appropriate for this medical
context but not for all. Sometimes, though, we wanted MYCIN to be
more sensitive to context; the age of the patient, for example, may indicate
that some rules can be ignored.^9 Meta-rules work because they can examine
the contents of rules at the next lower level and reason about them. This
is part of the benefit of the flexibility provided by LISP and the simplicity
of the rule syntax.
We have little actual experience with meta-rules in MYCIN, however.
Because of the cautious strategy of invoking all relevant rules, we found
few opportunities for using them. The one or two meta-rules that made
good medical sense could be "compiled out" by moving their contents into
the rules themselves. For example, "do rules of type A before those of type
B" can be accomplished by manually ordering rules on the UPDATEDBY
list or manually ordering clauses in rules. The system overhead of determining
whether there are any meta-rules to guide rule invocation is a high
price to pay if all of the rules will be invoked anyway. So, although their
potential power for control was demonstrated, their actual utility is being
assessed in subsequent ongoing work such as NEOMYCIN (Clancey, 1983).

36.2.4 Inexact Inference

MYCIN is known partly for its model of inexact inference (the CF model),
a one-number calculus for propagating uncertainty through several levels
of inference from data to hypotheses. MYCIN's performance shows that,
for some problems at least, degrees of evidential support can be captured
adequately in a single number,^10 and a one-number calculus can be devised

^9 This was not done with meta-rules, however, because it could easily be handled by the
preview mechanism and judicious use of screening clauses.
^10 Although the CF model was originally based on separate concepts of belief and disbelief
(as defined for MB and MD in Chapter 11), recall that even then the net belief is reflected
in a single number and only one number is associated with each inferential rule.

to propagate uncertainty. The one number we actually use is a combination
of disparate factors, most importantly strength of inference and utility
considerations. Theoretically, it would have made good sense to keep those
separate. Heuristically and pragmatically, we were unable to acquire as
many separate numbers as we would have needed for Bayesian probability
calculations followed by calculations of expected values (utilities) associated
with actions and outcomes.
The CF in a rule measures the increased strength of the conclusion. In
effect, we asked the medical experts "How much more strongly do you
believe the conclusion h after you know the premises e are true than you
did before?" If we were dealing strictly with probabilities, which we are
not, then the CF for positive evidential support would be a one-number
approximation to

[P(h|e) - P(h)] / [1 - P(h)]

The one-number calculus achieves the goals we sought, although without
the precision that many persons desire. The combining of uncertainty
depends on relatively small numbers of rules being applicable at any point.
Otherwise, many small pieces of evidence ultimately boost the support of
every hypothesis to 0.99 and we lose distinctions among strengths of support
for hypotheses. The effect of the propagation is a modestly accurate
clustering of hypotheses by gross measures of evidential strength (HIGH,
MEDIUM, LOW, NONE). But within a cluster the ranking of hypotheses
is too dependent on the subjectiveness of the CFs, as well as on the certainty
propagation scheme, to be taken precisely.
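The saturation effect described above is easy to demonstrate with MYCIN's combining function for two positive CFs, CFcombine(x, y) = x + y(1 - x):

```python
from functools import reduce

def cf_combine(x, y):
    """MYCIN's combining function for two positive CFs."""
    return x + y * (1 - x)

# Ten rules each lending weak support (CF = 0.3) to the same hypothesis:
total = reduce(cf_combine, [0.3] * 10)
print(round(total, 4))   # 0.9718 -- many weak pieces of evidence saturate
```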
The focus of a decision-making aid, however, needs to be on recommendations
for action. Thus it needs costs and benefits, as well as probabilities,
associated with various outcomes. When MYCIN recommends
treating for Streptococcus, for example, it has combined the likelihood of
strep with the risk of failing to treat for it. For this reason we now realize
it is perhaps more appropriate to think of CFs as measures of importance
rather than of probability or strength of belief. That is, they measure the
increased importance of acting on the conclusion of a rule in light of new
evidence mentioned in the premise. For example, self-referencing rules
mention the same parameter in both premise and action parts:

A & B & C → A

Such a rule is saying, in effect, that if you already have reason to believe
A, and if B and C are likely in this case, then increase the importance of
A. In principle, we could have separated probabilities from utilities. In
practice, that would have required more precision than infectious disease
experts were willing or able to supply.

The discontinuity around the 0.2 threshold is not a necessary part of
the CF model. It was added to the implementation to keep the backward-chaining
control structure from expending effort for very small gain. In
a data-driven system the data would all be gathered initially, and the inferences,
however weak, could be propagated exhaustively. In a goal-driven
system, however, the 0.2 threshold is a heuristic that precludes unnecessary
questions. In the rule

A & B & C → D

if any clause is not "true enough," the subsequent clauses will not be pursued.
If clause A, after tracing, has not accumulated evidence over the 0.2
threshold, then the system will not bother to ask about clauses B and C. In
brief, the threshold was invented for purposes of human engineering, since
it shortens a consultation and reduces the number of questions asked of
the user.
This value of the threshold is arbitrary, of course. It should simply be
high enough to prevent the system from wasting its time in an effort to
use very small pieces of evidence. With a sick patient, there is a little evidence
for almost every disease, so the threshold also helps to avoid covering
for almost every possible problem. The threshold has to be low enough,
on the other hand, to be sure that important conclusions are considered.
Once the 0.2 threshold was chosen, CFs on rules were sometimes set with
it in mind. For example, two rules concluding Streptococcus, each at the
CF = 0.1 level, would not be sufficient alone to include Streptococcus in the
list of possible causes to consider further.^11
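The numbers in this example can be checked against the positive-CF combining function (a sketch mirroring the behavior described, not MYCIN's code):

```python
THRESHOLD = 0.2

def cf_combine(x, y):
    # MYCIN's combining function for two positive CFs
    return x + y * (1 - x)

combined = cf_combine(0.1, 0.1)          # two weak Streptococcus rules
print(round(combined, 2), combined > THRESHOLD)   # 0.19 False
```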
Because we are not dealing with probabilities, or even with "pure"
strength of inference alone, our attempt to give a theoretical justification
for CFs was flawed. We based it on probability theory and tried to show
that CFs could be related to probabilities in a formal sense. Our desiderata
for the CF combining function were based on intuitions involving confirmation,
not just probabilities, so it is not surprising, in retrospect, that the
justification in terms of formal probability theory is not convincing (see
Chapter 12). So the CF model must be viewed as a set of heuristics for
combining uncertainty and utility, and not as a calculus for confirmation
theory. As we noted in Chapter 13, the Dempster-Shafer theory of evidence
offers several potential advantages over CFs. However, simplifying
assumptions and approximations will be necessary to make it a computationally
tractable approach.
In a deductive system the addition of new facts, as axioms, does not
change the validity of theorems already proved. In many interesting problem
areas, such as medical diagnosis, however, new knowledge can invalidate
old conclusions. This is called nonmonotonic reasoning (McDermott

^11 See the exchange of messages at the end of Chapter 10 for a discussion of how this situation
arose in the development of the meningitis knowledge base.

and Doyle, 1980) because new inferences are not always adding new conclusions
monotonically to the accumulating knowledge about a problem.
In MYCIN, early conclusions are revised as new data are acquired--for
example, what looked like an infection of one type on partial evidence
looks like another infection after more evidence is accumulated. The problems
of nonmonotonicity are mostly avoided, though, because MYCIN
gathers evidence for and against many conclusions, using CFs to adjust
the strength of evidence of each, and only decides at the end which conclusions
to retain. As pointed out in Section 29.4.3, self-referencing rules
can change conclusions after all the evidence has been gathered and thus
may be considered a form of nonmonotonic reasoning.

Quantification of "Soft" Knowledge

We know that the medical knowledge in MYCIN is not precise, complete,
or well codified. Although some of it certainly is mathematical in nature,
it is mostly "soft" in the sense that it is judgmental and empirical, and there
are strong disagreements among experts about the formulation of what is
known. Nevertheless, we needed a way of representing the strength of
associations in rules and of calculating the strength with which numerous
pieces of evidence support a conclusion. We first looked for a calculus of
imprecise concepts that did not involve combining numbers. For example,
a few pieces of weakly suggestive evidence would combine into moderately
suggestive evidence, and many pieces would be strongly suggestive. But
how many? And how do the different qualitative degrees combine? We did
not like the idea of discrete categories of strength since it introduces discontinuities
in the combinations. So we looked for a continuous function
that was not overly sensitive to small changes in degrees.
In working with CFs, we found that quantifying soft knowledge does
not require fine levels of precision (Chapter 10). That is why this calculus
can be used in a practical domain. With several rules providing evidence
for a conclusion, the CFs could be written rather roughly and still give the
desired effect. We later showed that, for the MYCIN domain, experts did
not have to use more than four or five degrees of evidential strength, even
though we provided a continuous scale from 0 to 1.
We discovered two styles of rule composition. The first follows our
initial belief that rules can be written independently of one another. The
CFs are set by experts based on their accumulated experience of how much
more likely or important the conclusion is after the premises are known
than it is before they are known. This assumes that CFs do not need to be
precisely set because (a) the knowledge itself is not precise and (b) about
as many rules will have CFs that are "too high" as will have ones that are
"too low" (in some undefinable, absolute sense). The second style of setting
CFs is more tightly controlled. Each new empirical association of evidence

Data:
  Erroneous
  Incomplete
Rules:
  Erroneous (or only partly correct)
  Incomplete
Conceptual framework (domain-dependent and domain-independent parts):
  Incorrect vocabulary of attributes, predicates, and relations
  Incorrect inference structure
  Incomplete set of concepts
  Incomplete logical structure

FIGURE 36-1 Sources of uncertainty in rule-based systems.

with a conclusion, in this view, requires examining rules with similar evidence
or similar conclusions to see how strong the association should be,
relative to the others. For example, to set the CF on a new rule, A → Z,
one would look at other rules such as:

X → Z (CF = 0.2)
Y → Z (CF = 0.8)

Then, if evidence A is about as strong as Y (0.8) and much stronger than
X (0.2), the new CF should be set around the 0.8 level. The exchange of
messages at the end of Chapter 10 reflects the controversy that arose in
our group over these two styles of CF assignment.
In both cases, the sensitivity analysis mentioned in Chapter 10 convinced
us that the rules we were putting into MYCIN were not dependent
on precise values of CFs. That realization helped persons writing rules to
see that they could be indifferent to the distinction between 0.7 and 0.8,
for example, and the system would not break down.

Corrections for Uncertainty

There are many "soft" or ill-structured domains, including medical diagnosis,^12
in which formal algorithmic methods do not exist (Pople, 1982).
In diagnostic tasks there are several sources of uncertainty besides the
heuristic rules themselves. These are summarized in Figure 36-1.

^12 There are so-called clinical algorithms in medicine, but they do not carry the guarantees of
correctness that characterize mathematical or computational algorithms. They are decision
flow charts in which heuristics have been built into a branching logic so that paramedical
personnel can use them to provide good care in many commonly occurring situations.

In an empirical domain, the measurements, observations, and terms
used to describe data may be erroneous. Instruments sometimes need recalibrating,
or electronic noise in the line can produce spurious readings.
Some tests are notoriously unreliable. Similarly, observers sometimes make
mistakes in noticing or recording data. Among these mistakes is the failure
to describe correctly what one sees. This ranges from checking the wrong
box to choosing words poorly. The data are often incomplete as well. Tests
with the most diagnostic value and least cost or inconvenience are done
first, as a matter of general strategy. At any time, there are always more
tests to be done (if only to redo an old one) and always new observations
to be made (if only to observe the same variables for a few more hours).
But some action must eventually be taken on the best available data, even
in the absence of complete information.
With the rules, too, it is impossible to guarantee correctness and completeness
(Chapter 8). This is not the fault of the expert supplying the
rules; it is inevitable in problem areas in which the knowledge is soft.
Finally, the whole conceptual framework may be missing some critical
concepts and may contain constructs that are at the wrong level of detail.
Domain-independent parts of the framework that may introduce errors
into the problem-solving process include the inference structure and the
calculus for combining inexact inferences. The domain-dependent aspects
of the problem-solving framework include the vocabulary and the conceptual
hierarchies used to relate terms. Some questions of chemistry, for
example, require descriptions of molecules in terms of electron densities
and cannot be answered with a "ball and stick" vocabulary of molecular
structure. Similarly, expert performance in medical domains will sometimes
require knowledge of causality or pathophysiologic mechanism,
which is not well represented in MYCIN-like rules (see Chapter 29).
The best answer we have found for dealing with uncertainty is redundancy.
By that we mean using multiple, overlapping sources of knowledge
to reach conclusions, and using the overlaps as checks and balances on the
correctness of the contributions made by different knowledge sources. In
MYCIN we try to exploit the overlaps in the information contributed by
laboratory and clinical data, just as physicians must. For example, a high
fever and a high white cell count both provide information about the severity
of an infection. On the assumption that the correct data will point
more coherently to the correct conclusions than incorrect data will, we
expect the erroneous data to have very little effect after all the evidence
has been gathered. The absence of a few data points will also have little
overall effect if other, overlapping evidence has been found. Overlapping
inference paths, or redundancy in the rules, also helps correct problems
of a few incorrect or missing inferences. With several lines of reasoning
leading from data to conclusions, a few can be wrong (and a few can be
missing), and the system still ends up with correct conclusions.

We recognize that introducing redundant data and inference rules is
at odds with the independence assumptions of the CF model. We did not
want the system to fail for want of one or two items of information. When
we encounter cases with missing evidence, a redundant reasoning path
ensures the robustness of the system. In cases where the overlapping pieces
of evidence are all present, however, nothing inside the system prevents it
from using the dependent information multiple times. We thus have to
correct for this in the rule set itself. The dependencies may be syntactic--for
example, use of the same concept in several rules--in which case an
intelligent editor can help detect them. Or they may be semantic--for
example, use of causally related concepts--in which case physicians writing
or reviewing the rules have to catch them.
In the absence of prior knowledge about which data will be available
for all cases, we felt we could not insist on a vocabulary of independent
concepts for use in MYCIN's rules. Therefore, we had to deal with the
pragmatic difficulty of sometimes having too little information and sometimes
having overlapping information. Our solution is also pragmatic, and
not entirely satisfactory: (a) check for subsumed and overlapping rules
during knowledge entry so that they can be separated explicitly; (b) cluster
dependent pieces of evidence in single rules as much as possible; (c) organize
rules hierarchically so that general information will provide small evidence
and more specific information will provide additional confirmation,
taking notice of the dependencies involved in using both general and specific
evidence; (d) set the CFs on dependent rules (including rules in a
hierarchy) to take account of the possibilities of reasoning with redundant
paths if all data are included and reasoning with a unique path if most data
are missing.
The problems of an incomplete or inappropriate conceptual scheme
are harder to fix. In some cases where we have tried, the EMYCIN framework
has appeared to be inappropriate, e.g., a constraint satisfaction problem
(MYCIN's therapy algorithm) and problems involving tight procedural
control (VM and ONCOCIN). In these instances, we have abandoned this
approach to the problem because substantial changes to the conceptual
scheme would have required rethinking the definitions of all parts of
EMYCIN. The domain-dependent parts are under the control of the experts,
though, and can be varied more easily. Not surprisingly, experts with whom
we have collaborated seem to prefer working largely within one framework.
In MYCIN, for example, there was not a lot of mixing of, say, clinical
concepts (such as temperature) and theoretical concepts (such as the effect
of fever on cellular metabolism). If the conceptual scheme is inappropriate
for the problem, then there is no hope at present for incorporating a
smooth correction mechanism. We are always tempted to add more parameters
and rules before making radical changes in the whole conceptual
framework and approach to the problem, so we will be slow to discover
corrections for fundamental limitations.

36.2.5 Knowledge Base Construction and Maintenance

One of the major lessons of this and other work on expert systems is that
large knowledge bases must be built incrementally. In many domains, such
as medicine, the knowledge is not well codified, so it is to be expected that
the first attempts to build a knowledge base will result in approximations.
As noted earlier, incremental improvements require flexible knowledge
structures that allow easy extensions. This means not only that the syntax
should be relatively simple but that the system should allow room for
growth. Rapid feedback on the consequences of changes also facilitates
improvements. A knowledge base that requires extra compilation steps
before it can be tried (especially long ones) cannot grow easily or rapidly.
Knowledge acquisition is now seen as the critical bottleneck in building
expert systems. We came to understand through this work that the knowledge-engineering
process can be seen as a composite of three stages:

1. knowledge base conceptualization (problem definition and choice of conceptual framework);
2. knowledge base construction (within the conceptual framework); and
3. knowledge base refinement (in response to early performance).

In each stage, the limiting factors are (a) the expressive power of the rep-
resentation, (b) the extent to which knowledge of the domain is already
well structured, (c) the ability of the expert to formulate new knowledge
based on past experience, (d) the power of the editing and debugging tools
available, and (e) the ability of the knowledge engineer to understand the
basic structure and vocabulary of the domain and to use the available tools
to encode knowledge and modify the framework.
Our experiments focus largely on the refinement stage.13 Within this
stage, the model that we have found most useful is that of debugging in
context; an expert can more easily critique a knowledge base and suggest
changes to it in the context of specific cases than in the abstract. Initial
formulations of rules are often too general since the conceptualization
stage appropriately demands generality. Such overgeneralizations can
often best be found and fixed empirically, i.e., by running cases and
examining the program's conclusions.
One important limitation of our model is its failure to address the
problem of integrating knowledge from different experts. For some extensions
to the knowledge base there is little difference between refinement
by one expert or many. For extensions in which different experts use different
concepts (not just synonyms for the same concept), we have no tools

13Some work in progress on the ROGET program (Bennett, 1983) attempts to build an
intelligent, interactive tool to aid in conceptualization and construction of EMYCIN systems
in new domains.
Experimental Results 687

for reaching a consensus.14 As suggested in Part Three, the best solution
we found for this problem was designating a knowledge base "czar" who
was responsible for maintaining coherence and consistency of the knowledge
base. The process is facilitated, however, by techniques for comparing
new rules with previously acquired knowledge and for performing high-level
analyses of large portions of the knowledge base (Chapter 8). We
found that this static analysis was insufficient, at least in domains in which
nonformal, heuristic reasoning is essential. The best test of strength of a
knowledge base appears to be empirical. Nevertheless, a logical analysis
can provide important cues to persons debugging or extending a knowledge
base, for example, in indicating gaps in logical chains of rules.
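One such static check can be sketched in a few lines. The following is a hypothetical illustration in Python (the project's own code was in LISP), with invented rule and parameter names: a "gap" is a rule premise that no other rule concludes and that is not primary data the user can be asked for.

```python
# Hypothetical sketch of a static "gap" check on chains of rules.
# A rule is (premise parameters, concluded parameter); a gap is a premise
# that no rule concludes and that is not primary data the user can supply.
RULES = [
    (["gram_stain", "morphology"], "organism_class"),
    (["organism_class", "site"], "organism_identity"),
]
ASKABLE = {"gram_stain", "morphology"}  # primary data, requested from the user

def find_gaps(rules, askable):
    concluded = {conclusion for _, conclusion in rules}
    return {p for premises, _ in rules
              for p in premises
              if p not in concluded and p not in askable}

print(find_gaps(RULES, ASKABLE))  # "site" can never be established
```

A cue like this tells the knowledge engineer either to add a rule concluding the missing parameter or to mark it as askable; as the text notes, such analysis supplements rather than replaces empirical testing.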
There are other models for transferring expertise to a program besides
knowledge engineering. The war horse of AI is programming each
new performance program using LISP (or another favorite language). This
is euphemistically called "custom crafting" or, more recently, "procedural
embedding of knowledge." In general, it is slower and the result is usually
less flexible than with knowledge engineering, as we learned from DENDRAL.
Another model is based on a direct dialogue between expert and program.
This would, if successful, eliminate the need for a knowledge engineer
to translate and transform an expert's knowledge. Our attempts to
reduce our dependence on knowledge engineers, however, have been
largely unsuccessful. Some of the tools built to aid the maintenance of a
knowledge base (e.g., the ARL editor; see Chapter 15) have been used by
both experts and knowledge engineers. TEIRESIAS (Chapter 9) provides
a model by which experts can refine a knowledge base without assistance
from a knowledge engineer. For very simple domains such tools can probably
suffice for use by experts with little training. As the complexity of a
domain grows, however, the amount of time experts can spend seems to
shrink. So far, the only way we have found around this dilemma is for
knowledge engineers to act as "transducers" to help transform experts'
knowledge into usable form.
Other models of knowledge acquisition that we considered leave the
expert as well as the knowledge engineer out of the transfer process. Two
such models are reading and induction. In the reading model, a program
scans the literature looking for facts and rules that ought to be included
in the knowledge base. We had considered using the parser described in
Chapter 33 to read simplified transcriptions of journal articles. But the
difficulties described in that chapter led us to believe that there was as
much intellectual effort in transcribing articles for such purposes as in
formulating rules directly.15

14We do record the author of each rule with date, justification, and literature citations, but
these are not used by the program except as text strings to be printed.
15More recent work by others at Stanford explores the use of knowledge-based techniques
for inferring new medical knowledge from a large data base of patient information (Blum,
1982).

We did not have the resources to experiment with induction in the
MYCIN domain. We kept statistics on rule invocations and found them to
be somewhat useful in revealing patterns to the knowledge engineers. For
example, rules that are never invoked over a set of test cases may be either
covering rare circumstances--in which case they are left unchanged--or
failing to match because of errors in the left-hand sides--in which case
they are modified. Learning new rules by induction is a difficult task when
the performance program chains several rules together to link data to
conclusions. In these cases, the so-called credit assignment problem--specifically,
the problem of deciding which rules are at fault in case of poor
performance--demands considerable expertise. In TEIRESIAS, credit assignment
was largely turned over to the expert for this reason.
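The invocation bookkeeping described above amounts to little more than counting. A sketch, with invented rule names (this is illustrative, not MYCIN code):

```python
from collections import Counter

# Illustrative bookkeeping (not MYCIN code): count how often each rule
# fires over a batch of test cases, then flag rules that never fired so
# a knowledge engineer can decide whether they cover rare circumstances
# or simply fail to match because of errors in their left-hand sides.
class RuleStats:
    def __init__(self, rule_names):
        self.counts = Counter({name: 0 for name in rule_names})

    def record(self, rule_name):
        self.counts[rule_name] += 1

    def never_invoked(self):
        return sorted(name for name, n in self.counts.items() if n == 0)

stats = RuleStats(["RULE037", "RULE050", "RULE124"])
for fired in ["RULE037", "RULE124", "RULE037"]:  # invocations over test cases
    stats.record(fired)
print(stats.never_invoked())  # ['RULE050']
```

The statistics only raise the question; deciding whether a silent rule is rare-but-correct or faulty remains, as the text says, a matter for the expert.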
Since knowledge engineering was our primary mode of knowledge
acquisition, we found that some interactive tools for building, editing, and
checking the knowledge base gave needed assistance to the system builders.
This is sometimes referred to as knowledge programming--the construction
of complex programs by adding declarative statements of knowledge to an
inference framework. The emphasis is on transferring the domain-specific
knowledge into a framework and not on building up the framework in the
first place from LISP programming constructs. At worst, this is accomplished
by an expert using an on-line text editor. This is primitive, but if
the expert is comfortable with the syntax and the problem-solving framework,
a complex system can still be built more quickly than it could if the
expert were forced to write new code, keeping track of array indices and
go-to loops. There are many higher levels of assistance possible. Considerable
error checking can be done on the syntax, and even more help can
be provided by an intelligent assistant that understands some of the semantics
of the domain. Knowledge programming, with any level of assistance,
is one of the powerful ideas to come out of AI work in the 1970s.

36.2.6 Explanation and Tutoring

When we began this work, there had been little attempt in AI to provide
justifications of a program's conclusions because programs were mostly
used only by their designers. PARRY (Colby, 1981) had a selective trace
that allowed designers to debug the system and casual users to understand
its behavior. DENDRAL's Predictor also had a selective trace that could
explain the origins of predicted data points, but it was used only for debugging.
As part of our goal of making MYCIN acceptable to physicians,
we tried from the start to provide windows into the contents of the knowledge
base and into the line of reasoning. Our working assumption was
that physicians would not ask a computer program for advice if they had
to treat the program as an unexaminable source of expertise. They normally
ask questions of, or consult, other physicians partly for education to
help with future cases and partly for clarification and understanding of

the present case. We believe that initial acceptance of an advice-giving
system depends on users being able to understand why it provides the
advice that it does (Chapter 34). Moreover, physicians are sensitive to well-established
legal guidelines that argue against prescribing drugs without
understanding why (or whether) they are appropriate.

The Model

The model of explanation in MYCIN is to "unwind the goal stack" in
response to a WHY question. That is, when a user wants to know why an
item of information is needed, MYCIN's answer is to show the rule(s) that
caused this item to be requested. Answers to successive WHY questions
show successively higher rules in the stack. For example, in the reasoning
chain

A → B → C → D → E

MYCIN chains backward from goal E to the primary element A. A user
who wants to know why A is requested will see the rule A → B. A second
WHY question (i.e., "WHY do you want to know B?") will cause MYCIN
to show the rule B → C, and so on. Keeping a simple history list of rule
invocations is adequate for producing reasonable explanations of the
program's line of reasoning, in part because reasoning is explicitly goal-directed.
The goals and subgoals provide an overall rationale for the invocation
of rules. The history list captures the context in which information
is sought as well as the purpose for which it is sought.
But questions asking why MYCIN requests a particular piece of information
provide only a small window on the reasoning process. The complementary
HOW questions extend the view somewhat by allowing a user
to ask how a fact has already been established or will later be pursued. The
same history list provides the means for answering HOW questions during
a consultation. For example, a user may be told that item A2 is needed
because B is the current goal and there is a rule of the form

A1 & A2 & A3 → B

where A1 is already known (or believed) to be true. Then the user may ask
how A1 is known and will then see the rules that concluded it (or be told
that it is primary information entered at the terminal if no rules were used).
Similarly, the user may ask how A3 will be pursued if the condition regarding
A2 is satisfied.
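The "unwind the goal stack" mechanism can be sketched compactly. The following Python toy (names invented; not MYCIN's actual code) records each rule invocation while chaining backward, and answers successive WHY questions by walking up that history:

```python
# Minimal sketch of the goal-stack idea (names invented; not MYCIN's code).
# Each rule concludes one parameter from another; chaining back from the
# goal leaves a history list, and successive WHY answers walk up that list.
RULES = {"R1": ("A", "B"), "R2": ("B", "C"), "R3": ("C", "D"), "R4": ("D", "E")}

def trace_for(goal, rules):
    """Chain backward from goal; return rule invocations, deepest first."""
    history, current = [], goal
    while True:
        match = next(((name, p, c) for name, (p, c) in rules.items()
                      if c == current), None)
        if match is None:          # reached a primary element
            break
        history.append(match)
        current = match[1]         # now try to establish the premise
    return list(reversed(history))

def why(history, n):
    """Answer to the nth successive WHY: the nth rule up the goal stack."""
    return history[n]

hist = trace_for("E", RULES)
print(why(hist, 0))  # ('R1', 'A', 'B'): A is asked for because of rule A -> B
print(why(hist, 1))  # ('R2', 'B', 'C'): and B, in turn, is needed to conclude C
```

The same history list, indexed the other way, answers HOW questions: given a concluded parameter, look down the list for the rules that established it.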
Explanations can be much richer. For example, they can provide insights
into the structure of the domain or the strategy behind the line of
reasoning. All of these extensions require more sophistication than is embodied
in looking up and down a history list. This is a minimal explanation

system. It provides reasons that are only as understandable as the rules
are, and some can be rather opaque. Looking up or down the goal stack
is not always appropriate, but this is all MYCIN can do. Sometimes, for
instance, a user would like a justification for a rule in terms of the underlying
theory but cannot get it. Moreover, MYCIN has no model of the user
and thus cannot distinguish, say, a student's question from a physician's.
These issues were discussed at length in Chapters 20 and 29.
At the end of a consultation, a user may ask questions about MYCIN's
conclusions (final or intermediate) and will receive answers much like those
given during the consultation. General questions about the knowledge base
may also be asked. In order to get MYCIN to answer WHY NOT questions
about hypotheses that were rejected or never considered, more reasoning
apparatus was needed. Since there is no history of rules that were not tried,
MYCIN needs to read the rules to see which ones might have been relevant
and then to determine why they were not invoked.

Tutoring

We had initially assumed that physicians and students would learn about
infectious disease diagnosis and therapy by running MYCIN, especially if
they asked why and how. This mode of teaching was too passive, however,
to be efficient as a tutorial system, so we began to investigate a more active
tutor, GUIDON. The program has two parts: (a) the knowledge base used
by MYCIN, and (b) a set of domain-independent tutorial rules and procedures.
We originally assumed that a knowledge base that is sufficient for high-performance
problem solving would also be sufficient for tutoring. This
assumption turned out to be false, and this negative result spawned revisions
in our thinking about the underlying representation of MYCIN's
knowledge. We concluded that, for purposes of teaching, and for explanation
to novices, the facts and relations known to MYCIN are not well
enough grounded in a coherent model of medicine (Chapter 29). MYCIN's
knowledge is, in a sense, compiled knowledge. It performs well but is not
very comprehensible to students without the concepts that have been left
out. For example, a MYCIN rule such as

A → B

may be a compilation of several associations and definitions:

A → A1
A1 → A2
A2 → B

If A1 and A2 are not observable phenomena or quantities routinely measured,
the only association that matters for clinical practice is A → B. A
student would gain some benefit from remembering MYCIN's compiled
knowledge, but the absence of an underlying model makes it difficult to
remember a scattered collection of rules. Additional knowledge of the
structure of the domain, and of problem-solving strategies, provides the
"glue" by which the rules are made coherent. Recent work at M.I.T. by
Swartout (1983) and Patil et al. (1981) has further emphasized this point.
We also believe that an intelligent tutoring program can be devised
such that medical knowledge and pedagogical knowledge are explicitly
separated. The art of pedagogy, however, is also poorly codified and evokes
at least as much controversy as the art of medicine. GUIDON has directed
meaningful dialogues with both the MYCIN and SACON knowledge bases,
so its pedagogical knowledge (tutoring rules; see Chapter 26) is not specific
to medical education. Some of the knowledge about teaching is procedural
because the sequence of actions is often important. Thus the pedagogical
knowledge is a mixture of rules and stylized procedures.

36.2.7 The User Interface

Consultation Model

We chose to build MYCIN on the model of a physician-consultant who
gives advice to other physicians having questions about patient care. Was
it a good choice?

Here the answer is ambiguous. From an AI point of view, the consultation
model is a good paradigm for an interactive decision-making tool
because it is so clear and simple. The program controls the dialogue, much
as a human consultant does, by asking for specific items of data about the
problem at hand. Thus the program can understand short English responses
to its questions because it knows what answers are reasonable at
each point in the dialogue. Moreover, it can ask for as much--and only as
much--information as is relevant. Also, the knowledge base can be highly
specialized because the context of the consultation can be carefully controlled.
A disadvantage of the consultation model as implemented in MYCIN,
however, is that it prevents a user from volunteering pertinent data.16
Although the approach avoids the need for MYCIN to understand free-text
data entry, physicians can find it irritating if they are unable to offer
key pieces of information and must wait for the program to ask the right
question.17 In addition, MYCIN asks a lot of questions (around 50 or 60,

16Our one attempt to permit volunteered information (Chapter 33) was of limited success,
largely because of the complexity of getting a computer to understand free text.
17The ability to accept volunteered information is a major feature of the PROSPECTOR
model of interaction embodied in KAS (Reboh, 1981).

usually), and the number increases as the knowledge base grows. Few physicians
want to type answers to that many questions--in fact, few of them
want to type anything. With current technology, then, the consultation
model increases the cost of getting advice beyond acceptable limits. Clinicians
would rather phone a specialist and discuss a case verbally. Moreover,
the consultation model sets up the program as an "expert" and leaves the
users in the undesirable position of asking a machine for help. In some
professions this may be acceptable, but in medicine it is difficult to sell.

One way to avoid the need for typing so many answers is to tap into
on-line patient data bases. Many of MYCIN's questions, for example, could
be answered by looking in automated laboratory records or (as PUFF now
does) could be gathered directly from medical instruments (Aikins et al.,
1983). Another way is to wait for advanced speech understanding and
graphical input.
The consultation model assumes a cooperative and knowledgeable
user. We attempted to make the system so robust that a user cannot cause
an unrecoverable error by mistake. But the designers of any knowledge
base still have to anticipate synonyms and strange paths through the rules
because we know of no safeguards against malice or ignorance. Some medically
impossible values are still not caught by MYCIN.18 If users are cooperative
enough to be careful about the medical correctness of what they
type, MYCIN's implementation of the consultation model is robust enough
to be helpful.

Other Models of Interaction

DENDRAL does not engage a user in a problem-solving dialogue as MYCIN
does. Instead, it accepts a set of constraints (interactively defined) that
specify the problem, then it produces a set of solutions. This might be
called the "hired gun" model of interaction: specify the target, accept the
results, and don't ask questions.
Recently we have experimented with a critiquing model for the ONCOCIN
program, an attempt to respond to some of the limitations of the
traditional consultation approach. In the critiquing model, a user states his
or her own management plan, or diagnosis, and the program interrupts
only if the plan is judged to be significantly inferior to what the program
would have recommended (Langlotz and Shortliffe, 1983).

The monitoring model of the VM program (Chapter 22) follows much
the same interactive strategy as that of ONCOCIN--offering advice only
when there is a need. In addition, it periodically updates and prints a
summary and interpretation of the patient's condition.

18For example, John McCarthy (maliciously) told MYCIN that the site of a culture was amniotic
fluid--for a male patient--and MYCIN incorrectly accepted it (McCarthy, 1983).
Nonmedical users (including one of the authors) have found similar "far-out bugs" as a
consequence of sheer ignorance of medicine.

English Understanding

We attempted to design a satisfactory I/O package without programming
extensive capabilities for understanding English. One of the pleasant surprises
was the extent to which relatively simple sentence parsing and generating
techniques can be used. In ELIZA, Weizenbaum (1967) showed
that a disarmingly natural conversation can be produced by a program
with no knowledge of the subject matter. We wanted to avoid the extensive
effort of designing a program for understanding even a subset of unrestricted
English. Thus we used roughly the same techniques used in ELIZA
and in PARRY (Colby, 1981). Our main concern at the beginning was that
the subset of English used by physicians was too broad and varied to be
handled by simple techniques. This concern was unfounded. Subsequently,
we have come to believe that the more technical the domain, the more
stylized the communication. Then keyword and phrase matching are sufficient
for understanding responses to questions and for parsing questions
asked by users. As long as the program is in control of the dialogue, there
is little problem with ambiguity because the types of responses a user can
give are determined by the program's questions. Even in a mode in which
a user asks questions about any relevant topic (Chapter 18), simple parsing
techniques are usually adequate because (a) the range of relevance is rather
restricted and (b) terms with ambiguity within this range are few in number
and are disambiguated by other terms with unique meanings that serve to
fix the context.
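The keyword and phrase matching described above can be sketched very simply. The vocabulary below is invented for illustration; the point is only that, because the program asked the question, matching a response against a small synonym table is usually enough:

```python
# Hedged sketch of keyword and phrase matching for answers to a question
# the program itself asked (vocabulary invented for illustration). Because
# only a few answers are plausible at each point in the dialogue, simple
# substring matching against a synonym table is usually sufficient.
SYNONYMS = {
    "csf": "cerebrospinal-fluid",
    "spinal fluid": "cerebrospinal-fluid",
    "blood": "blood",
    "urine": "urine",
}

def parse_answer(text, expected_values):
    text = text.lower()
    for keyword, value in SYNONYMS.items():
        if keyword in text and value in expected_values:
            return value
    return None  # unrecognized: re-ask the question

print(parse_answer("It was a spinal fluid culture",
                   {"blood", "urine", "cerebrospinal-fluid"}))
# -> cerebrospinal-fluid
```

Restricting candidate matches to the values expected for the current question is what keeps ambiguity manageable, exactly as the surrounding passage argues.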
We did find, however, that our simple parser was not sufficient for
understanding many facts presented at once in a textual description of a
patient (Chapter 33). The facts picked out of the text were largely correct,
but we missed many. We could successfully restrict the syntax of questions
a person can ask without overly restricting the nature of the questions. But
we found no general forms for facts that gave us assurance that the program
could understand the wide variety of verbs used in case descriptions.
There are several shortcomings in MYCIN's interface that could antagonize
physicians.19 First, it requires that a user type. There is a tantalizing
possibility of speech-understanding interfaces that accept sentences in
large vocabularies from multiple speakers. But these are not here yet, and
certainly were only glimmers on the horizon in 1975. Second, MYCIN
requires users to provide information that they know is stored on other
computers in the same building. We were prepared to string cables among
the computers, but the effort and expense were not justified as long as
MYCIN was only a research program. Third, as we have noted, MYCIN
does not accept volunteered information. Although we experimented with

19The lessons learned regarding the limitations of MYCIN's interface have greatly influenced
the design of our recent ONCOCIN system (Chapters 32 and 35). That system's domain was
selected largely because it provides a natural mechanism for allowing the physician to volunteer
patient information (i.e., the flowsheet), and because data can be entered using a
special keypad rather than the full terminal keyboard.

programs to permit this kind of interaction (Chapter 33), the theoretical
issues involved prevented robust performance and discouraged us from
incorporating the facility on a routine basis. Besides, eventually MYCIN
asks all questions that it considers relevant, so, in a logical sense, volunteered
information is unnecessary. From the user's point of view, however,
MYCIN is often too fully in control of the dialogue. Users would like to
be able to steer the line of reasoning and get the program to focus on a
few salient facts at the beginning. Fourth, as mentioned above, we believe
it is important to provide a window into the line of reasoning and the
knowledge base. The window that we provide is narrow, however, and lacks
the flexibility and clarity that would let a physician see quickly why MYCIN
reasons as it does. Part of the difficulty is that the rules provided as explanations
often mix strategy and tactics and thus are difficult to understand
in isolation. Our more recent work on explanation has begun to look at
issues such as these (Chapter 20).

36.2.8 Validation

There are many dimensions to the question "How good is MYCIN?" We
have looked in detail at two: (a) How good is MYCIN's performance? and
(b) What features would make such systems acceptable to physicians?

Decision-Making Performance

We experimented with three evaluations of MYCIN, each refined in light
of our experience with the previous one, and believe that something much
like Turing's test can demonstrate the level of performance of an expert
system. In the third evaluation, we asked outside experts to rate the conclusions
reached by MYCIN, several Stanford faculty, house staff, and students--on
the same set of randomly selected, hard cases. Then, as in
Turing's test (Turing, 1950), we looked at the statistics of how the outside
experts rated MYCIN's performance relative to that of the Stanford faculty
and the others. The conclusion from these studies is that MYCIN recommends
therapeutic actions that are as appropriate as those of experts on
Stanford's infectious disease faculty--as judged by experts not at Stanford.
(More precisely, the outside experts disagreed with MYCIN's recommendation
no more often than they disagreed with the recommendations of
the Stanford experts.)
Although they are reasonably conclusive, studies such as this are expensive.
Considerable research time was consumed in the design and execution
of the MYCIN studies, and we required substantial contributed
time from Stanford faculty, house staff, and students and from outside
experts. Moreover, we learned from the earlier studies that we needed to
separate the quality of advice from other factors affecting the utility and

acceptance of the program. Thus the final study provides no information
about whether the system would be used in practice, what the cost-benefit
trade-offs would be, etc. However, we believe that high performance is a
sine qua non for an expert system and thus deserves separate evaluation
early in a program's evolution (see Chapter 8 of Hayes-Roth et al., 1983).

Acceptability

Unfortunately, we still have not fully defined the circumstances under
which physicians will use a computer for help with clinical decision making.
Only in the recent ONCOCIN work (Chapters 32 and 35) have we shown
that physicians can be motivated to use decision aids in carefully selected
and refined environments. In the original MYCIN program we had hoped
to provide intelligent assistance to clinicians and to be able to demonstrate
that the use of a computer reduced the number (and severity of consequences)
of inappropriate prescriptions for antibiotics. Physicians in a
teaching hospital, however, may not need assistance with this problem to
the same extent as others--or, even if they do, they do not want it. So we
found ourselves designing a program largely for physicians not affiliated
with universities, with whom we did not interact daily.
In a survey of physicians' opinions (Chapter 34), we confirmed our
impression that explanations are necessary for acceptance. If an assistant
is unable to explain its line of reasoning, it will not gain the initial confidence
of the clinicians who have to take responsibility for acting on its
therapy recommendations. There is an element of legal liability here and
an element of professional pride. A physician must understand the alternative
possible causes of a problem and the alternative treatments, or else
he or she may be legally negligent. Also, professionals will generally believe
they are right until given reason to think otherwise. We also found that
high performance alone was not sufficient reason for a practicing physician
(or engineer or technician) to use a consultation program (Shortliffe,
1982a). We thought that finding a medical problem that is not solved well
(and finding documentation of the difficulties) was the right starting place.
What we failed to see was that adoption of a new tool is not based solely
on demonstrated need coupled with demonstrated high performance of
the tool. In retrospect, that was naive. Acceptability is different from high
performance (Shortliffe, 1982b).

36.2.9 Generality

One of the most far-reaching sets of experiments in this work involved the
generalizability of the MYCIN representation scheme and inference engine.
We believed the skeletal program could be used for similar problem-solving
tasks in other domains, but no amount of analysis and discussion

could have been as convincing as the working demonstrations of EMYCIN
in several different areas of medicine, electronics, tax advising, and software
consulting. Making the inference engine domain-independent meant
we had to write the rule interpreter so that it manipulates only the symbols
named in the rules and makes no semantic transformations except as specified
in the knowledge base.
However, there are a number of assumptions about the type of problem
being solved that are built into EMYCIN. We assume, for instance, that the
problem to be solved is one of analyzing a static collection of data (a "snapshot"),
weighing all relevant evidence for and against competing hypotheses,
and recommending some action. The whole formalism loses
strength when it is stretched outside the limits of its design. We see parallels
with earlier efforts to build a general problem solver; however, the generality
of EMYCIN is intended to be strongly bounded.
There is no mystery to how a system (such as MYCIN) can be generalized
(to EMYCIN) so that it is applicable to many problems in other
domains: keep the reasoning processes and the knowledge base separate. However,
some of the limiting characteristics of the data, the reasoning processes,
the knowledge base, and the solutions are worth repeating.

The Data

EMYCIN was designed to analyze a static collection of data. The data may
be incomplete, interdependent, incorrect ("noisy"), and even inconsistent.
A system built in EMYCIN can, if the knowledge base is adequate, resolve
ambiguities and cope with uncertainty and imprecision in the data. EMYCIN
does assume, however, that there is only one set of data to analyze
and that new data will not arrive later from experiments or monitoring.
The number of elements of data in the set has been small--roughly 20-100--in
the cases analyzed by MYCIN and other EMYCIN systems. But
there seems to be no reason why more data cannot be accepted.

Reasoning Processes

EMYCIN is set up to reason backward from a goal to the data required to
establish it. It can also do some limited forward reasoning within this context.
It thus requests the data it needs when they are not otherwise available.
It is an evidence-gathering system, collecting evidence for and against
potentially relevant conclusions. It is not set up to reason in other ways,
for example, by generating hypotheses from primitive elements and testing
them, by instantiating a template, or by refining a high-level description
through successive abstraction levels. It can propagate uncertainty from

the data, through uncertain inference rules, to the conclusions. Backtracking
is not supported because the system follows all relevant paths.
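The evidence-gathering, backward-chaining style with uncertainty propagation can be illustrated with a deliberately tiny sketch. This is a toy in the spirit of EMYCIN, not its implementation; the rules, certainty factors, and data below are invented:

```python
# Toy sketch of backward, evidence-gathering reasoning with certainty
# factors, in the spirit of (but far simpler than) EMYCIN. Rule CFs and
# data are invented. A rule contributes its CF attenuated by the belief
# in its premise; positive contributions combine by x + y(1 - x).
RULES = [
    ("gram_pos", "strep", 0.6),   # (premise, conclusion, rule CF)
    ("chains",   "strep", 0.4),
]
DATA = {"gram_pos": 1.0, "chains": 0.8}  # CFs attached to the primary data

def cf_combine(x, y):
    return x + y * (1 - x)       # combining two positive CFs

def belief(goal, rules, data):
    if goal in data:             # primary datum: would be asked for here
        return data[goal]
    total = 0.0
    for premise, conclusion, cf in rules:   # gather ALL relevant evidence
        if conclusion == goal:
            total = cf_combine(total, cf * belief(premise, rules, data))
    return total

print(belief("strep", RULES, DATA))  # 0.6 and 0.32 combine to 0.728
```

Note that `belief` visits every rule concluding the goal rather than stopping at the first success; that is the sense in which all relevant paths are followed and no backtracking is needed.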
Overall, the reasoning is assumed to be analytic and not synthetic.
Diagnostic and classification tasks fit well; construction and planning tasks
do not. The piece of MYCIN that constructs a therapy plan within constraints,
for example, was coded as a few rules that call for evaluating
specialized procedures (Chapter 6). It is a complex constraint satisfaction
problem, with symbolic expressions of constraints. It was not readily coded
in MYCIN-like rules because of the numerous comparison operations (for
example, "minimizing").

An interpretation of the data, for instance "the diagnosis of the problem,"
is the usual goal in EMYCIN systems. In at least one case (SACON;
see Chapter 16), however, a solution can have a somewhat more prescriptive
flavor. Given a description of a problem, SACON does not solve it
directly but rather describes what the user should do to solve it. The prescription
of what to do "covers" the data in much the same way as a diagnosis
covers the data. Because the evidence-gathering model fit this
problem, it was not necessary to treat it as a constraint satisfaction problem.

Knowledge Base

The form of knowledge is assumed primarily to be situation-action rules
and fact triples (with CFs). Other knowledge structures, such as tables of
facts and specialized procedures, are included as well. Since the knowledge
base is indexed and is small relative to the rest of the program, the size of
the knowledge base should not be a limiting factor for most problems.
MYCIN's knowledge base of 450 rules and about 1000 additional facts (in
tables) is the largest with which we have had experience, although
ONCOCIN is almost that large and is growing rapidly.

Solutions

As mentioned in the discussion of evidence gathering, the solutions are
assumed to be subsets of elements from a predefined list. There are 120
organisms in MYCIN's list of possible causes. In this problem area, the
evidence is generally considered insufficient for a precise determination
of a unique solution or a strictly ordered list of solutions. Because the
evidence is almost certainly incomplete in the first 24-48 hours of a severe
infection, both MYCIN and physicians are expected to "cover for" a set of
most likely and most risky causes. It is not expected that someone can
uniquely identify "the cause" of the problem when the data are suggestive
but still leave the problem underdetermined.
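The idea of "covering for" a set of likely causes, rather than selecting a single best answer, can be sketched as follows. The organisms, the CF values, and the 0.2 threshold here are illustrative assumptions, not MYCIN's actual figures.

```python
# Illustrative sketch: select a set of hypotheses to "cover for" from a
# predefined list, instead of picking one unique answer. The CFs and
# the significance threshold are invented for this example.

hypotheses = {  # organism -> accumulated certainty factor
    "e.coli": 0.74,
    "pseudomonas-aeruginosa": 0.42,
    "klebsiella-pneumoniae": 0.35,
    "proteus": 0.05,
}

def cover_set(hyps, threshold=0.2):
    """Return every hypothesis whose evidence exceeds the threshold,
    strongest first -- a cluster of likely causes, not a single winner."""
    return sorted((h for h, cf in hyps.items() if cf > threshold),
                  key=lambda h: -hyps[h])
```

Here `cover_set(hypotheses)` keeps the three organisms above threshold and drops the weakly supported one, mirroring how therapy is chosen to cover a cluster of likely and risky causes.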

36.2.10 Project Organization

Funding

Funding for the research presented here was not easy to find because of
the duality of goals mentioned above. Clinically oriented agencies of the
government were looking for fully developed programs that could be sent
to hospitals, private practices, military bases, or space installations. They
saw the initial demonstration with bacteremia as a sign that ward-ready
programs could be distributed as soon as knowledge of other infections
was added to MYCIN. And they seemed to believe that transcribing sentences
from textbooks into rules would produce knowledge bases with clinical
expertise. Other funding agencies recognized that research was still
required, but we failed to convince them that both medical and AI research
were essential. We felt that the kinds of techniques we were using could
help codify knowledge about infectious diseases and could help define a
consensus position on issues about which there are differences of medical
opinion. But we also felt that the AI techniques themselves needed analysis
and extension before they could be used for wholesale extensions to medical
knowledge. More generally, we saw medicine as a difficult real-world
domain that is typical of many other domains. Failing to find an agency
that would support both lines of activity, we submitted separate proposals
for the dual lines. After the initial three years of NIH support for MYCIN,
only the AI line was funded by the NSF, ONR, and DARPA (in the efforts
that produced EMYCIN, GUIDON, and NEOMYCIN). By 1977 our medical
collaborators were in transition for other reasons anyway, so we largely
stopped developing the infectious disease knowledge base.20

Technology Transfer

When we began, we believed in the "better mousetrap" theory of technology
transfer: build a high-performance program that solves an important
problem, and the world will transfer the technology. We have learned that
several elements of this naive theory are wrong. First, there is a bigger
difference between acceptability and performance than we appreciated, as
mentioned above. Second, there has to be a convenient mechanism of
transfer. MYCIN ran only in Interlisp under the TENEX and TOPS-20

20That is not to say, however, that all medical efforts stopped. Shortliffe rejoined the project
in 1979 and began defining and implementing ONCOCIN. Clancey needed to reformulate
MYCIN's knowledge base in a form more suitable for tutoring (NEOMYCIN) and enlisted
the help of Dr. Tim Beckett. Several medical problem areas were investigated and prototype
systems were built using EMYCIN. These include pulmonary function testing (PUFF), blood
clotting disorders (CLOT), and complications of pregnancy (GRAVIDA). And several master's
and doctoral students have continued to use medicine as a test-bed for ideas in AI and
decision making, causal reasoning, representation, and learning. Several projects undertaken
after 1977 are included in the present volume.

operating systems. Since hospital wards and physicians' offices do not have
access to the same equipment that computer science laboratories do, we
would have had to rewrite this large and complex system in another language
to run on smaller machines. We were not motivated to undertake
this task. Now, however, smaller, cheaper machines are available that do
run Interlisp and other dialects of LISP, so technology transfer is much
more feasible than when MYCIN was written.

Stability

We were fortunate with MYCIN in finding stability in (a) the goals of the
project, (b) the code, and (c) the system environment.
The group of researchers defining the MYCIN project changed as
students graduated, as interests changed, and as career goals took people
out of our sphere. Shortliffe, Buchanan, Davis, Scott, Clancey, Fagan, Aikins,
and van Melle formed a core group, however, that maintained a certain
continuity. Even with a fluid group, we found stability in the overall goal
of trying to build an AI system with acknowledged medical expertise.
Those who felt this was too narrow a goal moved on quickly, while others
found this sharp focus to be an anchor for defining their own research.
Another anchor was the code itself. Much of any individual's code is
opaque to others, and MYCIN contains its share of "patches" and "hacks."
Yet because the persons writing code felt a responsibility to leave pieces of
program that could be maintained and modified by others, the programming
practices of most of the group were ecologically sound.21 Finally, the
stability of Interlisp, TENEX, and the SUMEX-AIM facility contributed
greatly to our ability to build a system incrementally. Without this outside
support, MYCIN could not have expanded in an orderly fashion and we
would have been forced to undertake massive rewrites just to keep old
code running.

36.3 Key Questions and Answers

We realize that a book of this size, describing several experiments that are
interrelated in complex and sometimes subtle ways, may leave the reader
asking exactly what has been learned by the research and what lessons can
be borrowed by others already working in the field or about to enter it.
This final chapter has attempted to summarize those lessons, but we feel
the need to close with a brief list of frequently asked questions and our
21Bill van Melle, Carli Scott, and Randy Davis especially enforced this ethic. In particular,
van Melle's system-building tools helped maintain the integrity of a rapidly changing, complex
system.

answers to them. The responses are drawn from the work described in
earlier chapters but are also colored by our familiarity with other work in
AI (particularly research on expert systems). Despite the brevity and
simplicity of the questions and answers, we feel that they do summarize the
key lessons learned in the MYCIN experiments. For those readers who like
to start at the end when deciding whether or not to read a book, we hope
that the list will pique their curiosity and motivate them to start reading
from the beginning.

Is a production rule formalism sufficient for creating programs that can reason
at the level of an expert?
Yes, although we discovered many limitations and modified the "pure"
production rule formalism in several ways in order to produce a program
that met our design criteria.
Is backward chaining a good model of control for guiding the reasoning and the
dialogue in consultation tasks?
Yes, particularly when the input data must be entered by the user, although
for efficiency and human-engineering reasons it is desirable to
augment it with forward chaining and meta-level control as well.
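The control regime described in this answer can be sketched in a few lines: to establish a goal, the interpreter tries every rule that concludes it, recursively establishing each premise; a premise that no rule concludes is asked of the user. The rules and goal names below are invented for illustration, and certainty factors are omitted for brevity.

```python
# A minimal backward-chaining sketch (not MYCIN's Interlisp code).
# Rules are (premises, conclusion) pairs; all names are hypothetical.

rules = [
    (["infection-is-meningitis", "csf-wbc-high"], "type-is-bacterial"),
    (["type-is-bacterial", "hospital-acquired"], "cover-for-e.coli"),
]
known = {"infection-is-meningitis": True}  # initial data

def ask(goal):
    # stand-in for the consultation dialogue with the user
    return input(f"{goal}? (y/n) ").startswith("y")

def established(goal, ask_user=ask):
    """Establish a goal by backward chaining; ask the user as a last resort."""
    if goal in known:
        return known[goal]
    concluding = [r for r in rules if r[1] == goal]
    if concluding:  # chain backward through each rule's premises
        result = any(all(established(p, ask_user) for p in premises)
                     for premises, _ in concluding)
    else:           # no rule concludes it: it becomes a question
        result = ask_user(goal)
    known[goal] = result
    return result
```

Note how the order of questions falls out of the goal structure rather than being programmed explicitly, which is why backward chaining also organizes the dialogue.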
Is the evidence-gathering model useful in other domains?
Yes, there are many problems in which evidence must be gathered and
weighed for a set of possible hypotheses. Infectious disease diagnosis is
typical of many problems in having a prestored list of hypotheses that
defines the search space. It is not the only useful model for hypothesis
formation, however. In other problem areas, hypotheses can be synthesized
from smaller elements and then evidence gathered for them in a
manner closer to the generate-and-test approach. Or evidence can be
gathered during the generation of hypotheses, as in the heuristic search
model used in DENDRAL.
Is the CF model of inexact reasoning sufficiently precise for expert-level
performance?
Yes, at least in domains where the evidence weights are used to cluster
sets of most likely hypotheses rather than to select the "best" from among
them. Some domains demand, and supply, finer precision than the CF
model supports, but we felt we lost little information in reasoning with
the infectious disease rules using the CF model. We would need to perform
additional experiments to determine the breadth of the model's
applicability, but we recognize that a calculus of more than one number
allows finer distinctions.
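For concreteness, the CF combining function used in this work, which merges two certainty factors bearing on the same hypothesis, can be written out as follows. The Python rendering is ours, not the original Interlisp.

```python
# The CF combining function: x and y are certainty factors in [-1, 1]
# from two rules concluding about the same hypothesis.

def cf_combine(x, y):
    """Combine two certainty factors for one hypothesis."""
    if x >= 0 and y >= 0:          # two pieces of confirming evidence
        return x + y * (1 - x)
    if x < 0 and y < 0:            # two pieces of disconfirming evidence
        return x + y * (1 + x)
    # conflicting evidence
    return (x + y) / (1 - min(abs(x), abs(y)))
```

Two moderately confirming rules (e.g., `cf_combine(0.5, 0.5)`) yield a belief stronger than either alone but still short of certainty, which is the clustering behavior the answer above relies on.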
What is the best way to build a large knowledge base?
Knowledge engineering is, for now. Because the problem areas we consider
most appropriate for plausible reasoning are those that are not
already completely structured (e.g., in sets of equations), constructing a
knowledge base requires defining some new structures. Filling out a
knowledge base, then, requires considerable testing and refinement in
order to forge a robust and coherent set of plausible rules. Knowledge
engineering requires a substantial investment in time for both the knowledge
engineer and domain expert, but there are currently no better
methods for transferring expertise to expert systems.
Were we successful in generalizing the problem-solving framework beyond the
domain of infectious diseases?
Yes, EMYCIN has been demonstrated in many different problem areas.
It has limitations, but its value in system building is more dependent on
the structural match of the problem to the task of diagnosis than it is on
the specific knowledge structures of the subject area.
Can the contents of an EMYCIN knowledge base be effectively used alone for
tutoring students and trainees?
No, the knowledge base does not contain a rich enough model of the
causal mechanisms, support knowledge, or taxonomies of a domain to
allow a student to build a coherent picture of how the rules fit together
or what the best problem-solving strategies are.
Is the consultation model of interaction a good one for a decision-making aid for
physicians?
For physicians the tradeoff between time and benefit is the key consid-
eration. A lengthy consultation will only be acceptable if there are major
advantages for the patient or physician to be gained by using the system.
For most applications, therefore, a decision-making aid should be inte-
grated with routine activities rather than called separately for formal
consultations. For practitioners in other fields, however, the consultation
model may be quite acceptable.
Is a simple key word and phrase parser powerful enough for natural language
interaction between users and a system in a technical domain?
Yes, as long as the user can tolerate a stylized interaction and tries to
phrase responses and requests in understandable ways. The approach is
probably not sufficient, however, for casual users who seldom use a sys-
tem and accordingly have no opportunity to learn its linguistic idiosyn-
crasies.
Can we prove the correctness of conclusions from MYCIN?
No, because the heuristics carry no guarantees. However, we can demonstrate
empirically how well experts judge the correctness of a program's
conclusions by using a variant of Turing's test.
Why is MYCIN not used routinely and why are the rules not published?
Although MYCIN gives good advice and has been a marvelous source
of new knowledge about expert systems and their design, computers
that run Interlisp are still too expensive, and there are enough deficiencies
in MYCIN's breadth of knowledge and user interface that it would
not be a cost-effective tool for physicians to use on such narrow problem
areas as meningitis and bacteremia. We have been asked why we have
not published MYCIN's rules about infectious diseases as a service to
physicians and medical students, even though the system itself is not
available. The long answer is in Chapter 29, but the short answer is that
it would not be a service. The rules, as written, do not separate the "key"
factors from the context-setting factors, they omit many causal mechanisms
that relate key factors with conclusions, and they (together with
the rule interpreter) embody a strategy of medical diagnosis that is never
explicit. They are not readable as text, nor were they intended to be.
They make more sense in the context of use than they do in isolation.
Why does MYCIN work so well?
There are many reasons. First, the task was carefully chosen to increase
the likelihood of success: infectious disease therapy selection is a combinatorial
problem within a restricted (and relatively small) vocabulary,
with time available for several seconds of reasoning, and with available
local expertise. Also, there is not just one unique solution to a problem,
but a set of acceptable conclusions. Second, the simple, modular knowledge
structures we used were designed to be easily understood and
changed. Thus the knowledge base could be built incrementally with
rapid feedback, i.e., without losing much time to radical changes in
underlying data structures and access functions. (In addition, the knowledge
structures could be used for multiple interrelated purposes,
thereby exploiting and further demonstrating the power and utility of
a modular representation scheme.) Third, the research team was dedicated
and highly motivated to make MYCIN work. Six doctoral dissertations
on MYCIN and related programs resulted from these efforts,
with at least as much effort expended by others not working for degrees.
Epilog

Although the "MYCIN gang" has disbanded, the excitement of creating
the constellation of programs described in this book is still evident. The
legacy of this group is a set of results that others can build on. We are
proud to have been active in the definition and development of MYCIN
and are enriched by having worked with such an outstanding group of
individuals.

Appendix: Annotated
Typescript of a Sample
MYCIN Consultation

A.1 Instructions for Users at the Start of an Infectious Disease Consultation (Optional)

[Here, as elsewhere, the responses typed by the user follow a double-asterisk prompt.]
MYCIN 3-Jun-78...
Special options (type ? for help):

Instructions? (Y or N)
** YES
This is a computer program named MYCIN that has been designed to advise you regarding an appropriate
choice of infectious disease therapy. It is understood that you have a patient who may have an infection. Please
answer the following questions, terminating each response with RETURN. To correct typing errors, use the
DELETE key to delete single characters, <ctrl>W to delete a word, and <ctrl>Q to delete the whole line.
If you are not certain of your answer, you may modify the response by inserting a certainty factor (a number
from 1 to 10) in parentheses after your response. Absolute certainty (10) is assumed for every unmodified answer.
It is likely that some of the following questions cannot be answered with certainty.
You may change an answer to a previous question in two ways. If the program is waiting for a response from
you (that is, has typed **), enter CHANGE followed by the number(s) of the question(s) whose answers will be
altered. You may also change a previous answer at any time (even when the program is not waiting for a
response from you) by typing <ctrl>F (Fix), which will cause the program to interrupt its computation and ask
what you want to change. (If the response to <ctrl>F is not immediate, try typing the RETURN key in addition.)
Try to avoid going back because the process requires reconsidering the patient from the beginning and therefore
may be slow.
Note that you may also enter UNK (for UNKnown) if you do not know the answer to a question, ? if you wish to
see a more precise definition of the question or some examples of recognized responses, ?? if you want to see all
recognized responses, the word RULE if you would like to see the decision rule which has generated the question
being asked, the word WHY if you would like to see a more detailed explanation of the question, or the letters QA
if you would like to interrupt the consultation in order to ask questions regarding the decisions made so far in the
consultation. If you are ever puzzled about what options are available to you during a consultation, enter the word
HELP and a list of options will be listed for you.

Sample Response [user input follows the **]

Does the patient have a risk factor for tuberculosis?
** ?


One or more of the following are considered risk factors for TB: a) positive PPD (5TU), b) history of close contact
with a person having active TB, c) household member with a past history of active TB, d) chest X-ray showing
apical scarring, e) granulomas seen on biopsy of any organ tissue.
Expected responses are: YES NO
Enter HELP for user options.
** YES

SUMMARY: (type ctrl-O to abort printout)
UNK - Answer not known.
? - Rephrases the question and gives examples of recognized responses.
?? - Prints a list of all recognized responses.
RULE - Prints the current decision rule.
QA - Program enters question-answering mode.
CHANGE # - Go back and re-request answer to question number #.
COMMENT - Enter comments about any aspect of the system's performance. Your comments will be
forwarded to those in charge of the MYCIN program.
WHY - Gives high-level explanation of the current reasoning chain that provoked this question.
HOW # - Explains HOW the system will achieve a goal referred to by number # in a previous
explanation.
EXPLAIN - Provides a more detailed explanation of a previous answer given by a WHY command.
FORGET - Resets the explanation of the reasoning chain back to lowest level, as if you never used the
WHY/EXPLAIN commands.
STOP - Halts the program without completing the consultation.
SAVE - Halts the program, saving the current patient on a disk file, retrievable at a later date.
HELP - Prints this list.

A.2 Infectious Disease Consultation Session

[This is a typescript of a MYCIN consultation on an actual patient treated at Stanford. MYCIN was not used
to suggest treatment. Spaces have been added for readability, and comments (in italics) have been
inserted. Otherwise it is a verbatim transcript. Users' responses follow double asterisks.]
--------PATIENT-538--------
1) Patient's name:
** PT538 [Name, age, sex, and race are always asked because they are INITIAL DATA.]
2) Age:
** 34 YEARS
3) Sex:
** MAEL
=MALE [MYCIN corrects misspellings.]
4) Race:
** WHY [The user asks for an explanation. In this instance it is trivial.]
[i.e. WHY is it important to determine the race of PATIENT-538?]
[1.0] The race of a patient is one of the 5 standard parameters that identify a patient.
This may be useful later in the consultation. For example, it is
relevant in determining the organisms (other than those seen on cultures
or smears) which might be causing the infection.
[back to question 4...]
** CAUCASIAN [This is the answer to the question about race.]

Please supply one answer under each heading. If your response is not
understood, you will be prompted by the standard questioning method.
(Enter ? for normal prompting, * for questions that are not applicable.)
Please enter the results of positive cultures using the following table.
site | culture number | date collected | date positive | method of collection [urine, sputum only]
5)** DONE
[The next question asks for several items in tabular form since they are found together in the hospital record.]
Please enter the results of pending cultures in the following table.
site | culture number | date collected | stain examined | method of collection [urine, sputum only]
6)** CSF 1234 30-Jan-77 YES
[MYCIN follows up on the stain of the smear and the morphology.]
stain of smear | morph
[type DONE if no organism was seen on the smear]
7)** DONE
[No information about stain or morphology is available. MYCIN now
follows up question 6, asking about other positive cultures.]
site | cul # | date coll. | smear examined? | coll. method
8)** DONE [There are no more positive cultures.]

Please enter negative cultures in the following table.
site | culture number | date collected | method of collection [urine, sputum only]
9)** DONE [There are no negative cultures.]

10) Do you suspect Pt538 may have an infection at a site from which you
have not obtained culture specimens?
** NO
Please enter information on current antimicrobials.
drug name | route of administration | date started
11)** CEPHALOTHIN IV 23-Jan-77
12)** DONE
Please enter information on prior antimicrobials.
drug name | route of administration | date started | date stopped
13)** DONE
[Considering organisms (other than those isolated from cultures) which might be present...]
14) Has Pt538 recently had symptoms of persistent headache or other
abnormal neurologic symptoms (dizziness, lethargy, etc.)?
** YES
15) Has Pt538 recently had objective evidence of abnormal neurologic signs
(nuchal rigidity, coma, seizures, etc.) documented by physician
observation or examination?
** YES
The CSF cultures will be considered to be associated with MENINGITIS.

16) Please give the date on which clinical evidence (symptoms, signs, or
laboratory tests) of the meningitis first appeared.
** 29-Jan-77 22:15
17) Does Pt538 have an abnormal chest X-ray?
** NO
18) Does Pt538 have one or more of the following risk factors for TB: a)
positive intermediate strength PPD, b) history of close contact with a
person with active TB, c) a household member with a past history of
active TB, d) apical scarring on chest X-ray, e) history of granulomas
on biopsy of liver, lymph node, or other organs?
** NO
19) Does Pt538 have any cutaneous lesions or rash on physical examination?
** NO
20) Does Pt538 have evidence of ocular nerve dysfunction?
** NO
21) Is Pt538 a compromised host (e.g. alcoholic, sickle-cell-disease,
immunosuppressed)?
** YES
22) Is Pt538 immunosuppressed (e.g. corticosteroid therapy, cytotoxic drug
therapy, radiation therapy, leukemia, lymphoma)?
** NO
23) For how many days has Pt538 had abnormal neurologic signs?
** 7.2 HOURS [Note that the answer was given in hours.
MYCIN has to be flexible enough to convert units.]
24) For how many days has Pt538 had neurologic symptoms?
** 1 DAYS
25) Has Pt538 had recent evidence of tender or enlarged salivary glands?
** NO
26) Has Pt538 been exposed to any contagious disease recently (e.g.
meningococcal disease, mumps)?
** NO
Please enter CSF findings in the following table
CSF WBC | Percent PMN | Protein | CSF Glucose | Peripheral Glucose
27)** 100 99 280 5 UNKNOWN
[Note that some information is missing.]
28) Do you know the results of a recent CBC of Pt538?
** YES
Please enter CBC results in the following table
WBC in 1000s | Percent PMN | Percent BANDS
29)** 1.9 UNKNOWN UNKNOWN
30) Has Pt538 had any injury or insult to, or defect in the central-
nervous-system (excluding neurosurgery)?
** NO
31) Has Pt538 ever undergone any type of surgery?
** YES
32) Does Pt538 have a history of NEUROSURGERY?
** NO
33) Does Pt538 live in a crowded environment?
** NO
34) Do you suspect recent alcoholic history in Pt538?
** NO
35) Is meningitis a hospital-acquired infection?
** WHY
[This is a more interesting question, requiring presentation of the rule that caused question 35 to be
asked. Note that the first five clauses of the rule have been investigated and question 35 is motivated
by the last clause. MYCIN first rephrases the question to give the user its interpretation of the WHY.]

[i.e. WHY is it important to determine whether the infection was acquired
while the patient was hospitalized?]
[2.0] This will aid in determining the organisms (other than those seen on
cultures or smears) which might be causing the infection.
It has already been established that
[2.1] the infection which requires therapy is meningitis, and
[2.2] organisms were not seen on the stain of this pending CSF culture,
and
[2.3] the type of the infection is bacterial, and
[2.4] neurosurgery is not one of the relevant items from the history
of Pt538, and
[2.5] the age of Pt538 is greater than 10 days
Therefore, if
[2.6] the infection was acquired while the patient was hospitalized
then
there is evidence that the organisms (other than those seen on
cultures or smears) which might be causing the infection is e.coli
(.75) staphylococcus-coag-pos (.3) pseudomonas-aeruginosa
klebsiella-pneumoniae (.5)
Author: Yu
[RULE545]
[back to question 35...]
[MYCIN now requests the answer to question 35 again.
Instead of answering, the user asks how clause 2.3 was confirmed.]
** HOW 2.3
[i.e. HOW was it established that the type of the infection is bacterial?]
[Rather than printing the contents of the rules, MYCIN shows the rule
numbers and the strength of positive and negative evidence from each rule.]
The following rules concluded about:
the likelihood that the type of the infection is bacterial
      Rule      Yes    No
[2.7] RULE148   (.19)
[2.8] RULE500   (.51)
[2.9] RULE501   (.95)
[3.0] RULE502   (.97)
[3.1] RULE526   (.98)
[3.2] RULE504   (.97)
[3.3] RULE524   (.98)
Thus, it has been established that the type of the infection is bacterial.
[back to question 35...]
** HOW 2.5
[i.e. HOW was it established that the age of Pt538 is greater than 10 days?]
In answer to question 2 you said that the age of Pt538 is 34 years.
Thus, it has been established that the age of Pt538 is greater than 10 days.
[back to question 35...]
** Y
36) Is Pt538's clinical history consistent with EPIGLOTTITIS?
** NO
37) Is Pt538's clinical history consistent with OTITIS-MEDIA?
** NO
38) Has Pt538 ever undergone splenectomy?
** NO
39) Is Pt538 a burn patient?
** WHY [Another explanation is requested.]

[i.e. WHY is it important to determine whether PATIENT-538 has been seriously burned?]
[4.0] This will aid in determining the organisms (other than those seen on
cultures or smears) which might be causing the infection.
It has already been established that
[4.1] the infection which requires therapy is meningitis, and
[4.2] organisms were not seen on the stain of this pending CSF culture,
and
[4.3] the type of the infection is bacterial
Therefore, if
[4.4] Pt538 has been seriously burned
then
there is suggestive evidence (.5) that pseudomonas-aeruginosa is one
of the organisms (other than those seen on cultures or smears)
which might be causing the infection
Author: Yu
[RULE578]
[back to question 39...]
** WHY [In this case, the user asks another WHY question, backing up the subgoal tree.]
[i.e. WHY is it important to determine the organisms (other than those seen
on cultures or smears) which might be causing the infection?]
[5.0] ... in order to determine the therapeutic regimen of PATIENT-538.
If
[5.1] information has been gathered about organisms isolated from the
patient, organisms noted on smears of the patient, negative
cultures of the patient, suspected infections without
microbiological evidence, current drugs of the patient and prior drugs of the patient, and
[5.2] an attempt has been made to deduce the organisms which require
therapy, and
[5.3] you have given consideration to organisms (other than those
seen in cultures or smears) which might be present, and found
at least one organism which requires therapy,
then
determine the best therapy recommendation from among the drugs likely
to be effective against the significant organisms
[RULE092]
[back to question 39...]
** Y

[MYCIN now prints its intermediate conclusions about the likely causes of the infection.]
I have found evidence (RULE545 RULE557) that treatment should cover for E.coli
causing meningitis.
I have found evidence (RULE545 RULE557 RULE578) that treatment should cover
for Pseudomonas-aeruginosa causing meningitis.
I have found evidence (RULE545 RULE557) that treatment should cover for
Klebsiella-pneumoniae causing meningitis.
I have found evidence (RULE507) that treatment should cover for Diplococcus-
pneumoniae causing meningitis.
[Now MYCIN begins to ask questions related to appropriate therapy.]
40) Does Pt538 have focal neurological signs?
** UNKNOWN
[Some questions are motivated by the organism under consideration. Thus
the following lines sometimes provide useful contextual information to a user.]
[Determining which drugs are desirable for use against the E.coli...]
[Determining which drugs are desirable for use against the Pseudomonas-
aeruginosa...]

[Determining which drugs are desirable for use against the Klebsiella-
pneumoniae...]
[Determining which drugs are desirable for use against the Diplococcus-
pneumoniae...]
My therapy recommendations will be designed to treat for organisms that either are very likely or, although less
likely, would have a significant effect on therapy selection if they were present. It is important to cover for the
following probable infection(s) and associated organism(s):
INFECTION-1 is MENINGITIS
+ <ITEM-1> E.COLI [from clinical evidence only]
+ <ITEM-2> PSEUDOMONAS-AERUGINOSA [from clinical evidence only]
+ <ITEM-3> KLEBSIELLA-PNEUMONIAE [from clinical evidence only]
+ <ITEM-4> DIPLOCOCCUS-PNEUMONIAE [from clinical evidence only]
41) Does Pt538 have a clinically significant allergic reaction to any
antimicrobial agent?
** NO
[Considering AMPICILLIN-AND-GENTAMICIN for use against INFECTION-1...]
42) Do you have reason to suspect that Pt538 may have impaired renal
function?
** NO
43) Pt538's weight in kilograms (or <number> POUNDS):
** 70
[Finally, MYCIN prints the best therapy it has found to maximize the chances of effective coverage
of the cause(s) of the infection while minimizing the number of drugs, their toxicity, and side effects.]
[REC-1] My preferred therapy recommendation is as follows:
In order to cover for Items <1 2 3 4>:
Give the following in combination:
1) AMPICILLIN
Dose: 3.5g (28.0 ml) q4h IV [calculated on basis of 50 mg/kg]
2) GENTAMICIN
Dose: 119 mg (3.0 ml, 80mg/2ml ampule) q8h IV [calculated on basis of
1.7 mg/kg] plus consider giving 5 mg q24h Intrathecal
Comments: Monitor serum concentrations
Since high concentrations of penicillins can inactivate
aminoglycosides, do not mix these two antibiotics in the same IV
bottle.

Do you wish to see the next choice therapy?
** YES [Other therapies can be examined.]

Which, if any, of the above drugs do you wish eliminated from consideration?
** AMPICILLIN
[Considering GENTAMICIN for use against INFECTION-1...]
[Considering PENICILLIN for use against INFECTION-1...]
[REC-2] Next best therapy recommendation:
In order to cover for Items <1 2 3>:
Give: GENTAMICIN
Dose: 119 mg (3.0 ml, 80mg/2ml ampule) q8h IV [calculated on basis of
1.7 mg/kg] plus consider giving 5 mg q24h Intrathecal
Comments: Monitor serum concentrations
In order to cover for Item <4>:
Give: PENICILLIN
Dose: 3.5 million units (7.0 ml) q4h IV [calculated on basis of 50000
units/kg]

RECOMMENDATION-2 is equivalent to RECOMMENDATION-1.
Would you like to see details of this comparison?
** YES
Looking at the coverage of the most likely items, we see that:
RECOMMENDATION-1 consists of 1 first choice drug (AMPICILLIN for ITEM-1
and ITEM-4) and 1 third choice drug (GENTAMICIN for ITEM-1, ITEM-2 and
ITEM-3)
RECOMMENDATION-2 consists of 1 first choice drug (PENICILLIN for ITEM-4)
and 1 third choice drug (GENTAMICIN for ITEM-2, ITEM-1 and ITEM-3)
ORGANISMS     RECOMMENDATION-1                    RECOMMENDATION-2
most likely   Drug -- Choice Ranking              Drug -- Choice Ranking
ITEM-1        AMPICILLIN-AND-GENTAMICIN -- 1st    GENTAMICIN -- 3rd
ITEM-2        GENTAMICIN -- 2nd                   GENTAMICIN -- 2nd
ITEM-3        GENTAMICIN -- 3rd                   GENTAMICIN -- 3rd
ITEM-4        AMPICILLIN -- 1st                   PENICILLIN -- 1st
(The desirability of a drug is defined to be its lowest ranking for the items
it covers.)
Do you wish to see the next choice therapy?
** NO

A.3 Answers to User's Questions About This Consultation and About the Knowledge Base

Please ask any simple-English questions regarding the above consultation session or the overall
knowledge of the consultation system. When discussing the consultation above, please refer to organisms,
cultures, etc. by their numeric names (e.g., ORGANISM-1). Questions which were asked during the consultation
may be specified by the numbers that preceded them above, (e.g., WHY DID YOU ASK QUESTION 1?).
Control-O will abort type-out, or cancel the current question. Type HELP for a list of the options available. Enter
the word STOP when you have no further questions.
Type HELP for the list of question options.
** HELP
QA-OPTIONS: (Type ctrl-O to abort printout)
ADVICE -- Prints suggestions for using the QA system effectively.
QT -- Prints examples of question types recognized by the system.
EQ <numbers> -- Prints an explanation of the specified question(s) from the
consultation, e.g., EQ 8 13.
LP <object type> -- Prints list of attributes associated with object type, e.g., LP
CULTURE. (Object types are patient, infection, culture, organism,
and drug.)
PR <numbers> -- Prints specified rule(s), e.g., PR 3734987
EXPLTHER -- Prints a short description of the therapy program.
REC -- Prompts user for alternative therapy RECommendation and
compares it to MYCIN's recommendation.
RX <infection> <organism identities> -- Prints treatment lists for specified organisms causing the indicated
infection, e.g., RX MENINGITIS ENTEROCOCCUS.
JUSTIFICATION -- When rules are printed out, include their justifications.
LITERATURE -- When rules are printed out, include their literature references.
NOJUST -- Stop giving justification and/or literature with rules.
TERSE -- Print rules in terse format.
VERBOSE -- Turns off terse printing of rules.
STOP -- Escape from QA-module.
HELP -- Prints this list.

** JUSTIFICATION
** LITERATURE
** WHY DID YOU TREAT E.COLI?
ITEM-1 is E.coli associated with meningitis. I decided to treat ITEM-1 on
the basis of clinical evidence only.
The following rules concluded about:
    the likelihood that e.coli is one of the organisms (other than those
    seen on cultures or smears) which might be causing the infection
RULE545 was used to conclude that e.coli is one of the organisms (other than
those seen on cultures or smears) which might be causing the infection
(.74). The last question asked before the conclusion was made was 35.
RULE557 was used to conclude that e.coli is one of the organisms (other than
those seen on cultures or smears) which might be causing the infection
(.93). The last question asked before the conclusion was made was 37.

[A summary of why E. coli was treated shows the parameters associated with
E. coli as a likely cause of the infection.]
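RULE545 and RULE557 above lend independent positive evidence (.74 and .93) to the same conclusion. MYCIN merges such certainty factors incrementally, each new piece of positive evidence closing part of the remaining uncertainty; a minimal Python sketch of the standard combining function for two positive CFs (MYCIN itself is in Interlisp):

```python
# MYCIN's combining function for two positive certainty factors:
# the second piece of evidence closes a proportional fraction of the
# uncertainty left by the first:  CF = cf1 + cf2 * (1 - cf1).
def combine_positive_cfs(cf1, cf2):
    assert 0.0 <= cf1 <= 1.0 and 0.0 <= cf2 <= 1.0
    return cf1 + cf2 * (1.0 - cf1)

# The two rules above: .74 from RULE545, then .93 from RULE557.
print(round(combine_positive_cfs(0.74, 0.93), 4))   # 0.9818
```

So two independently suggestive rules yield a net certainty well above either alone, without ever exceeding 1.0.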

The parameters most frequently used in these rules were:
    the infection which requires therapy
    whether a smear of the culture was examined
    whether organisms were seen on the stain of the culture
    the type of the infection
Would you like a summary of ALL the parameters used by these rules?
** YES
Parameter (rules with this parameter)
    the white count from the patient's peripheral CBC (in thousands) (557)
    whether the results of a recent CBC of the patient are available (557)
    whether the infection was acquired while the patient was hospitalized (545)
    the age of the patient (545)
    the relevant items from the history of the patient (545)
    the type of the infection (545, 557)
    whether organisms were seen on the stain of the culture (545, 557)
    whether a smear of the culture was examined (545, 557)
    the infection which requires therapy (545, 557)

[The next question is about MYCIN's knowledge in general, not about the
conclusions drawn for the case just discussed.]

** HOW DO YOU USE THE TYPE OF THE INFECTION?
The following rules use:
    the type of the infection
154, 578, 574, 572, 570, 565, 564, 559, 557, 553, 551, 549, 547, 545, 543, 541,
539, 537, 535, 533, 531, 529, 523, 520, 519, 517, 515, 513, 511, 507, 300,
289, 288, 287, 280, 174, 167
The parameter most frequently concluded by these rules is:
    the organisms (other than those seen on cultures or smears) which might
    be causing the infection
Would you like a summary of ALL the parameters concluded by these rules?
** NO
Which of the rules do you wish to see?
** 280

RULE280

[This rule applies to all cultures, and is tried in order to find out about
the organisms (other than those seen on cultures or smears) which might
be causing the infection]
If:   1) The infection which requires therapy is meningitis, and
      2) The type of the infection is fungal, and
      3) Organisms were not seen on the stain of the culture, and
      4) The patient is not a compromised host, and
      5) The patient has been to an area that is endemic for coccidiomycoses, and
      6) The race of the patient is one of: black asian indian, and
      7) The cryptococcal antigen in the CSF was not positive
Then: There is suggestive evidence (.5) that cryptococcus is not one of the
      organisms (other than those seen on cultures or smears) which
      might be causing the infection
Author: YU
Justification: Dark-skinned races, especially Filipino, Asian, and Black (in
      that order) have an increased susceptibility to coccidiomycoses
      meningitis.
Literature: Stevens, D. A., et al. Miconazole in Coccidiomycosis. Am J Med,
      60:191-202, Feb 1976.
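RULE280 has the standard shape of a MYCIN rule: a conjunction of premise clauses and a conclusion weighted by a certainty factor. A schematic Python rendering of how such a rule is evaluated is given below; MYCIN itself is written in Interlisp, and the parameter names and facts shown are illustrative only. Each premise must hold with certainty above MYCIN's 0.2 threshold, and the conclusion's CF is the rule's CF scaled by the weakest premise.

```python
# Schematic MYCIN-style rule evaluation (illustrative, not the original
# Lisp): every premise clause must be satisfied with CF > 0.2, and the
# conclusion's CF is the rule's CF times the weakest premise CF.
def evaluate_rule(premises, rule_cf, facts):
    """premises: list of (parameter, expected value);
    facts: parameter -> (value, cf)."""
    tally = 1.0
    for param, expected in premises:
        value, cf = facts.get(param, (None, 0.0))
        if value != expected or cf <= 0.2:
            return 0.0          # a premise fails; the rule does not fire
        tally = min(tally, cf)
    return rule_cf * tally

# Hypothetical working memory for a fungal meningitis case:
facts = {
    "infection": ("meningitis", 1.0),
    "infection-type": ("fungal", 0.8),
}
premises = [("infection", "meningitis"), ("infection-type", "fungal")]
print(evaluate_rule(premises, 0.5, facts))   # 0.5 * 0.8 = 0.4
```

A real rule such as RULE280 simply has more premise clauses (seven here), but the same fire-or-fail conjunction and the same scaling of its .5 conclusion.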

A.4 Therapy Recommendation Consultation Session

[If the infection and organism identities are known, MYCIN can still be
helpful in recommending appropriate therapy.]
MYCIN 3-Jun-78 ...
Special options (type ? for help):
** REC
Instructions? (Y or N)
** YES
This program recommends therapy for a patient with infectious diseases. You will be asked a few facts about
the patient which will be used to determine the dosages and appropriateness of giving certain antibiotics. Then
you will be prompted for the infection(s) that you want treated and the organism(s) which you feel may be
responsible for each infection.
MYCIN cannot make a recommendation without knowing the infection(s) and organism(s) to be treated. Other
information is requested for the purpose of refining therapy selection, but is not essential; you may answer
UNKNOWN to any of these questions.

Case 2, AA, 47-24-31, VMC. A 3 year old female with one day history of headache,
lethargy, fever to 104. L.P.: Protein 25, glucose 95, 110 cells with 96% polys. PE: Meningismus,
petechiae on arms and shoulders.

28-NOV-77 11:05:51
[consultation of 23-JUN-76 12:00]
-------- PATIENT-683 --------
1) Patient's name: (first-last)
** PT683
2) Age:
** 3.5
3) Sex:
** FEMALE
4) Is Pt683 currently receiving therapy with any antimicrobial agent?
** NO

5) During the present illness, has Pt683 been treated with an
   antimicrobial agent which is no longer being administered?
** NO
Infection: MENINGITIS
Organism: VIRUS
[Determining which drugs are desirable for use against the Virus...]
Are there any other organisms which might be causing this meningitis?
** YES
Organism: HEMOPHILUS-INFLUENZAE
6) Has the lab reported the in-vitro antibiotic sensitivities of the
   hemophilus-influenzae causing the meningitis (ITEM-2)?
** N
[Determining which drugs are desirable for use against the Hemophilus-
influenzae...]
7) Does Pt683 have a clinically significant allergic reaction to any
   antimicrobial agent?
** NO
Are there any other organisms which might be causing this meningitis?
** YES
Organism: NEISSERIA-MENINGITIDIS
8) Has the lab reported the in-vitro antibiotic sensitivities of the
   neisseria-meningitidis causing the meningitis (ITEM-3)?
** N
[Determining which drugs are desirable for use against the Neisseria-
meningitidis...]
Are there any other organisms which might be causing this meningitis?
** NO
Are there additional infections for which you want therapy recommendations?
** NO
INFECTION-1 is MENINGITIS
+ ...        VIRUS    Supportive therapy.
+ <ITEM-2>   HEMOPHILUS-INFLUENZAE
+ <ITEM-3>   NEISSERIA-MENINGITIDIS
[Considering AMPICILLIN-AND-CHLORAMPHENICOL for use against INFECTION-1...]
9) What is the most recent creatinine clearance in ml/min which you feel
   is a true representation of Pt683's renal function?
** UNKNOWN
10) What is the most recent serum creatinine of Pt683 (in mg/100ml)?
** UNKNOWN
11) Pt683's weight in kilograms (or <number> POUNDS):
** 17.7
[REC-1] My preferred therapy recommendation is as follows:
In order to cover for Items <2 3>:
    Give the following in combination:
    1) AMPICILLIN
       Dose: 886 mg (7.0 ml) q4h IV [calculated on basis of 50 mg/kg]
       [The dose should be modified in SEVERE renal failure. I can adjust
        the dose only if a creatinine clearance or a serum creatinine
        is obtained, and the patient's weight is known.]
    2) CHLORAMPHENICOL
       Dose: 443 mg (4.4 ml) q6h IV [calculated on basis of 25 mg/kg]
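Both doses above follow from the bracketed weight-based guidelines and the 17.7 kg weight given in question 11; the arithmetic is sketched below in Python. MYCIN additionally rounds to convenient administration volumes (note the 7.0 ml and 4.4 ml figures), so its printed milligram amounts can differ slightly from the raw product.

```python
# Nominal weight-based dose: guideline (mg per kg) times body weight.
def weight_based_dose(weight_kg, mg_per_kg):
    return weight_kg * mg_per_kg

print(weight_based_dose(17.7, 50))   # AMPICILLIN: 885.0 mg (printed above as 886 mg)
print(weight_based_dose(17.7, 25))   # CHLORAMPHENICOL: 442.5 mg (printed above as 443 mg)
```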

Please ask any simple-English questions regarding the above consultation session or the overall knowledge of
the consultation system. When discussing the consultation above, please refer to organisms, cultures, etc. by their
numeric names (e.g., ORGANISM-1). Questions which were asked during the consultation may be specified by the
numbers that preceded them above (e.g., WHY DID YOU ASK QUESTION 1?). Control-O will abort type-out, or
cancel the current question. Type HELP for a list of the options available. Enter the word STOP when you have no
further questions.

Type HELP for the list of question options.
** JUSTIFICATION
** LITERATURE

[MYCIN can compare its recommendation against one that the user would
like to consider.]
** REC
For each item in turn, enter the drug you would have prescribed in
RECOMMENDATION-1. In order to minimize the number of drugs in your
recommendation, you may not want to prescribe therapy for every item.
Items which represent the most likely organisms are indicated with a
plus sign (+).
+ ITEM-1--the virus **
+ ITEM-2--the hemophilus-influenzae ** CARBENICILLIN
+ ITEM-3--the neisseria-meningitidis ** CARBENICILLIN
[Checking for contraindications...]
[Considering CARBENICILLIN for use against INFECTION-1...]
[No contraindications found...]
[Now comparing your prescription to MYCIN's...]

ORGANISMS        MYCIN's regimen                          Your regimen
"most likely"    Drug -- Choice Ranking                   Drug -- Choice Ranking
ITEM-2           AMPICILLIN-AND-CHLORAMPHENICOL--1st      CARBENICILLIN--3rd
ITEM-3           AMPICILLIN--1st                          CARBENICILLIN--2nd

(The desirability of a drug is defined to be its lowest ranking for the items
it covers.)
[Giving one drug is often preferable to giving two. In this case, however,
the one would not be as effective.]
Your recommendation for the most likely organism(s) is not optimal, since
your prescription of 1 third choice drug (CARBENICILLIN for ITEM-3 and ITEM-2)
should be avoided.
[You may refer to your regimen as RECOMMENDATION-2 in later questions.]
** STOP
References

Abelson, R. 1973. The structure of belief systems. In Computer Models of


Thought and Language, eds. R. C. Schank and K. M. Colby, pp. 287-
339. San Francisco: Freeman.
Adams, J. B. 1976. A probability model of medical reasoning and the
MYCINmodel. Mathematical Biosciences 32: 177-186. (Appears as
Chapter 12 in this volume.)
Aiello, N. 1983. A comparative study of control strategies for expert sys-
tems: AGE implementation of three variations of PUFF. In Proceedings
of the Third National Conference on Artificial Intelligence, pp. 1-4.
Aiello, N., and Nii, H. P. 1981. AGE-PUFF: A simple event-driven pro-
gram. Report no. HPP-81-25, Computer Science Department, Stan-
ford University.
Aikins, J. S. 1979. Prototypes and production rules: An approach to knowl-
edge representation for hypothesis formation. In Proceedings of the 6th
International Joint Conference on Artificial Intelligence (Tokyo), pp. 1-3.
(Appears with revisions as Chapter 23 of this volume.)
--. 1980. Prototypes and production rules: A knowledge representa-
tion for computer consultations. Ph.D. dissertation, Stanford Univer-
sity. (Also Stanford Report no. STAN-CS-80-814.)
--. 1983. Prototypical knowledge for expert systems. Artificial Intelli-
gence 20(2): 163-210.
Aikins, J. S., Kunz, J. C., Shortliffe, E. H., and Fallat, R.J. 1983. PUFF:
An expert system for interpretation of pulmonary function data. Com-
puters and Biomedical Research 16: 199-208.
Allen, J. 1978. Anatomy of LISP. New York: McGraw-Hill.
Anderson, J. 1976. Language, Memory and Thought. Hillsdale, NJ: Erlbaum.
Anthony, J. R. 1970. Effect on deciduous and permanent teeth of tetra-
cycline deposition in utero. Postgraduate Medicine 48(4): 165-168.
Barker, S. F. 1957. Induction and Hypothesis: a Study of the Logic of Confir-
mation. Ithaca, NY: Cornell University Press.
Barnett, J. A. 1981. Computational methods for a mathematical theory of
evidence. In Proceedings of the 7th International Joint Conference on Arti-
ficial Intelligence (Vancouver, B.C.), pp. 868-875.
Barnett, J. A., and Erman, L. 1982. Making control decisions in an expert
system is a problem-solving task. Technical report, Information Sci-
ences Institute, University of Southern California.
Barr, A., and Feigenbaum, E. A. (eds.). 1981, 1982. The Handbook of Arti-
ficial Intelligence (vols. 1, 2). Los Altos, CA: Kaufmann.


Barr, A., Beard, M., and Atkinson, R. C. 1976. The computer as a tutorial
laboratory: The Stanford BIP project. International Journal of Man-
Machine Studies 8: 567-596.
Bartlett, F. C. 1932. Remembering: A Study in Experimental and Social Psy-
chology. Cambridge, U.K.: Cambridge University Press.
Bennett, J. S. 1983. ROGET:A knowledge-based consultant for acquiring
the conceptual structure of an expert system. Report no. HPP-83-24,
Computer Science Department, Stanford University.
Bennett, J. S., and Goldman, D. 1980. CLOT:A knowledge-based consul-
tant for bleeding disorders. Report no. HPP-80-7, Computer Science
Department, Stanford University.
Bennett, J. S., and Hollander, C. R. 1981. DART:An expert system for
computer fault diagnosis. In Proceedings of the 7th International Joint
Conference on Artificial Intelligence (Vancouver, B.C.), pp. 843-845.
Bennett, J. S., Creary, L., Engelmore, R., and Melosh, R. 1978. SACON:
A knowledge-based consultant for structural analysis. Report no. HPP-
78-23, Computer Science Department, Stanford University.
Bischoff, M., Shortliffe, E. H., Scott, A. C., Carlson, R. W., and Jacobs, D.
1983. Integration of a computer-based consultant into the clinical set-
ting. In Proceedings of the 7th Symposiumon ComputerApplications in Med-
ical Care (Baltimore, MD), pp. 149-152.
Blum, B. I., Lenhard, R., and McColligan, E. 1980. Protocol directed pa-
tient care using a computer. In Proceedings of 4th Symposium on Computer
Applications in Medical Care (Washington, D.C.), pp. 753-761.
Blum, R. L. 1982. Discovery and representation of causal relationships
from a large time-oriented clinical database: The RX project. Ph.D.
dissertation, Stanford University. (Also in Computers and Biomedical Re-
search 15: 164-187.)
Bobrow, D. G. 1968. Natural language input for a computer problem-
solving system. In Semantic Information Processing, ed. M. Minsky, pp.
146-226. Cambridge, MA: MIT Press.
Bobrow, D. G., and Winograd, T. 1977. An overview of KRL, a knowledge
representation language. Cognitive Science 1: 3-46.
Bobrow, D. G., Kaplan, R. M., Kay, M., Norman, D., Thompson, H., and
Winograd, T. 1977. GUS:A frame-driven dialog system. Artificial In-
telligence 8: 155-173.
Bobrow, R. J., and Brown, J. S. 1975. Systematic understanding: Synthesis,
analysis and contingent knowledge in specialized understanding sys-
tems. In Representation and Understanding: Studies in Cognitive Science,
eds. D. G. Bobrow and A. Collins, pp. 103-129. New York: Academic
Press.
Bonnet, A. 1981. LITHO:An expert system for lithographic analysis. In-
ternal working paper, Schlumberger Corp., Paris, France.
Boyd, E. 1935. The Growth of the Surface Area of the HumanBody. Minne-
apolis: University of Minnesota Press.

Brachman, R. J. 1976. What's in a concept: Structural foundations for
semantic networks. Report no. 3433, Bolt Beranek and Newman.
Bransford, J., and Franks, J. 1971. The abstraction of linguistic ideas.
Cognitive Psychology 2:331-350.
Brown, J. S., and Burton, R. R. 1978. Diagnostic models for procedural
bugs in mathematical skills. Cognitive Science 2: 155-192.
Brown, J. S., and Goldstein, I. P. 1977. Computers in a learning society.
Testimony for the House Science and Technology Subcommittee on
Domestic and International Planning, Analysis and Cooperation, Oc-
tober, 1977.
Brown, J. S., and VanLehn, K. 1980. Repair theory: A generative theory
of bugs in procedural skills. Cognitive Science 4(4): 379-426.
Brown, J. S., Burton, R. R., and Zydbel, F. 1973. A model-driven question-
answering system for mixed-initiative computer-assisted instruction.
IEEE Transactions on Systems, Man, and Cybernetics SMC-3(3): 248-257.
Brown, J. S., Burton, R. R., and Bell, A. G. 1974. SOPHIE:A sophisticated
instructional environment for teaching electronic troubleshooting (an
example of AI in CAI). Report no. 2790, Bolt Beranek and Newman.
Brown, J. S., Burton, R., Miller, M., de Kleer, J., Purcell, S., Hausmann,
C., and Bobrow, R. 1975. Steps toward a theoretic foundation for
complex knowledge-based CAI. Report no. 3135, Bolt Beranek and
Newman.
Brown, J. S., Rubenstein, R., and Burton, R. 1976. Reactive learning en-
vironment for computer-aided electronics instruction. Report no.
3314, Bolt Beranek and Newman.
Brown, J. S., Burton, R. R., and de Kleer, J. 1982. Pedagogical, natural
language, and knowledge engineering techniques in SOPHIE I, II,
and III. In Intelligent Tutoring Systems, eds. D. Sleeman and J. S. Brown,
pp. 227-282. London: Academic Press.
Bruce, B. C. 1975. Generation as a social action. In Proceedings of the Con-
ference on Theoretical Issues in Natural Language Processing, pp. 74-77.
Buchanan, B. G., and Duda, R. O. 1983. Principles of rule-based expert
systems. In Advances in Computers (vol. 22), ed. M. C. Yovits, pp. 164-
216. New York: Academic Press.
Buchanan, B. G., and Feigenbaum, E. A. 1978. DENDRAL and Meta-
DENDRAL: Their applications dimension. Artificial Intelligence 11: 5-
24.
Buchanan, B. G., and Mitchell, T. 1978. Model-directed learning of pro-
duction rules. In Pattern-Directed Inference Systems, eds. D. Waterman
and F. Hayes-Roth, pp. 297-312. New York: Academic Press.
Buchanan, B. G., Sutherland, G., and Feigenbaum, E. A. 1970. Rediscov-
ering some problems of artificial intelligence in the context of organic
chemistry. In Machine Intelligence 5, eds. B. Meltzer and D. Michie, pp.
209-254. Edinburgh, U.K.: Edinburgh University Press.

Buchanan, B. G., Mitchell, T. M., Smith, R. G., and Johnson, C. R., Jr.
1978. Models of learning systems. In Encyclopedia of Computer Science
and Technology 11, ed. J. Belzer, pp. 24-51. New York: Marcel Dekker.
Bullwinkle, C. 1977. Levels of complexity in discourse for anaphora dis-
ambiguation and speech act interpretation. In Proceedings of the 5th
International Joint Conferenceon Artificial Intelligence (Cambridge, MA),
pp. 43-49.
Burton, R. R. 1976. Semantic grammar: An engineering technique for
constructing natural language understanding systems. Report no.
3453, Bolt Beranek and Newman.
--. 1979. An investigation of computer coaching for informal learning
activities. International Journal of Man-Machine Studies 11: 5-24.
Burton, R. R., and Brown, J. S. 1982. An investigation of computer coach-
ing for informal learning activities. In Intelligent Tutoring Systems, eds.
D. Sleeman and J. S. Brown, pp. 79-98. New York: Academic Press.
Carbonell, J. R. 1970a. AI in CAI: An artificial-intelligence approach to
computer-assisted instruction. IEEE Transactions on Man-Machine Sys-
tems, MMS-11: 190-202.
--. 1970b. Mixed-initiative man-computer instructional dialogues. Re-
port no. 1971, Bolt Beranek and Newman.
Carbonell, J. R., and Collins, A. M. 1973. Natural semantics in artificial
intelligence. In Advance Papers of the 3rd International Joint Conference
on Artificial Intelligence (Stanford, CA), pp. 344-351.
Carden, T. S. 1974. The antibiotic problem (editorial). New Physician 23:
19.
Carnap, R. 1950. The two concepts of probability. In Logical Foundations
of Probability, pp. 19-51. Chicago: University of Chicago Press.
--. 1962. The aim of inductive logic. In Logic, Methodology, and Philos-
ophy of Science, eds. E. Nagel, P. Suppes, and A. Tarski, pp. 303-318.
Stanford, CA: Stanford University Press.
Carr, B., and Goldstein, I. 1977. Overlays: A theory of modeling for CAI.
Report no. 406, Artificial Intelligence Laboratory, Massachusetts In-
stitute of Technology.
Chandrasekaran, B., Gomez, F., Mittal, S., and Smith, J. 1979. An approach
to medical diagnosis based on conceptual schemes. In Proceedings of
the 6th International Joint Conferenceon Artificial Intelligence (Tokyo), pp.
134-142.
Charniak, E. 1972. Toward a model of children's story comprehension.
Report no. AI TR-266, Artificial Intelligence Laboratory, Massachu-
setts Institute of Technology.
--. 1977. A framed painting: The representation of a commonsense
knowledge fragment. Journal of Cognitive Science 1(4): 355-394.
--. 1978. With a spoon in hand this must be the eating frame. In
Proceedings of the 2nd Conference on Theoretical Issues in Natural Language
Processing, pp. 187-193.

Charniak, E., Riesbeck, C., and McDermott,D. 1980. Artificial Intelligence


Programming. Hillsdale, NJ: Erlbaum.
Chi, M. T. H., Feltovich, P. J., and Glaser, R. 1980. Representation of
physics knowledge by experts and novices. Report no. 2, Learning
Research and Development Center, University of Pittsburgh.
Ciesielski, V. 1980. A methodology for the construction of natural language
front ends for medical consultation systems. Ph.D. dissertation, Rut-
gers University. (Also Technical Report no. CBM-TR-112.)
Clancey, W. J. 1979a. Dialogue management for rule-based tutorials. In
Proceedings of the 6th International Joint Conference on Artificial Intelligence
(Tokyo), pp. 155-161.
1979b. Transfer of rule-based expertise through a tutorial dia-
logue. Ph.D. dissertation, Computer Science Department, Stanford
University. (Also Stanford Report no. STAN-CS-769.)
1979c. Tutoring rules for guiding a case method dialogue. Inter-
national Journal of Man-Machine Studies 11: 25-49. (Edited version ap-
pears as Chapter 26 of this volume.)
1981. Tutoring rules for guiding a case method dialogue. In Intel-
ligent Tutoring Systems, eds. D. H. Sleeman and J. S. Brown, pp. 201-
225. New York: Academic Press. (Same as Clancey, 1979c.)
1983a. The advantages of abstract control knowledge in expert
system design. In Proceedings of the 3rd National Conference on Artificial
Intelligence (Washington, D.C.), pp. 74-78.
1983b. The epistemology of a rule-based expert system: A frame-
work for explanation. Artificial Intelligence 20:215-251. (Appears as
Chapter 29 in this volume.)
1984. Methodology for building an intelligent tutoring system. In
Methods and Tactics in Cognitive Science, eds. W. Kintsch, J. R. Miller,
and P. G. Polson. Hillsdale, NJ: Erlbaum. Forthcoming.
Clancey, W. J., and Letsinger, R. 1981. NEOMYCIN: Reconfiguring a rule-
based expert system for application to teaching. In Proceedings of the
7th International Joint Conference on Artificial Intelligence (Vancouver,
B.C.), pp. 829-836.
Clark, K. L., and McCabe, F. G. 1982. PROLOG: A language for imple-
menting expert systems. In Machine Intelligence, eds. J. Hayes, D. Mi-
chie, and Y. Pao, pp. 455-470. NewYork: John Wiley.
Cohen, P. R., and Feigenbaum, E. A. (eds.). 1982. The Handbook of Artificial
Intelligence (vol. 3). Los Altos, CA: Kaufmann.
Cohen, S. N., Armstrong, M. E, Briggs, R. L., Chavez-Pardo, R., Feinberg,
L. S., Hannigan, J. E, Hansten, E D., Hunn, G. S., Illa, R. V., Moore,
T. N., Nishimura, T. G., Podlone, M. D., Shortliffe, E. H., Smith, L.
A., and Yosten, L. 1974. Computer-based monitoring and reporting
of drug interactions. In Proceedings of MEDINFO IFIP Conference
(Stockholm, Sweden), pp. 889-894.
Colby, K. M. 1981. Modeling a paranoid mind. Behavioral and Brain Sciences
4(4): 515-560.

Colby, K. M., Parkinson, R. C., and Faught, B. 1974. Pattern-matching
rules for the recognition of natural language dialogue expressions.
Report no. AIM-234, Stanford Artificial Intelligence Laboratory, Stan-
ford University.
Collins, A. 1976. Processes in acquiring knowledge. In Schooling and Ac-
quisition of Knowledge, eds. R. C. Anderson, R. J. Spiro, and W. E.
Montague, pp. 339-363. Hillsdale, NJ: Erlbaum.
--. 1978. Fragments of a theory of human plausible reasoning. In
Proceedings of the 2nd Conference on Theoretical Issues in Natural Language
Processing, pp. 194-201.
Conchie, J. M., Munroe, J. D., and Anderson, D. O. 1970. The incidence
of staining of permanent teeth by the tetracyclines. Canadian Medical
Association Journal 103:351-356.
Cooper, G. F. 1984. NESTOR: A medical decision support system that
integrates causal, temporal, and probabilistic knowledge. Ph.D. disser-
tation, Computer Science Department, Stanford University. Forthcom-
ing.
Cronbach, L. J. 1970. Essentials of Psychological Testing. New York: Harper
and Row.
Cullingford, R. 1977. Script application: Computer understanding of
newspaper stories. Ph.D. dissertation, Yale University.
Cumberbatch, J., and Heaps, H. S. 1973. Application of a non-Bayesian
approach to computer aided diagnosis of upper abdominal pain. In-
ternational Journal of Biomedical Computing 4:105-115.
Cumberbatch, J., Leung, V. K., and Heaps, H. S. 1974. A non-probabilistic
method for automated medical diagnosis. International Journal of
Biomedical Computing 5: 133-146.
Davis, R. 1976. Applications of meta-level knowledge to the construction,
maintenance, and use of large knowledge bases. Ph.D. dissertation,
Computer Science Department, Stanford University. (Reprinted with
revisions in Davis and Lenat, 1982.)
--. 1977a. Generalized procedure calling and content-directed invo-
cation. SIGPLAN Notices 12(8): 45-54.
--. 1977b. Interactive transfer of expertise: Acquisition of new infer-
ence rules. In Proceedings of the 5th International Joint Conference on
Artificial Intelligence (Cambridge, MA), pp. 321-328.
--. 1978. Knowledge acquisition in rule-based systems: Knowledge
about representations as a basis for system construction and mainte-
nance. In Pattern-Directed Inference Systems, eds. D. A. Waterman and
F. Hayes-Roth, pp. 99-134. New York: Academic Press.
--. 1979. Interactive transfer of expertise: Acquisition of new infer-
ence rules. Artificial Intelligence 12: 121-158. (Edited version appears
as Chapter 9 of this volume.)
--. 1980. Meta-rules: Reasoning about control. Artificial Intelligence 15:
179-222.

--. 1984. Diagnosis based on structure and function: Paths of inter-
action and the locality principle. Artificial Intelligence: forthcoming.
Davis, R., and Buchanan, B. G. 1977. Meta-level knowledge: Overview and
applications. In Proceedings of the 5th International Joint Conference on
Artificial Intelligence (Cambridge, MA), pp. 920-927. (Edited version
appears as Chapter 28 of this volume.)
Davis, R., and Lenat, D. B. 1982. Knowledge-Based Systems in Artificial Intel-
ligence. New York: McGraw-Hill.
Davis, R., Buchanan, B., and Shortliffe, E. 1977. Production rules as a
representation for a knowledge-based consultation system. Artificial
Intelligence 8(1): 15-45.
Davis, R., Shrobe, H., Hamscher, W., Wieckert, K., Shirley, M., and Polit,
S. 1982. Diagnosis based on description of structure and function. In
Proceedings of the National Conference on Artificial Intelligence (Pittsburgh,
PA), pp. 137-142.
Day, E. 1970. Automated health services: Reprogramming the doctor.
Methods of Information in Medicine 9:116-121.
de Dombal, F. T. 1973. Surgical diagnosis assisted by computer. Proceedings
of the Royal Society of London V-184: 433-440.
de Dombal, F. T., Leaper, D. J., Horrocks, J. C., Staniland, J. R., and
McCann, A. P. 1974. Human and computer aided diagnosis of abdom-
inal pain: Further report with emphasis on the performance of clini-
cians. British Medical Journal 1: 376-380.
de Dombal, F. T., Horrocks, J. C., and Staniland, J. R. 1975. The computer
as an aid to gastroenterological decision making. Scandinavian Journal
of Gastroenterology 10: 225-227.
de Finetti, B. 1972. Probability, Induction, and Statistics: The Art of Guessing.
New York: Wiley.
deJong, G. 1977. Skimming newspaper stories by computer. Report no.
104, Computer Science Department, Yale University.
de Kleer, J., Doyle, J., Steele, G., and Sussman, G. 1977. AMORD: Explicit
control of reasoning. In Proceedings of the Symposiumon Artificial Intel-
ligence and Programming Languages, pp. 116-125. Reprinted in SIG-
PLAN Notices, vol. 12, and SIGART Newsletter, no. 64.
Delfino, A. B., Buchs, A., Duffield, A. M., Djerassi, C., Buchanan, B. G.,
Feigenbaum, E. A., and Lederberg, J. 1970. Applications of artificial
intelligence for chemical inference VI. Approach to a general method
of interpreting low resolution mass spectra with a computer. Helvetica
Chimica Acta 53: 1394-1417.
Deutsch, B. G. 1974. The structure of task-oriented dialogs. In IEEE Sym-
posium for Speech Recognition, pp. 250-253.
Ditlove, J., Weidmann, E., Bernstein, M., and Massry, S. G. 1977. Methicillin
nephritis. Medicine 56: 483-491.
Duda, R. O., and Shortliffe, E. H. 1983. Expert systems research. Science
220: 261-268.

Duda, R. O., Hart, P. E., and Nilsson, N. J. 1976. Subjective Bayesian


methods for rule-based inference systems. In AFIPS Conference Pro-
ceedings of the 1976 National Computer Conference, vol. 45 (New York),
pp. 1075-1082.
Duda, R. O., Hart, P. E., Barrett, P., Gaschnig, J., Konolige, K., Reboh, R.,
and Slocum, J. 1978a. Development of the PROSPECTOR consultant
system for mineral exploration. Final report for SRI projects 5821 and
6415, Artificial Intelligence Center, SRI International.
Duda, R. O., Hart, P. E., Nilsson, N. J., and Sutherland, G. L. 1978b.
Semantic network representations in rule-based inference systems. In
Pattern-Directed Inference Systems, eds. D. A. Waterman and F. Hayes-
Roth, pp. 203-221. NewYork: Academic Press.
Edelmann, C. M., Jr., and Barnett, H. L. 1971. Pediatric nephrology. In
Diseases of the Kidney, eds. M. B. Strauss and L. G. Welt, p. 1359. Boston:
Little, Brown.
Edwards, L. D., Levin, S., and Lepper, M. H. 1972. A comprehensive
surveillance system of infections and antimicrobials used at Presbyter-
ian-St. Luke's Hospital--Chicago. American Journal of Public Health 62:
1053-1055.
Edwards, W. 1972. N = 1: Diagnosis in unique cases. In Computer Diagnosis
and Diagnostic Methods, ed. J. A. Jacquez, pp. 139-151. Springfield,
IL: Thomas.
Eisenberg, L. 1974. Don't lean on the computer. Physician's World (April).
Elstein, A. S., Shulman, L. S., and Sprafka, S. A. 1978. Medical Problem
Solving: An Analysis of Clinical Reasoning. Cambridge, MA:Harvard
University Press.
Engelmore, R. S., and Terry, A. 1979. Structure and function of the CRYS-
ALIS system. In Proceedings of the 6th International Joint Conference on
Artificial Intelligence (Tokyo), pp. 250-256.
Erman, L. D., Hayes-Roth, F., Lesser, V. R., and Reddy, D. R. 1980. The
Hearsay-II speech-understanding system: Integrating knowledge to
resolve uncertainty. Computing Surveys 12: 213-253.
Evans, A., Jr. 1964. An ALGOL 60 compiler. In Annual Review of Automatic
Programming (vol. 4), ed. R. Goodman, pp. 87-124. New York: Mac-
millan.
Fagan, L. 1980. VM:Representing time-dependent relations in a clinical
setting. Ph.D. dissertation, Computer Science Department, Stanford
University.
Fagan, L. M., Kunz, J. C., Feigenbaum, E. A., and Osborn, J. J. 1979.
Representation of dynamic clinical knowledge: Measurement inter-
pretation in the intensive care unit. In Proceedings of the 6th International
Joint Conference on Artificial Intelligence (Tokyo), pp. 260-262. (Edited
version appears as Chapter 22 of this volume.)
Falk, G. 1970. Computer interpretation of imperfect line data. Report no.
AIM-132, Artificial Intelligence Laboratory, Stanford University.

Faught, W. S. 1977. Motivation and intensionality in a computer simulation
model. Report no. AIM-305, Artificial Intelligence Laboratory, Stan-
ford University.
Feigenbaum, E. A. 1963. Simulation of verbal learning behavior. In Com-
puters and Thought, eds. E. A. Feigenbaum and J. Feldman, pp. 297-
309. New York: McGraw-Hill.
--. 1978. The art of artificial intelligence: Themes and case studies of
knowledge engineering. In AFIPS Conference Proceedings of the 1978
National Computer Conference, vol. 47 (Anaheim, CA), pp. 227-240.
Feigenbaum, E. A., Buchanan, B. G., and Lederberg, J. 1971. On gener-
ality and problem solving: A case study involving the DENDRAL pro-
gram. In Machine Intelligence 6, eds. B. Meltzer and D. Michie, pp.
165-190. NewYork: American Elsevier.
Feldman, J. A., Low, J. R., Swinehart, D. C., and Taylor, R. H. 1972. Recent
developments in SAIL: An ALGOL-based language for artificial in-
telligence. In AFIPS Conference Proceedings of the 1972 Fall Joint Com-
puter Conference, vol. 41 (Anaheim, CA), pp. 1193-1202.
Feltovich, P. J., Johnson, P. E., Moller, J. H., and Swanson, D. B. 1980. The
role and development of medical knowledge in diagnostic expertise.
Paper presented at the annual meeting of the American Educational
Research Association, 1980.
Feurzeig, W., Munter, P., Swets, J., and Breen, M. 1964. Computer-aided
teaching in medical diagnosis. Journal of Medical Education 39: 746-
755.
Fisher, L. S., Chow, A. W., Yoshikawa, T. T., and Guze, L. B. 1975. Ceph-
alothin and cephaloridine therapy for bacterial meningitis. Annals of
Internal Medicine 82: 689-693.
Floyd, R. 1961. A descriptive language for symbol manipulation. Journal
of the Association for Computing Machinery 8: 579-584.
Fox, M. 1981. Reasoning with incomplete knowledge in a resource-limited
environment: Integrating reasoning and knowledge acquisition. In
Proceedings of the 7th International Joint Conference on Artificial Intelligence
(Vancouver, B.C.), pp. 313-318.
Franke, E. K., and Ritschel, W. A. 1976. A new method for quick estimation
of the absorption rate constant for clinical purposes using a nomo-
graph. Drug Intelligence and Clinical Pharmacy 10: 77-82.
Friedman, L. 1981. Extended plausible inference. In Proceedings of the 7th
International Joint Conference on Artificial Intelligence (Vancouver, B.C.),
pp. 487-495.
Friedman, R. B., and Gustafson, D. H. 1977. Computers in clinical medi-
cine: A critical review (guest editorial). Computers and Biomedical Re-
search 10: 199-204.
Garvey, T. D., Lowrence, J. D., and Fischler, M. A. 1981. An inference
technique for integrating knowledge from disparate sources. In Pro-
ceedings of the 7th International Joint Conferenceon Artificial Intelligence
(Vancouver, B.C.), pp. 319-325.
Gaschnig, J. 1979. Preliminary performance analysis of the PROSPECTOR
consultant system for mineral exploration. In Proceedings of the 6th
International Joint Conference on Artificial Intelligence (Tokyo), pp. 308-
310.
Genesereth, M. R. 1981. The use of hierarchical models in the automated
diagnosis of computer systems. Report no. HPP-81-20, Computer Sci-
ence Department, Stanford University.
Gerring, P. E., Shortliffe, E. H., and van Melle, W. 1982. The Interviewer/
Reasoner model: An approach to improving system responsiveness in
interactive AI systems. AI Magazine 3(4): 24-27.
Gibaldi, M., and Perrier, D. 1975. Pharmacokinetics. New York: Marcel
Dekker.
Ginsberg, A. S. 1971. Decision analysis in clinical patient management with
an application to the pleural effusion syndrome. Report no. R-751-
RC-NLM, Rand Corporation.
Glantz, S. A. 1978. Computers in clinical medicine: A critique. Computer
11: 68-77.
Glesser, M. A., and Collen, M. F. 1972. Toward automated medical deci-
sions. Computers and Biomedical Research 5: 180-189.
Goguen, J. A. 1968. The logic of inexact concepts. Synthese 19: 325-373.
Goldberg, A., and Kay, A. 1976. Smalltalk-72 users manual. Report no.
SSL 76-6, Learning Research Group, Xerox PARC, Palo Alto, CA.
Goldstein, I. P. 1977. The computer as coach: An athletic paradigm for
intellectual education. Report no. 389, Artificial Intelligence Labora-
tory, Massachusetts Institute of Technology.
--. 1978. Developing a computational representation of problem solv-
ing skills. Report no. 495, Artificial Intelligence Center, Massachusetts
Institute of Technology.
Goldstein, I. P., and Roberts, B. R. 1977. NUDGE: A knowledge-based
scheduling program. In Proceedings of the 5th International Joint Confer-
ence on Artificial Intelligence (Cambridge, MA), pp. 257-263.
Gorry, G. A. 1973. Computer-assisted clinical decision making. Methods of
Information in Medicine 12: 45-51.
Gorry, G. A., and Barnett, G. O. 1968. Experience with a model of se-
quential diagnosis. Computers and Biomedical Research 1: 490-507.
Gorry, G. A., Kassirer, J. P., Essig, A., and Schwartz, W. B. 1973. Decision
analysis as the basis for computer-aided management of acute renal
failure. American Journal of Medicine 55: 473-484.
Grayson, C.J. 1960. Decision Under Uncertainty: Drilling Decisions by Oil and
Gas Operators. Cambridge, MA: Harvard University Press.
Greiner, R., and Lenat, D. B. 1980. A representation language language.
In Proceedings of the 1st Annual National Conference on Artificial Intelli-
gence (Stanford, CA), pp. 165-169.
Grinberg, M. R. 1980. A knowledge based design system for digital elec-
tronics. In Proceedings of the 1st Annual National Conference on Artificial
Intelligence (Stanford, CA), pp. 283-285.
Grosz, B. 1977. The representation and use of focus in a system for un-
derstanding dialogs. In Proceedings of the 5th International Joint Confer-
ence on Artificial Intelligence (Cambridge, MA), pp. 67-76.
Gustafson, D. H., Kestly, J. J., Greist, J. H., and Jensen, N. M. 1971. Initial
evaluation of a subjective Bayesian diagnostic system. Health Services
Research 6: 204-213.
Harré, R. 1970. Probability and confirmation. In The Principles of Scientific
Thinking, pp. 157-177. Chicago: University of Chicago Press.
Hartley, J., Sleeman, D., and Woods, P. 1972. Controlling the learning of
diagnostic tasks. International Journal of Man-Machine Studies 4: 319-
340.
Hasling, D. W., Clancey, W. J., and Rennels, G. D. 1984. Strategic explanations
for a diagnostic consultation system. International Journal of Man-Ma-
chine Studies: forthcoming.
Hayes-Roth, F., and McDermott, J. 1977. Knowledge acquisition from
structural descriptions. In Proceedings of the 5th International Joint Con-
ference on Artificial Intelligence (Cambridge, MA), pp. 356-362.
Hayes-Roth, F., Waterman, D., and Lenat, D. (eds.). 1983. Building Expert
Systems. Reading, MA: Addison-Wesley.
Hearn, A. C. 1971. Applications of symbol manipulation in theoretical
physics. Communications of the Association for Computing Machinery 14(8):
511-516.
Heiser, J. F., Brooks, R. E., and Ballard, J. P. 1978. Progress report: A
computerized psychopharmacology advisor (abstract). In Proceedings
of the 11th Collegium Internationale Neuro-Psychopharmacologicum (Vi-
enna), p. 233.
Helmer, O., and Rescher, N. 1960. On the epistemology of the inexact
sciences. Report no. R-353, Rand Corporation.
Hempel, C. G. 1965. Studies in the logic of confirmation. In Aspects of
Scientific Explanation and Other Essays in the Philosophy of Science, pp. 3-
51. NewYork: Free Press.
Hendrix, G. G. 1976. The Lifer manual: A guide to building practical
natural language interfaces. Report no. 138, Artificial Intelligence
Center, Stanford Research Institute.
--. 1977. A natural language interface facility. SIGART Newsletter 61:
25-26.
Hewitt, C. 1972. Description and theoretical analysis (using schemata) of
PLANNER: A language for proving theorems and manipulating
models in a robot. Ph.D. dissertation, Massachusetts Institute of Tech-
nology.
Hewitt, C., Bishop, P., and Steiger, R. 1973. A universal modular ACTOR
formalism for artificial intelligence. In Advance Papers of the 3rd Inter-
national Joint Conference on Artificial Intelligence (Stanford, CA), pp.
235-245.
Hilberman, M., Kamm, B., Tarter, M., and Osborn, J. J. 1975. An evalu-
ation of computer-based patient monitoring at Pacific Medical Center.
Computers and Biomedical Research 8: 447-460.
Horwitz, J., Thompson, H., Concannon, T., Friedman, R. H., Krikorian,
J., and Gertman, P. M. 1980. Computer-assisted patient care manage-
ment in medical oncology. In Proceedings of the 4th Symposium on Com-
puter Applications in Medical Care (Washington, D.C.), pp. 771-780.
Jaynes, J. 1976. The Origin of Consciousness in the Breakdown of the Bicameral
Mind. Boston: Houghton Mifflin.
Jelliffe, R. W., and Jelliffe, S. M. 1972. A computer program for estimation
of creatinine clearance from unstable serum creatinine levels, age, sex,
and weight. Mathematical Biosciences 14: 17-24.
Johnson, P. E., Duran, A., Hassebrock, F., Moller, J., Prietula, M., Feltovich,
P. J., and Swanson, D. B. 1981. Expertise and error in diagnostic rea-
soning. Cognitive Science 5(3): 235-283.
Kahneman, D., Slovic, P., and Tversky, A. 1982. Judgment under Uncertainty:
Heuristics and Biases. Cambridge, U.K.: Cambridge University Press.
Keynes, J. M. 1962. A Treatise on Probability. New York: Harper and Row.
Kintsch, W. 1976. Memory for prose. In The Structure of Human Memory,
ed. C. Cofer, pp. 90-113. San Francisco: Freeman.
Koffman, E. B., and Blount, S. E. 1973. Artificial intelligence and auto-
matic programming in CAI. In Advance Papers of the 3rd International
Joint Conference on Artificial Intelligence (Stanford, CA), pp. 86-94.
Kulikowski, C., and Weiss, S. 1971. Computer-based models of glaucoma.
Report no. 3, Computers in Biomedicine, Department of Computer
Science, Rutgers University.
--. 1982. Representation of expert knowledge for consultation: The
CASNET and EXPERT projects. In Artificial Intelligence in Medicine,
ed. P. Szolovits, pp. 21-55. Boulder, CO: Westview Press.
Kunin, C. M. 1973. Use of antibiotics: A brief exposition of the problem
and some tentative solutions. Annals of Internal Medicine 79: 555-560.
Kunz, J. C. 1984. Use of AI, simple mathematics, and a physiological model
for making medical diagnoses and treatment plans. Ph.D. dissertation,
Stanford Heuristic Programming Project, Stanford University. Forth-
coming.
Kunz, J. C., Fallat, R. J., McClung,D. H., Osborn, J. J., Votteri, B. A., Nii,
H. P., Aikins, J. S., Fagan, L. M., and Feigenbaum, E. A. 1979. A
physiological rule-based system for interpreting pulmonary function
test results. In Proceedings of Computers in Critical Care and Pulmonary
Medicine, pp. 375-379.
Kunz, J. C., Shortliffe, E. H., Buchanan, B. G., and Feigenbaum, E. A.
1984. Computer-assisted decision making in medicine. The Journal of
Medicine and Philosophy 9:135-160.
Langlotz, C. P., and Shortliffe, E. H. 1983. Adapting a consultation system
to critique user plans. International Journal of Man-Machine Studies 19:
479-496.
Leaper, D. J., Horrocks, J. C., Staniland, J. R., and de Dombal, F. T. 1972.
Computer-assisted diagnosis of abdominal pain using estimates pro-
vided by clinicians. British Medical Journal 4: 350-354.
Ledley, R. S. 1973. Syntax-directed concept analysis in the reasoning foun-
dations of medical diagnosis. Computers in Biology and Medicine 3: 89-
99.
Lenat, D. B. 1975. Beings: Knowledge as interacting experts. In Advance
Papers of the 4th International Joint Conference on Artificial Intelligence
(Tbilisi, USSR), pp. 126-133.
--. 1976. AM: An artificial intelligence approach to discovery in math-
ematics as heuristic search. Ph.D. dissertation, Computer Science De-
partment, Stanford University. (Stanford Reports nos. CS-STAN-76-
570 and AIM-286. Reprinted with revisions in Davis and Lenat, 1982.)
--. 1983. Theory formation by heuristic search. The nature of heu-
ristics II: Background and examples. Artificial Intelligence 21: 31-59.
Lesgold, A. M. 1983. Acquiring expertise. Report no. PDS-5, Learning
Research and Development Center, University of Pittsburgh. (Also
forthcoming in Tutorials in Learning and Memory, eds. J. R. Anderson
and S. M. Kosslyn. San Francisco: Freeman.)
Lesser, V. R., Fennell, R. D., Erman, L. D., and Reddy, D. R. 1974. Orga-
nization of the HEARSAY II speech understanding system. In Con-
tributed Papers of the IEEE Symposium on Speech Recognition (Pittsburgh,
PA), pp. 11-21.
--. 1975. Organization of the HEARSAY II speech understanding
system. IEEE Transactions on Acoustics, Speech, and Signal Processing
ASSP-23: 11-23.
Levy, A. H. 1977. Is informatics a basic medical science? In MEDINFO 77,
pp. 979-981. Amsterdam: North-Holland.
Linde, C. 1978. The organization of discourse. In Style and Variables in
English, eds. T. Shopen and J. M. Williams. Cambridge, MA: Winthrop
Press.
Linde, C., and Goguen, J. A. 1978. Structure of planning discourse. Journal
of Social Biological Structure 1: 219-251.
Lindsay, R. K., Buchanan, B. G., Feigenbaum, E. A., and Lederberg, J.
1980. Applications of Artificial Intelligence for Organic Chemistry: The DEN-
DRAL Project. New York: McGraw-Hill.
Luce, R. D., and Suppes, P. 1965. Preference, utility, and subjective prob-
ability. In Handbook of Mathematical Psychology, eds. R. D. Luce, R. R.
Bush, and E. Galanter, pp. 249-410. New York: Wiley.
Manna, Z. 1969. The correctness of programs. Journal of Computer and
System Sciences 3:119-127.
MARC Corporation. 1976. MARC User Information Manual. Palo Alto, CA:
MARC Analysis Research Corporation.
Mayne, J. G., Weksel, W., and Scholtz, P. N. 1968. Toward automating the
medical history. Mayo Clinic Proceedings 43(1): 1-25.
McCarthy, J. 1958. Programs with common sense. In Proceedings of the
Symposium on the Mechanisation of Thought Processes, pp. 77-84. (Re-
printed in Semantic Information Processing, ed. M. L. Minsky, pp. 403-
409. Cambridge, MA: MIT Press, 1968.)
--. 1983. Some expert systems need common sense. Invited presen-
tation for the New York Academy of Sciences Science Week Sympo-
sium on Computer Culture, April 5-8, 1983. Annals of the New York
Academy of Sciences: forthcoming.
McCarthy, J., Abrahams, P. W., Edwards, D. J., Hart, T. P., and Levin, M.
I. 1962. LISP 1.5 Programmer's Manual. Cambridge, MA: MIT Press.
McDermott, D. V., and Doyle, J. 1980. Non-monotonic logic I. Artificial
Intelligence 13: 41-72.
Melhorn, J. M., Warren, K. L., and Clark, G. M. 1979. Current attitudes
of medical personnel towards computers. Computers and Biomedical Re-
search 12: 327-334.
Michie, D. 1974. On Machine Intelligence. New York: John Wiley and Sons.
Miller, R. A., Pople, H. E., and Myers, J. D. 1982. INTERNIST-l: An
experimental computer-based diagnostic consultant for general inter-
nal medicine. New England Journal of Medicine 307(8): 468-476.
Minsky, M. L. 1975. A framework for representing knowledge. In The
Psychology of Computer Vision, ed. P. H. Winston, pp. 211-277. New
York: McGraw-Hill.
Model, M. L. 1979. Monitoring system behavior in a complex computa-
tional environment. Ph.D. dissertation, Stanford University. (Also
Technical Report no. CS-79-701.)
Moran, T. P. 1973a. The symbolic imagery hypothesis: A production sys-
tem model. Ph.D. dissertation, Computer Science Department, Car-
negie-Mellon University.
--. 1973b. The symbolic nature of visual imagery. In Advance Papers
of the 3rd International Joint Conference on Artificial Intelligence (Stanford,
CA), pp. 472-477.
Moses, J. 1971. Symbolic integration: The stormy decade. Communications
of the Association for Computing Machinery 14(8): 548-560.
Muller, C. 1972. The overmedicated society: Forces in the marketplace for
medical care. Science 176: 488-492.
Mulsant, B., and Servan-Schreiber, D. 1984. Knowledge engineering: A
daily activity on a hospital ward. Computers and Biomedical Research 17:
71-91.
Neu, H. C., and Howrey, S. P. 1975. Testing the physician's knowledge of
antibiotic use. New England Journal of Medicine 293: 1291-1295.
Newell, A. 1973. Production systems: Models of control structures. In Vis-
ual Information Processing, ed. W. G. Chase, pp. 463-526. New York:
Academic Press.
--. 1983. The heuristic of George Polya and its relation to artificial
intelligence. In Methods of Heuristics, eds. R. Groner, M. Groner, and
W. F. Bischof. Hillsdale, NJ: Erlbaum.
Newell, A., and Simon, H. A. 1972. Human Problem Solving. Englewood
Cliffs, NJ: Prentice-Hall.
Nie, N. H., Hull, C. H., Jenkins, J. C., Steinbrenner, K., and Bent, D. H.
1975. SPSS: Statistical Package for the Social Sciences. New York: Mc-
Graw-Hill.
Nii, H. P., and Feigenbaum, E. A. 1978. Rule-based understanding of sig-
nals. In Pattern-Directed Inference Systems, eds. D. A. Waterman and F.
Hayes-Roth, pp. 483-501. New York: Academic Press.
Nii, H. P., Feigenbaum, E. A., Anton, J. J., and Rockmore, A. J. 1982.
Signal-to-symbol transformation: HASP/SIAP case study. AI Magazine
3(2): 23-35.
Nilsson, N. J. 1971. Problem Solving Methods in Artificial Intelligence. New
York: McGraw-Hill.
Norusis, M. J., and Jacquez, J. A. 1975a. Diagnosis I. Symptom
nonindependence in mathematical models for diagnosis. Computers and
Biomedical Research 8: 156-172.
--. 1975b. Diagnosis II. Diagnostic models based on attribute clusters:
A proposal and comparisons. Computers and Biomedical Research 8: 173-
188.
O'Brien, T. E. 1974. Excretion of drugs in human milk. American Journal
of Hospital Pharmacy 31: 844-854.
Osborn, J. J., Beaumont, J. C., Raison, A., and Abbott, R. P. 1969. Com-
putation for quantitative on-line measurement in an intensive care
ward. In Computers in Biomedical Research, eds. R. W. Stacey and B. D.
Waxman, pp. 207-237. New York: Academic Press.
Papert, S. 1970. Teaching children programming. In IFIP Conference on
Computer Education. Amsterdam: North-Holland.
Parry, M. F., and Neu, H. C. 1976. Pharmacokinetics of ticarcillin in pa-
tients with abnormal renal function. Journal of Infectious Diseases 133:
46-49.
Parzen, E. 1960. Modern Probability Theory and Its Applications. New York:
Wiley.
Patil, R. S., Szolovits, P., and Schwartz, W. B. 1981. Causal understanding
of patient illness in medical diagnosis. In Proceedings of the 7th International
Joint Conference on Artificial Intelligence (Vancouver, B. C.), pp. 893-
899.
--. 1982. Information acquisition in diagnosis. In Proceedings of the
National Conference on Artificial Intelligence (Pittsburgh, PA), pp. 345-
348.
Pauker, S. P., and Pauker, S. G. 1977. Prenatal diagnosis: A directive ap-
proach to genetic counseling using decision analysis. Yale Journal of
Biology and Medicine 50: 275-289.
Pauker, S. G., and Szolovits, P. 1977. Analyzing and simulating taking the
history of the present illness: Context formation. In IFIP Working Con-
ference on Computational Linguistics in Medicine, eds. W. Schneider and
A. L. Sagvall-Hein, pp. 109-118. Amsterdam: North-Holland.
Pauker, S. G., Gorry, G. A., Kassirer, J. P., and Schwartz, W. B. 1976.
Toward the simulation of clinical cognition: Taking a present illness
by computer. American Journal of Medicine 60: 981-995.
Peterson, O. L., Andrews, L. P., Spain, R. S., and Greenberg, B. G. 1956.
An analytic study of North Carolina general practice. Journal of Medical
Education 31: 1-165.
Pipberger, H. V., McCaughan, D., Littman, D., Pipberger, H. A., Cornfield,
J., Dunn, R. A., Batchelor, C. D., and Berson, A. S. 1975. Clinical
application of a second generation electrocardiographic computer
program. American Journal of Cardiology 35: 597-608.
Politakis, P. G. 1982. Using empirical analysis to refine expert system
knowledge bases. Ph.D. dissertation, Computer Science Research Lab-
oratory, Rutgers University. (Also Technical Report no. CBM-TR-130.)
Polya, G. 1957. How to Solve It: A New Aspect of Mathematical Method. Prince-
ton, NJ: Princeton University Press.
Pople, H. E., Jr. 1977. The formation of composite hypotheses in diagnostic
problem solving: An exercise in synthetic reasoning. In Proceedings of
the 5th International Joint Conference on Artificial Intelligence (Cambridge,
MA), pp. 1030-1037.
--. 1982. Heuristic methods for imposing structure on ill-structured
problems: The structuring of medical diagnostics. In Artificial Intelli-
gence in Medicine, ed. P. Szolovits, pp. 119-190. Boulder, CO: Westview
Press.
Popper, K. R. 1959. Corroboration, the weight of evidence. In The Logic of
Scientific Discovery, pp. 387-419. New York: Scientific Editions.
Post, E. 1943. Formal reductions of the general combinatorial problem.
American Journal of Mathematics 65: 197-268.
Ramsey, F. P. 1931. The Foundations of Mathematics and Other Logical Essays.
London: Kegan Paul.
Reboh, R. 1981. Knowledge engineering techniques and tools in the PROS-
PECTOR environment. Report no. 243, Artificial Intelligence Center,
SRI International, Menlo Park, CA.
Reddy, D. R., Erman, L. D., Fennell, R. D., and Neely, R. B. 1973. The
HEARSAY speech-understanding system: An example of the recog-
nition process. In AdvancePapers of the 3rd International Joint Conference
on Artificial Intelligence (Stanford, CA), pp. 185-193.
Reimann, H. H., and D'Ambola, J. 1966. The use and cost of antimicrobials
in hospitals. Archives of Environmental Health 13: 631-636.
Reiser, J. F. 1975. BAIL: A debugger for SAIL. Report no. AIM-270,
Artificial Intelligence Laboratory, Stanford University.
Resnikoff, M., Holland, C. H., and Stroebel, C. F. 1967. Attitudes toward
computers among employees of a psychiatric hospital. Mental Hygiene
51: 419.
Resztak, K. E., and Williams, R. B. 1972. A review of antibiotic therapy in
patients with systemic infections. AmericanJournal of Hospital Pharmacy
29: 935-941.
Rieger, C. 1976. An organization of knowledge for problem solving and
language comprehension. Artificial Intelligence 7: 89-127.
Roberts, A. W., and Visconti, J. A. 1972. The rational and irrational use
of systemic microbial drugs. American Journal of Hospital Pharmacy29:
828-834.
Roberts, B., and Goldstein, I. P. 1977. The FRL manual. Report no. 409,
Artificial Intelligence Laboratory, Massachusetts Institute of Technol-
ogy.
Rosenberg, S. 1977. Frame-based text processing. Report no. 431, Artificial
Intelligence Laboratory, Massachusetts Institute of Technology.
Ross, P. 1972. Computers in medical diagnosis. CRC Critical Review of Radio-
logical Science 3: 197-243.
Rubin, M. I., Bruck, E., and Rapoport, M. 1949. Maturation of renal func-
tion in childhood: Clearance studies. Journal of Clinical Investigation 28:
1144.
Rumelhart, D. 1975. Notes on a schema for stories. In Representation and
Understanding: Studies in Cognitive Science, eds. D. G. Bobrow and A.
Collins, pp. 211-236. New York: Academic Press.
Rychener, M. D. 1975. The student production system: A study of encod-
ing knowledge in production systems. Technical report, Computer Sci-
ence Department, Carnegie-Mellon University.
Sager, N. 1978. Natural language information formatting: The automatic
conversion of texts in a structured data base. In Advances in Computers
(vol. 17), ed. M. C. Yovits, pp. 89-162. New York: Academic Press.
Salmon, W. C. 1966. The Foundations of Scientific Inference. Pittsburgh, PA:
University of Pittsburgh Press.
--. 1973. Confirmation. Scientific American 228: 75-83.
Savage, L. J. 1974. The Foundations of Statistics. New York: Wiley.
Schank, R., and Abelson, R. 1975. Scripts, plans and knowledge. In Advance
Papers of the 4th International Joint Conferenceon Artificial Intelligence
(Tbilisi, USSR), pp. 151-158.
Scheckler, W. E., and Bennett, J. V. 1970. Antibiotic usage in seven com-
munity hospitals. Journal of the American Medical Association 213: 264-
267.
Schefe, P. 1980. On foundations of reasoning with uncertain facts and
vague concepts. International Journal of Man-Machine Studies 12: 35-
62.
Scheinok, P. A., and Rinaldo, J. A. 1971. System diagnosis: The use of two
different mathematical models. International Journal of Biomedical Com-
puting 2: 239-248.
Schwartz, G. J., Haycock, G. B., Edelmann, C. M., Jr., and Spitzer, A. 1976.
A simple estimate of glomerular filtration rate in children derived
from body length and plasma creatinine. Pediatrics 58: 259-263.
Schwartz, W. B. 1970. Medicine and the computer: The promise and prob-
lems of change. New England Journal of Medicine 283: 1257-1264.
--. 1979. Decision analysis: A look at the chief complaint. New England
Journal of Medicine 300: 556.
Schwartz, W. B., Gorry, G. A., Kassirer, J. P., and Essig, A. 1973. Decision
analysis and clinical judgments. American Journal of Medicine 55: 459-
472.
Scott, A. C., Clancey, W. J., Davis, R., and Shortliffe, E. H. 1977. Expla-
nation capabilities of knowledge-based production systems. American
Journal of Computational Linguistics Microfiche 62. (Appears as Chapter
18 of this volume.)
Scragg, G. W. 1975a. Answering process questions. In Advance Papers of the
4th International Joint Conference on Artificial Intelligence (Tbilisi, USSR),
pp. 435-442.
--. 1975b. Answering questions about processes. In Explorations in Cog-
nition, eds. D. A. Norman and D. E. Rumelhart. San Francisco: Free-
man.
Selfridge, O. 1959. Pandemonium: A paradigm for learning. In Proceedings
of the Symposium on Mechanisation of Thought Processes, pp. 511-529.
Teddington, U.K.: National Physical Laboratory.
Shackle, G. L. S. 1952. Expectation in Economics. Cambridge, U.K.: Cam-
bridge University Press.
--. 1955. Uncertainty in Economics and Other Reflections. Cambridge,
U.K.: Cambridge University Press.
Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton, NJ: Princeton
University Press.
Shortliffe, E. H. 1974. MYCIN:A rule-based computer program for ad-
vising physicians regarding antimicrobial therapy selection. Ph.D. dis-
sertation, Stanford University. (Reprinted with revisions as Shortliffe,
1976.)
--. 1976. Computer-Based Medical Consultations: MYCIN. New York:
American Elsevier.
--. 1980. Consultation systems for physicians: The role of artificial
intelligence techniques. In Proceedings of the 3rd National Conference of
the Canadian Society for Computational Studies of Intelligence (Victoria,
B.C.), pp. 1-11. (Also in Readings in Artificial Intelligence, eds. B. Web-
ber and N. Nilsson, pp. 323-333. Menlo Park, CA: Tioga Press, 1981.)
--. 1982a. Computer-based clinical decision aids: Some practical con-
siderations. In Proceedings of the AMIA Congress 82 (San Francisco, CA),
pp. 295-298.
--. 1982b. The computer and medical decision making: Good advice
is not enough (guest editorial). IEEE Engineering in Medicine and Biology
Magazine 1(2): 16-18.
Shortliffe, E. H., and Buchanan, B. G. 1975. A model of inexact reasoning
in medicine. Mathematical Biosciences 23: 351-379.
Shortliffe, E. H., and Davis, R. 1975. Some considerations for the imple-
mentation of knowledge-based expert systems. SIGART Newsletter 55:
9-12.
Shortliffe, E. H., Axline, S. G., Buchanan, B. G., Merigan, T. C., and
Cohen, S. N. 1973. An artificial intelligence program to advise phy-
sicians regarding antimicrobial therapy. Computers and Biomedical Re-
search 6: 544-560.
Shortliffe, E. H., Axline, S. G., Buchanan, B. G., and Cohen, S. N. 1974.
Design considerations for a program to provide consultations in clin-
ical therapeutics. In Proceedings of the 13th San Diego Biomedical Sym-
posium (San Diego, CA), pp. 311-319.
Shortliffe, E. H., Davis, R., Axline, S. G., Buchanan, B. G., Green, C. C.,
and Cohen, S. N. 1975. Computer-based consultations in clinical ther-
apeutics: Explanation and rule acquisition capabilities of the MYCIN
system. Computers and Biomedical Research 8: 303-320.
Shortliffe, E. H., Buchanan, B. G., and Feigenbaum, E. A. 1979. Knowl-
edge engineering for medical decision making: A review of computer-
based clinical decision aids. Proceedings of the IEEE 67: 1207-1224.
Siber, G. R., Echeverria, P., Smith, A. L., Paisley, J. W., and Smith, D. H.
1975. Pharmacokinetics of gentamicin in children and adults. Journal
of Infectious Diseases 132: 637-651.
Sidner, C. 1979. A computational model of co-reference comprehension
in English. Ph.D. dissertation, Massachusetts Institute of Technology.
Simmons, H. E., and Stolley, P. D. 1974. This is medical progress? Trends
and consequences of antibiotic use in the United States. Journal of the
American Medical Association 227: 1023-1026.
Sleeman, D. H. 1977. A system which allows students to explore algorithms.
In Proceedings of the 5th International Joint Conference on Artificial Intel-
ligence (Cambridge, MA), pp. 780-786.
Slovic, P., Rorer, L. G., and Hoffman, P. J. 1971. Analyzing use of diag-
nostic signs. Investigative Radiology 6: 18-26.
Smith, D. H., Buchanan, B. G., Engelmore, R. S., Duffield, A. M., Yeo, A.,
Feigenbaum, E. A., Lederberg, J., and Djerassi, C. 1972. Applications
of artificial intelligence for chemical inference VIII: An approach to
the computer interpretation of the high resolution mass spectra of
complex molecules: Structure elucidation of estrogenic steroids. Jour-
nal of the American Chemical Society 94: 5962-5971.
Sprosty, P. J. 1963. The use of questions in the diagnostic problem solving
process. In The Diagnostic Process, ed. J. A. Jacquez, pp. 281-308. Ann
Arbor, MI: University of Michigan School of Medicine.
Startsman, T. S., and Robinson, R. E. 1972. The attitudes of medical and
paramedical personnel towards computers. Computers and Biomedical
Research 5: 218-227.
Stefik, M. 1979. An examination of a frame-structured representation sys-
tem. In Proceedings of the 6th International Joint Conference on Artificial
Intelligence (Tokyo), pp. 845-852.
Stevens, A. L., Collins, A., and Goldin, S. 1978. Diagnosing students' mis-
conceptions in causal models. Report no. 3786, Bolt Beranek and New-
man.
Swanson, D. B., Feltovich, P. J., and Johnson, P. E. 1977. Psychological
analysis of physician expertise: Implications for design of decision sup-
port systems. In MEDINFO 77, pp. 161-164. Amsterdam: North-Hol-
land.
Swartout, W. R. 1981. Explaining and justifying expert consulting pro-
grams. In Proceedings of the 7th International Joint Conference on Artificial
Intelligence (Vancouver, B.C.), pp. 815-822.
--. 1983. A system for creating and explaining expert consulting pro-
grams. Artificial Intelligence 21: 285-325.
Swinburne, R. G. 1970. Choosing between confirmation theories. Philosophy
of Science 37: 602-613.
--. 1973. An Introduction to Confirmation Theory. London: Methuen.
Szolovits, P. 1979. Artificial intelligence and clinical problem solving. Re-
port no. MIT/LCS/TM-140, Laboratory for Computer Science, Mas-
sachusetts Institute of Technology.
Szolovits, P., and Pauker, S. G. 1978. Categorical and probabilistic reason-
ing in medical diagnosis. Artificial Intelligence 11: 115-144.
Teach, R. L. 1984. Patterns of explanation and reasoning in clinical med-
icine: Implications for improving the performance of expert computer
systems. Ph.D. dissertation, Stanford University. Forthcoming.
Teitelman, W. 1978. Interlisp Reference Manual. Palo Alto, CA: Xerox Palo
Alto Research Center.
Tesler, L. G., Enea, H. J., and Smith, D. C. 1973. The LISP70 pattern
matching system. In Advance Papers of the 3rd International Joint Con-
ference on Artificial Intelligence (Stanford, CA), pp. 671-676.
Trigoboff, M. 1978. IRIS: A framework for the construction of clinical
consultation systems. Ph.D. dissertation, Department of Computer Sci-
ence, Rutgers University.
Tsuji, S., and Shortliffe, E. H. 1983. Graphical access to the knowledge
base of a medical consultation system. In Proceedings of AAMSI Congress
83 (San Francisco), pp. 551-555.
Turing, A. M. 1950. Computing machinery and intelligence. Mind 59: 433-
460. (Reprinted in Computers and Thought, eds. E. A. Feigenbaum and
J. Feldman. New York: McGraw-Hill, 1963.)
Tversky, A. 1972. Elimination by aspects: A theory of choice. Psychological
Review 79: 281-299.
van Melle, W. 1974. Would you like advice on another horn? MYCIN
project internal working paper, Stanford University.
--. 1980. A domain-independent system that aids in constructing
knowledge-based consultation programs. Ph.D. dissertation, Com-
puter Science Department, Stanford University. (Stanford Reports
nos. STAN-CS-80-820 and HPP-80-22. Reprinted as van Melle, 1981.)
--. 1981. System Aids in Constructing Consultation Programs. Ann Arbor,
MI: UMI Research Press.
van Melle, W., Scott, A. C., Bennett, J. S., and Peairs, M. 1981. The EMY-
CIN manual. Report no. HPP-81-16, Computer Science Department,
Stanford University.
Waldinger, R., and Levitt, K. N. 1974. Reasoning about programs. Artificial
Intelligence 5: 235-316.
Warner, H. R., Toronto, A. F., Veasey, L. G., and Stephenson, R. 1961. A
mathematical approach to medical diagnosis: Application to congenital
heart disease. Journal of the American Medical Association 177(3): 177-
183.
Warner, H. R., Toronto, A. F., and Veasy, L. G. 1964. Experience with
Bayes' theorem for computer diagnosis of congenital heart disease.
Annals of the New York Academy of Sciences 115: 2.
Waterman, D. A. 1970. Generalization learning techniques for automating
the learning of heuristics. Artificial Intelligence 1: 121-170.
--. 1974. Adaptive production systems. Complex Information Pro-
cessing Working Paper, Report no. 285, Psychology Department, Car-
negie-Mellon University.
--. 1978. Exemplary programming. In Pattern-Directed Inference Sys-
tems, eds. D. A. Waterman and F. Hayes-Roth, pp. 261-280. New York:
Academic Press.
Weiner, J. L. 1979. The structure of natural explanation: Theory and
application. Report no. SP-4305, System Development Corporation.
1980. BLAH: A system which explains its reasoning. Artificial
Intelligence 15: 19-48.
Weiss, C. F., Glazko, A. J., and Weston, J. K. 1960. Chloramphenicol in the
newborn infant. New England Journal of Medicine 262: 787-794.
Weiss, S. M., Kulikowski, C. A., Amarel, S., and Safir, A. 1978. A model-
based method for computer-aided medical decision-making. Artificial
Intelligence 11: 145-172.
Weizenbaum, J. 1967. Contextual understanding by computers. Communications
of the Association for Computing Machinery 10(8): 474-480.
1976. Computer Power and Human Reason: From Judgment to Calculation.
San Francisco: Freeman.
Wilson, J. V. K. 1956. Two medical texts from Nimrud. IRAQ 18: 130-
146.
1962. The Nimrud catalogue of medical and physiognomical omina.
IRAQ 24: 52-62.
Winograd, T. 1972. Understanding natural language. Cognitive Psychology
3: 1-191.
1975. Frame representations and the procedural/declarative controversy.
In Representation and Understanding: Studies in Cognitive Science,
eds. D. G. Bobrow and A. Collins, pp. 185-210. New York: Academic
Press.
1977. A framework for understanding discourse. Report no. AIM-
297, Artificial Intelligence Laboratory, Stanford University.
1980. Extended inference modes in reasoning by computer systems.
Artificial Intelligence 13: 5-26.
Winston, P. H. 1970. Learning structural descriptions from examples. Report
no. TR-76, Project MAC, Massachusetts Institute of Technology.
--. 1977. Artificial Intelligence. Reading, MA: Addison-Wesley.
1979. Learning and reasoning by analogy. Communications of the
Association for Computing Machinery 23(12): 689-703.
Winston, P. H., and Horn, B. 1981. LISP. Reading, MA: Addison-Wesley.
Wirtschafter, D. D., Gams, R., Ferguson, C., Blackwell, W., and Boackle, P.
1980. Clinical protocol information system. In Proceedings of the 4th
Symposium on Computer Applications in Medical Care (Washington, D.C.),
pp. 745-752.
Woods, W. A. 1970. Transition network grammars for natural language
analysis. Communications of the Association for Computing Machinery
13(10): 591-606.
1975. What's in a link: Foundations for semantic networks. In
Representation and Understanding: Studies in Cognitive Science, eds. D. G.
Bobrow and A. Collins, pp. 35-82. New York: Academic Press.
Yu, V. L., Buchanan, B. G., Shortliffe, E. H., Wraith, S. M., Davis, R., Scott,
A. C., and Cohen, S. N. 1979a. An evaluation of the performance of
a computer-based consultant. Computer Programs in Biomedicine 9: 95-
102.
Yu, V. L., Fagan, L. M., Wraith, S. M., Clancey, W. J., Scott, A. C., Hannigan,
J. F., Blum, R. L., Buchanan, B. G., and Cohen, S. N. 1979b.
Antimicrobial selection by a computer: A blinded evaluation by infectious
disease experts. Journal of the American Medical Association 242(12):
1279-1282. (Appears as Chapter 31 of this volume.)
Zadeh, L. A. 1965. Fuzzy sets. Information and Control 8: 338-353.
1975. Fuzzy logic and approximate reasoning. Synthese 30: 407-
428.
--. 1978. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and
Systems 1: 3-28.
Name Index

ABEL, 387 Brown, J. S., 133, 134, 141, Cooper, G., 218, 335, 582,
Abelson, R., 615, 617 173,456, 457,459, 465, 603
ACT, 46 468, 470, 478, 506, 551, Cronbach, L. J., 639
ACTORS, 520 552, 555 CRYSALIS, 563
Adams, J. B., 214, 263 Bruce, B. C., 471 Cullingford, R., 615
ADVICE TAKER, 670 Buchanan, B. G., 8-9, 50, Cumberbatch, J., 263
AGE, 394f 92, 149, 153, 174, 201,
Aiello, N., 394-395 210, 221ff, 233, 263- Dambola, J., 16
Aikins, J. S., 19, 157,221ff, 271,302,383,455,507, DART, 11, 312
312, 392,424, 441,561, 525, 561,562,589, 699 Davis, R., 10, 18, 20, 42, 48,
565, 692, 699 Bullwinkle, C., 624 51, 171-205,326, 333,
AI/MM, 335, 396 Burton, R. R., 173, 456, 469, 338, 348, 355, 383, 396,
ALGOL,27, 39, 40, 41, 47 478, 621 464, 475, 477, 493ff,
Allen, J., 6 505, 506, 507, 520, 524,
AM, 505, 561,563, 564 CADUCEUS(see 528, 533f, 539, 564, 576,
Anderson, J., 21, 41, 46 INTERNIST) 699
Anthony, J. R., 364 Campbell, B., 653 Day, E., 635
ARL, 152, 307, 324-325, Carbonell, J. R., 9, 55, 199, De Dombal, E T., 263
687 200, 331,377,455,469 De Finetti, B., 241
Carden, T. S., 16 DeJong, G., 615
Armstrong, R., 590
Carnap, R., 242-244 De Kleer, J., 468, 505
Axline, S., 8-10, 55, 209,
571,599 Carr, B., 456, 468, 469, 479 Delfino, A. B., 462
CASNET, 506 DENDRAL, 8, 11, 23, 25ff,
Catanzarite, V., 500 29, 32, 37, 39, 46f, 49,
BAIL, 173 55f, 149, 151, 171,209,
BAOBAB, 11,602,613-634 CENTAUR,11, 19, 392-
461-462,494, 506, 562,
Barker, S. E, 244 394, 424-440, 444, 561,
671,674, 676f, 687f,
Barnett, G. O., 234-236 562,565,676, 679
692, 700
Barnett, H. L., 365 Chandrasekaran, B., 275
Dempster, A., 215, 272
Barnett, J. A., 272, 288, 292, Charniak, E., 6, 615, 619, Deutsch, B. G., 471
5O5 625
Ditlove, J., 364
Barr, A., 153 Chi, M., 456 Doyle, J., 682
Bartlett, E C., 613 CHRONICLER, 138-146 Duda, R. O., 3, 55, 211,214,
BASIC, 394 Ciesielski, V., 153 374, 392,425, 505
Beckett, T., 461,698 Clancey, W. J., 19, 57, 133,
Bennet, J. E., 590 214,217, 221ff, 328, Edelmann, C. M., 365
Bennett, J. S., 12, 152, 307, 333-334, 335-337,338, Edwards, L. E, 18
312, 314, 412, 589, 686 372, 383, 396, 455ff, Edwards, W., 236, 259
Bennett, S. W., 334, 363, 589 464, 494, 504, 506, 531, Eisenberg, L., 635
BIP, 479 557,561,582, 589, 679, ELIZA, 693
Bischoff, M. B., 604, 653 698, 699 Elstein, A. S., 211,439, 451,
BIount, S. E., 478 Clark, K. L., 333 552, 651
BLUEBOX, 312 Clayton, J. E., 394, 441 EMYCIN,6, 11, 18, 60, 132,
Blum, B. l., 604 CLOT, 11,312, 314, 318- 152, 154-157, 160-162,
Blum, R. L., 153,687 323,500f, 698 165, 196, 210, 214-216,
Bobrow, D. G., 45, 134, 141, Cohen, S. N., 8-10, 55, 209, 284, 295-301,302-313,
471,525,614, 615 571,589 314-328, 393, 412, 439,
Bobrow, R. G., 134, 141 Colby, K. M., 151,333, 352, 451,494ff, 602,605,
Bonnet, A., 312, 498, 602, 688, 693 653, 658-663, 670,
613 Collen, M. E, 263 674f, 685f, 696ff
Boyd, E., 365, 370 Collins, A., 199, 200, 455, Engelmore, R. S., 314, 563
Brachman, R. J., 532 457,469, 484, 613 EPAM, 26
Bransford, J., 613 Conchie, J. M., 364 Erman, L. D., 395, 505,561
Brown, B. W., 210 CONGEN, 11,600 EURISKO, 505

Evans, A., 22 HEADMED, 312 Interlisp, 110, 140, 157,
EXPERT, 152 Heaps, H. S., 263 173, 188, 219, 308,
Hearn, A. C., 151 431,472,600, 605,
Fagan, L. M., 19, 392-393, HEARSAY-1I,44, 195, 561, 616, 660, 664, 699
397,589, 658, 699 563 LITHO, 312,498
Falk, G., 189 Heiser, J. E, 312 LOGO, 455
Fallat, R., 393 Helmer, O., 233 London, R., 335
Faught, W., 471 Hempel, C. G., 244 Luce, R. D., 246
Feigenbaum, E. A., 8, 11, 23, Hendrix, G. G., 412,621, Ludwig, D., 214
26, 151, 153, 171,201, 624
392-393, 397, 561 Hewitt, C., 49, 103, 520 Manna, Z., 528
Feigin, R. D., 590 Hilberman, M., 398 Mayne, J. G., 635
Feldman, J., 8, 86 Hollander, C. R., 312 McCabe, E G., 333
Feltovich, P., 456 Horn, B., 6 McCarthy, J., 6, 86, 152, 670,
Feurzeig, W., 465 Horwitz, J., 604,655 672, 692
Fisher, L. S., 364 Howrey, S. E, 370 McDermott, D. V., 681
Floyd, R., 6, 22 McDermott, J., 174, 201
Forsythe, G., 8 Interlisp (see LISP) MEDIPHOR, 8-9
Fox, M., 505 INTERNIST, 283, 289, 386- Melhorn, J. M., 636
Franks, J. L., 613 387, 425,580 Merigan, T., 9, 590
Friedman, L., 272 Meta-DENDRAL, 11, 153,
Friedman, R. B., 635 Jacobs, C. D., 653 494
FRL, 614 Jacques, J. A., 263 Michie, D., 151
FRUMP, 615 Jaynes, J., 12 Miller, R. A., 282,289, 387,
Jelliffe, R. W., 365 580
Garvey, T. D., 272 Johnson, P. E., 457, 581 Minsky, M. L., 60, 392,425,
Gaschnig, J., 578 613, 615,617
Genesereth, M. R., 456, 505, Kahneman, D., 211 Mitchell, T. M., 174
5O6 Kay, A., 520 Model, M., 213, 337
Gerring, P. E., 605, 653 Keynes, J. M., 242 MOLGEN,561,563, 565
Gibaldi, M., 367 King, J. J., 20 Moran, T., 23, 38, 42, 46, 52
Ginsberg, A. S., 263 Kintsch, W., 613
Moses, J., 171,304
Glantz, S. A., 635 Koffman, E. B., 478 Muller, C., 16
Glesser, M. A., 263 KRL, 614
Mulsant, B., 312
Goguen, J. A., 245,373 Kulikowski, C., 152, 506
Goldberg, A., 520 Kunin, C. M., 16, 18
NEOMYCIN, 11,396, 460-
Goldman, D., 314 Kunz, J. C., 335, 396, 397, 461,506, 557, 560, 561,
Goldstein, I. P., 374, 456, 424, 441,506, 603
562,565, 567, 676, 679,
457, 464, 465, 468, 469,
698
478, 551,552, 555,614, Langlotz, C. J., 335-336,
615, 617 603, 611,657,660, 664, NESTOR, 335
Gordon, J., 215, 272 692 Neu, H. C., 370
Gorry, G. A., 234-236, 263, Leaper, D. J., 386 Newell, A., 6, 8, 22, 25, 27,
332, 371 Lederberg, J., 8, 296 40, 45, 52, 171,303, 455
GPS, 303,304 Ledley, R. S., 259 Nie, N. H., 639
GRAVIDA,11,500f, 698 Lenat, D. B., 44, 149, 153, Nii, H. E, 11,213, 392-394
Grayson, C. J., 241 442, 464, 505, 562, 563, Nilsson, N. J., 304
Green, C. C., 8-10 573 Norusis, M. J., 263
Greiner, R., 442 Lerner, E, 590
Grinberg, M. R., 506 Lesgold, A. M., 456, 581 OBrien, T. E., 364
Grosz, B., 614 Letsinger, R., 396, 460, 506, ONCOCIN,11, 58, 152, 156,
GUIDON,11, 19, 126, 372- 557, 561,583 159-170, 335, 396, 599,
373,451,458-463, Levitt, K. N., 528 601,603,604-612,
464-492, 494, 531ff, Levy, A. H., 636 653ff, 676, 685, 692,
690, 691,698 LIFER, 624 693,698
Gustafson, D. H., 263,635 Linde, C., 373 Osborn, J. J., 392-394, 397,
Lindsay, R. K., 8, 23, 29, 398
Hannigan, J. E, 589 153, 304
Harré, R., 240, 244, 249, 257 LISP, 6, 9, 23, 46, 47, 49, 50, Papert, S., 455
Hartley, J., 469 57, 70, 80, 81, 86, 90, PARRY, 333,688, 693
Hasling, D. W., 336 93, 124, 132, 149, 154, Parzen, E., 239
HASP/SIAP (see SU/X) 164, 169, 174, 182, 194, PAS-II, 25, 32, 40, 44, 46-48
Hayes-Roth, F., 149f, 174, 296, 307, 407, 601,670f, Patil, R. S., 381,387, 396,
201,573 687, 698 505,506, 691

Pauker, S. G., 214, 217, 332, SCA paradigm, 141 Teach, R., 336, 603, 635
386, 387-388,411,417, Schank, R., 333,615,617 TEIRESIAS, 11, 18, 152ff,
425,451,540, 551 Scheckler, W. E., 16 153-157, 160f, 165,
Perrier, D., 367 Schefe, E, 214 168, 171-205, 310, 333,
Peterson, O. L., 16 Scheinok, E A., 263 493ff, 507ff, 601,687f
PIP, 332, 386-387 SCHOLAR,9, 55, 331 Teitelman, W., 110, 173
Pipberger, H. V., 263 Schwartz, G. J., 365 Terry, A., 563
PLANNER,49, 50, 103 Schwartz, W. B., 234, 635 Tesler, L. G., 23
Poker player, 149-153 Scott, A. C., 10, 159, 212, Trigoboff, M., 214
Politakis, P., 152 221ff, 333-334, 338, Tsuji, S., 659, 664
Polya, G., 455 363, 653, 699 Turing, A. M., 694
Pople, H. F., 304, 386-387, Scragg, G. W., 133, 134 Tversky, A., 246
425,464, 5(16, 683 Selfridge, O., 619
Popper, K. R., 249 Sendray, J., 370 Van Lehn, K., 456
Post, E., 20 Servan-Schreiber, D., 312 Van Melle, W., 18, 67, 157,
Present Illness Program (see Shackle, G. L., 246, 247 215, 295-301,302, 325,
PIP) Shackleford, E J., 590 494, 653, 699
PROLOG, 333 Shafer, G., 215, 272, 282 VIS, 23, 25, 38, 42, 26
PROSPECTOR, 55, 211, Shortliffe, E. H., 3, 8-9, 50, Visconti, J. A., 16, 18
214,425, 505,578, 581 58, 78, 92, 106-107, VM,11, 19, 313, 392-394,
PSG, 23, 25, 26, 38, 40, 47, 153, 159, 210, 211,214, 397-423,658, 685, 692
48 221ff, 233, 252, 263- Vosti, K., 226
PUFF, 11,312, 393-394, 271,272, 302,333,338,
417,424,437-440, 348, 371,373,386, 458, Waldinger, R., 528
441-452,495, 565, 675, 525, 571,576, 599, 601, Wallis, J., 220, 335,371
692, 698 603,611,635, 653, 657, Warner, D. (see Hasling, D.
660, 664, 659, 692, 695, W.)
Rahal, J. J., 590 698, 699 Warner, H. R., 234, 236, 267
Ramsey, E P., 241 Siber, G. R., 364 Waterman, D. A., 8, 32-34,
Reddy, D. R., 195 Sidner, C., 614, 624 40, 41, 46, 48, 52, 149,
Reimann, H. H., 16 Simmons, H. E., 16 153, 201,573
Reiser, J. E, 173 Simon, H. A., 22, 27, 52, Wehrle, E E, 590
Remington, J. S., 590 171,303 Weiner, J. L., 373
Resnikoff, M., 635 Sleeman, D. H., 468 Weiss, C. E, 364
Resztak, K. E., 16 Slovic, E, 267 Weiss, S. M., 152, 374, 469,
Rieger, C., 374 SMALLTALK, 520 5O6
Rinaldo, J. A., 263 Smith, D. E., 394, 441 Weizenbaum, J., 365,693
RLL, 442 Smith, D. H., 462 WEST, 469, 478
Roberts, A. W., 16, 17 SOPHIE, 457, 470 WHEEZE, 392-394, 441-
Roberts, B., 464, 614, 617 Sprosty, P. J., 469 452, 676
Robinson, R. E., 590 Startsman, T. S., 635, 636 Williams, R. B., 16
ROGET,152, 307, 686 Stefik, M. J., 561,565, 614, Wilson, J. V. K., 12
Rosenberg, S., 614 617 Winograd, T., 28, 30, 32,
Ross, E, 263 Stevens, A. L., 456 133, 134, 352, 471,525,
Rubin, M. I., 365 Stolley, E D., 16 558, 614, 627
Rumelhart, D., 627 STUDENT, 45 Winston, E, 6, 153, 174, 673
RX, 153 Suppes, E, 210, 211,246 Wirtschafter, D. D., 604
Rychener, M. D., 44, 45 Suwa, M., 159 Woods, W. A., 532,621
SU/X, 11,393 Wraith, S. (see Bennett, S.
SACON,11,304, 312, 417, Swanson, D. B., 567 W.)
495ff, 675, 691,697 Swartout, W. R., 328, 372- WUMPUS,469, 479
Sager, N., 615 373,691
SAIL, 86, 173, 664 Swinburne, R. G., 240, 242, Yeager, A. S., 590
Salmon, W. C., 245,257 246 Yu, V. L., 221ff, 572, 589,
SAM, 615 Szolovits, E, 214, 217,386- 590, 599
Sanner, L., 612 388, 411,417,425,451,
Savage, L. J., 241 540, 551,655 Zadeh, L. A., 210, 245

Subject Index

abbreviated rule language (ARL) (see rule causal knowledge (see knowledge)
language) causal models, 374, 381,456, 460, 484, 539,
acceptance, by user community (see human 548ff
engineering) certainty factors (see also inexact inference),
acid-base disorders, 381 23, 61, 63, 65, 210ff. 81, 91-93, 112,
adaptive behavior, 52 202, 209-232, 233, 247ff, 262, 267-271,
agenda, 441-452, 525f, 561 272ff, 321,374,434, 443f, 472,485,
algebra, 304 525,540, 545, 582,675, 679ff, 700
algorithm (see also therapy algorithm), 3, 125, assigning values to, 154f, 221ff, 252
133, 134, 150, 185, 283 with associative triples, 70
allergies (see drugs, contraindications) combining function, 116, 216, 219, 254ff,
anatomical knowledge (see knowledge, 277, 284
structural) gold standard for, 221 ff
$AND (see also predicates), 80, 97ff, 105 justification for, 56, 221ff, 239ff, 681
AND/OR goal tree, 49, 103-112 propagation of, 162, 212ff, 255,444
answers, to questions (see dialogue) sensitivity analysis, 217ff, 582, 682f
antecedent rules (see rules) threshold, 94, 211,283
antecedents (see also rules; syntax), CFs (see certainty factors)
architecture (see control; representation) chemistry (see also DENDRAL), 8, 26, 37,
artificial intelligence, 3, 6, 86, 150, 331f, 360, 149, 304
chunks of knowledge (see modularity)
381,424, 455, 663f, 687
circular reasoning, 63, 116ff
as an experimental science, 19, 672
classification problems, 312,426, 675, 697
ASKFIRST (LABDATA), 64, 89, 105, 120,
clinical algorithms, 683
374
clinical parameters (see parameters)
associative triples (see representation)
closed-world assumption, 469, 675
attitudes of physicians (see also human
CNTXT(see contexts)
engineering), 57, 602f, 605, 635-652
code generation (see automatic programming)
attributes (see parameters) cognitive modeling (see also psychology), 26, 211
automatic programming, 188, 193f, 520
combinatorial explosion, 524
commonsense knowledge (see knowledge)
backtracking, 82, 127,410, 420, 697 deductions (see unity path)
backward chaining (see control) completeness (see also knowledge base; logic),
batch mode (see patient data) 199f, 305, 656, 684
Bayes Theorem, 79, 210, 211,214, 215, complexity, 335, 375,377ff, 387
234ff, 263ff, 385, 386 computer-aided instruction (see tutoring)
belief (see certainty factors) concept broadening (see diagnosis, strategies
biases (see evaluation) for)
big switch, 13 concept identification (see knowledge
blackboard model, 395,563 acquisition, conceptualization)
blood clotting (see CLOT) conceptual framework, 374f, 391,495, 684f
bookkeeping information, 433,472,516, 527, conceptualization (see knowledge acquisition,
676 conceptualization)
bottom-up reasoning (see control, forward conclude function, 113ff
chaining) confirmation (see also certainty factors), 57,
breadth-first reasoning (see control) 210, 218, 240, 241,242, 243-245, 247,
272,426, 681
CAI (see tutoring) conflict resolution, 22, 38, 43, 48, 50, 162
cancer chemotherapy (see ONCOCIN) conflicts (see knowledge base, conflicts in)
case library (see also patient data), 137, 156, consequents (see also rules),
479, 583, 594, 602 consequent theorems (see rules, consequent)
case-method tutoring (see tutoring) consistency (see also rule checking;
categorical reasoning (see certainty factors; subsumption), 65, 77, 156, 159-170,
knowledge, inexact), 56, 209 195,202, 324,432,440, 456, 656, 686

checking, 41, 180 data:
contradictions, 308 acceptable values (see expectations)
constraints, 135ff, 145 collection, 398, 409f, 655
constraint satisfaction, 133, 313,685, 697 snapshot of, 313, 393, 675
consultation, 3,201,302, 361f, 422,426, 457, time varying, 409f, 655ff
610, 635ff, 671,691,701 uncertainty in, 674, 684, 696
example of, 69f, 298ff, 319f, 323f, 427- data base (see also patient data), 22, 112, 386,
430, 533, 553, 704-711 655, 692
subprogram in MYCIN, 5, 10, 67-73, 78- data-directed reasoning (see control, forward
132, 184 chaining)
content-directed invocation (see control) data structures (see representation)
contexts, 60, 64, 70-71, 82, 99, 163,297, debugging (see also knowledge base,
344, 353, 360, 493,670 refinement), 51, 152, 159
context tree, 60, 62, 79, 82-86, 99 104, decision analysis (see also utilities), 217,234,
112, 118ff, 128, 132, 295, 324,494- 332
503, 675,678 decision trees, 23f, 311
context types, 82ff, 495ff deep knowledge (see knowledge, causal)
instantiation, 62, 118ff, 495ff representation)
in ONCOCIN's rules, 163ff, 659 definitional rules (see rules)
contextual information, 179, 185-198,201, defaults (see knowledge)
203, 335, 393f, 396, 398, 410, 421ff, definitional rules (see rules)
471,477,677 definitions (see knowledge, support)
contradictions (see consistency) demand ratings, 637,644-647
contraindications (see drugs, demons(see control)
contraindications), 543 Dempster-Shafer theory of evidence, 215,
control (see also control knowledge), 28, 32, 272ff, 681
33, 43-45, 48-50, 60-65, 103-112, depth-first reasoning (see control)
220f, 358, 416, 435ff, 441-452, 493, design considerations, 3ff, 10, 19, 51, 57-59,
495, 526, 531f, 670, 673,677ff, 696f 67, 78, 176, 238, 304, 331,340, 342f,
backward chaining, 5, 27, 40, 57, 60, 71if, 349, 397f, 403f, 417, 421ff, 458, 467f,
104, 176, 187, 304, 346, 376, 395,426, 505, 531,576ff, 603, 605f, 636, 648,
447,465,511,532, 539, 601,659ff, 649ff, 671ff
677,681,700 diagnosis, 13-16, 234, 312,441,461,545
blocks, 659f strategies for, 426, 448f, 537, 552ff, 673,
content-directed invocation, 527, 539 679, 702
data-directed (see control, forward dialogue (see also human engineering), 335,
chaining) 467ff, 615, 670, 687
demons, 29, 619 evaluation, 575
of dialogue, 71
exhaustive search, 56, 521 management of, 9, 60, 71, 105, 110, 119,
127, 260, 374, 395,439f, 447,456,
forward chaining, 4f, 13, 27, 57, 60, 195,
387,419, 426, 449, 456, 461,511,539, 459, 465, 470ff, 480ff, 483ff, 601,
606ff, 613f, 618, 651,656
561,601,606, 626, 658, 659, 661ff,
677, 681 mixed initiative, 455,458
goal-directed (see control, backward dictionary (see also humanengineering), 68,
chaining) 73, 99, 193, 306, 349, 620
hypothesis-directed (see control, backward disbelief (see also inexact inference), 247ff,
chaining) 273
message passing, 561 disconfirmation (see confirmation)
model-directed, 195 discourse (see dialogue)
MONITORfunction (see MONITOR) discrimination nets, 625
prototypes for (see prototypes) disease hierarchies (see inference structure)
of search, 57, 220, 04, 521,674 documentation, 529
select-execute loop, 24 domain independence (see generality)
control knowledge(see also rules, meta-rules), drugs:
134, 394ff, 677 allergies to (see drugs, contraindications)
explicitness, 394 antibiotics, 13if, 122ff, 234, 363ff, 372,
correctness (see evaluation) 395, 593, 600
cost-benefit analysis, 62, 215,217, 235, 246, contraindications, 15ff, 135
522, 565, 576, 578, 680 dosing, 17, 125f, 137, 163-170, 334, 363-
COVERFOR,222, 223,474ff, 486, 554 370
credit assignment (see also knowledge base, optimal therapy (see also therapy), 137
refinement), 177,688 overprescribing, 16ff
critiquing model, 467,692 prophylactic use, 17

sensitivities, 15, 133, 135 user models in (see user models)
toxicity (see drugs, contraindications) WHY?/HOW?, 75f, 111, 173, 310, 373,
533f, 601,689f
editor (see also rule editor; rule language), explicitness (see also knowledge; transparency;
180, 307, 391,670 understandability; modularity), 545, 564f
education (see also tutoring), 337,450, 575 extensibility (see flexibility)
efficiency, 48, 576, 578
electronics, 396 facets, 617,619
ELSEclauses, 61, 79ff, 115 facts (see representation, of facts)
English understanding (see dialogue; human fear of computers, 648
engineering; natural language) feedback, 9f, 204, 459, 513, 551, 577, 686,
entrapment, 483ff 702
error checking (see rule checking) FINDOUT (see also rule interpreter), 105-
EVAL, 71 110, 116f, 121, 125f, 130, 132
evaluation, 67, 137, 155ff, 337,439f, 450, flexibility (see al~o knowledgebase,
571-588, 589ff, 602, 651,674,694f, 701 refinement), 3, 6, 50, 149, 296, 311,342,
of acceptability, 575, 578, 602, 636 450, 465, 470, 488, 493, 559f, 565, 669f,
of attitudes, 610f, 635-652 687
gold standard, 572, 579 inflexibility, 503, 520
methodology, 573, 579, 581,588, 590 focus of attention (see also control), 179, 186,
of MYCIN, 571-577, 583-588, 589-596 441,447, 471,479
of ONCOCIN, 606, 610 FOREACH, 223
of performance, 218, 574, 644 formal languages, 6
sensitivity analysis, 217-219, 582 formation problems (see synthesis problems)
events, representation of, 500 forward chaining (see control)
evidence, 498, 550 frames, 60, 63, 394ff, 425,431ff, 437, 441-
evidence gathering (see also control; 452,505,613ff, 617, 633,672,676
confirmation), 5, 176, 460, 469, 674f, formation problems (see synthesis problems)
696, 700 funding, 599, 698
evidential support (see inexact inference) fuzzy logic, 210, 214, 245-247
hard and soft evidence, 152
exhaustive search, 505, 534 game-playing, 150
EXPECT (attribute of parameters), 88ff, 350 generality (see also EMYCIN),451,465,656,
expectations, 177, 182f, 188, 195, 203,401f, 674, 677,695f, 701
411,417-419, 450, 511,637ff generate and test, 135ff, 145, 674, 697
expertise, 580ff, 636 geography (see SCHOLAR)
nature of, 233, 373, 456, 459f, 467f geology (see PROSPECTOR)
transfer of (see knowledgeacquisition) glaucoma (see CASNET)
use in explanation, 378ff global criteria, 135
experts, 158, 170, 234, 236, 242, 262, 264, goal-directed reasoning (see control)
580, 686 goal rule, 104, 554f
agreement among, 584ff, 592 goal tree (see rule invocation, record of)
disagreement among, 582, 584-588, 682 gold standard (see evaluation; certainty
evaluations of, 582, 584-588 factors)
interactions with (see knowledge grain size (see modularity)
engineering) grammar, 22, 80f, 620-624
expert systems, 3ff, 7, 25, 247, 272, 282,385, graphics and graphical presentations, 336,
455f, 460, 530, 568, 574, 577ff, 634 368, 399f, 419, 608ff
building (see also knowledgeacquisition), GRID/GRIDVAL, 102f
150, 387,577, 670, 686ff
validating (see evaluation) handcrafting (see knowledge acquisition)
explanation (see also question-answering; hardware, 575,578, 612, 659, 665
reasoning status checker; natural help facilities, 64, lllf, 310, 474, 480f, 599,
language), 27, 31, 42, 65, 133, 161, 171, 704f
233, 331-337, 338-362,363-370, 371- HERSTORY list (see rule invocation, record
388, 394,451,457,465,475, 493,531- of)
568, 575, 599f, 644, 651,664, 670, 674, heuristics, 3, 48, 50, 133, 144, 150, 2ll, 482,
677,688ff, 693, 695, 705, 707f 524, 550f, 676, 681
of drug dosing, 363-370 heuristic search (see control)
of meta-rules, 526, 528 hierarchical organization of knowledge (see
of rules, 38, 72, 132, 133, 238, 305f knowledge)
subprogram in MYCIN,4, 7, 10, 57, 67, Hodgkins disease (see also ONCOCIN),656
73ff, 79, 111, 112,339, 371f, 458, 532, HOW?(see explanation)
537 human engineering, 19, 42, 146, 156, 308,
of therapy (see therapy, explanation of) 309f, 331-337, 338, 349, 411,439, 599-

612, 674, 678, 688ff, 691ff causal, 335, 374ff, 377ff, 385ff, 396, 460,
acceptance, 342f, 337, 371ff, 578,595,599, 503,552ff, 672, 676, 702
637,688, 695 commonsense, 73, 150, 540, 559, 651
dictionary of terms and synonyms (see compiled, 503f, 541,551,566, 679, 690
dictionary) default, 61, 164f, 376, 432, 509, 559, 620,
English understanding (see also natural 659
language), 67, 73, 76, 693L 701 domain-specific (see also vocabulary), 149
I/O handling (see also dialogue), 68, 110f, hierarchic organization (see also contexts,
297,600 context tree), 274f, 292, 403f, 515ff,
models of interaction, 671, 691f, 701 678
preview (see preview mechanism) inexact (see also certainty factors), 67, 209ff,
unity path (see unity path) 416f, 673, 683ff
hypothesis-directed reasoning (see control) interactions, 582
hypntbesis tormation, 8 intermediate concepts, 551,560
hysteresis, 406, 422 judgmental, 3,236ff, 316, 525,540, 663,
682
I/O (see dialogue) about knowledge (see meta-level knowledge)
ICAI (see tutoring) meta-level (see also rules, meta-rules), 172-
IDENT, 93, 107, 116, 123, 222, 223 205, 328, 336, 342, 396, 458, 461,464,
ill-structured problems, 9, 209, 683, 686 474, 476ff, 488, 493-506, 507-530
importance (see also CFs), 335, 375, 377ff, multiple uses of, 468f, 477, 507, 529, 673
387, 4.42, 438, 442,449 pedagogical (see also tutoring), 464, 691
incompleteness (see completeness) procedural, 57, 64, 341,446, 528, 554,
inconsistency (see consistency) 557, 619, 677
independence, 258f, 264, 267, 270, 386, 685 separation from inference procedure, 6,
indexing, 13,416, 441,524, 538f, 557,562, 174, 175, 295-301,464, 527,678, 696
565,670, 677,679, 697 separation of types, 134, 437,457, 460f,
indirect referencing (see also control, content- 493, 506, 508,531,670, 676, 679, 691
directed invocation), 564 strategy (see also rules, meta-rules), 19, 56,
induction, 174, 201,687f 73,315, 336, 407,467, 470, 503,
inexact inference (see also certainty factors), 504ff, 508, 521ff, 531,537ff, 551-
50, 56, 63, 162, 209, 233ff, 255f, 392, 559, 564f, 678, 691,702
416, 433,442ff, 482,664, 679-685 structural, 316, 496, 504ff, 516, 538ff,
vs. categorical reasoning, 56, 295, 317 562ff, 676, 691
combining function, 93, 116, 211, 216, support, 126, 372, 385,464, 469, 474,
one-number calculus, 214 475f, 504ff, 539, 556, 565
precision, 210, 680, 682, 700 of syntax (see templates)
inexact knowledge (see knowledge) taxonomic, 396, 425,670, 676
infectious diseases, 13ff, 55, 104, 214, 217, temporal, 406f, 416, 420, 658
234, 260, 370, 591 textbook, 456
inference (see also control): knowledge acquisition (see also ROGET;
deductive (see logic) TEIRESIAS; knowledge engineering),
engine (see also rule interpreter), 175f, 33, 50f, 55f, 59, 76f, 149-158, 159ff,
295ff 159f, 168, 171-205,225ff, 297ff, 306ff,
structure (see also contexts, context tree), 314, 318, 325ff, 372, 387,411,461,462,
55, 314, 316f, 321f, 326f, 374ff, 392, 493, 507,510ff, 517ff, 560, 670, 673,
407f, 448f, 485f, 534ff, 542ff, 554f, 493, 507,510ff, 517ff, 560, 670, 673,
567 advice taking, 670
inheritance, 515, 563,676f conceptualization, 155, 161, 170, 314, 326f,
INITIALDATA (MAINPROPS), 56, 60, 119, 503, 686
120, 705 debugging (see knowledge base, refinement),
intensive care unit (ICU), 393,397-423 160ff
interaction (see models of interaction) hand crafting, 151, 171,513,687
interdisciplinary research, 8ff learning, 33, 52, 152f, 186f, 203, 205,513,
interface (see human engineering) 644, 651
Interlisp (see LISP) models of, 150ff, 687f
Interviewer (in ONCOCIN),605, 653, 656 subprogram in MYCIN,4, 7, 10, 67, 76f
iteration, 313 knowledge base, 342, 343, 465, 697, 700
completeness, 156, 159ff, 159-170
jaundice, 273ff conflicts in, 162, 559, 582
construction (see knowledge acquisition)
key factors, in rules, 477, 543, 550, 702 czar, 221-228, 687
keyword matching (see parsing) display of (see also explanation), 160, 169
knowledge: maintenance, 309, 519, 521,582,644,
algorithmic, 57, 66, 124 686ff

refinement, 9, 72, 137, 150, 152ff, 159, chunks of knowledge, 27, 39, 42, 52, 55,
161, 172ff, 187f, 297ff, 310f, 327f, 71, 72, 85, 154, 224, 238, 242, 438
331,337, 391,439, 528, 582, 644, 686 global, 30, 32
structure of, 493-506 grain size, 503f, 672
validation (see also evaluation), 129, 152, modus ponens (see logic)
594 MONITOR (see also rule interpreter), 105-
knowledge-based system (see expert system) 110, 116f, 121f, 125f, 130, 132
knowledge engineering, 5-7, 55f, 145f, 149- monitoring, 9, 393, 397-423,675
158, 159f, 170, 202, 567,672, 686, 700 MYCINgang, 222-232, 699, 703
tools for, 152-158, 170, 171,295-301,
302-313, 324, 655, 686ff, 699 natural language (see also human
knowledge sources, 557, 560ff engineering), 57, 67, 73, 76, 144, 176,
KNOWN (see predicates) 179f, 182, 188-196, 202, 210, 306, 331,
333, 335,340, 342, 348ff, 422, 458,601,
LABDATA (see ASKFIRST) engineering), 57, 67, 73, 76, 144, 176,
language: nonmonotonic reasoning (see logic)
formal, 22
understanding (see natural language) object-centered programming, 56
learning (see knowledge acquisition) oncology (see ONCOCIN)
least commitment, 565 opportunistic control (see blackboard model)
lesson plan, 471, 479 optimization (see also constraints, satisfaction),
LHS (see also rules), 133
linguistic variables (see fuzzy logic) ordering (see also control):
logic, 65, 392, 212, 343, 345,672, 681 of clauses/questions (see also dialogue,
completeness, 156 managementof), 61, 63, 72, 130f, 395,
conflict, 162 535, 554, 678f
consistency, 41, 42, 43, 238 of rules (see also rules, meta-rules), 130,
contradiction, 41,230, 238 535, 679
modus ponens, 21, 65 organisms (see infectious diseases)
nonmonotonic (see also backtracking), 558, overlay model (see student models)
681
predicate calculus, 28, 233 parallel processing, 82
quantification, 62, 65 parameters, 70, 86-90, 118, 163f, 297, 298ff,
redundancy, 162 321,353, 374, 376, 407ff, 496, 659
subsumption, 41, 156, 162,230, 259 multi-valued, 87, 108, 283, 534, 619
LOOKAHEAD,89f, 115, 355 properties of, 88-90, 408
LTM(see also memory), 33 single-valued, 87, 282, 619
symbolic values for, 403,418f
MAINPROPS (see INITIALDATA) types, 87, 408
maintenance (see knowledge-base typical values for, 445
yes-no, 87, 93f, 534
maintenance)
management (see project management) parsing, 73, 76, 188, 193ff, 333, 349-354,
man-machine interface (see dialogue) 412, 480, 511,616, 620ff, 693, 701
mass spectrometry (see DENDRAL) part-whole relations (see also contexts, context
matching (see also predicates), 186 tree), 498, 545, 677
mathematical models, 316, 334, 335, 396 patient data, 65, 79, 112-115, 127-129,
445f, 583
mathematics, 151
pattern matching, 73
MB/MD (see also certainty factors), 211,215,
patterns, in rules (see rule models)
247ff, 265ff, 288, 679 pedagogical knowledge (see knowledge)
medicine, use of computers in, 304, 640, 652 performance (see evaluation)
memory, 22, 26, 31, 33, 44, 613 pharmacokinetics, 334, 363ff
meningitis, 217 philosophy of science, 210, 239ff
message passing (see control) planning, 136, 313, 336, 534, 563
meta-rules (see rules) poker, 8, 46
mineral exploration (see PROSPECTOR) precision, 210, 680, 682, 700
missing rules (see also knowledge base, 72, 80, 87, 93-99, 182, 192, 324,412-
completeness), 162f, 511 415, 420, 510
models (see rule models) 415,421), 510
models of interaction (see also consultation; presentation methods (see dialogue)
critiquing model; monitoring), 301f, 692 preview mechanism, 61, 63, 72, 131,395,
modifiability (see design considerations; 493,678, 679
flexibility) probabilities (see also inexact reasoning; Bayes
modularity, 10, 47f, 56, 305, 361,458, 529, ~Iheorem), 70, 79, 91,234ft. 239-242,
670, 676, 684, 702 259, 263-271,385-387, 680

problem difficulty, 675 uniform, 52, 396, 441,526, 532, 568, 675
problem solving (see control; evidence REPROMPT, 210
gathering) resource allocation, 505
production systems, 6ff, 12t, 20ff, 672, 675, response time (see human engineering)
700 restart (see also backtracking), 129
appropriate domains, 28 RHS (see also rules),
pure, 20, 30 risks (see utilities)
taxonomy, 21, 45 robustness, 67, 685, 692
programming: rule-based system, 672
environment, 306-311 rule checking (see also knowledge base,
knowledge programming, 153, 670, 688 completeness), 180, 183, 197f, 307f, 324,
style, 529f 513
program understanding, 528 rule compilation, 311
project management, 674 rule editor, 180, 195f, 493, 512
PROMPT, 88, 110, 118, 210, 617, 619 rule editor, 180, 195f, 493, 512
prompts, 64, 88 24, 31, 61, 71ff, 212, 304f, 310, 341,
propagation of uncertainty (see certainty 524, 534
factors; knowledge, inexact) rule invocation, record of, 65, 74, 115, 133,
protocols, 604ff, 654 138ff, 160, 187, 333, 345, 354, 358, 458,
prototypes (see also frames; rule models), 56, 469
189f, 424-440, 505 rule language, 153, 297
prototypical values (see knowledge, default) rule model, 76, 156, 165, 168, 189-200, 202,
psychology, 25, 47, 52, 210, 338, 388, 439, 355, 477, 508, 509ff, 520, 539
448, 451, 461, 566, 613, 651 rule network (see inference structure)
psychopharmacology (see BLUEBOX; rule pointers, 374
HEADMED) rules, 4, 6, 12f, 55-66, 79-103, 134, 209,
pulmonary physiology (see PUFF; VM) 297, 305, 375-377, 410-413,431-434,
675-677
QA (see question-answering) advantages, 72, 238, 669f
quantification (see logic) annotations in, 62, 367
question-answering (see also explanation), 73, antecedent, 60, 678
138ff, 198ff, 306, 333, 340, 342, 348- Babylonian, 12f
362, 457, 601 causal, 383, 540f
examples, 74, 143, 348, 349, 350f, 355ff, circular (see circular reasoning)
361, 711-713 consequent, 49, 103
default, 164
randomized controlled trials, 579 definitional, 164, 295, 383, 541, 676, 678
Reasoner (in ONCOCIN), 606, 653, 657 domain fact, 541
reasoning network, 103ff, 108 examples of, 71, 100, 164, 238, 296, 317,
reasoning status checker (RSC) (see also 322, 344, 432, 447, 543ff, 660
explanation), 73, 75, 340ff, 346ff grain size (see modularity)
recursion, 524 identification, 540
redundancy, 157, 162, 684f independence of (see modularity)
refinement (see control; knowledge indexing, 164
acquisition) initial, 164
reliability (see robustness) justifications for, 367, 475, 506, 531ff,
relevancy tags, 377 540ff, 675, 690
renal failure (see also drugs, dosing), 332, mapping, 62
365ff meta-rules, 19, 48, 56, 63, 65, 73, 130, 212,
representation (see also frames; logic; 383, 395, 521-527, 535, 556ff, 676,
prototypes; rules; schemata; semantic 678f
networks), 8, 19, 161, 173, 323ff, 391ff, ordering of clauses in (see ordering)
406f, 424-440, 441-452, 514ff, 527ff, predictive, 462
531-568, 651, 673, 675ff, 697 premises of, 496
associative triples, 23, 68, 76, 86, 87, 190, production rules, 21ff, 55ff, 59ff, 70ff, 70f,
209, 282, 304, 509, 516 136, 161, 391f, 700
explicitness of (see explicitness) refinement rules, 434
expressive power of, 134, 670, 676f, 686 restriction clauses, 550
of facts (see also representation, associative schemata (see schemata)
triples), 431, 434 screening, 661
lists, 99 screening clauses in, 61, 394f, 544f, 549,
procedures, 20, 28, 57, 64, 392, 446, 557, 566, 679
566 self-referencing, 42, 61, 115, 130, 383, 385,
tabular knowledge, 99f 394, 558f, 680, 682
uncertainty (see knowledge, inexact) statistics, 157f, 218, 688

strategy, 47, 56, 387, 396, 556ff theory of choice, 246
syntax of (see also predicates), 4, 35, 46f, therapy, 9, 13-18, 57, 133-146, 234, 336,
70, 76, 79, 157, 212, 392, 401, 410- 399-407, 411, 593, 671, 713-715
412 algorithm, 57, 63, 66, 122ff, 132, 133ff,
summary rules, 434ff 261-262, 685
tabular, 62, 217, 546ff comparison, 141-144
therapy, 136, 140 explanation of, 133, 138-141, 144f, 333,
translations of, 71, 90, 102f, 238 715
triggering, 434, 441, 444 protocols, 163-170, 654f
tutoring (see tutoring) threshold, in CF model (see also certainty
uncertainty in, 674 factors), 211, 216, 220, 222-232, 681
world fact, 540 time (see knowledge, temporal)
rule types, 383 top-down refinement (see also control), 555,
562, 565
SAME (see predicates) topic shifts, 615ff
scene analysis (see vision) toxicity (see contraindications)
schemata, 476, 508, 514-520, 613ff, 616ff, trace, of reasoning (see rule invocation,
624, 627, 633 record of)
screening clauses (see rules) tracing, of parameters, 64, 108, 304, 345
scripts, 548ff, 615, 617 TRANS, 90, 102f, 119, 210, 617, 619
search (see control) transfer of expertise (see knowledge
second-guessing (see expectations) acquisition)
semantic nets, 9, 55, 374, 392, 425, 545 transition network, 138ff, 145, 348, 404ff,
sensitivity analysis (see evaluation, sensitivity 421
analysis) transparency (see understandability)
signal understanding, 343 trigger (see control, forward chaining), 387
simplicity, 323f, 392, 670, 676f triples (see representation)
simulation, of human problem solving, 313, Turing machines, 21, 52
315, 327, 439, 461 Turing's test (see evaluation)
smart instruments, 345 tutoring (see also GUIDON), 19, 58, 126, 145,
Socratic dialogue, 455, 484 238, 328, 335, 371, 372, 396, 455-463,
speech understanding (see also HEARSAY), 464-489, 494, 531-568, 670, 674, 676,
201, 692f 688ff, 701
spelling correction (see human engineering, case method, 457, 467ff
I/O handling) rules, 372, 463, 472ff, 690
spirometer (see PUFF)
state transition network (see also uncertainty (see certainty factors; knowledge,
representation), 134, 138, 404-407, 421 inexact)
statistics (see also rules, statistics), 209, 210, understandability (see also explanation), 3, 9,
234, 239, 509, 591, 603, 639, 671 41, 56, 150, 174, 176, 331f, 334, 337,
STM (see also memory), 22ff 403, 437-440, 450f, 493, 503, 506
strategies (see knowledge, strategy) uniformity, of representation (see
structural analysis (see SACON) representation)
structured programming, 35 unity path, 63, 73, 130, 377, 396, 493
student models, 466, 471, 473, 478, 483ff UPDATED-BY, 90, 105, 229, 231, 355, 679
subsumption, 156, 162, 308, 324, 685 user interaction (see human engineering)
summaries of conclusions, 399, 419, 430 user models (see also student models), 335,
symbolic reasoning (see artificial intelligence) 373ff, 387, 466
synonyms (see dictionary) utilities (see cost-benefit analysis)
syntax (see also rules, syntax of), 35, 508, 521,
529, 620ff validation (see evaluation)
verification/checking, 159, 161, 184
tabular data, 62, 482 vision, 189, 201, 613
tabular knowledge (see representation) vocabulary, of a domain, 73, 150, 210, 442ff,
TALLY (see also certainty factors), 98, 114, 467, 503, 564, 684, 686, 702
211 volunteered information (see also control,
taxonomy (see knowledge, taxonomy) forward chaining), 602, 613ff, 678, 691,
teaching (see tutoring) 693
technology transfer, 395, 698f examples, 628ff
templates, for functions or predicates, 37, 72,
157, 164f, 188, 194, 305, 344, 477, 508, weight of evidence (see inexact inference)
520f what-how spectrum, 315
terse mode, 64 WHY? (see explanation)
test cases (see case library) workstations (see hardware)
testing (see evaluation) world knowledge (see knowledge, common
theorem proving (see logic) sense)