
PART

Perception, Communication, and Expert Systems

Natural Language Processing

Perception and communication are essential components of intelligent behavior.


They provide the ability to effectively interact with our environment. Humans perceive
and communicate through their five basic senses of sight, hearing, touch, smell,
and taste, and their ability to generate meaningful utterances. Two of the senses,
sight and hearing, are especially complex and require conscious inferencing. Developing
programs that understand natural language and that comprehend visual scenes are
two of the most difficult tasks facing AI researchers.
Developing programs that understand a natural language is a difficult problem.
Natural languages are large. They contain an infinity of different sentences. No
matter how many sentences a person has heard or seen, new ones can always be
produced. Also, there is much ambiguity in a natural language. Many words have
several meanings, such as can, bear, fly, and orange, and sentences can have different
meanings in different contexts. This makes the creation of programs that "understand"
a natural language one of the most challenging tasks in AI. It requires that a
program transform sentences occurring as part of a dialog into data structures which
convey the intended meaning of the sentences to a reasoning program. In general,
this means that the reasoning program must know a lot about the structure of the
language, the possible semantics, the beliefs and goals of the user, and a great
deal of general world knowledge.


228 Natural Language Processing Chap. 12

12.1 INTRODUCTION

Developing programs to understand natural language is important in AI because a
natural form of communication with systems is essential for user acceptance. Furthermore,
one of the most critical tests for intelligent behavior is the ability to communicate
effectively. Indeed, this was the purpose of the test proposed by Alan Turing (see
Chapter 2). AI programs must be able to communicate with their human counterparts
in a natural way, and natural language is one of the most important media for
that purpose.
Before proceeding further, a definition of understanding as used here should
be given. We say a program understands a natural language if it behaves by taking
a (predictably) correct or acceptable action in response to the input. For example,
we say a child demonstrates understanding if it responds with the correct answer
to a question. The action taken need not be an external response. It may simply be
the creation of some internal data structures as would occur in learning some new
facts. But in any case, the structures created should be meaningful and correctly
interact with the world model representation held by the program. In this chapter
we explore many of the important issues related to natural language understanding
and language generation.

12.2 OVERVIEW OF LINGUISTICS

An understanding of linguistics is not a prerequisite to the study of natural language


understanding, but a familiarity with the basics of grammar is certainly important.
We must understand how words and sentences are combined to produce meaningful
word strings before we can expect to design successful language understanding
systems. In a natural language, the sentence is the basic language element. A sentence
is made up of words which express a complete thought. To express a complete
thought, a sentence must have a subject and a predicate. The subject is what the
sentence is about, and the predicate says something about the subject.
Sentences are classified by structure and usage. A simple sentence has one
independent clause comprised of a subject and predicate. A compound sentence
consists of two or more independent clauses connected by a conjunction or a semicolon.
A complex sentence consists of an independent clause and one or more dependent
clauses. Sentences are used to assert, query, and describe. The way a sentence is
used determines its mood, declarative, imperative, interrogative, or exclamatory.
A word functions in a sentence as a part of speech. Parts of speech for the
English language are nouns, pronouns, verbs, adjectives, adverbs, prepositions,
conjunctions, and interjections.
A noun is a name for something (person, place, or thing). Pronouns replace
nouns when the noun is already known. Verbs express action, being, or state of
being. Adjectives are used to modify nouns and pronouns, and adverbs modify
verbs, adjectives, or other adverbs. Prepositions establish the relationship between

a noun and some other part of the sentence. Conjunctions join words or groups of
words together, and interjections are used to express strong feelings apart from the
rest of the sentence.
Phrases are made up of words but act as a single unit within a sentence.
These form the building blocks for the syntactic structures we consider later.

Levels of Knowledge Used in Language Understanding

A language understanding program must have considerable knowledge about the


structure of the language including what the words are and how they combine into
phrases and sentences. It must also know the meanings of the words and how they
contribute to the meanings of a sentence and to the context within which they are
being used. Finally, a program must have some general world knowledge as well
as knowledge of what humans know and how they reason. To carry on a conversation
with someone requires that a person (or program) know about the world in general,
know what other people know, and know the facts pertaining to a particular conversa-
tional setting. This all presumes a familiarity with the language structure and a
minimal vocabulary.
The component forms of knowledge needed for an understanding of natural
language are sometimes classified according to the following levels.

Phonological. This is knowledge which relates sounds to the words we
recognize. A phoneme is the smallest unit of sound. Phones are aggregated into
word sounds.

Morphological. This is lexical knowledge which relates to word constructions
from basic units called morphemes. A morpheme is the smallest unit of meaning:
for example, the construction of friendly from the root friend and the suffix ly.

Syntactic. This knowledge relates to how words are put together or structured
to form grammatically correct sentences in the language.

Semantic. This knowledge is concerned with the meanings of words and
phrases and how they combine to form sentence meanings.

Pragmatic. This is high-level knowledge which relates to the use of sentences
in different contexts and how the context affects the meaning of the sentences.

World. World knowledge relates to the language a user must have in order
to understand and carry on a conversation. It must include an understanding of the
other person's beliefs and goals.
The approaches taken in developing language understanding programs generally
follow the above levels or stages. When a string of words has been detected, the

sentences are parsed or analyzed to determine their structure (syntax) and grammatical
correctness. The meanings (semantics) of the sentences are then determined and
appropriate representation structures created for the inferencing programs. The whole
process is a series of transformations from the basic speech sounds to a complete
set of internal representation structures.
Understanding written language or text is easier than understanding speech.
To understand speech, a program must have all the capabilities of a text understanding
program plus the facilities needed to map spoken sounds (often corrupted by noise)
into textual form. In this chapter, we focus on the easier problem, that of natural
language understanding from textual input and information processing. The process
of translating speech into written text is considered in Chapter 13 under Pattern
Recognition, and the process of generating text is considered later in this chapter.

General Approaches to Natural Language Understanding

Essentially, there have been three different approaches taken in the development of
natural language understanding programs: (1) the use of keyword and pattern matching,
(2) combined syntactic (structural) and semantic directed analysis, and (3) comparing
and matching the input to real world situations (scenario representations).
The keyword and pattern matching approach is the simplest. This approach
was first used in programs such as ELIZA described in Chapter 10. It is based on
the use of sentence templates which contain key words or phrases such as
"___ my mother ___," "I am ___," and "I don't like ___," that
are matched against input sentences. Each input template has associated with it
one or more output templates, one of which is used to produce a response to the
given input. Appropriate word substitutions are also made from the input to the
output to produce the correct person and tense in the response (I and me are changed
into you to give replies like "Why are you ___"). The advantage of this approach is
that ungrammatical, but meaningful sentences are still accepted. The disadvantage
is that no actual knowledge structures are created; so the program does not really
understand.
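The keyword and pattern matching approach can be sketched in a few lines. The rules and person-swap table below are small hypothetical stand-ins for ELIZA's much larger script, but they show the template idea: match an input template, swap first and second person in the captured fragment, and fill an output template.

```python
import re

# Hypothetical ELIZA-style rules: each pairs an input template (a regex)
# with a response template; {0} re-inserts the matched fragment.
RULES = [
    (re.compile(r"i am (.*)", re.I), "Why are you {0}?"),
    (re.compile(r"my mother (.*)", re.I), "Tell me more about your mother."),
    (re.compile(r"i don't like (.*)", re.I), "Why don't you like {0}?"),
]

# First-/second-person substitutions applied to the captured fragment.
SWAPS = {"i": "you", "me": "you", "my": "your", "am": "are"}

def swap_person(fragment):
    return " ".join(SWAPS.get(w, w) for w in fragment.lower().split())

def respond(sentence):
    for pattern, template in RULES:
        m = pattern.search(sentence)
        if m:
            return template.format(swap_person(m.group(1)))
    return "Please go on."   # default when no template matches

print(respond("I am sad about my exam"))   # -> Why are you sad about your exam?
```

Note that the program accepts ungrammatical input and builds no knowledge structure at all, exactly the trade-off described above.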
The third approach is based on the use of structures such as the frames or
scripts described in Chapter 7. This approach relies more on a mapping of the
input to prescribed primitives which are used to build larger knowledge structures.
It depends on the use of constraints imposed by context and world knowledge to
develop an understanding of the language inputs. Prestored descriptions and details
for commonly occurring situations or events are recalled for use in understanding a
new situation. The stored events are then used to fill in missing details about the
current scenario. We will be returning to this approach later in this chapter. Its
advantage is that much of the computation required for syntactical analysis is bypassed.
The disadvantage is that a substantial amount of specific, as well as general world
knowledge must be prestored.
The second approach is one of the most popular approaches currently being

used and is the main topic of the first part of this chapter. With this approach,
knowledge structures are constructed during a syntactical and semantical analysis
of the input sentences. Parsers are used to analyze individual sentences and to
build structures that can be used directly or transformed into the required knowledge
formats. The advantage of this approach is in the power and versatility it provides.
The disadvantage is the large amount of computation required and the need for
still further processing to understand the contextual meanings of more than one
sentence.

12.3 GRAMMARS AND LANGUAGES

A language L can be considered as a set of strings of finite or infinite length,
where a string is constructed by concatenating basic atomic elements called symbols.
The finite set v of symbols of the language is called the alphabet or vocabulary.
Among all possible strings that can be generated from v are those that are well-formed,
the sentences (such as the sentences found in a language like English).
Well-formed sentences are constructed using a set of rules called a grammar. A
grammar G is a formal specification of the sentence structures that are allowable in
the language, and the language generated by the grammar G is denoted by L(G).
More formally, we define a grammar G as

G = (vn, vt, s, p)

where vn is a set of nonterminal symbols, vt a set of terminal symbols, s is a
starting symbol, and p is a finite set of productions or rewrite rules. The alphabet
v is the union of the disjoint sets vn and vt, which includes the empty string e. The
terminals vt are symbols which cannot be decomposed further (such as adjectives,
nouns, or verbs in English), whereas the nonterminals can be decomposed (such as
a noun or verb phrase).
A general production rule from p has the form

xyz → xwz

where x, y, z, and w are strings from v. This rule states that y should be rewritten
as w in the context of x to z, where x and z can be any string, including the empty
string e.
As an example of a simple grammar G, we choose one which has component
parts or constituents from English, with vocabulary v given by

vn = {S, NP, N, VP, V, ART}
vt = {boy, popsicle, frog, ate, kissed, flew, the, a}

and rewrite rules given by




P: S → NP VP
NP → ART N
VP → V NP
N → boy | popsicle | frog
V → ate | kissed | flew
ART → the | a

where the vertical bar indicates alternative choices.
S is the initial symbol (for sentence here), NP stands for noun phrase, VP
stands for verb phrase, N stands for noun, V is an abbreviation for verb, and ART
stands for article.
The grammar G defined above generates only a small fraction of English,
but it illustrates the general concepts of generative grammars. With this G, sentences
such as the following can be generated.

The boy ate a popsicle.


The frog kissed a boy.
A boy ate the frog.

To generate a sentence, the rules from P are applied sequentially starting


with S and proceeding until all nonterminal symbols are eliminated. As an example,
the first sentence given above can be generated using the following sequence of
rewrite rules:

S → NP VP
→ ART N VP
→ the N VP
→ the boy VP
→ the boy V NP
→ the boy ate NP
→ the boy ate ART N
→ the boy ate a N
→ the boy ate a popsicle

It should be clear that a grammar does not guarantee the generation of meaningful
sentences, only that they are structurally correct. For example, a grammatically correct
but meaningless sentence like "The popsicle flew a frog" can be generated with
this grammar.
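The grammar G above is small enough to run directly. The following sketch (the dict layout and function names are our own, not from the text) expands every rule alternative and enumerates all the sentences the grammar generates, meaningful or not:

```python
from itertools import product

# The grammar G from the text, written as a dict from each nonterminal
# to its alternative right-hand sides (lists of symbols).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["ART", "N"]],
    "VP":  [["V", "NP"]],
    "N":   [["boy"], ["popsicle"], ["frog"]],
    "V":   [["ate"], ["kissed"], ["flew"]],
    "ART": [["the"], ["a"]],
}

def expand(symbol):
    """Yield every terminal string derivable from `symbol`."""
    if symbol not in GRAMMAR:            # terminal: yield it as-is
        yield symbol
        return
    for rhs in GRAMMAR[symbol]:          # try each alternative rewrite
        # expand every symbol on the right-hand side and combine the results
        for parts in product(*(expand(s) for s in rhs)):
            yield " ".join(parts)

sentences = list(expand("S"))
print(len(sentences))        # 108 structurally correct sentences
print(sentences[0])          # -> the boy ate the boy
```

Both "the boy ate a popsicle" and the meaningless "the popsicle flew a frog" appear in the output, illustrating that the grammar constrains structure, not meaning.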
We learn a language by learning its structure and not by memorizing all of
the sentences we have ever heard, and we are able to use the language in a variety
of ways because of this familiarity. Therefore, a useful model of language is one
which characterizes the permissible structures through the generating grammars.
Unfortunately, it has not been possible to formally characterize natural languages
with a simple grammar. In other words, it has not been possible to classify natural
languages in a mathematical sense as we did in the example above. More constrained

languages (formal programming languages) have been classified and studied through
the use of similar grammars, including the Chomsky classes of languages (1965).

The Chomsky Hierarchy of Generative Grammars

Noam Chomsky defined a hierarchy of grammars he called types 0, 1, 2, and 3.
Type 0 grammar is the most general. It is obtained by making the simple restriction
that y cannot be the empty string in the rewrite form xyz → xwz. This broad generality
requires that a computer having the power of a Turing machine be used to recognize
sentences of type 0.
The next level down in generality is obtained with type 1 grammars, which
are called context-sensitive grammars. They have the added restriction that the length
of the string on the right-hand side of the rewrite rule must be at least as long as
the string on the left-hand side. Furthermore, in productions of the form xyz →
xwz, y must be a single nonterminal symbol and w a nonempty string. Typical
rewrite rules for type 1 grammars take the forms

S → aS
S → aAB
AB → BA
aA → ab
aA → aa
where the capitalized letters are nonterminals and the lower case letters terminals.
The third type, the type 2 grammar, is known as a context-free grammar. It
is characterized by rules with the general form <symbol> → <symbol1> . . .
<symbolk>, where k ≥ 1 and where the left-hand side is a single nonterminal
symbol, that is, A → w where A is a single nonterminal. Productions for this type take
forms such as

S → aS
S → aSb
S → aB
S → aAB
A → a
B → b
The final and most restrictive type is type 3. It is also called a finite state or
regular grammar, whose rules are characterized by the forms

A → aB
A → a

The languages generated by these grammars are also termed types 0, 1 (context-sensitive),
2 (context-free), and 3 (regular), corresponding to the grammars which
generate them.
The regular and context-free languages have been the most widely studied

and best understood. For example, formal programming languages are typically
based on context-free languages. Consequently, much of the work in human language
understanding has been related to these two types. This is understandable since
type 0 grammars are too general to be of much practical use, and type 1 grammars
are not that well understood yet.
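The correspondence between type 3 grammars and finite-state machines can be illustrated directly: treat each nonterminal as a state and each rule A → aB as a transition on a. The grammar below is our own toy example (it generates the strings ab, aab, aaab, . . .), run by a small recursive recognizer:

```python
# A regular (type 3) grammar with rules S -> aS | aB, B -> b, run as a
# finite-state machine: each nonterminal is a state, each rule a transition.
# This is a sketch of the correspondence, not library code.
PRODUCTIONS = {
    "S": [("a", "S"), ("a", "B")],   # A -> aB form
    "B": [("b", None)],              # A -> a form (None = accept afterward)
}

def accepts(string, state="S"):
    """True if the regular grammar derives `string` from `state`."""
    if string == "":
        return state is None             # accept only when derivation ended
    if state is None:
        return False                     # derivation ended but input remains
    head, tail = string[0], string[1:]
    return any(accepts(tail, nxt)
               for sym, nxt in PRODUCTIONS[state] if sym == head)

print(accepts("aab"))   # True:  S -> aS -> aaB -> aab
print(accepts("abb"))   # False: nothing derives a second 'b'
```

Because each step consumes exactly one symbol and remembers only one state, such a grammar needs no stack, which is why type 3 languages are the easiest to recognize.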

Structural Representations

It is convenient to represent sentences as a tree or graph to help expose the structure


of the constituent parts. For example, the sentence "The boy ate a popsicle" can
be represented as the tree structure depicted in Figure 12.1. Such structures are
also called phrase markers because they help to mark or identify the phrase structures
in a sentence.
The root node of the tree in Figure 12.1 corresponds to the whole sentence
S, and the constituent parts of S are subtrees. For this example, the left subtree is
a noun phrase, and the right subtree a verb phrase. The leaf or terminal nodes
contain the terminal symbols from vt.
A tree structure such as the above represents a large number of English sentences.
It also represents a large class of ill-formed strings that are nonsentences, like "The
popsicle flew a tree." This satisfies the above structure but has no meaning.
For purposes of computation, a tree must be represented as a record, a list or
similar data structure. We saw in earlier chapters that a list can always be used to
represent a tree structure. For example, the tree in Figure 12.1 could be represented
as the list

(S (NP (ART the)
       (N boy))
   (VP (V ate)
       (NP (ART a)
           (N popsicle))))

Figure 12.1 A phrase marker or syntactic tree.
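Such a list is directly usable as a data structure. The sketch below (our own illustration) stores the parse tree of Figure 12.1 as nested Python lists and recovers the sentence by collecting the terminal leaves:

```python
# The phrase marker of Figure 12.1 stored as nested lists, following the
# (S (NP ...) (VP ...)) layout shown in the text.
tree = ["S", ["NP", ["ART", "the"],
              ["N", "boy"]],
        ["VP", ["V", "ate"],
               ["NP", ["ART", "a"],
                      ["N", "popsicle"]]]]

def terminals(node):
    """Collect the leaf words of a phrase-marker tree, left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]             # preterminal such as (N boy)
    words = []
    for child in children:
        words.extend(terminals(child))   # recurse into each subtree
    return words

print(" ".join(terminals(tree)))   # -> the boy ate a popsicle
```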

A more extensive English grammar than the one given above can be obtained
with the addition of other constituents such as prepositional phrases PP, adjectives
ADJ, determiners DET, adverbs ADV, auxiliary verbs AUX, and so on. Additional
rewrite rules permitting the use of these constituents could include some of the
following:

PP → PREP NP
VP → V ADV
VP → V PP
VP → V NP PP
VP → AUX V NP
DET → ART ADJ
DET → ART
These extensions broaden the types of sentences that can be generated by permitting
the added constituents in sentence forms such as

The mean boy locked the dog in the house.
The cute girl worked to make some extra money.

These sentences have the form S → NP VP PP.

Transformational Grammars

The generative grammars described above generally produce different structures
for sentences having different syntactical forms even though they may have the
same semantic content. For example, the active and passive forms of a sentence
will result in two different phrase marker structures. The sentences "Joe kissed
Sue" (active voice) and "Sue was kissed by Joe" (passive voice) result in the
structures depicted in Figure 12.2, where the subject and object roles for Joe and
Sue are switched.
Obtaining different structures from sentences having the same meaning is undesirable
in language understanding systems. Sentences with the same meaning should
always map to the same internal knowledge structures. In an attempt to repair these
shortcomings in generative grammars, Chomsky (1965) extended them by incorporating
two additional components to the basic syntactic component. The added components
provide a mechanism to produce single representations for sentences having
the same meanings through a series of transformations. This extended grammar is
called a transformational generative grammar. Its additions include a semantic component
and a phonological component. They are used to interpret the output of the
syntactic component by producing meanings and sound sequences. The transformations
are essentially tree manipulation rules which depend on the use of an extended
lexicon (dictionary) containing a number of semantic features for each word.
Using a transformational generative grammar, a sentence is analyzed in two
stages. In one stage the basic structure of the sentence is analyzed to determine the

Figure 12.2 Structures for (a) active and (b) passive voice.

grammatical constituent parts. This reveals the surface structure of the sentence,
the way the sentence is used in speech or in writing. This structure can be transformed
into another one where the deeper semantic structure of the sentence is determined.
Application of the transformation rules can produce a change from passive
voice to active voice, change a question to declarative form, and handle negations,
subject-verb agreement, and so on. For example, the structure in 12.2(b) could be
transformed to give the same basic structure as that of 12.2(a), as is illustrated in
Figure 12.3.
Transformational grammars were never widely adopted as computational models
of natural language. Instead, other grammars, including case grammars, have had
more influence on such models.

Case Grammars

A case relates to the semantic role that a noun phrase plays with respect to verbs
and adjectives. Case grammars use the functional relationships between noun phrases
and verbs to reveal the deeper case of a sentence. These grammars use the fact

Figure 12.3 Passive voice transformed to active voice.

that verbal elements provide the main source of structure in a sentence since they
describe the subject and objects.
In inflected languages like Latin, nouns generally have different ending forms
for different cases. In English these distinctions are less pronounced, and the forms
remain more constant for different cases. Even so, they provide some constraints.
English cases are the nominative (subject of the verb), possessive (showing possession
or ownership), and objective (direct and indirect objects). Fillmore (1968, 1977)
revived the notion of using case to extract the meanings of sentences. He extended
the transformational grammars of Chomsky by focusing more on the semantic aspects
of a sentence.
In case grammars, a sentence is defined as being composed of a proposition
P, a tenseless set of relationships among verbs and noun phrases, and a modality
constituent M, composed of mood, tense, aspect, negation, and so on. Thus, a
sentence can be represented as

S → M + P

where P in turn consists of one or more distinct cases C1, C2, . . . , Ck,

P → C1 + C2 + . . . + Ck

The number of cases suggested by Fillmore was relatively small; for example,
the original list contained only some six cases. They relate to the actions performed
by agents, the location and direction of actions, and so on. For example, the case
of an instigator of an action is the agentive (or agent), the case of an instrument or
object used in an action is the instrumental, and the case of the object receiving
the action or change is the objective. Thus, in sentences like "The soldier struck
the suspect with the rifle butt," the soldier is the agentive case, the suspect the
objective case, and the rifle butt the instrumental case. Other basic cases include
dative (an animate entity affected by an action), factitive (the case of the object or
being that results from an event), and locative (the case of the location of
the event). Additional cases or substitutes for those given above have since been
introduced, including beneficiary, source, destination, to or from, goal, and time.
Case frames are provided for verbs to identify allowable cases. They give
the relationships which are required and those which are optional. For the above
sentence, a case frame for the verb struck might be

STRUCK [OBJECTIVE (AGENTIVE) (INSTRUMENTAL)]

This may be interpreted as stating that the verb struck must occur in sentences
with a noun phrase in the objective case and optionally (parentheses indicate optional
use) with noun phrases in the agentive and instrumental cases.
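A case frame of this kind can be stated and checked mechanically. The sketch below encodes the frame for struck from the example; the dictionary layout and checker function are our own illustration, not a published formalism:

```python
# A case frame as in the text: a set of required cases plus a set of
# optional ones (the ones shown in parentheses there).
CASE_FRAMES = {
    "struck": {"required": {"objective"},
               "optional": {"agentive", "instrumental"}},
}

def frame_ok(verb, cases):
    """Check a set of filled cases against the verb's case frame."""
    frame = CASE_FRAMES[verb]
    allowed = frame["required"] | frame["optional"]
    # every required case must be filled, and no extra case may appear
    return frame["required"] <= set(cases) and set(cases) <= allowed

# "The soldier struck the suspect with the rifle butt"
print(frame_ok("struck", {"agentive", "objective", "instrumental"}))  # True
print(frame_ok("struck", {"agentive"}))  # False: objective case missing
```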
A tree representation for a case grammar will identify the words by their
modality and case. For example, a case grammar tree for the sentence "Sue did
not take the car" is illustrated in Figure 12.4.


Figure 12.4 Case grammar tree representation.

To build a tree structure like this requires that a word lexicon with sufficient
information be available with which to determine the case of sentence elements.

Systemic Grammars

Systemic grammars emphasize function and purpose in the analysis of language.


They attempt to account for the personal and social aspects which influence communi-
cation through language. As such, context and pragmatics play a more important
role in systemic grammars.
Systemic grammars were introduced by Michael Halliday (1961) and used by Winograd
(1972) in an attempt to account for the principles that underlie the organization of
different structures. Halliday classifies language by three functions which relate to content,
purpose, and coherence.

1. The ideational function relates to the content intended by the speaker.
This function provides information about the kinds of activities being described,
who the actors are, whether there are other participants, and the circumstances
related to time and place. These concepts bear some similarity to the case grammars
described above.
2. The interpersonal function is concerned with the purpose and mood of the
statements: whether a question is being asked, an answer being given, a request
being made, an opinion being offered, or information being given.
3. The textual function dictates the necessity for continuity and coherence
between the current and previously stated expressions. This function is concerned
with the theme of the conversation, what is known, and what is newly expressed.
Halliday proposed a model of language which consisted of four basic categories.

Language units. A hierarchy for sentences based on the sentence, clause,
phrase or group, word, and morpheme.

Role structure of units. A unit consists of one or more units of lower
rank based on its role, such as subject, predicate, complement, or adjunct.

Classification of units. Units are classified by the role they play at the
next higher level. For example, the verbal serves as the predicate, the nominal
serves as the subject or complement, and so on.

System constraints. These are constraints on combining component features.
For example, the network structure given below depicts the constraints in an
interpretation.

clause → independent | dependent
independent → declarative | imperative | interrogative
interrogative → yes-no | wh-

Given these few principles, it is possible to build a grammar which combines
much semantic information with the syntactic. Many of the ideas from systemic
grammars were used in the successful system SHRDLU developed by Terry Winograd
(1972). This system is described later in the chapter.

Semantic Grammars

Semantic grammars encode semantic information into a syntactic grammar. They
use context-free rewrite rules with nonterminal semantic constituents. The constituents
are categories or metasymbols such as attribute, object, present (as in display),
and ship, rather than NP, VP, N, V, and so on. This approach greatly restricts the
range of sentences which can be generated and requires a large number of rewrite
rules.
Semantic grammars have proven to be successful in limited applications, including
LIFER, a data base query system distributed by the Navy which is accessible
through ARPANET (Hendrix et al., 1978), and a tutorial system named SOPHIE
which is used to teach the debugging of circuit faults. Rewrite rules in these systems
essentially take the forms

S → What is <OUTPUT-PROPERTY> of <CIRCUIT-PART>?
OUTPUT-PROPERTY → the <OUTPUT-PROP>
OUTPUT-PROPERTY → <OUTPUT-PROP>
CIRCUIT-PART → C23
CIRCUIT-PART → D12
OUTPUT-PROP → voltage
OUTPUT-PROP → current

In the LIFER system, there are rules to handle numerous forms of wh-queries
such as

What is the name and location of the carrier nearest to New York
Who commands the Kennedy

Which convoy escorts have inoperative radar units


When will they be repaired
What Soviet ship has hull number 820

These sentences are analyzed and words matched to metasymbols contained
in lexicon entries. For example, the input statement "Print the length of the Enterprise"
would fit with the LIFER top grammar rule of the form

<PRESENT> the <ATTRIBUTE> of <SHIP>

where print matches <PRESENT>, length matches <ATTRIBUTE>, and the Enterprise
matches <SHIP>. Other typical lexicon entries that can match <ATTRIBUTE>
include CLASS, COMMANDER, FUEL, TYPE, BEAM, LENGTH, and so on.
LIFER can also accommodate elliptical (incomplete) inputs. Given the query
"What is the length of the Kennedy?" a subsequent query consisting of the abbreviated
form "of the Enterprise?" will elicit a proper response (see also the third and
fourth example queries above).
Semantic grammars are suitable for use in systems with restricted grammars
since computation is limited. They become unwieldy when used with general purpose
language understanding systems, however.
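A semantic grammar rule like the LIFER top grammar rule can be sketched as a template matcher. The word classes below are small hypothetical stand-ins for LIFER's lexicon entries; the point is that the metasymbols are semantic categories (PRESENT, ATTRIBUTE, SHIP) rather than syntactic ones:

```python
import re

# Hypothetical word classes standing in for LIFER lexicon entries.
PRESENT = {"print", "display", "show"}
ATTRIBUTE = {"length", "class", "commander", "fuel", "beam"}
SHIP = {"the enterprise", "the kennedy"}

def match_query(sentence):
    """Match '<PRESENT> the <ATTRIBUTE> of <SHIP>' and return the bindings."""
    m = re.fullmatch(r"(\w+) the (\w+) of (.+)",
                     sentence.lower().rstrip("?. "))
    if not m:
        return None                      # surface form does not fit the rule
    verb, attr, ship = m.groups()
    if verb in PRESENT and attr in ATTRIBUTE and ship in SHIP:
        return {"PRESENT": verb, "ATTRIBUTE": attr, "SHIP": ship}
    return None                          # some word is not in its category

print(match_query("Print the length of the Enterprise"))
# -> {'PRESENT': 'print', 'ATTRIBUTE': 'length', 'SHIP': 'the enterprise'}
```

Notice how narrow the rule is: a query using any word outside the three small classes fails, which is exactly the restricted-coverage trade-off described above.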

12.4 BASIC PARSING TECHNIQUES

Before the meaning of a sentence can be determined, the meanings of its constituent
parts must be established. This requires a knowledge of the structure of the sentence,
the meanings of individual words, and how the words modify each other. The process
of determining the syntactical structure of a sentence is known as parsing.
Parsing is the process of analyzing a sentence by taking it apart word-by-word
and determining its structure from its constituent parts and subparts. The
structure of a sentence can be represented with a syntactic tree or a list as described
in the previous section. The parsing process is basically the inverse of the sentence
generation process since it involves finding a grammatical sentence structure from
an input string. When given an input string, the lexical parts or terms (root words)
must first be identified by type, and then the role they play in a sentence must be
determined. These parts can then be combined successively into larger units until a
complete tree structure has been completed.
To determine the meaning of a word, a parser must have access to a lexicon.
When the parser selects a word from the input stream, it locates the word in the
lexicon and obtains the word's possible function and other features, including semantic
information. This information is then used in building a tree or other representation
structure. The general parsing process is illustrated in Figure 12.5.

Figure 12.5 Parsing an input to create an output structure.
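The process of Figure 12.5 can be sketched for the toy grammar of Section 12.3 with a small top-down (recursive-descent) parser. The lexicon here is reduced to a word-to-type table; the function names and error handling are our own:

```python
# A minimal top-down parser for the toy grammar S -> NP VP, NP -> ART N,
# VP -> V NP. Each word is looked up in the lexicon, then the pieces are
# combined into a phrase-marker tree (as nested lists).
LEXICON = {"the": "ART", "a": "ART",
           "boy": "N", "popsicle": "N", "frog": "N",
           "ate": "V", "kissed": "V", "flew": "V"}

def parse(words):
    np1, rest = parse_np(words)
    vp, rest = parse_vp(rest)
    if rest:                      # words left over: not a sentence of G
        raise ValueError("trailing words: %r" % rest)
    return ["S", np1, vp]

def parse_np(words):
    art, n = words[0], words[1]
    assert LEXICON[art] == "ART" and LEXICON[n] == "N"
    return ["NP", ["ART", art], ["N", n]], words[2:]

def parse_vp(words):
    v = words[0]
    assert LEXICON[v] == "V"
    np, rest = parse_np(words[1:])
    return ["VP", ["V", v], np], rest

print(parse("the boy ate a popsicle".split()))
```

The output is exactly the list representation of the phrase marker shown in the previous section, ready to be handed to a reasoning program.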

The Lexicon

A lexicon is a dictionary of words (usually morphemes or root words together with
their derivatives), where each word contains some syntactic, semantic, and possibly
some pragmatic information. The information in the lexicon is needed to help deter-
mine the function and meanings of the words in a sentence. Each entry in a lexicon
will contain a root word called the head. Different derivatives of the word, if any,
will also be given, as well as the roles it can play in a sentence (e.g. its part of speech
and sense). A fragment of a simplified lexicon is illustrated in Figure 12.6.
The abbreviations 1s, 2s, . . . , 3p in Figure 12.6 stand for first person
singular, second person singular, . . . , third person plural, respectively. Note
that some words have more than one type, such as can which is both a noun and a
verb, and orange which is both an adjective and a noun. A lexicon may also be
organized to contain separate entries for words with more than one function by
giving them separate identities, can1 and can2. Alternatively, the entries in a lexicon
could be grouped and given by word category (by articles, nouns, pronouns,
verbs,

Word      Type          Features

a         Determiner    {3s}
be        Verb          trans, intransitive
boy       Noun          {3s}
can       Noun          {3s}
          Verb          {1s, 2s, 3s, 1p, 2p, 3p}; trans, intransitive
carried   Verb          form: past, past participle
orange    Adjective
          Noun          {3s}
the       Determiner    {3s, 3p}
to        Preposition
we        Pronoun       case: subjective
yellow    Adjective

Figure 12.6 Typical entries in a lexicon

Natural Language Processing Chap. 12
242
and so on), and all words contained within the lexicon listed within the categories
to which they belong.
The organization and entries of a lexicon will vary from one implementation
to another, but they are usually made up of variable length data structures such as
lists or records arranged in alphabetical order. The word order may also be given
in terms of usage frequency so that frequently used words like a, the, and an will
appear at the beginning of the list, facilitating the search.
Access to the words may be facilitated by indexing, with binary searches,
hashing, or combinations of these methods. A lexicon may also be partitioned to
contain a base lexicon set of general, frequently used words and domain-specific
components of words.
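The organization just described can be sketched as a small data structure. The following fragment is an illustration of ours, not from the text, and the entries are abbreviated: each head word maps to a list of senses, so a word like can naturally carries both its noun and verb readings.

```python
# A toy lexicon: each head word maps to a list of senses, where a
# sense records the word's type (part of speech) and its features.
LEXICON = {
    "a":      [{"type": "determiner", "features": {"3s"}}],
    "boy":    [{"type": "noun", "features": {"3s"}}],
    "can":    [{"type": "noun", "features": {"3s"}},
               {"type": "verb", "features": {"trans", "intrans"}}],
    "orange": [{"type": "adjective", "features": set()},
               {"type": "noun", "features": {"3s"}}],
}

def lookup(word):
    """Return every sense of a word, or an empty list if unknown."""
    return LEXICON.get(word.lower(), [])

# "can" is ambiguous: the parser receives both senses and must choose.
types = [sense["type"] for sense in lookup("can")]
```

A real lexicon would add derivative forms under each head and an index (or hashing) for fast access, as the text notes.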

Transition Networks
Transition networks are another popular method used to represent formal and natural
language structures. They are based on the application of directed graphs (digraphs)
and finite state automata. A transition network consists of a number of nodes and
labeled arcs. The nodes represent different states in traversing a sentence, and the
arcs represent rules or test conditions required to make the transition from one
state to the next. A path through a transition network corresponds to a permissible
sequence of word types for a given grammar. Thus, if a transition network can be
successfully traversed, it will have recognized a permissible sentence structure. For
example, a network used to recognize a sentence consisting of a determiner, a
noun, and a verb ("The child runs") would be represented by the four-node graph
as follows.
    determiner        noun         verb
N1 ----------> N2 ----------> N3 ----------> N4

Starting at node N1, the transition from node N1 to N2 will be made if a
determiner is found as the first input word. If successful, state N2 is entered. The
transition from N2 to N3 can then be made if a noun is found next. The final
transition (from N3 to N4) will be made if the last word is a verb. If the three-
word category sequence is not found, the parse fails. Clearly, this type of network
is very limited since it will only recognize simple sentences of the form DET N V.
The utility of a network such as this could be increased if more than a single
choice were permitted at some of the nodes. For example, if several arcs were
constructed between nodes N1 and N2 where each arc represented a different noun
phrase, the number of permissible sentence types would be increased substantially.
Individual arcs could be a noun, a pronoun, a determiner followed by a noun, a
determiner followed by an adjective followed by a noun, or some other type of
noun phrase which we wish the parser to be capable of recognizing. These alternatives
are depicted in Figure 12.7.

[Diagram: arcs from N1 to N2 labeled adjective, determiner, pronoun, and proper noun, plus a jump arc; an arc from N2 to N3 labeled noun.]

Figure 12.7 A noun phrase segment of a transition network

To move from state N1 to N2 in this transition network, it is necessary to
first find an adjective, a determiner, a pronoun, a proper noun, or none of these by
"jumping" directly to N2. This network extends substantially the possible types of
sentences that can be recognized over the simple network given above. For example,
it will recognize noun phrases having forms such as

Big white fluffy clouds
Our bright children
A large beautiful white flower
Large green leaves
Buildings
Boston's best seafood restaurants
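A transition network of this kind is easy to simulate in code. The sketch below is ours (the category table is a toy stand-in for a lexicon): it walks a noun phrase network like that of Figure 12.7, with an arc from N1 to N2 that may consume a determiner, pronoun, or proper noun, a jump when none is present, an adjective loop on N2, and a final noun arc to N3.

```python
# Toy word-category table standing in for a lexicon.
CATEGORY = {
    "the": "determiner", "our": "determiner",
    "big": "adjective", "white": "adjective", "fluffy": "adjective",
    "clouds": "noun", "children": "noun", "buildings": "noun",
}

def noun_phrase(words):
    """True if the whole word list traverses the NP network N1..N3."""
    i = 0
    # Arc N1 -> N2: consume a determiner/pronoun/proper noun, or jump.
    if i < len(words) and CATEGORY.get(words[i]) in (
            "determiner", "pronoun", "proper-noun"):
        i += 1
    # Adjective loop on N2: zero or more adjectives.
    while i < len(words) and CATEGORY.get(words[i]) == "adjective":
        i += 1
    # Arc N2 -> N3 must consume a noun, and the input must be used up.
    if i < len(words) and CATEGORY.get(words[i]) == "noun":
        return i + 1 == len(words)
    return False
```

With this table, `noun_phrase(["big", "white", "fluffy", "clouds"])` and `noun_phrase(["buildings"])` both succeed, matching the example phrases above.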

Top-Down versus Bottom-Up Parsing

Parsers may be designed to process a sentence using either a top-down or a bottom-
up approach. A top-down parser begins by hypothesizing a sentence (the symbol
S) and successively predicting lower level constituents until individual preterminal
symbols are written. These are then replaced by the input sentence words which
match the terminal categories. For example, a possible top-down parse of the sentence
"Kathy jumped the horse" would be given by

S → NP VP
  → NAME VP
  → Kathy VP
  → Kathy V NP
  → Kathy jumped NP
  → Kathy jumped ART N
  → Kathy jumped the N
  → Kathy jumped the horse
A bottom-up parse, on the other hand, begins with the actual words appearing
in the sentence and is, therefore, data driven. A possible bottom-up parse of the
same sentence might proceed as follows.

Kathy jumped the horse
→ NAME jumped the horse
→ NAME V the horse
→ NAME V ART horse
→ NAME V ART N
→ NP V ART N
→ NP V NP
→ NP VP
→ S

Words in the input sentence are replaced with their syntactic categories, and those
in turn are replaced by constituents of the same or smaller size until S has been
rewritten or until failure occurs.
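A naive way to mechanize the bottom-up sequence above is to replace each word by its category and then repeatedly reduce any substring that matches the right-hand side of a rule. The sketch below is our illustration (the rule and category tables are toys); on larger grammars such a greedy loop would need backtracking to avoid wrong reductions.

```python
# Grammar rules: right-hand side tuple -> left-hand side symbol.
RULES = [
    (("NAME",), "NP"),
    (("ART", "N"), "NP"),
    (("V", "NP"), "VP"),
    (("NP", "VP"), "S"),
]
# Toy word-to-category table.
CATEGORIES = {"kathy": "NAME", "jumped": "V", "the": "ART", "horse": "N"}

def bottom_up(words):
    """Reduce the category string until no rule applies."""
    symbols = [CATEGORIES[w.lower()] for w in words]
    changed = True
    while changed:
        changed = False
        for rhs, lhs in RULES:
            n = len(rhs)
            for i in range(len(symbols) - n + 1):
                if tuple(symbols[i:i + n]) == rhs:
                    symbols[i:i + n] = [lhs]   # reduce in place
                    changed = True
                    break
            if changed:
                break                          # restart rule scan
    return symbols
```

Applied to "Kathy jumped the horse", the loop performs exactly the reductions shown in the derivation and ends with the single symbol S.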

Deterministic versus Nondeterministic Parsers

Parsers may also be classified as deterministic or nondeterministic depending on
the parsing strategy employed. A deterministic parser permits only one choice (arc)
for each word category. Thus, each arc will have a different test condition. Conse-
quently, if an incorrect test choice is accepted from some state, the parse will fail
since the parser cannot backtrack to an alternative choice. This may occur, for
example, when a word satisfies more than one category such as a noun and a verb
or an adjective, noun, and verb. Clearly, in deterministic parsers, care must be
taken to make correct test choices at each stage of the parsing. This can be facilitated
with a look-ahead feature which checks the categories of one or more subsequent
words in the input sentence before deciding in which category to place the current
word. Some researchers prefer to use deterministic parsing since they feel it more
closely models the way humans parse input sentences.
Nondeterministic parsers permit different arcs to be labeled with the same
test. Consequently, the next test from any given state may not be uniquely determined
by the state and the current input word. The parser must guess at the proper constituent
and then backtrack if the guess is later proven to be wrong. This will require saving
more than one potential structure during parts of the network traversal. Examples
of both deterministic and nondeterministic parsers are presented in Figure 12.8.
Suppose the following sentence is given to a deterministic parser with the
grammar given by the network of Figure 12.8(a): "The strong bear the loads." If
the parser chose to recognize strong as an adjective and bear as a noun, the parse
would fail, since there is no verb following bear. A nondeterministic parser, on
the other hand, would simply recover by backtracking when failure was detected
and then taking another arc which accepted strong as a noun.
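The recovery behavior can be illustrated with a few lines of code. In the sketch below (ours; the category table and the two arc sequences merely stand in for the network of Figure 12.8), the recognizer tries a category choice for each word and backs up to an alternative path when a later word cannot be placed, so "The strong bear the loads" is accepted with strong read as a noun.

```python
# Each word may belong to several categories (the source of ambiguity).
CATS = {"the": {"ART"}, "strong": {"ADJ", "N"},
        "bear": {"N", "V"}, "loads": {"N", "V"}}

# Two alternative arc sequences through a toy sentence network.
PATTERNS = [["ART", "ADJ", "N", "V", "ART", "N"],
            ["ART", "N", "V", "ART", "N"]]

def recognize(words):
    """Try each path; back up and try another when a word fails."""
    def match(pattern, i):
        if not pattern:
            return i == len(words)   # pattern and input both used up
        if i == len(words):
            return False
        if pattern[0] in CATS.get(words[i], set()):
            if match(pattern[1:], i + 1):
                return True
        return False                  # fail here; caller backtracks
    return any(match(p, 0) for p in PATTERNS)
```

The first pattern (strong as ADJ) dies when "the" cannot be a verb; backtracking to the second pattern (strong as N, bear as V) succeeds, mirroring the discussion above.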

Example of a Simple Parser in Prolog

The reader may have noticed the close similarity between rewrite rules and Horn
clauses, especially when the Horn clauses are written in the form of PROLOG
[Diagram (a): nodes N1 through N7 joined by arcs labeled article, adjective, noun, verb, aux verb, article, and noun, each test appearing on only one arc from a given node.]

(a) A deterministic network

[Diagram (b): a similar network in which several arcs leaving the same node carry the same noun and verb tests.]

(b) A nondeterministic network

Figure 12.8 Deterministic and nondeterministic networks.

rules. This similarity makes it a straightforward task to write parsers in a language


like PROLOG. For example, the grammar rule that states that S is a sentence if it
is a noun phrase followed by a verb phrase (S → NP VP) may be written in
PROLOG as

sentence(A,C) :- nounPhrase(A,B), verbPhrase(B,C).

The variables A, B, and C in this statement represent lists of words. The argument
A is the whole list of words to be tested as a sentence, and C is the list of remaining
words, if any. Similar assumptions hold for A, B, and C in the noun and verb
phrase conditions respectively.
Rule definitions which rewrite the noun phrases and verb phrases must also
be defined. Thus, an NP may be defined with statements such as the following:

nounPhrase(A,C) :- article(A,B), noun(B,C).
nounPhrase(A,B) :- noun(A,B).

Like the above rule, these rules state that (1) a noun phrase can be either an article
which consists of a list A and remaining list B (if any) and a noun which is a list
B and remaining list C, or (2) a noun consisting of the list A with remaining list B
(if any). Similarly, a verb phrase may be defined with rules like the following:

verbPhrase(A,B) :- verb(A,B).
verbPhrase(A,C) :- verb(A,B), nounPhrase(B,C).
verbPhrase(A,C) :- verb(A,B), prepositionPhrase(B,C).

Definitions for the prepositional phrase as well as lexical terminals must also
he given. These can include the following:

prepositionPhrase(A,C) :- preposition(A,B), nounPhrase(B,C).

preposition([at|X],X).
article([a|X],X).
article([the|X],X).
noun([dog|X],X).
noun([cow|X],X).
noun([moon|X],X).
verb([barked|X],X).
verb([winked|X],X).

With this simple parser we can determine if strings of the following type are grammati-
cally correct.

The dog barked at the cow.


The moon winked at the dog.
A cow harked at a moon.

To do so, we must enter sentence queries as lists such as the following to the
PROLOG interpreter:

?- sentence([the,dog,barked,at,the,cow],X).
X = []
?- sentence([barked,a,moon,dog,the],X).
no

Since the remainder of the sentence bound to X is the empty list, it is recognized
as correct. The second sentence failed since it could not instantiate with the correct
constituent parts.
Of course, for a parser to be of much practical use, other constituents and a
great many more words should be defined. The example illustrates the utility of
using PROLOG as a basic parser.
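The PROLOG clauses above work by threading the input through each predicate as a pair of lists: the predicate consumes a prefix and hands back the remainder. The same difference-list idea can be expressed in Python (a sketch of ours, using the same toy vocabulary), where each recognizer maps a starting position to the set of positions it can reach.

```python
# Toy vocabulary, mirroring the PROLOG facts above.
WORDS = {"article": {"a", "the"}, "noun": {"dog", "cow", "moon"},
         "verb": {"barked", "winked"}, "preposition": {"at"}}

def terminal(cat):
    """Consume one word of the given category."""
    def parse(s, i):
        return {i + 1} if i < len(s) and s[i] in WORDS[cat] else set()
    return parse

def seq(*parsers):
    """Run parsers in sequence, threading the remainder through."""
    def parse(s, i):
        positions = {i}
        for p in parsers:
            positions = {k for j in positions for k in p(s, j)}
        return positions
    return parse

def alt(*parsers):
    """Union of the alternatives, like PROLOG's multiple clauses."""
    def parse(s, i):
        return set().union(*(p(s, i) for p in parsers))
    return parse

article, noun = terminal("article"), terminal("noun")
verb, preposition = terminal("verb"), terminal("preposition")

noun_phrase = alt(seq(article, noun), noun)
prep_phrase = seq(preposition, noun_phrase)
verb_phrase = alt(verb, seq(verb, noun_phrase), seq(verb, prep_phrase))
sentence = seq(noun_phrase, verb_phrase)

def is_sentence(words):
    # The parse succeeds when the "remainder" is empty, i.e. the
    # sentence recognizer can reach the end of the word list.
    return len(words) in sentence(words, 0)
```

As with the PROLOG version, "the dog barked at the cow" is accepted because the remainder is empty, while a scrambled word list fails to instantiate any constituent.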

Recursive Transition Networks

The simple networks described above are not powerful enough to recognize the
variety of sentences a human language system could be expected to cope with. In
fact, they fail to recognize all languages that can be generated by a context-free
grammar. Other extensions are needed to accept a wider range of sentences but
still avoid the necessity for large complex networks. We can achieve such extensions
by labeling some arcs as a separate network state (such as an NP) and then constructing
a subnetwork which recognizes the different noun phrases required. In this way, a
single subnetwork for an NP can be called from several places in a sentence. Similar
arcs can be labeled for other sentence constituents including VP, PP (prepositional
phrases), and others. With these additions, complex sentences having embedded
phrases can be parsed with relatively simple networks. This leads directly to the
notion of using recursion in a network.
A recursive transition network (RTN) is a transition network which permits
arc labels to refer to other networks (including the network's own name), and they
in turn may refer back to the referring network, rather than just permitting the word
categories used previously. For example, an RTN described by William Woods
(1970) is illustrated in Figure 12.9, where the main network calls two subnetworks,
an NP and a PP network, as illustrated in Figures 12.9(b) and (c).
The top network in the figure is the top level (sentence) network, and the
lower level networks are for NP and PP arc states. The arcs corresponding to these
states will be traversed only if the corresponding subnetworks (b) or (c) are successfully
traversed.

[Diagram (a): the top level sentence network, with NP, AUX, and V arcs and a terminating POP arc.]

(a) Top level RTN

[Diagram (b): the noun phrase subnetwork N1-N4, with DET, ADJ, N, NPR, and PP arcs and a terminating POP arc.]

(b) Noun phrase subnetwork

[Diagram (c): the prepositional phrase network P1-P3, with a PREP arc, an NP arc, and a POP arc.]

(c) Prepositional phrase network

Figure 12.9 Recursive transition network.




In traversing a network, it is customary to test the arcs in a clockwise order.
Thus, in the top level RTN, the NP arc will be called first. If this arc fails, the arc
labeled AUX will be tested next.
During the traversal of an RTN, a record must be maintained of the word
position in the input sentence, the current state or node position, and return
nodes to be used as return points when control has been transferred to a lower
level network. This information can be maintained as a triple (POS, CND, RLIST)
where POS is the current input word position, CND is the current node, and RLIST
is the list of return points. The RLIST can be maintained as a stack data structure.
In Figure 12.9, the arc named POP is used as a dummy arc to signal the
successful completion of the subnetwork and a return to the node following the arc
from which it was called. Some other arc types that will be useful in what follows
are summarized in Table 12.1.
The CAT arc represents a test for a specific word category such as a noun or
a verb. When the input word is of the specified category, the CAT test succeeds
and the input pointer is advanced to the next word. The JUMP arc may be traversed
without satisfying any test condition, in which case the word pointer is not advanced
when a JUMP arc is traversed. An arc labeled with a state, such as NP or PP, is
defined as a PUSH arc. This arc initiates a call to another network with the indicated
state (such as an NP or PP). When a PUSH arc is taken, a return state must be
saved. This is accomplished by pushing the return pointer onto a stack. The POP
arc, as noted above, is a dummy test which pops the top return node pointer that
was previously pushed onto the stack. A TEST arc allows the use of an arbitrary
test to determine if the arc is to be taken. For example, TEST can be used to
determine if a sentence is declarative or interrogative, if one or more negatives
occur, and so on. A WORD arc corresponds to a specific word test such as to,
from, and at. (In some systems a list of words may apply rather than a single
word.)
To see how an interpreter operates with the grammar given for the RTN of
Figure 12.9, we apply it to the following sentence (the subscripted numbers give
the word positions):

The₁ big₂ tree₃ shades₄ the₅ old₆ house₇ by₈ the₉ stream₁₀.

TABLE 12.1 ARC LABELS FOR TRANSITION NETWORKS

Type of arc   Purpose of arc                                 Example

CAT           a test label for the current word category     V
JUMP          requires no test to succeed                    jump
POP           a label for the end of a network               pop
PUSH          a label for a call to a network                NP
TEST          a label for an arbitrary test                  negatives
WORD          a label for a specific word type               from

Starting with CND set to S1, POS set to 1, and RLIST set to nil, the first arc test
(NP) would be attempted. Since this test is for a state, the parser would PUSH
the return node S2 onto RLIST, set CND to N1, and call the NP network. Trying
the first test DET (a CAT test) in the NP network, a match would be found with
word position 1. This would result in CND being updated to N2 and POS to position
2. The next word (big) satisfies the ADJ test, causing CND to be updated to N2
again, and POS to be updated to position 3. The ADJ test is then repeated for the
word tree, but it fails. Hence, the arc test for N is made next with no change
made to POS and CND. This time the test succeeds, resulting in updates of N4 to
CND and position 4 to POS. The next test is the POP which signals a successful
completion of the NP network and causes the return node (S2) to be retrieved
from the RLIST stack and CND to be updated with S2. POP does not cause an
advance in the word position POS.
The only possible test from S2 is for category V which succeeds on the word
"shades" with resultant updates of S5 to CND and 5 to POS. At S5, the only
possible test is the NP. This again invokes a call to the lower level NP network
which is traversed successfully with the noun phrase "the old house." After a
return to the main network, CND is set to S6 and POS is set to position 8. At this
point, the lower PP network is called with CND being set to P1 and S6 pushed
onto RLIST. From P1, the CAT test for PREP passes with CND being set to P2
and POS being set to 9. NP is then called with CND being set to N1 and P3 being
pushed onto RLIST. As before, the NP network is traversed with the noun phrase
"the stream" resulting in a POS value of 11, P3 being popped from RLIST and a
return to that node. The test at P3 (POP) results in S6 being popped from RLIST
and a return to the S6 node. Finally, the POP test at S6, together with the period
at position 11, results in a successful traversal and acceptance of the sentence.
During a network traversal, a parse can fail if (1) the end of the input sentence
(a period) has been reached when the test from the CND node value is not a terminal
(POP) value or (2) if a word in the input sentence fails to satisfy any of the available
arc tests from some node in the network.
The number of sentences accepted by an RTN can be extended if backtracking
is permitted when a failure occurs. This requires that states having alternative transi-
tions be remembered until the parse progresses past possible failure points. In this
way, if a failure occurs at some point, the interpreter can backtrack and try alternative
paths. The disadvantage with this approach is that parts of a sentence may be parsed
more than once, resulting in excessive computations.
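An interpreter of this kind is compact enough to sketch directly. The fragment below is ours, with a toy grammar and lexicon rather than the full network of Figure 12.9: it keeps the (POS, CND, RLIST) information as arguments, pushes a return node on every PUSH arc, pops it on POP, and gets backtracking for free through recursion.

```python
# Toy lexicon of word categories.
CATEGORY = {"the": "DET", "big": "ADJ", "tree": "N", "shades": "V"}

# Each network maps a node to its arcs: (label, kind, target node).
NETWORKS = {
    "S":  {"S1": [("NP", "PUSH", "S2")],
           "S2": [("V", "CAT", "S3")],
           "S3": [("NP", "PUSH", "S4")],
           "S4": [(None, "POP", None)]},
    "NP": {"N1": [("DET", "CAT", "N2")],
           "N2": [("ADJ", "CAT", "N2"), ("N", "CAT", "N3")],
           "N3": [(None, "POP", None)]},
}
START = {"S": "S1", "NP": "N1"}

def traverse(words, net="S", node=None, pos=0, rlist=()):
    """True if the RTN accepts the whole word list."""
    node = node if node is not None else START[net]
    for label, kind, target in NETWORKS[net].get(node, []):
        if kind == "CAT":           # consume one word of the category
            if pos < len(words) and CATEGORY.get(words[pos]) == label:
                if traverse(words, net, target, pos + 1, rlist):
                    return True
        elif kind == "PUSH":        # save return node, enter subnetwork
            if traverse(words, label, None, pos, ((net, target),) + rlist):
                return True
        elif kind == "POP":         # return to the saved node
            if rlist:
                (ret_net, ret_node), rest = rlist[0], rlist[1:]
                if traverse(words, ret_net, ret_node, pos, rest):
                    return True
            elif pos == len(words): # top-level POP at end of input
                return True
    return False                    # all arcs failed: backtrack
```

Here "the big tree shades the tree" is accepted by two calls to the NP subnetwork, while "the tree" alone fails at S2 because no verb follows.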

Augmented Transition Networks

The networks considered so far are not very useful for language understanding.
They have only been capable of accepting or rejecting a sentence based on the
grammar and syntax of the sentence. To be more useful, an interpreter must be
able to build structures which will ultimately be used to create the required knowledge
entities for an AI system. Furthermore, the resulting data structures should contain

more information than just the syntactic information dictated by the grammar alone.
Semantic information should also be included. For example, a number of sentence
features can also be established and recorded, such as the subject NP, the object
NP, the subject-verb number agreement, the mood (declarative or interrogative),
tense, and so on. This means that additional tests must be performed to determine
the possible semantics a sentence may have. Without these additional tests, much
ambiguity will still be present and incorrect or meaningless sentences accepted.
We can achieve the additional capabilities required by augmenting an RTN
with the ability to perform additional tests and store intermediate results as a sentence
is being parsed. When an RTN is given these additional features, it is called an
augmented transition network or ATN.
When building a representation structure, an ATN uses a number of different
registers as temporary storage to hold the different sentence constituents. Thus,
one set of registers would be used for an NP network, one for a PP network, one
for a V, and so on. Using the register contents, an ATN builds a partial structural
description of the sentence as it moves from state to state in the network. These
registers provide temporary storage which is easily modified, switched, or discarded
until the final sentence structure is constructed. The registers also hold flags and
other indicators used in conjunction with some arcs. When a partial structure has
been stored in registers and a failure occurs, the interpreter can clear the registers,
backtrack, and start a new set of tests. At the end of a successful parse, the contents
of the registers are combined to form the final sentence data structure required for
output.

An ATN Specification Language

A specification language developed by Woods (1970, 1986) for ATNs takes the
form of an extended context-free grammar. This language is given in Figure 12.10
where the vertical bar indicates alternative choices for a construction and the *
(Kleene star) signifies repeatable (zero or more) elements. All nonterminals are
enclosed in angle brackets. Some of the capitalized words appearing in the language
were defined earlier as arc tests and actions. The other words in uppercase correspond
to functions which perform many of the tasks related to the construction of the
structure using the registers.
The specification language is read the same as rewrite rules. Thus, it specifies
that a transition network is composed of a list of arc sets, where each arc set is in
turn a list with first element being a state name and the remaining elements being
arcs which emanate from that state. An arc can be any of the forms CAT, JUMP,
PUSH, TEST, WORD or POP. For example, as noted earlier, the TEST arc corre-
sponds to an arbitrary test which determines whether the arc is to be traversed or
not. Note that a sequence of actions is associated with the arc tests. These actions
are executed during the arc traversals. They are used to build pieces of structures
such as a tree or a list. The terminal action of any arc specifies the state to which
control is passed to complete the transition.

<transition net> → (<arc set> <arc set>*)
<arc set> → (<state> <arc>*)
<arc> → (CAT <category name> <test> <action>* <term act>) |
        (PUSH <state> <test> <action>* <term act>) |
        (TEST <arbitrary label> <test> <action>* <term act>) |
        (POP <form> <test>)
<action> → (SETR <register> <form>) |
           (SENDR <register> <form>) |
           (LIFTR <register> <form>)
<term act> → (TO <state>) |
             (JUMP <state>)
<form> → (GETR <register>) |
         @ |
         (GETF <feature>) |
         (BUILDQ <fragment> <register>*) |
         (LIST <form>*) |
         (APPEND <form> <form>) |
         (QUOTE <arbitrary structure>)

Figure 12.10 A specification language for ATNs.

Among other things, an action can be any of the three function forms SETR,
SENDR, and LIFTR which cause the indicated register values to be set to the
value of form. Terminal actions can be either TO or JUMP, where TO requires that
the input sentence pointer be advanced, and JUMP requires that the pointer
remain fixed and the input word continue to be scanned. Finally, a construction
form can be any of the seven alternatives in the bottom group of Figure 12.10,
including the symbol @ which is a terminal symbol placeholder for form.
The function SETR causes the contents of the indicated registers to be set
equal to the value of the corresponding form. This is done at the current level in
the network, while SENDR causes it to be done by sending it to the next lower
level of computation. LIFTR returns information to the next higher level of computa-
tion. The function GETR returns the value of the indicated register, and GETF
returns the value of a specified feature for the current input word. As noted before,
the value of @ is usually an input word. The function BUILDQ takes lists from
the indicated registers (which represent fragments of a parse tree with marked nodes)
and builds the sentence structures.
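The register operations are straightforward to emulate. The sketch below is ours (Woods' actual functions operate on LISP structures, not Python lists): it models SETR and GETR as dictionary operations and BUILDQ as a walk over a nested-list fragment whose "+" slots are filled, left to right, from the named registers.

```python
def setr(registers, name, value):
    """SETR: store a form's value in a register at the current level."""
    registers[name] = value

def getr(registers, name):
    """GETR: retrieve the value of the indicated register."""
    return registers.get(name)

def buildq(fragment, registers, names):
    """BUILDQ: fill each '+' in the nested-list fragment with the
    value of the next register named in names."""
    slots = iter(names)
    def fill(part):
        if part == "+":
            return getr(registers, next(slots))
        if isinstance(part, list):
            return [fill(p) for p in part]
        return part
    return fill(fragment)

# Registers as they might stand at the end of "The boy can whistle."
regs = {}
setr(regs, "TYPE", "DCL")
setr(regs, "SUBJ", ["NP", "boy", "DEF"])
setr(regs, "AUX", "can")
setr(regs, "V", "whistle")
structure = buildq(["S", "+", "+", "+", ["VP", "+"]],
                   regs, ["TYPE", "SUBJ", "AUX", "V"])
```

The call fills the three top-level plus signs from TYPE, SUBJ, and AUX and the nested one from V, mirroring the POP action on line 22 of Figure 12.12 below.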
An ATN network similar to the RTN illustrated in Figure 12.9 is presented
in Figure 12.11. Note that the arcs in this network have some of the tests described
above. These tests will have the basic forms given in Figure 12.10, together with
the indicated actions. The actions include building the final sentence structure which
may contain more features than those considered thus far, as well as certain semantic
features.
Using the specification language, we can represent this particular network
with the constituent abbreviations and functions described above in the form of a
LISP program. For example, a partial description of the network is depicted in

[Diagram (a): the top level ATN with PUSH(NP), CAT(V), CAT(AUX), and PUSH(PP) arcs and POP arcs.]

(a) Top level ATN

[Diagram (b): the noun phrase subnetwork with CAT(DET), CAT(ADJ), CAT(N), PUSH(PP), and POP arcs.]

(b) Noun phrase subnetwork

[Diagram (c): the prepositional phrase network with CAT(PREP), PUSH(NP), and POP arcs.]

(c) Prepositional phrase network

Figure 12.11 Augmented transition network.

Figure 12.12 (where T in the expressions is the equivalent of non-nil or true in
LISP).
From the language of Figure 12.12, it can be seen that the ATN begins building
a sentence representation in which the first constituent is either type declarative
(DCL) or type interrogative (Q for question), depending on whether the first successful
test is an NP or AUX, respectively. The next constituent is the subject (SUBJ)
NP, and the third is either an auxiliary verb (AUX) or nil. The fourth constituent
is a VP. An ATN is traversed much the same as the RTN described above.
An example of its operation will help to demonstrate how the structure is
built during a parse of the sentence

"The big dog likes the small boy."

1. Starting with state S, PUSH down a level to the NP network; if an NP is
found (T for true), execute lines 2, 3, and 4.
2. In the lower level NP network, the noun phrase "the big dog" is found
with successive CAT tests for determiner, adjective, and noun. During these tests,
NP registers are set to indicate the word constituents. When the terminal node


1.  (S/ (PUSH NP/ T
2.       (SETR SUBJ @)
3.       (SETR TYPE (QUOTE DCL))
4.       (TO S1))
5.      (CAT AUX T
6.       (SETR AUX @)
7.       (SETR TYPE (QUOTE Q))
8.       (TO S2)))
9.  (S1 (CAT V T
10.      (SETR AUX NIL)
11.      (SETR V @)
12.      (TO S4))
13.     (CAT AUX T
14.      (SETR AUX @)
15.      (TO S3)))
16. (S2 (PUSH NP/ T
17.      (SETR SUBJ @)
18.      (TO S3)))
19. (S3 (CAT V T
20.      (SETR V @)
21.      (TO S4)))
22. (S4 (POP (BUILDQ (S + + + (VP +)) TYPE SUBJ AUX V) T)
23.     (PUSH NP/ T
24.      (SETR VP (BUILDQ (VP (V +) *) V))
25.      (TO S5)))
26. (S5 (POP (BUILDQ (S + + + +) TYPE SUBJ AUX VP) T)
27.     (PUSH PP/ T
28.      (SETR VP (APPEND (GETR VP) (LIST @)))
29.      (TO S5)))

Figure 12.12 An ATN specification language program.

(N4) is tested and the PP test subsequently fails, POP is executed and a return of
control is made to statement 2.

3. The register SUBJ is set to the value of @ which is the list structure
(NP (dog (big) DEF)) returned from the NP registers. DEF signifies that the determiner
is definite.
4. In line 3, register TYPE is set to DCL (for declarative).
5. Control is transferred to S1 with the statement TO in line 4 and the input
pointer is moved past the noun phrase to the verb "likes."
6. If an auxiliary verb had been found at the beginning of the sentence
instead of an NP, control would have been passed to line 5 where statements 6, 7,
and 8 would have been executed. This would have resulted in registers AUX and
TYPE being set to the values @ and Q respectively.

7. At S1, a category test is made for a V. Since this succeeds (is T), statements
10, 11, and 12 are executed. This results in register AUX being set to nil, and
register V being set to the contents of @ to give (V likes). Control is then passed
to S4 and the input pointer is moved to the word "the."
8. If the test for V had failed, and an auxiliary verb had been found, statements
14 and 15 would have been executed.
9. Since S4 is a terminal node, a sentence structure can be built there. This
will be the case if the end of the sentence has been reached. If so, the BUILDQ
function creates a list structure with first element S, followed by the values of the
three registers TYPE, SUBJ, AUX, corresponding to the three plus (+) signs.
These are then followed with VP and the contents of the V register. For example,
with an input sentence of

The boy can whistle.

the structure (S DCL (NP (boy) DEF) (AUX can) (VP whistle)) would be constructed
from the four registers TYPE, SUBJ, AUX, and V.
10. Because more input words remain, the BUILDQ in line 22 is not executed,
and control drops to the next line where a push is made to the lower NP network.
As before, the NP succeeds with the structure (NP (boy (small) DEF)) being returned
as the value of @. Register VP is then set to the list returned by BUILDQ (line
24) which consists of VP followed by the verb phrase, and control is passed to S5.
11. Since S5 is a terminal node and the end of the input sentence has been
reached, BUILDQ will build the final sentence structure from the TYPE, SUBJ,
AUX, and VP register contents. The final structure constructed is

(S DCL (NP (dog (big) DEF))
   (VP (V likes) (NP (boy (small) DEF))))

The use of recursion, arc tests, and a variety of arc and node combinations
gives the ATNs the power of a Turing machine. This means that an ATN can
recognize any language that a general purpose computer can recognize. This versatility
also makes it possible to build deep sentence structures rather than just structures
with surface features only. (Recall that surface features relate to the form of words,
phrases, and sentences, whereas deep features relate to the content or meaning of
these elements.) The ability to build deep structures requires that other appropriate
tests be included to check pronoun references, tense, number agreement, and other
features.
Because of their power and versatility, ATNs have become popular as a model
for general purpose parsers. They have been used successfully in a number of natural
language systems as well as front ends for databases and expert systems.
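To make the register mechanism concrete, the following condensed sketch is ours; it hard-codes one path through a tiny network rather than interpreting arcs. It parses "The boy can whistle," filling the TYPE, SUBJ, AUX, and VP registers and combining them at the end, in the spirit of the structure built in step 9 above.

```python
# Toy word-category table.
CATEGORY = {"the": "DET", "boy": "N", "can": "AUX", "whistle": "V"}

def atn_parse(words):
    """Parse a DET-N-AUX-V sentence, returning the final structure
    built from the registers, or None on failure."""
    regs, i = {"TYPE": "DCL"}, 0            # declarative by default
    # Subnetwork stand-in: a simple DET N noun phrase fills SUBJ.
    if i + 1 < len(words) and CATEGORY.get(words[i]) == "DET":
        regs["SUBJ"] = ("NP", words[i + 1], "DEF")
        i += 2
    else:
        return None
    # Optional auxiliary verb fills the AUX register.
    if i < len(words) and CATEGORY.get(words[i]) == "AUX":
        regs["AUX"] = ("AUX", words[i])
        i += 1
    # Main verb fills the VP register.
    if i < len(words) and CATEGORY.get(words[i]) == "V":
        regs["VP"] = ("VP", words[i])
        i += 1
    if i != len(words):
        return None
    # At the terminal state, combine the registers (a BUILDQ analogue).
    return ("S", regs["TYPE"], regs["SUBJ"], regs.get("AUX"), regs["VP"])
```

A failed test simply abandons the registers and returns None, which is where a full ATN interpreter would clear them, backtrack, and try an alternative arc.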

12.5 SEMANTIC ANALYSIS AND REPRESENTATION STRUCTURES

We have now seen how the structure of a complex sentence can be determined
and how a representation of that structure can be constructed using different types
of parsers. In particular, it should now be clear how an ATN can be used to build
structures for different grammars, like those described in Section 12.3. But we
have not yet explained how the final semantic knowledge structures are created to
satisfy the requirements of a knowledge base used to represent some particular
world model. Experience has shown the semantic interpretation to be the most
difficult stage in the transformation process.
As an example of some of the difficulties encountered in extracting the full
intended meaning of some utterances, consider the following situation.

It turned into a black day. In his haste to catch the flight, he backed over Tom's
bicycle. He should never have left it there. It was damaged beyond repair. That caused
the tailpipe to break. It would be impossible to make it now. . . . It was all because
of that late movie. He would be heartbroken when he found out about it.

Although a car was never explicitly mentioned, it must be assumed that a car
was the object which backed over Tom's bicycle. A program must be able to
infer this. The "black day" metaphor also requires some inference. Days are not
usually referred to by color. And sorting out the pronoun references can also be an
onerous task for a program. Of the seven uses of it, two refer to the bicycle, two
to the flight, two refer to the situation in general, and one to the status of the day.
There are also four uses of he referring to two different people and a that which
refers to the accident in general. The placement of the pronouns is almost at random,
making it difficult to give any rule of association. Words that point back or refer
to people, places, objects, events, times, and so on that occurred before are called
anaphors. Their interpretation may require the use of heuristics, syntactic and semantic
constraints, inference, and other forms of object analysis within the discourse content.
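To make one such heuristic concrete, the following Python sketch resolves an anaphor by scanning the previously mentioned entities from most recent to least recent and accepting the first one whose features agree with the pronoun. The feature names and the entity list are invented for this illustration; as the example above shows, such a shallow heuristic is easily defeated, which is exactly why inference and world knowledge are also required.

```python
# A minimal recency-plus-agreement heuristic for anaphor resolution.
# This is an illustrative sketch, not a method from any system in the
# text; the feature inventory here is a deliberate simplification.

def resolve_anaphor(pronoun, discourse_entities):
    """Return the most recently mentioned entity whose features
    (gender, number) are compatible with the pronoun."""
    features = {
        "he":   {"gender": "male",   "number": "sing"},
        "she":  {"gender": "female", "number": "sing"},
        "it":   {"gender": "neuter", "number": "sing"},
        "they": {"number": "plur"},
    }
    required = features.get(pronoun.lower(), {})
    # Scan candidate antecedents from most recent to least recent.
    for entity in reversed(discourse_entities):
        if all(entity.get(k) == v for k, v in required.items()):
            return entity["name"]
    return None  # no compatible antecedent found

# Entities in order of mention:
history = [
    {"name": "Tom",     "gender": "male",   "number": "sing"},
    {"name": "bicycle", "gender": "neuter", "number": "sing"},
]
print(resolve_anaphor("it", history))   # -> bicycle
print(resolve_anaphor("he", history))   # -> Tom
```

Note that this resolver would wrongly bind every occurrence of it in the passage above to the bicycle, since recency and agreement alone cannot distinguish the flight, the day, and the situation.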
This example should demonstrate again that language cannot be separated
from intelligence and reasoning. To fully understand the above situation requires
that a program be able to reason about people's goals, beliefs, motives, and facts
about the world in general.
The semantic structures constructed from utterances such as the above must
account for all aspects of meaning in what is known as the domain, context, and
the task. The domain refers to the knowledge that is part of the world model the
system knows about. This includes object descriptions, relationships, and other rele-
vant concepts. The context relates to previous expressions, the setting and time of
the utterances, and the beliefs, desires, and intentions of the speakers. A task is
part of the service the system offers, such as retrieving information from a data
base, providing expert advice, or performing a language translation. The domain,
context, and task are what we have loosely referred to before as semantics, pragmatics,
and world knowledge.

Semantic interpretations require that utterances be transformed into coherent
expressions in the form of FOPL clauses, associative networks, frames, or script-
like structures that can be manipulated by the understanding program. There are a
number of different approaches to the transformation problem. The approach we
have been following up to this point is one in which the transformation is made in
stages. In the first stage, a syntactic analysis is performed and a tree-like structure
is produced using a parser. This stage is followed by use of a semantic analyzer to
produce either an intermediate or final semantic structure.
Another approach is to transform the sentences directly into the target structures
with little syntactical analysis. Such approaches typically depend on the use of
constraints given by semantic grammars or place strong reliance on the use of key
words to extract meaning.
Between these two extremes are approaches which perform syntactic and
semantic analyses concurrently, using semantic information to guide the parse, and
the structure learned through the syntactical analysis is used to help determine meaning.
Another dimension related to semantic interpretation is the approach taken in
extracting the meaning of an expression: (1) whether it can or should be derived
by paraphrasing input utterances and transforming them into structures containing
a few generic terms, or (2) whether meanings are best derived by composing the
meanings of clauses and larger units from the meanings of their constituent parts.
These two methods of approach are closely tied to the form of the target semantic
structures.
The first of these approaches we call unit or lexical semantics to emphasize
the role played by the special primitive words used to represent the meanings of
all expressions. In this approach the meaning is constructed through a restatement
of the expression in terms of linked primitive generic words such as those used in
Schank's conceptual dependency theory (Chapter 7).
The second approach is called compositional semantics since the meaning of
an expression is derived from the meanings ascribed to the constituent parts. The
structures created through this approach are usually characterized as logical formulas
in some calculus such as FOPL or an extended FOPL.

Lexical Semantics Approaches

The semantic grammars described in Section 12.2 are one form of approach based
on the use of lexical semantics. With this approach, input sentences are transformed
through the use of domain dependent semantic rewrite rules which create the target
knowledge structures. A second example of an informal lexical-semantic approach
is one which uses conceptual dependency theory. Conceptual dependency structures
provide a form of linked knowledge that can be used in larger structures such as
scenes and scripts.
The construction of conceptual dependency structures is accomplished without
performing any direct syntactic analysis. Making the jump between utterance and

ACTOR: (a PP with animate attributes)
OBJECT: (a PP)
ACTION: (one of the primitive acts with tense)
DIRECTION: (from-to direction of the action)
INSTRUMENT: (object with which the act is performed)
LOCATION: (event location information)
TIME: (time of the event information)

Figure 12.13 Conceptual dependency structure.

these structures requires that more information be contained in the lexicon. The
lexicon entries must include word sense and other information which relate the
words to a number of primitive semantic categories as well as some syntactic informa-
tion.
Recall from Chapter 7 that conceptualizations are either events or object states.
Event structures include objects and their attributes, picture producers (PPs) or actors,
actions, direction of action (to or from) and sometimes instruments that participate
in the actions, and the location and time of the event. These items are collected
together in a slot-filler structure as depicted in Figure 12.13.
Verbs in the input string are a dominant factor in building conceptual dependency
structures because they denote the event action or state. Consequently, lexicon entries
for verbs will be more extensive than other entry types. They will contain all possible
senses, tense, and other information. Each verb maps to one of the primitive actions:
ATRANS, ATTEND, CONC, EXPEL, GRASP, INGEST, MBUILD, MOVE,
MTRANS, PROPEL, PTRANS, and SPEAK. Each primitive action will also have
an associated tense: past, present, future, conditional, continuous, interrogative,
end, negation, start, and timeless.
The basic process followed in building conceptual dependency structures is
simply the three steps listed below.

1. Obtain the next lexical item (a word or phrase).


2. Access the lexical entry for the item and obtain the associated tests and actions.
3. Perform the specified actions given with the entry.

Three types of tests are performed in Step 2.

1. If a certain lexical entry is found, the indicated action is performed. This
corresponds to true (non-nil in LISP).
2. Specific word orderings are checked as the structure is being built and actions
initiated as a result of the orderings. For example, if a PP follows the last
word parsed, action is initiated to fill the object slot.
3. Checks are made for specific words or phrases and, if found, specified actions
are taken. For example, if an intransitive verb such as listen is found, an action


would be initiated to look for associated words which complete the phrase
beginning with to or for.

For the above tests, there are four types of actions taken.

1. Adding additional structure to a partially built conceptual dependency.
2. Filling a slot with a substructure.
3. Activating another action.
4. Deactivating an action.

These actions build up the conceptual dependency structure as the input string
is parsed. For example, the action taken for a verb like drank would be to build a
substructure for the primitive action INGEST with unfilled slots for ACTOR, OB-
JECT, and TENSE.

(INGEST (ACTOR nil)
        (OBJECT nil)
        (TENSE past))

Subsequent words in the input string would initiate actions to add to this structure
and fill in the empty ACTOR and OBJECT slots. Thus, a simple sentence like

The boy drank a soda

would be transformed through a series of test and action steps to produce a structure
such as the following.

(INGEST (ACTOR (PP (NAME boy) (CLASS PHYS-OBJ)
                   (TYPE ANIMATE) (REF DEF)))
        (OBJECT (PP (NAME soda) (CLASS PHYS-OBJ)
                    (TYPE INANIMATE) (REF INDEF)))
        (TENSE PAST))
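The test-and-action process just described can be sketched in Python. The lexicon entries and the slot-filling rules below are deliberate simplifications invented for the example (a single left-to-right pass in which a verb opens the event structure, PPs before the verb fill the ACTOR slot, and PPs after it fill the OBJECT slot); a real analyzer applies the much richer tests and actions described above.

```python
# Sketch of lexicon-driven conceptual dependency construction.
# Slot names follow Figure 12.13; the lexicon entries and the
# control strategy are invented for illustration only.

LEXICON = {
    "boy":   {"type": "PP", "frame": {"NAME": "boy",  "CLASS": "PHYS-OBJ",
                                      "TYPE": "ANIMATE"}},
    "soda":  {"type": "PP", "frame": {"NAME": "soda", "CLASS": "PHYS-OBJ",
                                      "TYPE": "INANIMATE"}},
    "drank": {"type": "verb", "primitive": "INGEST", "tense": "PAST"},
    "the":   {"type": "det", "ref": "DEF"},
    "a":     {"type": "det", "ref": "INDEF"},
}

def build_cd(sentence):
    """One pass over the words: a verb opens an event structure with
    empty slots; PPs before the verb fill ACTOR, PPs after it OBJECT."""
    cd, ref, seen_verb, pending_actor = None, None, False, None
    for word in sentence.lower().split():
        entry = LEXICON.get(word)
        if entry is None:
            continue
        if entry["type"] == "det":
            ref = entry["ref"]                   # remember the determiner
        elif entry["type"] == "verb":
            cd = {"ACTION": entry["primitive"], "TENSE": entry["tense"],
                  "ACTOR": pending_actor, "OBJECT": None}
            seen_verb = True
        elif entry["type"] == "PP":
            frame = dict(entry["frame"], REF=ref)
            ref = None
            if not seen_verb:
                pending_actor = frame            # fills the ACTOR slot
            else:
                cd["OBJECT"] = frame             # fills the OBJECT slot
    return cd

cd = build_cd("The boy drank a soda")
```

Running the sketch on "The boy drank a soda" yields a structure with ACTION INGEST, TENSE PAST, and boy and soda frames in the ACTOR and OBJECT slots, mirroring the hand-built structure above.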

Compositional Semantics Approaches

In the compositional semantics approach, the meaning of an expression is derived
from the meanings of the parts of the expression. The target knowledge structures
constructed in this approach are typically logic expressions such as the formulas of
FOPL. The LUNAR system developed by Woods (1978) uses this approach. The
input strings are first parsed using an ATN from which a syntactic tree is output.
This becomes the input to a semantic interpreter which interprets the meaning of
the syntactic tree and creates the semantic representations.

As an example, suppose the following declaration is submitted to LUNAR

Sample24 contains silicon

This would be parsed, and the following tree structure would be output from the
ATN:

(S DCL
   (NP (N (sample24)))
   (AUX (TENSE (PRESENT)))
   (VP (V (contain))
       (NP (N (silicon)))))

Using this structure, the semantic interpreter would produce the predicate clause

(CONTAIN sample24 silicon)

which has the FOPL meaning you would expect.


The interpreter used in LUNAR is driven by semantic pattern-action interpreta-
tion rules. The rule that builds the CONTAIN predicate is selected whenever the
input tree has a verb of the form have or contain, a sample as the subject, and
either a chemical element, isotope, or oxide as an object. The action of the rule
states that such a sentence is to be interpreted as an instance of the schema (CONTAIN
x y), with x and y being replaced by the ATN's interpretation of the subject noun phrase
and object respectively.
LUNAR is also capable of performing quantification of variables in expressions.
The quantifiers are an elaboration of those used in FOPL. They include the standard
existential and universal quantifiers "for every" and "for some," as well as others
such as "for the," "exactly," "the single," "more than," "at least," and so on.
For example, an expression with universal quantification would appear as

(FOR EVERY X (SEQ samples) : CONTAIN X OVERALL silicon)
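A rough Python rendering of a single pattern-action rule of this kind is shown below. The tree encoding, the matching test, and the function name are simplified assumptions made for the example; only the CONTAIN schema itself comes from the description of LUNAR above.

```python
# Sketch of a LUNAR-style pattern -> action interpretation rule.
# The dict-based parse-tree encoding is an invention for this
# example; LUNAR's actual rule machinery was considerably richer.

def interpret(tree):
    """Apply the CONTAIN rule: a declarative whose verb is have or
    contain and whose subject names a sample is interpreted as an
    instance of the schema (CONTAIN x y)."""
    subj = tree["NP"]
    verb = tree["VP"]["V"]
    obj  = tree["VP"]["NP"]
    if verb in ("have", "contain") and subj.startswith("sample"):
        return ("CONTAIN", subj, obj)
    return None  # no rule pattern matched

parse = {"NP": "sample24",
         "AUX": ("TENSE", "PRESENT"),
         "VP": {"V": "contain", "NP": "silicon"}}
print(interpret(parse))   # -> ('CONTAIN', 'sample24', 'silicon')
```

A fuller version would also test that the object is a chemical element, isotope, or oxide, as the rule description requires.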

12.6 NATURAL LANGUAGE GENERATION

It is sometimes claimed that language generation is the exact inverse of language
understanding. While it is true the two processes have many differences, it is an
oversimplification to claim that they are exact opposites.
The generation of natural language is more difficult than understanding it,
since a system must not only decide what to say, but how the utterances should be
stated. A generation system must decide which form is better (active or passive),
which words and structures best express the intent, and when to say what. To

produce expressions that are natural and close to humans requires more than rules
of syntax, semantics, and discourse. In general, it requires that a coherent plan be
developed to carry out multiple goals. A great deal of sophistication goes into the
simplest types of utterances when they are intended to convey different shades of
meanings and emotions. A participant in a dialog must reason about a hearer's
understanding and his or her knowledge and goals. During the dialog, the system
must maintain proper focus and formulate expressions that either query, explain,
direct, lead, or just follow the conversation as appropriate.
The study of language generation falls naturally into three areas: (1) the determi-
nation of content, (2) formulating and developing a text utterance plan, and (3)
achieving a realization of the desired utterances.
Content determination is concerned with what details to include in an explana-
tion, a request, a question, or an argument in order to convey the meanings set forth
by the goals of the speaker. This means the speaker must know what the hearer
already knows, what the hearer needs to know, and what the hearer wants to know.
These topics are related to the domain, task, and discourse context described above.
Text planning is the process of organizing the content to be communicated so as to
best achieve the goals of the speaker. Realization is the process of mapping the
organized content to actual text. This requires that specific words and phrases be
chosen and formulated into a syntactic structure.
Until about 1980, not much work had been done beyond single sentence genera-
tion. Understanding and generation was performed with a single piece of isolated
text without much regard given to context and consideration of the hearer. Following
this early work, a few comprehensive systems were developed. To complete this
section, we describe the basic ideas behind two of these systems. They take different
approaches from those used in the lexical and compositional semantic analyses
described in the previous section.

Language Planning and Generation with KAMP

KAMP is a knowledge and modalities planner developed for the generation of natural
language text. Developed by Douglas Appelt (1985), KAMP simulates the behavior
of an expert robot named Rob (a terminal) assisting John (a person) in the disassembly
and repair of air compressors.
KAMP uses a planner and a data base of knowledge in (modal) logical form.
The knowledge includes domain knowledge, world knowledge, linguistic knowledge,
and knowledge about the hearer. A description of actions and action summaries
are available to the planner. Given a goal, the planner uses heuristics to build and
refine a plan in the form of a procedural network. Other procedures act as critics
of the plans and help to refine them. If a plan is completed, a deduction system is
used to prove that the sequence of actions do, in fact, achieve the goal. If the plan
fails, the planner must do further searching for a sequence of actions that will
work. A completed plan states the knowledge and intentions of the agent, the robot

Rob. This is the first step in producing the output text. The process can be summarized
as follows.
Suppose KAMP has determined the immediate goal to be the removal of the
compressor pump from the platform.

TRUE(¬ATTACHED(pump platform))

KAMP first formulates and refines a plan that John adopt Rob's plan to remove
the pump from the platform. The first part of Rob's plan suggests a request for
John to remove the pump, leading to the expression

INTENDS(john, REMOVE(pump platform))

After axioms are used to prove that actions in the initial summary plan are
successful, the request is expanded to include details for the pump removal. Rob
decides that John will know he is near the platform and that he knows where the
toolbox is located, but that he does not know what tool to use. Rob, therefore,
determines that John will not need to be told about the platform, but that he must
be informed, with an imperative statement, to remove the pump with a wrench in
the toolbox.

INTENDS(john, REMOVE(pump platform))

LOCATION(john) = LOCATION(platform)

DO(john, REMOVE(pump platform))

DO(rob, REQUEST(john, REMOVE(pump platform)))

DO(rob, COMMAND(john, REMOVE(pump platform)))

DO(rob, INFORM(john, TOOL(wrench)))

DO(rob, INFORM(john, LOCATION(wrench) =
               LOCATION(tool-box)))

The next step is for Rob to plan speech acts to realize the request. This
requires linguistic knowledge of the structure to use for an imperative request, in
this case, that the sentence should have the form V NP (PP)* (recall that * stands
for optional repetition). Words to complete the output string are then selected and
ordered accordingly.

DO(rob, REQUEST(john, REMOVE(pump platform)))

DO(rob, COMMAND(john, REMOVE(pump platform)))
DO(rob, INFORM(john, TOOL(wrench)))
DO(rob, INFORM(john, LOCATION(wrench) =
               LOCATION(tool-box)))


This leads to the generation of a sentence with the following tree structure.

(S (VP (V remove)
       (NP (DET the) (N pump))
       (PP (P with)
           (NP (DET the) (N wrench)
               (PP (P in)
                   (NP (DET the) (N toolbox)))))))

The overall process of planning and formulating the final sentence "Remove
the pump with the wrench in the toolbox" is very involved and detailed. It requires
planning and plan verification for content, selecting the proper structures, selecting
senses, mood, tense, the actual words, and a final ordering. All of the steps must
be constrained toward the realization of the (possibly multiple) goals set forth. It is
truly amazing we accomplish such acts with so little effort.

Generation from Conceptual Dependency Structures

Neil Goldman (Schank et al., 1973) developed a generation component called BABEL
which was used as part of several language understanding systems built by Schank
and his students (SAM, MARGIE, QUALM, and so on). This component worked
in conjunction with an inference component to determine responses to questions
about short news and other stories.
Given the general content or primitive event for the response, BABEL selects
and builds an appropriate conceptual dependency structure which includes the intended
word senses. A modified ATN is then used to generate the actual word string for
output.
To determine the proper word sense, BABEL uses a discrimination net. For
example, suppose the system is told a story about Joe going into a fast-food restaurant,
ordering a sandwich and a soft drink in a can, paying, eating, and then leaving.
After the understanding part of the system builds the conceptual dependency and
script structures for the story, questions about the events could be posed. If asked
what Joe had in the restaurant, BABEL would first need to determine the conceptual

Figure 12.14 Discrimination net for INGEST. (The net is a tree of yes/no tests,
such as whether the object is taken through the mouth, is a fluid, is air, or is
smoke; its leaves select among the word senses "ingest," "eat," "drink,"
"smoke," "inhale," and "breathe.")

category of the question in order to select the proper conceptual dependency pattern
to build. The verb in the query determines the appropriate primitive categories of
eat and drink as being INGEST. To determine the correct senses of INGEST as eat
and drink, a discrimination net like that depicted in Figure 12.14 would be used. A
traversal of the discrimination net leads to eat and drink, using the relations from
have that a sandwich is taken through the mouth and a soft drink is a fluid.
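The traversal itself is mechanical, as the following Python sketch shows. The questions, their ordering, and the leaf senses below are a loose reconstruction of Figure 12.14, not BABEL's actual tests.

```python
# Sketch of a BABEL-style discrimination net for INGEST.
# Each internal node is (question, no_subtree, yes_subtree);
# leaves are word-sense strings. The net shape is illustrative.

NET = ("object a fluid?",
       ("taken through the mouth?", "ingest", "eat"),
       ("object smoke?",
        ("object air?", "drink", "breathe"),
        "smoke"))

def choose_sense(node, facts):
    """Descend the net, answering each question from a dict of
    known object properties, until a word sense is reached."""
    while not isinstance(node, str):
        question, no_branch, yes_branch = node
        node = yes_branch if facts.get(question, False) else no_branch
    return node

# A soft drink is a fluid but not smoke or air -> "drink".
soda_facts = {"object a fluid?": True}
# A sandwich is taken through the mouth, not a fluid -> "eat".
sandwich_facts = {"taken through the mouth?": True}
print(choose_sense(NET, soda_facts))      # -> drink
print(choose_sense(NET, sandwich_facts))  # -> eat
```

With no properties known, the traversal falls through to the generic sense "ingest," which matches the net's role of picking the most specific sense the facts support.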
Once a conceptual dependency framework has been selected, the appropriate
words must be chosen and the slots filled. Functions are used to operate on the net
to complete it syntactically to obtain the correct tense, mood, form, and voice.
When completed, a modified ATN is then used to transform the conceptual dependency
structure into a surface sentence structure for output.
The final conceptual dependency structure passed to the ATN would appear
as follows.

(The figure here shows the conceptual dependency graph: joe doubly linked to
INGEST in the past tense, with object soft-drink contained in a can, and an
instrumental conceptualization in which joe MOVEs the soft drink from the can
to his mouth.)

An ATN used for text generation differs from one used for analysis. In particular,
the registers and arcs must be different. The value of the register contents (denoted
as * in the previous section) corresponds to a node or arc in the conceptual dependency

(or other type) network rather than the next word in the input sentence. Registers
will be present to hold tense, voice, and the like. For example, a register named
FORM might be set to past and a register VOICE set to active when generating an
active sentence like "Joe bought candy." Following an arc such as a CAT/V arc
means there must be a word in the lexicon corresponding to the node in the conceptual
dependency. The tense of the word then follows from the FORM register contents.
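A minimal Python sketch of this register-driven word choice follows. The tiny lexicon, the FORM register convention, and the flat word ordering are all invented for illustration; a real generation ATN traverses arcs and builds a full syntactic structure before emitting words.

```python
# Sketch of register-driven surface generation from a CD node.
# The lexicon keys (primitive act plus object kind) and the
# inflection table are hypothetical stand-ins for a real lexicon.

LEXICON = {("INGEST", "solid"): "eat", ("INGEST", "fluid"): "drink"}
PAST = {"eat": "ate", "drink": "drank"}

def realize(cd, registers):
    """Follow a CAT/V-style arc: look up a word for the CD action
    node, then inflect it according to the FORM register."""
    word = LEXICON[(cd["ACTION"], cd["OBJECT"]["kind"])]
    if registers.get("FORM") == "past":
        word = PAST[word]
    return " ".join([cd["ACTOR"], word, cd["OBJECT"]["name"]])

cd = {"ACTION": "INGEST", "ACTOR": "Joe",
      "OBJECT": {"name": "a soda", "kind": "fluid"}}
print(realize(cd, {"FORM": "past", "VOICE": "active"}))
# -> Joe drank a soda
```

The point of the sketch is only that the word hanging off the CD node, plus the FORM and VOICE registers, jointly determine the surface form; the ordering here is fixed, whereas a VOICE register set to passive would reorder the constituents.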

12.7 NATURAL LANGUAGE SYSTEMS

In this section, we briefly describe a few of the more successful natural language
understanding systems. They include LUNAR, LIFER, and SHRDLU.

The LUNAR System

The LUNAR system was designed as a language interface to give geologists direct
access to a data base containing information on lunar rock and soil compositions
obtained during the NASA Apollo 11 moon landing mission. The design objective
was to build a system that could respond to natural queries received from geologists
such as

What is the average concentration of aluminum in high-alkali rocks?
What is the average age of the basalt?
In which samples has apatite been identified?

LUNAR has three main components:

1. A general purpose grammar and an ATN parser capable of handling a large
subset of English. This component produces a syntactic parse tree of the input
sentence.
2. A rule-driven semantic interpreter which transforms the syntactic representation
into a logical form suitable for querying the data base. The rules have the
general form of pattern-action as described in Section 12.5. The rules
produce disposable programs to carry out different tasks such as answering a
query.
3. A data base retrieval and inference component which is used to determine
answers to queries and to make changes to the data base.

The system has a dictionary of some 3500 words, an English grammar, and
two data bases. One data base contains a table of chemical analyses of about 13,000
entries, and the other contains 10,000 indexed document topics. LUNAR uses a
meaning representation language which is an extended form of FOPL. The language
uses (1) designators which name objects or classes of objects like nouns, variables,
and classes with range quantifiers, (2) propositions that can be true or false, that
are connected with logical operators and, or, not, and quantification identifiers,

and (3) commands which carry out specific actions, like TEST, which tests the
truth value of propositions against given arguments, as in (TEST (CONTAIN sample24
silicon)).
Although never fully implemented, the LUNAR project was considered an
operational success since it related to a real world problem in need of a solution.
It failed to parse or find the correct semantic interpretation on only about 10% of
the questions presented to it.

The LIFER System

LIFER (Language Interface Facility with Ellipsis and Recursion) was described
briefly in Section 12.2 under semantic grammars. It was developed by Gary Hendrix
(1978) and his associates to be used as a development aid and run-time language
interface to other systems such as a data base management system. Among its
special features are spelling correction, processing of elliptical inputs, and the
ability of the run-time user to extend the language through the use of paraphrase.
LIFER consists of two major components, a set of interactive functions for
language specifications and a parser. The specification functions are used to define
an application language as a subset of English that is capable of interacting with
existing software. Given the language specification, the parser interprets the language
inputs and translates them into appropriate structures that interact with the application
software.
In using a semantic grammar, LIFER systems incorporate much semantic infor-
mation within the syntax. Rather than using categories like NP, VP, N, and V,
LIFER uses semantic categories like <SHIP-NAME> and <ATTRIBUTE> which
match ship names or attributes. In place of syntactic patterns like NP VP, semantic
patterns like What is the <ATTRIBUTE> of <SHIP>? are used. For each such
pattern, the language definer supplies an expression with which to compute the
interpretations of instances of the pattern. For example, if LIFER were used as the
front end for a database query system, the interpretation would be a database
retrieval command.
LIFER has proven to be effective as a front end for a number of systems.
The main disadvantage, as noted earlier, is the potentially large number of patterns
that may be required for a system which requires many diverse patterns.
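The pattern idea can be sketched in Python with a single hard-wired pattern. The regular expression, the attribute list, and the ship table below are hypothetical stand-ins for LIFER's interactive language-specification machinery; the sketch shows only how one semantic pattern maps an instance directly to a retrieval command.

```python
# Sketch of LIFER-style semantic-pattern matching: the pattern
# "What is the <ATTRIBUTE> of <SHIP>?" is compiled into a single
# regular expression whose slots are checked against semantic
# categories. All names and data here are invented for illustration.
import re

ATTRIBUTES = ("length", "speed", "captain")   # <ATTRIBUTE> category
SHIPS = ("kennedy", "constellation")          # <SHIP> category

def parse_query(text):
    """Match the semantic pattern and build a retrieval command."""
    m = re.fullmatch(r"what is the (\w+) of the (\w+)\?",
                     text.strip().lower())
    if m and m.group(1) in ATTRIBUTES and m.group(2) in SHIPS:
        return ("RETRIEVE", m.group(2), m.group(1))
    return None  # the utterance matched no defined pattern

cmd = parse_query("What is the length of the Kennedy?")
print(cmd)   # -> ('RETRIEVE', 'kennedy', 'length')
```

Because every distinct sentence form needs its own pattern, the disadvantage noted above is visible even in this toy: each new way of phrasing the question requires another regular expression.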

The SHRDLU System

SHRDLU was developed by Terry Winograd as part of his doctoral work at
M.I.T. (1972, 1986). The system simulates a simple robot arm that manipulates
blocks on a table. During a dialog, which is interactive, the system can be asked to
manipulate the block objects and build stacks or put things into a box. It can be
questioned about the configuration of things on the table, about events that have
transpired during the dialog, and even about its reasoning. It can also be told facts
which are added to its knowledge base for later reasoning.
The unique aspect of the system is that the meanings of words and phrases
are encoded into procedures that are activated by input sentences. Furthermore, the

syntactic and semantic analysis, as well as the reasoning process, are more closely
integrated.
The system can be roughly divided into four component domains: (1) a syntactic
parser which is governed by a large English (systemic type) grammar, (2) a semantic
component of programs that interpret the meanings of words and structures, (3) a
cognitive deduction component used to examine consequences of facts, carry out
commands, and find answers, and (4) an English response generation component.
In addition, there is a knowledge base containing blocks world knowledge, and a
model of its own reasoning process, used to explain its actions.
Knowledge is represented with FOPL-like statements which give the state of
the world at any particular time and procedures for changing and reasoning about
the state. For example, the expressions

(IS b1 block)
(IS b2 pyramid)
(AT b1 (LOCATION 120 120 0))
(SUPPORT b1 b2)
(CLEARTOP b2)
(MANIPULATE b1)
(IS blue color)

contain facts describing that b1 is a block, b2 is a pyramid, and b1 supports b2.
There are also procedural expressions to perform different tasks such as clear the
top or manipulate an object. The CLEARTOP expression is essentially a procedure
that first checks to see if the object X supports an object Y. If so, it goes to GET-
RID-OF Y and checks again. Integrating the parts of the understanding process
with procedural knowledge has resulted in an efficient and effective understanding
system. Of course, the domain of SHRDLU is very limited and closed, greatly
simplifying the problem.
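The CLEARTOP loop just described can be sketched in Python. The world model (a dictionary of support relations) and the GET-RID-OF behavior (simply moving the obstructing object to the table) are invented for the example; SHRDLU's own procedures were written in PLANNER-style code.

```python
# Sketch of SHRDLU's CLEARTOP idea: while some object Y rests on X,
# get rid of Y, then check again. The world model below is a toy
# invented for this illustration.

world = {"supports": {"b1": ["b2"], "b2": []},   # b1 supports b2
         "on_table": ["b1"]}

def get_rid_of(y):
    """Move Y onto the table, recursively clearing Y's own top first."""
    cleartop(y)
    for supported in world["supports"].values():
        if y in supported:
            supported.remove(y)
    world["on_table"].append(y)

def cleartop(x):
    """Ensure nothing rests on X, as in the procedural expression
    described above: check, GET-RID-OF, and check again."""
    while world["supports"][x]:          # does X still support some Y?
        get_rid_of(world["supports"][x][0])

cleartop("b1")   # afterwards b1 supports nothing; b2 sits on the table
```

The mutual recursion between cleartop and get_rid_of mirrors the check-act-recheck loop in the text: clearing b1 forces b2 onto the table, and clearing a taller stack would cascade in the same way.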

12.8 SUMMARY

Understanding and generating human language is a difficult problem. It requires a
knowledge of grammar and language, of syntax and semantics, of what people
know and believe, their goals, the contextual setting, pragmatics, and world knowl-
edge.
We began this chapter with an overview of topics in linguistics, including
sentence types, word functions, and the parts of speech. The different forms of
knowledge used in natural language understanding were then presented: phonological,
morphological, syntactic, semantic, pragmatic, and world. Three general approaches
have been followed in developing natural language systems: keyword and pattern
matching, syntactic and semantic directed analysis, and matching real world scenarios.
Most of the material in this chapter followed the syntactic and semantic directed
approach.

Grammars were formally introduced, and the Chomsky hierarchy was presented.
This was followed with a description of structural representations for sentences,
the phrase marker. Four additional extended grammars were briefly described. One
was the transformational grammars, an extension of generative grammars. Transfor-
mational grammars include tree manipulation rules that permit the construction of
deeper semantic structures than the generative grammars. Case, semantic, and sys-
temic grammars were given as examples of grammars that are also more semantically
oriented than the generative grammars.
Lexicons were described, and the role they play in NL systems given. Basic
parsing techniques were examined. We looked at simple transition networks, recursive
transition networks, and the versatile ATN. The ATN includes tests and actions as
part of the arc components and special registers to help in building syntactic structures.
With an ATN, extensive semantic analysis is even possible. We defined top-down,
bottom-up, deterministic, and nondeterministic parsing methods, and an example
of a simple PROLOG parser was also discussed.
We next looked at the semantic interpretation process and discussed two broad
approaches, namely the lexical and compositional semantic approaches. These ap-
proaches are also identified with the type of target knowledge structures generated.
In the compositional semantics approach, logical forms were generated, whereas in
the lexical semantics approach, conceptual dependency or similar network structures
are created.
Language generation is approximately the opposite of the understanding analysis
process, although more difficult. Not only must a system decide what to say but
how to say it. Generation falls naturally into three areas: content determination,
text planning, and text realization. Two general approaches were presented. They
are like the inverses of the lexical and compositional semantic analysis processes.
The KAMP system uses an elaborate planning process to determine what, when,
and how to state some concepts. The system simulates a robot giving advice to a
human helper in the repair of air compressors. At the other extreme, the BABEL
system generates output text from conceptual dependency and script structures.
We concluded the chapter with a look at three systems of somewhat disparate
architectures: the LUNAR, LIFER, and SHRDLU systems. These systems typify
the state of the art in natural language processing systems.

EXERCISES

12.1. Derive a parse tree for the sentence "Bill loves the frog," where the following
rewrite rules are used.

S → NP VP
NP → N
NP → DET N
VP → V NP

DET → the
V → loves
N → bill | frog

12.2. Develop a parse tree for the sentence "Jack slept on the table" using the following
rules.

S → NP VP
NP → N
NP → DET N
VP → V PP
PP → PREP NP
N → jack | table
V → slept
DET → the
PREP → on

12.3. Give an example of each of the four types 0, 1, 2, and 3 of Chomsky's hierarchy
of grammars.
12.4. Modify the grammar of Problem 12.1 to allow the NP (noun phrase) to have zero
to many adjectives.
12.5. Explain the main differences between the following three grammars and describe
the principal features that could be used to develop specifications for a syntactical
recognition program. Consult additional references for more details regarding each
grammar.
Chomsky's Transformational Grammar
Fillmore's Case Grammar
Systemic Grammars
12.6. Draw an ATN to implement the grammar of Problem 12.1.
12.7. Given the following parse tree, write down the corresponding context free grammar.

(S (NP (DET the) (ADJ silly) (N robot))
   (VP (V moved)
       (NP (DET the) (ADJ red) (N pyramid))
       (PP (PREP to)
           (NP (DET the) (ADJ big) (N table)))))



12.8. Create a LISP data structure to model a simple lexicon similar to the one depicted
in Figure 12.6.
12.9. Write a LISP match program which checks an input sentence for matching words in
the lexicon of the previous problem.
12.10. Derive an ATN for the parse tree of Problem 12.7.
12.11. Derive an ATN graph to implement the parse tree of Problem 12.
12.12. Determine if the following sentences will be accepted by the grammar of Problem
12.6.
(a) The green green grass of the home.
(b) The red car drove in the fast lane.
12.13. Write PROLOG rules to implement the grammar used to derive the parse tree of
Problem 12.7. Omit rules for the individual word categories (like noun([ball . . .)).
Generate a syntax tree using one output parameter.
12.14. Write a PROLOG program that will take grammar rules in the following format:

NT → (NT | T)*

where NT is any nonterminal, T is any terminal, and the Kleene star signifies any
number of repetitions, and generate the corresponding top-down parser; that is,

sentence → noun_phrase, verb_phrase
determiner → [the]

will generate the following:

sentence(A,C) :- noun_phrase(A,B), verb_phrase(B,C).
determiner([the|X],X).

12.15. Modify the program in Problem 12.14 to accept extra arguments used to return
meaningful knowledge structures.

sentence(sentence(NP,VP)) --> noun_phrase(NP), verb_phrase(VP).

12.16. Write a LISP program which uses property lists to create the recursive transition network depicted in Figure 12.9. Each node should be given a name such as S1, N1, and P1 and associated with a list of arc and node pairs emanating from the node.
12.17. Write a recursive program in LISP which tests input sentences for the RTN developed in the previous problem. The program should return t if the sentence is acceptable, and nil if not.
12.18. Modify the program of Problem 12.17 to accept sentences of the type depicted in Figure 12.12.
12.19. Write an ATN type of program as depicted in Figure 12.12 which builds structures like those of Figure 12.13.
12.20. Describe in detail the differences between language understanding and language generation. Explain the problems in developing a program which is capable of carrying on a dialog with a group of people.



12.21. Give the processing steps required and corresponding data structures needed for a
robot named Rob to formulate instructions for a helper named John to complete a
university course add-drop request form.
12.22. Give the conceptual dependency graph for the sentence "Mary drove her car to
school" and describe the steps required for a program to transform the sentence to
an internal conceptual dependency structure.
13

Pattern Recognition

One of the most basic and essential characteristics of living things is the ability to recognize and identify objects. Certainly all higher animals depend on this ability for their very survival. Without it they would be unable to function even in a static, unchanging environment.

In this chapter we consider the process of computer pattern recognition, a process whereby computer programs are used to recognize various forms of input stimuli such as visual or acoustic (speech) patterns. This material will help to round out the topic of natural language understanding when speech, rather than text, is the language source. It will also serve as an introduction to the following chapter, where we take up the general problem of computer vision.

Although some researchers feel that pattern recognition should no longer be considered a part of AI, we believe many topics from pattern recognition are essential to an understanding and appreciation of important concepts related to natural language understanding, computer vision, and machine learning. Consequently, we have included in this chapter a selected number of those topics believed to be important.


13.1 INTRODUCTION

Recognition is the process of establishing a close match between some new stimulus and previously stored stimulus patterns. This process is being performed continually throughout the lives of all living things. In higher animals this ability is manifested in many forms at both the conscious and unconscious levels, for both abstract as well as physical objects. Through visual sensing and recognition, we identify many special objects, such as home, office, school, restaurants, faces of people, handwriting, and printed words. Through aural sensing and recognition, we identify familiar voices, songs and pieces of music, and bird and other animal sounds. Through touch, we identify physical objects such as pens, cups, automobile controls, and food items. And through our other senses we identify foods, fresh air, toxic substances, and much else.

At more abstract levels of cognition, we recognize or identify such things as ideas (electromagnetic radiation phenomena, the model of the atom, world peace), concepts (beauty, generosity, complexity), procedures (game playing, making a bank deposit), plans, old arguments, metaphors, and so on.
Our pervasive use of and dependence on our ability to recognize patterns has motivated much research toward the discovery of mechanical or artificial methods comparable to those used by intelligent beings. The results of these efforts to date have been impressive, and numerous applications have resulted. Systems have now been developed to reliably perform character and speech recognition; fingerprint and photograph identification; electroencephalogram (EEG), electrocardiogram (ECG), oil well log, and other graphical pattern analyses; various types of medical and system diagnoses; resource identification and evaluation (geological, forestry, hydrological, crop disease); and detection of explosive and hostile threats (submarine, aircraft, missile), to name a few.

Object classification is closely related to recognition. The ability to classify or group objects according to some commonly shared features is a form of class recognition. Classification is essential for decision making, learning, and many other cognitive acts. Like recognition, classification depends on the ability to discover common patterns among objects. This ability, in turn, must be acquired through some learning process. Prominent feature patterns which characterize classes of objects must be discovered, generalized, and stored for subsequent recall and comparison.
We do not know exactly how humans learn to identify or classify objects; however, it appears the following processes take place:

New objects are introduced to a human through activation of sensory stimuli. The sensors, depending on their physical properties, are sensitive in varying degrees to certain attributes which serve to characterize the objects, and the sensor output tends to be proportional to the more prominent attributes. Having perceived a new object, a cognitive model is formed from the stimuli patterns and stored in memory. Recurrent experiences in perceiving the same or similar objects strengthen and refine the similarity
patterns. Repeated perception results in the formation of generalized or archetype models of object classes which become useful in matching, and hence recognition, of similar objects.

13.2 THE RECOGNITION AND CLASSIFICATION PROCESS

In artificial or mechanical recognition, essentially the same steps as noted above must be performed. These steps are illustrated in Figure 13.1 and summarized below:

Step 1. Stimuli produced by objects are perceived by sensory devices. The more prominent attributes (such as size, shape, color, and texture) produce the strongest stimuli. The values of these attributes and their relations are used to characterize an object in the form of a pattern vector X, as a string generated by some grammar, as a classification tree, a description graph, or some other means of representation. The range of characteristic attribute values is known as the measurement space M.
Step 2. A subset of attributes whose values provide cohesive object grouping or clustering, consistent with some goals associated with the object classifications, is selected. Attributes selected are those which produce high intraclass and low interclass groupings. This subset represents a reduction in the attribute space dimensionality and hence simplifies the classification process. The range of the subset of attribute values is known as the feature space F.
Step 3. Using the selected attribute values, object or class characterization models are learned by forming generalized prototype descriptions, classification rules, or decision functions. These models are stored for subsequent recognition. The range of the decision function values or classification rules is known as the decision space D.
Step 4. Recognition of familiar objects is achieved through application of the rules learned in Step 3 by comparison and matching of object features with the stored models. Refinements and adjustments can be performed continually thereafter to improve the quality and speed of recognition.

There are two basic approaches to the recognition problem: (1) the decision-theoretic approach and (2) the syntactic approach.

Figure 13.1 The pattern recognition process.


Decision Theoretic Classification

The decision-theoretic approach is based on the use of decision functions to classify objects. A decision function maps pattern vectors X into decision regions of D. More formally, this problem can be stated as follows.

1. Given a universe of objects O = {o1, o2, . . . , on}, let each oi have k observable attributes and relations expressible as a vector V = (v1, v2, . . . , vk).

2. Determine (a) a subset of m ≤ k of the vi, say X = (x1, . . . , xm), whose values uniquely characterize the oi, and (b) c ≥ 2 groupings or classifications of the oi which exhibit high intraclass and low interclass similarities such that a decision function d(X) can be found which partitions D into c disjoint regions. The regions are used to classify each oi as belonging to at most one of the c classes.

Determining the feature attributes and decision regions requires stipulating or learning a mapping from the measurement space M to the feature space F and then a mapping from F to the classification or decision space D:

M → F → D

When there are only two classes, say C1 and C2, the values of the objects' pattern vectors may tend to cluster into two disjoint groups. In this case, a linear decision function d(X) can often be used to determine an object's class. For example, when the classes are clustered as depicted in Figure 13.2, a linear decision function d is adequate to classify unknown objects as belonging to either C1 or C2, where

d(X) = w1x1 + w2x2 + w3
The constants w in d are parameters or weights that are adjusted to find a


separating line for the classes. When a function such as d is used, an object is classified as belonging to C1 if its pattern vector is such that d(X) < 0, and as
belonging to class C2 when d(X) > 0. When d(X) = 0 the classification is indeterminate, so either (or neither) class may be selected.

Figure 13.2 A linear decision function.
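The two-class rule can be sketched in a few lines of Python. The weights below are illustrative values chosen by hand for a hypothetical separating line x1 + x2 - 5 = 0; in practice they would be adjusted during learning:

```python
def linear_decision(x, w):
    """Evaluate d(X) = w1*x1 + w2*x2 + w3 for a 2-D pattern vector x."""
    return w[0] * x[0] + w[1] * x[1] + w[2]

def classify(x, w):
    """Assign class C1 when d(X) < 0, C2 when d(X) > 0; indeterminate at 0."""
    d = linear_decision(x, w)
    if d < 0:
        return "C1"
    if d > 0:
        return "C2"
    return "indeterminate"

# Hypothetical weights for the separating line x1 + x2 - 5 = 0.
w = (1.0, 1.0, -5.0)
print(classify((1.0, 2.0), w))   # d = -2 -> C1
print(classify((4.0, 3.0), w))   # d =  2 -> C2
```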
When class reference vectors (prototypes) Rj, j = 1, . . . , c, are available, decision functions can be defined in terms of the distance of X from the reference vectors. For example, the distance

dj(X) = (X − Rj)'(X − Rj)

could be computed for each class Cj, and class Ck would then be chosen when dk = min_j {dj(X)}.

For the general case of c ≥ 2 classes, C1, C2, . . . , Cc, a decision function may be defined for each class: d1, d2, . . . , dc. A class decision rule in this case would be defined to select class Cj when

dj(X) < di(X) for i, j = 1, 2, . . . , c, and i ≠ j.
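A minimal sketch of this minimum-distance rule; the two prototype vectors and the sample points below are assumptions for illustration only:

```python
def sq_distance(x, r):
    """Squared Euclidean distance (X - Rj)'(X - Rj)."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r))

def nearest_prototype(x, prototypes):
    """Choose the class label k whose reference vector Rk minimizes dj(X)."""
    return min(prototypes, key=lambda label: sq_distance(x, prototypes[label]))

# Hypothetical class prototypes (illustrative values, not learned).
prototypes = {"C1": (0.0, 0.0), "C2": (5.0, 5.0)}
print(nearest_prototype((1.0, 1.0), prototypes))  # -> C1
```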

When a line d (or, more generally, a hyperplane in n-space) can be found that separates classes into two or more groups, as in the case of Figure 13.2, we say the classes are linearly separable. Classes that overlap each other or surround one another, as in Figure 13.3, cannot generally be classified with the use of simple linear decision functions. For such cases, more general nonlinear (or piecewise linear) functions may be required. Alternatively, some other selection technique (like heuristics) may be needed.
The decision function approach described above is an example of deterministic recognition since the xi are deterministic variables. In cases where the attribute values are affected by noise or other random fluctuations, it may be more appropriate to define probabilistic decision functions. In such cases, the attribute vectors X are treated as random variables, and the decision functions are defined as measures of likelihood of class inclusion. For example, using Bayes' rule, one can compute the conditional probability P(Ci|X) that the class of an object oi is Ci given the observed value of X for oi. This approach requires a knowledge of the prior probability P(Ci), the probability of the occurrence of samples from Ci, as well as P(X|Ci).
Figure 13.3 Examples of nonlinearly separable classes.

(Note that the Ci are treated like random variables here. This is equivalent to the assumption made in Bayesian classification where the distribution parameter θ is assumed to be a random variable, since Ci may be regarded as a function of θ.) A decision rule for this case is to choose class Cj if

P(Cj|X) > P(Ci|X) for all i ≠ j.
A more comprehensive probabilistic approach is one which is based on the use of a Bayesian loss or risk function, where the class is chosen on the basis of minimum loss or risk. Let the loss function Lij denote the loss incurred by incorrectly classifying an object actually belonging to class Ci as belonging to Cj. When Lij is a constant for all i, j, i ≠ j, a decision rule can be formulated using the likelihood ratio defined as (see Chapter 6)

P(X|Ck) / P(X|Cj)

The rule is to choose class Ck whenever the relation

P(X|Ck) / P(X|Cj) > P(Cj) / P(Ck)

holds for all j ≠ k.
Probabilistic decision rules may be constructed as either parametric or nonparametric, depending on whether the forms of the distributions are known. For a comprehensive treatment of these methods see (Duda and Hart, 1973) or (Tou and Gonzalez, 1974).
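The posterior rule P(Cj|X) > P(Ci|X) can be sketched as follows. The Gaussian class-conditional densities, their means and variances, and the equal priors are all illustrative assumptions, not part of the text:

```python
import math

def posterior(x, classes):
    """P(Ci|X) via Bayes' rule: p(X|Ci) * P(Ci), normalized over all classes.
    `classes` maps a label to a (likelihood_fn, prior) pair."""
    joint = {c: lik(x) * prior for c, (lik, prior) in classes.items()}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}

def bayes_classify(x, classes):
    """Choose the class Cj with the largest posterior P(Cj|X)."""
    post = posterior(x, classes)
    return max(post, key=post.get)

def gaussian(mu, sigma):
    """1-D normal density, used here as an assumed class-conditional p(X|Ci)."""
    return lambda x: math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical two-class problem: means, variances, and priors are assumptions.
classes = {"C1": (gaussian(0.0, 1.0), 0.5), "C2": (gaussian(3.0, 1.0), 0.5)}
print(bayes_classify(0.2, classes))  # -> C1
```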

Syntactic Classification

The syntactic recognition approach is based on the uniqueness of syntactic "structure" among the object classes. With this approach, a grammar similar to the grammars defined in Chapter 10 or the generative grammars of Chapter 12 is defined for object descriptions. Instead of defining the grammar in terms of an alphabet of characters or terminal words, the vocabulary is based on shape primitives. For example, the objects depicted in Figure 13.4 could be defined using the grammar G(VN, VT, P, S), where the terminals VT consist of the following shape primitives.

[Diagram of the shape-primitive terminals of VT: short directed line and curve segments.]

Figure 13.4 Syntactic characterization of objects.

Using syntactic analysis, that is, parsing and analyzing the string structures, classification is accomplished by assigning an object to class Cj when the string describing it has been generated by the grammar Gj. This requires that the string be recognized as a member of the language L(Gj). If there are only two classes, it is sufficient to have a single grammar G (two grammars are needed when strings of neither class can occur).

When classification for c ≥ 2 classes is required, c − 1 (or c) different grammars are needed for class recognition. The decision functions in this case are based on grammar recognition functions which choose class Ci if the pattern string is found to be generated by grammar Gi, that is, if it is a member of L(Gi). Patterns not recognized as a member of a defined language are indeterminate.
When patterns are noisy or subject to random fluctuations, ambiguities may occur since patterns belonging to different classes may appear to be the same. In such cases, stochastic or fuzzy grammars may be used. Classification for these cases may be made on the basis of the least cost to transform an input string into a valid recognizable string, by the degree of class set inclusion, or with a similarity measure using one of the methods described in Chapter 10.
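As a rough sketch, the membership test "the string is in L(Gj)" can be approximated with regular grammars. The two grammars below are hypothetical regular expressions over primitive symbols; they are not the grammars of Figure 13.4:

```python
import re

# Hypothetical class grammars expressed as regular expressions over
# shape-primitive symbols (the symbols a-h are placeholders).
grammars = {
    "C1": re.compile(r"^(ab)+c$"),   # strings like 'abc', 'ababc'
    "C2": re.compile(r"^e(gh)+f$"),  # strings like 'eghf', 'eghghf'
}

def syntactic_classify(pattern_string):
    """Assign class Cj when the primitive string is a member of L(Gj)."""
    for label, g in grammars.items():
        if g.match(pattern_string):
            return label
    return "indeterminate"  # not generated by any class grammar

print(syntactic_classify("eghf"))   # -> C2
print(syntactic_classify("ababc"))  # -> C1
print(syntactic_classify("xyz"))    # -> indeterminate
```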

13.3 LEARNING CLASSIFICATION PATTERNS

Before a system can recognize objects, it must possess knowledge of the characteristic features for those objects. This means that the system designer must either build the necessary discriminating rules into the system or the system must learn them. In the case of a linear decision function, the weights that define class boundaries must be predefined or learned. In the case of syntactic recognition, the class grammars must be predefined or learned.
Learning decision functions, grammars, or other rules can be performed in either of two ways: through supervised learning or unsupervised learning. Supervised learning is accomplished by presenting training examples to a learning unit. The examples are labeled beforehand with their correct identities or class. The attribute values and object labels are used by the learning component to inductively extract and determine pattern criteria for each class. This knowledge is used to adjust parameters in decision functions or grammar rewrite rules. Supervised learning concepts are discussed in some detail in Part V. Therefore, we concentrate here on some of the more important notions related to unsupervised learning.
In unsupervised learning, labeled training examples are not available and little is known beforehand regarding the object population. In such cases, the system must be able to perceive and extract relevant properties from the otherwise unknown objects, find common patterns among them, and formulate descriptions or discrimination criteria consistent with the goals of the recognition process.

This form of learning is known as clustering. It is the first step in any recognition process where discriminating features of objects are not known in advance.

Learning through Clustering

Clustering is the process of grouping or classifying objects on the basis of a close association or shared characteristics. The objects can be physical or abstract entities, and the characteristics can be attribute values, relations among the objects, and combinations of both. For example, the objects might be streets, freeways, and other pathways connecting two points in a city, and the classifications, the pathways which provide fast or slow traversal between the points. At a more abstract level, the objects might be some concept such as the quality of the items purchased. The classifications in this case might be made on the basis of some subjective criteria, such as poor, average, or good.
Clustering is essentially a discovery learning process in which similarity patterns are found among a group of objects. Discovery of the patterns is usually influenced by the environment or context and motivated by some goal or objective (even if only for economy in cognition). For example, finding short-cuts between two frequently visited points is motivated by a desire to reduce the planning effort and transit time between the points. Likewise, developing a notion of quality is motivated by a desire to save time and money or to improve one's appearance.

Given different objectives, the same set of objects would, in general, be clustered differently. If the objective given above for the streets, freeways, and the like were modified to include safety for bicycle riding, a different object classification would, in general, result.
Finding the most meaningful cluster groupings among a set of unknown objects oi requires that similarity patterns be discovered in the feature space. Clustering is usually performed with the intent of capturing any gestalt properties of a group of objects and not just the commonality of certain attribute values. This is one of the basic requirements of conceptual clustering (Chapter 19), where the objects are grouped together as members of a concept class. Procedures for conceptual clustering are based on more than simple distance measures. They must also take into account the context (environment) of the objects as well as the goals or objectives of the clustering.
The clustering problem gives rise to several subproblems. In particular, before an implementation is possible, the following questions must be addressed.

1. What set of attributes and relations are most relevant, and what weights should be given to each? In what order should attributes be observed or measured? (If the observation process is sequential, ordering may influence the effectiveness of the attributes in discriminating among objects.)

2. What representation formalism should be used to characterize the objects?

3. What representation scheme should be used to describe the cluster groupings or classifications? Usually, some simplification results if the single representation trick can be used (the use of a single representation method for both object and cluster descriptions).

4. What clustering criterion is most consistent with and effective in achieving the objectives relative to the context or domain? This requires consideration of an appropriate distance or similarity measure compatible with the description domains noted in 2 and 3 above.

5. What clustering algorithms can best meet the criterion in 4 within acceptable time and space complexity bounds?
By now questions such as these should be familiar. They are by no means trivial, but they must be addressed when designing a system. They depend on many complex factors for which the tools of earlier chapters become essential. These problems have been addressed elsewhere; therefore, we focus our attention here on the clustering process.
The clustering process must be performed with a limited set of observations, and checking all possible object groupings for patterns is not feasible except with a small number of objects. This is due to the combinatorial explosion which results in arranging n objects into an unknown number m of clusters.¹ Consequently, methods which examine only the more promising groupings must be used. Establishing such groupings requires the use of some measure of similarity, association, or degree of fit among a set of objects.
When the attribute values are real valued, cluster groupings can sometimes be found with the use of point-to-point or point-to-set distances, probability measures (like using the covariance matrix between two populations), scatter matrices, the sum of squared error distance between objects, or other means (see Chapter 10). In these cases, an object is clustered in class Cj if its proximity to other members of Cj is within some threshold or limiting value.
Many clustering algorithms have been proposed for different tasks. One of the most popular algorithms, developed at the Stanford Research Institute by G. H. Ball and D. J. Hall (Anderberg, 1973), is known as the ISODATA method. This method requires that the number of clusters m be specified, and threshold values t1, t2, and t3 be given or determined for use in splitting, merging, or discarding

¹The number of ways in which n objects can be arranged into m groups is an exponential quantity,

S(n, m) = (1/m!) Σ (−1)^k C(m, k)(m − k)^n,  summed over k = 0, 1, . . . , m.

When m is unknown, the number of arrangements increases as the sum of the S(n, m) over m. For example, when n = 25, the number of arrangements is more than 10^18.

clusters, respectively. During the clustering process, the thresholds are used to determine if a cluster should be split into two clusters, merged with other clusters, or discarded (when too small). The algorithm is given with the following steps.

1. Select m samples as seed points for initial cluster centers. This can be done by taking the first m points, selecting random points, or by taking the first m points which exceed some mutual minimum separation distance d.
2. Group each sample with its nearest cluster center.
3. After all samples have been grouped, compute new cluster centers for each group. The center can be defined as the centroid (mean value of the attribute vectors) or some similar central measure.
4. If the split threshold t1 is exceeded for any cluster, split it into two parts and recompute new cluster centers.
5. If the distance between two cluster centers is less than t2, combine the clusters and recompute new cluster centers.
6. If a cluster has fewer than t3 members, discard the cluster. It is ignored for the remainder of the process.
7. Repeat steps 3 through 6 until no change occurs among cluster groupings or until some iteration limit has been exceeded.
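The steps above can be sketched in Python. This is a much-simplified version, not the original SRI implementation: the split test of step 4 is omitted, discarded clusters are regrouped on later passes rather than permanently ignored, and merging simply drops one of two close centers instead of recomputing a combined centroid. The sample points are illustrative:

```python
def centroid(points):
    """Mean vector (step 3's cluster center) of a group of 2-D samples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def isodata(samples, m, t_merge, t_discard, max_iter=20):
    """Simplified ISODATA sketch: seed m centers, regroup samples,
    recompute centroids, drop tiny clusters, merge close centers."""
    centers = list(samples[:m])              # step 1: first m points as seeds
    for _ in range(max_iter):
        groups = {i: [] for i in range(len(centers))}
        for s in samples:                    # step 2: nearest-center grouping
            i = min(groups, key=lambda i: sq_dist(s, centers[i]))
            groups[i].append(s)
        # steps 3 and 6: new centroids, discarding clusters with < t_discard members
        new_centers = [centroid(g) for g in groups.values() if len(g) >= t_discard]
        merged = []                          # step 5: merge centers closer than t_merge
        for c in new_centers:
            if all(sq_dist(c, k) >= t_merge ** 2 for k in merged):
                merged.append(c)
        if not merged or merged == centers:  # step 7: stop when nothing changes
            break
        centers = merged
    return centers

pts = [(0, 0), (0.2, 0.1), (5, 5), (5.1, 4.9), (0.1, 0.2), (4.9, 5.1)]
print(isodata(pts, m=3, t_merge=1.0, t_discard=2))
```

On the six sample points this converges to two centers, one near the origin and one near (5, 5).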

Measures for determining distances and the center location need not be based on ordered variates. They may be one of the measures described in Chapter 10 (including probabilistic or fuzzy measures) or some measure of similarity between graphs, strings, and even FOPL descriptions. In any case, it is assumed each object oi is described by a unique point or event in the feature space F.
Up to this point we have ignored the problem of attribute scaling. It is possible
that a few large valued variables may completely dominate the other variables in a
similarity measure. This could happen, for example, if one variable is measured in
units of meters and another variable in millimeters or if the range and scale of
variation for two variables are widely different. This problem is closely related to
the feature selection problem, that is, in the assignment of weights to feature variables
on the basis of their importance or relevance. One simple method for adjusting the
scales of such variables is to use a diagonal weight matrix W to transform the
representation vector X to X' = WX. Thus, for all of the measures described
above, one should assume the representation vectors X have been appropriately
normalized to account for scale variations.
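With a diagonal W, the transformation X' = WX reduces to elementwise multiplication. A minimal sketch; the meter/millimeter figures are illustrative assumptions:

```python
def scale(x, w):
    """Apply a diagonal weight matrix: X' = W X, i.e. x'_i = w_i * x_i."""
    return tuple(wi * xi for wi, xi in zip(w, x))

# Hypothetical features: x1 in meters, x2 in millimeters.
# Weighting x2 by 0.001 puts both on a common (meter) scale,
# so x2 no longer swamps x1 in a distance computation.
x = (3.0, 4200.0)
w = (1.0, 0.001)
print(scale(x, w))
```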
To summarize the above process, a subset of characteristic features which represent the oi are first selected. The features chosen should be good discriminators in separating objects from different classes, relevant, and measurable (observable) at reasonable cost. Feature variables should be scaled as noted above to prevent any swamping effect due to large-valued variables when combined. Next, a suitable metric which measures the degree of association or similarity between objects should be chosen, and an appropriate clustering algorithm selected. Finally, during the clustering process, the feature variables may need to be weighted to reflect the relative importance of the feature in affecting the clustering.

13.4 RECOGNIZING AND UNDERSTANDING SPEECH

Developing systems that understand speech has been a continuing goal of AI researchers. Speech is one of our most expedient and natural forms of communication, and so, understandably, it is a capability we would like AI systems to possess. The ability to communicate directly with programs offers several advantages. It eliminates the need for keyboard entries and speeds up the interchange of information between user and system. With speech as the communication medium, users are also free to perform other tasks concurrently with the computer interchange. And finally, more untrained personnel would be able to use computers in a variety of applications.
The recognition of continuous waveform patterns such as speech begins with sampling and digitizing the waveforms. In this case the feature values are the sampled points xi = f(ti), as illustrated in Figure 13.5.

It is known from information theory that a sampling rate of twice the highest speech frequency is needed to capture the information content of the speech waveforms. Thus, sampling requirements will normally be equivalent to 20K to 30K bytes per second. While this rate of information in itself is not too difficult to handle, this, added to the subsequent processing, does place some heavy requirements on real-time understanding of speech.
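The 20K to 30K bytes-per-second figure follows from the Nyquist rate; the 10-15 kHz band limits and one byte per sample assumed below are illustrative:

```python
def bytes_per_second(max_freq_hz, bytes_per_sample=1):
    """Nyquist rate: sample at twice the highest frequency present."""
    return 2 * max_freq_hz * bytes_per_sample

print(bytes_per_second(10_000))  # 20,000 bytes/s for a 10 kHz band
print(bytes_per_second(15_000))  # 30,000 bytes/s for a 15 kHz band
```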
Following sample digitization, the signals are processed at different levels of abstraction. The lowest level deals with phones (the smallest units of sound), allophones (variations of phonemes as they actually occur in words), and syllables. Higher-level processing deals with words, phrases, and sentences.

The processing approach may be from the bottom, the top, or a combination of both. When bottom-up processing is used, the input signal is segmented into basic speech units and a search is made to match prestored patterns against these units. Knowledge about the phonetic composition of words is stored in a lexicon for comparisons. For the top-down approach, syntax, semantics (the domain), and pragmatics (context) are used to anticipate which words the speaker is likely to have said and

Figure 13.5 Sampling a continuous waveform.

direct the search for recognizable patterns. A combined approach which uses both methods has also been applied successfully.

Early research in speech recognition concentrated on the recognition of isolated words. Patterns of individual words were prestored and then compared to the digitized input patterns. These early systems met with limited success. They were unable to tolerate variations in speaker voices and were highly susceptible to noise. Although important, this early work helped little with the general problem of continuous speech understanding, since words appearing as part of a continuous stream differ significantly from isolated words. In continuous speech, words are run together, modified, and truncated to produce a great variation of sounds. Thus, speech analysis must be able to detect different sounds as being part of the same word, but in different contexts. Because of the noise and variability, recognition is best accomplished with some type of fuzzy comparison.
In 1971 the Defense Advanced Research Projects Agency (DARPA) funded a five-year program for continuous Speech Understanding Research (SUR). The objective of this research was to design and implement systems that were capable of accepting continuous speech from several cooperative speakers using a limited vocabulary of some 1000 words. The systems were expected to run at slower than real-time speeds. Products of this research were several systems, including HEARSAY I and II, HARPY, and HWIM. While the systems were only moderately successful in achieving their goals, the research produced other important byproducts as well, particularly in system architectures and in the knowledge gained regarding control.

The HEARSAY system was important for its introduction of the blackboard architecture (Chapter 15). This architecture is based on the cooperative efforts of several specialist knowledge components communicating by way of a blackboard in the solution of a class of problems. The specialists are each expert in a different area. For example, speech analysis experts might each deal with a different level of the speech problem. The solution process is opportunistic, with each expert making a contribution when it can. The solution to a given problem is developed as a data structure on the blackboard. As the solution is developed, this data structure is modified by the contributing experts. A description of the systems developed under SUR is given in Barr and Feigenbaum (1981).

13.5 SUMMARY

Pattern recognition systems are used to identify or classify objects on the basis of their attribute and attribute-relation values. Recognition may be accomplished with decision functions or structural grammars. The decision functions as well as the grammars may be deterministic, probabilistic, or fuzzy.

Before recognition can be accomplished, a system must learn the criteria for object recognition. Learning may be accomplished by direct designer encoding, supervised learning, or unsupervised learning. When unsupervised learning is required, some form of clustering may be performed to learn the object class characteristics.

Speech understanding first requires recognition of basic speech patterns. These patterns are matched against lexicon patterns for recognition. Basic speech units such as phonemes are the building blocks for longer units such as syllables and words.

EXERCISES

13.1. Choose three common objects and determine five of their most discriminating visual attributes.
13.2. For the previous problem, determine three additional nonvisual attributes for the objects which are most discriminating.
13.3. Find a linear decision function which separates the following x-y points into two distinct classes.

-1.8  -5.1  -3.3  -3.0   1.3  -1.1
 0.1   3.4   0.0   2.3  -4.1  -2.3
13.4. Describe how you would design a pattern recognition program which must validate
handwritten signatures. Identify some potential problem areas.
13.5. Compare the deterministic decision function approach to the probabilistic decision
function approach in pattern recognition applications. Give examples of when each
would be the appropriate function to use.
13.6. Define a set of rewrite rules for a grammar for syntactic generation (recognition) of
objects such as the object of Figure 13.4.
13.7. Give two examples of unsupervised learning in humans in which they learn to recognize
objects through clustering. Describe how different goals can influence the learning
process.
13.8. Apply the ISODATA algorithm to find three different clusters among the following
x-y data points.

-2.1  -1.3  -1.2  -1.0   0.2   1.3   1.2   1.1
 1.0   2.9   2.8   2.5   2.0   3.9   3.7   3.6
 4.8   4.6   4.3   4.2   5.4   5.3   5.2   6.3
 6.3   7.7   7.5   7.4   7.2   7.0  -1.6  -2.6
13.9. Consider a cluster algorithm which builds clusters of objects by forming small regions
in normalized attribute space (spheres in n-dimensional space) about each object,
and includes them in a cluster if and only if the sphere overlaps with at least one
other neighboring object sphere. Show how such a scheme could be used to partition
the attribute space into subspaces with nonlinear boundaries.
13.10. Define an alphabet of shape primitives for a syntactic recognition grammar which
can be used to recognize the integer characters 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
Check to see that the resultant character strings for each character are unique.
13.11. Give two examples where the single representation trick simplifies clustering among
unknown objects.
13.12. Compute the number of ways five objects can be arranged into 1, 2, 3, and 4 groups.
From this, try to develop an inductive formula for the number of ways of arranging
n objects into m groups.
13.13. Read "High Level Knowledge Sources in Usable Speech Recognition Systems" by
Sheryl Young, Alexander Hauptmann, Wayne Ward, Edward Smith, and Philip Werner,
in Communications of the ACM, Vol. 32, Number 2, Feb. 1989. Summarize
some of the more complicated problems associated with general speech recognition.
14

Visual Image
Understanding

Vision is perhaps the most remarkable of all of our intelligent sensing capabilities.
Through our visual system, we are able to acquire information about our environment
without direct contact. Vision permits us to acquire information at a phenomenal
rate and at resolutions that are most impressive. For example, one only needs to
compare the resolution of a TV camera system to that of a human to see the difference.
Roughly speaking, a TV camera has a resolution on the order of 500 parts per
square cm, while the human eye has a limiting resolution on the order of some 25
x 10^6 parts per square cm. Thus, humans have a visual resolution several orders
of magnitude better (more than 10,000 times finer) than that of a TV camera.
What is even more remarkable is the ease with which we humans sense and perceive
a variety of visual images. It is so effortless, we are seldom conscious of the act.
In this chapter, we examine the processes and the problems involved in building
computer vision systems. We look at some of the approaches taken thus far and at
some of the more successful vision systems constructed to date.

14.1 INTRODUCTION

Because of its wide ranging potential, computer vision has become one of the most
intensely studied areas of Al and engineering during the past few decades. Some
typical areas of application include the following.


MANUFACTURING

Parts inspection for quality control


Assembly, sorting, dispensing, locating, and packaging of parts

MEDICAL

Screening x-ray, tomographic, ultrasound, and other medical images

DEFENSE

Photo reconnaissance, analysis, and scene interpretation


Target detection, identification, and tracking
Microbe detection and identification
Weapons guidance
Remote and local site monitoring

BUSINESS

Visual document readers


Design tools for engineers and architects
Inspection of labels for identification
Inspection of products for contents and packaging

ROBOTICS

Guidance of welders and spray paint nozzles


Sorting, picking, and bin packing of items
Autonomous guidance of land, air and sea vehicles

SPACE EXPLORATION

Discovery and interpretation of astronomical images


Terrestrial image mapping and interpretation for plant disease, mineral deposits,
insect infestations, and soil erosion

Vision in an organic system is the process of sensing a pattern of light energy
and developing an interpretation of those patterns. The sensing part of the process
consists of selectively gathering light from some area of the environment, focusing
and projecting it onto a light-sensitive surface, and converting the light into
electrochemical patterns of impulses. The perception part of the process involves the
transformation and comparison of the transmitted impulse patterns to other prestored patterns

Figure 14.1 The human process of visual interpretation. (Illumination, the transparent
lens, and the retina are labeled.)

together with some form of inference. The basic vision process as it occurs in
humans is depicted in Figure 14.1.
Light from illuminated objects is collected by the transparent lens of the eye,
focused, and projected onto the retina where some 250 million light-sensitive sensors
(cones and rods) are excited. When excited, the sensors send impulses through the
optic nerve to the visual cortex of the occipital lobes of the brain where the images
are interpreted and recognized.
Computer vision systems share some similarities with human visual systems,
at least as we now understand them. They also have a number of important differences.
Although artificial vision systems vary widely with the specific application, we
adopt a general approach here, one in which the ultimate objective is to determine
a high-level description of a three-dimensional scene with a competency level comparable
to that of human vision systems. Before proceeding further we should distinguish
between a scene and an image of a scene. A scene is the set of physical objects in
a picture area, whereas an image is the projection of the scene onto a two-dimensional
plane.
With the above objectives in mind, a typical computer vision system should
be able to perform the following operations:

1. Image formation, sensing, and digitization
2. Local processing and image segmentation
3. Shape formation and interpretation
4. Semantic analysis and description

The sequence of these operations is depicted in Figure 14.2.


As we proceed through the processing stages of computer vision, the reader
will no doubt be impressed by the similarities and parallels one can draw between
vision processing and natural language processing. The image-sensor stage in vision
corresponds to speech recognition in language understanding, the low and intermediate
processing levels of vision correspond to syntactic and semantic language processing
respectively, and high level processing, in both cases, corresponds to the process
of building and interpreting high level knowledge structures.
Figure 14.2 Processing stages in computer vision systems: scene, image sensor, low-level,
intermediate-level, and high-level processing, ending in a semantic description.

Vision Processing Overview

The input to a vision system is a two-dimensional image collected on some form
of light-sensitive surface. This surface is scanned by some means to produce a
continuous voltage output that is proportional to the light intensity of the image on
the surface. The output voltage f(x,y) is sampled at a discrete number of x and y
points or pixel (picture element) positions and converted to numbers. The numbers
correspond to the gray-level intensity for black and white images. For color images,
the intensity value is comprised of three separate arrays of numbers, one for the
intensity value of each of the basic colors (red, green, and blue).
Thus, through the digitization process, the image is transformed from a continuous
light source into an array of numbers which correspond to the local image
intensities at the corresponding x-y pixel positions on the light-sensitive surface.
Using the array of numbers, certain low-level operations are performed, such
as smoothing of neighboring points to reduce noise, finding outlines of objects or
edge elements, thresholding (recording maximum and minimum values only, depending
on some fixed intensity threshold level), and determining texture, color, and
other object features. These initial processing steps are ones which are used to
locate and accentuate object boundaries and other structure within the image.
The next stage of processing, the intermediate level, involves connecting,
filling in, and combining boundaries, determining regions, and assigning descriptive
labels to objects that have been accentuated in the first stage. This stage builds
higher level structures from the lower level elements of the first stage. When complete,
it passes on labeled surfaces such as geometrical objects that may be capable of
identification.
High-level image processing consists of identifying the important objects in
an image and their relationships for subsequent description as well-defined knowledge
structures and hence, for use by a reasoning component.
Special types of vision systems may also require three dimensional processing
and analysis as well as motion detection and analysis.

The Objectives of Computer Vision Systems

The ultimate goal of computer image understanding is to build systems that equal
or exceed the capabilities of human vision systems. Ideally, a computer vision
system would be capable of interpreting and describing any complex scene in complete
detail. This means that the system must not only be able to identify a myriad of
complex objects, but must also be able to reason about the objects, to describe
their function and purpose, what has taken place in the scene, why any visible or
implied events occurred, what is likely to happen, and what the objects in the
scene are capable of doing.
Figure 14.3 presents an example of a complex scene that humans can interpret
well with little effort. It is the objective of many researchers in computer vision to
build systems capable of interpreting, describing, and reasoning about scenes of
this type in real time. Unfortunately, we are far from achieving this level of compe-
tency. To be sure, some interesting vision systems have been developed, but they
are quite crude compared to the elegant vision systems of humans.
Like natural language understanding, computer vision interpretation is a difficult
problem. The amount of processing and storage required to interpret and describe
a complex scene can be enormous. For example, a single image for a high resolution
aerial photograph may result in some four to nine million pixels (bytes) of information
and require on the average some 10 to 20 computations per pixel. Thus, when
several frames must be stored during processing, as many as 100 megabytes of
storage may be needed, and more than 100 million computations performed.

Figure 14.3 Example of a complex scene. (THE FAR SIDE by Gary Larson; copyright 1994
Universal Press Syndicate. Reprinted by permission. All rights reserved.)

14.2 IMAGE TRANSFORMATION AND LOW-LEVEL PROCESSING

In this section, we examine the first stages of processing. This includes the process
of forming an image and transforming it to an array of numbers which can then be
operated on by a computer. In this first stage, only local processing is performed
on the numbers to reduce noise and other unwanted picture elements, and to accentuate
object boundaries.

Transforming Light Energy to Numbers

The first step in image processing requires a transformation of light energy to numbers,
the language of computers. To accomplish this, some form of light sensitive transducer
is used such as a vidicon tube or charge-coupled device (CCD).
A vidicon tube is the type of sensor typically found in home or industrial
video systems. A lens is used to project the image onto a flat surface of the vidicon.
The tube surface is coated with a photoconductive material whose resistance is
inversely proportional to the light intensity falling on it. An electron gun is used to
produce a flying-spot scanner with which to rapidly scan the surface left to right
and top to bottom. The scan results in a time varying voltage which is proportional
to the scan spot image intensity. The continuously varying output voltage is then
fed to an analog-to-digital converter (ADC) where the voltage amplitude is periodically
sampled and converted to numbers. A typical ADC unit will produce 30 complete
digitized frames consisting of 256 x 256 or 512 x 512 (or more) samples of an
image per second. Each sample is a number (or triple of numbers in the case of
color systems) ranging from 0 to 63 (six bits) or 0 to 255 (eight bits). The image
conversion process is depicted in Figure 14.4.
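The sampling-and-quantization step can be sketched in a few lines of Python; the bit depth, voltage range, and sine-wave "scan line" below are illustrative choices, not the parameters of any particular ADC.

```python
# Quantize a continuous scan-line voltage into 8-bit gray levels,
# as an ADC does at each sampled pixel position (illustrative sketch).
import math

def quantize(voltage, v_max=1.0, bits=8):
    """Map a voltage in [0, v_max] to an integer gray level in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    v = min(max(voltage, 0.0), v_max)   # clip out-of-range voltages
    return round(v / v_max * levels)

# Sample a slowly varying scan-line voltage at 64 discrete pixel positions.
scan_line = [0.5 + 0.5 * math.sin(2 * math.pi * x / 64) for x in range(64)]
pixels = [quantize(v) for v in scan_line]

print(min(pixels), max(pixels))   # 0 255
```

Each frame of such samples forms one row of the intensity array passed to the next processing stage.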
A CCD is typical of the class of solid state sensor devices known as charge
transfer devices that are now being used in many vision systems. A CCD is a
rectangular chip consisting of an array of capacitive photodetectors, each capable
of storing an electrostatic charge. The charges are scanned like a clock-driven shift
register and converted into a time-varying voltage which is proportional to the
incident light intensity on the detectors. This voltage is sampled and converted to
integers using an ADC unit as in the case of the vidicon tube. The density of the

Figure 14.4 Transforming the image to numbers: vidicon tube, time-varying voltage,
analog/digital converter, array of numbers.
detectors on the chip is quite high. For example, a CCD chip of about five square
centimeters in area may contain as many as 1000 by 1000 detectors.
The numeric outputs from the ADC units are collected as arrays of numbers
which correspond to the light intensity of the image on the surface of the transducer.
This is the input to the next stage of processing, as illustrated in Figure 14.4.

Processing the Quantized Arrays

The array of numbers produced from the image sensing device may be thought of
as the lowest, most primitive level of abstraction in the vision understanding process.
The next step in the processing hierarchy is to find some structure among the pixels,
such as pixel clusters which define object boundaries or regions within the image.
Thus, it is necessary to transform the array of raw pixel data into regions of
discontinuities and homogeneity, to find edges and other delimiters of these object regions.
A raw digitized image will contain some noise and distortion; therefore, computations
to reduce these effects may be necessary before locating edges and regions.
Depending on the particular application, low level processing will often require
local smoothing of the array to eliminate this noise. Other low level operations
include threshold processing to help define homogeneous regions, and different forms
of edge detection to define boundaries. We examine some of these low level methods
next.
Thresholding is the process of transforming a gray-level representation to a
binary representation of the image. All digitized array values above some threshold
level T are set equal to the maximum gray-level value (black), and values less
than or equal to T are set equal to zero (white). For simplicity, assume gray-level
values have been normalized to range between zero and one, and suppose a threshold
level of T = 0.7 has been chosen. Then all array values g(x,y) > 0.7 are set equal
to 1 and values g(x,y) <= 0.7 are set equal to 0. The result is an array of binary 0
and 1 values. An example of an image that has been thresholded at 0.7 to produce
a binary image is illustrated in Figure 14.5.
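The thresholding rule can be written directly; a minimal sketch, with the array values invented for illustration and T = 0.7 as in the text:

```python
# Threshold a normalized gray-level image: values > T become 1, others 0.
grid = [
    [0.9, 0.8, 0.2],
    [0.8, 0.6, 0.1],
    [0.3, 0.2, 0.1],
]
T = 0.7
binary = [[1 if g > T else 0 for g in row] for row in grid]
print(binary)   # [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
```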
Thresholding is one way to segment the image into sharper object regions by
enhancing some portions and reducing others like noise and other unwanted features.
Thresholding can also help to simplify subsequent processing steps. And in many
cases, the use of several different threshold levels may be necessary since low
intensity object surfaces will be lost to high threshold levels, and unwanted background
will be picked up and enhanced by low threshold levels. Thresholding at several
levels may be the best way to determine different regions in the image when it is
necessary to compensate for variations in illumination or poor contrast.
Selecting one or more appropriate threshold level settings T will require additional
computations, such as first producing a histogram of the image gray-level
intensities. A histogram gives the frequencies of occurrence of different intensity
(or some other feature) levels within the image. An analysis of a histogram can
reveal where concentrations of different intensity levels occur, where peaks and
broad flat levels occur, and where abrupt differences in level occur. From this
information, the best choices of T values are often made apparent. For example, a histogram
with two or more clear separations between intensity levels that have a relatively
high frequency of occurrence will usually suggest the best threshold levels for object
identification and separation. This is seen in Figure 14.6.

Figure 14.5 Threshold transformation of an image: (a) original image, (b) binary image.
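A minimal sketch of this histogram-based threshold selection; the pixel values and the peak/valley search below are illustrative, and a real system would use finer bins and more robust statistics:

```python
from collections import Counter

# Build a histogram of quantized gray levels (levels 0-9 here for brevity)
# and pick a threshold in the sparse "valley" between two dense peaks.
pixels = [1, 1, 2, 1, 2, 2, 1, 8, 9, 8, 8, 9, 2, 1, 9, 8]
hist = Counter(pixels)

lo_peak = max((lvl for lvl in hist if lvl < 5), key=lambda l: hist[l])
hi_peak = max((lvl for lvl in hist if lvl >= 5), key=lambda l: hist[l])
# Threshold: least-frequent level strictly between the two peaks.
valley = min(range(lo_peak + 1, hi_peak), key=lambda l: hist.get(l, 0))
print(lo_peak, hi_peak, valley)
```

Thresholding at the valley level separates the dark-pixel population from the bright one.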
Next, we turn to the question of image smoothing. Smoothing is a form of
digital filtering. It is used to reduce noise and other unwanted features and to enhance
certain image features. Smoothing is a form of image transformation that tends to
eliminate spikes and flatten widely fluctuating intensity values. Various forms of
smoothing techniques have been employed, including local averaging, the use of
models, and parametric form fitting.
One common method of smoothing is to replace each pixel in an array with
a weighted average of the pixel and its neighboring values. This can be accomplished
with the use of filter masks which use some configuration of neighboring pixel
values to compute a smoothed replacement value. Two typical masks consist of
either four or eight neighboring pixels whose intensity values are used in the weighting

Figure 14.6 Histogram of light intensity levels, with possible threshold levels marked
between peaks of the frequency versus gray-level intensity plot. (Courtesy of Kenneth
Chapman and INTELLEDEX, INC.)



computation. If smoothing is being performed at pixel location (x,y), the neighboring
pixels are at the eight locations: (x + 1, y - 1), (x + 1, y), (x + 1, y + 1),
(x, y + 1), (x, y - 1), (x - 1, y + 1), (x - 1, y), and (x - 1, y - 1). From these,
either the four immediate neighbors (top, bottom, left, and right) or all eight neighbors
are sometimes chosen.
Examples of smoothing masks for four and eight neighborhood pixels are as
follows:

      1/8                1/32  3/32  1/32
1/8   1/2   1/8          3/32  1/2   3/32
      1/8                1/32  3/32  1/32
The center number in each mask is the weight of the pixel being smoothed.
(Note that the filter weights in a mask should sum to one to avoid distortions.)
Applying a mask to an image array has the effect of reducing spurious noise as
well as sharp boundaries. It reduces sharp spikes but also tends to blur the image.
For example, when the eight mask filter given above is applied to the array shown
in Figure 14.7, the blurring effects are quite pronounced.
The degree of smoothing and hence blurring can, of course, be controlled
with the use of appropriate weighting values in the mask. Weighted smoothing of
this type over a region is known as convolution. Convolution is sometimes used to
smooth an image prior to the application of differential operators which detect edges.
We return to the subject of convolution smoothing after we look at the edge detection
problem.
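Applying such a mask can be sketched in pure Python; the version below uses a four-neighbor mask with 1/2 on the center pixel and 1/8 on each of the four immediate neighbors (weights summing to one, as required), smoothing interior pixels only:

```python
# Smooth interior pixels with a four-neighbor mask: 1/2 on the pixel itself
# and 1/8 on each of its four immediate neighbors (weights sum to one,
# so the average brightness of the image is preserved).
def smooth(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]            # border pixels left unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y][x] / 2
                         + (img[y - 1][x] + img[y + 1][x]
                            + img[y][x - 1] + img[y][x + 1]) / 8)
    return out

# A single bright spike in a flat field is strongly attenuated.
img = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
print(smooth(img)[1][1])   # 4.0 - the spike is halved
```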
Local edge detection is the process of finding a boundary or delimiter between
two regions. An edge will show up as a relatively thin line or arc which appears
as a measurable difference in contrast between two otherwise homogeneous regions.

Figure 14.7 Application of a smoothing mask: (a) original image array, (b) smoothed
image array. (Courtesy of Kenneth Chapman and INTELLEDEX, INC.)
Regions belonging to the same object are usually distinguishable by one or more
features which are relatively homogeneous throughout, such as color, texture,
three-dimensional shading effects, or intensity.
Boundaries which separate adjoining regions represent a discontinuity in one
or more of these features, a fact that can be exploited by measuring the rate of
change of a feature value over the image surface. For example, the rate of change
or gradient in intensity in the horizontal and vertical directions can be measured
with difference functions Dx and Dy defined as

Dx = f(x,y) - f(x - n, y)
Dy = f(x,y) - f(x, y - n)

where n is a small integer greater than or equal to 1.
When an image is scanned horizontally or vertically, Dx and Dy will vary
little over homogeneous regions, and show a sharp increase or decrease at locations
where discontinuities occur. They are the discrete equivalents of the continuous
differential operators used in the calculus. The rate of change of the gradient can
also be useful in finding local edges as we will see below. For discrete functions,
second order difference operators provide the rate of change of gradient, comparable
to second order differential operators.
Since we are interested in locating edges with any given orientation, a better
gradient measure is one which is sensitive to intensity changes in any direction.
We can achieve this with a directional norm of Dx and Dy such as the vector
gradient,

s = (Dx^2 + Dy^2)^1/2
theta = tan^-1 (Dy / Dx)

For n = 1, Dx and Dy are most easily computed by application of the equivalent
weighting masks; the two-element masks are the row mask (-1 1) and the column
mask (-1 1)^T, respectively.
An example of the application of these two masks to an image array is illustrated
in Figure 14.8 where a vertical edge is seen to be quite pronounced. Masks such
as these have been generalized to measure gradients over wider regions covering
several pixels. This has the effect of reducing spurious noise and other sharp spikes.
Two masks deserving particular attention are the Prewitt (1970) and Sobel
(1970) masks as depicted in Figure 14.9. These masks are used to compute a broader
normalized gradient than the simple masks given above. We leave the details of
the computations as one of the exercises at the end of this chapter.
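As a sketch of how such masks might be applied, the following computes the gradient magnitude at a single interior pixel with the Sobel masks of Figure 14.9; the step-edge image is invented for illustration:

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def sobel_at(img, x, y):
    """Gradient magnitude at interior pixel (x, y) via the Sobel masks."""
    gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
             for j in range(3) for i in range(3))
    gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
             for j in range(3) for i in range(3))
    return math.hypot(gx, gy)   # the vector norm (Dx^2 + Dy^2)^1/2

# A vertical step edge: dark columns on the left, bright on the right.
img = [[0, 0, 9, 9]] * 4
print(sobel_at(img, 2, 1))   # 36.0 - a strong response on the edge column
```

On a homogeneous region both sums vanish and the magnitude is zero, which is exactly the behavior the difference operators are designed to exhibit.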
We return now to the methods of edge detection which employ smoothing
followed by an application of the gradient. For this, the continuous case is considered
first.


Figure 14.8 Application of difference functions to an image: (a) original array, (b) result
of the Dx and Dy masks applied to (a).

The continuous analog of discrete smoothing in one dimension is the convolution
of two functions f and g (written f * g) where

h(y) = f * g = ∫ f(x)g(y - x)dx

Convolving the two functions f and g is similar to computing the cross-correlation,
a process that reduces random noise and enhances coherent or structural changes.
One particular form of weighting function g has a symmetric bell shape or
normal form, that is, the Gaussian distribution. The two-dimensional form of this
function is given by

g(u,v) = c e^-(u^2 + v^2)/2σ^2

where c is a normalizing constant.
Because of their rotational symmetry, Gaussian filters produce desirable effects

       -1  0  1             -1  0  1
Px =   -1  0  1      Sx =   -2  0  2
       -1  0  1             -1  0  1

        1  1  1              1  2  1
Py =    0  0  0      Sy =    0  0  0
       -1 -1 -1             -1 -2 -1

Figure 14.9 Generalized edge detection masks: Prewitt masks (left) and Sobel masks (right).


Figure 14.10 Application of Gaussian and second degree differential operators. (Panels:
image intensity; second order intensity gradient; gradient applied to convolution.)

as an edge detector when followed by an application of the second degree differential


(gradient) operator. Over discontinuous regions, the transformed intensity results
in a zero crossing as depicted in Figure 14.10. The smoothing and differencing
operations may be combined into a single operator and approximated with a digital
mask of the types given above (Marr and Hildreth, 1980).
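The smooth-then-difference idea can be illustrated in one dimension; the three-tap kernel below is an illustrative Gaussian-like filter, not the exact operator of Marr and Hildreth:

```python
# Marr-Hildreth-style edge detection in one dimension: smooth with a small
# Gaussian-like kernel, take the second difference, and report sign changes
# (zero crossings) as edge locations.
def second_difference(signal):
    return [signal[i - 1] - 2 * signal[i] + signal[i + 1]
            for i in range(1, len(signal) - 1)]

def smooth(signal, kernel=(0.25, 0.5, 0.25)):   # illustrative weights
    return [kernel[0] * signal[i - 1] + kernel[1] * signal[i]
            + kernel[2] * signal[i + 1]
            for i in range(1, len(signal) - 1)]

step = [0, 0, 0, 0, 10, 10, 10, 10]             # an intensity step edge
d2 = second_difference(smooth(step))
crossings = [i for i in range(len(d2) - 1) if d2[i] * d2[i + 1] < 0]
print(d2, crossings)   # [2.5, 2.5, -2.5, -2.5] [1]
```

The second difference changes sign exactly where the step occurs, which is the zero crossing depicted in Figure 14.10.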
There is some psychological evidence to support the belief that the human
eye uses a form of Gaussian transformation called lateral inhibition which has the
effect of enhancing the contrast between gradually changing objects, such as an
object and its background.
Another approach used to filter the digitized image applies frequency domain
transforms such as the Fourier transform. Since edges represent higher frequency
components, the transformed image can be analyzed on the basis of its frequency
distribution. For this, the Fourier transform has become one of the most popular
transform methods, since an efficient computation algorithm has been developed.
It is known as the Fast Fourier transform. The discrete two-dimensional version of
the Fourier transform is given by

F(u,v) = (1/n) Σ(x=0 to n-1) Σ(y=0 to n-1) f(x,y) e^-2πi(xu + yv)/n

Applying this transform to an array of intensity values produces an array of
complex numbers that correspond to the spatial frequency components of the image
(sums of sine and cosine terms). The transformed array will contain all of the
information in the original intensity image, but in a form that is more easily used
to identify regions that contain different frequency components. Filtering with the
Fourier transform is accomplished by setting the high (or low) values of u and v
equal to zero. For example, the value F(u,v) = F(0,0) corresponds to the zero
frequency or DC component, and higher values of u and v correspond to the
high frequency components. As with intensity image arrays, thresholding of transformed
arrays can be used to separate different frequency components.
The original intensity image, with any modifications, is recovered with the
inverse transform given by

f(x,y) = (1/n) Σ(u=0 to n-1) Σ(v=0 to n-1) F(u,v) e^2πi(xu + yv)/n
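The frequency-domain filtering just described can be sketched with NumPy's FFT routines (assumed available); the extreme low-pass filter below keeps only the DC term, so the recovered image is the mean intensity everywhere:

```python
import numpy as np

# Low-pass filter an image in the frequency domain: transform, zero the
# high-frequency coefficients, and apply the inverse transform.
rng = np.random.default_rng(0)
image = np.ones((8, 8)) * 5 + rng.normal(0, 0.5, (8, 8))   # noisy flat field

F = np.fft.fft2(image)
mask = np.zeros_like(F)
mask[0, 0] = 1                      # keep only the DC (zero-frequency) term
smoothed = np.fft.ifft2(F * mask).real

# Keeping only F(0,0) leaves every pixel equal to the image mean.
print(np.allclose(smoothed, image.mean()))   # True
```

A practical filter would keep a block of low-frequency coefficients rather than just F(0,0), trading noise suppression against blurring.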

Another method of edge detection is model fitting. This is accomplished by


locally fitting a parametric profile of edges to the image array. A model in the
form of a mask is shifted over a region and compared to the corresponding gray
levels. If the fit between the model and the gray-level pattern scores high enough,
an edge with the given orientation is labeled appropriately. Model fitting methods
usually require heavy computations. We omit the details here.

Texture and Color

As suggested earlier, texture and color are also used to identify regions and boundaries.
Texture is a repeated pattern of elementary shapes occurring on an object's surface.
Texture may appear to be regular and periodic, random, or partially periodic. Figure
14.11 illustrates some examples of textured surfaces.
The structure in texture is usually too fine to be resolved, yet still coarse
enough to cause noticeable variation in the gray levels. Even so, methods of analysis
for texture have been developed. They are commonly based on statistical analyses
of small groups of pixels, the application of pattern matching, the use of Fourier
transforms, or modeling with special functions known as fractals. These methods
are beyond the scope of our goals in this chapter.
The use of color to identify and interpret regions requires more than three
times as much processing as gray-level processing. First, the image must be separated
into its three primary colors with red, green, and blue filters (Figure 14.12).
The separate color images must then be processed by sampling the intensities
and producing three arrays or a single array of tristimulus values. The arrays are
then processed separately (in some cases jointly) to determine common color regions
and corresponding boundaries. The processes used to find boundaries and regions
and to interpret color images are similar to those of gray-level systems.
Although the additional computation required in color analysis can be significant,
the added information gained from separate color intensity arrays may be warranted,
depending on the application. In complex scene analysis, color may be the most
effective method of segmentation and object identification. In Section 14.6 we describe

Figure 14.11 Examples of textured surfaces.



Figure 14.12 Color separation and processing with red, green, and blue filters.

an interesting color scene analyzer which is based on a rule-based inferencing system
(Ohta, 1985).

Stereo and Optic Flow

A stereoscopic vision system requires two displaced sensors to obtain two views of
objects from different perspectives. The differences between the views make it
possible to estimate distances and derive a three-dimensional model of a scene.
The displacement of a pixel from one image to a different location in another image
is known as the disparity. It is the disparity between the two views that permits the
estimation of the distance to objects in the scene. The human vision system is
somehow able to relate the two different images and form a correspondence that
translates to a three-dimensional interpretation. Figure 14.13 illustrates the geometric
relationships used to estimate distances to objects in stereoscopic systems.
The distance k from the lens to the object can be estimated from the relationships
that hold between the sides of the similar triangles. Using the relations i1/e1 = f/k,
i2/e2 = f/k, and d = e1 + e2, we can write

k = fd / (i1 + i2)

Since f and d are relatively constant, the distance k is a function of the disparity,
or sum of the distances i1 and i2.
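The relation k = fd/(i1 + i2) is a one-line computation once the disparity is measured; the focal length, baseline, and image displacements below are invented for illustration:

```python
def depth_from_disparity(f, d, i1, i2):
    """Distance k to an object from focal length f, sensor separation d,
    and the two image displacements i1, i2, via k = f*d / (i1 + i2)."""
    return f * d / (i1 + i2)

# Illustrative numbers: 50 mm lenses 60 mm apart; displacements sum to 2 mm.
k = depth_from_disparity(f=50.0, d=60.0, i1=1.2, i2=0.8)
print(k)   # 1500.0 mm - the object is 1.5 m away
```

Note the inverse relationship: a larger disparity sum means a closer object, which is why distant objects are hard to range accurately with a short baseline.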
In computer vision systems, determining the required correspondence between
the two displaced images is perhaps the most difficult part in determining the disparity.

Figure 14.13 Disparity in stereoscopic systems. (f is the focal length of the lens, k the
distance to object P, and i1, i2 the displacements in the two images.)

Figure 14.14 Example of optical flow in a scene.

Corresponding pixel groupings in the two images must be located to determine the
disparity from which the distance can be estimated. In practice, methods based on
correlation, gray-level matching, template matching, and edge contour comparisons
have been used to estimate the disparity between stereo images.
Optic flow is an alternative approach to three-dimensional scene analysis which
is based on the relative motion of a sensor and objects in the scene. If a sensor is
moving (or objects are moving past a sensor), the apparent continuous flow of the
objects relative to the sensor is known as optical flow. Distances can be estimated
from the change in flow or relative velocity of the sensor and the objects. For
example, in Figure 14.14 if the velocity of the sensor is constant, the change in
distance dx between points x1 and x2 is proportional to the change in size of the
power lines h, through the relation

dx/dt = k (dh/dt)

This relationship is equivalent to the change in size of regular flowing objects
with distance from the observer, such as highways, railroad tracks, or power lines,
as depicted in Figure 14.14.

14.3 INTERMEDIATE-LEVEL IMAGE PROCESSING

The next major level of analysis builds on the low-level or early processing steps
described above. It concentrates on segmenting the image space into larger global
structures using homogeneous features in pixel regions and boundaries formed from
pieces of edges discovered during the low-level processing. This level requires that
pieces of edges be combined into contiguous contours which form the outline of
objects, partitioning the image into coherent regions, developing models of the
segmented objects, and then assigning labels which characterize the object regions.
One way to begin defining a set of objects is to draw a silhouette or sketch
of their outlines. Such a sketch has been called the raw primal sketch by Man
(1982). It requires connecting up pieces of edges which have a high likelihood o:
forming a continuous boundary. For example, the problem is to decide whethe:
two edge pieces such as
Visual Image Understanding Chap. 14
(edge (location 21 103)
      (intensity 0.8)
      (direction 46))

(edge (location 18 98)
      (intensity 0.6)
      (direction 41))

should be connected. This general process of forming contours from pieces of edges is called segmentation.

Graphical Edge Finding

Graphical methods can be used to link up pieces of edges. One approach is to use a minimum spanning tree (MST). Starting at any cluster of pixels known to be part of an edge, this method performs a search in the neighborhood of the cluster for groupings with similar feature values. Each such grouping corresponds to a node in an edge tree. When a number of such nodes have been found, they are connected using the MST algorithm.
An MST is found by connecting an arc between the first node selected and its closest neighbor node and labeling the two nodes accordingly. Neighborhoods of both connected nodes are then searched. Any node found closest to either of the two connected nodes (below some threshold distance) is then used to form the next branch in the tree. A second arc is constructed between the newly found node and the closest connected node, again labeling the new node. This process is repeated until all nodes having arc distances less than some value (such as a function of the average arc distances) have been connected. An example of an MST is given in Figure 14.15.
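The MST construction just described can be sketched with Prim's algorithm over hypothetical edge-fragment locations; the threshold on arc distances stops the tree from absorbing distant, unrelated fragments:

```python
import math

def mst_link(nodes, max_arc=None):
    """Link edge-fragment nodes (x, y) into a minimum spanning tree.
    Arcs longer than `max_arc` are not added, mirroring the threshold
    on arc distances described in the text."""
    if not nodes:
        return []
    connected = {0}                  # start from the first node found
    arcs = []
    while len(connected) < len(nodes):
        best = None
        for i in connected:
            for j in range(len(nodes)):
                if j in connected:
                    continue
                d = math.dist(nodes[i], nodes[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if max_arc is not None and d > max_arc:
            break                    # remaining nodes are too far away
        connected.add(j)
        arcs.append((i, j))
    return arcs

# Hypothetical pixel-group centers along a contour, plus one distant outlier
# that the arc-distance threshold should exclude.
nodes = [(0, 0), (1, 1), (2, 1), (3, 2), (20, 20)]
tree = mst_link(nodes, max_arc=3.0)
```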
Another graphical approach is based on the assignment of a cost or other measure of merit to pixel groupings. The cost assignment can be based on a simple function of features such as intensity, orientation, or color. A best-first (branch-and-bound) or other form of graph search is then performed using some heuristic function to determine a least-cost path which represents the edge contour.
Other edge finding approaches are based on fitting a low-degree polynomial to a number of edge pieces which have been found through local searches. The resultant polynomial curve segment is then taken as the edge boundary. This approach is similar to one which compares edge templates to short groupings of pieces. If a particular matching template scores above some threshold, the template pattern is then used to define the contour.

Figure 14.15 Edge finding using a minimum spanning tree.

Edge Finding with Dynamic Programming

Edge following can also be formulated as a dynamic programming problem, since picking a best edge path is a type of sequential optimization problem. Candidate edge pieces are assigned a local cost value based on some feature such as intensity, and the path having minimum cost is defined as the edge contour.
Assume that a starting edge point has been selected. Dynamic programming begins with a portion of the problem and finds an optimal solution for this subproblem. The subproblem is then enlarged, and an optimal solution is found for the enlarged problem. This process continues step-by-step until a global optimum for the whole problem has been found (a path to the terminal edge point).
The process can be described mathematically as a recursive process. Let Cn(s,tn) be the total cost of the best path for the remaining path increments, given that the search is at position (state) s and ready to select tn as the next move direction. Let tn* be the value of tn that minimizes Cn, and Cn* the corresponding minimum of Cn. Thus, at each stage, the following values are computed:

    Cn*(s) = min over tn of Cn(s,tn) = Cn(s,tn*)

where

    Cn(s,tn) = (cost at stage n) + (minimum costs for stages n + 1 onwards)
             = K(s,tn) + Cn+1*(tn)

where K(s,tn) is the cost at stage n, and Cn+1*(tn) is the minimum cost for stages n + 1 to the terminal stage.
The computation process is best understood through an example. Consider the following 5 x 5 array of pixel cost values.

    9 7 6 5 1
    3 7 2 7 1
    4 1 5 2 7
    6 6 3 7 7
    8 7 2 2 3

Suppose we wish to find the optimal cost path from the lower left to the upper right corner of the array. We could work from either direction, but we arbitrarily choose to work forward from the lower left pixel with cost value 8. We first set all values except the 8 equal to some very large number, say M, and compute the minimum cost of moving from the position with the 8 to all other pixels in the bottom row by adding the cost of moving from pixel to neighboring pixel. This results in the following cost array.

302 Visual Image Understanding Chap. 14

    M  M  M  M  M
    M  M  M  M  M
    M  M  M  M  M
    M  M  M  M  M
    8 15 17 19 22

Next, we compute the minimum neighbor path cost for the next to the last row to obtain

     M  M  M  M  M
     M  M  M  M  M
     M  M  M  M  M
    14 14 17 24 26
     8 15 17 19 22

Note that the minimum cost path to the second, third, and fourth positions in this row is the diagonal path (position 5,1 to 4,2) followed by a horizontal right traversal in the same row, whereas the minimum cost path to the last position in this row is the diagonal path through the fourth position of the bottom row. The remaining minimum path costs are computed in a similar fashion, row by row, to obtain the final cost array.

    27 24 23 22 21
    18 22 17 24 20
    18 15 19 19 26
    14 14 17 24 26
     8 15 17 19 22

From this final minimum cost array, the least cost path is easily found by tracing minimum-cost predecessors back from the upper right corner:

    (5,1) (4,2) (4,3) (3,4) (2,5) (1,5)

as marked below with bracketed entries.

     27   24   23   22  [21]
     18   22   17   24  [20]
     18   15   19  [19]  26
     14  [14] [17]  24   26
    [ 8]  15   17   19   22

Dynamic programming methods usually result in considerable savings over exhaustive search methods, which require an exponential number of computations and comparisons (see Chapter 9), whereas the number of dynamic programming computations for the same size of problem is of linear order.
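The worked example can be checked mechanically. In this sketch, each pixel's cumulative cost is its own cost plus the cheapest cumulative cost among its left neighbor and the three neighbors in the row below it, matching the row-by-row sweep described above:

```python
# Reconstruction of the worked dynamic-programming example.  Each pixel is
# reachable from its left neighbor or from the three neighbors in the row
# below it (down-left, down, down-right).

costs = [                      # pixel cost array, top row first
    [9, 7, 6, 5, 1],
    [3, 7, 2, 7, 1],
    [4, 1, 5, 2, 7],
    [6, 6, 3, 7, 7],
    [8, 7, 2, 2, 3],
]

def min_cost_array(costs):
    rows, cols = len(costs), len(costs[0])
    INF = float("inf")
    cum = [[INF] * cols for _ in range(rows)]
    for r in range(rows - 1, -1, -1):          # sweep from the bottom row up
        for c in range(cols):
            if r == rows - 1 and c == 0:
                cum[r][c] = costs[r][c]        # starting pixel, lower left
                continue
            preds = []
            if c > 0:
                preds.append(cum[r][c - 1])    # left neighbor, same row
            if r < rows - 1:                   # three neighbors below
                preds += [cum[r + 1][k] for k in (c - 1, c, c + 1)
                          if 0 <= k < cols]
            cum[r][c] = min(preds) + costs[r][c]
    return cum

cum = min_cost_array(costs)
```

Tracing minimum predecessors back from the upper right entry recovers the least-cost edge path.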

Region Segmentation through Splitting and Merging

Rather than defining regions with edges, it is possible to build them. For example,
global structures can be constructed from groups of pixels by locating, connecting,
and defining regions having homogeneous features such as color, texture, or intensity.
The resulting segmented regions are expected to correspond to surfaces of objects
in the real world. Such coherent regions do not always correspond to meaningful
regions, but they do offer another viable approach to the segmentation of an image.
When these methods are combined with other segmentation techniques, the confidence
level that the regions represent meaningful objects will be high.
Once an image has been segmented into disjointed object areas, the areas
can be labeled with their properties and their relationships to other objects, and
then identified through model matching or description satisfaction.
Region segmentation may be accomplished by region splitting, by region growing (also called region merging), or by a combination of the two. When splitting is used, the process proceeds in a top-down manner. The image is split successively into smaller and smaller homogeneous pieces until some criteria are satisfied. When growing regions, the process proceeds in a bottom-up fashion. Individual pixels or small groups of pixels are successively merged into contiguous, homogeneous areas. A combined splitting-growing approach will use both bottom-up and top-down techniques.
Regions are usually assumed to be disjoint entities which partition the image such that (1) a given pixel can appear in a single region only, (2) subregions are composed of connected pixels, (3) different regions are disjoint areas, and (4) the complete image area is given by the union of all regions. Regions are usually defined by some homogeneous property such that all pixels belonging to the region satisfy the property, and pixels not satisfying the property lie in a different region. Note that a region need not consist of contiguous pixels only, since some objects may be split or covered by occluding surfaces. Condition 2 is needed to ensure that all regions are accounted for and that they fill up the complete image area.
In region splitting, the process begins with an entire image which is successively divided into smaller regions which exhibit some coherence in features. One effective method is to apply multiple thresholding levels which can isolate regions having homogeneous features. Histograms are first obtained to establish the threshold levels. This may require masking portions of the image to achieve effective separation of complex objects. Each threshold level can then produce a binary image consisting of all of the objects which exceed the thresholded level. Once the binary regions are formed, they are easily delineated, separated, and marked for subsequent processing. This whole process of masking, computing and analyzing a histogram, thresholding, defining an area, masking, and so on can be performed in a recursive manner. The process terminates when the masks produce monomodal histograms with the image fully partitioned.
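The thresholding step can be sketched minimally as follows; the image values and the chosen threshold are illustrative, and a real system would pick thresholds at valleys of the histogram and recurse over masked sub-images:

```python
# Histogram-based splitting sketch: build an intensity histogram, then
# produce a binary image of the pixels exceeding a chosen level.

from collections import Counter

def histogram(image):
    """Count of pixels at each gray-level intensity."""
    return Counter(v for row in image for v in row)

def binary_image(image, threshold):
    """1 where the pixel intensity exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in image]

# Hypothetical 4x4 image with two clearly separated intensity modes,
# giving a bimodal histogram with a valley near 100.
image = [
    [10, 11, 200, 201],
    [10, 12, 199, 200],
    [11, 10, 198, 202],
    [12, 11, 200, 199],
]

hist = histogram(image)
mask = binary_image(image, threshold=100)   # isolates the bright region
```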
Segmentation techniques based on region growing start with small atomic regions (one or a few pixels) and build coherent pixel regions in a bottom-up fashion. Local features, such as the intensity of a group of pixels relative to the average intensity of neighboring pixels, are used as criteria for the merging operation. A low level of contrast between contiguous groups gives rise to the merging of areas, while a higher level of contrast, such as that found at boundaries, provides the criteria for region segregation.
Split-and-merge techniques attempt to gain the advantages of both methods.
They combine top-down and bottom-up processing using both region splitting and
merging until some split-merge criterion no longer exists. At each step in the process,
split and merge threshold values can be compared and the appropriate operation
performed. In this way, over-splitting and under-merging can be avoided.
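The merging criterion can be sketched as a seeded flood fill; the image, seed, and tolerance below are hypothetical. A neighbor is merged when its intensity is close to the region's running mean, and a high contrast (a boundary) stops the growth:

```python
# Region-growing sketch: starting from a seed pixel, merge 4-connected
# neighbors whose intensity is within `tol` of the region's running mean.

def grow_region(image, seed, tol):
    rows, cols = len(image), len(image[0])
    region = {seed}
    total = image[seed[0]][seed[1]]            # running intensity sum
    frontier = [seed]
    while frontier:
        r, c = frontier.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                mean = total / len(region)
                if abs(image[nr][nc] - mean) <= tol:
                    region.add((nr, nc))       # low contrast: merge
                    total += image[nr][nc]
                    frontier.append((nr, nc))
    return region

# Hypothetical image with a dark region on the left, bright on the right.
image = [
    [12, 11, 90, 91],
    [10, 12, 92, 90],
    [11, 10, 91, 92],
]
dark = grow_region(image, seed=(0, 0), tol=5)
```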

14.4 DESCRIBING AND LABELING OBJECTS

We continue in this section with further intermediate-level processing steps all aimed
at building higher levels of abstraction. The processing steps here are related to
describing and labeling the regions.
Once the image has been segmented into disjointed regions, their shapes, spatial interrelationships, and other characteristics can be described and labeled for subsequent interpretation. This process requires that the outlines or boundaries, vertices, and surfaces of the objects be described in some way. It should be noted, however, that a description for a region can be based on a two- or three-dimensional image interpretation. Initially, we focus on the two-dimensional interpretation.
Typically, a region description will include attributes related to size, shape, and general appearance. For example, some or all of the following features might be included.

Region area
Contour length (perimeter) and orientation
Location of the center of mass
Minimum bounding rectangle
Compactness (area divided by perimeter squared)
Fitted scatter matrix of pixels
Number and characteristics of holes or internal occlusions
Degree and type of texture
Average intensity (or average intensities of base colors)
Type of boundary segments (sharp, fuzzy, and so on) and their location
Boundary contrast
Chain code (described below)
Shape classification number (task specific)
Position and types of vertices (number of adjoining segments)

Some of the above features are illustrated in Figure 14.16.



Sec. 14.4 Describing and Labeling Objects 305

Figure 14.16 Descriptive features for a region: area, average intensity, perimeter length, scatter matrix of pixels, center of mass, number of holes, minimum bounding rectangle.

In addition to these characteristics, the relationships between regions may also be important, particularly for adjacent regions. Relations between regions can include their relative positions and orientations, distances between boundaries, intervening regions, relative intensities or contrasts in color, degree of abutment, and degree of connectivity or concentration. When the image domain is known, domain- or task-specific features may also be useful.
Next, we examine some of the definitions and methods used for region descriptions.

Describing Boundaries

Boundaries can be described as linked straight-line segments, fitted polynomial curves, or in a number of other ways. One simple method of fitting straight-line segments to an arbitrary boundary is by successive linear segmentation fitting. This method permits any degree of fit accuracy at the expense of computation time. The fitting procedure is illustrated in Figure 14.17.
The fitting begins by connecting a single straight line to the two end points and using this as an approximation to the curve (a). Additional lines are then constructed using the points on the curve at maximum perpendicular distances to the fitted lines (b, c, and d).

Figure 14.17 Curve fitting with linear segments.


An algorithm to perform the fitting would proceed as follows.

1. Starting with the two end points of the boundary curve, construct a straight line between the points.
2. At successive intervals along the curve, compute the perpendicular distances to the constructed line. If the maximum distance is within some specified limit, stop and use the segmented line as an approximation to the boundary.
3. Otherwise, choose the point on the curve at which the largest distance occurs and use this as a breakpoint with which to construct two new line segments which connect to the two endpoints. Continue the process recursively with each subcurve until the stopping condition of Step 2 is satisfied.
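The three steps above can be sketched as a recursive routine; the curve points and the distance limit here are illustrative:

```python
# Successive linear-segmentation sketch: fit a line between the endpoints,
# split at the point of maximum perpendicular distance, and recurse until
# the worst-case error is within `limit`.

import math

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    num = abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1)
    return num / math.hypot(x2 - x1, y2 - y1)

def fit_segments(curve, limit):
    """Return the breakpoints approximating `curve` (a list of (x, y))."""
    a, b = curve[0], curve[-1]
    dists = [point_line_distance(p, a, b) for p in curve[1:-1]]
    if not dists or max(dists) <= limit:
        return [a, b]                          # step 2: fit is good enough
    k = dists.index(max(dists)) + 1            # step 3: worst point splits
    left = fit_segments(curve[:k + 1], limit)
    right = fit_segments(curve[k:], limit)
    return left[:-1] + right                   # merge, dropping duplicate

# A right-angle "curve": only the corner survives as a breakpoint.
curve = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
segments = fit_segments(curve, limit=0.5)
```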

Chain Codes

Another method used for boundary descriptions is known as chain coding. A chain code is a sequence of integers which describes the boundary of a region in terms of displacements from some starting point. A chain code is specified by four or more direction numbers which give a trace of the directional displacements of successive unit line segments. An example of a four-direction chain code is presented in Figure 14.18.
Chain code descriptions are useful for certain types of object matchings. If the starting position is ignored, the chain code is independent of object location. A derivative or difference (mod 4) for a chain code can also be useful since it is invariant under object rotation. The derivative is found by counting the number of 90 degree counterclockwise rotations made from segment to segment. Thus, the derivative for the chain code of Figure 14.18 is the code

    10000303001000103000300003000000000
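The derivative computation can be sketched as follows; the direction numbering (0 = right, 1 = up, 2 = left, 3 = down) is an assumed convention, since the book's numbering is defined in Figure 14.18. Note that a pure difference sequence is one digit shorter than the chain code itself:

```python
# Chain-code derivative sketch: the derivative is the sequence of
# successive direction differences modulo 4, counting 90-degree
# counterclockwise rotations from segment to segment.  This makes the
# description invariant under rotations of the object by multiples of 90.

def chain_derivative(code):
    return [(code[i + 1] - code[i]) % 4 for i in range(len(code) - 1)]

# A unit square traversed counterclockwise: right, up, left, down.
# Every corner is one counterclockwise quarter-turn.
square = [0, 1, 2, 3]
```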

Other Descriptive Features

Some other descriptive features include the area, intensity, orientation, center of mass, and bounding rectangle. These descriptions are determined in the following way.

Figure 14.18 (a) Region boundary, (b) direction numbers, and (c) chain code for region boundary. Chain code: 11111003330000110000333332222222222

1. The area of a region can be given by a count of the number of pixels contained in the region.

2. The average region intensity is just the average gray-level intensity taken over all pixels in the region. If color is used in the image, the average is given as the three base color intensity averages.

3. The center of mass M for a region can be computed as the average x-y vector position (denoted as Pi) over the n pixels in the region, that is

    M = (1/n) Σi Pi

4. The scatter matrix S defines an elliptical area which approximates the shape of the region. It may be computed as the average outer product of the pixel offsets from the center of mass, as follows.

    S = (1/n) Σi (Pi - M)(Pi - M)^t

where t denotes matrix transposition.

5. The minimum bounding rectangle is found as the rectangular area defined by the intersection of the horizontal and vertical lines which pass through the maximum and minimum pixel positions of the region.
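Descriptors 1 through 5 can be sketched directly from a region's pixel list; the region and its intensities below are hypothetical:

```python
# Region descriptors from a list of (x, y) pixel positions and their
# gray-level intensities, following items 1-5 above.

def region_descriptors(pixels, intensities):
    n = len(pixels)
    area = n                                          # 1. pixel count
    avg_intensity = sum(intensities) / n              # 2. mean gray level
    mx = sum(x for x, _ in pixels) / n                # 3. center of mass M
    my = sum(y for _, y in pixels) / n
    # 4. scatter matrix: mean outer product of (Pi - M)
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    scatter = [[sxx, sxy], [sxy, syy]]
    # 5. minimum bounding rectangle from extreme pixel positions
    xs, ys = [x for x, _ in pixels], [y for _, y in pixels]
    bbox = (min(xs), min(ys), max(xs), max(ys))
    return area, avg_intensity, (mx, my), scatter, bbox

# Hypothetical 2x2 pixel region.
pixels = [(0, 0), (0, 1), (1, 0), (1, 1)]
area, avg, center, scatter, bbox = region_descriptors(pixels, [4, 4, 6, 6])
```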

Three-Dimensional Descriptions

Up to this point, we have been mainly concerned with developing two-dimensional descriptions for images. But for many applications, an analysis which produces a three-dimensional scene description will be required. When a stereo system is being used, the methods described in the previous section for stereo analysis can be applied to estimate such parameters as depths, volumes, and distances of objects. When a two-dimensional image is being used as the source, this information must be determined by other means.
Several programs capable of interpreting images consisting of three-dimensional polyhedral blocks-world objects were written beginning in the early 1960s (Roberts, 1965; Guzman, 1969; Huffman, 1971; Clowes, 1971; Waltz, 1975). The experience gained from this work has led to algorithms and techniques with which to classify and identify regular complex polyhedral types of objects from two-dimensional images.
Roberts wrote a program which began by finding lines in the image which corresponded to the edges of the polyhedral objects. It then used descriptions of the lines to match against stored models of primitive objects such as wedges, cubes, and prisms. To perform the match operation, it was necessary to transform the objects by scaling and performing rotations and translations until a best match was

possible. Once a match was obtained and all objects identified, the program demonstrated its "understanding" of the scene by producing a graphic display of it on a monitor screen.
Guzman wrote a program called SEE which examined how surfaces from the
same object were linked together. The geometric relationships between different
types of line junctions (vertices) helped to determine the object types. Guzman identified eight commonly occurring edge junctions for his three-dimensional blocks-world objects. The junctions were used by heuristic rules in his program to classify the different objects by type (Figure 14.19).
Huffman and Clowes, working independently, extended this work by developing a line labeling scheme which systematized the classification of polyhedral objects. Their scheme was used to classify edges as either concave, convex, or occluding. Concave edges are produced by two adjacent touching surfaces which produce a concave (less than 180 degrees) depth change. Conversely, convex edges produce a convexly viewed depth change (greater than 180 degrees), and an occluding edge outlines a surface that obstructs other objects.
To label a concave edge, a minus sign is used. Convex edges are labeled with a plus sign, and a right or left arrow is used to label the occluding or boundary edges. By restricting vertices to be the intersection of three object faces (trihedral vertices), it is possible to reduce the number of basic vertex types to only four: the L, the T, the Fork, and the Arrow (Figure 14.20). Different label combinations assigned to these four types then assist in the classification and identification of objects.
When a three-dimensional object is viewed from all possible positions, the four junction types, together with the valid edge labels, give rise to eighteen different permissible junction configurations as depicted in Figure 14.20. From a dictionary of these valid junction types, a program can classify objects by the sequence of bounding vertices which describe it. Impossible object configurations such as the one illustrated in Figure 14.21 can also be detected.
Geometric constraints, together with a consistent labeling scheme, can greatly simplify the object identification process. A set of labeling rules which greatly facilitates this process can be developed for different classes of objects. For example, using the labels described above, the following rules will apply for many polyhedral

Figure 14.19 Three-dimensional polyhedral junction types: the L, the T, the fork, the X, the arrow, the psi, the peak, and the multi.

Figure 14.20 Valid junction labels for three-dimensional shapes: the permissible L, fork, T, and arrow labelings.

Figure 14.21 Example of an impossible object.

objects: (1) the arrow should be directed to mark boundaries by traversing the object in a clockwise direction (the object face appears on the right of the arrow), (2) unbroken lines should have the same label assigned at both ends, (3) when a fork is labeled with a + edge, it must have all three edges labeled as +, and (4) arrow junctions which have a boundary (arrow) label on both barb edges must also have a + label on the shaft.
These rules can be applied to a polygonal object as illustrated in Figure 14.22.

Figure 14.22 Example of object labeling.

Starting with any edge having an object face on its right, the external boundary is labeled with the boundary (arrow) label in a clockwise direction. Interior lines are then labeled with + or -, consistent with the other labeling rules.

Filtering with Constraint Satisfaction

Continuing with this early work, David Waltz developed a method of vertex constraint propagation which establishes the permissible types of vertices that can be associated with a certain class of objects. He broadened the class of images that could be analyzed by relaxing lighting conditions and extending the labeling vocabulary to accommodate shadows, some multiline junctions, and other types of interior lines. His constraint satisfaction algorithm was one of his most important contributions.
To see how this procedure works, consider the image drawing of a pyramid as illustrated in Figure 14.23. At the right side of the pyramid are all possible labelings for the four junctions A, B, C, and D.
Using these labels as mutual constraints on connected junctions, permissible labels for the whole pyramid can be determined. The constraint satisfaction procedure works as follows:

1. Starting at an arbitrary junction, say A, a record of all permissible labels is made for that junction. An adjacent junction is then chosen, say B, and labels which are inconsistent with the line AB are then eliminated from the permissible A and B lists. In this case, the line joining B can only be a +, a -, or an up-arrow.

Figure 14.23 Possible labelings for an object (junctions A, B, C, and D).

Consequently, two of the possible labelings can be eliminated, with four remaining.

2. Choosing junction C next, we find that the BC constraints are satisfied by all of the B and C labelings, so no reduction is possible with this step. On the other hand, the line AC must be labeled as - or as an up-left arrow to be consistent. Therefore, an additional label for A can be eliminated to reduce the remainder further.

3. This new restriction on A now permits the elimination of one B labeling to maintain consistency. This reduction, in turn, places a new restriction on BC, permitting the elimination of one C label, since BC must now be labeled as a + only.

4. Moving now to junction D, we see that of the six possible D labelings, only three satisfy the BD constraint of a - or a down-arrow; these three remain as the permissible labelings for D.

Continuing with the above procedure, it will be found that further label eliminations are not possible since all constraints have been satisfied. The above process is completed by finding the different combinations of unique labelings that can be assigned to the figure. This can be accomplished through a tree search process. A simple enumeration of the remaining labels shows that it is possible to find only three different labelings. We leave the labeling as an exercise at the end of this chapter.
The process of constraint satisfaction described here has been called Waltz filtering. It is a special form of a more general solution process called relaxation, in which constraints are iteratively reduced or eliminated until equilibrium is reached. Such processes can be very effective in reducing large search spaces to manageable ones.
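The skeleton of this elimination procedure can be sketched generically as constraint propagation. The junctions, lines, and label domains below are toy stand-ins (not the pyramid example), with consistency reduced to "both junctions assign the same sign to the shared line":

```python
# Waltz-filtering sketch: junction label domains are repeatedly pruned
# until every surviving label at one junction is supported by some
# surviving label at each connected junction (a relaxation to fixpoint).

def waltz_filter(domains, lines, consistent):
    """domains: {junction: set of labels}; lines: (j1, j2) pairs;
    consistent(l1, l2): do two labels agree on the shared line?"""
    changed = True
    while changed:
        changed = False
        for a, b in lines:
            for la in list(domains[a]):
                if not any(consistent(la, lb) for lb in domains[b]):
                    domains[a].discard(la)     # no support at b: eliminate
                    changed = True
            for lb in list(domains[b]):
                if not any(consistent(la, lb) for la in domains[a]):
                    domains[b].discard(lb)
                    changed = True
    return domains

# Toy problem: each label is just the edge sign a junction assigns to the
# shared line, and labels are consistent when the signs agree.
domains = {"A": {"+", "-", ">"}, "B": {"+"}, "C": {"+", "-"}}
lines = [("A", "B"), ("B", "C")]
result = waltz_filter(domains, lines, lambda l1, l2: l1 == l2)
```

Here B's single permissible label propagates outward, pruning A and C exactly as the junction constraints prune labels in the pyramid example.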

Template Matching

Template matching is the process of comparing patterns found in an image with prestored templates that are already named. The matching process may occur at lower levels, using individual or groups of pixels with correlation techniques, or at higher image processing levels, using labeled region structures. Comparisons between object and template may be based on an exact or on a partial match, and the matching process may use whole or component pieces when comparing the two. Rigid or flexible templates may also be used. (An example of flexible template matching was described in Section 10.5, Partial Matching, where the idea of a rubber mask template was introduced.)
Template matching can be effective only when the search process is constrained in some way. For example, the types of scenes and permissible objects should be known in advance, thereby limiting the possible pattern-template pairs. The use of some form of informed search can help restrict the size of the search space.
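Low-level correlation matching can be sketched as follows; the image, template, and threshold are illustrative:

```python
# Template-matching sketch: slide a small template over the image and
# score each offset with a normalized correlation; offsets scoring above
# a threshold are reported as matches.

import math

def ncc(patch, template):
    """Normalized cross-correlation of two equal-sized pixel lists."""
    mp = sum(patch) / len(patch)
    mt = sum(template) / len(template)
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) *
                    sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0

def match_template(image, template, threshold=0.9):
    th, tw = len(template), len(template[0])
    flat = [v for row in template for v in row]
    matches = []
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [image[r + i][c + j]
                     for i in range(th) for j in range(tw)]
            if ncc(patch, flat) >= threshold:
                matches.append((r, c))
    return matches

# Hypothetical image containing one bright shape matching the template.
image = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
template = [[9, 0],
            [9, 9]]
hits = match_template(image, template)
```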

14.5 HIGH-LEVEL PROCESSING

Before proceeding with a discussion of the final (high-level) steps in vision processing, we shall briefly review the processing stages up to this point. We began with an image of gray-level or tristimulus color intensity values and digitized this image to obtain an array of numerical pixel values. Next, we used masks or some other transform (such as Fourier) to perform smoothing and edge enhancement operations to reduce the effects of noise and other unwanted features. This was followed by edge detection to outline and segment the image into coherent regions. The product of this step is a primal sketch of the objects. Region splitting and/or merging, the dual of edge finding, can also be used separately or jointly with edge finding as part of the segmentation process.
Histogram computations of intensity values and subsequent analyses were an important part of the segmentation process. They help to establish threshold levels which serve as cues for object separation. Other techniques such as minimum spanning trees or dynamic programming are sometimes used in these early processing stages to aid in edge finding.
Following the segmentation process, regions are analyzed and labeled with their characteristic features. The results of these final steps in intermediate-level

processing are a set of region descriptions (data structures). Such structures are used as the input to the final high-level image processing stage. A summary of the data structures produced from the lowest processing stage to the final interpretation stage can then be depicted as follows.

    Scene
      ↑
    Objects
      ↑
    Regions
      ↑
    Edges or subregions
      ↑
    Pixels

Marr's Theory of Vision

David Marr and his colleagues (1982, 1980, and 1978) proposed a theory of vision which emphasized the importance of the representational scheme used at each stage of the processing. His proposal was based on the assumption that processing would be carried out in several steps similar to the summary description given above. The steps and the corresponding representations are summarized as follows.

1. Gray-level image. The lowest level in processing consists of the two-dimensional array of pixel intensity levels. These levels correspond to important physical properties of world objects: the illumination, orientation with respect to the observer, geometry, surface reflectances, and object discontinuities. Processing at this stage is local, with structure being implicit only. The key aspect of representation at this level is that it facilitate local and first-order statistical transformations.

2. Raw primal sketch. The primal sketch is a two-dimensional representation which makes object features more iconic and explicit. It consists of patterns of edge segments, intensity changes, and local features such as texture. The representation at this stage should emphasize pictorial image descriptions and facilitate transformations to the next stage, where surface features and volumes are described.

3. The 2 1/2-dimensional sketch. This sketch is an explicit representation of a three-dimensional scene in which objects are given a viewer-centered coordinate system. The models here provide distance, volume, and surface structure. Three-dimensional spatial reconstruction requires the use of various tools including stereopsis, shape contours, shape from shading and texture, and other tools described earlier.

4. The three-dimensional models. The representations at the three-dimensional model level are symbolic ones giving attribute, relational, and geometric descriptions of the scene. The use of generalized cones and cylinders helps to represent many object types, and hierarchical descriptions facilitate processing.

High-Level Processing

High-level processing techniques are less mechanical than either of the preceding image processing levels. They are more closely related to classical AI symbolic methods. In the high-level processing stage, the intermediate-level region descriptions are transformed into high-level scene descriptions in one of the knowledge representation formalisms described earlier in Part II (associative nets, frames, FOPL statements, and so on; see Figure 14.24).
The end objective of this stage is to create high-level knowledge structures which can be used by an inference program. Needless to say, the resulting structures should uniquely and accurately describe the important objects in an image, including their interrelationships. In this regard, the particular vision application will dictate the appropriate level of detail and what is considered to be important in a scene description.
There are various approaches to the scene description problem. At one extreme, it will be sufficient to simply apply pattern recognition methods to classify certain objects within a scene. This approach may require no more than application of the methods described in the preceding chapter. At the other extreme, it may be desirable to produce a detailed description of some general scene and provide an interpretation of the function, purpose, intent, and expectations of the objects in the scene. Although this requirement is beyond the current state-of-the-art, we can say that it will require a great many prestored pattern descriptions and much general world knowledge. It will also require improvements on many of the processing techniques described in this chapter.

(region6
  (mass-center 2348)
  (shape-code 24)
  (area 245)
  (number-boundary-segments 6)
  (chain-code 1133300011 . . .)
  (orientation 85)
  (borders (region4 (position left-of) (contrast 5))
           (region7 (position above) (contrast 2)))
  (mean-intensity 0.6)
  (texture light regular))

Figure 14.24 Typical description of a segmented region.

Before a scene can be described in terms of high-level structures, prestored model descriptions of the objects must be available. These descriptions must be compared with the region descriptions created during the intermediate-level stage. The matching process can take the form of rule instantiations, segmented graph or network matchings, frame instantiations, traversal of a discrimination network (decision tree), or even response to message patterns in an object-oriented system. The type of matching will naturally be influenced by the representation scheme chosen for the final structures.
To round out this section, we consider some of the approaches used in the high-level processing stage. In the following section, we consider some complete vision system architectures.
Associative networks have become a popular representation scheme for scene
descriptions since they show the relationships among the objects as well as object
characteristics. A simple example of an outdoor scene representation is illustrated
in Figure 14.25.
Scene descriptions such as this can be formulated by interpreting region descriptions
of the type shown in Figure 14.16. The interpretation knowledge will be
encoded in production rules or some other representation scheme. For example, a rule
used to identify a sky region in an outdoor color scene would be instantiated with
sky properties such as intensity, color, shape, and so forth. A rule to identify houses
in an aerial photograph would be instantiated with conditions of area, compactness,
texture, type of border, and so on, as illustrated in Figure 14.26. Rule conditions
will sometimes have fuzzy or probability predicates to allow for similarity or partial

Figure 14.25 Associative network scene representation. [Diagram: a network whose
nodes include the scene, a building, and regions, linked by labeled arcs such as
color (blue, grey), made-of (brick), type, and has-part; region boundaries are
marked with attributes such as linear and divided.]

matches rather than absolute ones. Rule conclusions will be rated by likelihood or
certainty factors instead of complete certainty. Identification of objects can then be
made on the basis of a likelihood score. In Figure 14.26(a), pairs of numbers are
given in the antecedent to suggest acceptable condition levels comparable to Dempster-
Shafer probabilities (the values in the figure are arbitrarily chosen with a scale of 0
to 1.0).
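This style of interval-scored, partial matching can be sketched in a few lines. The rule format, the feature names, and the linear falloff outside the acceptable interval are illustrative assumptions, not the book's actual implementation:

```python
def condition_score(value, low, high):
    """Score how well a measured feature falls in the acceptable
    interval [low, high]: 1.0 inside, decaying linearly outside."""
    if low <= value <= high:
        return 1.0
    gap = (low - value) if value < low else (value - high)
    return max(0.0, 1.0 - gap)  # simple linear falloff (an assumption)

def rule_likelihood(region, conditions):
    """A rule fires with the product of its condition scores,
    allowing partial matches rather than absolute ones."""
    score = 1.0
    for feature, (low, high) in conditions.items():
        score *= condition_score(region[feature], low, high)
    return score

# Hypothetical sky rule: bright intensity, low texture (0 to 1.0 scales).
sky_rule = {"intensity": (0.4, 0.8), "texture": (0.0, 0.2)}
region = {"intensity": 0.6, "texture": 0.1}
print(rule_likelihood(region, sky_rule))  # perfect match -> 1.0
```

A region slightly outside one interval still receives a nonzero score, so objects can be identified by ranking candidate labels on likelihood.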
When rule-based identification is used, the vision system may be given an
initial goal of identifying each region. This can be accomplished with a high-level
goal statement of the following type.

(label region
  (or (rgn = building)
      (rgn = bushes)
      (rgn = car)
      (rgn = house)
      (rgn = road)
      (rgn = shadow)
      (rgn = tree)))

Other forms of matching may also be used in the interpretation process. For
example, a decision tree may be used in which region attributes and relation values
determine the branch taken at each node when descending the tree. The leaves of
the decision tree are labeled with the object identities as in Figure 14.27.
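A decision-tree labeler of this kind can be sketched as nested dictionaries, where each interior node names the attribute to test and each leaf holds an object identity. The attributes, branch values, and labels below are illustrative, not the tree of Figure 14.27:

```python
# Interior nodes: ("attribute", {value: subtree}); leaves: label strings.
tree = ("shape", {
    "linear": ("size", {"large": "road", "small": "sidewalk"}),
    "jagged": ("color", {"green": "tree", "grey": "building"}),
    "other":  ("texture", {"none": "sky", "heavy": "lawn"}),
})

def classify(region, node):
    """Descend the tree, branching on the region's attribute values,
    until a leaf (an object identity) is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[region[attribute]]
    return node

print(classify({"shape": "jagged", "color": "green"}, tree))  # -> tree
```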

(R10-sky
  (and (location upper rgn)
       (intensity rgn bright (0.4 0.8))
       (color rgn (or (blue grey)) (0.7 1.0))
       (textural rgn low (0.8 1.0))
       (linear-boundary rgn rgn2 (0.4 0.7)))
  (label rgn sky))

(a) Sky Identification Rule

(R32-building
  (and (intensity-avg rgn > image)
       (area >= 60)
       (area <= 250)
       (compactness >= 0.6)
       (texture-variation <= 64.0)
       (percent-border-planar >= 60))
  (label region HOUSE (0.9)))

(b) Building Identification Rule

Figure 14.26 Interpretation rules for a sky and a building.

Figure 14.27 Object identification tree. [Decision tree: branches test shape
(jagged, circular, linear, cylindrical, other), size (large, medium, small), color
(yellow, green, blue, grey, white), and texture (none, slight, medium, heavy);
the leaves carry object identities such as sky, tree, road, car, building,
sidewalk, and lawn.]

Objects, with their attributes and relations, are then used to construct an associative
net scene description, a frame network, or other structure.

14.6 VISION SYSTEM ARCHITECTURES

In this section we present two vision systems which are somewhat representative
of complete system architectures. The first system is a model-based system, one of
the earliest successful vision systems. The second is a color region analyzer recently
developed at Kyoto University, Japan.

The ACRONYM System

The ACRONYM system is a model-based, domain-independent system developed
by Rodney Brooks (1981) while at Stanford University during the late 1970s. The
system takes user descriptions of sets of objects as models or patterns which are
then used to assist in the identification of structures appearing in monocular images.
Figure 14.28 illustrates the main components of the system.
The user prepares descriptions of objects or general classes of objects and
their spatial relationships and subclass relationships in the form of LISP statements.
For example, to model a class of screwdrivers with lengths between 1 and 10
inches, descriptions like the following are specified.

(user-variable DRIVER-LENGTH (1.0 10.0 INCHES))
(user-variable HANDLE-LENGTH (4.0 INCHES))
(user-constant HANDLE-RADIUS (0.5 INCHES))
(user-constant SHAFT-RADIUS (0.125 INCHES))
(define object SCREWDRIVER having
  subpart SHAFT
  subpart HANDLE)
(define object SHAFT having cone-descriptor
  (define cone having main-cone
    (define simple-cone having
      cross-section (define cross-section having
                      type CIRCLE
                      radius SHAFT-RADIUS)
      spine (define spine having
              type STRAIGHT
              length (- DRIVER-LENGTH HANDLE-LENGTH))
      sweeping-rule CSWH)))
(affix HANDLE to SCREWDRIVER)
(affix SHAFT to HANDLE with pos (HANDLE-LENGTH 0 0))

The user descriptions are parsed and transformed by the system into geometric
and algebraic network representations. These representations provide volumetric
descriptions in local coordinate systems. A graphic presentation, the system's interpretation
of the input models created by the user, provides feedback to the user during
the modeling process. The completed representations are used by the system to
predict what features (e.g., shape, orientation, and position) of the modeled objects
can be observed from the input image components. The predicted models are stored
as prediction graphs.
The visual input consists of gray-level image processing arrays, a line finder,
and an edge linker. This part of the system provides descriptions of objects as
defined by segmented edge structures. The descriptions created from this unit are
represented as observation graphs. One output from the predictor serves as an input

Figure 14.28 Main functional components of ACRONYM. [Diagram: user → parser →
predictor, supported by geometry and algebra modules; image → line finder → edge
mapper and linker; the predictor also feeds the edge mapping module, and both the
predictor and the edge mapper feed the interpreter; graphics provide feedback to
the user.]

to the edge mapping and linking module. This unit uses the predicted information
(predicted edges, ribbons, or ellipses in the modeled objects) to assist in finding
and identifying image objects appearing in the input image. Outputs from both the
predictor and the edge mapper and linker serve as inputs to the interpreter. The
interpreter is essentially a graph matcher. It tries to find the best matches among
subgraphs of the image observation graph and the prediction graph. Each match
becomes an interpretation graph. Partial matching is accommodated in the interpretation
process through consistency checks.
The basic interpretation process is summarized in Figure 14.29, where models
are given for two wide-bodied aircraft (a Boeing 747 and a Lockheed L-1011),
and the interpretation of an aircraft from gray-level image to ACRONYM's interpretation
is shown.

Ohta's Color Scene Analyzer

Yuichi Ohta of Kyoto University recently developed a vision system which performs
region analysis on outdoor color scenes (1986). Outdoor scenes typically include
objects such as trees, bushes, sky, roads, buildings, and other objects which are
more naturally defined as regions rather than edges. His system makes use of the
role color can play in the segmentation process.
Starting with tricolor (red, green, and blue) intensity arrays, digitized images
are produced from which regions are defined by a segmentation splitting process.
The output of the segmentation process is a two-dimensional array which identifies
regions as commonly numbered pixel areas. This array is then transformed into a
structured data network which contains descriptive elements for regions such as
boundary segments, vertices, and the like. With this network, the system constructs
a semantic description of the scene using model knowledge in the form of production
rules. Figure 14.30 illustrates the main processing steps carried out by the system.

Figure 14.29 Models and stages of interpretation in ACRONYM, panels (a) through
(d). (Courtesy of Rodney A. Brooks)

Figure 14.30 Color scene region analyzer. [Diagram: an input color image passes
through preliminary segmentation (the bottom-up process) to a structured data
network; plan generation produces a plan which drives the production system (the
top-down process), yielding the scene description.]

In the preliminary segmentation stage, the image is segmented into coherent
regions using region-splitting methods based on color information. Multihistograms
serve as cues for thresholding and region splitting. Color features are selected based
on results from the Karhunen-Loeve transformation (Devijver and Kittler, 1982),
which amounts to choosing those features having the greatest discriminating power
(essentially the greatest variance). These segmented regions become the atomic
elements from which the structured data network is built.
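The Karhunen-Loeve step amounts to projecting the (R, G, B) pixel values onto the eigenvectors of their covariance matrix and keeping the directions of greatest variance. A minimal numpy sketch, using made-up pixel data rather than Ohta's actual features, might look like:

```python
import numpy as np

def kl_features(pixels, k=2):
    """Return the k highest-variance linear color features
    (principal components) of an (n, 3) array of RGB pixels."""
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # best k directions
    return centered @ top                            # projected features

rng = np.random.default_rng(0)
pixels = rng.random((100, 3))     # stand-in for sampled RGB values
features = kl_features(pixels, k=2)
print(features.shape)  # (100, 2)
```

The first projected feature carries the most variance, which is why histograms of such features make good cues for threshold selection.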
Regions are characterized by their boundary segments, vertices, line segments,
and holes. These basic descriptions are formed during the preliminary segmentation
phase. From these, other features are derived including the area, mean color intensities
(red, green, blue), degree of texture, contour length, position of mass center, number
of holes, minimum bounding rectangle, distance from origin, and orientation. These
and the region relationships are described in a data structure known as a patchery data
structure. Elements in the data network are essentially matched against model knowledge
described in the form of production rules. The rule actions then effectively
construct the scene description.
A plan is a representation of the crude structure of the input scene given as
object labels and their degree of correctness. It is generated by the bottom-up process
to provide clues concerning which knowledge can be applied to different parts of
the scene.
Knowledge of the task world is represented by sets of production rules. One
set is used in the bottom-up process and the other in the top-down process. Each
rule in the bottom-up set has a fuzzy predicate which describes properties of or relations
between objects. The rules also have weights which indicate the level of uncertainty
of the knowledge. Each rule in the top-down set is a condition-action pair, where
the condition is a fuzzy predicate which examines the situation of the data base.
The action part includes operations to construct the scene description. An agenda
manages the activation of production rules and schedules the executable actions.
Examples of a typical property rule and a relation rule are as follows:

((GEN (or (blue sk) (gray sk)) (1.0 . 0.2)) (sk))

((GEN (and (linear-boundary b1 sk)
           (not (position up b1 sk)))
      ((1.0 . 0.5) for sky))
 (b1 sk))

The first rule is a property rule about the color of the sky (blue or gray). The
second rule is a relation rule about the boundary between a building and the sky:
the boundary between the two has many linear parts, and the building is not on
the upper side of that boundary.
The final product of the analyzer is, of course, a description of the scene.
This is constructed as a hierarchical network as illustrated in Figure 14.31.
Ohta's system has demonstrated that it can deal with fairly complex scenes,
including objects with substructures. To validate this claim, a number of outdoor
scenes from the Kyoto University campus were analyzed correctly by the system.

Figure 14.31 Basic structure of the scene description. [Hierarchy: scene → object →
region → subregion → patch → pixel.]

14.7 SUMMARY

Computer vision is a computation-intensive process. It involves multiple transformations
starting with arrays of low-level pixels and progressing to high-level scene
descriptions. The transformation process can be viewed as three stages of processing:
low- or early-level processing, intermediate-level processing, and high- or late-level
processing. Low-level processing is concerned with the task of finding structure among
thousands of atomic gray-level (or tristimulus) pixel intensity values. The goal of this
phase is to find and define enough structure in the raw image to permit its segmentation
into coherent regions which correspond to definable objects from the source scene.
Intermediate-level processing is concerned with the task of accurately forming and
describing the segmented regions. The atomic units of this phase are regions and
subregions. Finally, high-level processing requires that the segmented regions from
the intermediate stage be transformed into scene descriptions. This stage of processing
is less mechanical than the two previous stages and relies more on classical AI
methods of symbolic processing.
Low-level processing usually involves some form of smoothing operation on
the digitized arrays. Smoothing helps to reduce noise and other unwanted features.
This is followed by some form of edge detection such as the application of difference
operators to the arrays. Edge fragments must then be joined to form continuous
contours which outline objects. Various techniques are available for these operations.
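These two low-level steps — neighborhood smoothing followed by a difference operator — can be sketched on a small gray-level array. The uniform 3×3 averaging kernel and the simple horizontal first difference below are generic stand-ins, not the exact operators defined earlier in the chapter:

```python
def smooth(image):
    """Average each interior pixel with its eight neighbors
    to suppress noise before edge detection."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            out[i][j] = sum(image[i + di][j + dj]
                            for di in (-1, 0, 1)
                            for dj in (-1, 0, 1)) / 9.0
    return out

def horizontal_diff(image):
    """First difference along each row; large magnitudes mark
    candidate vertical edge fragments."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)]
            for row in image]

img = [[10, 10, 50, 50],
       [10, 10, 50, 50],
       [10, 10, 50, 50],
       [10, 10, 50, 50]]
print(horizontal_diff(img)[0])  # -> [0, 40, 0]
```

The single large difference value traces the vertical intensity edge between the dark and bright halves of the array.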
The dual of the edge-finding approach is region segmentation, which may be
accomplished by region splitting, region growing, or a combination of both. Multihistograms
and thresholding are commonly used techniques in the segmentation process.
They are applied to one or more image features such as intensity, color, texture,
shading, or optical flow in the definition of coherent regions. The end product of
the segmentation process will be homogeneous regions. The properties of these
regions and their interrelationships must be described in order that they may be
identified in the high-level processing stage. Regions can be described by boundary
segments, vertices, number of holes, compactness, location, orientation, and so
on.
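Histogram thresholding, one of the segmentation cues just mentioned, can be sketched as building an intensity histogram and then labeling pixels on either side of a gray level chosen in the valley between two peaks. The valley choice below is hand-picked for the toy data; real systems locate it automatically:

```python
def histogram(image, levels=256):
    """Count how many pixels take each gray-level value."""
    counts = [0] * levels
    for row in image:
        for p in row:
            counts[p] += 1
    return counts

def threshold(image, t):
    """Binary image: 1 where intensity exceeds t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]

img = [[10, 12, 200, 210],
       [11, 10, 205, 208],
       [12, 11, 201, 209]]
counts = histogram(img)
# The histogram has two clusters (near 10 and near 205); any t in the
# empty valley between them, e.g. 100, separates the two regions.
print(threshold(img, 100))
```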
The final stage is the knowledge application stage, since the regions must be
interpreted and explained. This requires task- or domain-specific knowledge as well
as some general world knowledge. The current state-of-the-art in computer vision
does not permit the interpretation of arbitrarily complex scenes such as that depicted
in Figure 14.1. Much work still remains before that degree of sophistication can
be realized.

EXERCISES

14.1. Visual continuity in TV systems is maintained through the generation of 15 frames
per second. What characteristics must an ADC unit have to match this same level
of continuity for a vision system with a 1024 × 1024 pixel resolution?


14.2. Describe the types of world knowledge a vision system must have to "comprehend"
the scene portrayed in Figure 14.3.
14.3. Suppose the CPU in a vision system takes 200 nanoseconds to perform memory/
register transfers and 500 nanoseconds to perform basic arithmetic operations. Esti-
mate the time required to produce a binary image for a system with a resolution of
256 x 256 pixels.
14.4. How much memory is required to produce and compare five different binary images,
each with a different threshold level? Assume a system resolution of 512 x 512.
Can the binary images be compressed in some way to reduce memory requirements?
14.5. Find the binary image for the array given below when the threshold is set at 35.

23 132 35
36 30 42 38
2 9 34 36
37 36 35 33

14.6. Given the following histogram, what are the most likely threshold points? Explain
why you chose the given points and rejected others.

[Histogram figure]

14.7. What is the value of the smoothed pixel for the associated mask?

MASK                PIXELS
      3/16          7 8 9
3/16  1/4   3/16    5 4 6
      3/16          4 6 2

14.8. Compare the effects of the eight- and four-neighbor filters described in Section 14.2
when applied to the following array of pixel gray-level values.

5  8  8 10 12 29 32 30
4  7  8  9 10  9 30 29
5  8  7  5 11 33 31 34
6  9  8 10 34 31 29 33
6  8  9 32 30 29  5  6
8  7 31 32 32 28  6  7
7  8 33 33 29  7  8  7
9 30 32 31 28  8  8  9

14.9. Low-noise systems should use little or no filtering to avoid unnecessary blurring.
This means that more weight should be given to the pixel being smoothed. Define
two low-noise filters, one a four-neighbor and one an eight-neighbor filter, and compare
their effects on the array of Problem 14.5.
14.10. Using a value of n = 1, apply the difference operators (horizontally) to the array of
Problem 14.5 and comment on the trace of any apparent edges.
14.11. Apply the vector gradient to the array of Problem 14.5 and compare the results to
those of Problem 14.7.
14.12. This problem relates to the application of template matching using correlation techniques.
The objective is to try to match an unknown two-dimensional curve or waveform
with a known waveform. Assume that both waveforms are discrete and are represented
as arrays of unsigned numbers. Write a program in any suitable language to
match the unknown waveform to the known waveform using the correlation function
given as

c_i = <X, Z_i> / (||X|| ||Z_i||)

where X is the unknown pattern vector, Z_i is the known pattern vector at position i,
<X, Z_i> denotes the inner product of X and Z_i, and ||X|| is the norm of X, that is,
||X|| = <X, X>^(1/2).
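A sketch of this normalized correlation, reduced to the one-dimensional case for brevity (the waveform data below is invented for illustration):

```python
import math

def correlate(x, z):
    """Normalized correlation c = <X,Z> / (||X|| ||Z||);
    1.0 means the patterns match up to a positive scale factor."""
    inner = sum(a * b for a, b in zip(x, z))
    norm = (math.sqrt(sum(a * a for a in x)) *
            math.sqrt(sum(b * b for b in z)))
    return inner / norm

def best_match(x, template, positions):
    """Slide the unknown pattern across candidate positions of a longer
    known waveform and return the offset with the highest correlation."""
    scores = [correlate(x, template[i:i + len(x)]) for i in positions]
    return max(positions, key=lambda i: scores[i - positions[0]])

wave = [0, 1, 4, 9, 4, 1, 0, 0]   # known waveform
probe = [4, 9, 4]                 # unknown pattern
print(best_match(probe, wave, range(0, 6)))  # peak aligns at offset 2
```

Because the inner product is divided by both norms, the score is insensitive to uniform scaling of either waveform, which is what makes it useful for template matching.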
14.13. Write a program to apply the Sobel edge detection mask to an array consisting of
256 × 256 pixel gray-level values.
14.14. Color and texture are both potentially useful in defining regions. Describe an algorithm
that could be used to determine regions that are homogeneous in color.
14.15. Referring to Problem 14.14, develop an algorithm that can be used to define regions
that are homogeneous in texture.
14.16. Referring to the two previous problems, develop an algorithm that determines regions
on the basis of homogeneity in both color and texture.

Expert Systems
Architectures

This chapter describes the basic architectures of knowledge-based systems with emphasis
placed on expert systems. Expert systems are a recent product of artificial intelligence.
They began to emerge as university research systems during the early 1970s.
They have now become one of the more important innovations of AI since they
have been shown to be successful commercial products as well as interesting research
tools.
Expert systems have proven to be effective in a number of problem domains
which normally require the kind of intelligence possessed by a human expert. The
areas of application are almost endless. Wherever human expertise is needed to
solve problems, expert systems are likely candidates for application. Application
domains include law, chemistry, biology, engineering, manufacturing, aerospace,
military operations, finance, banking, meteorology, geology, geophysics, and more.
The list goes on and on.
In this chapter we explore expert system architectures and related building
tools. We also look at a few of the more important application areas as well. The
material is intended to acquaint the reader with the basic concepts underlying expert
systems and to provide enough of the fundamentals needed to build basic systems
or pursue further studies and conduct research in the area.


15.1 INTRODUCTION

An expert system is a set of programs that manipulate encoded knowledge to solve


problems in a specialized domain that normally requires human expertise. An expert
system's knowledge is obtained from expert sources and coded in a form suitable
for the system to use in its inference or reasoning processes. The expert knowledge
must be obtained from specialists or other sources of expertise, such as texts, journal
articles, and data bases. This type of knowledge usually requires much training
and experience in some specialized field such as medicine, geology, system configura-
tion, or engineering design. Once a sufficient body of expert knowledge has been
acquired, it must be encoded in some form, loaded into a knowledge base, then
tested, and refined continually throughout the life of the system.

Characteristic Features of Expert Systems

Expert systems differ from conventional computer systems in several important ways.
1. Expert systems use knowledge rather than data to control the solution process.
"In the knowledge lies the power" is a theme repeatedly followed and supported
throughout this book. Much of the knowledge used is heuristic in nature rather
than algorithmic.
2. The knowledge is encoded and maintained as an entity separate from the
control program. As such, it is not compiled together with the control program
itself. This permits the incremental addition and modification (refinement) of the
knowledge base without recompilation of the control programs. Furthermore, it is
possible in some cases to use different knowledge bases with the same control
programs to produce different types of expert systems. Such systems are known as
expert system shells since they may be loaded with different knowledge bases.
3. Expert systems are capable of explaining how a particular conclusion was
reached, and why requested information is needed during a consultation. This is
important as it gives the user a chance to assess and understand the system's reasoning
ability, thereby improving the user's confidence in the system.

4. Expert systems use symbolic representations for knowledge (rules, networks,


or frames) and perform their inference through symbolic computations that closely
resemble manipulations of natural language. (An exception to this is the expert
system based on neural network architectures.)
5. Expert systems often reason with metaknowledge; that is, they reason with
knowledge about themselves, and their own knowledge limits and capabilities.

Background History

Expert systems first emerged from the research laboratories of a few leading U.S.
universities during the 1960s and 1970s. They were developed as specialized problem

solvers which emphasized the use of knowledge rather than algorithms and general
search methods. This approach marked a significant departure from conventional
Al systems architectures at the time. The accepted direction of researchers then
was to use Al systems that employed general problem solving techniques such as
hill-climbing or means-end analysis (Chapter 9) rather than specialized domain knowl-
edge and heuristics. This departure from the norm proved to be a wise choice. It
led to the development of a new class of successful systems and special system
designs.
The first expert system to be completed was DENDRAL, developed at Stanford
University in the late 1960s. This system was capable of determining the structure
of chemical compounds given a specification of the compound's Constituent elements
and mass spectrometry data obtained from samples of the compound. DENDRAL
used heuristic knowledge obtained from experienced chemists to help constrain the
problem and thereby reduce the search space. During tests, DENDRAL discovered
a number of structures previously unknown to expert chemists.
As researchers gained more experience with DENDRAL, they found how
difficult it was to elicit expert knowledge from experts. This led to the development
of Meta-DENDRAL, a learning component for DENDRAL which was able to learn
rules from positive examples, a form of inductive learning described later in detail
(Chapters 18 and 19).
Shortly after DENDRAL was completed, the development of MYCIN began
at Stanford University. MYCIN is an expert system which diagnoses infectious
blood diseases and determines a recommended list of therapies for the patient. As
part of the Heuristic Programming Project at Stanford, several projects directly
related to MYCIN were also completed, including a knowledge acquisition component
called TEIRESIAS, a tutorial component called GUIDON, and a shell component
called EMYCIN (for Essential MYCIN). EMYCIN was used to build other diagnostic
systems including PUFF, a diagnostic expert for pulmonary diseases. EMYCIN
also became the design model for several commercial expert system building tools.
MYCIN's performance improved significantly over a several-year period as
additional knowledge was added. Tests indicate that MYCIN's performance now
equals or exceeds that of experienced physicians. The initial MYCIN knowledge
base contained only about 200 rules. This number was gradually increased to more
than 600 rules by the early 1980s. The added rules significantly improved MYCIN's
performance, leading to a 65% success record which compared favorably with
experienced physicians who demonstrated only an average 60% success rate (Lenat, 1984).
(An example of MYCIN's rules is given in Section 4.9, and the treatment of uncertain
knowledge by MYCIN is described in Section 6.5.)
Other early expert system projects included PROSPECTOR, a system that
assists geologists in the discovery of mineral deposits, and R1 (also known as XCON), a
system used by the Digital Equipment Corporation to select and configure components
of complex computer systems. Since the introduction of these early expert systems,
numerous commercial and military versions have been completed with a high degree
of success. Some of these application areas are itemized below.

Applications

Since the introduction of these early expert systems, the range and depth of applications
has broadened dramatically. Applications can now be found in almost all areas of
business and government. They include such areas as

Different types of medical diagnoses (internal medicine, pulmonary diseases,
infectious blood diseases, and so on)
Diagnosis of complex electronic and electromechanical systems
Diagnosis of diesel electric locomotion systems
Diagnosis of software development projects
Planning experiments in biology, chemistry, and molecular genetics
Forecasting crop damage
Identification of chemical compound structures and chemical compounds
Location of faults in computer and communications systems
Scheduling of customer orders, job shop production operations, computer re-
sources for operating systems, and various manufacturing tasks
Evaluation of loan applicants for lending institutions
Assessment of geologic structures from dipmeter logs
Analysis of structural systems for design or as a result of earthquake damage
The optimal configuration of components to meet given specifications for a
complex system (like computers or manufacturing facilities)
Estate planning for minimal taxation and other specified goals
Stock and bond portfolio selection and management
The design of very large scale integration (VLSI) systems
Numerous military applications ranging from battlefield assessment to ocean
surveillance
Numerous applications related to space planning-and exploration
Numerous areas of law including civil case evaluation, product liability, assault
and battery, and general assistance in locating different law precedents
Planning curricula for students
Teaching students specialized tasks (like trouble shooting equipment faults)

Importance of Expert Systems

The value of expert systems was well established by the early 1980s. A number of
successful applications had been completed by then and they proved to be cost
effective. An example which illustrates this point well is the diagnostic system
developed by the Campbell Soup Company.
Campbell Soup uses large sterilizers or cookers to cook soups and other canned

products at eight plants located throughout the country. Some of the larger cookers
hold up to 68,000 cans of food for short periods of cooking time. When difficult
maintenance problems occur with the cookers, the fault must be found and corrected
quickly or the batch of foods being prepared will spoil. Until recently, the company
had been depending on a single expert to diagnose and cure the more difficult
problems, flying him to the site when necessary. Since this individual will retire in a few
years, taking his expertise with him, the company decided to develop an expert
system to diagnose these difficult problems.
After some months of development with assistance from Texas Instruments,
the company developed an expert system which ran on a PC. The system has about
150 rules in its knowledge base with which to diagnose the more complex cooker
problems. The system has also been used to provide training to new maintenance
personnel. Cloning multiple copies for each of the eight locations cost the company
only a few pennies per copy. Furthermore, the system cannot retire, and its perfor-
mance can continue to be improved with the addition of more rules. It has already
proven to be a real asset to the company. Similar cases now abound in many diverse
organizations.

15.2 RULE-BASED SYSTEM ARCHITECTURES

The most common form of architecture used in expert and other types of knowledge-based
systems is the production system, also called the rule-based system. This
type of system uses knowledge encoded in the form of production rules, that is,
if . . . then rules. We may remember from Chapter 4 that rules have an antecedent
or condition part, the left-hand side, and a conclusion or action part, the right-hand
side.

IF: Condition-1 and Condition-2 and Condition-3
THEN: Take Action-4

IF: The temperature is greater than 200 degrees, and
    the water level is low
THEN: Open the safety valve.

A & B & C & D → E & F

Each rule represents a small chunk of knowledge relating to the given domain
of expertise. A number of related rules collectively may correspond to a chain of
inferences which lead from some initially known facts to some useful conclusions.
When the known facts support the conditions in the rule's left side, the conclusion
or action part of the rule is then accepted as known (or at least known with some
degree of certainty). Examples of some typical expert system rules were described
in earlier sections (for example, see Sections 4.9, 6.5, and 10.6).

Figure 15.1 Components of a typical expert system. [Diagram: the user, through an
I/O interface, interacts with the inference engine, which draws on the knowledge
base, working memory, and a case history input file; a learning module is also
shown.]

Inference in production systems is accomplished by a process of chaining
through the rules recursively, either in a forward or backward direction, until a
conclusion is reached or until failure occurs. The selection of rules used in the
chaining process is determined by matching current facts against the domain knowledge
or variables in rules and choosing among a candidate set of rules the ones
that meet some given criteria, such as specificity. The inference process is typically
carried out in an interactive mode with the user providing input parameters needed
to complete the rule chaining process.
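Forward chaining over simple propositional rules can be sketched as repeatedly firing any rule whose conditions are all present in working memory until nothing new can be derived. The rule content below borrows the safety-valve example shown earlier, with an invented follow-on rule:

```python
def forward_chain(facts, rules):
    """Fire rules whose antecedents are all satisfied, adding their
    conclusions to working memory until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and all(c in known for c in conditions):
                known.add(conclusion)
                changed = True
    return known

rules = [({"temperature-high", "water-level-low"}, "open-safety-valve"),
         ({"open-safety-valve"}, "sound-alarm")]   # hypothetical second rule
derived = forward_chain({"temperature-high", "water-level-low"}, rules)
print("sound-alarm" in derived)  # -> True
```

Backward chaining would run the same rules in the opposite direction, starting from a goal conclusion and recursively seeking facts or rules that establish its conditions.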
The main components of a typical expert system are depicted in Figure 15.1.
The solid-lined boxes in the figure represent components found in most systems
whereas the broken-lined boxes are found in only a few such systems.

The Knowledge Base

The knowledge base contains facts and rules about some specialized knowledge
domain. An example of a simple knowledge base giving family relationships is
illustrated in Figure 15.2. The rules in this figure are given in the same LISP
format as those of Section 10.6, which is similar to the format of the OPS5
language as presented by Brownston, Farrell, Kant, and Martin (1985). Each fact
and rule is identified with a name (a1, a2, . . . , r1, r2, . . .). For ease in
reading, the left side is separated from the right by the implication symbol →.
Conjuncts on the left are given within single parentheses (sublists), and one or
more conclusions may follow the implication symbol. Variables are identified as a
symbol preceded by a question mark. It should be noted that rules found in real
working systems may have many conjuncts in the LHS. For example, as many as
eight or more are not uncommon.

(a1 (male bob))
(a2 (female sue))
(a3 (male sam))
(a4 (male bill))
(a5 (female pam))
(r1 ((husband ?x ?y)) → (male ?x))
(r2 ((wife ?x ?y)) → (female ?x))
(r3 ((wife ?x ?y)) → (husband ?y ?x))
(r4 ((mother ?x ?y) (husband ?z ?x)) → (father ?z ?y))
(r5 ((father ?x ?y) (wife ?z ?x)) → (mother ?z ?y))
(r6 ((husband ?x ?y)) → (wife ?y ?x))
(r7 ((father ?x ?z) (mother ?y ?z)) → (husband ?x ?y))
(r8 ((father ?x ?z) (mother ?y ?z)) → (wife ?y ?x))
(r9 ((father ?x ?y) (father ?y ?z)) → (grandfather ?x ?z))

Figure 15.2 Facts and rules in a simple knowledge base.

In PROLOG, rules are written naturally as clauses with both a head and body.
For example, a rule about a patient's symptoms and the corresponding diagnosis
of hepatitis might read in English as the rule

IF: The patient has a chronic disorder, and


the sex of the patient is female, and
the age of the patient is less than 30, and
the patient shows condition A, and
test B reveals biochemistry condition C
THEN: conclude the patient's diagnosis is autoimmune-chronic-hepatitis.

This rule could be written straightaway in PROLOG as

conclude(patient, diagnosis, autoimmune_chronic_hepatitis) :-
    same(patient, disorder, chronic),
    same(patient, sex, female),
    lessthan(patient, age, 30),
    same(patient, symptom_a, value_a),
    same(patient, biochemistry, value_c).

Note that PROLOG rules have at most one conclusion clause.

The Inference Process

The inference engine accepts user input queries and responses to questions through
the I/O interface and uses this dynamic information together with the static knowledge
(the rules and facts) stored in the knowledge base. The knowledge in the knowledge
base is used to derive conclusions about the current case or situation as presented
by the user's input.
The inferring process is carried out recursively in three stages: (1) match, (2)
select, and (3) execute. During the match stage, the contents of working memory
are compared to facts and rules contained in the knowledge base. When consistent
matches are found, the corresponding rules are placed in a conflict set. To find an
appropriate and consistent match, substitutions (instantiations) may be required.
Once all the matched rules have been added to the conflict set during a given
cycle, one of the rules is selected for execution. The criteria for selection may be
most recent use, rule condition specificity (the number of conjuncts on the left),
or simply the smallest rule number. The selected rule is then executed and the
right-hand side or action part of the rule is then carried out. Figure 15.3 illustrates
this match-select-execute cycle.
As an example, suppose the working memory contains the two clauses

(father bob sam)
(mother sue sam)

When the match part of the cycle is attempted, a consistent match will be made
between these two clauses and rules r7 and r8 in the knowledge base. The match
is made by substituting Bob for ?x, Sam for ?z, and Sue for ?y. Consequently,
since all the conditions on the left of both r7 and r8 are satisfied, these two rules
will be placed in the conflict set. If there are no other working memory clauses to
match, the selection step is executed next. Suppose, for one or more of the selection
criteria stated above, r7 is the rule chosen to execute. The clause on the right side
of r7 is instantiated and the execution step is initiated. The execution step may
result in the right-hand clause (husband bob sue) being placed in working memory
or it may be used to trigger a message to the user. Following the execution step,
the match-select-execute cycle is repeated.
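The match-select-execute cycle just described can be sketched in a few lines of Python (rather than the book's LISP). The rule representation and helper names (match_clause, match_conds, cycle) are illustrative choices, not part of any particular production system shell.

```python
# A minimal sketch of one match-select-execute cycle over rules r7 and r8.
RULES = {
    "r7": ([("father", "?x", "?z"), ("mother", "?y", "?z")], ("husband", "?x", "?y")),
    "r8": ([("father", "?x", "?z"), ("mother", "?y", "?z")], ("wife", "?y", "?x")),
}

def match_clause(pattern, fact, binding):
    # Try to extend `binding` so that `pattern` matches `fact`.
    if len(pattern) != len(fact):
        return None
    b = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):          # a variable: bind it consistently
            if p in b and b[p] != f:
                return None
            b[p] = f
        elif p != f:                   # a constant that fails to match
            return None
    return b

def match_conds(conds, memory, binding=None):
    # Return every binding that satisfies all conditions against working memory.
    if binding is None:
        binding = {}
    if not conds:
        return [binding]
    results = []
    for fact in memory:
        b = match_clause(conds[0], fact, binding)
        if b is not None:
            results.extend(match_conds(conds[1:], memory, b))
    return results

def cycle(memory):
    # 1. match: build the conflict set of (rule, binding, action) triples
    conflict_set = []
    for name, (conds, action) in RULES.items():
        for b in match_conds(conds, memory):
            conflict_set.append((name, b, action))
    # 2. select: here, simply the smallest rule number
    name, b, action = min(conflict_set, key=lambda entry: entry[0])
    # 3. execute: instantiate the right-hand side and add it to memory
    new_fact = tuple(b.get(term, term) for term in action)
    memory.add(new_fact)
    return name, new_fact

wm = {("father", "bob", "sam"), ("mother", "sue", "sam")}
fired, fact = cycle(wm)
print(fired, fact)   # r7 ('husband', 'bob', 'sue')
```

With the two sample clauses in working memory, both r7 and r8 enter the conflict set; selecting by rule number fires r7 and adds (husband bob sue), exactly as in the worked example above.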

Figure 15.3 The production system inference cycle (match, select, execute).

As another example of matching, suppose the two facts (a6 (father sam bill))
and (a7 (father bill pam)) have been added to the knowledge base and the immediate
goal is a query about Pam's grandfather. When made, assume this query has resulted
in placement of the clause (grandfather ?x pam) into working memory. For this
goal to succeed, consistent substitutions must be made for the variables ?x and ?y
in rule r9 with a6 and a7. This will be the case if Sam and Bill are substituted for
?x and ?y in the subgoal left-hand conditions of r9. The right-hand side will then
correctly state that Pam's grandfather is Sam.
When the left side of a sequence of rules is instantiated first and the rules are
executed from left to right, the process is called forward chaining. This is also
known as data-driven inference since input data are used to guide the direction of
the inference process. For example, we can chain forward to show that when a
student is encouraged, is healthy, and has goals, the student will succeed.

ENCOURAGED(student) → MOTIVATED(student)
MOTIVATED(student) & HEALTHY(student) → WORKHARD(student)
WORKHARD(student) & HASGOALS(student) → EXCEL(student)
EXCEL(student) → SUCCEED(student)

On the other hand, when the right side of the rules is instantiated first, the
left-hand conditions become subgoals. These subgoals may in turn cause sub-subgoals
to be established, and so on until facts are found to match the lowest subgoal
conditions. When this form of inference takes place, we say that backward chaining
is performed. This form of inference is also known as goal-driven inference since
an initial goal establishes the backward direction of the inferring.

For example, in MYCIN the initial goal in a consultation is "Does the patient
have a certain disease?" This causes subgoals to be established, such as "Are certain
bacteria present in the patient?" Determining if certain bacteria are present may
require such things as tests on cultures taken from the patient. This process of
setting up subgoals to confirm a goal continues until all the subgoals are eventually
satisfied or fail. If satisfied, the backward chain is established, thereby confirming
the main goal.
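Goal-driven inference of this kind can be sketched as a recursive procedure. The rules and facts below are hypothetical stand-ins for MYCIN-style medical knowledge; the Python form is ours, not MYCIN's.

```python
# A small sketch of backward (goal-driven) chaining.
RULES = [
    # (conclusion, list of subgoals that would confirm it)
    ("has_disease_x", ["bacteria_present", "fever"]),
    ("bacteria_present", ["culture_positive"]),
]
FACTS = {"culture_positive", "fever"}

def prove(goal):
    # Confirm `goal` directly, or by recursively confirming some rule's subgoals.
    if goal in FACTS:                  # a ground fact: the chain bottoms out here
        return True
    for conclusion, subgoals in RULES:
        if conclusion == goal and all(prove(g) for g in subgoals):
            return True
    return False

print(prove("has_disease_x"))   # True
```

Starting from the main goal, the subgoal bacteria_present is set up, which in turn sets up culture_positive; both bottom out in known facts, so the backward chain confirms the goal.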
When rules are executed, the resulting action may be the placement of some
new facts in working memory, a request for additional information from the user,
or simply the stopping of the search process. If the appropriate knowledge has
been stored in the knowledge base and all required parameter values have been
provided by the user, conclusions will be found and will be reported to the user.
The chaining continues as long as new matches can be found between clauses in
the working memory and rules in the knowledge base. The process stops when no
new rules can be placed in the conflict set.
Some systems use both forward and backward chaining, depending on the
type of problem and the information available. Likewise, rules may be tested exhaustively
or selectively, depending on the control structure. In MYCIN, rules in the
KB are tested exhaustively. However, when the number of rules exceeds a few
hundred, this can result in an intolerable amount of searching and matching. In
such cases, techniques such as those found in the RETE algorithm (Chapter 10)
may be used to limit the search.
Many expert systems must deal with uncertain information. This will be the
case when the evidence supporting a conclusion is vague, incomplete, or otherwise
uncertain. To accommodate uncertainties, some form of probabilities, certainty factors,
fuzzy logic, heuristics, or other methods must be introduced into the inference
process. These methods were introduced in Chapters 5 and 6. The reader is urged
at this time to review those methods to see how they may be applied to expert
systems.

Explaining How or Why

The explanation module provides the user with an explanation of the reasoning
process when requested. This is done in response to a how query or a why query.
To respond to a how query, the explanation module traces the chain of rules
fired during a consultation with the user. The sequence of rules that led to the conclusion
is then printed for the user in an easy-to-understand, human-language style. This
permits the user to actually see the reasoning process followed by the system in
arriving at the conclusion. If the user does not agree with the reasoning steps presented,
they may be changed using the editor.
To respond to a why query, the explanation module must be able to explain
why certain information is needed by the inference engine to complete a step in
the reasoning process before it can proceed. For example, in diagnosing a car that
will not start, a system might be asked why it needs to know the status of the

distributor spark. In response, the system would reply that it needs this information
to determine if the problem can be isolated to the ignition system. Again, this
information allows the user to determine if the system's reasoning steps appear to
be sound. The explanation module programs give the user the important ability to
follow the inferencing steps at any time during the consultation.

Building a Knowledge Base

The editor is used by developers to create new rules for addition to the knowledge
base, to delete outmoded rules, or to modify existing rules in some way. Some of
the more sophisticated expert system editors provide the user with features not
found in typical text editors, such as the ability to perform some types of consistency
tests for newly created rules, to add missing conditions to a rule, or to reformat a
newly created rule. Such systems also prompt the user for missing information,
and provide other general guidance in the KB creation process.
One of the most difficult tasks in creating and maintaining production systems
is the building and maintaining of a consistent but complete set of rules. This
should be done without adding redundant or unnecessary rules. Building a knowledge
base requires careful planning, accounting, and organization of the knowledge structures.
It also requires thorough validation and verification of the completed knowledge
base, operations which have yet to be perfected. An "intelligent" editor can greatly
simplify the process of building a knowledge base.
TEIRESIAS (Davis, 1982) is an example of an intelligent editor developed
to assist users in building a knowledge base directly, without the need for an intermediary
knowledge engineer. TEIRESIAS was developed to work with systems like
MYCIN in providing a direct user-to-system dialog. TEIRESIAS assists the user
in formulating, checking, and modifying rules for inclusion in the performance
program's knowledge base. For this, TEIRESIAS uses some metaknowledge, that
is, knowledge about MYCIN's knowledge. The dialog is carried out in a near-English
form so that the user needs to know little about the internal form of the rules.

The I/O Interface

The input-output interface permits the user to communicate with the system in a
more natural way by permitting the use of simple selection menus or the use of a
restricted language which is close to a natural language. This means that the system
must have special prompts or a specialized vocabulary which encompasses the termi-
nology of the given domain of expertise. For example, MYCIN can recognize many
medical terms in addition to various common words needed to communicate. For
this, MYCIN has a vocabulary of some 2000 words.
Personal Consultant Plus, a commercial PC version of the MYCIN architecture,
uses menus and English prompts to communicate with the user. The prompts, written
in standard English, are provided by the developer during the system building stage.
How and why explanations are also given in natural language form.

The learning module and history file are not common components of expert
systems. When they are provided, they are used to assist in building and refining
the knowledge base. Since learning is treated in great detail in later chapters, no
description is given here.

15.3 NONPRODUCTION SYSTEM ARCHITECTURES

Other, less common expert system architectures (although no less important) are
those based on nonproduction rule-representation schemes. Instead of rules, these
systems employ more structured representation schemes like associative or semantic
networks, frame and rule structures, decision trees, or even specialized networks
like neural networks. In this section we examine some typical system architectures
based on these methods.

Associative or Semantic Network Architectures

Associative or semantic network representation schemes were discussed in Chapter
7. From the description there, we know that an associative network is a network
made up of nodes connected by directed arcs. The nodes represent objects, attributes,
concepts, or other basic entities, and the arcs, which are labeled, describe the relationship
between the two nodes they connect. Special network links include the ISA
and HASPART links, which designate an object as being a certain type of object
(belonging to a class of objects) and as being a subpart of another object, respectively.
Associative network representations are especially useful in depicting hierarchical
knowledge structures, where property inheritance is common. Objects belonging
to a class of other objects may inherit many of the characteristics of the class.
Inheritance can also be treated as a form of default reasoning. This facilitates the
storage of information when shared by many objects as well as the inferencing
process.
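Property inheritance over ISA links can be sketched as a lookup that climbs the class hierarchy when a property is not found locally. The bird/canary network below is a stock illustration, not taken from the text.

```python
# A minimal sketch of inheritance along ISA links with default reasoning.
ISA = {"canary": "bird", "bird": "animal"}
PROPS = {
    "animal": {"breathes": True},
    "bird": {"flies": True, "legs": 2},
    "canary": {"color": "yellow"},
}

def get_property(obj, prop):
    # Look the property up locally; if absent, inherit it from the superclass.
    while obj is not None:
        if prop in PROPS.get(obj, {}):
            return PROPS[obj][prop]
        obj = ISA.get(obj)            # climb one ISA link and try again
    return None

print(get_property("canary", "legs"))   # 2, inherited from bird
```

Storing "legs: 2" once on the bird node, rather than on every kind of bird, is exactly the storage saving the paragraph above describes.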
Associative network representations are not a popular form of representation
for standard expert systems. More often, these network representations are used in
natural language or computer vision systems or in conjunction with some other
form of representation.
One expert system based on the use of an associative network representation
is CASNET (Causal Associational Network), which was developed at Rutgers University
during the early 1970s (Weiss et al., 1978). CASNET is used to diagnose and
recommend treatment for glaucoma, one of the leading causes of blindness.
The network in CASNET is divided into three planes or types of knowledge,
as depicted in Figure 15.4. The different knowledge types are

Patient observations (tests, symptoms, other signs)
Pathophysiological states
Disease categories


Figure 15.4 Levels of network description in CASNET: disease categories, pathophysiological states, and patient observations (tests), connected by classification, causal, and associational links. (From Artificial Intelligence Journal, Vol. 11, p. 148, 1978. By permission.)

Patient observations are provided by the user during an interactive session


with the system. The system presents menu type queries, and the user selects one
of several possible choices. These observations help to establish an abnormal condition
caused by a disease process. The condition is established through the causal network

model as part of the cause-and-effect relationship relating symptoms and other
signs to diseases.
Inference is accomplished by traversing the network, following the most plausi-
ble paths of causes and effects. Once a sufficiently strong path has been determined
through the network, diagnostic conclusions are inferred using classification tables
that interpret patterns of the causal network. These tables are similar to rule interpreta-
tions.
The CASNET system was never used much beyond the initial research stage.
At the time, physicians were reluctant to use computer systems in spite of performance
tests in which CASNET scored well.

Frame Architectures

Frame representations were described in Chapter 7. Frames are structured sets of
closely related knowledge, such as an object or concept name, the object's main
attributes and their corresponding values, and possibly some attached procedures
(if-needed, if-added, if-removed procedures). The attributes, values, and procedures
are stored in specified slots and slot facets of the frame. Individual frames are
usually linked together as a network much like the nodes in an associative network.
Thus, frames may have many of the features of associative networks, namely, property
inheritance and default reasoning. Several expert systems have been constructed
with frame architectures, and a number of building tools which create and manipulate
frame-structured systems have been developed.
An example of a frame-based system is the PIP system (Present Illness Program)
developed at M.I.T. during the late 1970s and 1980s (Szolovits and Pauker, 1978).
This system was used to diagnose patients using low cost, easily obtained information,
the type of information obtained by a general practitioner during an office examination.
The medical knowledge in PIP is organized in frame structures, where each
frame is composed of categories of slots with names such as

Typical findings
Logical decision criteria
Complementary relations to other frames
Differential diagnosis
Scoring

The patient findings are matched against frames, and when a close match is
found, a trigger status occurs. A trigger is a finding that is so strongly related to a
disorder that the system regards it as an active hypothesis, one to be pursued further.
A special is-sufficient slot is used to confirm the presence of a disease when key
findings correlate with the slot contents.
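The trigger and is-sufficient behavior described above can be sketched as follows. The frame contents and finding names are invented for illustration and do not reproduce PIP's actual medical frames.

```python
# A sketch of PIP-style frame matching with trigger and is-sufficient slots.
FRAME = {
    "name": "disorder-d",
    "typical_findings": {"f1", "f2", "f3"},
    "trigger_findings": {"f1"},        # strongly related: activates a hypothesis
    "is_sufficient": {"f1", "f2"},     # together, these confirm the disorder
}

def evaluate(frame, findings):
    # Confirm the disorder if the is-sufficient findings are all present;
    # otherwise activate it as a hypothesis if any trigger finding matches.
    if frame["is_sufficient"] <= findings:
        return "confirmed"
    if frame["trigger_findings"] & findings:
        return "active hypothesis"     # pursue this disorder further
    return "inactive"

print(evaluate(FRAME, {"f1"}))         # active hypothesis
print(evaluate(FRAME, {"f1", "f2"}))   # confirmed
```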

Decision Tree Architectures

Knowledge for expert systems may be stored in the form of a decision tree when
the knowledge can be structured in a top-to-bottom manner. For example, the identification
of objects (equipment faults, physical objects, diseases, and the like) can be
made through a decision tree structure. Initial and intermediate nodes in the tree
correspond to object attributes, and terminal nodes correspond to the identities of
objects. Attribute values for an object determine a path to a leaf node in the tree
which contains the object's identification. Each object attribute corresponds to a
nonterminal node in the tree, and each branch of the decision tree corresponds to
an attribute value or set of values.
A segment of a decision tree knowledge structure taken from an expert system
used to identify objects such as liquid chemical waste products is illustrated in
Figure 15.5 (Patterson, 197). Each node in the tree corresponds to an identifying
attribute such as molecular weight, boiling point, burn test color, or solubility test
results. Each branch emanating from a node corresponds to a value or range of
values for the attribute, such as 20-37 degrees C, yellow, or nonsoluble in sulphuric
acid.
An identification is made by traversing a path through the tree (or network)
until the path leads to a unique leaf node which corresponds to the unknown object's
identity.
The knowledge base, which is the decision tree for an identification system,
can be constructed with a special tree-building editor or with a learning module. In
either case, a set of the most discriminating attributes for the class of objects being
identified should be selected. Only those attributes that discriminate well among
different objects need be used. Permissible values for each of the attributes are
grouped into separable sets, and each such set determines a branch from an attribute
node to the next node.
Figure 15.5 A segment of a decision tree.

New nodes and branches can be added to the tree when additional attributes

are needed to further discriminate among new objects. As the system gains experience,
the values associated with the branches can be modified for more accurate results.
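Identification by decision tree traversal can be sketched as follows. The attributes and compound names are loosely modeled on Figure 15.5 rather than copied from it.

```python
# A sketch of object identification by traversing a decision tree.
# Interior nodes name an attribute; leaves name the identified object.
TREE = {
    "attribute": "soluble_in_water",
    "branches": {
        "yes": {"attribute": "burn_test_color",
                "branches": {"yellow": "compound-38", "green": "compound-39"}},
        "no": "compound-40",
    },
}

def identify(node, observations):
    # Follow the branch matching each attribute value until a leaf is reached.
    while isinstance(node, dict):
        value = observations[node["attribute"]]
        node = node["branches"][value]
    return node

print(identify(TREE, {"soluble_in_water": "yes", "burn_test_color": "green"}))
# compound-39
```

Adding a new attribute test is just a matter of replacing a leaf with a new interior node, which mirrors how the tree grows as the system gains experience.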

Blackboard System Architectures

Blackboard architectures refer to a special type of knowledge-based system which


uses a form of opportunistic reasoning. This differs from pure forward or pure
backward chaining in production systems in that either direction may be chosen
dynamically at each stage in the problem solution process. Other reasoning methods
(model driven, for example) may also be used.
Blackboard systems are composed of three functional components as depicted
in Figure 15.6.

1. There are a number of knowledge sources which are separate and independent
sets of coded knowledge. Each knowledge source may be thought of as a
specialist in some limited area needed to solve a given subset of problems.
The sources may contain knowledge in the form of procedures, rules, or other
schemes.
2. A globally accessible data base structure, called a blackboard, contains the
current problem state and information needed by the knowledge sources (input
data, partial solutions, control data, alternatives, final solutions). The knowledge
sources make changes to the blackboard data that incrementally lead to a
solution. Communication and interaction between the knowledge sources takes
place solely through the blackboard.
3. Control information may be contained within the sources, on the blackboard,
or possibly in a separate module. (There is no actual control unit specified as
part of a blackboard system.) The control knowledge monitors the changes to
the blackboard and determines what the immediate focus of attention should
be in solving the problem.

Figure 15.6 Components of blackboard systems.

H. Penny Nii (1986a) has aptly described the blackboard problem solving
strategy through the following analogy.

Imagine a room with a large blackboard on which a group of experts are piecing
together a jigsaw puzzle. Each of the experts has some special knowledge about solving
puzzles (e.g., a border expert, a shape expert, a color expert, etc.). Each member
examines his or her pieces and decides if they will fit into the partially completed
puzzle. Those members having appropriate pieces go up to the blackboard and update
the evolving solution. The whole puzzle can be solved in complete silence with no
direct communication among members of the group. Each person is self-activating,
knowing when he or she can contribute to the solution. The solution evolves in this
incremental way, with each expert contributing dynamically on an opportunistic basis,
that is, as the opportunity to contribute to the solution arises.
The objects on the blackboard are hierarchically organized into levels which facilitate
analysis and solution. Information from one level serves as input to a set of knowledge
sources. The sources modify the knowledge and place it on the same or different
levels.

The control information is used by the control module to determine the focus
of attention. This determines the next item to be processed. The focus of attention
can be the choice of knowledge sources or the blackboard objects or both. If both,
the control determines which sources to apply to which objects.
Problem solving proceeds with a knowledge source making changes to the
blackboard objects. Each source indicates the contribution it can make to the new
solution state. Using this information, the control module chooses a focus of attention.
If the focus of attention is a knowledge source, a blackboard object is chosen as
the context of its invocation. If the focus of attention is a blackboard object, a
knowledge source which can process that object is chosen. If the focus of attention
is both a source and an object, that source is executed within that context.
Blackboard systems have been gaining some popularity recently. They have
been applied to a number of different application areas. One of the first applications
was in the HEARSAY family of projects, which are speech understanding systems
(Reddy et al., 1976). More recently, systems have been developed to analyze complex
scenes and to model the human cognitive processes (Nii, 1986b).
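The opportunistic control loop can be sketched as a toy in a few lines: independent knowledge sources scan the shared blackboard and act whenever they can contribute. The two sources below are invented for illustration.

```python
# A toy blackboard: sources communicate only through the shared structure.
blackboard = {"letters": list("dcba"), "sorted": None, "report": None}

def sorter(bb):
    # Knowledge source 1: can contribute once raw letters are on the board.
    if bb["letters"] and bb["sorted"] is None:
        bb["sorted"] = sorted(bb["letters"])
        return True
    return False

def reporter(bb):
    # Knowledge source 2: can contribute only after the sorter has acted.
    if bb["sorted"] and bb["report"] is None:
        bb["report"] = "".join(bb["sorted"])
        return True
    return False

sources = [reporter, sorter]          # listing order does not matter:
while any(src(blackboard) for src in sources):
    pass                              # keep re-scanning until no source can act

print(blackboard["report"])           # abcd
```

No source calls another directly; each fires "opportunistically" when the board state lets it, just as in the jigsaw analogy.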

Analogical Reasoning Architectures

Little work has been done in the area of analogical reasoning systems. Yet this is
one of the most promising areas for general problem solving. We humans make
extensive use of our previous experience in solving everyday problems. This is
because new problems are frequently similar to previously encountered problems.

Expert systems based on analogical architectures solve new problems as
humans do, by finding a similar problem solution that is known and applying the known
solution to the new problem, possibly with some modifications. For example, if
we know a method of proving that the product of two even integers is even, we
can successfully prove that the product of two odd integers is odd through much
the same proof steps. Only a slight modification will be required when collecting
product terms in the result. Expert systems using analogical architectures will require
a large knowledge base having numerous problem solutions and other previously
encountered situations or episodes. Each such situation should be stored as a unit
in memory and be content-indexed for rapid retrieval. The inference mechanism
must be able to extend known situations or solutions to fit the current problem and
verify that the extended solution is reasonable. The author and one of his students
have built a small toy analogical expert system in LISP to demonstrate many of the
features needed for such systems (Patterson and Chu, 1988).

Neural Network Architectures

Neural networks are large networks of simple processing elements or nodes which
process information dynamically in response to external inputs. The nodes are simplified
models of neurons. The knowledge in a neural network is distributed throughout
the network in the form of internode connections and weighted links which form
the inputs to the nodes. The link weights serve to enhance or inhibit the input
stimuli values, which are then added together at the nodes. If the sum of all the
inputs to a node exceeds some threshold value T, the node executes and produces
an output which is passed on to other nodes or is used to produce some output
response. In the simplest case, no output is produced if the total input is less than
T. In more complex models, the output will depend on a nonlinear activation function.
Neural networks were originally inspired as being models of the human nervous
system. They are greatly simplified models to be sure (neurons are known to be
fairly complex processors). Even so, they have been shown to exhibit many "intelligent"
abilities, such as learning, generalization, and abstraction.
A single node is illustrated in Figure 15.7.

Figure 15.7 Model of a single neuron.

The inputs to the node are the values x1, x2, . . . , xn, which typically take on
values of -1, 0, 1, or real values within the range (-1, 1). The weights w1, w2,
. . . , wn correspond to the synaptic strengths of a neuron. They serve to increase
or decrease the effects of the corresponding xi input values. The sum of the products
xi * wi, i = 1, 2, . . . , n, serves as the total combined input to the node. If this
sum is large enough to exceed the

threshold amount T, the node fires, and produces an output y, an activation function
value placed on the node's output links. This output may then be the input to
other nodes or the final output response from the network.
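The thresholded node just described can be sketched directly; the input values, weights, and threshold below are arbitrary illustrations.

```python
# A sketch of the single node of Figure 15.7: weighted inputs are summed
# and compared against a threshold T.
def node_output(x, w, T):
    # Fire (output 1) if the weighted input sum exceeds the threshold T.
    total = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if total > T else 0

print(node_output([1, -1, 1], [0.5, 0.2, 0.4], T=0.5))   # 0.7 > 0.5 -> 1
print(node_output([1, 1, 0], [0.5, -0.2, 0.4], T=0.5))   # 0.3 <= 0.5 -> 0
```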
Figure 15.8 illustrates three layers of a number of interconnected nodes. The
first layer serves as the input layer, receiving inputs from some set of stimuli. The
second layer (called the hidden layer) receives inputs from the first layer and produces
a pattern of inputs to the third layer, the output layer. The pattern of outputs from
the final layer are the network's responses to the input stimuli patterns. Input links
to layer j (j = 1, 2, 3) have weights wij for i = 1, 2, . . . , n.
General multilayer networks having n nodes (number of rows) in each of m
layers (number of columns of nodes) will have weights represented as an n x m
matrix W. Using this representation, nodes having no interconnecting links will
have a weight value of zero. Networks consisting of more than three layers would,
of course, be correspondingly more complex than the network depicted in Figure
15.8.
A neural network can be thought of as a black box that transforms the input
vector x to the output vector y where the transformation performed is the result of
the pattern of connections and weights, that is, according to the values of the weight
matrix W.
Consider the vector product

x * w = x1w1 + x2w2 + . . . + xnwn

There is a geometric interpretation for this product. It is equivalent to projecting
one vector onto the other vector in n-dimensional space. This notion is depicted in
Figure 15.9 for the two-dimensional case.

Figure 15.8 A multilayer neural network (layer 1, layer 2, layer 3).

The magnitude of the resultant product is given by

x * w = |x| |w| cos θ

where |x| denotes the norm or length of the vector x. Note that this product is
maximum when both vectors point in the same direction, that is, when θ = 0. The
product is a minimum when both vectors point in opposite directions, or when θ = 180
degrees. This illustrates how the vectors in the weight matrix W influence the inputs
to the nodes in a neural network.

Figure 15.9 Vector multiplication is like vector projection.
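The projection interpretation can be checked numerically; the helper names dot and angle are our own.

```python
# Verify that x·w is largest for aligned vectors and smallest for opposed ones.
import math

def dot(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def angle(x, w):
    # Recover θ (in degrees) from x·w = |x||w|cos θ.
    norm = math.sqrt(dot(x, x)) * math.sqrt(dot(w, w))
    cos_t = max(-1.0, min(1.0, dot(x, w) / norm))   # clamp rounding error
    return math.degrees(math.acos(cos_t))

print(dot([1, 1], [2, 2]), round(angle([1, 1], [2, 2])))      # 4 0 (aligned)
print(dot([1, 1], [-2, -2]), round(angle([1, 1], [-2, -2])))  # -4 180 (opposed)
```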

Learning pattern weights. The interconnections and weights W in the


neural network store the knowledge possessed by the network. These weights must
be preset or learned in some manner. When learning is used, the process may be
either supervised or unsupervised. In the supervised case, learning is performed by
repeatedly presenting the network with an input pattern and a desired output response.
The training examples then consist of the vector pairs (x, y'), where x is the input
pattern and y' is the desired output response pattern. The weights are then adjusted
until the actual output response y matches the desired response y', that is,
until the difference D = y' - y is near zero.
One of the simpler supervised learning algorithms uses the following formula
to adjust the weights W.

Wnew = Wold + a * D * x / |x|^2

where 0 < a < 1 is a learning constant that determines the rate of learning. When
the difference D is large, the adjustment to the weights W is large, but when the
output response y is close to the target response y', the adjustment will be small.
When the difference D is near zero, the training process terminates, at which point
the network will produce the correct response for the given input patterns x.
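The update formula can be sketched as a short training loop. Note the sketch takes D = y' - y (desired minus actual), the sign convention under which the correction actually reduces the error; the learning constant and starting weights are arbitrary illustrations.

```python
# A sketch of the supervised weight-update rule Wnew = Wold + a*D*x/|x|^2.
def node_output(x, w):
    # Fire +1 if the weighted sum is positive, else -1.
    return 1 if sum(xi * wi for xi, wi in zip(x, w)) > 0 else -1

def train_step(x, w, y_desired, a=0.5):
    y = node_output(x, w)
    D = y_desired - y                     # assumed sign: desired minus actual
    norm2 = sum(xi * xi for xi in x)      # |x|^2
    return [wi + a * D * xi / norm2 for wi, xi in zip(w, x)]

w = [0.0, -1.0]
x, target = [1.0, 1.0], 1                 # initially the node answers -1
for _ in range(10):                       # repeat until the response is correct
    if node_output(x, w) == target:
        break
    w = train_step(x, w, target)

print(node_output(x, w))   # 1
```

After two corrective steps the weights reach values for which the node produces the target response, at which point D = 0 and training stops.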
In unsupervised learning, the training examples consist of the input vectors x
only. No desired response y' is available to guide the system. Instead, the learning
process must find the weights wij with no knowledge of the desired output response.
We leave the unsupervised learning description until a later chapter, where learning is
covered in somewhat more detail.

A neural net expert system. An example of a simple neural network


diagnostic expert system has been described by Stephen Gallant (1988). This system
diagnoses and recommends treatments for acute sarcophagal disease. The system is
illustrated in Figure 15.10. From six symptom variables u1, u2, . . . , u6, one of
two possible diseases can be diagnosed, u7 or u8. From the resultant diagnosis,
one of three treatments, u9, u10, or u11, can then be recommended.
When a given symptom is present, the corresponding variable is given a value

Figure 15.10 A simple neural network expert system. The inputs u1, . . . , u6 are the symptoms swollen feet, red ears, hair loss, dizziness, sensitive aretha, and placibin allergy. (From S. I. Gallant, ACM Communications, Vol. 31, No. 2, p. 152, 1988. By permission.)

of +1 (true). Negative symptoms are given an input value of -1 (false), and unknown
symptoms are given the value 0. Input symptom values are multiplied by their
corresponding weights w_i. Numbers within the nodes are initial bias weights w_0,
and numbers on the links are the other node input weights. When the sum of the
weighted products of the inputs exceeds 0, an output will be present on the correspond-
ing node output and serve as an input to the next layer of nodes.
As an example, suppose the patient has swollen feet (u1 = +1) but not red
ears (u2 = -1) nor hair loss (u3 = -1). This gives a value of u7 = +1 (since
0 + 2(+1) + (-2)(-1) + 3(-1) = 1), suggesting the patient has superciliosis.
When it is also known that the other symptoms of the patient are false (u4 = u5 =
u6 = -1), it may be concluded that namatosis is absent (u8 = -1), and
therefore that birambio (u10 = +1) should be prescribed while placibin should not
be prescribed (u9 = -1). In addition, it will be found that posiboost should also
be prescribed (u11 = +1).
The intermediate triangular shaped nodes were added by the training algorithm.
These additional nodes are needed so that weight assignments can be made which
permit the computations to work correctly for all training instances.
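The computation each node performs can be sketched as a simple linear threshold unit. The bias and link weights used below are the ones quoted in the worked example for the superciliosis node u7; the rest of the network's weights are not reproduced here, and the Python form is our own illustration.

```python
# One linear threshold unit from a Gallant-style network. Inputs are +1 (true),
# -1 (false), or 0 (unknown); the node fires (+1) when the bias plus the
# weighted input sum exceeds 0, and outputs -1 otherwise.

def threshold_node(bias, weights, inputs):
    total = bias + sum(w * u for w, u in zip(weights, inputs))
    return 1 if total > 0 else -1

# Worked example: swollen feet (+1), not red ears (-1), no hair loss (-1).
# Bias 0 and weights (2, -2, 3) are those quoted in the text for node u7.
u7 = threshold_node(0, [2, -2, 3], [1, -1, -1])       # 0 + 2 + 2 - 3 = 1 > 0
print(u7)  # +1: superciliosis is diagnosed

# Partial information: swollen feet and hair loss true, red ears unknown (0).
# The unknown input contributes nothing, so the sum stays positive regardless.
u7_partial = threshold_node(0, [2, -2, 3], [1, 0, 1])  # 0 + 2 + 0 + 3 = 5 > 0
```

The second call shows why deductions survive missing data: a 0 input simply drops out of the weighted sum.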

Deductions can be made just as well when only partial information is available.
For example, when a patient has swollen feet and suffers from hair loss, it may be
concluded the patient has superciliosis, regardless of whether or not the patient has
red ears. This is so because the unknown variable cannot force the sum to become
negative.
A system such as this can also explain how or why a conclusion was reached.
For example, when inputs and outputs are regarded as rules, an output can be
explained as the conclusion to a rule. If placibin is true, the system might explain
why with a statement such as

Placibin is TRUE due to the following rule:

IF Placibin Allergy (u6) is FALSE, and
   Superciliosis is TRUE
THEN Conclude Placibin is TRUE.

Expert systems based on neural network architectures can be designed to possess
many of the features of other expert system types, including explanations for how
and why, and confidence estimation for variable deduction.

15.4 DEALING WITH UNCERTAINTY

In Chapters 5 and 6 we discussed the problem of dealing with uncertainty in knowledge-
based systems. We found that different approaches were possible including probabilis-
tic (Bayesian or Shafer-Dempster), the use of fuzzy logic, ad-hoc, and heuristic
methods. All of these methods have been applied in some form of system, and
many building tools permit the use of more than a single method. The ad-hoc
method used in MYCIN was described in Section 6.5. Refer to that section to
review how uncertainty computations are performed in a typical expert system.

15.5 KNOWLEDGE ACQUISITION AND VALIDATION

One of the most difficult tasks in building knowledge-based systems is in the acquisi-
tion and encoding of the requisite domain knowledge. Knowledge for expert systems
must be derived from sources such as experts in the given field, journal articles,
texts, reports, data bases, and so on. Elicitation of the right knowledge can take
several person-years and cost hundreds of thousands of dollars. This process is now
recognized as one of the main bottlenecks in building expert and other knowledge-
based systems. Consequently, much effort has been devoted to more effective methods
of acquisition and coding.
Pulling together and correctly interpreting the right knowledge to solve a set
of complex tasks is an onerous job. Typically, experts do not know what specific
knowledge is being applied nor just how it is applied in the solution of a given
problem. Even if they do know, it is likely they are unable to articulate the problem
solving process well enough to capture the low-level knowledge used and the inferring
processes applied. This difficulty has led to the use of AI experts (called knowledge
engineers) who serve as intermediaries between the domain expert and the system.
The knowledge engineer elicits information from the experts and codes this knowledge
into a form suitable for use in the expert system.
The knowledge elicitation process is depicted in Figure 15.11. To elicit the
requisite knowledge, a knowledge engineer conducts extensive interviews with domain
experts. During the interviews, the expert is asked to solve typical problems in the
domain of interest and to explain his or her solutions.
Using the knowledge gained from experts and other sources, the knowledge
engineer codes the knowledge in the form of rules or some other representation
scheme. This knowledge is then used to solve sample problems for review and
validation by the experts. Errors and omissions are uncovered and corrected, and
additional knowledge is added as needed. The process is repeated until a sufficient
body of knowledge has been collected to solve a large class of problems in the
chosen domain. The whole process may take as many as tens of person years.
Penny Nii, an experienced knowledge engineer at Stanford University, has
described some useful practices to follow in solving acquisition problems through
a sequence of heuristics she uses. They have been summarized in the book The
Fifth Generation by Feigenbaum and McCorduck (1983) as follows.

You can't be your own expert. By examining the process of your own expertise you
risk becoming like the centipede who got tangled up in her own legs and stopped
dead when she tried to figure out how she moved a hundred legs in harmony.
From the beginning, the knowledge engineer must count on throwing efforts away.
Writers make drafts, painters make preliminary sketches; knowledge engineers are no
different.

The problem must be well chosen. AI is a young field and isn't ready to take on
every problem the world has to offer. Expert systems work best when the problem is
well bounded, which is computer talk to describe a problem for which large amounts
of specialized knowledge may be needed, but not a general knowledge of the world.
If you want to do any serious application you need to meet the expert more than half
way; if he's had no exposure to computing, your job will be that much harder.
If none of the tools you normally use works, build a new one.
Dealing with anything but facts implies uncertainty. Heuristic knowledge is not hard
and fast and cannot be treated as factual. A weighting procedure has to be built into

Figure 15.11 The knowledge acquisition process: the domain expert is interviewed
by the knowledge engineer, who enters the elicited knowledge through a knowledge
editor.

the expert system to allow for expressions such as "I strongly believe that . . ." or
"The evidence suggests that . . ."
A high-performance program, or a program that will eventually be taken over by the
expert for his own use, must have very easy ways of allowing the knowledge to be
modified so that new information can be added and out-of-date information deleted.
The problem needs to be a useful, interesting one. There are knowledge-based programs
to solve arcane puzzles, but who cares? More important, the user has to understand
the system's real value to his work.

When Nii begins a project, she first persuades a human expert to commit the
considerable time that is required to have the expert's mind mined. Once this is
done, she immerses herself in the given field, reading texts, articles, and other
material to better understand the field and to learn the basic jargon used. She then
begins the interviewing process. She asks the expert to describe his or her tasks
and problem solving techniques. She asks the expert to choose a moderately difficult
problem to solve as an example of the basic approach. This information is then
collected, studied, and presented to other members of the development team so
that a quick prototype can be constructed for the expert to review. This serves
several purposes. First, it helps to keep the expert in the development loop and
interested. Secondly, it serves as a rudimentary model with which to uncover flaws
and other problems. It also helps both expert and developer in discovering the real
way the expert solves problems. This usually leads to a repeat of the problem
solving exercise, but this time in a step-by-step walk through of the sample problem.
Nii tests the accuracy of the expert's explanations by observing his or her behavior
and reliance on data and other sources of information. She is concerned more with
the manipulation of the knowledge than with the actual facts. Keeping the expert
focused on the immediate problem requires continual prompting and encouragement.
During the whole interview process Nii is mentally examining alternative ap-
proaches for the best knowledge representation and inferencing methods to see how
well each would best match the expert's behavior. The whole process of elicitation,
coding, and verification may take several iterations over a period of several months.
Recognizing the acquisition bottleneck in building expert systems, researchers
and vendors alike have sought new and better ways to reduce the burden and reliance
placed on knowledge engineers, and in general, ways to improve and speed up the
development process. This has led to a number of sophisticated building tools which
we consider next.

15.6 KNOWLEDGE SYSTEM BUILDING TOOLS

Since the introduction of the first successful expert systems in the late 1970s, a
large number of building tools have been introduced, both by the academic community
and industry. These tools range from high-level programming languages to intelligent
editors to complete shell environment systems. A number of commercial products

are now available, ranging in price from a few hundred dollars to tens of thousands
of dollars. Some are capable of running on medium-size PCs while others require
larger systems such as LISP machines, minis, or even mainframes.
When evaluating building tools for expert system development, the developer
should consider the following features and capabilities that may be offered in systems.

1. Knowledge representation methods available (rules, logic-based network struc-
tures, frames with or without inheritance, multiple world views, object-oriented,
procedural, and the methods offered for dealing with uncertainty, if any).
2. Inference and control methods available (backward chaining, forward chaining,
mixed forward and backward chaining, blackboard architecture approach, logic-
driven theorem prover, object-oriented approach, the use of attached procedures,
types of meta control capabilities, forms of uncertainty management, hypotheti-
cal reasoning, truth maintenance management, pattern matching with or without
indexing, user functions permitted, linking options to modules written in other
languages, and linking options to other sources of information such as data
bases).
3. User interface characteristics (editor flexibility and ease of use, use of menus,
use of pop-up windows, developer-provided text capabilities for prompts and
help messages, graphics capabilities, consistency checking for newly entered
knowledge, explanation of how and why capabilities, system help facilities,
screen formatting and color selection capabilities, network representation of the
knowledge base, and forms of compilation available, batch or interactive).
4. General system characteristics and support available (types of applications
with which the system has been successfully used, the base programming
language in which the system was written, the types of hardware the systems
are supported on, general utilities available, debugging facilities, interfacing
flexibility to other languages and databases, vendor training availability and
cost, strength of software support, and company reputation).

In the remainder of this section, we describe a few representative building
tool systems. For a more complete picture of available systems, the reader is referred
to other sources.

Personal Consultant Plus

A family of Personal Consultant expert system shells was developed by Texas Instru-
ments, Inc. (TI) in the early 1980s. These shells are rule-based building tools patterned
after the MYCIN system architecture and developed to run on a PC as well as on
larger systems such as the TI Explorer. The largest and most versatile of the Personal
Consultant family is Personal Consultant Plus.
Personal Consultant Plus permits the use of structures called frames (different
from the frames of Chapter 7) to organize functionally related production rules
into subproblem groups. The frames are hierarchically linked into a tree structure
which is traversed during a user consultation. For example, a home electrical appliance
diagnostic system might have subframes of microwave cooker, iron, blender, and
toaster as depicted in Figure 15.12.
When diagnosing a problem, only the relevant frames would be matched,
and the rules in each frame would address only that part of the overall problem. A
feature of the frame structure is that parameter property inheritance is supported
from ancestor to descendant frames.
The system supports certainty factors both for parameter values and for complete
rules. These factors are propagated throughout a chain of rules used during a consulta-
tion session, and the resultant certainty values associated with a conclusion are
presented. The system also has an explanation capability to show how a conclusion
was reached by displaying all rules that lead to a conclusion and why a fact is
needed to fire a given rule.
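The propagation of certainty factors through a chain of rules follows the MYCIN-style scheme referred to in Section 6.5. A rough sketch of the usual combination rules for positive evidence follows; the exact formulas used in Personal Consultant Plus are not reproduced here, and the numbers are invented.

```python
# MYCIN-style certainty-factor bookkeeping, sketched for positive evidence only.
# A conjunctive premise takes the minimum CF of its conditions, a rule attenuates
# that by its own CF, and two rules supporting the same conclusion are merged so
# the combined CF grows toward, but never exceeds, 1.

def premise_cf(*cfs):
    return min(cfs)                        # weakest condition limits the premise

def conclusion_cf(rule_cf, prem_cf):
    return rule_cf * max(0.0, prem_cf)     # attenuate by the rule's own certainty

def combine(cf1, cf2):
    return cf1 + cf2 * (1.0 - cf1)         # merge independent positive evidence

cf_rule1 = conclusion_cf(0.8, premise_cf(0.9, 0.7))   # 0.8 * 0.7  = 0.56
cf_rule2 = conclusion_cf(0.5, 0.6)                    # 0.5 * 0.6  = 0.30
overall = combine(cf_rule1, cf_rule2)                 # 0.56 + 0.30 * 0.44 = 0.692
```

Chaining simply repeats this: the conclusion CF of one rule becomes a premise CF of the next, so certainty decays along long inference chains.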
An interactive dialog is conducted between user and system during a consultation
session. The system prompts the user with English statements provided by the devel-
oper. Menu windows with selectable colors provide the user parameter value selec-
tions. A help facility is also available so the developer can provide explanations
and give general assistance to the user during a consultation session.
A system editor provides a number of helpful facilities to build, store, and
print a knowledge base. Parameter and property values, textual prompts, help mes-
sages, and debug traces are all provided through the editor. In addition, user-defined
functions written in LISP may be provided by a developer as part of a rule's conditions
and/or actions. The system also provides access to other sources of information,
such as dBase II and dBase III. An optional graphics package is also available at
extra cost.

Figure 15.12 Hierarchical frame structure in PC Plus: an electrical appliance
root frame with subframes for microwave cooker, iron, food blender, and toaster,
and with mechanical and electrical subsystem subframes below.

Radian Rulemaster

The Rulemaster system developed in the early 1980s by the Radian Corporation
was written in C language to run on a variety of mini- and microcomputer systems.
Rulemaster is a rule-based building tool which consists of two main components:
Radial, a procedural, block structured language for expressing decision rules related
to a finite state machine, and Rulemaker, a knowledge acquisition system which
induces decision trees from examples supplied by an expert. A program in Rulemaster
consists of a collection of related modules which interact to affect changes of state.
The modules may contain executable procedures, advice, or data. The building
system is illustrated in Figure 15.13.
Rulemaster's knowledge can be based on partial certainty using fuzzy logic
or heuristic methods defined by the developer. Users can define their own data
types or abstract types much the same as in Pascal. An explanation facility is provided
to explain its chain of reasoning. Programs in other languages can also be called
from Rulemaster.
One of the unique features of Rulemaster is the Rulemaker component, which
has the ability to induce rules from examples. Experts are known to have difficulty
in directly expressing rules related to their decision processes. On the other hand,
they can usually come up with a wealth of examples in which they describe typical
solution steps. The examples provided by the expert offer a more accurate way in

Figure 15.13 Rulemaster building system: example files supplied by the user are
converted by Rulemaker and the Rulemaster utilities into an expert system of
hierarchical Radial rules, which can interface to external programs, process
equipment, and data bases.

which the problem solving process is carried out. These examples are transformed
into rules by Rulemaker through an induction process.
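The induction step can be sketched with a toy version of decision-tree learning from a table of examples. Rulemaker's actual algorithm and its Radial output are not reproduced; the attributes and data below are invented for illustration.

```python
# A minimal sketch of inducing a decision tree from expert-supplied examples,
# in the spirit of Rulemaker. Each example row is (attribute values..., class);
# the tree repeatedly splits on the attribute leaving the least residual entropy.
import math
from collections import Counter

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())

def induce(rows, attrs):
    classes = {r[-1] for r in rows}
    if len(classes) == 1:                  # pure node: emit a leaf (a conclusion)
        return classes.pop()
    if not attrs:                          # no attribute left: take majority class
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    def residual(a):                       # entropy remaining after splitting on a
        return sum(entropy([r for r in rows if r[a] == v]) *
                   sum(1 for r in rows if r[a] == v) / len(rows)
                   for v in {r[a] for r in rows})
    best = min(attrs, key=residual)
    return {best: {v: induce([r for r in rows if r[best] == v],
                             [a for a in attrs if a != best])
                   for v in {r[best] for r in rows}}}

# Invented (pressure, temperature) -> action examples, as an expert might tabulate:
examples = [("high", "hot", "shutdown"), ("high", "cold", "shutdown"),
            ("low", "hot", "inspect"), ("low", "cold", "run")]
tree = induce(examples, [0, 1])
# tree == {0: {"high": "shutdown", "low": {1: {"hot": "inspect", "cold": "run"}}}}
```

Each root-to-leaf path in the resulting tree reads directly as a rule, e.g. IF pressure is low AND temperature is hot THEN inspect, which is the sense in which induction converts examples into rules.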

KEE (Knowledge Engineering Environment)

KEE is one of the more popular building tools for the development of larger-scale
systems. Developed by IntelliCorp, this system employs sophisticated representation
schemes structured around frames called units. The frames are made up of slots
and facets which contain objects, attribute values, rules, methods, logical assertions,
text, or even other frames. The frames are organized into one or more knowledge
bases in the form of hierarchical structures which permit multiple inheritance down
hierarchical paths. Rules, procedures, and object oriented representation methods
are also supported.
Inference is carried out through inheritance, forward-chaining, backward-chain-
ing, or a mixture of these methods. A form of hypothetical reasoning is also provided
through different viewpoints which may be explored concurrently. The viewpoints
represent different aspects of a situation, views of the same situation taken at different
times, hypothetical situations, or alternative courses of action. This feature permits
a user to compare competing courses of action or to reason in parallel about partial
solutions based on different approaches.
KEE's support environment includes a graphics-oriented debugging package,
flexible end-user interfaces using windows, menus, and an explanation capability
with graphic displays which can show inference chains. A graphics-based simulation
package called SimKit is available at additional cost.
KEE has been used for the development of intelligent user interfaces, genetics,
diagnosis and monitoring of complicated systems, planning, design, process control,
scheduling, and simulation. The system is LISP based, developed for operation on
systems such as Symbolics machines, Xerox 1100s, or TI Explorers. Systems can
also be ported to open architecture machines which support Common LISP without
extensive modification.
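The unit-and-slot idea behind KEE can be sketched in a few lines. This is a toy illustration only; KEE's real units also carry facets, attached methods, and rules, and all names below are invented.

```python
# Frames ("units") with named slots and inheritance down parent links.
# A slot lookup tries the frame itself, then searches its ancestors, so values
# defined high in the hierarchy are shared by all descendants -- including
# multiple inheritance when a frame has more than one parent.

class Frame:
    def __init__(self, name, parents=(), **slots):
        self.name = name
        self.parents = list(parents)
        self.slots = slots

    def get(self, slot):
        if slot in self.slots:
            return self.slots[slot]
        for parent in self.parents:        # depth-first over multiple parents
            value = parent.get(slot)
            if value is not None:
                return value
        return None

appliance = Frame("appliance", power_source="mains")
portable = Frame("portable", weight="light")
toaster = Frame("toaster", parents=[appliance, portable])

print(toaster.get("power_source"))   # inherited from appliance
print(toaster.get("weight"))         # inherited from portable
```

The lookup order over parents is one of the design choices a real frame system must pin down; KEE resolves such conflicts with its own precedence rules rather than the simple depth-first search shown here.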

OPS5 System

The OPS5 and other OPS building tools were developed at Carnegie Mellon University
in conjunction with DEC during the late 1970s. This system was developed to
build the R1/XCON expert system which configures Vax and other DEC minicomputer
systems. The system is used to build rule-based production systems which use forward
chaining in the inference process (backward and mixed chaining is also possible).
The system was written in C language to run on the DEC Vax and other minicomputers.
It uses a sophisticated method of indexing rules (the Rete algorithm) to reduce the
matching times during the match-select-execute cycle. Examples of OPS5 rules
were given above in Section 15.2, and a description of the Rete match algorithm
was given in Section 10.6.
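The match-select-execute cycle itself can be sketched naively, without the Rete optimization. The rule contents and the crude fire-once selection strategy below are invented for illustration.

```python
# A toy forward-chaining production system: each cycle MATCHES rules whose
# condition sets are satisfied by working memory, SELECTS one (here: the first
# not yet fired, a crude refraction strategy), and EXECUTES it by asserting its
# conclusion. Rete's contribution is avoiding this full re-match every cycle.

def run(facts, rules):
    facts = set(facts)
    fired = set()
    while True:
        eligible = [(i, concl) for i, (conds, concl) in enumerate(rules)
                    if conds <= facts and i not in fired]
        if not eligible:
            return facts                   # quiescence: no rule can fire
        i, conclusion = eligible[0]
        fired.add(i)
        facts.add(conclusion)

rules = [
    ({"power-on", "no-display"}, "check-video"),
    ({"check-video", "cable-loose"}, "reseat-cable"),
]
result = run({"power-on", "no-display", "cable-loose"}, rules)
```

Note how the second rule only becomes eligible after the first fires and asserts "check-video": that cascade is forward chaining.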


15.7 SUMMARY

Expert and other knowledge-based systems are usually composed of at least a knowl-
edge base, an inference engine, and some form of user interface. The knowledge
base, which is separate from the inference and control components, contains the
expert knowledge coded in some form such as production rules, networks of frames
or other representation scheme. The inference engine manipulates the knowledge
structures in the knowledge base to perform a type of symbolic reasoning and draw
useful conclusions relating to the current task. The user interface provides the means
for dialog between the user and system. The user inputs commands, queries, and
responses to system messages, and the system, in turn, produces various messages
for the user. In addition to these three components, most systems have an editor
for use in creating and modifying the knowledge base structures and an explanation
module which provides the user with explanations of how a conclusion was reached
or why a piece of knowledge is needed. A few systems also have some learning
capability and a case history file with which to record past consultations.
A variety of expert system architectures have been constructed, including rule-
based systems, frame-based systems, decision tree (discrimination network) systems,
analogical reasoning systems, blackboard architectures, theorem proving systems,
and even neural network architectures. These systems may differ in the direction
of rule chaining, in the handling of uncertainty, and in the search and pattern matching
methods employed. Rule and frame based systems are by far the most popular
architectures used.
Since the introduction of the first expert systems in the late 1970s, a number
of building tools have been developed. Such tools may be as unsophisticated as a
bare high level language or as comprehensive as a complete shell development
environment. A few representative building tools have been described and some
general characteristics of tools for developers were given.
The acquisition of expert knowledge for knowledge-based systems remains
one of the main bottlenecks in building them. This has led to a new discipline
called knowledge engineering. Knowledge engineers build systems by eliciting knowl-
edge from experts, coding that knowledge in an appropriate form, validating the
knowledge, and ultimately constructing a system using a variety of building tools.

EXERCISES

15.1. What are the main advantages in keeping the knowledge base separate from the
control module in knowledge-based systems?
15.2. Why is it important that an expert system be able to explain the why and how
questions related to a problem solving session?
15.3. Give an example of the use of metaknowledge in expert systems inference.

15.4. Describe and compare the different types of problems solved by four of the earliest
expert systems: DENDRAL, MYCIN, PROSPECTOR, and R1.
15.5. Identify and describe two good application areas for expert systems within a university
environment.
15.6. How do rules in PROLOG differ from general production system rules?
15.7. Make up a small knowledge-base of facts and rules using the same syntax as that
used in Figure 15.2 except that they should relate to an office working environment.
15.8. Name four different types of selection criteria that might be used to select the most
relevant rules for firing in a production system.
15.9. Describe a method in which rules could be grouped or organized in a knowledge
base to reduce the amount of search required during the matching part of the inference
cycle.
15.10. Using the knowledge base of Problem 15.7, simulate three match-select-execute cycles
for a query which uses several rules and/or facts.
15.11. Explain the difference between forward and backward chaining and under what condi-
tions each would be best to use for a given set of problems.
15.12. Under what conditions would it make sense to use both forward and backward
chaining? Give an example where both are used.
15.13. Explain why you think associative networks were never very popular forms of knowl-
edge representations in expert systems architectures.
15.14. Suppose you are diagnosing automobile engines using a system having a frame type
of architecture similar to PIP. Show how a trigger condition might be satisfied for
the distributor ignition system when it is learned that the spark at all spark plugs is
weak.
15.15. Give the advantages of expert system architectures based on decision trees over
those of production rules. What are the main disadvantages?
15.16. Two of the main problems in validating the knowledge contained in the knowledge
bases of expert systems are related to completeness and consistency, that is, whether
or not a system has an adequate breadth of knowledge to solve the class of problems
it was intended to solve and whether or not the knowledge is consistent. Is it easier
to check decision tree architectures or production rule systems for completeness and
consistency? Give supporting information for your conclusions.
15.17. Give three examples of applications for which blackboard architectures are well suited.
15.18. Give three examples of applications for which the use of analogical architectures
would be suitable in expert systems.
15.19. Consider a simple fully connected neural network containing three input nodes and
a single output node. The inputs to the network are the eight possible binary patterns
000, 001, . . . , 111. Find weights w for which the network can differentiate between
the inputs by producing three distinct outputs.
15.20. For the preceding problem, draw projection vectors on the unit circle for the eight
different inputs using the weights determined there.
15.21. Explain how uncertainty is propagated through a chain of rules during a consultation
with an expert system which is based on the MYCIN architecture.
15.22. Select a problem domain that requires some special expertise and consult with an
expert in the domain to learn how he or she solves typical problems. After collecting
enough knowledge to solve a small subset of problems, create rules which could be
used in a knowledge base to solve the problems. Test the use of the rules on a few
problems which have been suggested by the expert and then get his or her confirmation.
15.23. Relate each of the heuristics given by Penny Nii in Section 15.5 to a real expert
system solving problem.
15.24. Discuss how each of the features of expert system building tools given in Section
15.6 can affect the performance of the systems developed.
15.25. Obtain a copy of an expert system building tool such as Personal Consultant Plus
and create an expert system to diagnose automobile engine problems. Consult with
a mechanic to see if your completed system is reasonably good.
