Natural Language Processing: Perception, Communication, and Expert Systems

12
Natural Language Processing
12.1 INTRODUCTION
Prepositions show the relationship between a noun and some other part of the sentence. Conjunctions join words or groups of words together, and interjections are used to express strong feelings apart from the rest of the sentence.
Phrases are made up of words but act as a single unit within a sentence.
These form the building blocks for the syntactic structures we consider later.
Syntactic. This knowledge relates to how words are put together or structured to form grammatically correct sentences in the language.
World. World knowledge relates to the general knowledge of the world that a user must have in order to understand and carry on a conversation. It must include an understanding of the other person's beliefs and goals.
The approaches taken in developing language understanding programs generally
follow the above levels or stages. When a string of words has been detected, the
sentences are parsed or analyzed to determine their structure (syntax) and grammatical
correctness. The meanings (semantics) of the sentences are then determined and
appropriate representation structures created for the inferencing programs. The whole
process is a series of transformations from the basic speech sounds to a complete
set of internal representation structures.
Understanding written language or text is easier than understanding speech.
To understand speech, a program must have all the capabilities of a text understanding
program plus the facilities needed to map spoken sounds (often corrupted by noise)
into textual form. In this chapter, we focus on the easier problem, that of natural
language understanding from textual input and information processing. The process
of translating speech into written text is considered in Chapter 13 under Pattern Recognition, and the process of generating text is considered later in this chapter.
Essentially, three different approaches have been taken in the development of natural language understanding programs: (1) the use of keyword and pattern matching, (2) combined syntactic (structural) and semantic directed analysis, and (3) comparing and matching the input to real world situations (scenario representations).
The keyword and pattern matching approach is the simplest. This approach
was first used in programs such as ELIZA described in Chapter 10. It is based on
the use of sentence templates which contain key words or phrases such as "... my mother ...," "I am ...," and "I don't like ...," that are matched against input sentences. Each input template has associated with it one or more output templates, one of which is used to produce a response to the given input. Appropriate word substitutions are also made from the input to the output to produce the correct person and tense in the response (I and me into you, to give replies like "Why are you ... ?"). The advantage of this approach is
that ungrammatical, but meaningful, sentences are still accepted. The disadvantage is that no actual knowledge structures are created, so the program does not really
understand.
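As an illustration only, the template idea can be sketched in PROLOG roughly as follows. The particular templates, the predicate names reply_for and substitute, and the substitution table are our own illustrative assumptions, not ELIZA's actual rules; input words are assumed to be lowercase atoms with punctuation stripped.

% Sketch of keyword/template matching.  A template pairs an input pattern
% with a reply pattern; the tail variable X captures whatever fills the
% blank, and the captured words are echoed back after person substitution.
reply_for([i, am | X], [why, are, you | Y]) :- substitute(X, Y).
reply_for([i, dont, like | X], [why, dont, you, like | Y]) :- substitute(X, Y).
reply_for([my, mother | _], [tell, me, more, about, your, family]).
reply_for(_, [please, go, on]).                 % default response

% Person substitution from input to output (I/me -> you, my -> your, etc.).
substitute([], []).
substitute([W | T], [W2 | T2]) :- swap(W, W2), substitute(T, T2).

swap(i, you).
swap(me, you).
swap(my, your).
swap(you, me).
swap(W, W) :- \+ member(W, [i, me, my, you]).   % member/2 from the list library

A query such as ?- reply_for([i, am, afraid, of, my, boss], R). binds R to [why, are, you, afraid, of, your, boss]. Note that, as the text observes, no knowledge structure of any kind is built along the way.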
The third approach is based on the use of structures such as the frames or
scripts described in Chapter 7. This approach relies more on a mapping of the
input to prescribed primitives which are used to build larger knowledge structures.
It depends on the use of constraints imposed by context and world knowledge to
develop an understanding of the language inputs. Prestored descriptions and details
for commonly occurring situations or events are recalled for use in understanding a
new situation. The stored events are then used to fill in missing details about the
current scenario. We will be returning to this approach later in this chapter. Its
advantage is that much of the computation required for syntactical analysis is bypassed.
The disadvantage is that a substantial amount of specific, as well as general world
knowledge must be prestored.
The second approach is one of the most popular approaches currently being used and is the main topic of the first part of this chapter. With this approach,
knowledge structures are constructed during a syntactical and semantical analysis
of the input sentences. Parsers are used to analyze individual sentences and to
build structures that can be used directly or transformed into the required knowledge
formats. The advantage of this approach is in the power and versatility it provides.
The disadvantage is the large amount of computation required and the need for
still further processing to understand the contextual meanings of more than one
sentence.
12.3 GRAMMARS AND LANGUAGES

A grammar can be defined formally as

G = (Vn, Vt, S, P)

where Vn is a set of nonterminal symbols, Vt a set of terminal symbols, S the starting symbol, and P a set of productions or rewrite rules of the general form

xvz → xwz

where x, z, v, and w are strings over the vocabulary. This rule states that v should be rewritten as w in the context of x to z, where x and z can be any string, including the empty string e.
As an example of a simple grammar G, we choose one which has component parts or constituents from English, with productions P given by

P:  S → NP VP
    NP → ART N
    VP → V NP
    N → boy | popsicle | frog
    V → ate | kissed | flew
    ART → the | a

where the vertical bar indicates alternative choices.
S is the initial symbol (for sentence here), NP stands for noun phrase, VP stands for verb phrase, N stands for noun, V is an abbreviation for verb, and ART stands for article.
The grammar G defined above generates only a small fraction of English, but it illustrates the general concepts of generative grammars. With this G, sentences such as the following can be generated.
S → NP VP
  → ART N VP
  → the N VP
  → the boy VP
  → the boy V NP
  → the boy ate NP
  → the boy ate ART N
  → the boy ate a N
  → the boy ate a popsicle
It should be clear that a grammar does not guarantee the generation of meaningful sentences, only that they are structurally correct. For example, a grammatically correct, but meaningless, sentence like "The popsicle flew a frog" can be generated with this grammar.
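For readers who want to experiment, grammar G can be written almost verbatim in PROLOG's definite clause grammar (DCG) notation; the sketch below is our own rendering, not part of the grammar formalism itself.

% Grammar G in DCG notation.
s   --> np, vp.
np  --> art, n.
vp  --> v, np.
n   --> [boy].      n --> [popsicle].    n --> [frog].
v   --> [ate].      v --> [kissed].      v --> [flew].
art --> [the].      art --> [a].

% phrase/2 both recognizes and generates:
%   ?- phrase(s, [the, boy, ate, a, popsicle]).   succeeds
%   ?- phrase(s, Words).                          enumerates sentences,
%      including grammatical but meaningless ones such as
%      [the, popsicle, flew, a, frog].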
We learn a language by learning its structure and not by memorizing all of
the sentences we have ever heard, and we are able to use the language in a variety
of ways because of this familiarity. Therefore, a useful model of language is one
which characterizes the permissible structures through the generating grammars.
Unfortunately, it has not been possible to formally characterize natural languages
with a simple grammar. In other words, it has not been possible to classify natural
languages in a mathematical sense as we did in the example above. More constrained
languages (formal programming languages) have been classified and studied through the use of similar grammars, including the Chomsky classes of languages (1965).
Structural Representations
[Figure: a phrase-marker tree with NP and VP constituents, the NP consisting of ART and N.]
A more extensive English grammar than the one given above can be obtained with the addition of other constituents such as prepositional phrases PP, adjectives ADJ, determiners DET, adverbs ADV, auxiliary verbs AUX, and so on. Additional rewrite rules permitting the use of these constituents could include some of the following:

PP → PREP NP
VP → V ADV
VP → V PP
VP → V NP PP
VP → AUX V NP
DET → ART ADJ
DET → ART
These extensions broaden the types of sentences that can be generated by permitting the added constituents in sentence forms such as noun phrases containing determiners and adjectives, and verb phrases containing adverbs, auxiliary verbs, and prepositional phrases.
Transformational Grammars

Figure 12.2 Structures for (a) active and (b) passive voice.
In a transformational grammar, a sentence is first analyzed into its grammatical constituent parts. This reveals the surface structure of the sentence, the way the sentence is used in speech or in writing. This structure can be transformed into another one where the deeper semantic structure of the sentence is determined.
Application of the transformation rules can produce a change from passive voice to active voice, change a question to declarative form, and handle negations, subject-verb agreement, and so on. For example, the structure in 12.2(b) could be transformed to give the same basic structure as that of 12.2(a), as is illustrated in Figure 12.3.
Transformational grammars were never widely adopted as computational models of natural language. Instead, other grammars, including case grammars, have had more influence on such models.
Case Grammars
A case relates to the semantic role that a noun phrase plays with respect to verbs
and adjectives. Case grammars use the functional relationships between noun phrases
and verbs to reveal the deeper case of a sentence. These grammars use the fact
Figure 12.3 Passive voice transformed to active voice.
that verbal elements provide the main source of structure in a sentence since they
describe the subject and objects.
In inflected languages like Latin, nouns generally have different ending forms for different cases. In English these distinctions are less pronounced and the forms remain more constant for different cases. Even so, they provide some constraints. English cases are the nominative (subject of the verb), possessive (showing possession or ownership), and objective (direct and indirect objects). Fillmore (1968, 1977) revived the notion of using case to extract the meanings of sentences. He extended the transformational grammars of Chomsky by focusing more on the semantic aspects of a sentence.
In case grammars, a sentence is defined as being composed of a proposition P, a tenseless set of relationships among verbs and noun phrases, and a modality constituent M, composed of mood, tense, aspect, negation, and so on. Thus, a sentence can be represented as

S → M + P
P → C1 + C2 + ... + Ck

where the Ci are the individual cases.
The number of cases suggested by Fillmore was relatively few. For example, the original list contained only some six cases. They relate to the actions performed by agents, the location and direction of actions, and so on. For example, the case of an instigator of an action is the agentive (or agent), the case of an instrument or object used in an action is the instrumental, and the case of the object receiving the action or change is the objective. Thus, in sentences like "The soldier struck the suspect with the rifle butt," the soldier is the agentive case, the suspect the objective case, and the rifle butt the instrumental case. Other basic cases include the dative (an animate entity affected by an action), the factitive (the case of the object or state of being that results from an event), and the locative (the case of the location of the event). Additional cases or substitutes for those given above have since been introduced, including beneficiary, source, destination, to or from, goal, and time.
Case frames are provided for verbs to identify allowable cases. They give the relationships which are required and those which are optional. For the above sentence, a case frame for the verb struck might be

struck [OBJECTIVE (AGENTIVE) (INSTRUMENTAL)]
This may be interpreted as stating that the verb struck must occur in sentences
with a noun phrase in the objective case and optionally (parentheses indicate optional
use) with noun phrases in the agentive and instrumental cases.
A tree representation for a case grammar will identify the words by their modality and case. For example, a case grammar tree for the sentence "Sue did not take the car" is illustrated in Figure 12.4.
Figure 12.4 Case grammar tree representation (modality: declarative, negation, past; verb: take; C1: Sue; C2: the car).
To build a tree structure like this requires that a word lexicon with sufficient information be available from which to determine the case of sentence elements.
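A case frame such as the one for struck can be represented directly as data. The sketch below uses PROLOG facts and a simple completeness check; the predicate names (case_frame, frame_ok) and the Case-Filler pair representation are our own assumptions.

% case_frame(Verb, RequiredCases, OptionalCases)
case_frame(struck, [objective], [agentive, instrumental]).

% A parsed clause is taken to be a list of Case-Filler pairs.  frame_ok/2
% holds if every required case is present and no case outside the frame is used.
frame_ok(Verb, CaseFillers) :-
    case_frame(Verb, Required, Optional),
    cases_of(CaseFillers, Cases),
    forall(member(R, Required), member(R, Cases)),
    forall(member(C, Cases), (member(C, Required) ; member(C, Optional))).

cases_of([], []).
cases_of([Case-_ | Rest], [Case | Cases]) :- cases_of(Rest, Cases).

% ?- frame_ok(struck, [agentive-soldier, objective-suspect, instrumental-rifle_butt]).
% succeeds; dropping the objective pair makes the call fail.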
Systemic Grammars
Classification of units. Units are classified by the role they play at the
next higher level. For example, the verbal serves as the predicate, the nominal
serves as the subject or complement, and so on.
[System network: a clause is either independent or dependent; an independent clause may be declarative, imperative, or interrogative; an interrogative clause may be yes-no or wh-.]

Semantic Grammars
Semantic grammars encode semantic information into a syntactic grammar. They use context-free rewrite rules with nonterminal semantic constituents. The constituents are categories, or metasymbols, such as attribute, object, present (as in display), and ship, rather than NP, VP, N, V, and so on. This approach greatly restricts the range of sentences which can be generated and requires a large number of rewrite rules.
Semantic grammars have proven to be successful in limited applications, including LIFER, a data base query system distributed by the Navy which is accessible through ARPANET (Hendrix et al., 1978), and a tutorial system named SOPHIE which is used to teach the debugging of circuit faults. Rewrite rules in these systems essentially take the form of patterns built from such semantic categories.
In the LIFER system, there are rules to handle numerous forms of wh-queries
such as
What is the name and location of the carrier nearest to New York
Who commands the Kennedy
A request such as "Print the length of the Enterprise" matches a pattern of the form <PRESENT> the <ATTRIBUTE> of <SHIP>, where print matches <PRESENT>, length matches <ATTRIBUTE>, and the Enterprise matches <SHIP>. Other typical lexicon entries that can match <ATTRIBUTE> include CLASS, COMMANDER, FUEL, TYPE, BEAM, LENGTH, and so on.
LIFER can also accommodate elliptical (incomplete) inputs. Given the query "What is the length of the Kennedy?" a subsequent query consisting of the abbreviated form "of the Enterprise?" will elicit a proper response (see also the third and fourth example queries above).
Semantic grammars are suitable for use in systems with restricted grammars
since computation is limited. They become unwieldy when used with general purpose
language understanding systems, however.
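The flavor of such a grammar can be sketched in DCG form as follows. The categories, the tiny lexicon, and the retrieve/2 output term are illustrative assumptions of our own and are not LIFER's actual rules.

% A sketch of a semantic grammar: the nonterminals are semantic categories
% (present, attribute, ship) rather than NP, VP, and so on, and the extra
% argument assembles a simple retrieval command.
query(retrieve(Attr, Ship)) -->
    present, opt_the, attribute(Attr), [of], ship(Ship).

present --> [print].
present --> [display].
present --> [what, is].
opt_the --> [the].
opt_the --> [].

attribute(length)    --> [length].
attribute(beam)      --> [beam].
attribute(commander) --> [commander].

ship(enterprise) --> [the, enterprise].
ship(kennedy)    --> [the, kennedy].

% ?- phrase(query(Q), [what, is, the, length, of, the, enterprise]).
% Q = retrieve(length, enterprise).

Every new attribute or sentence form needs its own rule, which is one reason the approach becomes unwieldy outside a narrow domain.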
12.4 BASIC PARSING TECHNIQUES

Before the meaning of a sentence can be determined, the meanings of its constituent parts must be established. This requires a knowledge of the structure of the sentence, the meanings of individual words, and how the words modify each other. The process of determining the syntactical structure of a sentence is known as parsing.
Parsing is the process of analyzing a sentence by taking it apart word-by-word and determining its structure from its constituent parts and subparts. The structure of a sentence can be represented with a syntactic tree or a list as described in the previous section. The parsing process is basically the inverse of the sentence generation process since it involves finding a grammatical sentence structure from an input string. When given an input string, the lexical parts or terms (root words) must first be identified by type, and then the role they play in a sentence must be determined. These parts can then be combined successively into larger units until a complete tree structure has been completed.
To determine the meaning of a word, a parser must have access to a lexicon. When the parser selects a word from the input stream, it locates the word in the lexicon and obtains the word's possible functions and other features, including semantic information. This information is then used in building a tree or other representation structure. The general parsing process is illustrated in Figure 12.5.
Figure 12.5 Parsing an input string, with the aid of a lexicon, to create an output representation structure.
The Lexicon
Figure 12.6 Typical entries in a lexicon (for example: orange, adjective or noun; the, determiner; to, preposition; we, pronoun, subjective case; yellow, adjective).
A lexicon may be organized by word categories (nouns, verbs, adjectives, and so on), with all words contained within the lexicon listed within the categories to which they belong.
The organization and entries of a lexicon will vary from one implementation to another, but they are usually made up of variable length data structures such as lists or records arranged in alphabetical order. The word order may also be given in terms of usage frequency so that frequently used words like a, the, and an will appear at the beginning of the list, facilitating the search.
Access to the words may be facilitated by indexing, with binary searches, hashing, or combinations of these methods. A lexicon may also be partitioned to contain a base lexicon set of general, frequently used words and domain-specific components of words.
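One simple realization of such a lexicon, sketched here with illustrative feature names of our own, is a set of PROLOG facts; a word like orange simply gets one entry per category.

% lex(Word, Category, Features) -- entries in the spirit of Figure 12.6.
lex(orange, adjective,   []).
lex(orange, noun,        [number(singular)]).
lex(the,    determiner,  []).
lex(to,     preposition, []).
lex(we,     pronoun,     [case(subjective), number(plural)]).
lex(yellow, adjective,   []).
lex(barked, verb,        [tense(past)]).

% ?- lex(orange, Category, Features).
% enumerates both readings of the word.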
Transition Networks
Transition networks are another popular method used to represent formal and natural language structures. They are based on the application of directed graphs (digraphs) and finite state automata. A transition network consists of a number of nodes and labeled arcs. The nodes represent different states in traversing a sentence, and the arcs represent rules or test conditions required to make the transition from one state to the next. A path through a transition network corresponds to a permissible sequence of word types for a given grammar. Thus, if a transition network can be successfully traversed, it will have recognized a permissible sentence structure. For example, a network used to recognize a sentence consisting of a determiner, a noun, and a verb ("The child runs") would be represented by the three-node graph as follows.
[Figure: a small transition network with arcs labeled determiner, noun, and verb.]

Figure 12.7 A noun phrase segment of a transition network (arcs labeled determiner, adjective, pronoun, noun, and proper noun).
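A sketch of the determiner-noun-verb network, with arc/3 facts and node names of our own choosing, shows how little machinery a traversal needs.

% arc(FromNode, WordCategory, ToNode); final/1 marks accepting nodes.
arc(n1, determiner, n2).
arc(n2, noun,       n3).
arc(n3, verb,       n4).
final(n4).

cat(the,   determiner).
cat(child, noun).
cat(runs,  verb).

% traverse(+Node, +Words): consume the words along matching arcs.
traverse(Node, [])            :- final(Node).
traverse(Node, [Word | Rest]) :-
    arc(Node, Category, Next),
    cat(Word, Category),
    traverse(Next, Rest).

% ?- traverse(n1, [the, child, runs]).   succeeds
% ?- traverse(n1, [the, runs, child]).   fails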
Words in the input sentence are replaced with their syntactic categories and those
in turn are replaced by constituents of the same or smaller size until S has been
rewritten or until failure occurs.
The reader may have noticed the close similarity between rewrite rules and Horn clauses, especially when the Horn clauses are written in the form of PROLOG rules.
[Figure: a sentence-level transition network with nodes N1 through N7 and arcs labeled article, adjective, noun, verb, and aux verb.]
For example, the rewrite rule S → NP VP can be written as the PROLOG rule

sentence(A,C) :- nounPhrase(A,B), verbPhrase(B,C).

The variables A, B, and C in this statement represent lists of words. The argument A is the whole list of words to be tested as a sentence, and C is the list of remaining words, if any. Similar assumptions hold for A, B, and C in the noun and verb phrase conditions respectively.
Rule definitions which rewrite the noun phrases and verb phrases must also be defined. Thus, an NP may be defined with statements such as the following:

nounPhrase(A,C) :- article(A,B), noun(B,C).
nounPhrase(A,B) :- noun(A,B).

Like the above rule, these rules state that (1) a noun phrase can be either an article which consists of a list A and remaining list B (if any) and a noun which is a list B and remaining list C, or (2) a noun consisting of the list A with remaining list B (if any). Similarly, a verb phrase may be defined with rules like the following:
Natural Language Processing Chap. 12
246
verbPhrase(A,B) :- verb(A,B).
verbPhrase(A,C) :- verb(A,B), nounPhrase(B,C).
verbPhrase(A,C) :- verb(A,B), prepositionPhrase(B,C).
Definitions for the prepositional phrase as well as lexical terminals must also be given. These can include the following:

prepositionPhrase(A,C) :- preposition(A,B), nounPhrase(B,C).
preposition([at|X],X).
article([a|X],X).
article([the|X],X).
noun([dog|X],X).
noun([cow|X],X).
noun([moon|X],X).
verb([barked|X],X).
verb([winked|X],X).
With this simple parser we can determine if strings such as "the dog barked at the cow" are grammatically correct. To do so, we must enter sentence queries as lists such as the following to the PROLOG interpreter:

?- sentence([the,dog,barked,at,the,cow],X).
X = []

?- sentence([barked,a,moon,dog,the],X).
no
Since the remainder of the sentence bound to X is the empty list, the first sentence is recognized as correct. The second sentence failed since it could not instantiate with the correct constituent parts.
Of course, for a parser to be of much practical use, other constituents, and a great many more words, should be defined. The example illustrates the utility of using PROLOG as a basic parser.
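It is worth noting that this difference-list style is exactly what PROLOG's built-in definite clause grammar (DCG) notation produces: each DCG clause expands into a clause with two extra list arguments, so that sentence --> nounPhrase, verbPhrase is translated into sentence(A,C) :- nounPhrase(A,B), verbPhrase(B,C). The equivalent DCG version of the parser above (a sketch meant to be loaded on its own, not together with the clauses above) is

sentence           --> nounPhrase, verbPhrase.
nounPhrase         --> article, noun.
nounPhrase         --> noun.
verbPhrase         --> verb.
verbPhrase         --> verb, nounPhrase.
verbPhrase         --> verb, prepositionPhrase.
prepositionPhrase  --> preposition, nounPhrase.

article --> [a].        article --> [the].
noun    --> [dog].      noun    --> [cow].     noun --> [moon].
verb    --> [barked].   verb    --> [winked].
preposition --> [at].

% ?- phrase(sentence, [the, dog, barked, at, the, cow]).
% succeeds, just as the explicit difference-list version does.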
The simple networks described above are not powerful enough to recognize the
variety of sentences a human language system could be expected to cope with. In
fact, they fail to recognize all languages that can be generated by a context-free
grammar. Other extensions are needed to accept a wider range of sentences but
still avoid the necessity for large complex networks. We can achieve such extensions
by labeling some arcs as a separate network state (such as an NP) and then constructing a subnetwork which recognizes the different noun phrases required. In this way, a single subnetwork for an NP can be called from several places in a sentence. Similar arcs can be labeled for other sentence constituents including VP, PP (prepositional phrases), and others. With these additions, complex sentences having embedded phrases can be parsed with relatively simple networks. This leads directly to the notion of using recursion in a network.
A recursive transition network (RTN) is a transition network which permits arc labels to refer to other networks (including the network's own name), and they in turn may refer back to the referring network, rather than just permitting the word categories used previously. For example, an RTN described by William Woods (1970) is illustrated in Figure 12.9, where the main network calls two subnetworks, an NP and a PP network, as illustrated in 12.9(b) and (c).
The top network in the figure is the top level (sentence) network, and the
lower level networks are for NP and PP arc states. The arcs corresponding to these
states will be traversed only if the corresponding subnetworks (b) or (c) are successfully
traversed.
Figure 12.9 A recursive transition network: (a) sentence network, (b) noun phrase network, (c) prepositional phrase network.

[Table omitted: types of arcs, the purpose of each arc type, and an example of each.]
Starting with CND set to S1, POS set to 1, and RLIST set to nil, the first arc test (NP) would be completed. Since this test is for a state, the parser would PUSH the return node S2 onto RLIST, set CND to N1, and call the NP network. Trying the first test DET (a CAT test) in the NP network, a match would be found with word position 1. This would result in CND being updated to N2 and POS to position 2. The next word (big) satisfies the ADJ test, causing CND to be updated to N2 again, and POS to be updated to position 3. The ADJ test is then repeated for the word tree, but it fails. Hence, the arc test for N is made next with no change made to POS and CND. This time the test succeeds, resulting in updates of N4 to CND and position 4 to POS. The next test is the POP, which signals a successful completion of the NP network and causes the return node (S2) to be retrieved from the RLIST stack and CND to be updated with S2. POP does not cause an advance in the word position POS.
The only possible test from S2 is for category V, which succeeds on the word "shades" with resultant updates of S5 to CND and 5 to POS. At S5, the only possible test is the NP. This again invokes a call to the lower level NP network, which is traversed successfully with the noun phrase "the old house." After a return to the main network, CND is set to S6 and POS is set to position 8. At this point, the lower PP network is called with CND being set to P1 and S6 pushed onto RLIST. From P1, the CAT test for PREP passes with CND being set to P2 and POS being set to 9. NP is then called with CND being set to N1 and P3 being pushed onto RLIST. As before, the NP network is traversed with the noun phrase "the stream," resulting in a POS value of 11, P3 being popped from RLIST, and a return to that node. The test at P3 (POP) results in S6 being popped from RLIST and a return to the S6 node. Finally, the POP test at S6, together with the period at position 11, results in a successful traversal and acceptance of the sentence.
During a network traversal, a parse can fail if (1) the end of the input sentence (a period) has been reached when the test from the CND node value is not a terminal (POP) value, or (2) if a word in the input sentence fails to satisfy any of the available arc tests from some node in the network.
The number of sentences accepted by an RTN can be extended if backtracking is permitted when a failure occurs. This requires that states having alternative transitions be remembered until the parse progresses past possible failure points. In this way, if a failure occurs at some point, the interpreter can backtrack and try alternative paths. The disadvantage with this approach is that parts of a sentence may be parsed more than once, resulting in excessive computation.
The networks considered so far are not very useful for language understanding. They have only been capable of accepting or rejecting a sentence based on the grammar and syntax of the sentence. To be more useful, an interpreter must be able to build structures which will ultimately be used to create the required knowledge entities for an AI system. Furthermore, the resulting data structures should contain more information than just the syntactic information dictated by the grammar alone. Semantic information should also be included. For example, a number of sentence features can also be established and recorded, such as the subject NP, the object NP, the subject-verb number agreement, the mood (declarative or interrogative), tense, and so on. This means that additional tests must be performed to determine the possible semantics a sentence may have. Without these additional tests, much ambiguity will still be present and incorrect or meaningless sentences accepted.
We can achieve the additional capabilities required by augmenting an RTN with the ability to perform additional tests and store intermediate results as a sentence is being parsed. When an RTN is given these additional features, it is called an augmented transition network or ATN.
When building a representation structure, an ATN uses a number of different
registers as temporary storage to hold the different sentence constituents. Thus,
one set of registers would be used for an NP network, one for a PP network, one
for a V. and so on. Using the register contents, an ATN builds a partial structural
description of the sentence as it moves from state to state in the network. These
registers provide temporary storage which is easily modified, switched, or discarded
until the final sentence structure is constructed. The registers also hold flags and
other indicators used in conjunction with some arcs. When a partial structure has
been stored in registers and a failure occurs, the interpreter can clear the registers, backtrack, and start a new set of tests. At the end of a successful parse, the contents
of the registers are combined to form the final sentence data structure required for
output.
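The register idea can be imitated, in a much simplified way, with extra DCG arguments. The sketch below is our own and only mimics the TYPE, SUBJ, AUX, and VP registers used in the ATN example discussed later in this section; it is not Woods's actual formalism.

% A simplified imitation of ATN registers using DCG arguments.  The result
% term plays the role of the sentence structure assembled from registers,
% and resembles list structures such as (S DCL (NP (boy) DEF) (AUX can) (VP whistle)).
sentence(s(dcl, Subj, Aux, VP))       --> np(Subj), verb_part(Aux, VP).
sentence(s(q, Subj, aux(A), vp(V)))   --> aux(A), np(Subj), verb(V).

np(np(def,   N)) --> [the], noun(N).
np(np(indef, N)) --> [a],   noun(N).

verb_part(aux(A),    vp(V)) --> aux(A), verb(V).
verb_part(aux(none), vp(V)) --> verb(V).

noun(boy)     --> [boy].
noun(dog)     --> [dog].
verb(whistle) --> [whistle].
verb(likes)   --> [likes].
aux(can)      --> [can].

% ?- phrase(sentence(S), [the, boy, can, whistle]).
% S = s(dcl, np(def, boy), aux(can), vp(whistle))
% ?- phrase(sentence(S), [can, the, boy, whistle]).
% S = s(q, np(def, boy), aux(can), vp(whistle))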
A specification language developed by Woods (1970, 1986) for ATNs takes the form of an extended context-free grammar. This language is given in Figure 12.10, where the vertical bar indicates alternative choices for a construction and the * (Kleene star) signifies repeatable (zero or more) elements. All nonterminals are enclosed in angle brackets. Some of the capitalized words appearing in the language were defined earlier as arc tests and actions. The other words in uppercase correspond to functions which perform many of the tasks related to the construction of the structure using the registers.
The specification language is read the same as rewrite rules. Thus, it specifies that a transition network is composed of a list of arc sets, where each arc set is in turn a list with the first element being a state name and the remaining elements being arcs which emanate from that state. An arc can be any of the forms CAT, JUMP, PUSH, TEST, WORD, or POP. For example, as noted earlier, the TEST arc corresponds to an arbitrary test which determines whether the arc is to be traversed or not. Note that a sequence of actions is associated with the arc tests. These actions are executed during the arc traversals. They are used to build pieces of structures such as a tree or a list. The terminal action of any arc specifies the state to which control is passed to complete the transition.
Among other things, an action can be any of the three function forms SETR, SENDR, and LIFTR, which cause the indicated register values to be set to the value of form. Terminal actions can be either TO or JUMP, where TO requires that the input sentence pointer be advanced, and JUMP requires that the pointer remain fixed and the input word continue to be scanned. Finally, a construction form can be any of the seven alternatives in the bottom group of Figure 12.10, including the symbol @ which is a terminal symbol placeholder for form.
The function SETR causes the contents of the indicated registers to be set equal to the value of the corresponding form. This is done at the current level in the network, while SENDR causes it to be done by sending it to the next lower level of computation. LIFTR returns information to the next higher level of computation. The function GETR returns the value of the indicated register, and GETF returns the value of a specified feature for the current input word. As noted before, the value of @ is usually an input word. The function BUILDQ takes lists from the indicated registers (which represent fragments of a parse tree with marked nodes) and builds the sentence structures.
An ATN network similar to the RTN illustrated in Figure 12.9 is presented in Figure 12.11. Note that the arcs in this network have some of the tests described above. These tests will have the basic forms given in Figure 12.10, together with the indicated actions. The actions include building the final sentence structure, which may contain more features than those considered thus far, as well as certain semantic features.
Using the specification language, we can represent this particular network with the constituent abbreviations and functions described above in the form of a LISP program. For example, a partial description of the network is depicted in Figure 12.12.
Figure 12.11 An ATN similar to the RTN of Figure 12.9, with arc tests such as PUSH(NP), CAT(V), CAT(ADJ), CAT(DET), CAT(PREP), PUSH(PP), and POP.
(N4) is tested and the PP test subsequently fails. POP is executed and a return of control is made to statement 2.
3. The register SUBJ is set to the value of @, which is the list structure (NP (dog (big) DEF)) returned from the NP registers. DEF signifies that the determiner is definite.
4. In line 3, register TYPE is set to DCL (for declarative).
5. Control is transferred to S1 with the statement TO in line 4, and the input pointer is moved past the noun phrase to the verb "likes."
6. If an auxiliary verb had been found at the beginning of the sentence instead of an NP, control would have been passed to line 5, where statements 5, 7, and 8 would have been executed. This would have resulted in registers AUX and TYPE being set to the values @ and Q respectively.
7. At S1, a category test is made for a V. Since this succeeds (the verb is "likes"), statements 11, 12, and 13 are executed. This results in register AUX being set to nil, and register V being set to the contents of @ to give (V likes). Control is then passed to S4 and the input pointer is moved to the word "the."
8. If the test for V had failed, and an auxiliary verb had been found, statements 14 and 15 would have been executed.
9. Since S4 is a terminal node, a sentence structure can be built there. This will be the case if the end of the sentence has been reached. If so, the BUILDQ function creates a list structure with first element S, followed by the values of the three registers TYPE, SUBJ, and AUX, corresponding to the three plus (+) signs. These are then followed with VP and the contents of the V register. For example, with an input sentence of "The boy can whistle," the structure (S DCL (NP (boy) DEF) (AUX can) (VP whistle)) would be constructed from the four registers TYPE, SUBJ, AUX, and V.
10. Because more input words remain, the BUILDQ in line 22 is not executed, and control drops to the next line where a push is made to the lower NP network. As before, the NP succeeds with the structure (NP (boy (small) DEF)) being returned as the value of @. Register VP is then set to the list returned by BUILDQ (line 24), which consists of VP followed by the verb phrase, and control is passed to S5.
11. Since S5 is a terminal node and the end of the input sentence has been reached, BUILDQ will build the final sentence structure from the TYPE, SUBJ, AUX, and VP register contents.
The use of recursion, arc tests, and a variety of arc and node combinations gives ATNs the power of a Turing machine. This means that an ATN can recognize any language that a general purpose computer can recognize. This versatility also makes it possible to build deep sentence structures rather than just structures with surface features only. (Recall that surface features relate to the form of words, phrases, and sentences, whereas deep features relate to the content or meaning of these elements.) The ability to build deep structures requires that other appropriate tests be included to check pronoun references, tense, number agreement, and other features.
Because of their power and versatility, ATNs have become popular as a model
for general purpose parsers. They have been used successfully in a number of natural
language systems as well as front ends for databases and expert systems.
12.5 SEMANTIC ANALYSIS AND REPRESENTATION STRUCTURES
It turned into a black day. In his haste to catch the flight, he backed over Tom's bicycle. He should never have left it there. It was damaged beyond repair. That caused the tailpipe to break. It would be impossible to make it now. . . . It was all because of that late movie. He would be heartbroken when he found out about it.
Although a car was never explicitly mentioned, it must be assumed that a car was the object which backed over Tom's bicycle. A program must be able to infer this. The "black day" metaphor also requires some inference. Days are not usually referred to by color. And sorting out the pronoun references can also be an onerous task for a program. Of the seven uses of it, two refer to the bicycle, two to the flight, two refer to the situation in general, and one to the status of the day. There are also four uses of he referring to two different people and a that which refers to the accident in general. The placement of the pronouns is almost at random, making it difficult to give any rule of association. Words that point back or refer to people, places, objects, events, times, and so on that occurred before are called anaphors. Their interpretation may require the use of heuristics, syntactic and semantic constraints, inference, and other forms of object analysis within the discourse content.
This example should demonstrate again that language cannot be separated from intelligence and reasoning. To fully understand the above situation requires that a program be able to reason about people's goals, beliefs, motives, and facts about the world in general.
The semantic structures constructed from utterances such as the above must account for all aspects of meaning in what is known as the domain, the context, and the task. The domain refers to the knowledge that is part of the world model the system knows about. This includes object descriptions, relationships, and other relevant concepts. The context relates to previous expressions, the setting and time of the utterances, and the beliefs, desires, and intentions of the speakers. A task is part of the service the system offers, such as retrieving information from a data base, providing expert advice, or performing a language translation. The domain, context, and task are what we have loosely referred to before as semantics, pragmatics, and world knowledge.
The semantic grammars described in Section 12.2 are one form of approach based on the use of lexical semantics. With this approach, input sentences are transformed through the use of domain dependent semantic rewrite rules which create the target knowledge structures. A second example of an informal lexical-semantic approach is one which uses conceptual dependency theory. Conceptual dependency structures provide a form of linked knowledge that can be used in larger structures such as scenes and scripts.
The construction of conceptual dependency structures is accomplished without performing any direct syntactic analysis. Making the jump between utterance and
these structures requires that more information be contained in the lexicon. The lexicon entries must include word sense and other information which relate the words to a number of primitive semantic categories as well as some syntactic information.
Recall from Chapter 7 that conceptualizations are either events or object states. Event structures include objects and their attributes, picture producers (PPs) or actors, actions, direction of action (to or from) and sometimes instruments that participate in the actions, and the location and time of the event. These items are collected together in a slot-filler structure as depicted in Figure 12.13.
Verbs in the input string are a dominant factor in building conceptual dependency structures because they denote the event action or state. Consequently, lexicon entries for verbs will be more extensive than other entry types. They will contain all possible senses, tense, and other information. Each verb maps to one of the primitive actions: ATRANS, ATTEND, CONC, EXPEL, GRASP, INGEST, MBUILD, MOVE, MTRANS, PROPEL, PTRANS, and SPEAK. Each primitive action will also have an associated tense: past, present, future, conditional, continuous, interrogative, end, negation, start, and timeless.
The basic process followed in building conceptual dependency structures is simply the three steps listed below.
would be initiated to look for associated words which complete the phrase beginning with to or for.
For the above tests, there are four types of actions taken. These actions build up the conceptual dependency structure as the input string is parsed. For example, the action taken for a verb like drank would be to build a substructure for the primitive action INGEST with unfilled slots for ACTOR, OBJECT, and TENSE.
Subsequent words in the input string would initiate actions to add to this structure and fill in the empty ACTOR and OBJECT slots. Thus, a simple sentence containing drank would be transformed through a series of test and action steps into a filled-in structure of this kind.
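As a rough sketch only, such an INGEST structure can be represented as a term with unbound slots. The sample sentence [joe, drank, milk], the cd/4 term shape, and the crude slot-filling rule below are illustrative assumptions of our own, not Schank's actual parser.

% cd(PrimitiveAct, Actor, Object, Tense): slots start unbound and are
% filled as words are processed.
verb_cd(drank, cd(ingest, _Actor, _Object, past)).
verb_cd(ate,   cd(ingest, _Actor, _Object, past)).

% build_cd(+Words, -CD): create the verb's skeleton structure, then fill
% ACTOR from the word before the verb and OBJECT from the word after it
% (a deliberately crude stand-in for the test/action rules described above).
build_cd(Words, CD) :-
    append(Before, [Verb | After], Words),
    verb_cd(Verb, CD),
    CD = cd(_, Actor, Object, _),
    last(Before, Actor),
    After = [Object | _].

% ?- build_cd([joe, drank, milk], CD).
% CD = cd(ingest, joe, milk, past).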
In the compositional semantics approach, by contrast, a syntactic parse is produced first and then mapped to a logical form. Consider, for example, the sentence "Sample24 contains silicon." This would be parsed, and the following tree structure would be output from the ATN:

(S DCL
   (NP (N (Sample24)))
   (AUX (TENSE (PRESENT)))
   (VP (V (contain))
       (NP (N (silicon)))))

Using this structure, the semantic interpreter would produce the predicate clause (CONTAIN Sample24 silicon).
12.6 NATURAL LANGUAGE GENERATION

To produce expressions that are natural and close to those of humans requires more than rules of syntax, semantics, and discourse. In general, it requires that a coherent plan be developed to carry out multiple goals. A great deal of sophistication goes into the simplest types of utterances when they are intended to convey different shades of meanings and emotions. A participant in a dialog must reason about a hearer's understanding and his or her knowledge and goals. During the dialog, the system must maintain proper focus and formulate expressions that either query, explain, direct, lead, or just follow the conversation as appropriate.
The study of language generation falls naturally into three areas: (1) the determination of content, (2) formulating and developing a text utterance plan, and (3) achieving a realization of the desired utterances.
Content determination is concerned with what details to include in an explanation, a request, a question, or an argument in order to convey the meanings set forth by the goals of the speaker. This means the speaker must know what the hearer already knows, what the hearer needs to know, and what the hearer wants to know.
These topics are related to the domain, task, and discourse context described above.
Text planning is the process of organizing the content to be communicated so as to
best achieve the goals of the speaker. Realization is the process of mapping the
organized content to actual text. This requires that specific words and phrases be
chosen and formulated into a syntactic structure.
Until about 1980, not much work had been done beyond single sentence generation. Understanding and generation were performed with a single piece of isolated text without much regard given to context and consideration of the hearer. Following this early work, a few comprehensive systems were developed. To complete this section, we describe the basic ideas behind two of these systems. They take approaches different from those of the lexical and compositional semantics understanding methods described in the previous section.
KAMP is a knowledge and modalities planner developed for the generation of natural language text. Developed by Douglas Appelt (1985), KAMP simulates the behavior of an expert robot named Rob (a terminal) assisting John (a person) in the disassembly and repair of air compressors.
KAMP uses a planner and a data base of knowledge in (modal) logical form.
The knowledge includes domain knowledge, world knowledge, linguistic knowledge,
and knowledge about the hearer. A description of actions and action summaries
are available to the planner. Given a goal, the planner uses heuristics to build and
refine a plan in the form of a procedural network. Other procedures act as critics
of the plans and help to refine them If a plan is completed, a deduction system is
used to prove that the sequence of actions do, in fact, achieve the goal. If the plan
fails, the planner must do further searching for a sequence of actions that will
work. A completed plan states the knowledge and intentions of the agent, the robot
Rob. This is the first step in producing the output text. The process can be summarized as follows.
Suppose KAMP has determined the immediate goal to be the removal of the compressor pump from the platform:

True(¬Attached(pump, platform))
KAMP first formulates and refines a plan that John adopt Rob's plan to remove the pump from the platform. The first part of Rob's plan suggests a request for John to remove the pump, which leads to a corresponding request expression.
After axioms are used to prove that actions in the initial summary plan are successful, the request is expanded to include details for the pump removal. Rob decides that John will know he is near the platform and that he knows where the toolbox is located, but that he does not know what tool to use. Rob, therefore, determines that John will not need to be told about the platform, but that he must be informed, with an imperative statement, to remove the pump with a wrench in the toolbox.
The next step is for Rob to plan speech acts to realize the request. This requires linguistic knowledge of the structure to use for an imperative request, in this case, that the sentence should have the form V NP (PP)* (recall that * stands for optional repetition). Words to complete the output string are then selected and ordered accordingly.
This leads to the generation of a sentence with the following tree structure.
[Parse tree for the imperative sentence: V (remove), NP (the pump), PP (with the wrench in the toolbox).]
The overall process of planning and formulating the final sentence "Remove the pump with the wrench in the toolbox" is very involved and detailed. It requires planning and plan verification for content, selecting the proper structures, selecting senses, mood, tense, the actual words, and a final ordering. All of the steps must be constrained toward the realization of the (possibly multiple) goals set forth. It is truly amazing that we accomplish such acts with so little effort.
Neil Goldman (Schank et al., 1973) developed a generation component called BABEL which was used as part of several language understanding systems built by Schank and his students (SAM, MARGIE, QUALM, and so on). This component worked in conjunction with an inference component to determine responses to questions about short news and other stories.
Given the general content or primitive event for the response, BABEL selects and builds an appropriate conceptual dependency structure which includes the intended word senses. A modified ATN is then used to generate the actual word string for output.
To determine the proper word sense, BABEL uses a discrimination net. For example, suppose the system is told a story about Joe going into a fast-food restaurant, ordering a sandwich and a soft drink in a can, paying, eating, and then leaving. After the understanding part of the system builds the conceptual dependency and script structures for the story, questions about the events could be posed. If asked what Joe had in the restaurant, BABEL would first need to determine the conceptual
[Figure 12.14 A discrimination net for INGEST, whose branches test properties of the object (taken through the mouth, a fluid, air, smoke) to select word senses such as "eat," "drink," and "smoke."]
category of the question in order to select the proper conceptual dependency pattern to build. The verb in the query determines the appropriate primitive category of eat and drink as being INGEST. To determine the correct sense of INGEST as eat or drink, a discrimination net like that depicted in Figure 12.14 would be used. A traversal of the discrimination net leads to eat and drink, using the facts that the sandwich is taken through the mouth and that the soft drink is a fluid.
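In PROLOG, a small net of this kind might be sketched as a chain of property tests; the tests and object properties below are simplified stand-ins of our own for the branches of Figure 12.14.

% sense(+PrimitiveAct, +Object, -Word): walk a small discrimination net.
sense(ingest, Object, Word) :-
    (   property(Object, gas)   -> Word = breathe
    ;   property(Object, fluid) -> Word = drink
    ;   property(Object, smoke) -> Word = smoke
    ;   Word = eat                        % default: solid taken through the mouth
    ).

property(soft_drink, fluid).
property(air,        gas).
property(sandwich,   solid).

% ?- sense(ingest, soft_drink, W).   W = drink
% ?- sense(ingest, sandwich,   W).   W = eat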
Once a conceptual dependency framework has been selected, the appropriate words must be chosen and the slots filled. Functions are used to operate on the net to complete it syntactically to obtain the correct tense, mood, form, and voice. When completed, a modified ATN is then used to transform the conceptual dependency structure into a surface sentence structure for output.
The final conceptual dependency structure passed to the ATN would appear
as follows.
[Conceptual dependency diagram: Joe INGESTs the soft drink from the can, with the soft drink contained in the can and moved to Joe's mouth.]
An ATN used for text generation differs from one used for analysis. In particular, the registers and arcs must be different. The value of the register contents (denoted as @ in the previous section) corresponds to a node or arc in the conceptual dependency (or other type of) network rather than the next word in the input sentence. Registers will be present to hold tense, voice, and the like. For example, a register named FORM might be set to past and a register VOICE set to active when generating an active sentence like "Joe bought candy." Following an arc such as a CAT/V arc means there must be a word in the lexicon corresponding to the node in the conceptual dependency. The tense of the word then follows from the FORM register contents.
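The last step, going from a filled conceptual dependency structure to a word string, can be caricatured in a few lines. This sketch, with our own word_for/4 and object_type/2 tables, is far simpler than BABEL's register-driven ATN, but it shows the direction of the mapping.

% generate(+CD, -Words): pick a surface verb from the primitive act, the
% tense, and the kind of object, then order the constituents.
word_for(ingest, past, fluid, drank).
word_for(ingest, past, solid, ate).

object_type(soft_drink, fluid).
object_type(sandwich,   solid).

generate(cd(Act, Actor, Object, Tense), [Actor, Verb, Object]) :-
    object_type(Object, Type),
    word_for(Act, Tense, Type, Verb).

% ?- generate(cd(ingest, joe, soft_drink, past), Words).
% Words = [joe, drank, soft_drink].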
12.7 NATURAL LANGUAGE SYSTEMS

In this section, we briefly describe a few of the more successful natural language understanding systems. They include LUNAR, LIFER, and SHRDLU.
The LUNAR system was designed as a language interface to give geologists direct access to a data base containing information on lunar rock and soil compositions obtained during the NASA Apollo-11 moon landing mission. The design objective was to build a system that could respond to natural queries received from geologists.
The system has a dictionary of some 3500 words, an English grammar, and two data bases. One data base contains a table of chemical analyses with about 13,000 entries, and the other contains 10,000 indexed document topics. LUNAR uses a meaning representation language which is an extended form of FOPL. The language uses (1) designators, which name objects or classes of objects like nouns, variables, and classes with range quantifiers, (2) propositions that can be true or false, which are connected with logical operators and, or, not, and quantification identifiers, and (3) commands which carry out specific actions, like TEST, which tests the truth value of propositions against given arguments, as in (TEST (CONTAIN sample24 silicon)).
Although never fully implemented, the LUNAR project was considered an
operational success since it related to a real world problem in need of a solution.
It failed to parse or find the correct semantic interpretation on only about 10% of
the questions presented to it.
LIFER (Language Interface Facility with Ellipsis and Recursion) was described briefly in Section 12.2 under semantic grammars. It was developed by Gary Hendrix (1978) and his associates to be used as a development aid and run-time language interface to other systems such as a data base management system. Among its special features are spelling correction, processing of elliptical inputs, and the ability of the run-time user to extend the language through the use of paraphrase.
LIFER consists of two major components, a set of interactive functions for language specifications and a parser. The specification functions are used to define an application language as a subset of English that is capable of interacting with existing software. Given the language specification, the parser interprets the language inputs and translates them into appropriate structures that interact with the application software.
In using a semantic grammar, LIFER systems incorporate much semantic information within the syntax. Rather than using categories like NP, VP, N, and V, LIFER uses semantic categories like <SHIP-NAME> and <ATTRIBUTE> which match ship names or attributes. In place of syntactic patterns like NP VP, semantic patterns like "What is the <ATTRIBUTE> of <SHIP>?" are used. For each such pattern, the language definer supplies an expression with which to compute the interpretations of instances of the pattern. For example, if LIFER were used as the front end for a database query system, the interpretation would be a database retrieval command.
LIFER has proven to be effective as a front end for a number of systems. The main disadvantage, as noted earlier, is the potentially large number of patterns that may be required for a system which needs many diverse patterns.
In Winograd's SHRDLU system, syntactic and semantic analysis, as well as the reasoning process, are more closely integrated.
The system can be roughly divided into four component domains: (1) a syntactic parser which is governed by a large English (systemic type) grammar, (2) a semantic component of programs that interpret the meanings of words and structures, (3) a cognitive deduction component used to examine consequences of facts, carry out commands, and find answers, and (4) an English response generation component. In addition, there is a knowledge base containing blocks world knowledge, and a model of its own reasoning process, used to explain its actions.
Knowledge is represented with FOPL-like statements which give the state of the world at any particular time and procedures for changing and reasoning about the state. For example, the expressions

(IS b1 block)
(IS b2 pyramid)
(AT b1 (LOCATION 120 120 0))
(SUPPORT b1 b2)
(CLEARTOP b2)
(MANIPULATE b1)
(IS blue color)

describe part of a typical blocks world state.
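Assertions of this kind map directly onto PROLOG facts, and derived relations can then be computed rather than stored. The rendering below, including the derived cleartop rule, is a sketch of our own rather than SHRDLU's actual internal representation.

% The blocks-world assertions above as PROLOG facts.
is_a(b1, block).
is_a(b2, pyramid).
is_a(blue, color).
at(b1, location(120, 120, 0)).
support(b1, b2).
manipulate(b1).

% CLEARTOP can be derived instead of stored: an object has a clear top
% if it supports nothing.
cleartop(X) :- \+ support(X, _).

% ?- cleartop(b2).   succeeds
% ?- cleartop(b1).   fails, since b1 supports b2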
12.8 SUMMARY
Grammars were formally introduced, and the Chomsky hierarchy was presented.
This was followed with a description of structural representations for sentences,
the phrase marker. Four additional extended grammars were briefly described. One
was the transformational grammars, an extension of generative grammars. Transfor-
mational grammars include tree manipulation rules that permit the construction of
deeper semantic structures than the generative grammars. Case, semantic, and sys-
temic grammars were given as examples of grammars that are also more semantically oriented than the generative grammars.
Lexicons were described, and the role they play in NL systems given. Basic parsing techniques were examined. We looked at simple transition networks, recursive transition networks, and the versatile ATN. The ATN includes tests and actions as part of the arc components and special registers to help in building syntactic structures. With an ATN, extensive semantic analysis is even possible. We defined top-down, bottom-up, deterministic, and nondeterministic parsing methods, and an example of a simple PROLOG parser was also discussed.
We next looked at the semantic interpretation process and discussed two broad approaches, namely the lexical and compositional semantic approaches. These approaches are also identified with the type of target knowledge structures generated.
In the compositional semantics approach, logical forms were generated, whereas in
the lexical semantics approach, conceptual dependency or similar network structures
are created.
Language generation is approximately the opposite of the understanding analysis process, although more difficult. Not only must a system decide what to say but how to say it. Generation falls naturally into three areas: content determination, text planning, and text realization. Two general approaches were presented. They are like the inverses of the lexical and compositional semantic analysis processes. The KAMP system uses an elaborate planning process to determine what, when, and how to state some concepts. The system simulates a robot giving advice to a human helper in the repair of air compressors. At the other extreme, the BABEL system generates output text from conceptual dependency and script structures.
We concluded the chapter with a look at three systems of somewhat disparate architectures: the LUNAR, LIFER, and SHRDLU systems. These systems typify the state of the art in natural language processing systems.
EXERCISES
12.1. Derive a parse tree for the sentence "Bill loves the frog," where the following rewrite rules are used.

S → NP VP
NP → N
NP → DET N
VP → V NP
DET → the
V → loves
N → bill | frog
12.2. Develop a parse tree for the sentence "Jack slept on the table" using the following rules.

S → NP VP
NP → N
NP → DET N
VP → V PP
PP → PREP NP
N → jack | table
V → slept
DET → the
PREP → on
12.3. Give an example of each of the four types 0, 1, 2, and 3 of Chomsky's hierarchy of grammars.
12.4. Modify the grammar of Problem 12.1 to allow the NP (noun phrase) to have zero to many adjectives.
12.5. Explain the main differences between the following three grammars and describe the principal features that could be used to develop specifications for a syntactical recognition program. Consult additional references for more details regarding each grammar.

Chomsky's Transformational Grammar
Fillmore's Case Grammar
Systemic Grammars
12.6. Draw an ATN to implement the grammar of Problem 12.1.
12.7. Given the following parse tree, write down the corresponding context-free grammar.

NP
DET  ADJ  N
12.8. Create a LISP data structure to model a simple lexicon similar to the one depicted in Figure 12.6.
12.9. Write a LISP match program which checks an input sentence for matching words in the lexicon of the previous problem.
12.10. Derive an ATN for the parse tree of Problem 12.7.
12.11. Derive an ATN graph to implement the parse tree of Problem 12.
12.12. Determine if the following sentences will be accepted by the grammar of Problem 12.6.
(a) The green green grass of the home.
(b) The red car drove in the last lane.
12.13. Write PROLOG rules to implement the grammar used to derive the parse tree of Problem 12.7. Omit rules for the individual word categories (like noun([ball ...)). Generate a syntax tree using one output parameter.
12.14. Write a PROLOG program that will take grammar rules in the following format:

NT → (NT | T)*

where NT is any nonterminal, T is any terminal, and the Kleene star signifies any number of repetitions, and generate the corresponding top-down parser.
12.15. Modify the program in Problem 12.12 to accept extra arguments used to return meaningful knowledge structures.
12.16. Write a LISP program which uses property lists to create the recursive transition network depicted in Figure 12.9. Each node should be given a name such as S1, N1, and P1 and associated with a list of arc and node pairs emanating from the node.
12.17. Write a recursive program in LISP which tests input sentences for the RTN developed in the previous problem. The program should return t if the sentence is acceptable, and nil if not.
12.18. Modify the program of Problem 12.15 to accept sentences of the type depicted in Figure 12.12.
12.19. Write an ATN type of program as depicted in Figure 12.12 which builds structures like those of Figure 12.13.
12.20. Describe in detail the differences between language understanding and language generation. Explain the problems in developing a program which is capable of carrying on a dialog with a group of people.
12.21. Give the processing steps required and corresponding data structures needed for a
robot named Rob to formulate instructions for a helper named John to complete a
university course add-drop request form.
12.22. Give the conceptual dependency graph for the sentence "Mary drove her car to
school" and describe the steps required for a program-to transform the sentence to
an internal conceptual dependency structure.
13
Pattern Recognition
One of the most basic and essential characteristics of living things is the ability to
recognize and identify objects. Certainly all higher animals depend on this ability
for their very survival. Without it they would be unable to function even in a
static, unchanging environment.
In this chapter we consider the process of computer pattern recognition, a
process whereby computer programs are used to recognize various forms of input
stimuli such as visual or acoustic (speech) patterns. This material will help to round
out the topic of natural language understanding when speech, rather than text, is
the language source. It will also serve as an introduction to the following chapter
where we take up the general problem of computer vision.
Although some researchers feel that pattern recognition should no longer be
considered a part of AI, we believe many topics from pattern recognition are essential
to an understanding and appreciation of important concepts related to natural language
understanding, computer vision, and machine learning. Consequently, we have included
in this chapter a selected number of those topics believed to be important.
13.1 INTRODUCTION
Recognition is the process of establishing a close match between some new stimulus
and previously stored stimulus patterns. This process is being performed continually
throughout the lives of all living things. In higher animals this ability is manifested
in many forms at both the conscious and unconscious levels, for both abstract as
well as physical objects. Through visual sensing and recognition, we identify many
special objects, such as home, office, school, restaurants, faces of people, handwriting,
and printed words. Through aural sensing and recognition, we identify familiar
voices, songs and pieces of music, and bird and other animal sounds. Through
touch, we identify physical objects such as pens, cups, automobile controls, and
food items. And through our other senses we identify foods, fresh air, toxic substances,
and much else.
At more abstract levels of cognition, we recognize or identify such things as
ideas (electromagnetic radiation phenomena, model of the atom, world peace), concepts
(beauty, generosity, complexity), procedures (game playing, making a bank
deposit), plans, old arguments, metaphors, and so on.
Our pervasive use of and dependence on our ability to recognize patterns has
motivated much research toward the discovery of mechanical or artificial methods
comparable to those used by intelligent beings. The results of these efforts to date
have been impressive, and numerous applications have resulted. Systems have now
been developed to reliably perform character and speech recognition; fingerprint
and photograph identifications; electroencephalogram (EEG), electrocardiogram
(ECG), oil well log, and other graphical pattern analyses; various types of medical
and system diagnoses; resource identification and evaluation (geological, forestry,
hydrological, crop disease); and detection of explosive and hostile threats (submarine,
aircraft, missile), to name a few.
Object classification is closely related to recognition. The ability to classify
or group objects according to some commonly shared features is a form of class
recognition. Classification is essential for decision making, learning, and many other
cognitive acts. Like recognition, classification depends on the ability to discover
common patterns among objects. This ability, in turn, must be acquired through
some learning process. Prominent feature patterns which characterize classes of
objects must be discovered, generalized, and stored for subsequent recall and comparison.
We do not know exactly how humans learn to identify or classify objects;
however, it appears the following processes take place:
New objects are introduced to a human through activation of sensory stimuli. The
sensors, depending on their physical properties, are sensitive in varying degrees to
certain attributes which serve to characterize the objects, and the sensor output tends
to be proportional to the more prominent attributes. Having perceived a new object,
a cognitive model is formed from the stimuli patterns and stored in memory. Recurrent
experiences in perceiving the same or similar objects strengthen and refine the similarity
13.2 THE RECOGNITION AND CLASSIFICATION PROCESS
There are two basic approaches to the recognition problem: (1) the decision-theoretic
approach and (2) the syntactic approach.
Figure 13.1  The pattern recognition process.
The decision theoretic approach is based on the use of decision functions to classify
objects. A decision function maps pattern vectors X into decision regions of D.
More formally, this problem can be stated as follows.

1. Given a universe of objects O = {o1, o2, ..., on}, let each oi have k observable
attributes and relations expressible as a vector V = (v1, v2, ..., vk).
2. Determine (a) a subset of m ≤ k of the vi, say X = (x1, ..., xm),
whose values uniquely characterize the oi, and (b) c ≥ 2 groupings or classifications
of the oi which exhibit high intraclass and low interclass similarities,
such that a decision function d(X) can be found which partitions D into c
disjoint regions. The regions are used to classify each oi as belonging to at
most one of the c classes.
M → F → D
When there are only two classes, say C1 and C2, the values of the objects'
pattern vectors may tend to cluster into two disjoint groups. In this case, a linear
decision function d(X) can often be used to determine an object's class. For example,
when the classes are clustered as depicted in Figure 13.2, a linear decision function
d is adequate to classify unknown objects as belonging to either C1 or C2.
Figure 13.2  A linear decision function, d(X) = w1x1 + w2x2 + w3 = 0, separating classes C1 and C2.
An object X is classified as belonging to class C1 when d(X) < 0 and as belonging
to class C2 when d(X) > 0. When d(X) = 0 the classification is indeterminate,
so either (or neither) class may be selected.
When class reference vectors or prototypes Rj, j = 1, ..., c, are available,
decision functions can be defined in terms of the distance of the X from the reference
vectors. For example, the distance

dj(X) = (X - Rj)'(X - Rj)

could be computed for each class Cj, and class Ck would then be chosen when
dk(X) = minj {dj(X)}.
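A minimum-distance classifier of this kind can be sketched as follows; the reference vectors R1 and R2 are hypothetical and serve only to illustrate the rule of choosing the class with the smallest dj(X).

import numpy as np

def min_distance_class(x, prototypes):
    # dj(X) = (X - Rj)'(X - Rj); choose the class with the smallest distance
    dists = [float((x - r) @ (x - r)) for r in prototypes]
    return int(np.argmin(dists)), dists

R = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]   # hypothetical prototypes R1, R2
k, d = min_distance_class(np.array([1.0, 1.5]), R)
print("assigned to class C%d, squared distances %s" % (k + 1, d))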
For the general case of c ≥ 2 classes C1, C2, ..., Cc, a decision function
may be defined for each class, d1, d2, ..., dc. A decision rule in this case
would be defined to select class Cj when

dj(X) < di(X) for i, j = 1, 2, ..., c, and i ≠ j.
When a line d (or more generally a hyperplane in n-space) can be found that
separates classes into two or more groups as in the case of Figure 13.2, we say
the classes are linearly separable. Classes that overlap each other or surround one
another, as in Figure 13.3, cannot generally be classified with the use of simple
linear decision functions. For such cases, more general nonlinear (or piecewise linear)
functions may be required. Alternatively, some other selection technique (like heuristics)
may be needed.
The decision function approach described above is an example of deterministic
recognition since the xi are deterministic variables. In cases where the attribute
values are affected by noise or other random fluctuations, it may be more appropriate
to define probabilistic decision functions. In such cases, the attribute vectors X are
treated as random variables, and the decision functions are defined as measures of
likelihood of class inclusion. For example, using Bayes' rule, one can compute the
conditional probability P(Ci|X) that the class of an object o is Ci given the observed
value of X for o. This approach requires a knowledge of the prior probability
P(Ci), the probability of the occurrence of samples from Ci, as well as P(X|Ci).
Figure 13.3  Examples of nonlinearly separable classes.
(Note that the Ci are treated like random variables here. This is equivalent to the
assumption made in Bayesian classification where the distribution parameter θ is
assumed to be a random variable, since Ci may be regarded as a function of θ.) A
decision rule for this case is to choose class Cj if

P(Cj|X) > P(Ci|X) for all i ≠ j.
A more comprehensive probabilistic approach is one which is based on the
use of a Bayesian loss or risk function, where the class is chosen on the basis of
minimum loss or risk. Let the loss function Lij denote the loss incurred by incorrectly
classifying an object actually belonging to class Ci as belonging to Cj. When Lij is
a constant for all i, j, i ≠ j, a decision rule can be formulated using the likelihood
ratio defined as (see Chapter 6)

P(X|Ck) / P(X|Cj)

The rule is to choose class Ck whenever the relation

P(X|Ck) / P(X|Cj) > P(Cj) / P(Ck)

holds for all j ≠ k.
Probabilistic decision rules may be constructed as either parametric or nonparametric,
depending on knowledge of the distribution forms. For a comprehensive
treatment of these methods see (Duda and Hart, 1973) or (Tou and Gonzalez, 1974).
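The following fragment is a rough sketch of such a probabilistic decision rule, assuming (only for illustration) one-dimensional Gaussian class-conditional densities P(X|Ci) with known means, variances, and prior probabilities; it chooses the class that maximizes P(Ci)P(X|Ci), which is equivalent to the likelihood-ratio test above.

import math

def gaussian(x, mean, var):
    # assumed class-conditional density P(x | Ci)
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_classify(x, classes):
    # classes: list of (prior, mean, variance); choose Cj maximizing P(Cj) P(x | Cj)
    scores = [prior * gaussian(x, mean, var) for prior, mean, var in classes]
    return scores.index(max(scores))

classes = [(0.6, 0.0, 1.0),   # C1: prior 0.6, mean 0, variance 1 (hypothetical)
           (0.4, 3.0, 1.0)]   # C2: prior 0.4, mean 3, variance 1 (hypothetical)
print("x = 0.5 -> C%d" % (bayes_classify(0.5, classes) + 1))   # C1
print("x = 2.5 -> C%d" % (bayes_classify(2.5, classes) + 1))   # C2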
Syntactic Classification
Figure 13.4  Syntactic characterization of objects.
Using syntactic analysis, that is, parsing and analyzing the string structures,
classification is accomplished by assigning an object to class Cj when the string
describing it has been generated by the grammar Gj. This requires that the string
be recognized as a member of the language L(Gj). If there are only two classes, it
is sufficient to have a single grammar G1 (two grammars are needed when strings
of neither class can occur).
When classification for c ≥ 2 classes is required, c - 1 (or c) different grammars
are needed for class recognition. The decision functions in this case are based on
grammar recognition functions which choose class Ci if the pattern string is found
to be generated by grammar Gi, that is, if it is a member of L(Gi). Patterns not
recognized as a member of a defined language are indeterminate.
When patterns are noisy or subject to random fluctuations, ambiguities may
occur since patterns belonging to different classes may appear to be the same. In
such cases, stochastic or fuzzy grammars may be used. Classification for these
cases may be made on the basis of the least cost to transform an input string into a
valid recognizable string, by the degree of class set inclusion, or with a similarity
measure using one of the methods described in Chapter 10.
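As a rough sketch of syntactic classification, the fragment below tests whether a pattern string belongs to the language of a small class grammar; the two grammars are hypothetical and are expressed here as regular expressions rather than rewrite rules, purely to keep the example short.

import re

# Hypothetical class languages: L(G1) contains strings such as "ab", "aab", ...
# and L(G2) contains strings such as "cd", "ccd", ...
grammars = {"C1": re.compile(r"a+b"), "C2": re.compile(r"c+d")}

def syntactic_classify(pattern):
    # assign class Ci if the pattern string is a member of L(Gi)
    for label, grammar in grammars.items():
        if grammar.fullmatch(pattern):
            return label
    return "indeterminate"        # member of no defined language

print(syntactic_classify("aaab"))   # C1
print(syntactic_classify("ccd"))    # C2
print(syntactic_classify("abc"))    # indeterminate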
13.3 LEARNING CLASSIFICATION PATTERNS

Before a system can recognize objects, it must possess knowledge of the characteristic
features for those objects. This means that the system designer must either build
the necessary discriminating rules into the system or the system must learn them.
In the case of a linear decision function, the weights that define class boundaries
must be predefined or learned. In the case of syntactic recognition, the class grammars
must be predefined or learned.
Learning decision functions, grammars, or other rules can be performed in
either of two ways: through supervised learning or unsupervised learning. Supervised
learning is accomplished by presenting training examples to a learning unit. The
examples are labeled beforehand with their correct identities or class. The attribute
values and object labels are used by the learning component to inductively extract
and determine pattern criteria for each class. This knowledge is used to adjust
parameters in decision functions or grammar rewrite rules. Supervised learning concepts
are discussed in some detail in Part V. Therefore, we concentrate here on
In unsupervised learning, labeled training examples are not available and little
is known beforehand regarding the object population. In such cases, the system
must be able to perceive and extract relevant properties from the otherwise unknown
objects, find common patterns among them, and formulate descriptions or discrimination
criteria consistent with the goals of the recognition process.
This form of learning is known as clustering. It is the first step in any recognition
process where discriminating features of objects are not known in advance.
The number of ways of partitioning n samples into m nonempty clusters is given by
the Stirling number of the second kind, S(n, m). When m is unknown, the number of
arrangements increases as the sum of the S(n, m) over m. For example, when n = 25
the number of possible arrangements is astronomically large (more than 10^18).
Threshold parameters t1, t2, and t3 govern the splitting, merging, and discarding of
clusters, respectively. During the clustering process, the thresholds are used to determine
if a cluster should be split into two clusters, merged with other clusters, or
discarded (when too small). The algorithm is given by the following steps; a code
sketch follows the list.
1. Select m samples as seed points for initial cluster centers. This can be done
by taking the first m points, selecting random points, or by taking the first m
points which exceed some mutual minimum separation distance d.
2. Group each sample with its nearest cluster center.
3. After all samples have been grouped, compute new cluster centers for each
group. The center can be defined as the centroid (mean value of the attribute
vectors) or some similar central measure.
4. If the split threshold t1 is exceeded for any cluster, split it into two parts and
recompute new cluster centers.
5. If the distance between two cluster centers is less than t2, combine the clusters
and recompute new cluster centers.
6. If a cluster has fewer than t3 members, discard the cluster. It is ignored for
the remainder of the process.
7. Repeat steps 3 through 6 until no change occurs among cluster groupings or
until some iteration limit has been exceeded.
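The sketch below is a minimal rendering of these steps, assuming numeric feature vectors and Euclidean distance. The threshold names t1, t2, and t3 follow the text, but the particular splitting test (maximum per-feature standard deviation) and the seed selection are illustrative assumptions, not the text's prescription.

import numpy as np

def cluster(samples, m, t1, t2, t3, max_iter=20):
    # ISODATA-like sketch: split (t1), merge (t2), and discard (t3) thresholds
    centers = [samples[i].astype(float) for i in range(m)]        # step 1: seed points
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for x in samples:                                         # step 2: group samples
            j = int(np.argmin([np.linalg.norm(x - c) for c in centers]))
            groups[j].append(x)
        new_centers = []
        for g in groups:
            if len(g) < t3:                                       # step 6: discard small clusters
                continue
            g = np.array(g, dtype=float)
            center = g.mean(axis=0)                               # step 3: new centroid
            if len(g) > 1 and g.std(axis=0).max() > t1:           # step 4: split a spread-out cluster
                new_centers.extend([g.min(axis=0), g.max(axis=0)])
            else:
                new_centers.append(center)
        merged = []
        for c in new_centers:                                     # step 5: merge close centers
            if all(np.linalg.norm(c - other) >= t2 for other in merged):
                merged.append(c)
        if len(merged) == len(centers) and all(
                np.allclose(a, b) for a, b in zip(merged, centers)):
            break                                                 # step 7: no further change
        centers = merged
    return centers

data = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
print(cluster(data, m=2, t1=2.0, t2=1.0, t3=1))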
Measures for determining distances and the center location need not be based
on ordered variates. They may be one of the measures described in Chapter 10
(including probabilistic or fuzzy measures) or some measure of similarity between
graphs, strings, and even FOPL descriptions. In any case, it is assumed each object
o is described by a unique point or event in the feature space F.
Up to this point we have ignored the problem of attribute scaling. It is possible
that a few large valued variables may completely dominate the other variables in a
similarity measure. This could happen, for example, if one variable is measured in
units of meters and another variable in millimeters or if the range and scale of
variation for two variables are widely different. This problem is closely related to
the feature selection problem, that is, in the assignment of weights to feature variables
on the basis of their importance or relevance. One simple method for adjusting the
scales of such variables is to use a diagonal weight matrix W to transform the
representation vector X to X' = WX. Thus, for all of the measures described
above, one should assume the representation vectors X have been appropriately
normalized to account for scale variations.
To summarize the above process, a subset of characteristic features which
represent the oi are first selected. The features chosen should be good discriminators
in separating objects from different classes, relevant, and measurable (observable)
at reasonable cost. Feature variables should be scaled as noted above to prevent
any swamping effect when combined due to large valued variables. Next, a suitable
metric which measures the degree of association or similarity between objects should
be chosen, and an appropriate clustering algorithm selected. Finally, during the
clustering process, the feature variables may need to be weighted to reflect the
relative importance of the feature in affecting the clustering.
13.4 RECOGNIZING AND UNDERSTANDING SPEECH

Developing systems that understand speech has been a continuing goal of AI researchers.
Speech is one of our most expedient and natural forms of communication, and
so, understandably, it is a capability we would like AI systems to possess. The
ability to communicate directly with programs offers several advantages. It eliminates
the need for keyboard entries and speeds up the interchange of information between
user and system. With speech as the communication medium, users are also free
to perform other tasks concurrently with the computer interchange. And finally,
more untrained personnel would be able to use computers in a variety of applications.
The recognition of continuous waveform patterns such as speech begins with
sampling and digitizing the waveforms. In this case the feature values are the sampled
points xi = f(ti), as illustrated in Figure 13.5.
It is known from information theory that a sampling rate of twice the highest
speech frequency is needed to capture the information content of the speech wave-
forms. Thus, sampling requirements will normally be equivalent to 20K to 30K
bytes per second. While this rate of information in itself is not too difficult to
handle, this, added to the subsequent processing, does place some heavy requirements
on real time understanding of speech.
Following sample digitization, the signals are processed at different levels of
abstraction. The lowest level deals with phones (the smallest unit of sound), allophones
(variations of the phoneme as they actually occur in words), and syllables. Higher
level processing deals with words, phrases, and sentences.
The processing approach may be from the bottom, top, or a combination of
both. When bottom processing is used the input signal is segmented into basic
speech units and a search is made to match prestored patterns against these units.
Knowledge about the phonetic composition of words is stored in a lexicon for
comparisons. For the top approach, syntax. semantics (the domain), and pragmatics
(context) are used to anticipate which words the speaker is likely to have said and
direct the search for recognizable patterns. A combined approach which uses both
methods has also been applied successfully.
Early research in speech recognition concentrated on the recognition of isolated
words. Patterns of individual words were prestored and then compared to the digitized
input patterns. These early systems met with limited success. They were unable to
tolerate variations in speaker voices and were highly susceptible to noise. Although
important, this early work helped little with the general problem of continuous
speech understanding since words appearing as part of a continuous stream differ
significantly from isolated words. In continuous speech, words are run together,
modified, and truncated to produce a great variation of sounds. Thus, speech analysis
must be able to detect different sounds as being part of the same word, but in
different contexts. Because of the noise and variability, recognition is best accom-
plished with some type of fuzzy comparison.
In 1971 the Defense Advanced Research Projects Agency (DARPA) funded
a five year program for continuous speech understanding research (SUR). The objective
of this research was to design and implement systems that were capable of
accepting continuous speech from several cooperative speakers using a limited vocabulary
of some 1000 words. The systems were expected to run at slower than real
time speeds. Products of this research were several systems, including HEARSAY
I and II, HARPY, and HWIM. While the systems were only moderately successful
in achieving their goals, the research produced other important byproducts as well,
particularly in systems architectures, and in the knowledge gained regarding control.
The HEARSAY system was important for its introduction of the blackboard
architecture (Chapter 15). This architecture is based on the cooperative efforts of
several specialist knowledge components communicating by way of a blackboard
in the solution of a class of problems. The specialists are each expert in a different
area. For example, speech analysis experts might each deal with a different level
of the speech problem. The solution process is opportunistic, with each expert making
a contribution when it can. The solution to a given problem is developed as a data
structure on the blackboard. As the solution is developed, this data structure is
modified by the contributing expert. A description of the systems developed under
SUR is given in Barr and Feigenbaum (1981).
13.5 SUMMARY
Pattern recognition systems are used to identify or classify objects on the basis of
their attribute and attribute-relation values. Recognition may be accomplished with
decision functions or structural grammars. The decision functions as well as the
grammars may be deterministic, probabilistic, or fuzzy.
Before recognition can be accomplished, a system must learn the criteria for
object recognition. Learning may be accomplished by direct designer encoding,
supervised learning, or unsupervised learning. When unsupervised learning is re-
quired. some form of clustering may be performed to learn the object class characteris-
tics.
Speech understanding first requires recognition of basic speech patterns. These
patterns are matched against lexicon patterns for recognition. Basic speech units
such as phonemes are the building blocks for longer units such as syllables and
words.
EXERCISES
13.1. Choose three common objects and determine five of their most discriminating visual
attributes.
13.2. For the previous problem, determine three additional nonvisual attributes for the
objects which are most discriminating.
13.3. Find a linear decision function which separates the following x-y points into two
distinct classes.
14
Visual Image Understanding
Vision is perhaps the most remarkable of all of our intelligent sensing capabilities.
Through our visual system, we are able to acquire information about our environment
without direct contact. Vision permits us to acquire information at a phenomenal
rate and at resolutions that are most impressive. For example, one only needs to
compare the resolution of a TV camera system to that of a human to see the difference.
Roughly speaking, a TV camera has a resolution on the order of 500 parts per
square cm, while the human eye has a limiting resolution on the order of some 25
× 10^6 parts per square cm. Thus, humans have a visual resolution several orders
of magnitude better (more than 10,000 times finer) than that of a TV camera.
What is even more remarkable is the ease with which we humans sense and perceive
a variety of visual images. It is so effortless, we are seldom conscious of the act.
In this chapter, we examine the processes and the problems involved in building
computer vision systems. We look at some of the approaches taken thus far and at
some of the more successful vision systems constructed to date.
14.1 INTRODUCTION
Because of its wide ranging potential, computer vision has become one of the most
intensely studied areas of Al and engineering during the past few decades. Some
typical areas of application include the following.
MANUFACTURING
MEDICAL
DEFENSE
BUSINESS
ROBOTICS
SPACE EXPLORATION
together with some form of inference. The basic vision process as it occurs in
humans is depicted in Figure 14.1.

Figure 14.1  The basic vision process in humans (illumination, transparent lens, retina).
Light from illuminated objects is collected by the transparent lens of the eye,
focused, and projected onto the retina where some 250 million light sensitive sensors
(cones and rods) are excited. When excited, the sensors send impulses through the
optic nerve to the visual cortex of the occipital lobes of the brain where the images
are interpreted and recognized.
Computer vision systems share some similarities with human visual systems,
at least as we now understand them. They also have a number of important differences.
Although artificial vision systems vary widely with the specific application, we
adopt a general approach here, one in which the ultimate objective is to determine
a high-level description of a three-dimensional scene with a competency level compara-
ble to that of human vision systems. Before proceeding farther we should distinguish
between a scene and an image of a scene. A scene is the set of physical objects in
a picture area, whereas an image is the projection of the scene onto a two-dimensional
plane.
With the above objectives in mind, a typical computer vision system should
be able to perform the following operations:
Figure 14.2  Stages of computer vision processing: image sensor, low-level, intermediate-level, and high-level processing leading to a semantic description.
The input to a vision system is a two dimensional image collected on some form
of light sensitive surface. This surface is scanned by some means to produce a
continuous voltage output that is proportional to the light intensity of the image on
the surface. The output voltage f(x, y) is sampled at a discrete number of x and y
points or pixel (picture element) positions and converted to numbers. The numbers
correspond to the gray level intensity for black and white images. For color images,
the intensity value is comprised of three separate arrays of numbers, one for the
intensity value of each of the basic colors (red, green, and blue).
Thus, through the digitization process, the image is transformed from a continuous
light source into an array of numbers which correspond to the local image
intensities at the corresponding x-y pixel positions on the light sensitive surface.
Using the array of numbers, certain low level operations are performed, such
as smoothing of neighboring points to reduce noise, finding outlines of objects or
edge elements, thresholding (recording maximum and minimum values only, depending
on some fixed intensity threshold level), and determining texture, color, and
other object features. These initial processing steps are ones which are used to
locate and accentuate object boundaries and other structure within the image.
The next stage of processing, the intermediate level, involves connecting,
filling in, and combining boundaries, determining regions, and assigning descriptive
labels to objects that have been accentuated in the first stage. This stage builds
higher level structures from the lower level elements of the first stage. When complete,
it passes on labeled surfaces such as geometrical objects that may be capable of
identification.
High-level image processing consists of identifying the important objects in
an image and their relationships for subsequent description as well-defined knowledge
structures and, hence, for use by a reasoning component.
Special types of vision systems may also require three dimensional processing
and analysis as well as motion detection and analysis.
The ultimate goal of computer image understanding is to build systems that equal
or exceed the capabilities of human vision systems. Ideally, a computer vision
system would be capable of interpreting and describing any complex scene in complete
detail. This means that the system must not only be able to identify a myriad of
complex objects, but must also be able to reason about the objects, to describe
their function and purpose, what has taken place in the scene, why any visible or
implied events occurred, what is likely to happen, and what the objects in the
scene are capable of doing.
Figure 14.3 presents an example of a complex scene that humans can interpret
well with little effort. It is the objective of many researchers in computer vision to
build systems capable of interpreting, describing, and reasoning about scenes of
this type in real time. Unfortunately, we are far from achieving this level of compe-
tency. To be sure, some interesting vision systems have been developed, but they
are quite crude compared to the elegant vision systems of humans.
Like natural language understanding, computer vision interpretation is a difficult
problem. The amount of processing and storage required to interpret and describe
a complex scene can be enormous. For example, a single image for a high resolution
aerial photograph may result in some four to nine million pixels (bytes) of information
and require on the average some 10 to 20 computations per pixel. Thus, when
several frames must be stored during processing, as many as 100 megabytes of
storage may be needed, and more than 100 million computations performed.
14.2 IMAGE TRANSFORMATION AND LOW-LEVEL PROCESSING

In this section, we examine the first stages of processing. This includes the process
of forming an image and transforming it to an array of numbers which can then be
operated on by a computer. In this first stage, only local processing is performed
on the numbers to reduce noise and other unwanted picture elements, and to accentuate
object boundaries.
The first step in image processing requires a transformation of light energy to numbers,
the language of computers. To accomplish this, some form of light sensitive transducer
is used such as a vidicon tube or charge-coupled device (CCD).
A vidicon tube is the type of sensor typically found in home or industrial
video systems. A lens is used to project the image onto a flat surface of the vidicon.
The tube surface is coated with a photoconductive material whose resistance is
inversely proportional to the light intensity falling on it. An electron gun is used to
produce a flying-spot scanner with which to rapidly scan the surface left to right
and top to bottom. The scan results in a time varying voltage which is proportional
to the scan spot image intensity. The continuously varying output voltage is then
fed to an analog-to-digital converter (ADC) where the voltage amplitude is periodically
sampled and converted to numbers. A typical ADC unit will produce 30 complete
digitized frames consisting of 256 x 256. or 512 x 512 (or more) samples of an
image per second. Each sample is a number (or triple of numbers in the case of
color systems) ranging from (ito 64 (six bits) or 0 to 255 (eight bits). The image
conversion process is depicted in Figure 14.4.
A CCD is typical of the class of solid state sensor devices known as charge
transfer devices that are now being used in many vision systems. A CCD is a
rectangular chip consisting of an array of capacitive photodetectors, each capable
of storing an electrostatic charge. The charges are scanned like a clock-driven shift
register and converted into a time varying voltage which is proportional to the
incident light intensity on the detectors. This voltage is sampled and converted to
integers using an ADC unit as in the case of the vidicon tube.
The array of numbers produced from the image sensing device may be thought of
as the lowest, most primitive level of abstraction in the vision understanding process.
The next step in the processing hierarchy is to find some structure among the
pixels, such as pixel clusters which define object boundaries or regions within
the image. Thus, it is necessary to transform the array of raw pixel data into regions
of discontinuities and homogeneity, to find edges and other delimiters of these object
regions.
A raw digitized image will contain some noise and distortion; therefore, computations
to reduce these effects may be necessary before locating edges and regions.
Depending on the particular application, low level processing will often require
local smoothing of the array to eliminate this noise. Other low level operations
include threshold processing to help define homogeneous regions, and different forms
of edge detection to define boundaries. We examine some of these low level methods
next.
Thresholding is the process of transforming a gray level representation to a
binary representation of the image. All digitized array values above some threshold
level T are set equal to the maximum gray-level value (black), and values less
than or equal to T are set equal to zero (white). For simplicity, assume gray-level
values have been normalized to range between zero and one, and suppose a threshold
level of T = 0.7 has been chosen. Then all array values g(x,y) > 0.7 are set equal
to 1 and values g(x,y) ≤ 0.7 are set equal to 0. The result is an array of binary 0
and 1 values. An example of an image that has been thresholded at 0.7 to produce
a binary image is illustrated in Figure 14.5.
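The threshold transformation itself can be sketched in a few lines; the gray levels are assumed to be normalized to the range zero to one as in the text, and the small test image is hypothetical.

import numpy as np

def threshold(image, T=0.7):
    # set g(x, y) > T to 1 and g(x, y) <= T to 0, producing a binary image
    return (image > T).astype(np.uint8)

g = np.array([[0.9, 0.8, 0.2, 0.1],      # hypothetical normalized gray levels
              [0.8, 0.9, 0.3, 0.2],
              [0.2, 0.3, 0.6, 0.7],
              [0.1, 0.2, 0.7, 0.8]])
print(threshold(g, 0.7))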
Thresholding is one way to segment the image into sharper object regions by
enhancing some portions and reducing others, like noise and other unwanted features.
Thresholding can also help to simplify subsequent processing steps. And in many
cases, the use of several different threshold levels may be necessary since low
intensity object surfaces will be lost to high threshold levels, and unwanted background
will be picked up and enhanced by low threshold levels. Thresholding at several
levels may be the best way to determine different regions in the image when it is
necessary to compensate for variations in illumination or poor contrast.
Selecting one or more appropriate threshold level settings T will require additional
computations, such as first producing a histogram of the image gray-level
intensities. A histogram gives the frequencies of occurrence of different intensity
(or some other feature) levels within the image. An analysis of a histogram can
reveal where concentrations of different intensity levels occur, where peaks and
broad flat levels occur, and where abrupt differences in level occur.
Figure 14.5  Threshold transformation of an image: (a) gray-level image; (b) binary image.
From this information the best choice of T values is often made apparent. For
example, a histogram with two or more clear separations between intensity levels
that have a relatively high frequency of occurrence will usually suggest the best
threshold levels for object identification and separation. This is seen in Figure 14.6.
Next, we turn to the question of image smoothing. Smoothing is a form of
digital filtering. It is used to reduce noise and other unwanted features and to enhance
certain image features. Smoothing is a form of image transformation that tends to
eliminate spikes and flatten widely fluctuating intensity values. Various forms of
smoothing techniques have been employed, including local averaging, the use of
models, and parametric form fitting.
One common method of smoothing is to replace each pixel in an array with
a weighted average of the pixel and its neighboring values. This can be accomplished
with the use of filter masks which use some configuration of neighboring pixel
values to compute a smoothed replacement value. Two typical masks consist of
either four or eight neighboring pixels whose intensity values are used in the weighting.
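A minimal sketch of mask-based smoothing follows. It replaces each interior pixel with the average of itself and its eight neighbors; the equal 3 × 3 weights are an assumption chosen for simplicity rather than the text's specific mask.

import numpy as np

def smooth(image):
    # replace each interior pixel with the mean of its 3 x 3 neighborhood
    mask = np.ones((3, 3)) / 9.0
    out = image.astype(float).copy()
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            out[i, j] = np.sum(image[i - 1:i + 2, j - 1:j + 2] * mask)
    return out

noisy = np.array([[5, 5, 5, 5],
                  [5, 90, 5, 5],          # a noise spike at (1, 1)
                  [5, 5, 5, 5],
                  [5, 5, 5, 5]], dtype=float)
print(smooth(noisy))                       # the spike is flattened toward its neighbors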
Figure 14.6  Histogram of gray-level intensities, showing frequencies of occurrence and possible threshold levels.
Regions belonging to the same object are usually distinguishable by one or more
features which are relatively homogeneous throughout, such as color, texture, three-dimensional
flow effects, or intensity.
Boundaries which separate adjoining regions represent a discontinuity in one
or more of these features, a fact that can be exploited by measuring the rate of
change of a feature value over the image surface. For example, the rate of change
or gradient in intensity in the horizontal and vertical directions can be measured
with difference functions Dx and Dy defined as

Dx = f(x,y) - f(x - n, y)
Dy = f(x,y) - f(x, y - n)

and the direction of the gradient is given by arctan(Dy/Dx). For n = 1, Dx and Dy
are most easily computed by application of the equivalent weighting masks; the
two-element masks are (-1 1) and its transpose, respectively.
An example of the application of these two masks to an image array is illustrated
in Figure 14.8 where a vertical edge is seen to be quite pronounced. Masks such
as these have been generalized to measure gradients over wider regions covering
several pixels. This has the effect of reducing spurious noise and other sharp spikes.
Two masks deserving particular attention are the Prewitt (1970) and Sobel
(1970) masks, as depicted in Figure 14.9. These masks are used to compute a broader,
normalized gradient than the simple masks given above. We leave the details of
the computations as one of the exercises at the end of this chapter.
We return now to the methods of edge detection which employ smoothing
followed by an application of the gradient. For this, the continuous case is considered
first.
Figure 14.8  (a) Original array; (b) result of applying the Dx and Dy masks.
Convolving the two functions f and g is similar to computing the cross-correlation,
a process that reduces random noise and enhances coherent or structural changes.
One particular form of weighting function g has a symmetric bell shape or
normal form, that is, the Gaussian distribution. The two dimensional form of this
function is given by

g(u,v) = c exp[-(u² + v²)/(2σ²)]

where c is a normalizing constant.
Because of their rotational symmetry, Gaussian filters produce desirable effects.
Prewitt masks:                    Sobel masks:

  Px = | -1  0  1 |                  Sx = | -1  0  1 |
       | -1  0  1 |                       | -2  0  2 |
       | -1  0  1 |                       | -1  0  1 |

  Py = |  1  1  1 |                  Sy = |  1  2  1 |
       |  0  0  0 |                       |  0  0  0 |
       | -1 -1 -1 |                       | -1 -2 -1 |

Figure 14.9  Generalized edge detection masks: Prewitt masks (left) and Sobel masks (right).
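The sketch below applies the Sobel masks of Figure 14.9 to a small hypothetical array to approximate the horizontal and vertical gradients and the overall gradient magnitude.

import numpy as np

SX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)   # Sobel x mask
SY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)   # Sobel y mask

def sobel_gradient(image):
    # gradient magnitude from the Sobel masks, for interior pixels only
    rows, cols = image.shape
    mag = np.zeros((rows, cols))
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = image[i - 1:i + 2, j - 1:j + 2]
            dx = np.sum(window * SX)
            dy = np.sum(window * SY)
            mag[i, j] = np.hypot(dx, dy)
    return mag

img = np.array([[0, 0, 9, 9],             # hypothetical image with a vertical edge
                [0, 0, 9, 9],
                [0, 0, 9, 9],
                [0, 0, 9, 9]], dtype=float)
print(sobel_gradient(img))                 # large responses mark the edge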
Figure 14.10  Image intensity, its gradient, the second-order gradient, and the gradient applied to the convolution.
Applying this transform to an array of intensity values produces an array of
complex numbers that correspond to the spatial frequency components of the image
(sums of sine and cosine terms). The transformed array will contain all of the
information in the original intensity image, but in a form that is more easily used
to identify regions that contain different frequency components. Filtering with the
Fourier transform is accomplished by setting the high (or low) values of u and v
equal to zero. For example, the value F(u,v) = F(0,0) corresponds to the zero
frequency or DC component, and higher values of u and v correspond to the
high frequency components. As with intensity image arrays, thresholding of trans-
formed arrays can be used to separate different frequency components.
The original intensity image, with any modifications, is recovered with the
inverse transform given by

f(x,y) = (1/n) ΣΣ F(u,v) exp[ j2π(xu + yv)/n ],   u, v = 0, 1, ..., n - 1
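The fragment below sketches this kind of frequency-domain filtering with NumPy's FFT routines: the image is transformed, the higher-frequency components are zeroed with a simple low-pass mask (the cutoff used is an arbitrary assumption), and the inverse transform recovers a smoothed image.

import numpy as np

def lowpass_filter(image, keep=4):
    # zero all but the lowest `keep` frequency components in u and v
    F = np.fft.fft2(image)                   # forward two-dimensional transform
    Fs = np.fft.fftshift(F)                  # move the DC term F(0,0) to the center
    n, m = image.shape
    mask = np.zeros((n, m))
    cu, cv = n // 2, m // 2
    mask[cu - keep:cu + keep, cv - keep:cv + keep] = 1.0
    filtered = np.fft.ifftshift(Fs * mask)   # discard the high frequency components
    return np.real(np.fft.ifft2(filtered))   # inverse transform recovers the image

img = np.add.outer(np.arange(16.0), np.arange(16.0))   # a smooth ramp image
img[8, 8] += 50.0                                       # plus one sharp spike
print(np.round(lowpass_filter(img)[6:10, 6:10], 1))     # the spike is smeared out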
As suggested earlier, texture and color are also used to identify regions and boundaries.
Texture is a repeated pattern of elementary shapes occurring on an object's surface.
Texture may appear to be regular and periodic, random, or partially periodic. Figure
14.11 illustrates some examples of textured surfaces.
The structure in texture is usually too fine to be resolved, yet still coarse
enough to cause noticeable variation in the gray levels. Even so, methods of analysis
for texture have been developed. They are commonly based on statistical analyses
of small groups of pixels, the application of pattern matching, the use of Fourier
transforms, or modeling with special functions known as fractals. These methods
are beyond the scope of our goals in this chapter.
The use of color to identify and interpret regions requires more than three
times as much processing as gray-level processing. First, the image must be separated
into its three primary colors with red, green, and blue filters (Figure 14. 12).
The separate color images must then be processed by sampling the intensities
and producing three arrays or a single array of tristimulus values. The arrays are
then processed separately (in some cases jointly) to determine common color regions
and corresponding boundaries. The processes used to find boundaries and regions,
and to interpret color images, are similar to those of gray-level systems.
Although the additional computation required in color analysis can be significant,
the added information gained from separate color intensity arrays may be warranted,
depending on the application. In complex scene analysis, color may be the most
effective method of segmentation and object identification. In Section 14.6 we describe
Figure 14.12  Color separation and processing with red, green, and blue filters.
an interesting color scene analyzer which is based on a rule-based inferencing system
(Ohta, 1985).
A stereoscopic vision system requires two displaced sensors to obtain two views of
objects from different perspectives. The differences between the views makes it
possible to estimate distances and derive a three-dimensional model of a scene.
The displacement of a pixel from one image to a different location in another image
is known as the disparity. It is the disparity between the two views that permits the
estimation of the distance to objects in the scene. The human vision system is
somehow able to relate the two different images and form a correspondence that
translates to a three-dimensional interpretation. Figure 14.13 illustrates the geometric
relationships used to estimate distances to objects in stereoscopic systems.
The distance k from the lens to the object can be estimated from the relationships
that hold between the sides of the similar triangles. Using the relations i1/e1 = f/k,
i2/e2 = f/k, and d = e1 + e2, we can write

k = fd/(i1 + i2)

Since f and d are relatively constant, the distance k is a function of the disparity,
or sum of the distances i1 and i2.
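As a quick numeric check of the relation k = fd/(i1 + i2), the following sketch uses hypothetical values for the focal length, sensor separation, and image displacements.

def stereo_distance(f, d, i1, i2):
    # k = f d / (i1 + i2): distance from focal length f, sensor separation d,
    # and the two image displacements i1 and i2 (their sum is the disparity)
    return f * d / (i1 + i2)

# hypothetical values: f = 50 mm, d = 120 mm, displacements 2 mm and 1 mm
print(stereo_distance(50.0, 120.0, 2.0, 1.0), "mm")   # 2000 mm, i.e. two meters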
In computer vision systems, determining the required correspondence between
the two displaced images is perhaps the most difficult part in determining the disparity.
Figure 14.13  Disparity in stereoscopic systems: f is the focal length of the lens, k is the distance to the object P, and i1 and i2 are the two image displacements.
Corresponding pixel groupings in the two images must be located to determine the
disparity from which the distance can be estimated. In practice, methods based on
correlation, gray-level matching, template matching, and edge contour comparisons
have been used to estimate the disparity between stereo images.
Optic flow is an alternative approach to three-dimensional scene analysis which
is based on the relative motion of a sensor and objects in the scene. If a sensor is
moving (or objects are moving past a sensor), the apparent continuous flow of the
objects relative to the sensor is known as optical flow. Distances can be estimated
from the change in flow or relative velocity of the sensor and the objects. For
example, in Figure 14.14, if the velocity of the sensor is constant, the change in
distance dx between points x1 and x2 is proportional to the change in size of the
power lines h, through the relation

dx/dt = k (dh/dt)
14.3 INTERMEDIATE-LEVEL IMAGE PROCESSING

The next major level of analysis builds on the low-level or early processing steps
described above. It concentrates on segmenting the image space into larger global
structures, using homogeneous features in pixel regions and boundaries formed from
pieces of edges discovered during the low-level processing. This level requires that
pieces of edges be combined into contiguous contours which form the outline of
objects, partitioning the image into coherent regions, developing models of the
segmented objects, and then assigning labels which characterize the object regions.
One way to begin defining a set of objects is to draw a silhouette or sketch
of their outlines. Such a sketch has been called the raw primal sketch by Marr
(1982). It requires connecting up pieces of edges which have a high likelihood of
forming a continuous boundary. For example, the problem is to decide whether
two edge pieces such as
(edge (location 21 103)        (edge (location 18 98)
      (intensity 0.8)                (intensity 0.6)
      (direction 46))                (direction 41))

should be connected. This general process of forming contours from pieces of edges
is called segmentation.
Graphical methods can be used to link up pieces of edges One approach is to use
a minimum spanning tree (MST). Starting at any cluster of pixels known to be
part of an edge, this method performs a search in the neighborhood of the cluster
for groupings with similar feature values. Each such grouping corresponds to a
node in an edge tree. When a number of such nodes have been found they are
Connected using the MST algorithm.
An MST is found by connecting an arc between the first node selected and
its closest neighbor node and labeling the two nodes accordingly. Neighborhoods
of both connected nodes are then searched. Any node found closest to either of the
two connected nodes (below some threshold distance) is then used to form the
next branch in the tree. A second arc is constructed between the newly found node
and the closest connected node, again labeling the new node. This process is repeated
until all nodes having arc distances less than some value (such as a function of the
average arc distances) have been connected. An example of an MST is given in
Figure 14.15.
Another graphical approach is based on the assignment of a cost or other
measure of merit to pixel groupings. The cost assignment can be based on a simple
function of features such as intensity, orientation, or color. A best-first (branch-and-bound)
or other form of graph search is then performed using some heuristic
function to determine a least-cost path which represents the edge contour.
Other edge finding approaches are based on fitting a low degree polynomial
to a number of edge pieces which have been found through local searches. The
resultant polynomial curve segment is then taken as the edge boundary. This approach
is similar to one which compares edge templates to short groupings of pieces. If a
particular matching template scores above some threshold, the template pattern is
then used to define the contour.
The minimum cost of a path from a node s at stage n to the terminal stage can
then be written recursively as

Cn(s) = min_t [ Kn(s,t) + Cn+1(t) ]

where Kn(s,t) is the cost at stage n, and Cn+1(t) is the minimum cost for stages
n + 1 to the terminal stage.
The computation process is best understood through an example. Consider
the following 5 × 5 array of pixel cost values.

9 7 6 5 1
3 7 2 7 1
4 1 5 2 7
6 6 3 7 7
8 7 2 2 3
Suppose we wish to find the optimal cost path from the lower left to the upper
right corner of the array. We could work from either direction, but we arbitrarily
choose to work forward from the lower left pixel with cost value 8. We first set
all values except 8 equal to some very large number, say M, and compute the
minimum cost of moving from the position with the 8 to all other pixels in the
bottom row by adding the cost of moving from pixel to neighboring pixel. This
results in the following cost array.
 M  M  M  M  M
 M  M  M  M  M
 M  M  M  M  M
 M  M  M  M  M
 8 15 17 19 22
Next, we compute the minimum neighbor path cost for the next to the last row to
obtain
 M  M  M  M  M
 M  M  M  M  M
 M  M  M  M  M
14 14 17 24 29
 8 15 17 19 22
Note that the minimum cost path to the second, third and fourth positions in this
row is the diagonal path (position 5,1 to 4,2) followed by a horizontal right traversal
in the same row, whereas the minimum cost path for the last position in this row
is the path passing through the rightmost position of the bottom row. The remaining
minimum path costs are computed in a similar fashion, row by row, to obtain the
final cost array.
27 24 23 22 21
18 22 17 24 20
18 15 19 19 26
14 14 17 24 29
8 15 17 19 22
From this final minimum cost array, the least cost path is easily found to be the
one passing through the entries 8, 14, 17, 19, 20, and 21 (marked with asterisks):

27  24  23  22  21*
18  22  17  24  20*
18  15  19  19* 26
14  14* 17* 24  29
 8* 15  17  19  22
Rather than defining regions with edges, it is possible to build them. For example,
global structures can be constructed from groups of pixels by locating, connecting,
and defining regions having homogeneous features such as color, texture, or intensity.
The resulting segmented regions are expected to correspond to surfaces of objects
in the real world. Such coherent regions do not always correspond to meaningful
regions, but they do offer another viable approach to the segmentation of an image.
When these methods are combined with other segmentation techniques, the confidence
level that the regions represent meaningful objects will be high.
Once an image has been segmented into disjointed object areas, the areas
can be labeled with their properties and their relationships to other objects, and
then identified through model matching or description satisfaction.
Region segmentation may be accomplished by region splitting, by region grow-
ing (also - called region merging), or by a combination of the two. When splitting
is used, the process proceeds in a top-down manner. The image is split successively
into smaller and smaller homogeneous pieces until some criteria are satisfied. When
growing regions, the process proceeds in a bottom-up fashion. Individual pixels or
small groups of pixels are successively merged into contiguous, homogeneous areas.
A combined splitting-growing approach will use both bottom-up and top-down tech-
niques.
Regions are usually assumed to be disjointed entities which partition the image
such that (1) a given pixel can appear in a single region only, (2) subregions are
composed of connected pixels, (3) different regions are disjoint areas, and (4) the
complete image area is given by the union of all regions. Regions are usually
defined by some homogeneous property such that all pixels belonging to the region
satisfy the property, and pixels not satisfying the property lie in a different region.
Note that a region need not consist of contiguous pixels only since some objects
may be split or covered by occluding surfaces. Condition 2 is needed to insure
that all regions are accounted for and that they fill up the complete image area.
In region splitting, the process begins with an entire image which is successively
divided into smaller regions which exhibit some coherence in features. One effective
method is to apply multiple thresholding levels which can isolate regions having
homogeneous features. Histograms are first obtained to establish the threshold levels.
This may require masking portions of the image to achieve effective separation of
complex objects. Each threshold level can then produce a binary image consisting
of all of the objects which exceed the thresholded level. Once the binary regions
are formed, they are easily delineated, separated, and marked for subsequent process-
ing. This whole process of masking, computing, and analyzing a histogram, thresholding,
defining an area, masking, and so on can be performed in a recursive manner.
The process terminates when the masks produce monomodal histograms with the
image fully partitioned.
Segmentation techniques based on region growing start with small atomic
regions (one or a few pixels) and build coherent pixel regions in a bottom-up fashion:
Local features such as the intensity of a group of pixels relative to the average
intensity of neighboring pixels are used as criteria for the merging operation. A
low level of contrast between contiguous groups gives rise to the merging of areas,
while a higher level of contrast, such as found at boundaries, provides the criteria
for region segregation.
Split-and-merge techniques attempt to gain the advantages of both methods.
They combine top-down and bottom-up processing using both region splitting and
merging until some split-merge criterion no longer exists. At each step in the process,
split and merge threshold values can be compared and the appropriate operation
performed. In this way, over-splitting and under-merging can be avoided.
We continue in this section with further intermediate-level processing steps all aimed
at building higher levels of abstraction. The processing steps here are related to
describing and labeling the regions.
Once the image has been segmented into disjointed regions, their shapes,
spatial interrelationships, and other characteristics can be described and labeled for
subsequent interpretation. This process requires that the outlines or boundaries, ver-
tices, and surfaces of the objects be described in some way. It should be noted,
however, that a description for a region can be based on a two- or three-dimensional
image interpretation. Initially, we focus on the two-dimensional interpretation.
Typically, a region description will include attributes related to size, shape,
and general appearance. For example, some or all of the following features might
be included.
Region area
Contour length (perimeter) and orientation
Location of the center of mass
Minimum bounding rectangle
Compactness (area divided by perimeter squared)
Fitted scatter matrix of pixels
Number and characteristics of holes or internal occlusions
Degree and type of texture
Average intensity (or average intensities of base colors)
Type of boundary segments (sharp, fuzzy, and so on) and their location
Boundary contrast
Chain code (described below)
Shape classification number (task specific)
Position and types of vertices (number of adjoining segments)
Describing Boundaries
Figure 14.17  Curve fitting with linear segments.
1. Starting with the two end points of the boundary curve, construct a straight
line between the points.
2. At successive intervals along the curve, compute the perpendicular distance
to the constructed line. If the maximum distance is within some specified
limit, stop and use the segmented line as an approximation to the boundary.
3. Otherwise, choose the point on the curve at which the largest distance occurs
and use this as a breakpoint with which to construct two new line segments
which connect to the two endpoints. Continue the process recursively with
each subcurve until the stopping condition of Step 2 is satisfied (a code sketch
of this procedure follows).
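The sketch below is a minimal implementation of these three steps (essentially the iterative endpoint fit); the boundary points and the distance limit are hypothetical assumptions.

import math

def point_line_distance(p, a, b):
    # perpendicular distance from point p to the line through a and b
    (ax, ay), (bx, by), (px, py) = a, b, p
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / length

def fit_boundary(points, limit):
    # steps 1-3: split the curve at the farthest point until within the limit
    if len(points) <= 2:
        return list(points)
    dists = [point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    if max(dists) <= limit:
        return [points[0], points[-1]]
    k = dists.index(max(dists)) + 1
    left = fit_boundary(points[:k + 1], limit)
    right = fit_boundary(points[k:], limit)
    return left[:-1] + right                       # drop the duplicated breakpoint

curve = [(0, 0), (1, 0.1), (2, 0.2), (3, 2.0), (4, 0.2), (5, 0.1), (6, 0)]
print(fit_boundary(curve, limit=0.5))              # keeps the corner at (3, 2.0)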
Chain Codes
Some other descriptive features include the area, intensity, orientation, center of
mass, and bounding rectangle. These descriptions are determined in the following
way.

1. The area of a region can be given by a count of the number of pixels contained
in the region.
2. The average region intensity is just the average gray-level intensity taken over
all pixels in the region. If color is used in the image, the average is given as
the three base color intensity averages.
3. The center of mass M for a region can be computed as the average x-y vector
position (denoted as Pi) of the region's n pixels, that is,

M = (1/n) Σ Pi
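These three measures can be computed directly from a labeled region, as in the following sketch; the small gray-level image and region mask are hypothetical.

import numpy as np

def region_descriptors(gray, mask):
    # area, average intensity, and center of mass M = (1/n) * sum(Pi)
    ys, xs = np.nonzero(mask)
    n = len(xs)                                        # 1. area: number of pixels
    avg = float(gray[ys, xs].mean())                   # 2. average gray-level intensity
    center = (float(xs.mean()), float(ys.mean()))      # 3. center of mass (x, y)
    return n, avg, center

gray = np.array([[10, 10, 10, 10],
                 [10, 50, 60, 10],
                 [10, 70, 80, 10],
                 [10, 10, 10, 10]], dtype=float)
mask = gray > 40                                       # a hypothetical 2 x 2 region
print(region_descriptors(gray, mask))                  # (4, 65.0, (1.5, 1.5))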
Three-Dimensional Descriptions
possible. Once a match was obtained and all objects identified, the program demon-
strated its "understanding" of the scene by producing a graphic display of it on a
monitor screen.
Guzman wrote a program called SEE which examined how surfaces from the
same object were linked together. The geometric relationships between different
types of line junctions (vertices) helped to determine the object types. Guzman
identified eight commonly occurring edge junctions for his three-dimensional blocks
world objects. The junctions were used by heuristic rules in his program to classify
the different objects by type (Figure 14.19).
Huffman and Clowes, working independently, extended this work by developing
a line labeling scheme which systematized the classification of polyhedral objects.
Their scheme was used to classify edges as either concave, convex, or occluding.
Concave edges are produced by two adjacent touching surfaces which produce a
concave (less than 180°) depth change. Conversely, convex edges produce a convexly
viewed depth change (greater than 180°), and an occluding edge outlines a surface
that obstructs other objects.
To label a concave edge, a minus sign is used. Convex edges are labeled
with a plus sign, and a right or left arrow is used to label the occluding or boundary
edges. By restricting vertices to be the intersection of three object faces (trihedral
vertices), it is possible to reduce the number of basic vertex types to only four: the
L, the T, the Fork, and the Arrow (Figure 14.20). Different label combinations
assigned to these four types then assist in the classification and identification of
objects.
When a three-dimensional object is viewed from all possible positions. the
four junction types, together with the valid edge labels, give rise to eighteen different
permissible junction configurations as depicted in Figure 14.20. From a dictionary
of these valid junction types, a program can classify objects by the sequence of
bounding vertices which describe it. Impossible object configurations such as the
one illustrated in Figure 14.21 can also be detected.
Geometric constraints, together with a consistent labeling scheme, can greatly
simplify the object identification process. A set of labeling rules which greatly
facilitates this process can be developed for different classes of objects. For example,
using the labels described above, the following rules will apply for many polyhedral
Figure 14.19 The eight junction types: the L, the T, the fork, the X, the arrow, the psi, the peak, and the multi.

Figure 14.20 Valid junction labels for three-dimensional shapes (the permissible labelings of the L, fork, T, and arrow junction types).
objects: (1) the arrow should be directed to mark boundaries by traversing the object
in a clockwise direction (the object face appears on the right of the arrow), (2)
unbroken lines should have the same label assigned at both ends, (3) when a fork
is labeled with a + edge, it must have all three edges labeled as +, and (4) arrow
junctions which have a - label on both barb edges must also have a + label on
the shaft.
These rules can be applied to a polygonal object as illustrated in Figure 14.22.
Starting with any edge having an object face on its right, the external boundary is
labeled with arrows in a clockwise direction. Interior lines are then labeled with +
or - consistent with the other labeling rules.
Continuing with this early work, David Waltz developed a method of vertex constraint
propagation which establishes the permissible types of vertices that can be associated
with a certain class of objects. He broadened the class of images that could be
analyzed by relaxing lighting conditions and extending the labeling vocabulary to
accommodate shadows, some multiline junctions, and other types of interior lines.
His constraint satisfaction algorithm was one of his most important contributions.
To see how this procedure works, consider the image drawing of a pyramid
as illustrated in Figure 14.23. At the right side of the pyramid are all possible
labelings for the four junctions A, B, C, and D.
Using these labels as mutual constraints on connected junctions, permissible
labels for the whole pyramid can be determined. The constraint satisfaction procedure
works as follows:
Consequently, two of the possible labelings can be eliminated, with the remaining
four being
This reduction, in turn, places a new restriction on BC, permitting the elimination
of one C label, since BC must now be labeled as a + only. This leaves the remaining
C labels as
Continuing with the above procedure, it will be found that further label eliminations
are not possible since all constraints have been satisfied. The above process
is completed by finding the different combinations of unique labelings that can be
assigned to the figure. This can be accomplished through a tree search process. A
simple enumeration of the remaining labels shows that it is possible to find only
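The heart of Waltz's procedure is repeated local filtering: a labeling is kept at a junction only if some labeling at each neighboring junction agrees with it on their shared edge. A minimal Python sketch of that filtering loop follows; the data layout (sets of candidate labelings per junction, a shared-edge list, and a caller-supplied compatibility test) is an assumption of this illustration, not Waltz's original representation.

    def waltz_filter(candidates, edges, compatible):
        """candidates: {junction: set of candidate labelings}.
        edges: list of (j1, j2, e1, e2), meaning edge slot e1 of junction j1 is the
        same physical edge as slot e2 of junction j2.
        compatible(label_a, slot_a, label_b, slot_b) tests agreement on that edge."""
        changed = True
        while changed:
            changed = False
            for j1, j2, e1, e2 in edges:
                # Filter each endpoint against the other until nothing changes.
                for a, b, ea, eb in ((j1, j2, e1, e2), (j2, j1, e2, e1)):
                    keep = {la for la in candidates[a]
                            if any(compatible(la, ea, lb, eb) for lb in candidates[b])}
                    if keep != candidates[a]:
                        candidates[a] = keep
                        changed = True
        return candidates

The surviving candidate sets are then combined, by a tree search, into the complete consistent labelings of the figure.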
Template Matching
14.5 HIGH-LEVEL PROCESSING
Before proceeding with a discussion of the final (high-level) steps in vision processing,
we shall briefly review the processing stages up to this point. We began with an
image of gray-level or tristimulus color intensity values and digitized this image to
obtain an array of numerical pixel values. Next, we used masks or some other
transform (such as Fourier) to perform smoothing and edge enhancement operations
to reduce the effects of noise and other unwanted features. This was followed by
edge detection to outline and segment the image into coherent regions. The product
of this step is a primal sketch of the objects. Region splitting and/or merging, the
dual of edge finding, can also be used separately or jointly with edge finding as
part of the segmentation process.
Histogram computations of intensity values and subsequent analyses were an
important part of the segmentation process. They help to establish threshold levels
which serve as cues for object separation. Other techniques such as minimum spanning
tree or dynamic programming are sometimes used in these early processing stages
to aid in edge finding.
Following the segmentation process, regions are analyzed and labeled with
their characteristic features. The results of these final steps in intermediate-level
processing are a set of region descriptions (data structures). Such structures are used
as the input to the final high-level image processing stage. A summary of the data
structures produced from the lowest processing stage to the final interpretation stage
then can be depicted as follows.
Scene
  ↑
Objects
  ↑
Regions
  ↑
Edges or subregions
  ↑
Pixels
David Marr and his colleagues (1982, 1980, and 1978) proposed a theory of vision
which emphasized the importance of the representational scheme used at each stage
of the processing. His proposal was based on the assumption that processing would
be carried out in several steps similar to the summary description given above.
The steps and the corresponding representations are summarized as follows.
High-Level Processing
High-level processing techniques are less mechanical than either of the preceding
image processing levels. They are more closely related to classical AI symbolic
methods. In the high-level processing stage, the intermediate-level region descriptions
are transformed into high-level scene descriptions in one of the knowledge representation
formalisms described earlier in Part II (associative nets, frames, FOPL statements,
and so on; see Figure 14.24).
The end objective of this stage is to create high-level knowledge structures
which can be used by an inference program. Needless to say, the resulting structures
should uniquely and accurately describe the important objects in an image including
their interrelationships. In this regard, the particular vision application will dictate
the appropriate level of detail, and what is considered to be important in a scene
description.
There are various approaches to the scene description problem. At one extreme,
it will be sufficient to simply apply pattern recognition methods to classify certain
objects within a scene. This approach may require no more than application of the
methods described in the preceding chapter. At the other extreme, it may be desirable
to produce a detailed description of some general scene and provide an interpretation
of the function, purpose, intent, and expectations of the objects in the scene. Although
this requirement is beyond the current state-of-the-art, we can say that it will require
a great many prestored pattern descriptions and much general world knowledge. It
will also require improvements on many of the processing techniques described in
this chapter.
(region6
  (mass-center 2348)
  (shape-code 24)
  (area 245)
  (number-boundary-segments 6)
  (chain-code 1133300011 ...)
  (orientation 85)
  (borders (region4 (position left-of) (contrast 5))
           (region7 (position above) (contrast 2)))
  (mean-intensity 0.6)
  (texture light regular))
(Figure: a portion of a hierarchical scene description network linking the scene to a building and its regions, with attribute links such as color, texture, and boundary type.)
matches rather than absolute ones. Rule conclusions will be rated by likelihood or
certainty factors instead of complete certainty. Identification of objects can then be
made on the basis of a likelihood score. In Figure 14.26(a), pairs of numbers are
given in the antecedent to suggest acceptable condition levels comparable to Dempster-
Shafer probabilities (the values in the figure are arbitrarily chosen with a scale of 0
to 1.0).
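One simple realization of such interval-valued conditions is sketched below. The attribute names, intervals, and multiplicative scoring are illustrative assumptions, not the exact scheme used in Figure 14.26.

    def in_range(value, low, high):
        """1.0 if the measured attribute falls inside the acceptable interval, else 0."""
        return 1.0 if low <= value <= high else 0.0

    def score_rule(region, conditions):
        """conditions: list of (attribute, (low, high)) pairs; the rule's likelihood
        is taken here as the product of its condition scores."""
        score = 1.0
        for attribute, (low, high) in conditions:
            score *= in_range(region[attribute], low, high)
        return score

    # A hypothetical "sky" rule scored against one region description.
    region = {"intensity": 0.7, "texture": 0.9, "boundary-linearity": 0.5}
    sky_rule = [("intensity", (0.4, 0.8)),
                ("texture", (0.8, 1.0)),
                ("boundary-linearity", (0.4, 0.7))]
    print(score_rule(region, sky_rule))   # 1.0, so the region is a plausible sky candidate

The region label with the highest accumulated score is then taken as the identification.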
When rule-based identification is used, the vision system may be given an
initial goal of identifying each region. This can be accomplished with a high-level
goal statement of the following type.
(label region
  (or (rgn = building)
      (rgn = bushes)
      (rgn = car)
      (rgn = house)
      (rgn = road)
      (rgn = shadow)
      (rgn = tree)))
Other forms of matching may also be used in the interpretation process. For
example, a decision tree may be used in which region attributes and relation values
determine the branch taken at each node when descending the tree. The leaves of
the decision tree are labeled with the object identities as in Figure 14.27.
(R10-sky
  (and (location upper rgn)
       (intensity rgn bright (0.4 0.8))
       (color rgn (or (blue grey)) (0.7 1.0))
       (texture rgn low (0.8 1.0))
       (linear-boundary rgn rgn2 (0.4 0.7))))
Figure 14.27 A decision tree for region identification. Interior nodes test region attributes such as size (large, medium, small) and texture; the leaves carry object identities such as sky, tree, road, car, building, sidewalk, and lawn.
Objects, with their attributes and relations, are then used to construct an associative
net scene description, a frame network, or other structure.
14.6 VISION SYSTEM ARCHITECTURES

In this section we present two vision systems which are somewhat representative
of complete system architectures. The first system is a model-based system, one of
the earliest successful vision systems. The second is a color region analyzer recently
developed at the University of Kyoto, Japan.
The user descriptions are parsed and transformed by the system into geometric
and algebraic network representations. These representations provide volumetric descriptions
in local coordinate systems. A graphic presentation, the system's interpretation
of the input models created by the user, provides feedback to the user during
the modeling process. The completed representations are used by the system to
predict what features (e.g., shape, orientation, and position) of the modeled objects
can be observed from the input image components. The predicted models are stored
as prediction graphs.
The visual input consists of gray-level image processing arrays, a line finder,
and an edge linker. This part of the system provides descriptions of objects as
defined by segmented edge structures. The descriptions created from this unit are
represented as observation graphs. One output from the predictor serves as an input
(Figure: block diagram of the system. User descriptions flow through the parser to the predictor; the predictor drives the edge mapping module and, together with the geometry, algebra, and graphics components, feeds the interpreter.)
to the edge mapping and linking module. This unit uses the predicted information
(predicted edges, ribbons, or ellipses in the modeled objects) to assist in finding
and identifying image objects appearing in the input image. Outputs from both the
predictor and the edge mapper and linker serve as inputs to the interpreter. The
interpreter is essentially a graph matcher. It tries to find the most matches among
subgraphs of the image observation graph and the prediction graph. Each match
becomes an interpretation graph. Partial matching is accommodated in the interpretation
process through consistency checks.
The basic interpretation process is summarized in Figure 14.29, where models
are given for two wide-bodied aircraft (a Boeing 747 and a Lockheed L-1011),
and the interpretation of an aircraft from gray-level image to ACRONYM's interpretation
is shown.
Figure 14.29 (a)-(d) Interpretation of a wide-bodied aircraft from a gray-level image.

(Figure: organization of the color region analyzer: preliminary segmentation, a bottom-up process, a structured data network, a plan, a top-down process, and a production system.)
between objects. The rules also have weights which indicate the level of uncertainty
of the knowledge. Each rule in the top-down set is a condition-action pair, where
the condition is a fuzzy predicate which examines the situation of the data base.
The action part includes operations to construct the scene description. An agenda
manages the activation of production rules and schedules the executable actions.
A typical property rule and a relation rule are described below.
The first rule is a property rule about the color of the sky (blue or gray). The
second rule is a relation rule about the boundary between a building and the sky.
The boundary between the two has a lot of linear parts, and the building is not on
the upper side of that boundary.
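The rules themselves are not reproduced here, so the following is only a hypothetical rendering of the two rules as just described: a weighted property rule for sky color and a weighted relation rule about the building/sky boundary. The predicate names, the weights, and the data-base interface are all invented for this sketch.

    # Each rule is (weight, condition over the data base, action on the scene description).
    rules = [
        # Property rule: a region whose color is blue or gray supports the label "sky".
        (0.8,
         lambda db, rgn: db.color(rgn) in ("blue", "gray"),
         lambda scene, rgn: scene.support(rgn, "sky", 0.8)),

        # Relation rule: a region whose shared boundary with a sky region is largely
        # linear, and which is not on the upper side of that boundary, supports "building".
        (0.7,
         lambda db, rgn: db.linear_boundary_fraction(rgn, "sky") > 0.6
                         and not db.on_upper_side_of_boundary(rgn, "sky"),
         lambda scene, rgn: scene.support(rgn, "building", 0.7)),
    ]

An agenda would evaluate the fuzzy conditions against the structured data network and schedule the actions of the best-rated rules.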
The final product of the analyzer is, of course, a description of the scene.
This is constructed as a hierarchical network as illustrated in Figure 14.31.
Ohta's system has demonstrated that it can deal with fairly complex scenes,
including objects with substructures. To validate this claim, a number of outdoor
scenes from the Kyoto University campus were analyzed correctly by the system.
Figure 14.31 The hierarchical scene description: scene, object, region, subregion, patch, pixel.
14.7 SUMMARY
EXERCISES
14.2. Describe the types of world knowledge a vision system must have to "comprehend"
the scene portrayed in Figure 14.3.
14.3. Suppose the CPU in a vision system takes 200 nanoseconds to perform memory/
register transfers and 500 nanoseconds to perform basic arithmetic operations. Esti-
mate the time required to produce a binary image for a system with a resolution of
256 x 256 pixels.
14.4. How much memory is required to produce and compare five different binary images,
each with a different threshold level? Assume a system resolution of 512 x 512.
Can the binary images be compressed in some way to reduce memory requirements?
14.5. Find the binary image for the array given below when the threshold is set at 35.
23 132 35
36 30 42 38
2 9 34 36
37 36 35 33
14.6. Given the following histogram, what are the most likely threshold points? Explain
why you chose the given points and rejected others.
(Figure: gray-level histogram.)
14.7. What is the value of the smoothed pixel for the associated mask?

MASK                  PIXELS
      3/16            7  8  9
3/16  1/4  3/16       5  4  6
      3/16            4  6  2
14.8. Compare the effects of the eight- and four-neighbor filters described in Section 14.2
when applied to the following array of pixel gray-level values.
5  8  8 10 12 29 32 30
4  7  8  9 10  9 30 29
5  8  7  5 11 33 31 34
6  9  8 10 34 3' 29 33
6  8  9 32 30 29  5  6
8  7 31 32 32 28  6  7
7  8 33 33 29  7  8  7
9 30 32 31 28  8  8  9
14.9. Low noise systems should use little or no filtering to avoid unnecessary blurring.
This means that more weight should be given to the pixel being smoothed. Define
two low-noise filters, one a four-neighbor and one an eight-neighbor filter, and compare
their effects on the array of Problem 14.5.
14.10. Using a value of n = 1, apply the difference operators Dx and Dy (horizontally and
vertically) to the array of Problem 14.5 and comment on the trace of any apparent edges.
14.11. Apply the vector gradient to the array of Problem 14.5 and compare the results to
those of Problem 14.7.
14.12. This problem relates to the application of template matching using correlation tech-
niques. The objective is to try to match an unknown two-dimensional curve or wave-
form with a known waveform. Assume that both waveforms are discrete and are repre-
sented as arrays of unsigned numbers. Write a program in any suitable language to
match the unknown waveform to the known waveform using the correlation function
given as
Ci = <X, Zi> / (||X|| ||Zi||)

where X is the unknown pattern vector, Zi is the known pattern vector at position i,
<X, Zi> denotes the inner product of X and Zi, and ||X|| is the norm of X,

||X|| = <X, X>^(1/2)
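A sketch of one possible solution in Python follows; sliding the unknown pattern along the known waveform and reporting the best-scoring position is an assumed search strategy, not part of the problem statement.

    import math

    def norm(v):
        return math.sqrt(sum(a * a for a in v))

    def correlation(x, z):
        """C = <X, Z> / (||X|| ||Z||); both vectors must have the same length."""
        inner = sum(a * b for a, b in zip(x, z))
        return inner / (norm(x) * norm(z))

    def best_match(unknown, known):
        """Slide the unknown waveform across the known one and return the position
        with the highest normalized correlation, together with that score."""
        n = len(unknown)
        scores = [correlation(unknown, known[i:i + n])
                  for i in range(len(known) - n + 1)]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores[best]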
14.13 Write a program to apply the Sobel edge detection mask to an array consisting of
256 x 256 pixel gray level values.
14.14 Color and texture are both potentially useful in defining regions. Describe an algorithm
that could be used to determine regions that are homogeneous in color.
14.15 Referring to Problem 14.14, develop an algorithm that can be used to define regions
that are homogeneous in texture.
14.16 Referring to the two previous problems, develop an algorithm that determines regions
on the basis of homogeneity in both color and texture.
15
Expert Systems
Architectures
This chapter describes the basic architectures of knowledge-based systems with empha-
sis placed on expert systems. Expert systems are a recent product of artificial intelli-
gence. They began to emerge as university research systems during the early 1970s.
They have now become one of the more important innovations of Al since they
have been shown to be successful commercial products as well as interesting research
tools.
Expert systems have proven to be effective in a number of problem domains
which normally require the kind of intelligence possessed by a human expert. The
areas of application are almost endless. Wherever human expertise is needed to
solve problems. expert systems are likely candidates for application. Application
domains include law, chemistry, biology, engineering, manufacturing, aerospace,
military operations, finance, banking, meteorology, geology, geophysics, and more.
The list goes on and on.
In this chapter we explore expert system architectures and related building
tools. We also look at a few of the more important application areas as well. The
material is intended to acquaint the reader with the basic concepts underlying expert
systems and to provide enough of the fundamentals needed to build basic systems
or pursue further studies and conduct research in the area.
15.1 INTRODUCTION
Expert systems differ from conventional computer systems in several important ways.
1. Expert systems use knowledge rather than data to control the solution process.
"In the knowledge lies the power" is a theme repeatedly followed and supported
throughout this book. Much of the knowledge used is heuristic in nature rather
than algorithmic.
2. The knowledge is encoded and maintained as an entity separate from the
control program. As such, it is not compiled together with the control program
itself. This permits the incremental addition and modification (refinement) of the
knowledge base without recompilation of the control programs. Furthermore, it is
possible in some cases to use different knowledge bases with the same control
programs to produce different types of expert systems. Such systems are known as
expert system shells since they may be loaded with different knowledge bases.
3. Expert systems are capable of explaining how a particular conclusion was
reached, and why requested information is needed during a consultation. This is
important as it gives the user a chance to assess and understand the system's reasoning
ability, thereby improving the user's confidence in the system.
Background History
Expert systems first emerged from the research laboratories of a few leading U.S.
universities during the 1960s and 1970s. They were developed as specialized problem
solvers which emphasized the use of knowledge rather than algorithms and general
search methods. This approach marked a significant departure from conventional
Al systems architectures at the time. The accepted direction of researchers then
was to use Al systems that employed general problem solving techniques such as
hill-climbing or means-end analysis (Chapter 9) rather than specialized domain knowl-
edge and heuristics. This departure from the norm proved to be a wise choice. It
led to the development of a new class of successful systems and special system
designs.
The first expert system to be completed was DENDRAL, developed at Stanford
University in the late 1960s. This system was capable of determining the structure
of chemical compounds given a specification of the compound's constituent elements
and mass spectrometry data obtained from samples of the compound. DENDRAL
used heuristic knowledge obtained from experienced chemists to help constrain the
problem and thereby reduce the search space. During tests, DENDRAL discovered
a number of structures previously unknown to expert chemists.
As researchers gained more experience with DENDRAL, they found how
difficult it was to elicit expert knowledge from experts. This led to the development
of Meta-DENDRAL, a learning component for DENDRAL which was able to learn
rules from positive examples, a form of inductive learning described later in detail
(Chapters 18 and 19).
Shortly after DENDRAL was completed, the development of MYCIN began
at Stanford University. MYCIN is an expert system which diagnoses infectious
blood diseases and determines a recommended list of therapies for the patient. As
part of the Heuristic Programming Project at Stanford, several projects directly
related to MYCIN were also completed including a knowledge acquisition component
called TEIRESIAS, a tutorial component called GUIDON, and a shell component
called EMYCIN (for Essential MYCIN). EMYCIN was used to build other diagnostic
systems including PUFF, a diagnostic expert for pulmonary diseases. EMYCIN
also became the design model for several commercial expert system building tools.
MYCIN's performance improved significantly over a several year period as
additional knowledge was added. Tests indicate that MYCIN's performance now
equals or exceeds that of experienced physicians. The initial MYCIN knowledge
base contained only about 200 rules. This number was gradually increased to more
than 600 rules by the early 1980s. The added rules significantly improved MYCIN's
performance, leading to a 65% success record which compared favorably with experienced
physicians who demonstrated only an average 60% success rate (Lenat, 1984).
(An example of MYCIN's rules is given in Section 4.9, and the treatment of uncertain
knowledge by MYCIN is described in Section 6.5.)
Other early expert system projects included PROSPECTOR, a system that
assists geologists in the discovery of mineral deposits, and R1 (aka XCON), a
system used by the Digital Equipment Corporation to select and configure components
of complex computer systems. Since the introduction of these early expert systems,
numerous commercial and military versions have been completed with a high degree
of success. Some of these application areas are itemized below.
Applications
Since the introduction of these early expert systems, the range and depth of applications
has broadened dramatically. Applications can now be found in almost all areas of
business and government. They include such areas as
The value of expert systems was well established by the early 1980s. A number of
successful applications had been completed by then and they proved to be cost
effective. An example which illustrates this point well is the diagnostic system
developed by the Campbell Soup Company.
Campbell Soup uses large sterilizers or cookers to cook soups and other canned
products at eight plants located throughout the country. Some of the larger cookers
hold up to 68,000 cans of food for short periods of cooking time. When difficult
maintenance problems occur with the cookers, the fault must be found and corrected
quickly or the batch of foods being prepared will spoil. Until recently, the company
had been depending on a single expert to diagnose and cure the more difficult prob-
lems, flying him to the site when necessary. Since this individual will retire in a few
years taking his expertise with him, the company decided to develop an expert
system to diagnose these difficult problems.
After some months of development with assistance from Texas Instruments,
the company developed an expert system which ran on a PC. The system has about
150 rules in its knowledge base with which to diagnose the more complex cooker
problems. The system has also been used to provide training to new maintenance
personnel. Cloning multiple copies for each of the eight locations cost the company
only a few pennies per copy. Furthermore, the system cannot retire, and its perfor-
mance can continue to be improved with the addition of more rules. It has already
proven to be a real asset to the company. Similar cases now abound in many diverse
organizations.
15.2 RULE-BASED SYSTEM ARCHITECTURES

The most common form of architecture used in expert and other types of knowledge-
based systems is the production system, also called the rule-based system. This
type of system uses knowledge encoded in the form of production rules, that is,
if-then rules. We may remember from Chapter 4 that rules have an antecedent
or condition part, the left-hand side, and a conclusion or action part, the right-
hand side.
A & B & C & D → E & F
Each rule represents a small chunk of knowledge relating to the given domain
of expertise. A number of related rules collectively may correspond to a chain of
inferences which lead from some initially known facts to some useful conclusions.
When the known facts support the conditions in the rule's left side, the conclusion
or action part of the rule is then accepted as known (or at least known with some
degree of certainty). Examples of some typical expert system rules were described
in earlier sections (for example, see Sections 4.9, 6.5, and 10.6).
(Figure: components of a rule-based expert system: a knowledge base and working memory used by the inference engine, together with a case history file, a learning module, and the user interface.)
The knowledge base contains facts and rules about some specialized knowledge
domain. An example of a simple knowledge base giving family relationships is
illustrated in Figure 15.2. The rules in this figure are given in the same LISP
format as those of Section 10.6, which is similar to the format given in the OPS5
language as presented by Brownston, Farrell, Kant, and Martin (1985). Each fact
and rule is identified with a name (a1, a2, ..., r1, r2, ...). For ease in
reading, the left side is separated from the right by the implication symbol →.
Conjuncts on the left are given within single parentheses (sublists), and one or
more conclusions may follow the implication symbol. Variables are identified as a
symbol preceded by a question mark. It should be noted that rules found in real
working systems may have many conjuncts in the LHS; for example, as many as
eight or more are not uncommon.
(r1 ((husband ?x ?y))
    → (male ?x))

(r2 ((wife ?x ?y))
    → (female ?x))

(r3 ((wife ?x ?y))
    → (husband ?y ?x))

(r4 ((mother ?x ?y)
     (husband ?z ?x))
    → (father ?z ?y))

(r5 ((father ?x ?y)
     (wife ?z ?x))
    → (mother ?z ?y))

(r6 ((husband ?x ?y))
    → (wife ?y ?x))

(r7 ((father ?x ?z)
     (mother ?y ?z))
    → (husband ?x ?y))

(r8 ((father ?x ?z)
     (mother ?y ?z))
    → (wife ?y ?x))

(r9 ((father ?x ?y)
     (father ?y ?z))
    → (grandfather ?x ?z))

Figure 15.2 Facts and rules in a simple knowledge base.
In PROLOG, rules are written naturally as clauses with both a head and body.
For example, a rule about a patient's symptoms and the corresponding diagnosis
of hepatitis might read in English as the rule
The inference engine accepts user input queries and responses to questions through
the I/O interface and uses this dynamic information together with the static knowledge
(the rules and facts) stored in the knowledge base. The knowledge in the knowledge
base is used to derive conclusions about the current case or situation as presented
by the user's input.
The inferring process is carried out recursively in three stages: (1) match, (2)
select, and (3) execute. During the match stage, the contents of working memory
are compared to facts and rules contained in the knowledge base. When consistent
matches are found, the corresponding rules are placed in a conflict set. To find an
appropriate and consistent match, substitutions (instantiations) may be required.
Once all the matched rules have been added to the conflict set during a given
cycle, one of the rules is selected for execution. The criteria for selection may be
most recent use, rule condition specificity (the number of conjuncts on the left),
or simply the smallest rule number. The selected rule is then executed and the
right-hand side or action part of the rule is then carried out. Figure 15.3 illustrates
this match-select-execute cycle.
As an example, suppose the working memory contains the two clauses (father
bob sam) and (mother sue sam). When the match part of the cycle is attempted, a
consistent match will be made between these two clauses and rules r7 and r8 in the
knowledge base. The match is made by substituting Bob for ?x, Sam for ?z, and Sue
for ?y. Consequently, since all the conditions on the left of both r7 and r8 are satisfied,
these two rules will be placed in the conflict set. If there are no other working memory
clauses to match, the selection step is executed next. Suppose, for one or more of the
selection criteria stated above, r7 is the rule chosen to execute. The clause on the right
side of r7 is instantiated and the execution step is initiated. The execution step may
result in the right-hand clause (husband bob sue) being placed in working memory
or it may be used to trigger a message to the user. Following the execution step,
the match-select-execute cycle is repeated.
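To make the cycle concrete, here is a toy Python rendering of the match-select-execute loop using rules r7 through r9 of Figure 15.2. The tuple encoding of facts, the unification helper, and the "first applicable rule" selection criterion are simplifications introduced for this sketch, not the implementation of any particular system.

    # Rules r7-r9 from Figure 15.2, written as (name, conditions, conclusion);
    # variables are strings beginning with "?".
    RULES = [
        ("r7", [("father", "?x", "?z"), ("mother", "?y", "?z")], ("husband", "?x", "?y")),
        ("r8", [("father", "?x", "?z"), ("mother", "?y", "?z")], ("wife", "?y", "?x")),
        ("r9", [("father", "?x", "?y"), ("father", "?y", "?z")], ("grandfather", "?x", "?z")),
    ]

    def unify(pattern, fact, bindings):
        """Extend bindings so that pattern matches fact, or return None."""
        if len(pattern) != len(fact):
            return None
        b = dict(bindings)
        for p, f in zip(pattern, fact):
            if p.startswith("?"):
                if b.setdefault(p, f) != f:
                    return None
            elif p != f:
                return None
        return b

    def matches(rule, wm):
        """All consistent variable bindings that satisfy every condition of the rule."""
        bindings = [dict()]
        for cond in rule[1]:
            bindings = [b2 for b in bindings for fact in wm
                        if (b2 := unify(cond, fact, b)) is not None]
        return bindings

    def forward_chain(wm):
        wm = set(wm)
        while True:
            # Match: build the conflict set of (rule, bindings) pairs.
            conflict = [(r, b) for r in RULES for b in matches(r, wm)]
            # Keep only instantiations whose conclusion is not already known.
            new = [(r, b) for r, b in conflict
                   if tuple(b.get(t, t) for t in r[2]) not in wm]
            if not new:
                return wm                      # nothing left to add: stop
            rule, b = new[0]                   # select: first (lowest-numbered) rule
            wm.add(tuple(b.get(t, t) for t in rule[2]))   # execute: assert the conclusion

    print(forward_chain({("father", "bob", "sam"), ("mother", "sue", "sam")}))
    # the run adds ('husband', 'bob', 'sue') and ('wife', 'sue', 'bob')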
Figure 15.3 The production system inference cycle.
As another example of matching, suppose the two facts (a6 (father sam bill))
and (a7 (father bill pam)) have been added to the knowledge base and the immediate
goal is a query about Pam's grandfather. When made, assume this query has resulted
in placement of the clause (grandfather ?x pam) into working memory. For this
goal to succeed, consistent substitutions must be made for the variables ?x and ?y
in rule r9 with a6 and a7. This will be the case if Sam and Bill are substituted for
?x and ?y in the subgoal left-hand conditions of r9. The right-hand side will then
correctly state that Pam's grandfather is Sam.
When the left side of a sequence of rules is instantiated first and the rules are
executed from left to right, the process is called forward chaining. This is also
known as data-driven inference since input data are used to guide the direction of
the inference process. For example, we can chain forward to show that when a
student is encouraged, is healthy, and has goals, the student will succeed.
On the other hand, when the right side of the rules is instantiated first, the
left-hand conditions become subgoals. These subgoals may in turn cause sub-subgoals
to be established, and so on until facts are found to match the lowest subgoal
conditions. When this form of inference takes place, we say that backward chaining
is performed. This form of inference is also known as goal-driven inference since
an initial goal establishes the backward direction of the inferring.
For example, in MYCIN the initial goal in a consultation is "Does the patient
have a certain disease?" This causes subgoals to be established such as "are certain
bacteria present in the patient?" Determining if certain bacteria are present may
require such things as tests on cultures taken from the patient. This process of
setting up subgoals to confirm a goal continues until all the subgoals are eventually
satisfied or fail. If satisfied, the backward chain is established thereby confirming
the main goal.
When rules are executed, the resulting action may be the placement of some
new facts in working memory, a request for additional information from the user.
or simply the stopping of the search process. If the appropriate knowledge has
been stored in the knowledge base and all required parameter values have been
provided by the user, conclusions will be found and will be reported to the user.
The chaining continues as long as new matches can be found between clauses in
the working memory and rules in the knowledge base. The process stops when no
new rules can be placed in the conflict set.
Some systems use both forward and backward chaining, depending on the
type of problem and the information available. Likewise, rules may be tested exhaustively
or selectively, depending on the control structure. In MYCIN, rules in the
KB are tested exhaustively. However, when the number of rules exceeds a few
hundred, this can result in an intolerable amount of searching and matching. In
such cases, techniques such as those found in the RETE algorithm (Chapter 10)
may be used to limit the search.
Many expert systems must deal with uncertain information. This will be the
case when the evidence supporting a conclusion is vague, incomplete, or otherwise
uncertain. To accommodate uncertainties, some form of probabilities, certainty factors,
fuzzy logic, heuristics, or other methods must be introduced into the inference
process. These methods were introduced in Chapters 5 and 6. The reader is urged
at this time to review those methods to see how they may be applied to expert
systems.
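As one concrete illustration, the certainty-factor calculus popularized by MYCIN combines two positive certainty factors supporting the same conclusion as CF = CF1 + CF2(1 - CF1). A one-line Python sketch of this combination (the example values are arbitrary):

    def combine_cf(cf1, cf2):
        """Combine two certainty factors (both assumed in [0, 1]) for one hypothesis."""
        return cf1 + cf2 * (1 - cf1)

    # Two rules support the same conclusion with CF 0.6 and 0.4:
    print(combine_cf(0.6, 0.4))   # 0.76, stronger than either rule alone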
The explanation module provides the user with an explanation of the reasoning
process when requested. This is done in response to a how query or a why query.
To respond to a how query, the explanation module traces the chain of rules
fired during a consultation with the user. The sequence of rules that led to the conclusion
is then printed for the user in an easy-to-understand, human-language style. This
permits the user to actually see the reasoning process followed by the system in
arriving at the conclusion. If the user does not agree with the reasoning steps presented,
they may be changed using the editor.
To respond to a why query, the explanation module must be able to explain
why certain information is needed by the inference engine to complete a step in
the reasoning process before it can proceed. For example, in diagnosing a car that
will not start, a system might be asked why it needs to know the status of the
336 Expert Systems Architectures Chap. 15
distributor spark. In response, the system would reply that it needs this information
to determine if the problem can be isolated to the ignition system. Again, this
information allows the user to determine if the system's reasoning steps appear to
be sound. The explanation module programs give the user the important ability to
follow the inferencing steps at any time during the consultation.
The editor is used by developers to create new rules for addition to the knowledge
base, to delete outmoded rules, or to modify existing rules in some way. Some of
the more sophisticated expert system editors provide the user with features not
found in typical text editors, such as the ability to perform some types of consistency
tests for newly created rules, to add missing conditions to a rule, or to reformat a
newly created rule. Such systems also prompt the user for missing information,
and provide other general guidance in the KB creation process.
One of the most difficult tasks in creating and maintaining production systems
is the building and maintaining of a consistent but complete set of rules. This
should be done without adding redundant or unnecessary rules. Building a knowledge
base requires careful planning, accounting, and organization of the knowledge struc-
tures. It also requires thorough validation and verification of the completed knowledge
base, operations which have yet to be perfected. An "intelligent" editor can greatly
simplify the process of building a knowledge base.
TEIRESIAS (Davis, 1982) is an example of an intelligent editor developed
to assist users in building a knowledge base directly without the need for an intermediary
knowledge engineer. TEIRESIAS was developed to work with systems like
MYCIN in providing a direct user-to-system dialog. TEIRESIAS assists the user
in formulating, checking, and modifying rules for inclusion in the performance
program's knowledge base. For this, TEIRESIAS uses some metaknowledge, that
is, knowledge about MYCIN's knowledge. The dialog is carried out in a near-English
form so that the user needs to know little about the internal form of the rules.
The input-output interface permits the user to communicate with the system in a
more natural way by permitting the use of simple selection menus or the use of a
restricted language which is close to a natural language. This means that the system
must have special prompts or a specialized vocabulary which encompasses the termi-
nology of the given domain of expertise. For example, MYCIN can recognize many
medical terms in addition to various common words needed to communicate. For
this, MYCIN has a vocabulary of some 2000 words.
Personal Consultant Plus, a commercial PC version of the MYCIN architecture,
uses menus and English prompts to communicate with the user. The prompts, written
in standard English, are provided by the developer during the system building stage.
How and why explanations are also given in natural language form.
The learning module and history file are not common components of expert
systems. When they are provided, they are used to assist in building and refining
the knowledge base. Since learning is treated in great detail in later chapters, no
description is given here.
15.3 NONPRODUCTION SYSTEM ARCHITECTURES

Other, less common expert system architectures (although no less important) are
those based on nonproduction rule-representation schemes. Instead of rules, these
systems employ more structured representation schemes like associative or semantic
networks, frame and rule structures, decision trees, or even specialized networks
like neural networks. In this section we examine some typical system architectures
based on these methods.
(Figure: the CASNET causal-associational network for glaucoma. Patient observations and test results are connected by associational links to pathophysiological states, which are connected by causal and classification links to disease categories such as acute angle and chronic angle glaucoma.)
model as part of the cause and effect relationship relating symptoms and other
signs to diseases. -
Inference is accomplished by traversing the network, following the most plausi-
ble paths of causes and effects. Once a sufficiently strong path has been determined
through the network, diagnostic conclusions are inferred using classification tables
that interpret patterns of the causal network. These tables are similar to rule interpreta-
tions.
The CASNET system was never used much beyond the initial research stage.
At the time, physicians were reluctant to use computer systems in spite of performance
tests in which CASNET scored well.
Frame Architectures
Typical findings
Logical decision criteria
Complementary relations to other frames
Differential diagnosis
Scoring
The patient findings are matched against frames, and when a close match is
found, a trigger status occurs. A trigger is a finding that is so strongly related to a
disorder that the system regards it as an active hypothesis, one to be pursued further.
A special is-sufficient slot is used to confirm the presence of a disease when key
findings correlate with the slot contents.
Knowledge for expert systems may be stored in the form of a decision tree when
the knowledge can be structured in a top-to-bottom manner. For example, the identification
of objects (equipment faults, physical objects, diseases, and the like) can be
made through a decision tree structure. Initial and intermediate nodes in the tree
correspond to object attributes, and terminal nodes correspond to the identities of
objects. Attribute values for an object determine a path to a leaf node in the tree
which contains the object's identification. Each object attribute corresponds to a
nonterminal node in the tree, and each branch of the decision tree corresponds to
an attribute value or set of values.
A segment of a decision tree knowledge structure taken from an expert system
used to identify objects such as liquid chemical waste products is illustrated in
Figure 15.5 (Patterson, 197 ). Each node in the tree corresponds to an identifying
attribute such as molecular weight, boiling point, burn test color, or solubility test
results. Each branch emanating from a node corresponds to a value or range of
values for the attribute such as 20-37 degrees C, yellow, or nonsoluble in sulphuric
acid.
An identification is made by traversing a path through the tree (or network)
until the path leads to a unique leaf node which corresponds to the unknown object's
identity.
The knowledge base, which is the decision tree for an identification system,
can be constructed with a special tree-building editor or with a learning module. In
either case, a set of the most discriminating attributes for the class of objects being
identified should be selected. Only those attributes that discriminate well among
different objects need be used. Permissible values for each of the attributes are
grouped into separable sets, and each such set determines a branch from one attribute
node to the next node.
New nodes and branches can be added to the tree when additional attributes
attrtxn me . 1
onie
Ves no Y_ no
oIubiiily tell
are needed to further discriminate among new objects. As the system gains experience,
the values associated with the branches can be modified for more accurate results.
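A decision-tree knowledge base of this kind reduces to a nested branching structure plus a traversal loop. The sketch below uses invented attribute names and object identities standing in for the chemical-waste example.

    # A nonterminal node is (attribute, {value: subtree}); a leaf is just a label.
    tree = ("solubility-in-sulphuric-acid",
            {"soluble": ("burn-test-color",
                         {"yellow": "compound-A", "blue": "compound-B"}),
             "nonsoluble": "compound-C"})

    def identify(node, attributes):
        """Follow the branch selected by each attribute value until a leaf is reached."""
        while isinstance(node, tuple):
            attribute, branches = node
            node = branches[attributes[attribute]]
        return node

    print(identify(tree, {"solubility-in-sulphuric-acid": "soluble",
                          "burn-test-color": "yellow"}))   # -> compound-A

New attributes and branch values are added by extending the nested dictionaries, which mirrors the way new nodes and branches are added to the tree itself.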
1. There are a number of knowledge sources which are separate and independent
sets of coded knowledge. Each knowledge source may be thought of as a
specialist in some limited area needed to solve a given subset of problems.
The sources may contain knowledge in the form of procedures, rules, or other
schemes.
2. A globally accessible data base structure, called a blackboard, contains the
current problem state and information needed by the knowledge sources (input
data, partial solutions, control data, alternatives, final solutions). The knowledge
sources make changes to the blackboard data that incrementally lead to a
solution. Communication and interaction between the knowledge sources takes
place solely through the blackboard.
3. Control information may be contained within the sources, on the blackboard,
or possibly in a separate module. (There is no actual control unit specified as
part of the basic blackboard architecture.)
H. Penny Nii (1986a) has aptly described the blackboard problem-solving
strategy through the following analogy.

Imagine a room with a large blackboard on which a group of experts are piecing
together a jigsaw puzzle. Each of the experts has some special knowledge about solving
puzzles (e.g., a border expert, a shape expert, a color expert, etc.). Each member
examines his or her pieces and decides if they will fit into the partially completed
puzzle. Those members having appropriate pieces go up to the blackboard and update
the evolving solution. The whole puzzle can be solved in complete silence with no
direct communication among members of the group. Each person is self-activating,
knowing when he or she can contribute to the solution. The solution evolves in this
incremental way with each expert contributing dynamically on an opportunistic basis,
that is, as the opportunity to contribute to the solution arises.
The objects on the blackboard are hierarchically organized into levels which facilitate
analysis and solution. Information from one level serves as input to a set of knowledge
sources. The sources modify the knowledge and place it on the same or different
levels.
The control information is used by the control module to determine the focus
of attention. This determines the next item to be processed. The focus of attention
can be the choice of knowledge sources or the blackboard objects or both. If both,
the control determines which sources to apply to which objects.
Problem solving proceeds with a knowledge source making changes to the
blackboard objects. Each source indicates the contribution it can make to the new
solution state. Using this information, the control module chooses a focus of attention.
If the focus of attention is a knowledge source, a blackboard object is chosen as
the context of its invocation. If the focus of attention is a blackboard object, a
knowledge source which can process that object is chosen. If the focus of attention
is both a source and an object, that source is executed within that context.
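Stripped of domain detail, this control regime is a simple loop. The sketch below assumes each knowledge source exposes an applicability test, a rating, and an action, an interface invented for this illustration rather than taken from any particular blackboard system.

    def blackboard_loop(blackboard, knowledge_sources, solved):
        """blackboard: mutable dict of partial results; each knowledge source offers
        applicable(blackboard), rating(blackboard), and apply(blackboard)."""
        while not solved(blackboard):
            # Each source inspects the blackboard and bids if it can contribute.
            bids = [(ks.rating(blackboard), ks) for ks in knowledge_sources
                    if ks.applicable(blackboard)]
            if not bids:
                break                                   # nothing more can be contributed
            _, focus = max(bids, key=lambda bid: bid[0])   # focus of attention
            focus.apply(blackboard)                     # the chosen source updates the blackboard
        return blackboard

All communication passes through the shared blackboard object, exactly as in the jigsaw-puzzle analogy.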
Blackboard systems have been gaining some popularity recently. They have
been applied to a number of different application areas. One of the first applications
was in the HEARSAY family of projects, which are speech understanding systems
(Reddy et al., 1976). More recently, systems have been developed to analyze complex
scenes and to model human cognitive processes (Nii, 1986b).
Little work has been done in the area of analogical reasoning systems. Yet this is
one of the most promising areas for general problem solving. We humans make
extensive use of our previous experience in solving everyday problems. This is
because new problems are frequently similar to previously encountered problems.
Neural networks are large networks of simple processing elements or nodes which
process information dynamically in response to external inputs. The nodes are simpli-
fied models of neurons. The knowledge in a neural network is distributed throughout
the network in the form of internode connections and weighted links which form
the inputs to the nodes. The link weights serve to enhance or inhibit the input
stimuli values, which are then added together at the nodes. If the sum of all the
inputs to a node exceeds some threshold value T, the node executes and produces
an output which is passed on to other nodes or is used to produce some output
response. In the simplest case, no output is produced if the total input is less than
T. In more complex models, the output will depend on a nonlinear activation function.
Neural networks were originally inspired as being models of the human nervous
system. They are greatly simplified models to be sure (neurons are known to be
fairly complex processors). Even so, they have been shown to exhibit many "intelligent"
abilities, such as learning, generalization, and abstraction.
A single node is illustrated in Figure 15.7. The inputs to the node are the
values x1, x2, ..., xn, which typically take on values of -1, 0, +1, or real values
within the range (-1, 1). The weights w1, w2, ..., wn correspond to the synaptic
strengths of a neuron. They serve to increase or decrease the effects of the corresponding
xi input values. The sum of the products xi * wi, i = 1, 2, ..., n, serves as
the total combined input to the node. If this sum is large enough to exceed the
threshold amount T, the node fires, and produces an output y, an activation function
value placed on the node's output links. This output may then be the input to
other nodes or the final output response from the network.
Figure 15.8 illustrates three layers of a number of interconnected nodes. The
first layer serves as the input layer, receiving inputs from some set of stimuli. The
second layer (called the hidden layer) receives inputs from the first layer and produces
a pattern of inputs to the third layer, the output layer. The pattern of outputs from
the final layer are the network's responses to the input stimuli patterns. Input links
to layer j (j = 1, 2, 3) have weights wij for i = 1, 2, ..., n.
General multilayer networks having n nodes (number of rows) in each of m
layers (number of columns of nodes) will have weights represented as an n x m
matrix W. Using this representation, nodes having no interconnecting links will
have a weight value of zero. Networks consisting of more than three layers would,
of course, be correspondingly more complex than the network depicted in Figure
15.8.
A neural network can be thought of as a black box that transforms the input
vector x to the output vector y where the transformation performed is the result of
the pattern of connections and weights, that is, according to the values of the weight
matrix W.
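A layered network of such threshold nodes can be sketched in a few lines of Python. The weights, layer sizes, and bipolar (+1/-1) output convention below are arbitrary choices made only for illustration.

    def node_output(inputs, weights, threshold=0.0):
        """Fire (+1) if the weighted sum of the inputs exceeds the threshold, else -1."""
        total = sum(x * w for x, w in zip(inputs, weights))
        return 1 if total > threshold else -1

    def layer_output(inputs, weight_matrix):
        """weight_matrix[j] holds the input weights of node j in the next layer."""
        return [node_output(inputs, w_j) for w_j in weight_matrix]

    # Two-input network with a hidden layer of two nodes and a single output node.
    hidden_weights = [[0.5, -0.5], [0.3, 0.8]]
    output_weights = [[1.0, 1.0]]
    x = [1, -1]
    y = layer_output(layer_output(x, hidden_weights), output_weights)
    print(y)   # the network's response to the input pattern x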
Consider the vector product

x · w = |x| |w| cos θ

where |x| denotes the norm or length of the vector x. Note that this product is a
maximum when both vectors point in the same direction, that is, when θ = 0.
Figure 15.8 A multilayer neural network (layer 1, layer 2, layer 3).
Wnew = Wold + a * D * x / |x|²

where 0 < a < 1 is a learning constant that determines the rate of learning. When
the difference D is large, the adjustment to the weights W is large, but when the
output response y is close to the target response y', the adjustment will be small.
When the difference D is near zero, the training process terminates, at which point
the network will produce the correct response for the given input patterns x.
In unsupervised learning, the training examples consist of the input vectors x
only. No desired response y' is available to guide the system. Instead, the learning
process must find the weights with no knowledge of the desired output response.
We leave the unsupervised learning description until Chapter 18, where learning is
covered in somewhat more detail.
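Using the weight-adjustment rule written above, a supervised training loop for a single linear output node might be sketched as follows; the training pairs, learning constant, and linear output are assumptions of this example, not a prescription from the text.

    def train(weights, samples, a=0.2, epochs=50):
        """samples: list of (x, y_target); weights are adjusted toward each target."""
        for _ in range(epochs):
            for x, y_target in samples:
                y = sum(xi * wi for xi, wi in zip(x, weights))   # network response
                D = y_target - y                                 # output error
                norm_sq = sum(xi * xi for xi in x) or 1.0
                weights = [wi + a * D * xi / norm_sq             # Wnew = Wold + a*D*x/|x|^2
                           for wi, xi in zip(weights, x)]
        return weights

    w = train([0.0, 0.0], [([1, 1], 1.0), ([1, -1], -1.0)])
    print(w)   # approaches weights that reproduce the target responses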
Figure 15.10 A simple neural network expert system. From S. I. Gallant, ACM
Communications, Vol. 31, No. 2, p. 152, 1988. By permission.
of +1 (true). Negative symptoms are given an input value of -1 (false), and unknown
symptoms are given the value 0. Input symptom values are multiplied by their
corresponding weights wi. Numbers within the nodes are initial bias weights,
and numbers on the links are the other node input weights. When the sum of the
weighted products of the inputs exceeds 0, an output will be present on the corresponding
node output and serve as an input to the next layer of nodes.
As an example, suppose the patient has swollen feet (u1 = +1) but not red
ears (u2 = -1) nor hair loss (u3 = -1). This gives a value of u7 = +1 (since
0 + 2(+1) + (-2)(-1) + (3)(-1) = 1), suggesting the patient has superciliosis.
When it is also known that the other symptoms of the patient are false (u4 = u5 =
u6 = -1), it may be concluded that namatosis is absent (u8 = -1), and
therefore that birambio (u10 = +1) should be prescribed while placibin should not
be prescribed (u9 = -1). In addition, it will be found that posiboost should also
be prescribed (u11 = +1).
The intermediate triangular shaped nodes were added by the training algorithm.
These additional nodes are needed so that weight assignments can be made which
permit the computations to work correctly for all training instances.
Deductions can be made just as well when only partial information is available.
For example, when a patient has swollen feet and suffers from hair loss, it may be
concluded the patient has superciliosis, regardless of whether or not the patient has
red ears. This is so because the unknown variable cannot force the sum to change
to negative.
A system such as this can also explain how or why a conclusion was reached.
For example, when inputs and outputs are regarded as rules, an output can be
explained as the conclusion to a rule. If placibin is true, the system might explain
why with a statement such as
15.5 KNOWLEDGE ACQUISITION AND VALIDATION

One of the most difficult tasks in building knowledge-based systems is in the acquisi-
tion and encoding of the requisite domain knowledge. Knowledge for expert systems
must be derived from expert sources like experts in the given field, journal articles,
texts, reports, data bases, and so on. Elicitation of the right knowledge can take
several man years and cost hundreds of thousands of dollars. This process is now
recognized as one of the main bottlenecks in building expert and other knowledge-
based systems. Consequently, much effort has been devoted to more effective methods
of acquisition and coding.
Pulling together and correctly interpreting the right knowledge to solve a set
of complex tasks is an onerous job. Typically, experts do not know what specific
knowledge is being applied nor just how it is applied in the solution of a given
problem. Even if they do know, it is likely they are unable to articulate the problem
solving process well enough to capture the low-level knowledge used and the inferring
processes applied. This difficulty has led to the use of Al experts (called knowledge
engineers) who serve as intermediaries between the domain expert and the system.
The knowledge engineer elicits information from the experts and codes this knowledge
into a form suitable for use in the expert system.
The knowledge elicitation process is depicted in Figure 15.11. To elicit the
requisite knowledge, a knowledge engineer conducts extensive interviews with domain
experts. During the interviews, the expert is asked to solve typical problems in the
domain of interest and to explain his or her solutions.
Using the knowledge gained from experts and other sources, the knowledge
engineer codes the knowledge in the form of rules or some other representation
scheme. This knowledge is then used to solve sample problems for review and
validation by the experts. Errors and omissions are uncovered and corrected, and
additional knowledge is added as needed. The process is repeated until a sufficient
body of knowledge has been collected to solve a large class of problems in the
chosen domain. The whole process may take as many as tens of person years.
Penny Nii, an experienced knowledge engineer at Stanford University, has
described some useful practices to follow in solving acquisition problems through
a sequence of heuristics she uses. They have been summarized in the book The
Fifth Generation by Feigenbaum and McCorduck (1983) as follows.
You can't be your own expert. By examining the process of your own expertise you
risk becoming like the centipede who got tangled up in her own legs and stopped
dead when she tried to figure out how she moved a hundred legs in harmony.
From the beginning, the knowledge engineer must count on throwing efforts away.
Writers make drafts, painters make preliminary sketches; knowledge engineers are no
different.
The problem must be well chosen. AI is a young field and isn't ready to take on
every problem the world has to offer. Expert systems work best when the problem is
well bounded, which is computer talk to describe a problem for which large amounts
of specialized knowledge may be needed, but not a general knowledge of the world.
If you want to do any serious application you need to meet the expert more than half
way; if he's had no exposure to computing, your job will be that much harder.
If none of the tools you normally use works, build a new one.
Dealing with anything but facts implies uncertainty. Heuristic knowledge is not hard
and fast and cannot be treated as factual. A weighting procedure has to be built into
the expert system to allow for expressions such as "I strongly believe that ..." or
"The evidence suggests that ..."
A high-performance program, or a program that will eventually be taken over by the
expert for his own use, must have very easy ways of allowing the knowledge to be
modified so that new information can be added and out-of-date information deleted.
The problem needs to be a useful, interesting one. There are knowledge-based programs
to solve arcane puzzles, but who cares? More important, the user has to understand
the system's real value to his work.
When Nii begins a project, she first persuades a human expert to commit the
considerable time that is required to have the expert's mind mined. Once this is
done, she immerses herself in the given field, reading texts, articles, and other
material to better understand the field and to learn the basic jargon used. She then
begins the interviewing process. She asks the expert to describe his or her tasks
and problem solving techniques. She asks the expert to choose a moderately difficult
problem to solve as an example of the basic approach. This information is then
collected, studied, and presented to other members of the development team so
that a quick prototype can be constructed for the expert to review. This serves
several purposes. First, it helps to keep the expert in the development loop and
interested. Secondly, it serves as a rudimentary model with which to uncover flaws
and other problems. It also helps both expert and developer in discovering the real
way the expert solves problems. This usually leads to a repeat of the problem
solving exercise, but this time in a step-by-step walk through of the sample problem.
Nii tests the accuracy of the expert's explanations by observing his or her behavior
and reliance on data and other sources of information. She is concerned more with
the manipulation of the knowledge than with the actual facts. Keeping the expert
focused on the immediate problem requires continual prompting and encouragement.
During the whole interview process Nii is mentally examining alternative ap-
proaches for the best knowledge representation and inferencing methods to see how
well each would best match the expert's behavior. The whole process of elicitation,
coding, and verification may take several iterations over a period of several months.
Recognizing the acquisition bottleneck in building expert systems, researchers
and vendors alike have sought new and better ways to reduce the burden and reliance
placed on knowledge engineers, and in general, ways to improve and speed up the
development process. This has led to a number of sophisticated building tools which
we consider next.
15.6 KNOWLEDGE SYSTEM BUILDING TOOLS

Since the introduction of the first successful expert systems in the late 1970s, a
large number of building tools have been introduced, both by the academic community
and industry. These tools range from high-level programming languages to intelligent
editors to complete shell environment systems. A number of commercial products
are now available, ranging in price from a few hundred dollars to tens of thousands
of dollars. Some are capable of running on medium-size PCs while others require
larger systems such as LISP machines, minis, or even mainframes.
When evaluating building tools for expert system development, the developer
should consider the following features and capabilities that may be offered in such systems.
3. User interface characteristics (editor flexibility and ease of use, use of menus,
use of pop-up windows, developer-provided text capabilities for prompts and
help messages, graphics capabilities, consistency checking for newly entered
knowledge, explanation of how and why capabilities, system help facilities,
screen formatting and color selection capabilities, network representation of
the knowledge base, and forms of compilation available, batch or interactive).
4. General system characteristics and support available (types of applications
with which the system has been successfully used, the base programming
language in which the system was written, the types of hardware the systems
are supported on, general utilities available, debugging facilities, interfacing
flexibility to other languages and databases, vendor training availability and
cost, strength of software support, and company reputation).
Personal Consultant
A family of Personal Consultant expert system shells was developed by Texas Instruments,
Inc. (TI) in the early 1980s. These shells are rule-based building tools patterned
after the MYCIN system architecture and developed to run on a PC as well as on
larger systems such as the TI Explorer. The largest and most versatile of the Personal
Consultant family is Personal Consultant Plus.
Personal Consultant Plus permits the use of structures called frames (different
from the frames described in Chapter 7) which can be organized into hierarchies
such as the one shown in Figure 15.12.
Figure 15.12 Hierarchical frame structure in PC Plus: an electrical appliance frame
with subframes such as microwave, iron, food cooker, toaster, and blender, which in
turn have mechanical and electrical subsystem subframes.
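To make the idea of a hierarchical frame structure concrete, here is a small illustrative sketch in Python (not PC Plus syntax); the Frame class and the slot names are our own, chosen to mirror Figure 15.12.

    # Illustrative sketch only: a hierarchy of frames in which child frames
    # inherit slot values from their parents, as suggested by Figure 15.12.

    class Frame:
        def __init__(self, name, parent=None, **slots):
            self.name = name
            self.parent = parent
            self.slots = slots

        def get(self, slot):
            """Look up a slot locally, then up the parent chain (inheritance)."""
            if slot in self.slots:
                return self.slots[slot]
            if self.parent is not None:
                return self.parent.get(slot)
            return None

    appliance = Frame("electrical-appliance", power_source="AC mains")
    toaster   = Frame("toaster", parent=appliance, heating_element="nichrome wire")

    print(toaster.get("heating_element"))  # local slot: nichrome wire
    print(toaster.get("power_source"))     # inherited from electrical-appliance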
Radian Rulemaster
The Rulemaster system developed in the early 1980s by the Radian Corporation
was written in C language to run on a variety of mini- and microcomputer systems.
Rulemaster is a rule-based building tool which consists of two main components:
Radial, a procedural, block structured language for expressing decision rules related
to a finite state machine, and Rulemaker, a knowledge acquisition system which
induces decision trees from examples supplied by an expert. A program in Rulemaster
consists of a collection of related modules which interact to effect changes of state.
The modules may contain executable procedures, advice, or data. The building
system is illustrated in Figure 15.13.
Rulemaster's knowledge can be based on partial certainty using fuzzy logic
or heuristic methods defined by the developer. Users can define their own data
types or abstract types much the same as in Pascal. An explanation facility is provided
to explain its chain of reasoning. Programs in other languages can also be called
from Rulemaster.
One of the unique features of Rulemaster is the Rulemaker component which
has the ability to induce rules from examples. Experts are known to have difficulty
in directly expressing rules related to their decision processes. On the other hand,
they can usually come up with a wealth of examples in which they describe typical
solution steps. The examples provided by the expert offer a more accurate wa's in
,cti,.ri tiIe
Ruleujake,
Assebje,
Expert system of
hierarchical
radial dulea -
'-S.-
Comptetiori
---p-
txcer,;A n,c-rams -
which the problem solving process is carried out. These examples are transformed
into rules by Rulemaker through an induction process.
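The text does not give Rulemaker's induction algorithm, but the general idea of turning expert-supplied examples into decision rules can be sketched as follows. The toy induce function below simply splits the examples on successive attributes; a real induction system such as Rulemaker or ID3 would choose attributes more carefully, and the example data are invented.

    # Minimal sketch of inducing decision rules from expert-supplied examples.

    from collections import defaultdict

    def induce(examples, attributes):
        """examples: list of (attribute-dict, decision). Returns a nested rule tree."""
        decisions = {d for _, d in examples}
        if len(decisions) == 1:              # all examples agree: a leaf rule
            return decisions.pop()
        attr = attributes[0]                 # naive choice; ID3 would use information gain
        branches = defaultdict(list)
        for attrs, decision in examples:
            branches[attrs[attr]].append((attrs, decision))
        return {attr: {value: induce(subset, attributes[1:])
                       for value, subset in branches.items()}}

    examples = [
        ({"engine_turns": "no",  "battery_ok": "no"},  "charge battery"),
        ({"engine_turns": "no",  "battery_ok": "yes"}, "check starter"),
        ({"engine_turns": "yes", "battery_ok": "yes"}, "check fuel system"),
    ]
    print(induce(examples, ["engine_turns", "battery_ok"]))
    # {'engine_turns': {'no': {'battery_ok': {'no': 'charge battery',
    #                                         'yes': 'check starter'}},
    #                   'yes': 'check fuel system'}}

The nested dictionary can be read directly as a set of decision rules, for example "if the engine does not turn and the battery is not OK, then charge the battery."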
KEE
KEE is one of the more popular building tools for the development of larger-scale
systems. Developed by IntelliCorp, this system employs sophisticated representation
schemes structured around frames called units. The frames are made up of slots
and facets which contain objects, attribute values, rules, methods, logical assertions,
text, or even other frames. The frames are organized into one or more knowledge
bases in the form of hierarchical structures which permit multiple inheritance down
hierarchical paths. Rules, procedures, and object-oriented representation methods
are also supported.
Inference is carried out through inheritance, forward chaining, backward chaining,
or a mixture of these methods. A form of hypothetical reasoning is also provided
through different viewpoints which may be explored concurrently. The viewpoints
represent different aspects of a situation, views of the same situation taken at different
times, hypothetical situations, or alternative courses of action. This feature permits
a user to compare competing courses of action or to reason in parallel about partial
solutions based on different approaches.
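KEE's actual viewpoint mechanism is not detailed here; the sketch below merely illustrates the idea of alternative viewpoints as fact sets layered over a shared base situation. The Viewpoint class and the example facts are hypothetical.

    # Rough sketch (not KEE syntax): viewpoints as fact sets sharing a base world,
    # so competing courses of action can be explored and compared side by side.

    class Viewpoint:
        def __init__(self, name, base=None):
            self.name = name
            self.base = base          # parent viewpoint (the shared situation)
            self.facts = set()        # facts asserted only in this viewpoint

        def assert_fact(self, fact):
            self.facts.add(fact)

        def holds(self, fact):
            return fact in self.facts or (self.base is not None and self.base.holds(fact))

    world = Viewpoint("current-situation")
    world.assert_fact("machine-3 is idle")

    plan_a = Viewpoint("schedule-job-7-on-machine-3", base=world)
    plan_a.assert_fact("job-7 assigned to machine-3")

    plan_b = Viewpoint("defer-job-7", base=world)

    print(plan_a.holds("machine-3 is idle"))            # True: inherited from the base world
    print(plan_b.holds("job-7 assigned to machine-3"))  # False: asserted only in plan A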
KEE's support environment includes a graphics-oriented debugging package,
flexible end-user interfaces using windows and menus, and an explanation capability
with graphic displays which can show inference chains. A graphics-based simulation
package called SimKit is available at additional cost.
KEE has been used for the development of intelligent user interfaces, genetics,
diagnosis and monitoring of complicated systems, planning, design, process control,
scheduling, and simulation. The system is LISP based, developed for operation on
systems such as Symbolics machines, Xerox 1100s, or TI Explorers. Systems can
also be ported to open architecture machines which support Common LISP without
extensive modification.
OPS5 System
The OPS5 and other OPS building tools were developed at Carnegie Mellon University
in conjunction with DEC during the late 1970s. This system was developed to
build the R1/XCON expert system which configures VAX and other DEC minicomputer
systems. The system is used to build rule-based production systems which use forward
chaining in the inference process (backward and mixed chaining is also possible).
The system was written in the C language to run on the DEC VAX and other minicomputers.
It uses a sophisticated method of indexing rules (the Rete algorithm) to reduce the
matching times during the match-select-execute cycle. Examples of OPS5 rules
were given above in Section 15.2, and a description of the Rete match algorithm
was given in Section 10.6.
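As a rough illustration of the match-select-execute cycle, consider the following naive forward-chaining loop. It is not OPS5 syntax and does no incremental Rete matching (every rule is re-tested each cycle); the configuration-style facts and rule names are invented for the example.

    # Naive forward-chaining match-select-execute loop over a working memory of facts.

    rules = [
        {"name": "add-disk-controller",
         "if": {"disk drive ordered"}, "then": "disk controller required"},
        {"name": "add-cabinet",
         "if": {"disk controller required"}, "then": "extra cabinet required"},
    ]

    working_memory = {"disk drive ordered"}

    while True:
        # Match: rules whose conditions hold and whose conclusion is not yet asserted.
        conflict_set = [r for r in rules
                        if r["if"] <= working_memory and r["then"] not in working_memory]
        if not conflict_set:
            break
        # Select: a trivial conflict-resolution strategy (take the first match).
        rule = conflict_set[0]
        # Execute: assert the rule's conclusion into working memory.
        working_memory.add(rule["then"])
        print("fired", rule["name"])

    print(working_memory)
    # ends with: disk drive ordered, disk controller required, extra cabinet required

The Rete algorithm gains its speed by remembering partial matches between cycles, so that only the rules affected by the most recent working-memory change are re-examined.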
15.7 SUMMARY
Expert and other knowledge-based systems are usually composed of at least a knowledge
base, an inference engine, and some form of user interface. The knowledge
base, which is separate from the inference and control components, contains the
expert knowledge coded in some form such as production rules, networks of frames,
or some other representation scheme. The inference engine manipulates the knowledge
structures in the knowledge base to perform a type of symbolic reasoning and draw
useful conclusions relating to the current task. The user interface provides the means
for dialog between the user and system. The user inputs commands, queries, and
responses to system messages, and the system, in turn, produces various messages
for the user. In addition to these three components, most systems have an editor
for use in creating and modifying the knowledge base structures and an explanation
module which provides the user with explanations of how a conclusion was reached
or why a piece of knowledge is needed. A few systems also have some learning
capability and a case history file with which to record past consultations.
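One simple way such a "how" explanation can be produced is to record, for each fired rule, the conclusion it asserted and the facts it used. The sketch below is illustrative only; the rule and fact names are hypothetical.

    # Minimal sketch of a "how" explanation: each fired rule records its
    # conclusion together with the rule name and the supporting facts.

    derivations = {}   # conclusion -> (rule name, supporting facts)

    def fire(rule_name, conditions, conclusion, facts):
        facts.add(conclusion)
        derivations[conclusion] = (rule_name, conditions)

    def explain_how(conclusion):
        if conclusion not in derivations:
            return f"{conclusion}: given as input data"
        rule, conditions = derivations[conclusion]
        return f"{conclusion}: concluded by {rule} from {', '.join(conditions)}"

    facts = {"spark is weak", "battery is old"}
    fire("R12", ["spark is weak", "battery is old"], "replace battery", facts)
    print(explain_how("replace battery"))
    # replace battery: concluded by R12 from spark is weak, battery is old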
A variety of expert system architectures have been constructed, including rule-
based systems, frame-based systems, decision tree (discrimination network) systems,
analogical reasoning systems, blackboard architectures, theorem proving systems,
and even neural network architectures. These systems may differ in the direction
of rule chaining, in the handling of uncertainty, and in the search and pattern matching
methods employed. Rule and frame based systems are by far the most popular
architectures used.
Since the introduction of the first expert systems in the late 1970s, a number
of building tools have been developed. Such tools may be as unsophisticated as a
bare high level language or as comprehensive as a complete shell development
environment. A few representative building tools have been described and some
general characteristics of tools for developers were given.
The acquisition of expert knowledge for knowledge-based systems remains
one of the main bottlenecks in building them. This has led to a new discipline
called knowledge engineering. Knowledge engineers build systems by eliciting knowl-
edge from experts, coding that knowledge in an appropriate form, validating the
knowledge, and ultimately constructing a system using a variety of building tools.
EXERCISES
15.1. What are the main advantages in keeping the knowledge base separate from the
control module in knowledge-based systems?
15.2. Why is it important that an expert system be able to explain the why and how
questions related to a problem solving session?
15.3. Give an example of the use of metaknowledge in expert systems inference.
15.4. Describe and compare the different types of problems solved by four of the earliest
expert systems: DENDRAL, MYCIN, PROSPECTOR, and R1.
15.5. Identify and describe two good application areas for expert systems within a university
environment.
15.6. How do rules in PROLOG differ from general production system rules?
15.7. Make up a small knowledge-base of facts and rules using the same syntax as that
used in Figure 15.2 except that they should relate to an office working environment.
15.8. Name four different types of selection criteria that might be used to select the most
relevant rules for firing in a production system.
15.9. Describe a method in which rules could be grouped or organized in a knowledge
base to reduce the amount of search required during the matching part of the inference
cycle.
15.10. Using the knowledge base of Problem 15.7, simulate three match-select-execute cycles
for a query which uses several rules and/or facts.
15.11. Explain the difference between forward and backward chaining and under what condi-
tions each would be best to use for a given set of problems.
15.12. Under what conditions would it make sense to use both forward and backward chaining?
Give an example where both are used.
15.13. Explain why you think associative networks were never very popular forms of knowl-
edge representations in expert systems architectures.
15.14. Suppose you are diagnosing automobile engines using a system having a frame type
of architecture similar to PIP. Show how a trigger condition might be satisfied for
the distributor ignition system when it is learned that the spark at all spark plugs is
weak.
15.15. Give the advantages of expert system architectures based on decision trees over
those of production rules. What are the main disadvantages?
15.16. Two of the main problems in validating the knowledge contained in the knowledge
bases of expert systems are related to completeness and consistency, that is, whether
or not a system has an adequate breadth of knowledge to solve the class of problems
it was intended to solve and whether or not the knowledge is consistent. Is it easier
to check decision tree architectures or production rule systems for completeness and
consistency? Give supporting information for your conclusions.
15.17. Give three examples of applications for which blackboard architectures are well suited.
15.18. Give three examples of applications for which the use of analogical architectures
would be suitable in expert systems.
15.19. Consider a simple fully connected neural network containing three input nodes and
a single output node. The inputs to the network are the eight possible binary patterns
000, 001, . . . , 111. Find weights w for which the network can differentiate between
the inputs by producing eight distinct outputs.
15.20. For the preceding problem, draw projection vectors on the unit circle for the eight
different inputs using the weights determined there.
15.21. Explain how uncertainty is propagated through a chain of rules during a consultation
with an expert system which is based on the MYCIN architecture.
15.22. Select a problem domain that requires some special expertise and consult with an
expert in the domain to learn how he or she solves typical problems. After collecting
enough knowledge to solve a small subset of problems, create rules which could be
used in a knowledge base to solve the problems. Test the use of the rules on a few
problems which have been suggested by the expert and then get his or her confirmation.
15.23. Relate each of the heuristics given by Penny Nii in Section 15.5 to a real expert
system solving problem.
15.24. Discuss how each of the features of expert system building tools given in Section
15.6 can affect the performance of the systems developed.
15.25. Obtain a copy of an expert system building tool such as Personal Consultant Plus
and create an expert system to diagnose automobile engine problems. Consult with
a mechanic to see if your completed system is reasonably good.