Unit 3 Notes
• A complex sentence consists of an independent clause and one or more
dependent clauses.
Sentences are used to assert, query, and describe. The way a sentence is used
determines its mood : declarative, imperative, interrogative, or exclamatory. A
word functions in a sentence as a part of speech. The parts of speech for the
English language are nouns, pronouns, verbs, adjectives, adverbs,
prepositions, conjunctions, and interjections.
A noun is a name for something (person, place, or thing). Pronouns replace
nouns when the noun is already known. Verbs express action, being, or state of
being. Adjectives are used to modify nouns and pronouns and adverbs modify
verbs, adjectives, or other adverbs. Prepositions establish the relationship
between a noun and some other part of the sentence. Conjunctions join words
or groups of words together, and interjections are used to express strong
feelings apart from the rest of the sentence.
Phrases are made up of words but act as a single unit within a sentence.
These form the building blocks for the syntactic structures we consider later.
Levels of Knowledge Used in Language Understanding
• A language understanding program must have considerable knowledge
about the structure of the language including what the words are and how
they combine into phrases and sentences.
• It must know the meanings of the words and how they contribute to the
meanings of a sentence and to the context in which they are being used.
• A program must have some general world knowledge and knowledge of what
humans know and how they reason.
The component forms of knowledge needed for an understanding of natural
language are sometimes classified according to the following levels:
• Phonological. This is knowledge which relates sounds to the words we
recognize. A phoneme is the smallest unit of sound. Phonemes are aggregated
into word sounds.
• Morphological. This is lexical knowledge which relates to word
constructions from basic units called morphemes. A morpheme is the
smallest unit of meaning; for example, the construction of friendly from the
root friend and the suffix ly.
• Syntactic. This knowledge relates to how words are put together or
structured to form grammatically correct sentences in the language.
• Semantic. This knowledge is concerned with the meanings of words and
phrases and how they combine to form sentence meanings.
• Pragmatic. This is high-level knowledge which relates to the use of
sentences in different contexts and how the context affects the meaning of
the sentences.
• World. World knowledge relates to the general knowledge a language user
must have in order to understand and carry on a conversation. It must include
an understanding of the other person's beliefs and goals.
The approaches taken in developing language understanding programs
generally follow the above levels or stages :
• When a string of words has been detected, the sentences are parsed or
analyzed to determine their structure (syntax) and grammatical correctness.
• The meanings (semantics) of the sentences are then determined and
appropriate representation structures created for the inferencing programs.
• The whole process is a series of transformations from the basic speech
sounds to a complete set of internal representation structures.
Understanding written language or text is easier than understanding speech.
To understand speech, a program must have all the capabilities of a text
understanding program plus the facilities needed to map spoken sounds into
textual form.
General Approaches to Natural Language Understanding
Essentially, there have been three different approaches taken in the
development of natural language understanding programs : (1) the use of
keyword and pattern matching, (2) combined syntactic (structural) and
semantic directed analysis, and (3) comparing and matching the input to real
world situations (scenario representations).
(1) the use of keyword and pattern matching : the simplest approach, first used
in programs such as ELIZA. It is based on the use of sentence templates which
contain key words or phrases such as "my mother ____," "I am ____," and "I
don't like ____" that are matched against input sentences.
• Each input template has associated with it one or more output templates,
one of which is used to produce a response to the given input. Appropriate
word substitutions are also made from the input to the output to produce
the correct person and tense in the response (I and me become you, giving
replies like "Why are you ____").
• Advantage : ungrammatical but meaningful sentences are still accepted.
• Disadvantage : no actual knowledge structures are created; so the program
does not really understand.
(2) combined syntactic (structural) and semantic directed analysis : one of the
most popular approaches currently in use. Here, knowledge structures are
constructed during a syntactic and semantic analysis of the input sentences.
Parsers are used to analyze individual sentences and to build structures that
can be used directly or transformed into the required knowledge formats.
• Advantage : it provides power and versatility.
• Disadvantage : a large amount of computation is required, and still further
processing is needed to understand the contextual meanings of more than
one sentence.
(3) comparing and matching the input to real world situations : this approach
is based on the use of structures such as frames or scripts.
• It relies more on a mapping of the input to prescribed primitives which are
used to build larger knowledge structures.
• It depends on the use of constraints imposed by context and world
knowledge to develop an understanding of the language inputs. Prestored
descriptions and details for commonly occurring situations or events are
recalled for use in understanding a new situation.
• The stored events are then used to fill in missing details about the current
scenario.
• Advantage : much of the computation required for syntactic analysis is
bypassed.
• Disadvantage : a substantial amount of specific as well as general world
knowledge must be prestored.
GRAMMARS AND LANGUAGES
A language L can be considered as a set of strings of finite or infinite length,
where a string is constructed by concatenating basic atomic elements called
symbols. The finite set V of symbols of the language is called the alphabet or
vocabulary. Well-formed sentences are constructed using a set of rules called a
grammar.
A grammar G is a formal specification of the sentence structures that are
allowable in the language, and the language generated by the grammar G is
denoted by L(G).
More formally, we define a grammar G as G = (Vn, Vt, S, P) where Vn is a set of
nonterminal symbols, Vt a set of terminal symbols, S a starting symbol, and P
a finite set of productions or rewrite rules.
The alphabet V is the union of the disjoint sets Vn and Vt, and includes the
empty string e.
The terminals Vt are symbols which cannot be decomposed further (the actual
words, such as boy or ate in English), whereas the nonterminals can be
decomposed (such as a noun or verb phrase).
A general production rule from P has the form xyz → xwz where x, y, z and w
are strings over V. This rule states that y should be rewritten as w in the
context of x and z, where x and z can be any string including the empty string e.
As an example of a simple grammar G, we choose one which has component
parts or constituents from English, with vocabulary V given by
Vn = {S, NP, N, VP, V, ART}
Vt = {boy, popsicle, frog, ate, kissed, flew, the, a} (popsicle means candy)
and rewrite rules as shown in the figure, where the vertical bar indicates
alternative choices. S is the initial symbol (for sentence here), NP stands for
noun phrase, VP for verb phrase, N for noun, V for verb, and ART for article.
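Such a rule set can be written, for example, as
S → NP VP
NP → N | ART N
VP → V | V NP
N → boy | popsicle | frog
V → ate | kissed | flew
ART → the | a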
The grammar G defined above generates only a small fraction of English but it
illustrates the general concepts of generative grammars. With this G, sentences
such as the following can be generated.
The boy ate a popsicle. The frog kissed a boy. A boy ate the frog.
To generate a sentence, the rules from P are applied sequentially starting with
S and proceeding until all nonterminal symbols are eliminated. As an example,
the first sentence given above can be generated using
the following sequence of rewrite rules:
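Using the example rules above, one such derivation is
S → NP VP → ART N VP → The N VP → The boy VP → The boy V NP
→ The boy ate NP → The boy ate ART N → The boy ate a N
→ The boy ate a popsicle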
It should be clear that a grammar does not guarantee
the generation of meaningful sentences, only that they
are structurally correct. For example, a grammatically
correct, but meaningless sentence like "The popsicle
flew a frog" can be generated with this grammar.
Type 1 grammars : are also called context-sensitive grammars. They have the
added restriction that the length of the string on the right-hand side of the
rewrite rule must be at least as long as the string on the left-hand side.
Furthermore, in productions of the form xyz → xwz, y must be a single
nonterminal symbol and w a nonempty string. Typical rewrite rules for type 1
grammars take the forms
S → aS    S → aAB    AB → BA    aA → ab    aA → aa
where the capitalized letters are nonterminals and the lower case letters
terminals.
Type 2 grammar : also known as a context-free grammar. It is characterized by
rules of the general form <symbol> → <symbol1> . . . <symbolk> where k ≥ 1
and the left-hand side is a single nonterminal symbol; that is, rules A → xyz
where A is a single nonterminal. Productions for this type take forms such as
S → aS    S → aSb    S → aB    S → aAB    A → a    B → b
Type 3 Grammar : The final and most restrictive type is type 3. It is also called
a finite state or regular grammar, whose rules are characterized by the forms
A → aB    A → a
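For example, the type 3 rules S → aS and S → b generate the regular language
a*b, i.e. strings such as b, ab, aab, and so on.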
The languages generated by these grammars are also termed types 0, 1
(context-sensitive), 2 (context-free), and 3 (regular), corresponding to the
grammars which generate them.
The regular and context-free languages are the most widely studied languages
(formal programming languages).
Structural Representations
It is convenient to represent sentences as a tree or graph to help expose the
structure and constituent parts. For example the sentence "The boy ate a
popsicle" can be represented as the tree structure below. Such structures are
called phrase markers because they help to mark or identify the phrase
structures in a sentence.
The root node of the tree in the figure below corresponds to the whole sentence S
and the constituent parts of S are subtrees. For this example, the left subtree
is a noun phrase and the right subtree a verb phrase. The leaf or terminal nodes
contain the terminal symbols from Vt.
A tree structure such as the above represents a large number of English
sentences. For purposes of computation, a tree must be represented as a
record, a list or a similar data structure. For example, a list can be used to
represent a tree structure as shown below.
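For example, the tree for "The boy ate a popsicle" can be written as the nested list
(S (NP (ART the) (N boy)) (VP (V ate) (NP (ART a) (N popsicle))))
where each sublist names a constituent followed by its parts.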
A more extensive English grammar can be obtained with the addition of other
constituents such as prepositional phrases PP, adjectives ADJ, determiners
DET, adverbs ADV, auxiliary verbs AUX, and so on. Additional rewrite rules
permitting the use of these constituents could include some of the following:
PP → PREP NP    VP → V ADV    VP → V PP    VP → V NP PP
VP → AUX V NP    DET → ART ADJ    DET → ART
These extensions broaden the types of sentences that can be generated by
permitting the added constituents in sentence forms such as
The cute girl worked to make some extra money.
These sentences have the form S → NP VP PP.
Transformational Grammars
The generative grammars described above produce different structures for
sentences having different syntactical forms even though they may have the
same semantic content. For example, the active and passive forms of a
sentence will result in two different phrase marker structures.
The sentences "Joe kissed Sue" (active voice) and "Sue was kissed by Joe"
(passive voice) result in the structures depicted in Figure 12.2 where the
subject and object roles for Joe and Sue are switched.
Obtaining different structures from sentences having the same meaning is
undesirable in language understanding systems. Sentences with the same
meaning should always map to the same internal knowledge structures.
In an attempt to repair these shortcomings in generative grammars, Chomsky
extended them by incorporating two additional components to the basic
syntactic component. The added components provide a mechanism to produce
single representations for sentences having the same meanings through a series
of transformations. This extended grammar is called a transformational
generative grammar.
Its additions include a semantic component and a phonological component. They
are used to interpret the output of the syntactic component by producing
meanings and sound sequences. The transformations are tree manipulation
rules which depend on the use of an extended lexicon (dictionary) containing a
number of semantic features for each word.
Case Grammars
In case grammars, a sentence is defined as being composed of a proposition P,
a tenseless set of relationships among verbs and noun phrases, and a modality
constituent M, composed of mood, tense, aspect, negation, and so on.
Thus, a sentence can be represented as
S → M + P, where P consists of one or more distinct cases C1, C2, ..., Ck,
i.e. P → C1 + C2 + ... + Ck.
The original list contained only six cases :
agentive (or agent) : the case of an instigator of an action
instrumental : the case of an instrument or object used in an action
objective : the case of the object receiving the action or change
dative : an animate entity affected by an action
factitive : the case of the object or being that results from the event
locative : the case of the location of the event
E.g. : In the sentence "The soldier struck the suspect with the rifle butt", the
soldier is the agentive case, the suspect the objective case, and the rifle butt
the instrumental case.
Additional cases or substitutes including beneficiary, source, destination, to or
from, goal, and time have been introduced.
Case frames are provided for verbs to identify allowable cases. They give the
relationships which are required and those which are optional. For the above
sentence, a case frame for the verb struck might be
STRUCK[OBJECTIVE (AGENTIVE) (INSTRUMENTAL)]
This may be interpreted as stating that the verb struck must occur in sentences
with a noun phrase in the objective case and optionally (parentheses indicate
optional use) with noun phrases in the agentive and instrumental cases.
A tree representation for a case grammar identifies words by their modality
and case. For example, a case grammar tree for the sentence "Sue did not take
the car" is illustrated in Figure 12.4.
To build a tree structure like this requires that a word lexicon with sufficient
information be available from which to determine the case of sentence elements.
Systemic Grammars
Systemic grammars emphasize function and purpose in the analysis of
language. They attempt to account for the personal and social aspects which
influence communication through language. As such, context and pragmatics
play a more important role in systemic grammars.
Systemic grammars were introduced by Michael Halliday, and later used by
Winograd, in an attempt to account for the principles that underlie the
organization of different structures. They classify language by three functions
which relate to content, purpose, and coherence.
1. The ideational function relates to the content intended by the speaker. It
provides information about the kinds of activities being described, who the
actors are, whether there are other participants, and the circumstances related
to time and place. This function is similar to that of the case grammars.
2. The interpersonal function is concerned with the purpose and mood of the
statements, whether a question is being asked, an answer being given, a
request being made, an opinion being offered, or information given.
3. The textual function dictates the necessity for continuity and coherence
between the current and previously stated expressions. This function is
concerned with the theme of the conversation, what is known, and what is
newly expressed.
Halliday proposed a model of language which consisted of four basic categories.
• Language units. A hierarchy for sentences based on the sentence, clause,
phrase group, word, and morpheme.
• Role structure of units. A unit consists of one or more units of lower rank
based on its role, such as subject, predicate, complement, or adjunct.
• Classification of units. Units are classified by the role they play at the next
higher level. For example, the verbal serves as the predicate, the nominal
serves as the subject or complement, and so on.
• System constraints. These are constraints in combining component
features. For example, the network structure given below depicts the
constraints in an interpretation.
Given these few principles, it is possible to build a grammar which combines
much semantic information with the syntactic.
Semantic Grammars
Semantic grammars encode semantic information into a syntactic grammar.
They use context-free rewrite rules with nonterminal semantic constituents.
The constituents are categories or metasymbols such as attribute, object,
present (as in example), and ship, rather than NP, VP, N, V, and so on. This
approach greatly restricts the range of sentences which can be generated and
requires a large number of rewrite rules.
Semantic grammars have been used in applications including LIFER, a data
base query system used by the Navy which is accessible through ARPANET,
and a tutorial system named SOPHIE which is used to teach the debugging of
circuit faults.
Rewrite rules in these systems essentially take forms like the LIFER rule
shown below.
In the LIFER system, there are rules to handle various forms of wh-queries
such as What is the name and location of the carrier nearest to New York?
These sentences are analyzed and words matched to metasymbols contained in
lexicon entries. For example, the input statement "Print the length of the
Enterprise" would fit with the LIFER top grammar rule (L.T.G.) of the form
<L.T.G.> → <PRESENT> the <ATTRIBUTE> of <SHIP>
where print matches <PRESENT>, length matches <ATTRIBUTE>, and the
Enterprise matches <SHIP>.
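As a concrete sketch, such a pattern can be written as PROLOG rules in the
difference-list style used for parsing later in this unit; the clauses below are an
illustration of the idea, not LIFER's actual implementation :
% <L.T.G.> → <PRESENT> the <ATTRIBUTE> of <SHIP>
ltg(A, D) :- present(A, [the|B]), attribute(B, [of|C]), ship(C, D).
present([print|T], T).           % words matching the <PRESENT> metasymbol
attribute([length|T], T).        % words matching <ATTRIBUTE>
ship([the, enterprise|T], T).    % phrases matching <SHIP>
The query ?- ltg([print, the, length, of, the, enterprise], []). then succeeds,
matching the input statement to the pattern.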
Semantic grammars are suitable for use in systems with a restricted grammar,
since the computation required is limited. But they become unwieldy when
used in general purpose language understanding systems.
BASIC PARSING TECHNIQUES
Before the meaning of a sentence can be determined, the meanings of its
constituent parts must be established. This requires a knowledge of the
structure of the sentence, the meanings of individual words and how the words
modify each other. The process of determining the syntactical structure of a
sentence is known as parsing. Parsing is the process of analyzing a sentence by
taking it apart word-by-word and determining its structure from its constituent
parts and subparts.
• The structure of a sentence can be represented with a syntactic tree /list.
• The parsing process is the inverse of the sentence generation process since
it involves finding a grammatical sentence structure from an input string.
• When given an input string, the lexical parts (root words) must first be
identified by type, and then the role they play in a sentence must be
determined. These parts can then be combined successively into larger
units until a complete tree structure has been built.
• To determine the meaning of a word, a parser must have access to a lexicon.
When the parser selects a word from the input stream it locates the word in
the lexicon and obtains the word's possible function and other features,
including semantic information. This information is then used in building a
tree or other representation structure. The general parsing process is
illustrated in Figure 12.5.
The Lexicon
A lexicon is a dictionary of words (usually morphemes or root words together
with their derivatives), where each word entry contains some syntactic,
semantic, and possibly some pragmatic information. The information in the
lexicon is needed to help determine the function and meanings of the words in
a sentence.
Each entry in a lexicon will contain a root word called the head. Different
derivatives of the word, if any, will also be given, together with the roles it can
play in a sentence (e.g. its part of speech and sense). Refer to the figure for a
fragment of a simplified lexicon.
The abbreviations 1s, 2s, 3p stand for first person singular, second person
singular, third person plural respectively. Note that some words have more
than one type such as can which is both a noun and a verb, and orange which
is both an adjective and a noun.
A lexicon may also be organized to contain separate entries for words with
more than one function by giving them separate identities, can1 and can2.
Alternatively, the entries in a lexicon could be grouped and given by word
category (by articles, nouns, pronouns, verbs, and so on).
The organization and entries of a lexicon vary from one implementation to
another, but they are made up of variable length data structures such as lists
or records arranged in alphabetical order.
Access to the words may be facilitated by indexing, binary searches, hashing,
or combinations of these methods. A lexicon may also be partitioned into a
base lexicon of general, frequently used words and domain-specific
components.
Transition Networks
• Popular method used to represent formal and natural language structures.
• Based on application of directed graphs (digraphs) and finite state automata.
• Consists of a number of nodes and labeled arcs. The nodes represent different
states in traversing a sentence, and the arcs represent rules or test conditions
required to make the transition from one state to the next.
• A path through a transition network corresponds to a permissible sequence of
word types for a given grammar. Thus, if a transition network is successfully
traversed, it will have recognized a permissible sentence structure.
For example, a network used to recognize a sentence consisting of a
determiner, a noun, and a verb ("The child runs") is represented by the graph
shown in the figure.
Starting at node N1, the transition from node N1 to N2 will be made if a
determiner is the first input word found. If successful, state N2 is entered. The
transition from N2 to N3 can then be made if a noun is found next. The final
transition (from N3 to N4) will be made if the last word is a verb. If the three
word category sequence is not found,
the parse fails.
Clearly, this type of network is very
limited since it will only recognize
simple sentences of the form DET N V.
The utility of a network such as this
could be increased if more than a single
choice were permitted at some of the nodes. For example, if several arcs were
constructed between nodes N1 and N2 where each arc represented a different
noun phrase, the number of permissible sentence types would be increased.
These alternatives are depicted in Figure 12.7.
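The traversal idea can be sketched directly in PROLOG (the same language
used for the parser later in this unit); the arc, node, and lexicon entries below
are illustrative assumptions, not part of the notes :
% arcs of the simple DET-N-V network : arc(FromNode, ToNode, WordCategory)
arc(n1, n2, det).
arc(n2, n3, noun).
arc(n3, n4, verb).
final(n4).                      % n4 is the accepting state
% a tiny illustrative lexicon
word(det, the).     word(det, a).
word(noun, child).  word(noun, dog).
word(verb, runs).   word(verb, barks).
% traverse(Node, Words) succeeds if Words spells a path from Node to a final state
traverse(Node, []) :- final(Node).
traverse(Node, [W|Rest]) :- arc(Node, Next, Cat), word(Cat, W), traverse(Next, Rest).
A query such as ?- traverse(n1, [the, child, runs]). succeeds, while
?- traverse(n1, [the, runs]). fails, mirroring acceptance and rejection by the
network.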
Top-Down versus Bottom-up Parsing
Parsers may be designed to process a sentence using either a top-down or a
bottom-up approach.
A top-down parser begins by hypothesizing a sentence (the symbol S) and
successively predicting lower level constituents until individual preterminal
symbols are written. These are then replaced by the input sentence words
which match the terminal categories. For example, a possible top-down parse
of the sentence "Kathy jumped the horse" is given in the figure.
A bottom-up parse begins with the actual words appearing in the sentence and
is, therefore, data driven. A possible bottom-up parse of the same sentence is
also shown in the figure.
Words in the input sentence are replaced with
their syntactic categories and those in turn are
replaced by constituents of the same or smaller
size until S has been rewritten or until failure
occurs.
Deterministic versus Nondeterministic Parsers
Parsers may also be classified as deterministic or nondeterministic depending
on the parsing strategy employed.
A deterministic parser :
❖ permits only one choice (arc) for each word category. Thus, each arc will
have a different test condition.
❖ Consequently, if an incorrect test choice is accepted from some state, the
parse will fail since the parser cannot backtrack to an alternative choice.
➢ This may occur, for example, when a word satisfies more than one
category such as a noun and a verb or an adjective, noun, and verb.
❖ Clearly, in deterministic parsers, care must be taken to make correct test
choices at each stage of the parsing.
➢ This can be facilitated with a look-ahead feature which checks the
categories of one or more subsequent words in the input sentence before
deciding in which category to place the current word.
❖ Some researchers prefer to use deterministic parsing since they feel it more
closely models the way humans parse input sentences.
Nondeterministic parsers :
❖ permit different arcs to be labeled with the same test. Consequently, the
next test from any given state may not be uniquely determined by the state
and the current input word.
❖ The parser must guess at the proper constituent and then backtrack if the
guess is later proven to be wrong.
❖ This will require saving more than one potential structure during parts of
the network traversal.
Examples of both deterministic and nondeterministic parsers are presented in
Figure 12.8.
Suppose the following sentence is given to a deterministic parser with the
grammar given by the network of Figure 12.8(a): "The strong bear the loads."
If the parser chose to recognize strong as an adjective and bear as a noun, the
parse would fail, since there is no verb following bear. A nondeterministic
parser would simply recover by backtracking when the failure was detected
and then take another arc which accepts strong as a noun (and bear as a verb).
Example of a Simple Parser in Prolog
It is a straightforward task to write parsers in a language like PROLOG. For
example, the grammar rule that states that S is a sentence if it is a noun
phrase followed by a verb phrase (S → NP VP) may be written in PROLOG as
sentence(A,C) :- nounPhrase(A,B), verbPhrase(B,C).
The variables A, B and C in this statement represent lists of words. The
argument A is the whole list of words to be tested as a sentence and C is the
list of remaining words, if any.
An NP may be defined with statements such as the following :
nounPhrase(A,C) :- article(A,B), noun(B,C).
nounPhrase(A,B) :- noun(A,B).
These rules state that (1) a noun phrase can be an article (consuming the front
of list A and leaving remainder B) followed by a noun (consuming the front of B
and leaving remainder C), or (2) simply a noun consisting of the front of list A
with remainder B (if any).
Similarly, a verb phrase may be defined with rules like the following :
verbPhrase(A,B) :- verb(A,B).
verbPhrase(A,C) :- verb(A,B), nounPhrase(B,C).
verbPhrase(A,C) :- verb(A,B), prepositionPhrase(B,C).
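These clauses become runnable once word-level facts are added in the same
difference-list style; the small lexicon below is an illustrative assumption :
% each fact consumes one word from the front of the list and returns the rest
article([the|T], T).     article([a|T], T).
noun([boy|T], T).        noun([popsicle|T], T).     noun([frog|T], T).
verb([ate|T], T).        verb([kissed|T], T).       verb([flew|T], T).
preposition([with|T], T).
prepositionPhrase(A,C) :- preposition(A,B), nounPhrase(B,C).
With these definitions, the query ?- sentence([the,boy,ate,a,popsicle], []).
succeeds, confirming that the word list is accepted as a sentence of the
grammar.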
Recursive Transition Networks
The simple networks described above are not powerful enough to recognize the
variety of sentences in a human language system. Other extensions are needed
to accept a wider range of
sentences.
We can achieve such extensions by labeling some arcs as a separate network
state (such as an NP) and then constructing a subnetwork which recognizes
the different noun phrases required. In this way, a single subnetwork for an NP
can be called from several places in a sentence. Similar arcs can be labeled for
other sentence constituents including VP, PP (prepositional phrases) and
others. With these additions, complex sentences having embedded phrases can
be parsed with relatively simple networks. This leads directly to the notion of
using recursion in a network.
A recursive transition network (RTN) is a transition network which permits arc
labels to refer to other networks (including the network's own name), and they
in turn may refer back to the referring network, rather than just permitting the
word categories used previously. For example, an RTN is illustrated in Figure
12.9, where the main network calls two subnetworks, an NP and a PP network,
as illustrated in Figures 12.9(b) and (c).
o The top network in the figure is the top level (sentence) network, and the
lower level networks are for NP and PP arc states. The arcs corresponding to
these states will be traversed only if the corresponding subnetworks (b) or
(c) are successfully traversed.
o In traversing a network, the arcs are tested in a clockwise order.
o Thus, in the top level RTN, the NP arc will be tried first. If this arc fails, the
arc labeled AUX will be tested next.
o During the traversal of an RTN, a record must be maintained of the word
position in the input sentence and the current state or node position and
return nodes to be used as return points when control has been transferred
to a lower level network. This information can be maintained as a triple
POS CND RLIST, where POS is the current input word position, CND is the
current node, and RLIST is the list of return points (maintained as a stack).
In Figure 12.9, the arc POP is used as a dummy arc to signal the successful
completion of the subnetwork and a return to the node following the arc from
which it was called.
Other arc types are shown in Table 12.1 and explained below :
CAT arc : represents a test for a specific word category such as a noun or a
verb. When the input word is of the specified category, the CAT test succeeds
and the input pointer is advanced to the next word.
JUMP arc : may be traversed without satisfying any test condition, in which
case the word pointer is not advanced.
PUSH arc : initiates a call to another network with the indicated state (such as
an NP or PP). When a PUSH arc is taken, a return state must be saved. This is
accomplished by pushing the return pointer onto a stack.
TEST arc : allows the use of an arbitrary test to determine if the arc is to be
taken. For example, it can be used to determine if a sentence is declarative or
interrogative, if one or more negatives occur, and so on.
WORD arc : corresponds to a specific word test such as to, from, and at.
To see how an interpreter operates with the grammar given for the RTN of
Figure 12.9, we apply it to the following sentence (the numbers mark the word
positions):
1 The 2 big 3 tree 4 shades 5 the 6 old 7 houses 8 by 9 the 10 stream 11
o Starting with CND set to S1, POS set to 1, and RLIST set to nil, the first arc
test (NP) would be tried. Since this test is for a state, the parser would
PUSH the return node S2 onto RLIST, set CND to N1, and call the NP network.
o Trying the first test DET (a CAT test) in the NP network, a match is found
with word position 1. CND is updated to N2 and POS to position 2. The next
word (big) satisfies the ADJ test, causing CND to be updated to N2 again and
POS to position 3. The ADJ test is then repeated for the word tree, but it fails.
Hence, the arc test for N is made next with no change made to POS and
CND. This time the test succeeds, with updates of N4 to CND and position 4 to
POS. The next test is the POP, which signals a successful completion of the
NP network and causes the return node (S2) to be retrieved from the RLIST
stack and CND to be updated to S2. POP does not cause an advance in
the word position POS.
o The only possible test from S2 is for category V, which succeeds on the word
"shades" with resultant updates of S5 to CND and 5 to POS. At S5, the only
possible test is the NP. This again invokes a call to the lower level NP
network, which is traversed successfully with the noun phrase "the old
houses." After a return to the main network, CND is set to S6 and POS is set
to position 8. At this point, the lower PP network is called with CND being
set to P1 and S6 pushed onto RLIST. From P1, the CAT test for PREP passes
with CND being set to P2 and POS being set to 9. NP is then called with
CND being set to N1 and P3 being pushed onto RLIST. As before, the NP
network is traversed with the noun phrase "the stream," resulting in a POS
value of 11, P3 being popped from RLIST and a return to that node. The test
at P3 (POP) results in S6 being popped from RLIST and a return to the S6
node. Finally, the POP test at S6, together with the period at position 11,
results in a successful traversal and acceptance of the sentence.
During a network traversal, a parse can fail if :
(1) the end of the input sentence (a period) has been reached when the test
from the CND node value is not a terminal (POP) value or
(2) if a word in the input sentence fails to satisfy any of the available arc tests
from some node in the network.
The number of sentences accepted by an RTN can be extended if backtracking
is permitted when a failure occurs. This requires that states having alternative
transitions be remembered until the parse progresses past possible failure
points. In this way, if a failure occurs at some point, the interpreter can
backtrack and try alternative paths. The disadvantage with this approach is
that parts of a sentence may be parsed more than once, resulting in excessive
computation.
Augmented Transition Networks
The networks considered so far are not very useful for language understanding.
They have only been capable of accepting or rejecting a sentence based on the
grammar and syntax of the sentence. To be more useful, an interpreter must
be able to build structures which will ultimately be used to create the required
knowledge entities for an AI system. Furthermore, the resulting data structures
should contain both syntactic information and semantic information (such as
the subject NP, the object NP, the subject-verb number agreement, the mood -
declarative or interrogative, tense…). Hence additional tests must be performed
to determine the possible semantics a sentence may have.
We can achieve the additional capabilities required by augmenting an RTN with
the ability to perform additional tests and store intermediate results as a
sentence is being parsed. When an RTN is given these additional features, it is
called an augmented transition network or ATN.
The ATN is a top-down parsing procedure which was first used in the LUNAR
system (for lunar geology). In an ATN, the class of labels that can be attached
to the arcs defining transitions between states has been augmented. Arcs may
be labeled with :
- Specific words, such as "in".
- Word categories, such as "noun".
- Pushes to other networks that recognize major components of a sentence.
E.g. : a network designed to recognize a prepositional phrase (PP) may
include an arc that asks for ("pushes for") an NP.
- Procedures that perform arbitrary tests on the current input and on sentence
components which have already been identified.
- Procedures that build structures that will form part of the final parse.
Figure 15.8 shows an example of an ATN in graphical notation. To see how an
ATN works, let us trace it on the sentence: The long file has printed.
Execution proceeds as follows:
1. Begin in state S.
2. Push to NP.
3. Do a category test to see if "the" is a determiner.
4. This test succeeds, so set the DETERMINER register to DEFINITE and go
to state Q6.
5. Do a category test to see if "long" is an adjective.
6. This test succeeds, so append "long" to the empty ADJS register. Stay in
state Q6.
7. Do a category test to see if "file" is an adjective. This test fails.
8. Do a category test to see if "file" is a noun. This test succeeds, so set the
NOUN register to "file" and go to state Q7.
9. Push to PP.
10. Do a category test to see if "has" is a preposition. This test fails, so pop
and signal failure.
11. There is nothing else that can be done from state Q7, so pop and return
the structure (NP (FILE (LONG) DEFINITE)).
The return causes the machine to be in state Q1, with the SUBJ register
set to the above structure and the TYPE register set to DCL.
12. Do a category test to see if "has" is a verb. This test succeeds, so set the
AUX register to NIL and set the V register to "has." Go to state Q4.
13. Push to state NP. Since the next word, "printed," is not a determiner or a
proper noun, NP will pop and return failure.
14. The only other thing to do in state Q4 is to halt. But more input remains,
so backtracking is now required.
15. The last choice point was at state Q1, so return. The registers AUX and V
must be unset.
16. Do a category test to see if "has" is an auxiliary. This test succeeds, so
set the AUX register to "has" and go to state Q3.
17. Do a category test to see if “printed" is a verb. This test succeeds, so set
the V register to "printed." Go to state Q4.
18. Now, since the input is over, Q4 is an acceptable final state. Pop and
return the structure
(S DCL (NP (FILE (LONG) DEFINITE))
HAS
(VP PRINTED))
This structure is the output of the parse.
Note that :
- A single subnetwork occurs only once, but can be used more than once.
- A network can be called recursively.
- Any number of internal registers may be used to contain results of the parse.
- The result of a network can be built using the function BUILDQ.
- A single state may be both a final state (a complete sentence has been found)
and an intermediate state (only a part of a sentence has been recognized).
- The contents of a register can be modified at any time.
ATNs can be used in many flexible ways :
• The contents of registers can be swapped.
• Arbitrary tests can be placed on the arcs.
SEMANTIC ANALYSIS AND REPRESENTATION STRUCTURES
The semantic interpretation can be the most difficult stage in the
transformation process. As an example of some of the difficulties encountered
in extracting the full intended meaning of some utterances, consider the
following situation.
It turned into a black day. In his haste to catch the flight, he backed over Tom's
bicycle. He should never have left it there. It was damaged beyond repair. That
caused the tailpipe to break. It would be impossible to make it now .... It was
all because of that late movie. He would be heartbroken when he found out
about it.
• Although a car was never explicitly mentioned, it must be assumed that a
car was the object which backed over Tom's bicycle. A program must be
able to infer this.
• The "black day" metaphor also requires some inference. Days are not
usually referred to by color.
• And sorting out the pronoun references is also a difficult task for a program.
Of the seven uses of it, two refer to the bicycle, two to the flight, two refer to
the situation in general, and one to the status of the day.
• There are also four uses of he referring to two different people, and a that
which refers to the accident in general. The placement of the pronouns is
almost random, making it difficult to give any rule of association.
Words that point back or refer to people, places, objects, events, times, and so
on that occurred before, are called anaphors. Their interpretation may require
the use of heuristics, syntactic and semantic constraints, inference, and other
forms of object analysis within the discourse content.
This example demonstrates that language cannot be separated from
intelligence and reasoning. To fully understand the above situation requires
that a program be able to reason about people's goals, beliefs, motives, and
facts about the world in general.
The semantic structures constructed from utterances such as the above, must
account for all aspects of meaning in what is known as the domain, context,
and the task :
The domain refers to the knowledge that is part of the world model the system
knows about. This includes object descriptions, relationships, and other
relevant concepts.
The context relates to previous expressions, the setting and time of the
utterances, and the beliefs, desires, and intentions of the speakers.
A task is part of the service the system offers, such as retrieving information
from a data base, providing expert advice, or performing a language
translation.
Semantic interpretations require that utterances be transformed into coherent
expressions in the form of FOPL clauses, associative networks, frames, or
script like structures that can be manipulated by the understanding program.
There are a number of different approaches to the transformation problem. The
approach we have been following is a two-stage approach in which a syntactic
analysis is first performed and a tree-like structure is produced using a parser.
This stage is followed by the use of a semantic analyzer to produce either an
intermediate or final semantic structure.
Another approach is to transform the sentences directly into the target
structure with little syntactical analysis - use key words to extract meaning.
The approach taken in extracting the meaning of an expression is based on:
(1) whether it can or should be derived by paraphrasing input utterances and
transforming them into structures containing a few generic terms (unit or
lexical semantics, which uses conceptual dependency theory), or
(2) whether meanings are best derived by composing the meanings of clauses
and larger units from the meanings of their constituent parts (compositional
semantics, which uses FOPL or an extended FOPL).
Lexical Semantics Approaches
With this approach, input sentences are transformed through the use of
domain dependent semantic rewrite rules which create the target knowledge
structures. Conceptual dependency structures provide a form of linked
knowledge that can be used in larger structures such as scenes and scripts.
2. Specific word orderings are checked as the structure is being built and
actions initiated as a result of the orderings. For example, if a PP follows the
last word parsed, action is initiated to fill the object slot.
3. Checks are made for specific words or phrases and, if found, specified
actions taken. For example, if an intransitive verb such as listen is found,
an action would be initiated to look for associated words which complete the
phrase beginning with to or for.
For the above tests, there are four types of actions taken.
1. Adding additional structure to a partially built conceptual dependency.
2. Filling a slot with a substructure.
3. Activating another action.
4. Deactivating an action.
These actions build up the CD structure as the input string is parsed. For
example, the action taken for a verb like drank would be to build a
substructure for the primitive action INGEST with unfilled slots for ACTOR,
OBJECT, and TENSE :
(INGEST (ACTOR nil) (OBJECT nil) (TENSE past))
Subsequent words in the input string would initiate actions to add to this
structure and fill in the empty ACTOR and OBJECT slots. Thus, a simple
sentence like
The boy drank a soda
would be transformed through a series of test and action steps to produce a
structure such as the one below.
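Filling the empty slots of the INGEST template above gives, in essence,
(INGEST (ACTOR boy) (OBJECT soda) (TENSE past))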
Compositional Semantics Approaches
In this approach, the meaning of an expression is derived from the meanings
of the parts of the expression. The target knowledge structures constructed in
this approach are logic expressions such as the formulas of FOPL. The LUNAR
system developed by Woods (1978) uses this approach.
The input strings are first parsed using an ATN from which a syntactic tree is
output. This becomes the input to a semantic interpreter which interprets the
meaning of the syntactic tree and creates the semantic representations.
As an example, suppose the following declaration is submitted to LUNAR.
Sample24 contains silicon
This would be parsed, and a tree structure would be output from the ATN as
shown in the figure.
Using this structure, the semantic interpreter would produce the predicate
clause (CONTAIN sample24 silicon).
The interpreter used in LUNAR is driven by semantic pattern → action
interpretation rules. The rule that builds the CONTAIN predicate is selected
whenever the input tree has a verb of the form have or contain and a sample as
the subject and either chemical element, isotope, or oxide as an object.
The action of the rule states that such a sentence is to be interpreted as an
instance of the schema (CONTAIN x y) with x and y being replaced by the ATN's
interpretation of subject noun phrase and object respectively.
LUNAR is also capable of performing quantification of variables in expressions.
The quantifiers include the standard existential and universal quantifiers "for
every" and "for some," and also, "for the," "exactly," "the single," "more than,"
"at least," and so on.
NATURAL LANGUAGE GENERATION
Language generation is the exact inverse of language understanding. The
generation of natural language is more difficult than understanding it, since a
system must not only decide what to say, but also how to state it.
A generation system must decide which form is better (active or passive),
which words and structures best express the intent, and when to say what.
Producing expressions that are natural and close to human requires more
than rules of syntax, semantics, and discourse.
A participant in a dialog must reason about a hearer’s understanding and his
or her knowledge and goals. During the dialog, the system must maintain
proper focus and formulate expressions that either query, explain, direct, lead
or just follow the conversation as appropriate.
The study of language generation falls naturally into three areas :
(1) the determination of content : is concerned with what details to include in
an explanation, a request, a question or argument in order to convey the
meanings set forth by the goals of the speaker. This means the speaker
must know what the hearer already knows, what the hearer needs to
know, and what the hearer wants to know.
(2) formulating and developing a text utterance plan : Text planning is the
process of organizing the content to be communicated so as to best
achieve the goals of the speaker.
(3) achieving a realization of the desired utterances : Realization is the process
of mapping the organized content to actual text. This requires that specific
words and phrases be chosen and formulated into a syntactic structure.
NATURAL LANGUAGE SYSTEMS
Three successful natural language understanding systems are LUNAR, LIFER,
and SHRDLU.
The LUNAR System
The LUNAR system was designed as a language interface to give geologists
direct access to a database containing information on lunar rock and soil
compositions obtained during the NASA Apollo-11 moon landing mission. The
design objective was to build a system that could respond to natural queries
received from geologists such as
What is the average concentration of aluminum in high-alkali rocks?
What is the average of the basalt?
In which samples has apatite been identified?
LUNAR has three main components:
1. A general purpose grammar and an ATN parser capable of handling a large
subset of English - produces a syntactic parse tree of the input sentence.
2. A rule-driven semantic interpreter - transforms the syntactic representation
into a logical form suitable for querying the data base. The rules have the
general form of pattern → action and carry out different tasks such as
answering a query.
3. A data base retrieval and inference component - used to determine answers
to queries and to make changes to the data base.
The system has a dictionary of some 3500 words, an English grammar and two
data bases, one containing a table of chemical analyses of about 13,000
entries, and the other 10,000 indexed document topics.
LUNAR uses a meaning representation language which is an extended form of
FOPL. The language uses :
(1) Designators which name objects or classes of objects like nouns,
variables, and classes with range quantifiers
(2) Propositions that can be true or false, that are connected with logical
operators and, or, not, and quantification identifiers
(3) Commands which carry out specific actions (like TEST which tests the
truth value of propositions against given arguments (TEST (CONTAIN
sample24 silicon))).
The LIFER System
LIFER (Language Interface Facility with Ellipsis and Recursion) was developed
by Gary Hendrix (1978) as a development aid and run-time language interface
to other systems such as a data base management system. Among its special
features are spelling corrections, processing of elliptical inputs, and the ability
of the run-time user to extend the language through the use of paraphrase.
LIFER consists of two major components, a set of interactive functions for
language specifications (used to define an application language as a subset of
English that is capable of interacting with existing software) and a parser
(interprets the language inputs and translates them into appropriate structures
that interact with the application software).
LIFER systems incorporate much semantic information within the syntax.
Rather than using categories like NP, VP, N, and V, LIFER uses semantic
categories like <SHIP-NAME> and <ATTRIBUTE> which match ship names or
attributes.
In place of syntactic patterns like NP VP, semantic patterns like What is the
<ATTRIBUTE> of <SHIP>? are used. For each such pattern, the language
definer supplies an expression with which to compute the interpretations of
instances of the pattern.
The main disadvantage of LIFER is that a large number of patterns must be
defined for a system which requires many diverse patterns.
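The SHRDLU System
SHRDLU was developed by Terry Winograd as a language understanding
system which converses about a simulated blocks world of colored blocks and
pyramids. It has four main components :
(1) a syntactic parser with a large grammar of English (based on systemic
grammar)
(2) a semantic analysis component which converts parsed sentences into
meaning structures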
(3) cognitive deduction component used to examine consequences of facts, carry
out commands, and find answers
(4) an English response generation component.
In addition, there is a knowledge base containing blocks world knowledge, and
a model of its own reasoning process, used to explain its actions.
Knowledge is represented with FOPL-like statements which give the state of the
world at any particular time and procedures for changing and reasoning about
the state.
For example, the expressions in the figure contain facts describing that b1 is a
block, b2 is a pyramid, and b1 supports b2. There are also procedural
expressions to perform different tasks such as clearing the top of an object or
manipulating an object.
The domain of SHRDLU is very limited and closed.
Pattern Recognition
Recognition is the process of establishing a close match between some new
stimulus and previously stored stimulus patterns. This process is being
performed continually throughout the lives of all living things.
Our ability to recognize patterns has motivated much research toward the
discovery of mechanical or artificial methods comparable to those used by
intelligent beings. As a result, numerous applications have emerged. Systems
have now been developed to reliably perform character and speech recognition;
fingerprint and photograph identification; electroencephalogram (EEG),
electrocardiogram (ECG), oil well-log, and other graphical pattern analyses;
various types of medical and system diagnoses; resource identification and
evaluation (geological, forestry, hydrological, crop disease); and detection of
explosive and hostile threats (submarine, aircraft, missile), to name a few.
Object classification is the ability to classify or group objects according to
commonly shared features. It is a form of class recognition and is essential for
decision making, learning, and other cognitive acts. Like recognition,
classification depends on the ability to discover common patterns among objects.
THE RECOGNITION AND CLASSIFICATION PROCESS
The steps in artificial or mechanical recognition are : (illustrated in Figure 13.1)
Step 1. Stimuli produced by objects are perceived by sensory devices. The
prominent attributes (such as size, shape, color, and texture) produce the
strongest stimuli. The values of these attributes and their relations are used
to characterize an object in the form of a pattern vector X using some means
of representation. The range of characteristic attribute values is known as the
measurement space M.
Step 2. A subset of attributes whose values provide cohesive object grouping
or clustering, consistent with some goals associated with the object
classifications, is selected. This subset simplifies the classification process.
The range of the subset of attribute values is known as the feature space F.
Step 3. Using the selected attribute values, object or class characterization
models are learned by forming generalized prototype descriptions,
classification rules, or decision functions. These models are stored for
subsequent recognition. The range of the decision function values or
classification rules is known as the decision space D.
Step 4. Recognition of familiar objects is achieved through application of the
rules learned in Step 3 by comparison and matching of object features with
the stored models. Refinements and adjustments can be performed
continually thereafter to improve the quality and speed of recognition.
There are two basic approaches to the recognition problem, (1) the decision
theoretic approach and (2) the syntactic approach.
LEARNING CLASSIFICATION PATTERNS
Before a system can recognize objects, it must possess knowledge of the
characteristic features for those objects. The system designer must either build
the necessary discriminating rules into the system or the system must learn
them.
In the case of a linear decision function, the weights that define class
boundaries must be predefined or learned. In the case of syntactic recognition,
the class grammars must be predefined or learned.
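For example, a two-class linear decision function over a feature vector
X = (x1, x2) might take the form
d(X) = w1x1 + w2x2 + w0
where an object is assigned to class 1 if d(X) > 0 and to class 2 if d(X) < 0;
learning the class boundary then amounts to finding suitable weights w0, w1,
w2 from the training samples.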
Learning decision functions, grammars, or other rules can be performed in
either of two ways, through supervised learning or unsupervised learning.
• Supervised learning is accomplished by presenting training examples to a
learning unit. The examples are labeled beforehand with their class. The
attribute values and object labels are used by the learning component to
extract and determine pattern criteria for each class. This knowledge is used
to adjust parameters in decision functions or grammar rewrite rules.
• In unsupervised learning, labeled training examples are not available. The
system must be able to perceive and extract relevant properties from the
otherwise unknown objects, find common patterns among them, and
formulate descriptions or discrimination criteria consistent with the goals of
the recognition process.
This form of learning is called clustering. It is the first step in any
recognition process where discriminating features of objects are not known
in advance.
Learning through Clustering
Clustering is the process of grouping or classifying objects on the basis of a close
association or shared characteristics. The objects can be physical or abstract
entities and the characteristics can be attribute values, relations among the
objects, and combinations of both.
For example, the objects might be streets, freeways, and other pathways
connecting two points in a city, and the classifications, the pathways which
provide fast or slow traversal between the points.
Given different objectives, the same set of objects would, in general, be
clustered differently. If the objective given above were changed to finding the
streets and freeways safe for bicycle riding, a different object classification
would result.
Finding the most meaningful cluster groupings among a set of unknown
objects Oi requires that similarity patterns be discovered in the feature space.
The clustering problem gives rise to several subproblems :
1. What set of attributes and relations are most relevant, and what weights
should be given to each? In what order should attributes be measured?
2. What representation formalism should be used to characterize the objects?
3. What representation scheme should be used to describe the cluster
groupings or classifications?
4. What clustering criteria are most consistent with and effective in achieving
the objectives relative to the context or domain?
5. What clustering algorithms can best meet the criteria in 4 within acceptable
time and space complexity bounds?
Due to the combinatorial explosion which results when arranging n objects into an unknown number m of clusters, checking all possible object groupings for patterns is not feasible. Hence methods which examine only the more
promising groupings must be used. Establishing such groupings requires the use of some measure of similarity or association among a set of objects.
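A similarity measure can be quite simple. The short Python sketch below shows two common choices over feature vectors; the function names are introduced here for illustration and do not come from the notes.

import math

def euclidean_distance(a, b):
    # Smaller distance means greater similarity.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Ranges from -1 (opposite direction) to +1 (same direction).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)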
Many clustering algorithms have been proposed for different tasks. One of the
most popular algorithms is the ISODATA method. This method requires that
the number of clusters m be specified, and threshold values t1, t2 and t3 be
determined for use in splitting, merging, or discarding clusters respectively.
The algorithm consists of the following steps :
1. Select m samples as seed points for initial cluster centers. This can be done
by taking the first m points, by selecting m random points, or by taking the
first m points which exceed some mutual minimum separation distance d.
2. Group each sample with its nearest cluster center.
3. After all samples have been grouped, compute new cluster centers (centroids)
for each group.
4. If the split threshold t1 is exceeded for any cluster, split it into two parts and
recompute new cluster centers.
5. If the distance between two cluster centers is less than t2, combine (merge)
the clusters and recompute new cluster centers.
6. If a cluster has fewer than t3 members, discard the cluster.
7. Repeat steps 3 through 6 until no change occurs among cluster groupings or
until some iteration limit has been exceeded.
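To make the procedure concrete, here is a compact Python (NumPy) sketch of these steps. ISODATA is a classical algorithm, but the details below are assumptions made for illustration: in particular, the split test compares each cluster's largest per-axis standard deviation against t1, which is only one reasonable reading of the split threshold, and edge cases (such as every cluster being discarded) are not handled.

import numpy as np

def isodata(samples, m, t1, t2, t3, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose m samples at random as initial cluster centers.
    centers = samples[rng.choice(len(samples), size=m, replace=False)]
    labels = np.zeros(len(samples), dtype=int)
    for _ in range(max_iter):
        # Step 2: group each sample with its nearest cluster center.
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Steps 3, 4, 6: recompute centroids, splitting diffuse clusters
        # and discarding clusters with fewer than t3 members.
        new_centers = []
        for k in range(len(centers)):
            group = samples[labels == k]
            if len(group) < t3:              # Step 6: discard small cluster
                continue
            centroid = group.mean(axis=0)
            spread = group.std(axis=0)
            if spread.max() > t1:            # Step 4: split along widest axis
                offset = np.where(spread == spread.max(), spread.max() / 2, 0)
                new_centers.extend([centroid + offset, centroid - offset])
            else:
                new_centers.append(centroid)
        # Step 5: merge centers closer together than t2.
        merged = []
        for c in new_centers:
            for i, other in enumerate(merged):
                if np.linalg.norm(c - other) < t2:
                    merged[i] = (other + c) / 2
                    break
            else:
                merged.append(c)
        new = np.array(merged)
        # Step 7: stop when the cluster centers no longer change.
        if new.shape == centers.shape and np.allclose(new, centers):
            break
        centers = new
    return centers, labels

A call such as isodata(points, m=3, t1=1.0, t2=0.5, t3=2) would return the final cluster centers together with the last sample-to-cluster assignment.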
Table 16.1 depicts several types of knowledge and possible representation structures. In building a knowledge base, it is necessary to create or modify structures such as these for subsequent use by a performance component (like a theorem prover or inference engine).
To be effective, the newly acquired knowledge should be integrated with existing knowledge in some meaningful way so that nontrivial inferences can be drawn from the resultant body of knowledge.
The knowledge should, of course, be accurate, nonredundant, consistent
(noncontradictory), and fairly complete in the sense that it is possible to
reliably reason about many of the important conclusions for which the system
was intended.
TYPES OF LEARNING
Classification or taxonomy of learning types :
• based on the type of knowledge representation used : predicate calculus, rules, frames
• based on the type of knowledge learned : concepts, game playing, problem solving
• based on area of application : medical diagnosis, scheduling, prediction, …
The classification popular among machine learning researchers is based on the
type of inference strategy employed or the methods used in the learning
process. The five different learning methods under this taxonomy are :
• Memorization (rote learning) :
o simplest form of learning.
o requires the least amount of inference
o accomplished by simply copying the knowledge, in the same form in which it
will be used, directly into the knowledge base.
o We use this type of learning when we memorize multiplication tables (a
small sketch of rote storage follows this list).
• Direct instruction (by being told) :
o requires more inference than rote learning since the knowledge must be
transformed into an operational form before being integrated into the
knowledge base.
o Used when a teacher presents a number of facts directly to us in a
well-organized manner.
• Analogy or analogical learning :
o process of learning a new concept or solution through the use of similar
known concepts or solutions.
o Used when solving problems on an exam where previously learned
examples serve as a guide or when we learn to drive a truck using our
knowledge of car driving.
o We make frequent use of analogical learning.
o requires more inference than the previous methods, as difficult transformations
must be made between the known and the unknown situations.
• Induction :
o used frequently by humans.
o a powerful form of learning which, like analogical learning, also requires
more inferring than the first two methods.
o requires the use of inductive inference, a form of invalid but useful
inference. We use inductive learning when we formulate a general
concept after seeing a number of instances or examples of the concept.
o For example, we learn the concepts of color or sweet taste after
experiencing the sensations associated with several examples of colored
objects or sweet foods.
• Deduction :
o accomplished through a sequence of deductive inference steps using
known facts.
o From the known facts, new facts or relationships are logically derived.
o requires more inference than the other methods.
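To make the rote-learning end of this spectrum concrete, the fragment below (an illustrative Python sketch; the names rote_learn and lookup are invented here) shows learning as nothing more than storing knowledge in exactly the form in which it will be used, with direct retrieval and no inference.

knowledge_base = {}

def rote_learn(fact, answer):
    # Store the knowledge in the same form it will be used: no
    # transformation, no inference, just a direct copy.
    knowledge_base[fact] = answer

def lookup(fact):
    # Recall is a direct retrieval; nothing is inferred.
    return knowledge_base.get(fact)

# Memorizing part of a multiplication table, as in the notes' example.
for a in range(1, 10):
    for b in range(1, 10):
        rote_learn((a, b), a * b)

print(lookup((7, 8)))   # 56, recalled rather than computed at query time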
Learning methods can be either weak methods or knowledge-rich methods.
Weak methods are general-purpose methods in which little or no initial
knowledge is available. They often rely on a form of heuristic search in the
learning process.
KNOWLEDGE ACQUISITION IS DIFFICULT
One of the important lessons learned by AI researchers during the 1970s and
early 1980s is that knowledge is not easily acquired and maintained. It is a
difficult and time-consuming process. Expert and other knowledge-based
systems require an abundant amount of well-correlated knowledge to achieve a
satisfactory level of intelligent performance.
Early expert systems initially had a knowledge base consisting of a few hundred rules, equivalent to less than 10^6 bits of knowledge. In contrast, the capacity of a mature human brain has been estimated at some 10^15 bits of knowledge. The amount of knowledge required for expert system knowledge bases is approximately 10^10 bits.
If we were able to build such systems, it would require on the order of 10^4 person-years, since the time required is directly proportional to the size of the knowledge base.
Hence we must develop better acquisition and learning methods before we can
implement such systems within a realistic time frame. A system's performance is strongly dependent on the level and quality of its knowledge; indeed, "in knowledge lies power."
GENERAL LEARNING MODEL
Learning can be accomplished using a number of different methods such as
learning by memorizing facts, by being told, or by studying examples. Learning
requires that new knowledge structures be created from some form of input.
This new knowledge must then be assimilated into a knowledge base and be
tested in some way for its utility. Testing means that the knowledge should be
used in the performance of some task from which meaningful feedback can be
obtained, where the feedback provides some measure of the accuracy and
usefulness of the newly acquired
knowledge.
A general learning model is shown in the figure, where the environment has been included as part of the overall learner system. The environment may be regarded either as a form of nature which produces random stimuli or as a more organized training source, such as a teacher which provides carefully selected training examples for the learner component.
Some representation language is used for communication between the
environment and the learner. The language may be the same representation
scheme as that used in the knowledge base (such as predicate calculus). When
they are chosen to be the same, we say the single representation trick is being
used. This results in a simpler implementation since it is not necessary to
transform between two or more different representations.
The environment may be a user working at a keyboard, program modules that simulate a particular environment, or real physical sensors which interface with some world environment.
Inputs to the learner component may be physical stimuli of some type or
descriptive, symbolic training examples. These are used to create and modify
knowledge structures in the knowledge base. This same knowledge is used by
the performance component to carry out tasks, such as solving a problem,
playing a game, or classifying instances of some concept.
When given a task, the performance component produces a response
describing its actions in performing the task. The critic module then evaluates
this response relative to an optimal response.
Feedback, indicating whether or not the performance was acceptable, is then
sent by the critic module to the learner component for its subsequent use in
modifying the structures in the knowledge base. If proper learning was
accomplished, the system's performance will have improved with the changes
made to the knowledge base.
This cycle may be repeated a number of times until the performance of the
system has reached some acceptable level, until a known learning goal has
been reached, or until changes cease to occur in the knowledge base after some
chosen number of training examples have been observed.
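The cycle just described can be summarized in a short Python sketch. It is entirely illustrative: the component names (environment, perform, critic), the one-parameter knowledge base, and the trivial classification task are assumptions made here to trace the feedback loop, not any standard implementation.

# An illustrative skeleton of the general learning model's feedback cycle:
# environment -> learner -> knowledge base -> performance -> critic -> learner.

import random

knowledge = {"threshold": 0.0}   # the knowledge base: one tunable parameter

def environment():
    # Training source: numbers labeled positive (1) or not (0).
    x = random.uniform(-1, 1)
    return x, 1 if x > 0.25 else 0

def perform(x):
    # Performance component: classify using the current knowledge.
    return 1 if x > knowledge["threshold"] else 0

def critic(response, correct):
    # Evaluate the response relative to the optimal (correct) response.
    return response == correct

for _ in range(1000):
    x, label = environment()
    response = perform(x)
    if not critic(response, label):
        # Feedback: nudge the stored threshold toward a better value.
        knowledge["threshold"] += 0.01 if response == 1 else -0.01

print(round(knowledge["threshold"], 2))   # should drift toward 0.25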
Factors affecting learning performance include the types of training provided, the form and extent of any initial background knowledge, the type of feedback provided, and the learning algorithms used (Figure 16.2).
PERFORMANCE MEASURES
How do we evaluate the performance of a given system or compare the relative
performance of two different systems? We could attempt to conduct something
like a Turing test on a system. But would this tell us how general or robust the
system is or how easy it is to implement? Such comparisons are possible only
when standard performance measures are available.
Some relative performance characteristics among the different learning
methods are :
Generality. One of the most important performance measures for a learning method is the generality or scope of the method. Generality is a measure of the ease with which the method can be adapted to different domains of application. A completely general algorithm is one with a fixed or self-adjusting configuration that can learn or adapt in any environment or application domain.
Efficiency. The efficiency of a method is a measure of the average time required
to construct the target knowledge structures from some specified initial
structures. The relative efficiency of a method can be defined as the ratio of the
time required for the given method to the time required for a purely random
search to find the target structures.
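Written as a formula (the symbols t_method and t_random are introduced here for convenience; they do not appear in the notes):

relative efficiency = t_method / t_random

so a smaller ratio means the method finds the target structures faster relative to random search.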
Robustness. Robustness is the ability of a learning system to function with unreliable feedback and with a variety of training examples, including noisy ones. A robust system must be able to build tentative structures which are subject to modification or withdrawal if later found to be inconsistent with statistically sound structures.
Efficacy. The efficacy of a system is a measure of the overall power of the
system. It is a combination of the factors generality, efficiency, and robustness.
We say that system A is more efficacious than system B if system A is more
efficient, robust, and general than B.
Ease of implementation. Ease of implementation relates to the complexity of
the programs and data structures and the resources required to develop the
given learning system.
*****