
Artificial Intelligence

Module-VI:
Natural Language Processing

Dr. Dwiti Krishna Bebarta

Text Books Referred:

1. Artificial Intelligence, Elaine Rich and Kevin Knight, Tata McGraw-Hill Publications
2. Introduction to Artificial Intelligence & Expert Systems, Patterson, PHI Publications
Natural Language Processing
Topics to discuss:
• Steps in Natural Language Processing
• Syntactic Processing and Augmented Transition Nets
• Semantic Analysis and grammars
• Discourse and pragmatic processing
Planning:
• Components of a Planning System
• Goal Stack Planning
• Non-linear Planning using Constraint Posting
• Hierarchical Planning
• Reactive Systems
Introduction to Natural Language Processing
• Language is meant for communicating with the world.
• By studying language, we can come to understand more about
the world.
• If we can succeed at building computational models of language,
we will have a powerful tool for communicating with the world.
• We look at how we can exploit knowledge about the world, in
combination with linguistic facts, to build computational natural
language systems.
The natural language processing (NLP) problem can be divided into
two tasks:
• Processing written text, using lexical, syntactic, and semantic
knowledge of the language as well as the required real-world
information.
• Processing spoken language, using all the information needed
above plus additional knowledge about phonology, as well as
enough added information to handle the further ambiguities that
arise in speech.
• Natural language processing defines the steps machines follow to analyze,
categorize, and understand spoken and written language.
• Modern NLP systems rely on deep neural-network-style machine learning to
mimic the brain's capacity to learn and process data correctly.
• Businesses use tools and algorithms that follow NLP to gather insights
from large data sets and make informed business decisions.
Some NLP business applications include:
• Text to speech: Converting text data, then reproducing it as
natural-sounding speech
• Chatbots: Helping chatbots understand and respond to customer inquiries
• Urgency detection: Analysing language to prioritize tasks
• Natural language understanding: Converting speech to text and analyzing
its intent
• Autocorrect: Detecting and removing text errors and suggesting
alternatives
• Sentiment analysis: Revealing the perceptions people have of your goods
and services and those of your competitors
• Speech recognition: Powering applications that understand users’ voices
and translating their meaning
A full NLU system would be able to:
• Paraphrase an input text.
• Translate the text into another language.
• Answer questions about the contents of the text.
• Draw inferences from the text.
Lexical/Morphological analysis
• The lexicon describes the understandable vocabulary that makes up a language.
• Lexical analysis deciphers and segments language into units
(lexemes) such as paragraphs, sentences, phrases, and words.
• NLP algorithms categorize words into parts of speech (POS) and split lexemes
into morphemes: meaningful language units that cannot be divided further.
There are 2 types of morphemes:
• Free morphemes function independently as words (like "cow" and "house").
• Bound morphemes combine into larger words. The word "unimaginable" contains
the morphemes "un-" (a bound morpheme signifying negation),
"imagine" (the free morpheme root of the whole word), and "-able" (a bound
morpheme denoting ability).
• Consider the sentence "I want to print Bill's .init file".
The morphological analysis must do the following things:
• Pull apart the word "Bill's" into the proper noun "Bill" and the possessive suffix "'s".
• Recognize the sequence ".init" as a file extension that is functioning as an
adjective in the sentence.
• This process will usually assign syntactic categories to all the words in the
sentence.
Morphological analysis is the task of segmenting a word into
its morphemes:
• carried : carry + ed (past tense)
• independently : in + (depend + ent) + ly
• Googlers : (Google + er) + s (plural)
• unlockable : un + (lock + able) ? or (un + lock) + able ?
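The segmentations above can be sketched with a toy affix-stripping segmenter. This is a minimal illustration, not a real morphological analyzer: the affix lists are invented, and spelling changes (carry + ed → "carried") are not handled.

```python
# Toy morphological segmenter: strips known prefixes/suffixes from a
# word. A real analyzer needs a lexicon plus rules for spelling
# changes; this sketch only handles plain concatenation.

PREFIXES = ["un", "in"]
SUFFIXES = ["able", "ly", "ers", "ed", "er", "s"]

def segment(word):
    """Return a list of morphemes found by greedy affix stripping."""
    morphemes = []
    for p in PREFIXES:
        # Only strip if a plausible root remains.
        if word.startswith(p) and len(word) > len(p) + 2:
            morphemes.append(p + "-")
            word = word[len(p):]
            break
    suffixes = []
    changed = True
    while changed:                      # strip suffixes repeatedly
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s) + 2:
                suffixes.insert(0, "-" + s)
                word = word[:-len(s)]
                changed = True
                break
    return morphemes + [word] + suffixes

print(segment("unlockable"))     # ['un-', 'lock', '-able']
print(segment("independently"))  # ['in-', 'dependent', '-ly'] (toy: "-ent" not split)
```

Note that the "unlockable" case shows only one of the two bracketings mentioned above; resolving un + (lock + able) versus (un + lock) + able requires semantic knowledge.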
Syntactic analysis
• Syntax describes how a language’s words and phrases
arrange to form sentences.
• Syntactic analysis checks word arrangements for proper
grammar.
• For instance, the sentence "Dave wrote the paper" passes a
syntactic analysis check because it is grammatically correct.
• On the other hand, a syntactic analysis categorizes sentences
like "Dave do jumps" or "Boy the go to the store" as
syntactically incorrect.
• A syntactic analysis must exploit the results of the
morphological analysis to build a structural description of
the sentence.
• The goal of this process, called parsing, is to convert the flat
list of words that forms the sentence into a structure that
defines the units represented by that flat list.
• The important thing here is that a flat sentence has been
converted into a hierarchical structure, and that the
structure corresponds to meaning units when semantic
analysis is performed.
• Reference markers (a set of entities) are shown in
parentheses in the parse tree.
• Each one corresponds to some entity that has been mentioned in
the sentence.
• These reference markers are useful later, since they provide
a place in which to accumulate information about the
entities as we get it.
Parse tree for "I want to print Bill's .init file" (figure). Labels used:
S: sentence, NP: noun phrase, VP: verb phrase, V: verb, PRO: pronoun,
ADJS: adjectives.


Semantic analysis
The semantic analysis must do two important things:
• It must map individual words into appropriate objects in the
knowledge base or database.
• It must create the correct structures to correspond to the way the
meanings of the individual words combine with each other.
• Semantics describe the meaning of words, phrases, sentences,
and paragraphs.
• Semantic analysis attempts to understand the literal meaning of
individual language selections, not syntactic correctness.
• For instance, "Manhattan calls out to Dave" passes a syntactic
analysis because it's a grammatically correct sentence.
• However, it fails a semantic analysis. Because Manhattan is a
place (and can’t literally call out to people), the sentence’s
meaning doesn’t make sense.
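The idea behind this example, that a verb places restrictions on what can fill its roles, can be sketched as a selectional-restriction check. The knowledge-base entries, feature names, and verb table below are invented for illustration:

```python
# Toy semantic check using selectional restrictions: a verb like
# "calls out" requires an animate agent, so "Manhattan calls out
# to Dave" fails even though it is syntactically well formed.

KB = {
    "Manhattan": {"type": "place", "animate": False},
    "Dave":      {"type": "person", "animate": True},
}

VERBS = {
    "calls_out": {"agent_must_be_animate": True},
}

def semantically_ok(agent, verb):
    """Accept the pairing only if the verb's agent restriction holds."""
    entry = KB[agent]
    needs = VERBS[verb]
    if needs["agent_must_be_animate"] and not entry["animate"]:
        return False
    return True

print(semantically_ok("Dave", "calls_out"))       # True
print(semantically_ok("Manhattan", "calls_out"))  # False
```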
A Knowledgebase Fragment
Discourse integration
• Discourse describes communication between two or more
individuals.
• Discourse integration analyses prior words and sentences to
understand the meaning of ambiguous language.
• For instance, if one sentence reads, "Manhattan speaks to all its
people," and the following sentence reads, "It calls out to Dave,"
– discourse integration checks the first sentence for context to
understand that "It" in the latter sentence refers to Manhattan.
• "I want to print Bill's .init file"
• Specifically, we do not know whom the pronoun "I" or the proper
noun "Bill" refers to.
• To pin down these references requires an appeal to a model of
the current discourse context, from which we can learn that the
current user is USER068 and that the only person named "Bill"
about whom we could be talking is USER073.
• Once the correct referent for Bill is known, we can also determine
exactly which file is being referred to.
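A minimal sketch of such a discourse-context lookup, using the user ids from the text; the class and method names are invented:

```python
# Toy discourse context: resolve "I" to the current user, and a
# proper name to the unique known entity bearing that name, as in
# the "print Bill's .init file" example.

class DiscourseContext:
    def __init__(self, current_user, people):
        self.current_user = current_user   # e.g. "USER068"
        self.people = people               # name -> list of user ids

    def resolve(self, mention):
        if mention == "I":
            return self.current_user
        candidates = self.people.get(mention, [])
        if len(candidates) == 1:           # unambiguous reference
            return candidates[0]
        return None                        # ambiguous or unknown

ctx = DiscourseContext("USER068", {"Bill": ["USER073"]})
print(ctx.resolve("I"))      # USER068
print(ctx.resolve("Bill"))   # USER073
```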
Pragmatic analysis
• The final step toward effective understanding is to decide what to do as a result.
• Pragmatics describes the interpretation of language's intended meaning.
• Pragmatic analysis attempts to derive the intended—not literal—meaning of
language.
• For instance, a pragmatic analysis can uncover the intended meaning of
"Manhattan speaks to all its people."
• Methods like neural networks assess the context to understand that the
sentence isn’t literal, and most people won’t interpret it as such.
• A pragmatic analysis deduces that this sentence is a metaphor/representation for
how people emotionally connect with places.
• "Do you know what time it is?" should be interpreted as a request to tell the time.
• For "I want to print Bill's .init file", one possible thing to do is to record
what was said as a fact and be done with it.
• For some sentences, whose intended effect is clearly declarative, that is
precisely the correct thing to do.
• But for other sentences, including this one, the intended effect is different.
• We can discover this intended effect by applying a set of rules that characterize
cooperative dialogues.
• The final step in pragmatic processing is to translate from the knowledge-based
representation to a command to be executed by the system.
Syntactic Processing and Augmented Transition Nets
• Syntactic processing is the step in which a flat input
sentence is converted into a hierarchical structure that
corresponds to the units of meaning in the sentence. This
process is called parsing.
• It plays an important role in natural language understanding
systems for two reasons:
1. Semantic processing must operate on sentence
constituents. If there is no syntactic parsing step, then
the semantics system must decide on its own
constituents. If parsing is done, on the other hand, it
constrains the number of constituents that semantics
can consider.
2. Syntactic parsing is computationally less expensive than
semantic processing. Thus it can play a significant
role in reducing overall system complexity.
• Although it is often possible to extract the meaning of a
sentence without using grammatical facts, it is not always
possible to do so.
• Almost all the systems that are actually used have two main
components:
1. A declarative representation, called a grammar, of the
syntactic facts about the language.
2. A procedure, called a parser, that compares the grammar
against input sentences to produce parsed structures.
Grammars and Parsers
• The most common way to represent a grammar is as a set of
production rules.
• The first rule can be read as "A sentence is composed of a noun
phrase followed by a verb phrase"; the vertical bar means OR; ε
represents the empty string.
• Symbols that are further expanded by rules are called non-terminal
symbols.
• Symbols that correspond directly to strings that must be found
in an input sentence are called terminal symbols.
• Grammar formalism such as this one underlies many
linguistic theories, which in turn provide the basis for many
natural language understanding systems.
• Pure context-free grammars are not effective for describing
natural languages.
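As a concrete illustration, the usual textbook fragment of such production rules can be written down directly; the specific rules below are assumed, not quoted from the slides:

```python
# Production rules for a tiny grammar fragment, written as a Python
# dict. Alternatives separated by "|" in the usual notation become
# multiple right-hand sides here.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["det", "n"], ["pn"]],   # NP -> det n | pn
    "VP": [["iv"], ["tv", "NP"]],   # VP -> iv | tv NP
}

def is_nonterminal(symbol):
    """Non-terminals are expanded by rules; terminals must be found
    directly in the input sentence."""
    return symbol in GRAMMAR

print(is_nonterminal("NP"))    # True
print(is_nonterminal("det"))   # False
```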
Grammars and Parsers Cont.
• NLP systems have less in common with computer language
processing systems, such as compilers, than one might expect.
• The parsing process takes the rules of the grammar and
compares them against the input sentence.
• The simplest structure to build is a parse tree, which simply
records the rules and how they matched.
• Every node of the parse tree corresponds either to an input
word or to a non-terminal in our grammar.
• Each level in the parse tree corresponds to the application
of one grammar rule.
Articles like "the" are sometimes referred to as determiners (det).
The grammar specifies two things about a language:
1. Its weak generative capacity, by which we mean the set of
sentences that are contained within the language. This set is
made up of precisely those sentences that can be completely
matched by a series of rules in the grammar.
2. Its strong generative capacity, by which we mean the
structure assigned to each grammatical sentence of the
language.
Example: a parse tree for the sentence "Bill printed the file"
A parse of the sentence "the giraffe dreams" is:
s => np vp => det n vp => the n vp => the giraffe vp
  => the giraffe iv => the giraffe dreams

(tv = transitive verb, which takes an object; iv = intransitive verb)
• Transitive verbs require an object to complete their meaning, such
as "She reads a book," where "reads" is the transitive verb and
"book" is the object.
• Example: "The director buys his lunch" parses as
[s [np [det the] [n director]] [vp [tv buys] [np his lunch]]]
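The derivation of "the giraffe dreams" above can be reproduced by a tiny top-down (recursive-descent) parser over the same grammar fragment; the lexicon is just large enough for the example and is otherwise invented:

```python
# Recursive-descent parser for: s -> np vp, np -> det n,
# vp -> iv | tv np. Each parse_* function returns (tree, remaining
# words), with tree=None on failure.

LEXICON = {
    "det": {"the"}, "n": {"giraffe", "director", "lunch"},
    "iv": {"dreams"}, "tv": {"buys"},
}

def parse(words):
    tree, rest = parse_s(words)
    return tree if rest == [] else None   # must consume all input

def parse_s(words):
    np, rest = parse_np(words)
    if np is None:
        return None, words
    vp, rest = parse_vp(rest)
    if vp is None:
        return None, words
    return ("s", np, vp), rest

def parse_np(words):
    if len(words) >= 2 and words[0] in LEXICON["det"] and words[1] in LEXICON["n"]:
        return ("np", ("det", words[0]), ("n", words[1])), words[2:]
    return None, words

def parse_vp(words):
    if words and words[0] in LEXICON["iv"]:
        return ("vp", ("iv", words[0])), words[1:]
    if words and words[0] in LEXICON["tv"]:
        np, rest = parse_np(words[1:])
        if np is not None:
            return ("vp", ("tv", words[0]), np), rest
    return None, words

print(parse("the giraffe dreams".split()))
# ('s', ('np', ('det', 'the'), ('n', 'giraffe')), ('vp', ('iv', 'dreams')))
```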
Types of Parsing
The parsing techniques can be categorized into two types:
• Top-down parsing
• Bottom-up parsing

Example sentences used in the figures: "Rahul is eating an apple."
"The small tree shades the new house by the stream."
Bottom up Parsing
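A bottom-up pass can be sketched as a shift-reduce loop: shift words onto a stack and reduce whenever the top of the stack matches a rule's right-hand side. The rules and word categories below are the same toy fragment used for "the giraffe dreams":

```python
# Sketch of bottom-up (shift-reduce) parsing. Words are tagged with
# their category as they are shifted; reductions apply whenever the
# top of the stack matches a rule's right-hand side.

RULES = [                 # (lhs, rhs) pairs
    ("np", ("det", "n")),
    ("vp", ("iv",)),
    ("s",  ("np", "vp")),
]
CATEGORY = {"the": "det", "giraffe": "n", "dreams": "iv"}

def shift_reduce(words):
    stack = []
    for w in words:
        stack.append(CATEGORY[w])        # shift (with POS tagging)
        reduced = True
        while reduced:                   # reduce as long as possible
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    reduced = True
                    break
    return stack

print(shift_reduce(["the", "giraffe", "dreams"]))  # ['s']
```

A successful parse leaves the single symbol s on the stack; top-down parsing works in the opposite direction, expanding s until the words are reached.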
Augmented Transition Network
Augmented Transition Networks (ATNs) are a type of
transition network used for parsing sentences in natural
language processing.

• Natural languages pose complexities that cannot be handled by
traditional transition networks, which are designed for modeling
regular languages only.
• In this regard, ATNs introduce augmented features that can
store and manipulate extra information, as well as permitting
recursive transitions into sub-networks, thereby making them
capable of dealing with some context-sensitive as well as
context-free aspects of natural language.
Structure of an Augmented Transition Network
An ATN consists of the following components:
• States: Nodes in the network representing various stages of
the parsing process.
• Transitions: Directed edges connecting states, labelled with
conditions and actions.
• Registers: Storage mechanisms for maintaining information
during parsing.
• Tests: Conditions that must be satisfied for a transition to be
taken.
• Actions: Operations performed during transitions, such as
storing information in registers or calling sub-networks.
ATNs parse sentences by traversing through states and
transitions based on the input tokens. The process involves:
• Initialization: Starting from an initial state, the parser reads
input tokens.
• Transition Conditions: For each token, the parser
evaluates the conditions on the outgoing transitions from the
current state.
• State Transitions: If a condition is satisfied, the parser
moves to the next state and performs any specified actions.
• Recursive Descent: The parser can call sub-networks to
handle nested structures, allowing for recursive parsing.
• Completion: The parser completes the process when it
reaches a final state with no more input tokens or when all
tokens are successfully parsed.
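The traversal loop described above can be sketched for a small noun-phrase sub-network. The states (NP, Q6, Q7) echo the trace that follows, but the arc table and register handling here are a simplified invention: real ATNs also support PUSH arcs into sub-networks and backtracking.

```python
# Minimal ATN-style noun-phrase network. Each arc carries a test
# (the word's category) and an action (update a register).

CATEGORIES = {"the": "det", "a": "det", "long": "adj",
              "file": "noun", "printer": "noun"}

# (state, test_category, action, next_state)
ARCS = [
    ("NP", "det",  lambda r, w: r.update({"DET": w}),   "Q6"),
    ("Q6", "adj",  lambda r, w: r["ADJS"].append(w),    "Q6"),
    ("Q6", "noun", lambda r, w: r.update({"NOUN": w}),  "Q7"),
]

def atn_np(words, final_state="Q7"):
    regs = {"DET": None, "ADJS": [], "NOUN": None}
    state = "NP"
    for w in words:
        for (s, cat, action, nxt) in ARCS:
            if s == state and CATEGORIES.get(w) == cat:  # test
                action(regs, w)                          # action
                state = nxt
                break
        else:
            return None                  # no arc fires: failure
    return regs if state == final_state else None

print(atn_np(["the", "long", "file"]))
# {'DET': 'the', 'ADJS': ['long'], 'NOUN': 'file'}
```

On success the registers hold exactly the information the trace below accumulates for the phrase "the long file".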
ATN features
ATNs have several features, including:
• Registers: ATNs have registers that can store information
between jumps to different sub-networks.
• Arcs: Arcs can have tests associated with them that must be
satisfied before the arc is taken.
• Actions: Actions can be attached to an arc to be executed
whenever it is taken.
Example: Trace the execution of the ATN for the following sentence
• "The long file is printed"
1. Begin in state S.
2. Push to NP.
3. Do a category test to see if "The" is a determiner.
4. This test succeeds; set the determiner register to DEFINITE
("the" is a definite article) and go to Q6.
5. Do a category test to see if "long" is an adjective.
6. This test succeeds; append "long" to the ADJS register and
stay at Q6.
7. Do a category test to see if "file" is an adjective; the test fails.
8. Do a category test to see if "file" is a noun; the test
succeeds, so set the noun register to "file" and go to Q7.
9. Push to PP (prepositional phrase).
10. Do a category test to see if "is" is a preposition. The test
fails; pop and signal failure.
11. Nothing else can be done from state Q7. Pop and return
the structure (NP (FILE (LONG) DEFINITE)). This return
sets the state to Q1; the SUBJ register is set to the returned
value and the TYPE register is set to DCL (a constant that
indicates the sentence is declarative).
12. Do a category test to see if "is" is a verb. It succeeds;
set the AUX register to NIL and the V register to "is". Go to state
Q4.
13. Push to state NP. The next word, "printed", is not a determiner
or a proper noun, so NP pops and returns failure.
14. More input remains but no parse has been found; backtrack to the
last choice state, Q1, and reset the AUX and V registers.
15. Do a category test to see if "is" is an auxiliary. The test
succeeds; set the AUX register to "is" and go to state Q3.
16. Do a category test to see if "printed" is a verb. It succeeds;
set the V register to "printed". Go to state Q4.
17. Now the input is exhausted and Q4 is an acceptable final
state. Pop and return the structure

(S DCL (NP (FILE (LONG) DEFINITE)) IS (PRINTED))
• Example (figure): ATN trace for "Rahul is eating an apple"
(NPR denotes a proper noun, i.e. someone's name)
Semantic Analysis and Grammars

• "I want to print Bill's .init file"
• A semantic grammar can be used by a parsing system in exactly the
same way in which a syntactic grammar could be used. An ATN parsing
system can also be used for semantic grammars.
• Ambiguities may arise in a strictly syntactic parse; some of the
interpretations do not make sense semantically and thus
cannot be generated by a semantic grammar.
• Example: I want to print xy.txt on printer3.
• The phrase "on printer3" is handled by a specific rule; a semantic
grammar has no general notion of a PP (prepositional phrase).
• The number of rules required can become very large, since many
syntactic generalizations are missed.
• The parsing process may be expensive.
Case Grammars
• Case grammar is a system of linguistic analysis that focuses on the
relationships between verbs and their noun phrases (cases).
• It is a different approach to the problem of how syntactic and semantic
interpretation can be combined.
• The grammar rules are generally described syntactically rather than
semantically, but the structures the rules produce correspond to
semantic relations.

• In "Susan printed the file" and "The file was printed by Susan", the
semantic roles of "Susan" and "the file" are identical, though their
syntactic roles are reversed:
printed(agent: Susan)
printed(object: file)
• In one sentence "mother" is the subject of baked, and in the other
"the pie" is the subject of baked:
baked(agent: mother) and baked(object: pie)
S → M + P

where P is the proposition, a set of relationships among the verb and
the noun phrases (NPs), and M is the modality constituent.

For example, consider the sentence "Ram did not eat the apple":
"eat" is the verb, and "Ram" and "apple" are nouns (C) filling the
cases C1 and C2.
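The S → M + P decomposition can be sketched as a case-frame record, with M holding modality information (tense, negation) and the cases held separately; the frames below are hand-written for illustration, since building them automatically is the job of the case-grammar parser:

```python
# Case-frame sketch: a frame pairs a verb with a modality constituent
# (M) and a set of case fillers (P).

def frame(verb, modality, **cases):
    return {"verb": verb, "modality": modality, "cases": cases}

# "Susan printed the file" (active) maps to the same cases as
# "The file was printed by Susan" (passive):
f1 = frame("print", {"tense": "past"}, agent="Susan", object="file")

# "Ram did not eat the apple": negation lives in the modality.
f2 = frame("eat", {"tense": "past", "negation": True},
           agent="Ram", object="apple")

print(f1["cases"]["agent"])        # Susan
print(f2["modality"]["negation"])  # True
```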
Conceptual Parsing
• A strategy for finding both the structure and the meaning of a sentence
in one step.
• It is driven by a dictionary that describes the meanings of words as CD
(conceptual dependency) structures.
• It makes use of a verb-ACT dictionary, which contains an entry for each
environment in which a verb can appear.
The verb-ACT dictionary distinguishes several senses of "want":
• Wanting something to happen: "John wanted Mary to go to the store"
• Wanting an object (describing a state of being or condition): "The
motor wants a tune-up"
• Wanting a person
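A sketch of how a verb-ACT dictionary entry might select among these senses of "want" based on the type of complement; the selection table is invented for illustration (a real CD dictionary maps each sense to a conceptual-dependency structure):

```python
# Toy verb-ACT dictionary entry for "want": one sub-entry per
# environment the verb can appear in, keyed on the complement type.

WANT_SENSES = [
    ("clause", "want-event"),    # "John wanted Mary to go to the store"
    ("object", "want-object"),   # "The motor wants a tune-up"
    ("person", "want-person"),   # wanting a person
]

def sense_of_want(complement_type):
    """Pick the sense whose environment matches the complement."""
    for ctype, sense in WANT_SENSES:
        if ctype == complement_type:
            return sense
    return None

print(sense_of_want("object"))  # want-object
print(sense_of_want("clause"))  # want-event
```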
Discourse and pragmatic processing
• Pragmatics and Discourse Analysis involve the study of
language in its contexts of use.
• Pragmatics focuses on the effects of context on meaning
• Discourse Analysis studies written and spoken language in
relation to its social context.
• It is necessary to consider the discourse and pragmatic
context in which the sentence was uttered.
– Bill had a red balloon
– John wanted it
• The word "it" should be identified with the red balloon (this is
referred to as anaphora)
• Parts of entities
– John opened the book he just bought
– The title page was torn
• Parts of actions
– John went on a business trip to New York
– He left on an early morning flight
• Entities involved in actions
– My house was broken into last week
– They took the TV and the stereo
• In order to recognize these kinds of relations among sentences, a
great deal of knowledge about the world being discussed is
required.
• The way this knowledge is organized is critical to the success of the
understanding.
Using focus in understanding
Two important parts of the process of using knowledge to facilitate
understanding
• Focus on the relevant parts of the available knowledge base
• Use that knowledge to resolve ambiguities and make connections
among things
Modelling Shared Beliefs
Represent the shared beliefs as facts.
Scripts have been used extensively to aid in natural language
understanding
Two steps in the process of using a script to aid in natural language
understanding
• Select the appropriate script(s) from memory
• Use the script(s) to fill in unspecified parts of the text to be
understood
Three different belief spaces are shown (figure):
• S1 believes that Mary hit Bill
• S2 believes that Sue hit Bill
• S3 believes that someone hit Bill

• Query: Who hit Bill?

Using goals and plans for understanding
• John was anxious to get his daughter's new scooter put
together before Christmas Eve. He looked high and low for a
screwdriver.
To understand the above story, John had
• A goal, getting the scooter put together
• A plan: putting together the various subparts until the
scooter is complete, using a screwdriver to screw parts
together.
• To achieve goals, people form plans; several goal types can be
identified from stories, such as
– Satisfaction goals: sleep, food, water
– Enjoyment goals: entertainment, competition
– Achievement goals: power, status
– Preservation goals: health, possessions
• To understand the text about John, consider the operator

USE(A, P, G):
• Precondition: know-what(A, location(P)), near(A, P),
has-control-of(A, P), ready(P)
• Postcondition: done(G)
• Meaning: for A to use P to perform G, A must know the
location of P, A must be near P, A must have control of P, and P
must be ready to use.

To learn the location of the screwdriver, use the operator

Look-for(A, P):
• Precondition: can-recognize(A, P)
• Postcondition: know-what(A, location(P))
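The two operators can be written down as simple precondition/postcondition records in a STRIPS-like style; the predicate strings follow the text, and the matching here is purely textual rather than a real planner:

```python
# USE and LOOK-FOR as precondition/postcondition records. A planner
# chains them: LOOK-FOR's postcondition establishes USE's first
# precondition, which is exactly why John looks for the screwdriver.

OPERATORS = {
    "USE": {
        "params": ("A", "P", "G"),
        "pre":  ["know-what(A, location(P))", "near(A, P)",
                 "has-control-of(A, P)", "ready(P)"],
        "post": ["done(G)"],
    },
    "LOOK-FOR": {
        "params": ("A", "P"),
        "pre":  ["can-recognize(A, P)"],
        "post": ["know-what(A, location(P))"],
    },
}

def achieves(op, goal):
    """An operator is relevant to a goal if the goal is among its
    postconditions."""
    return goal in OPERATORS[op]["post"]

# LOOK-FOR establishes the first precondition of USE:
print(achieves("LOOK-FOR", "know-what(A, location(P))"))  # True
```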
