NLP 01
K.R. Chowdhary
Professor & Head CSE Dept.
M.B.M. Engineering College, Jodhpur, India
Abstract
These notes present the basic concepts of Natural Language Processing (NLP), the issues of ambiguity, various applications of NLP, and grammars and parsing techniques.
1 Introduction
Developing a program that understands natural language is a difficult problem. The number of natural languages is large, and each contains infinitely many sentences. Natural language is also highly ambiguous: many words have several meanings, such as can, bear, fly, and orange, and the same sentence can have different meanings in different contexts. This makes the creation of programs that understand a natural language a challenging task.
2 Challenges of NLP
Many times word boundaries are blurred, and the sentence as understood can be totally different from the one intended.
At the next level, the syntax of the language helps us decide how the words combine to make larger meanings. Hence, for the sentence “the dealer sold the merchant a dog,” it is important to be clear about what is sold to whom. Some of the common examples are:
(Is this a statement or a question?)
Here the problem is deciding whether “it” in the sentence refers to aisle three, the milk, or even the store.
The most important question in the above is what the internal representation should be, so that these ambiguities in understanding a sentence do not occur and the machine understands it the way a human being does.
3 Applications
There is a huge amount of data on the Internet, at least 20 billion pages. Applications that process such large amounts of text require NLP expertise. Some of these applications are:
• Automatic translation
• Question answering
• Knowledge acquisition
Another such application is information extraction. A typical information-extraction task starts from a piece of text such as the following job posting:
“. . . an Assistant Account Manager to help manage and coordinate interactive marketing initiatives for a marquee automotive account. Experience in online marketing, automotive and/or the advertising field is a plus. Assistant Account Manager Responsibilities: ensures smooth implementation of programs and initiatives; helps manage the delivery of projects and key client deliverables . . . Compensation: 50,000 - 80,000. Hiring Organization: Firm XYZ.”
Given the above text, the extracted information may be:
INDUSTRY: Advertising
POSITION: Assistant Account Manager
LOCATION: Bigtown, CA
COMPANY: Firm XYZ
SALARY: 50,000 - 80,000
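Fields such as these can be pulled out of a posting with simple pattern matching. The following is a minimal sketch in Python; the regular expressions and the truncated posting string are illustrative assumptions, not a general-purpose extractor, and only the fields visible in the quoted text (POSITION, SALARY, COMPANY) are extracted.

import re

# A fragment of the job posting quoted above (truncated for the example).
posting = ("... an Assistant Account Manager to help manage and coordinate "
           "interactive marketing initiatives for a marquee automotive account. "
           "Compensation: 50,000 - 80,000 Hiring Organization: Firm XYZ.")

# Hand-written patterns for each field; a practical extractor would use
# learned models rather than fixed regular expressions.
patterns = {
    "POSITION": r"an\s+([A-Z][a-z]+(?: [A-Z][a-z]+)*)",   # capitalized words after "an"
    "SALARY":   r"Compensation:\s*([\d,]+\s*-\s*[\d,]+)",
    "COMPANY":  r"Hiring Organization:\s*(.+?)\.",
}

for field, pattern in patterns.items():
    match = re.search(pattern, posting)
    print(field, "->", match.group(1) if match else "not found")

Running this prints POSITION -> Assistant Account Manager, SALARY -> 50,000 - 80,000, and COMPANY -> Firm XYZ.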
4 Computational Linguistics
A simple sentence consists of a subject followed by a predicate. Each word in a sentence acts as a part of speech (POS). For English, the parts of speech are: nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions, and interjections. A noun names a thing, whereas a verb expresses an action. Adjectives and adverbs modify nouns and verbs, respectively. Prepositions express relationships between nouns and other parts of speech. Conjunctions join words and groups of words together, and interjections express strong feelings.
Most of us understand both written and spoken language, but reading is
learned much later, so let us start with spoken language. We can divide the problem into three areas: acoustic-phonetic, morphological-syntactic, and semantic-pragmatic processes, as shown in figure 1.
The components of the knowledge needed to understand the language are the following:
A grammar G is specified by a set of symbols together with rules of generation: V is the set of non-terminal symbols, S is the start symbol, and P is the set of production rules. The corresponding language of G is L(G).
Consider, for example, a grammar with the following components:
V = {S, NP, N, VP, V, Art}
P = {S → NP VP,
     NP → N,
     NP → Art N,
     VP → V NP,
     N → boy | icecream | dog,
     V → ate | like | bite,
     Art → the | a}
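As a quick check that these productions generate sentences such as “the boy ate a icecream”, the rules can be written in the grammar notation of the NLTK toolkit and run through a chart parser. This is a sketch that assumes the NLTK library is installed; the rule set is exactly the one listed above.

import nltk

# The productions P given above, written in NLTK's CFG notation.
grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> N
NP  -> Art N
VP  -> V NP
N   -> 'boy' | 'icecream' | 'dog'
V   -> 'ate' | 'like' | 'bite'
Art -> 'the' | 'a'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the boy ate a icecream".split()):
    print(tree)

The parser prints the single parse tree in bracketed form: (S (NP (Art the) (N boy)) (VP (V ate) (NP (Art a) (N icecream)))).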
For example, the following rules belong to a type-0 (unrestricted) grammar; since no rule shortens the string, they also satisfy the type-1 (non-contracting) restriction:
S → aS
S → aAB
AB → BA
aA → ab
aA → aa
where uppercase letters are non-terminals and lowercase are terminals.
Examples of type-2 (context-free) grammar rules are:
S → aS
S → aSb
S → aB
S → aAB
A→a
B→b
Examples of type-3 (regular) grammar rules are:
S → aS
S → a
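The difference between these rule sets lies purely in the shape of the productions. The helper below, a plain-Python sketch, encodes the usual tests: a rule is type-3 (regular, right-linear) if its left side is a single non-terminal and its right side is a string of terminals optionally ending in one non-terminal; type-2 (context-free) if its left side is a single non-terminal; and type-1 (non-contracting) if its right side is at least as long as its left side. Representing a rule as a pair of strings is an assumption made for the example.

def rule_type(lhs, rhs):
    # Classify a production lhs -> rhs by Chomsky type.
    # Convention as above: uppercase letters are non-terminals,
    # lowercase letters are terminals.
    single_nt = len(lhs) == 1 and lhs.isupper()
    if single_nt:
        # Type 3 (right-linear): terminals, optionally ending in one non-terminal.
        body = rhs[:-1] if rhs and rhs[-1].isupper() else rhs
        if body == "" or body.islower():
            return 3
        # Type 2 (context-free): a single non-terminal on the left-hand side.
        return 2
    if len(rhs) >= len(lhs):
        return 1   # Type 1 (non-contracting) form.
    return 0       # Type 0 (unrestricted).

for lhs, rhs in [("S", "aS"), ("S", "aSb"), ("S", "aAB"),
                 ("AB", "BA"), ("aA", "ab"), ("S", "a")]:
    print(lhs, "->", rhs, " is type", rule_type(lhs, rhs))

This reports S -> aS and S -> a as type 3, S -> aSb and S -> aAB as type 2, and AB -> BA and aA -> ab as type 1.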
6 Structural Representation
It is convenient to represent a sentence as a tree or a graph to help expose the structure of its constituent parts. For example, the sentence “the boy ate a icecream” can be represented as the tree shown in figure 3.
(S (NP (Art the)
       (N boy))
   (VP (V ate) (NP (Art a) (N icecream))))
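In a program, the same tree can be held as a nested structure, and the bracketed form above is then just a recursive print. Below is a minimal plain-Python sketch; representing each node as a (label, children...) tuple is a choice made for the example.

# Parse tree for "the boy ate a icecream" as nested tuples.
tree = ("S",
        ("NP", ("Art", "the"), ("N", "boy")),
        ("VP", ("V", "ate"),
               ("NP", ("Art", "a"), ("N", "icecream"))))

def bracketed(node):
    # Render a (label, children...) tuple in the bracketed notation above.
    if isinstance(node, str):        # a leaf word
        return node
    label, *children = node
    return "(" + label + " " + " ".join(bracketed(c) for c in children) + ")"

print(bracketed(tree))
# (S (NP (Art the) (N boy)) (VP (V ate) (NP (Art a) (N icecream))))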
The grammar can be extended with additional rules to handle richer sentences, for example:
PP → Prep NP,
VP → V ADV,
VP → V PP,
VP → V NP PP,
VP → AUX V NP,
Det → Art ADJ,
Det → Art
7 Transformational Grammars
The grammars discussed above produce different structures for different sentences, even when the sentences have the same meaning; an active sentence and its passive form, for example, describe the same event.
In such a pair, the subject and object roles are switched: in the first sentence the subject is Ram and the object is the book, while in the second they are the other way around. This is an undesirable feature for machine processing of a language. In fact, sentences having the same meaning should map to the same internal structure.
By adding some extra components, we can produce a single representation for sentences having the same meaning, through a series of transformations. A grammar extended in this way is called a transformational grammar. In addition, newly added semantic and phonological components help in interpreting the output of the syntactic component as meaning and as sound sequences, respectively. The transformations are tree-manipulation rules; the words are taken from a dictionary, in which each lexicon entry carries its semantic features.
Using a transformational generative grammar, a sentence is analyzed in two stages: (1) the basic structure of the sentence is analyzed to determine its grammatical constituent parts, which provides the structure of the sentence; (2) this structure is then transformed into another form, in which the deeper semantic structure is determined.
Transformations are applied to change the passive-voice form of a sentence into active voice, to change a question into declarative form, to handle negations, and to provide subject-verb agreement. Figure 4 shows the three stages of converting a sentence from passive voice to active voice.
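As a toy illustration of a transformation as a tree-manipulation rule, the sketch below (plain Python) rewrites a simplified passive tree into its active counterpart. The sentence, the flat tree shape, and the single pattern are assumptions made only for the example; they are not the full transformational machinery.

def passive_to_active(tree):
    # Toy rule: rewrite (S NP_patient (VP (Aux was) (V verb) (PP (Prep by) NP_agent)))
    #           as      (S NP_agent   (VP (V verb) NP_patient)).
    # The tree is returned unchanged if the passive pattern does not match.
    if len(tree) == 3:
        label, patient, vp = tree
        if label == "S" and vp[0] == "VP" and len(vp) == 4:
            _, aux, verb, pp = vp
            if aux == ("Aux", "was") and pp[:2] == ("PP", ("Prep", "by")):
                agent = pp[2]
                return ("S", agent, ("VP", verb, patient))
    return tree

# "the book was read by Ram"  ->  "Ram read the book"
# (the sentence and the verb are assumed for illustration)
passive = ("S",
           ("NP", "the book"),
           ("VP", ("Aux", "was"), ("V", "read"),
                  ("PP", ("Prep", "by"), ("NP", "Ram"))))

print(passive_to_active(passive))
# ('S', ('NP', 'Ram'), ('VP', ('V', 'read'), ('NP', 'the book')))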
PP -> preposition NP; e.g., from New Delhi (the NP can be a location, date, time, or others)

The following examples show the substitution rules, along with the values substituted for each part of speech.
S -> VP
VP -> V NP
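Substitution can be carried out mechanically: starting from S, keep replacing each non-terminal by the right-hand side of one of its rules until only words remain. Below is a minimal plain-Python sketch; the rule set and the small lexicon (book, flight, New Delhi, Mumbai, and so on) are assumptions chosen to echo the examples in this section.

import random

# Substitution (rewrite) rules: each non-terminal maps to a list of
# alternative right-hand sides.  The lexicon is assumed for illustration.
rules = {
    "S":    [["VP"]],
    "VP":   [["V", "NP"], ["V", "NP", "PP"]],
    "NP":   [["Det", "N"], ["PN"]],
    "PP":   [["Prep", "NP"]],
    "Det":  [["a"], ["the"]],
    "N":    [["flight"], ["ticket"]],
    "PN":   [["New Delhi"], ["Mumbai"]],
    "V":    [["book"]],
    "Prep": [["from"], ["to"]],
}

def generate(symbol="S"):
    # Expand a symbol by repeatedly substituting until only words remain.
    if symbol not in rules:              # a terminal word: nothing to substitute
        return [symbol]
    rhs = random.choice(rules[symbol])   # pick one alternative for the non-terminal
    words = []
    for part in rhs:
        words.extend(generate(part))
    return words

print(" ".join(generate()))
# e.g. "book a flight from New Delhi" (the grammar over-generates; it does not check sense)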
10 Ambiguous Grammars
An ambiguous grammar has more than one parse tree for the same sentence. Consider the sentence “He drove down the street in the car.” Its parse trees are given in figures 7 and 9. A process for drawing the parse trees is to group the words so as to bring out the structure of the sentence. Figures 6 and 8 demonstrate the grouping of words for the parse trees shown in figures 7 and 9, respectively.
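The two readings can also be produced mechanically: a chart parser, given a grammar in which the prepositional phrase may attach either to the noun phrase (“the street in the car”) or to the verb phrase (“drove down the street . . . in the car”), returns both trees. The sketch below assumes the NLTK library is installed; the small grammar is written for just this sentence.

import nltk

# A grammar in which "in the car" may attach to the NP or to the VP;
# this is the source of the ambiguity.
grammar = nltk.CFG.fromstring("""
S    -> NP VP
NP   -> Pron | Det N | NP PP
VP   -> V PP | VP PP
PP   -> Prep NP
Pron -> 'he'
Det  -> 'the'
N    -> 'street' | 'car'
V    -> 'drove'
Prep -> 'down' | 'in'
""")

parser = nltk.ChartParser(grammar)
trees = list(parser.parse("he drove down the street in the car".split()))
print(len(trees), "parse trees")   # prints: 2 parse trees
for tree in trees:
    print(tree)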
Figure 7: Parsing-1: He drove down the street in the car.
Figure 9: Parsing-2: He drove down the street in the car.
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> Proper-N
Nom -> Noun Nom
Nom -> Noun
Nom -> Nom PP
VP -> V
VP -> V NP
Det -> a | an | the | that
Noun -> book | flight | meal
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
Proper-N -> Mumbai
Figure 10: Parsing: Book that flight.
During parsing, candidate trees that are not consistent with the words of the input sentence are rejected, leaving behind the trees that represent the successful parses. Proceeding in this way, we ultimately obtain the parse of the sentence “Book that flight”, shown in figure 10.
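The grammar above can be exercised in the same way. The sketch below assumes NLTK is installed and uses only the rules needed for this sentence, with slightly simplified non-terminal names (PropN for Proper-N); it prints the single surviving parse of “book that flight”.

import nltk

grammar = nltk.CFG.fromstring("""
S     -> NP VP | VP
NP    -> Det Nom | PropN
Nom   -> Noun Nom | Noun
VP    -> V | V NP
Det   -> 'a' | 'an' | 'the' | 'that'
Noun  -> 'book' | 'flight' | 'meal'
V     -> 'book' | 'include' | 'prefer'
PropN -> 'Mumbai'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("book that flight".split()):
    print(tree)
# (S (VP (V book) (NP (Det that) (Nom (Noun flight)))))

Note that “book” is listed both as a noun and as a verb; the analyses that take it as a noun cannot be completed and are discarded, which is exactly the rejection process described above.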
12 Summary
1. Natural language processing is a complex task, due to the variety of sentence structures and the ambiguity present in language. The ambiguities occur at the phonetic, semantic, and pragmatic levels.
Due to the lack of a proper theory for type-0 and type-1 grammars, the theory of type-2 (context-free) grammars is applied to NLP as well.
(a) Phonetic
(b) Syntactic
(c) Pragmatic
4. Develop the parse tree to generate the sentence “Rajan slept on the bench”, using the following rewrite rules:
S → NP VP
NP → N
NP → Det N
VP → V PP
PP → Prep NP
N → Rajan | bench
V → slept
Det → the
Prep → on
8. Given the parse tree in figure 12, construct the grammar for it.
9. Construct the grammars and parse trees for the following sentences.