Natural Language Processing
Imagine a gigantic corpus consisting of everything that has been either uttered or written in English over, say, the last 50 years. Would we be justified in calling this corpus "the language of modern English"? Arguably not: "modern English" is not equivalent to the very big set of word sequences in our imaginary corpus. Speakers of English can make judgments about these sequences, and will reject some of them as ungrammatical.
It is easy to compose a new sentence and have speakers agree that it is perfectly good English. For example, sentences have an interesting property: they can be embedded inside larger sentences. Consider the following sentences:
a. Usain Bolt broke the 100m record.
b. The Jamaica Observer reported that Usain Bolt broke the 100m record.
c. Andre said The Jamaica Observer reported that Usain Bolt broke the 100m record.
d. I think Andre said the Jamaica Observer reported that Usain Bolt broke the 100m record.
If we replaced whole sentences with the symbol S, we would see patterns like Andre said S and I
think S. These are templates for taking a sentence and constructing a bigger sentence. There are
other templates we can use, such as S but S and S when S. With a bit of ingenuity we can construct
some really long sentences using these templates.
A sentence built up in this way, however long, can still have a simple structure, for instance one that begins S but S when S. We can see from this
example that language provides us with constructions which seem to allow us to extend sentences
indefinitely. It is also striking that we can understand sentences of arbitrary length that we’ve never
heard before: it’s not hard to concoct an entirely novel sentence, one that has probably never been
used before in the history of the language, yet all speakers of the language will understand it.
The purpose of a grammar is to give an explicit description of a language. But the way in which we think of a grammar is closely intertwined with what we consider to be a language. In the formal framework of "generative grammar," a "language" is considered to be nothing more than an enormous collection of all grammatical sentences, and a grammar is a formal notation that can be used for "generating" the members of this set. Grammars use recursive productions of the form S → S and S.
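To make this concrete, here is a minimal sketch (the toy grammar is our own, not from the text) of how NLTK can enumerate the sentences licensed by a recursive grammar up to a depth bound; because S can contain smaller Ss, the set of sentences the grammar generates is unbounded:
>>> import nltk
>>> from nltk.parse.generate import generate
>>> # Toy recursive grammar: an S may contain smaller Ss.
>>> toy = nltk.CFG.fromstring("""
... S -> S Conj S | 'I' 'think' S | 'Usain' 'Bolt' 'broke' 'the' 'record'
... Conj -> 'but' | 'when'
... """)
>>> for sentence in generate(toy, depth=4, n=3):
...     print(' '.join(sentence))
Raising the depth argument produces progressively longer embedded sentences.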
Ubiquitous Ambiguity:
Consider the well-known Groucho Marx line from Animal Crackers (1930): While hunting in Africa, I shot an elephant in my pajamas. How an elephant got into my pajamas, I'll never know. Let's examine the ambiguity in the first part of this joke: I shot an elephant in my pajamas. First we need to define a simple grammar:
>>> import nltk
>>> groucho_grammar = nltk.CFG.fromstring("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | 'I'
... VP -> V NP | VP PP
... Det -> 'an' | 'my'
... N -> 'elephant' | 'pajamas'
... V -> 'shot'
... P -> 'in'
... """)
>>> sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
>>> parser = nltk.ChartParser(groucho_grammar)
>>> for tree in parser.parse(sent):
...     print(tree)
Output:
(S
  (NP I)
  (VP
    (VP (V shot) (NP (Det an) (N elephant)))
    (PP (P in) (NP (Det my) (N pajamas)))))
(S
  (NP I)
  (VP
    (V shot)
    (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pajamas))))))
>>> tree.draw()
[Output: a window opens displaying the parse tree graphically.]
Notice that there's no ambiguity concerning the meaning of any of the words; e.g., the word shot doesn't refer to the act of using a gun in one reading and the act of using a camera in the other. The ambiguity is structural: it lies in where the prepositional phrase in my pajamas attaches.
Beyond n-grams:
One benefit of studying grammar is that it provides a conceptual framework and vocabulary for spelling out our intuitions about which word sequences belong together. Let's take a closer look at the sequence the worst part and clumsy looking. This looks like a coordinate structure, where two phrases are joined by a coordinating conjunction such as and, but, or or. Here's an informal (and simplified) statement of how coordination works syntactically.
Coordinate Structure: if v1 and v2 are both phrases of grammatical category X, then v1 and v2 is also
a phrase of category X.
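For instance, we can encode this rule directly in a grammar, with one coordination production per category. The following fragment is a hypothetical sketch (the words and rules are invented for illustration), in which two VPs joined by and form a larger VP:
>>> import nltk
>>> coord_grammar = nltk.CFG.fromstring("""
... S -> NP VP
... NP -> NP Conj NP | Det N | 'He'
... VP -> VP Conj VP | V NP
... Conj -> 'and'
... Det -> 'the'
... N -> 'bear' | 'trout'
... V -> 'saw' | 'caught'
... """)
>>> parser = nltk.ChartParser(coord_grammar)
>>> for tree in parser.parse('He saw the bear and caught the trout'.split()):
...     print(tree)
The single parse groups saw the bear and caught the trout as one coordinated VP.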
Constituent structure is based on the observation that words combine with other words to form
units. The evidence that a sequence of words forms such a unit is given by substitutability; that is, a
sequence of words in a well-formed sentence can be replaced by a shorter sequence without
rendering the sentence ill-formed. To clarify this idea, consider the following sentence:
Example: The little bear saw the fine fat trout in the brook.
The fact that we can substitute He for The little bear indicates that the latter sequence is a unit. By
contrast, we cannot replace little bear saw in the same way. (We use an asterisk at the start of a
sentence to indicate that it is ungrammatical.)
Suppose we now add grammatical category labels to the words of our example sentence. The labels NP, VP, and PP stand for noun phrase, verb phrase, and prepositional phrase, respectively. If we then strip out the words apart from the topmost row and add an S node, we obtain a standard syntax tree. Each node in this tree (including the words) is called a constituent. The immediate constituents of S are NP and VP.
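As a small illustration, such a constituent tree can be built by hand and inspected with nltk.Tree (the bracketing below is our rendering of the example sentence above):
>>> import nltk
>>> t = nltk.Tree.fromstring("""
... (S (NP (Det The) (Adj little) (N bear))
...    (VP (V saw)
...        (NP (Det the) (Adj fine) (Adj fat) (N trout))
...        (PP (P in) (NP (Det the) (N brook)))))
... """)
>>> t.label()
'S'
>>> [child.label() for child in t]
['NP', 'VP']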
Context-Free Grammar:
Below is a simple context-free grammar (CFG). By convention, the lefthand side of the first production is the start-symbol of the grammar, typically S, and all well-formed trees must have this symbol as their root label. In NLTK, context-free grammars are defined in the nltk.grammar module. We define a grammar and show how to parse a simple sentence admitted by it.
>>> grammar1 = nltk.CFG.fromstring("""
... S -> NP VP
... VP -> V NP | V NP PP
... PP -> P NP
... V -> 'saw' | 'ate' | 'walked'
... NP -> 'John' | 'Mary' | 'Bob' | Det N | Det N PP
... Det -> 'a' | 'an' | 'the' | 'my'
... N -> 'man' | 'dog' | 'cat' | 'telescope' | 'park'
... P -> 'in' | 'on' | 'by' | 'with'
... """)
>>> sent = 'Mary saw Bob'.split()
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.parse(sent):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP Bob)))
Syntactic Categories
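The most common syntactic categories, with examples, are summarized below (this table follows standard usage):

Symbol   Meaning                 Example
S        sentence                the man walked
NP       noun phrase             a dog
VP       verb phrase             saw a park
PP       prepositional phrase    with a telescope
Det      determiner              the
N        noun                    dog
V        verb                    walked
P        preposition             in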
We can develop a simple grammar of our own and explore it using the recursive descent parser application, nltk.app.rdparser(). If we parse the sentence The dog saw a man in the park using grammar1, we end up with two trees:
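The two trees, obtainable with rd_parser.parse('the dog saw a man in the park'.split()), differ in whether the prepositional phrase in the park attaches to the noun phrase (the man is in the park) or to the verb phrase (the seeing happened in the park):
(S
  (NP (Det the) (N dog))
  (VP
    (V saw)
    (NP (Det a) (N man) (PP (P in) (NP (Det the) (N park))))))
(S
  (NP (Det the) (N dog))
  (VP
    (V saw)
    (NP (Det a) (N man))
    (PP (P in) (NP (Det the) (N park)))))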
A parser processes input sentences according to the productions of a grammar, and builds
one or more constituent structures that conform to the grammar. A grammar is a
declarative specification of well-formedness—it is actually just a string, not a program. A
parser is a procedural interpretation of the grammar. It searches through the space of trees
licensed by a grammar to find one that has the required sentence along its fringe.
A parser permits a grammar to be evaluated against a collection of test sentences, helping
linguists to discover mistakes in their grammatical analysis. A parser can serve as a model of
psycholinguistic processing, helping to explain the difficulties that humans have with
processing certain syntactic constructions. Many natural language applications involve
parsing at some point.
The recursive descent parser builds a parse tree during this process. With the initial goal (find an S), the S root node is created. As the parser recursively expands its goals using the productions of the grammar, the parse tree is extended downwards (hence the name recursive descent). We can see this in action using the graphical demonstration nltk.app.rdparser(), which steps through six stages of the parser's execution.
During this process, the parser is often forced to choose between several possible
productions. For example, in going from step 3 to step 4, it tries to find
productions with N on the left-hand side. The first of these is N → man. When this
does not work it backtracks, and tries other N productions in order, until it gets
to N → dog, which matches the next word in the input sentence. Much later, as
shown in step 5, it finds a complete parse. This is a tree that covers the entire
sentence, without any dangling edges. Once a parse has been found, we can get
the parser to look for additional parses. Again it will backtrack and explore other
choices of production in case any of them result in a parse.
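NLTK provides this parser as nltk.RecursiveDescentParser. A minimal run over grammar1 (the test sentence is our choice; any sentence the grammar covers will do):
>>> rd_parser = nltk.RecursiveDescentParser(grammar1)
>>> for tree in rd_parser.parse('Mary saw a dog'.split()):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))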
Shift-Reduce Parsing:
A simple kind of bottom-up parser is the shift-reduce parser. In common with all bottom-up parsers,
a shift-reduce parser tries to find sequences of words and phrases that correspond to the righthand
side of a grammar production, and replace them with the lefthand side, until the whole sentence is
reduced to an S.
The shift-reduce parser repeatedly pushes the next input word onto a stack; this is the shift operation. If the top n items on the stack match the n items on the righthand side of some production, then they are all popped off the stack, and the item on the lefthand side of the production is pushed onto the stack. This replacement of the top n items with a single item is the reduce operation. The operation may be applied only to the top of the stack; reducing items lower in the stack must be done before later items are pushed onto the stack. The parser finishes when all the input is consumed and there is only one item remaining on the stack: a parse tree with an S node as its root.
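NLTK provides nltk.ShiftReduceParser, a simple implementation of this strategy; note that it performs no backtracking, so it is not guaranteed to find a parse even if one exists. A short sketch using grammar1:
>>> sr_parser = nltk.ShiftReduceParser(grammar1)
>>> for tree in sr_parser.parse('Mary saw a dog'.split()):
...     print(tree)
(S (NP Mary) (VP (V saw) (NP (Det a) (N dog))))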
An advantage of shift-reduce parsers over recursive descent parsers is that they only build structure that corresponds to the words in the input.
One of the problems with the recursive descent parser is that it goes into an infinite loop
when it encounters a left-recursive production. This is because it applies the grammar
productions blindly, without considering the actual input sentence. A left-corner parser is a
hybrid between the bottom-up and top-down approaches.
Grammar grammar1 allows us to produce the following parse of John saw Mary:
(S (NP John) (VP (V saw) (NP Mary)))
Each time a production is considered by the parser, it checks that the next input word is compatible
with at least one of the pre-terminal categories in the left-corner table.
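For illustration, here is what such a left-corner table looks like for grammar1 (our tabulation, obtained by taking the first righthand-side symbol of each production and following it down to pre-terminals; lexical NPs such as 'John' act as their own left corner):

Category   Left-corners (pre-terminals)
S          NP
NP         Det
VP         V
PP         P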