Unit 4
Syntax
CFG, Probabilistic CFG
Constituency (Phrase level, Sentence level)
Parsing (Top-Down and Bottom-Up)
CYK Parser, Probabilistic Parsing
– For example, the productions NP → ProperNoun, NP → Det Nominal, and Nominal → Noun | Nominal Noun express that an NP (or noun phrase) can be composed of:
● either a Proper Noun or a determiner (Det) followed by a Nominal;
● a Nominal, in turn, can consist of one or more Nouns.
CFG
● A CFG can be thought of in two ways:
– as a device for generating sentences, and
– as a device for assigning a structure to a given sentence.
● Viewing a CFG as a generator,
– we can read the → arrow as “rewrite the symbol on the left with the string of symbols on the right”.
● So starting from the symbol: NP
● we can use our first rule to rewrite NP as: Det Nominal
● and then rewrite Nominal as: Noun
● and finally rewrite these parts-of-speech as: a flight
CFG
● We say the string a flight can be derived from the non-terminal NP.
– A CFG can be used to generate a set of strings. This sequence of rule expansions is called a derivation of the string of words.
– It is common to represent a derivation by a parse tree (commonly shown inverted, with the root at the top).
CFG
● We can use this grammar to generate sentences of this “language”.
– We start with S, expand it to NP VP, then choose a random expansion of NP (let’s say, to I),
– and a random expansion of VP (let’s say, to Verb NP),
– and so on until we generate the string: I prefer a morning flight
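To make the generator view concrete, here is a minimal Python sketch (added for illustration, not from the slides; the toy rule set below is assumed, in the spirit of the derivation above) that repeatedly rewrites non-terminals at random until only words remain:

    import random

    # A toy CFG, assumed for illustration: each non-terminal maps to a
    # list of alternative right-hand sides (tuples of symbols).
    GRAMMAR = {
        "S":       [("NP", "VP")],
        "NP":      [("Pronoun",), ("Det", "Nominal")],
        "Nominal": [("Noun",), ("Noun", "Nominal")],
        "VP":      [("Verb", "NP")],
        "Pronoun": [("I",)],
        "Det":     [("a",)],
        "Noun":    [("morning",), ("flight",)],
        "Verb":    [("prefer",)],
    }

    def generate(symbol="S"):
        """Rewrite `symbol` by a randomly chosen rule until only terminals remain."""
        if symbol not in GRAMMAR:        # terminal symbol: nothing to rewrite
            return [symbol]
        rhs = random.choice(GRAMMAR[symbol])
        words = []
        for sym in rhs:
            words.extend(generate(sym))
        return words

    print(" ".join(generate()))  # e.g. "I prefer a morning flight"

Because the expansions are chosen at random, repeated runs produce different sentences of the language, e.g. “I prefer a flight” or “I prefer a morning morning flight”.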
Context-Free Grammar (CFG)
● Formal Definition
– A context-free grammar (CFG) G is a quadruple (N, Σ, R, S) where
● N is a finite set of non-terminal symbols,
● Σ is a finite set of terminal symbols, disjoint from N,
● R is a finite set of production rules of the form A → β, where A ∈ N and β is a string over N ∪ Σ, and
● S ∈ N is the start symbol.
CFG - Example
● N = {q, f}
● Σ = {0, 1}
● R = {q → 11q, q → 00f, f → 11f, f → ε}
● S = q
● (Equivalently, R = {q → 11q | 00f, f → 11f | ε})
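● For example, 110011 ∈ L(G) via the derivation q ⇒ 11q ⇒ 1100f ⇒ 110011f ⇒ 110011; in general, this grammar generates L(G) = {(11)ᵐ 00 (11)ⁿ | m, n ≥ 0}.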
CFG - Rules
● If A → β is a rule, then xAy ⇒ xβy, and we say that xAy derives xβy.
● If s ⇒ ··· ⇒ t, then we write s ⇒* t.
● A string x in Σ* is generated by G = (N, Σ, R, S) if S ⇒* x.
● L(G) = { x in Σ* | S ⇒* x }.
CFG - Example
● G = ({S}, {0, 1}, {S → 0S1 | ε}, S)
● ε ∈ L(G) because S ⇒ ε.
● 01 ∈ L(G) because S ⇒ 0S1 ⇒ 01.
● 0011 ∈ L(G) because S ⇒ 0S1 ⇒ 00S11 ⇒ 0011.
● L(G) = {0ⁿ1ⁿ | n ≥ 0}
Context-free Language (CFL)
● A language L is context-free if there exists a CFG G such that L = L(G).
● A grammar G generates the language L(G).
Example
● G = (N, T, S, P) where
– N = {S, NP, VP, PP, Det, Noun, Verb, Aux, Pre}
– T = {‘a’, ‘ate’, ‘cake’, ‘child’, ‘fork’, ‘the’, ‘with’}
– S = S
– P = { S → NP VP
        NP → Det Noun | NP PP
        PP → Pre NP
        VP → Verb NP
        Det → ‘a’ | ‘the’
        Noun → ‘cake’ | ‘child’ | ‘fork’
        Pre → ‘with’
        Verb → ‘ate’ }
Example
● Some notes:
– Note 1: In P, the pipe symbol (|) is used to combine productions that have the same LHS into a single representation.
● For example, Det → ‘a’ | ‘the’ is derived from the two rules Det → ‘a’ and Det → ‘the’. It still denotes two rules, not one.
– Note 2: The phrase-structure productions (S → NP VP, NP → Det Noun | NP PP, PP → Pre NP, VP → Verb NP) are referred to as the grammar, and the word-level productions (the Det, Noun, Pre, and Verb expansions) are referred to as the lexicon.
– Note 3: NP – Noun Phrase, VP – Verb Phrase, PP – Prepositional Phrase, Det – Determiner, Aux – Auxiliary verb
Sample derivation
● Using the rule set P from the previous slide:
S → NP VP
  → Det Noun VP
  → the Noun VP
  → the child VP
  → the child Verb NP
  → the child ate NP
  → the child ate Det Noun
  → the child ate a Noun
  → the child ate a cake
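The same grammar and sentence can be checked mechanically. Here is a minimal sketch using NLTK (assuming the nltk package is available; this is an illustration added to these notes, not part of the original slides):

    import nltk

    # The example grammar from the slides, written in NLTK's notation.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det Noun | NP PP
    PP -> Pre NP
    VP -> Verb NP
    Det -> 'a' | 'the'
    Noun -> 'cake' | 'child' | 'fork'
    Pre -> 'with'
    Verb -> 'ate'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the child ate a cake".split()):
        tree.pretty_print()   # draws the parse tree in ASCII

The printed tree corresponds exactly to the derivation above, with S at the root and the words at the leaves.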
Probabilistic Context Free Grammar (PCFG)
● A PCFG is an extension of a CFG with a probability for each production rule.
● Ambiguity is the reason for using the probabilistic version of CFG.
– For instance, some sentences may have more than one underlying derivation.
– The sentence can be parsed in more than one way.
– In this case, the parse of the sentence becomes ambiguous.
● To resolve this ambiguity, we can use a PCFG to find the probability of each parse of the given sentence.
PCFG - Definition
● A probabilistic context free grammar G is a quintuple G = (N, T, S, R, P) where
– (N, T, S, R) is a context free grammar, where N is the set of non-terminal (variable) symbols, T is the set of terminal symbols, S is the start symbol, and R is the set of production rules, each of the form A → s.
– P assigns a probability P(A → s) to each rule in R. The properties governing the probabilities are as follows:
● P(A → s) is the conditional probability of choosing the rule A → s in a left-most derivation, given that A is the non-terminal being expanded.
● The value of each probability lies between 0 and 1.
● The probabilities of all rules with the same left-hand-side non-terminal A sum to 1: Σₛ P(A → s) = 1.
PCFG - Example
● Probabilistic Context Free Grammar G = (N, T, S, R, P)
S = S
R = { S → NP VP
      NP → Det Noun | NP PP
      PP → Pre NP
      VP → Verb NP
      Det → ‘a’ | ‘the’
      Noun → ‘cake’ | ‘child’ | ‘fork’
      Pre → ‘with’
      Verb → ‘ate’ }
● P = R with an associated probability for each rule, as in the table below.
● Observe from the table that the sum of the probability values for all rules that have the same left-hand side is 1.
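Since the probability table itself is not reproduced in these notes, here is a sketch of the same grammar as an NLTK PCFG, with hypothetical probabilities chosen only so that each left-hand side sums to 1 (assumed for illustration, not taken from the slides):

    import nltk

    # The slide grammar with HYPOTHETICAL probabilities (assumed);
    # the probabilities for each left-hand side sum to 1.
    pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det Noun [0.8] | NP PP [0.2]
    PP -> Pre NP [1.0]
    VP -> Verb NP [1.0]
    Det -> 'a' [0.5] | 'the' [0.5]
    Noun -> 'cake' [0.4] | 'child' [0.4] | 'fork' [0.2]
    Pre -> 'with' [1.0]
    Verb -> 'ate' [1.0]
    """)

    # ViterbiParser returns the single most probable parse.
    parser = nltk.ViterbiParser(pcfg)
    for tree in parser.parse("the child ate a cake with a fork".split()):
        print(tree)   # the tree is printed together with its probability
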
Parse
● To parse a sentence is to resolve it into its component parts and describe their syntactic roles.
● In NLP, a parse can be visualized in tree form.
Syntax Parsing
● Syntax parsing is also heavily used in programming languages, for example to analyze statements such as:
b = c + 1;
a = a - d;
Parse Tree
●
A parse of the sentence "the giraffe dreams" is:
– s => np vp => det n vp => the n vp => the giraffe vp
=> the giraffe iv => the giraffe dreams
Parsing
● In natural language processing, parsing is the process of analyzing a sentence to determine its grammatical structure.
● There are two main approaches to parsing:
– top-down parsing
– bottom-up parsing
Top-down Parsing
● Top-down parsing is a parsing technique that starts with the highest level of a grammar’s production rules and works its way down to the lowest level.
– It begins with the start symbol of the grammar and applies the production rules recursively to expand it into a parse tree.
– One example of a top-down parsing algorithm is recursive descent parsing.
Top-down Parsing
● For example, consider the following CFG:
S -> NP VP
NP -> Det N
VP -> V NP
Det -> the | a
N -> dog | cat | boy | girl
V -> chased | hugged
● A top-down parser would begin with the start symbol “S” and then apply the production rule “S -> NP VP” to expand it into “NP VP”.
● The parser would then apply the production rule “NP -> Det N” to expand “NP” into “Det N”, as in the recursive descent sketch below.
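A minimal recursive descent sketch for this toy grammar, with one parsing procedure per non-terminal (added for illustration; it handles only this fixed grammar):

    # Recursive descent parser for the toy CFG above: one function per
    # non-terminal, each consuming tokens from left to right.
    DET = {"the", "a"}
    N = {"dog", "cat", "boy", "girl"}
    V = {"chased", "hugged"}

    def parse_s(tokens, i=0):
        i = parse_np(tokens, i)      # S -> NP VP
        i = parse_vp(tokens, i)
        if i != len(tokens):
            raise SyntaxError("trailing input")
        return i

    def parse_np(tokens, i):
        i = expect(tokens, i, DET)   # NP -> Det N
        return expect(tokens, i, N)

    def parse_vp(tokens, i):
        i = expect(tokens, i, V)     # VP -> V NP
        return parse_np(tokens, i)

    def expect(tokens, i, word_class):
        if i < len(tokens) and tokens[i] in word_class:
            return i + 1
        raise SyntaxError(f"unexpected token at position {i}")

    parse_s("the dog chased the cat".split())   # succeeds (no exception)

Each function mirrors one production rule, which is why recursive descent is the textbook example of top-down parsing: the call tree of the functions is exactly the parse tree, built from the root down.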
Bottom-up Parsing
● Bottom-up parsing is a parsing technique that starts with the sentence’s words and works its way up to the highest level of the grammar’s production rules.
● It begins with the input sentence and applies the production rules in reverse, reducing the input sentence to the start symbol of the grammar.
● One example of a bottom-up parsing algorithm is shift-reduce parsing.
Bottom-up Parsing
● For example, consider the following CFG:
S -> NP VP
NP -> Det N
VP -> V NP
Det -> the | a
N -> dog | cat | boy | girl
V -> chased | hugged
● A bottom-up parser would begin with the input sentence “the dog chased the cat” and would apply the production rules in reverse to reduce it to the start symbol “S”.
● The parser would start by matching
– “the dog” to the “Det N” production rule,
– “chased” to the “V” production rule, and
– “the cat” to another “Det N” production rule.
● These shift and reduce steps are repeated until the input sentence is reduced to “S”, the start symbol of the grammar, as in the shift-reduce sketch below.
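A minimal shift-reduce sketch for the same toy grammar (added for illustration): it greedily reduces whenever the top of the stack matches a rule’s right-hand side, which suffices for this grammar but would need lookahead for harder ones.

    # Shift-reduce parsing: shift words onto a stack, and reduce whenever
    # the top of the stack matches the right-hand side of a rule.
    RULES = [
        ("S",  ["NP", "VP"]),
        ("NP", ["Det", "N"]),
        ("VP", ["V", "NP"]),
        ("Det", ["the"]), ("Det", ["a"]),
        ("N", ["dog"]), ("N", ["cat"]), ("N", ["boy"]), ("N", ["girl"]),
        ("V", ["chased"]), ("V", ["hugged"]),
    ]

    def shift_reduce(tokens):
        stack, buffer = [], list(tokens)
        while True:
            for lhs, rhs in RULES:               # try a reduce step first
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    break
            else:
                if not buffer:                   # nothing to shift or reduce
                    break
                stack.append(buffer.pop(0))      # shift the next word
        return stack == ["S"]

    print(shift_reduce("the dog chased the cat".split()))  # True

Tracing the run reproduces the reductions listed above: “the dog” becomes Det N and then NP, “chased” becomes V, “the cat” becomes another NP, V NP reduces to VP, and finally NP VP reduces to S.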
Probability of a parse tree
● Use of PCFG
– A sentence can be parsed in more than one way.
– We can have more than one parse tree for the sentence as per the CFG, due to ambiguity.
Probability of a parse tree
● Given a parse tree t,
– built with the production rules α1 → β1, α2 → β2, … , αn → βn from R (i.e., αi → βi ∈ R), we can find the probability of the tree t under the PCFG as follows:
P(t) = P(α1 → β1) × P(α2 → β2) × … × P(αn → βn) = ∏ᵢ P(αi → βi)
● As per the equation, the probability P(t) of a parse tree is the product of the probabilities of the production rules used in the tree t.
Probability of a parse tree
● Which is the most probable tree?
– The probability of the parse tree t1 is greater than the probability of parse tree t2. Hence, t1 is the more probable of the two parses.
Probability of a sentence
● The probability of a sentence is the sum of the probabilities of all parse trees that can be derived for the sentence under the PCFG:
P(s) = Σₜ P(t), where t ranges over all parse trees of s
● Probability of the sentence “astronomers saw the stars with ears”
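A sketch of this computation with NLTK, reusing the hypothetical probabilities from the earlier PCFG sketch (assumed for illustration); the doubled PP makes the sentence genuinely ambiguous, so the sum runs over two parse trees:

    import math
    import nltk

    # Same HYPOTHETICAL PCFG as in the earlier sketch (probabilities assumed).
    pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det Noun [0.8] | NP PP [0.2]
    PP -> Pre NP [1.0]
    VP -> Verb NP [1.0]
    Det -> 'a' [0.5] | 'the' [0.5]
    Noun -> 'cake' [0.4] | 'child' [0.4] | 'fork' [0.2]
    Pre -> 'with' [1.0]
    Verb -> 'ate' [1.0]
    """)

    # Look up each rule's probability by its (LHS, RHS) pair.
    rule_prob = {(p.lhs(), p.rhs()): p.prob() for p in pcfg.productions()}

    # Ambiguous under this grammar: the second PP can attach to 'cake'
    # or to 'fork', so the chart parser yields two trees.
    sent = "the child ate a cake with a fork with a fork".split()

    total = 0.0
    for tree in nltk.ChartParser(pcfg).parse(sent):
        p_t = math.prod(rule_prob[(p.lhs(), p.rhs())]
                        for p in tree.productions())
        total += p_t                   # P(sentence) = sum over its parses

    print(total)
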
Ambiguity
● Ambiguity is the most serious problem faced by syntactic parsers.
● The most common ambiguity is structural ambiguity.
Ambiguity
● In the sentence “I shot an elephant in my pajamas”, the phrase in my pajamas can be part of the NP headed by elephant or a part of the VP headed by shot.
● Two parse trees for this ambiguous sentence: the parse on the left corresponds to the humorous reading in which the elephant is in the pajamas; the parse on the right corresponds to the reading in which Captain Spaulding did the shooting in his pajamas.
Self Study
● Chomsky Normal Form (CNF)
● Cocke–Younger–Kasami (CYK) algorithm
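As a starting point for these self-study topics, here is a minimal CYK recognizer sketch (added for illustration, not from the slides). It assumes the grammar is in Chomsky Normal Form; the example grammar above already is, since every rule is either A → B C or A → ‘word’.

    from itertools import product

    # The example grammar from the slides, already in CNF.
    BINARY = {
        ("NP", "VP"): "S",
        ("Det", "Noun"): "NP",
        ("NP", "PP"): "NP",
        ("Pre", "NP"): "PP",
        ("Verb", "NP"): "VP",
    }
    LEXICAL = {
        "a": {"Det"}, "the": {"Det"},
        "cake": {"Noun"}, "child": {"Noun"}, "fork": {"Noun"},
        "with": {"Pre"}, "ate": {"Verb"},
    }

    def cyk(words):
        n = len(words)
        # table[i][j] = set of non-terminals deriving words[i..j]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            table[i][i] = set(LEXICAL.get(w, ()))
        for length in range(2, n + 1):            # span length
            for i in range(n - length + 1):       # span start
                j = i + length - 1                # span end
                for k in range(i, j):             # split point
                    for b, c in product(table[i][k], table[k + 1][j]):
                        if (b, c) in BINARY:
                            table[i][j].add(BINARY[(b, c)])
        return "S" in table[0][n - 1]

    print(cyk("the child ate a cake".split()))               # True
    print(cyk("the child ate a cake with a fork".split()))   # True

Filling the table bottom-up over ever-longer spans is what makes CYK a dynamic-programming, bottom-up parser; adding a probability to each cell turns the same table into the probabilistic CYK used for PCFG parsing.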
Treebank
● A corpus in which every sentence is annotated with a parse tree is called a treebank.
● Treebanks play an important role in parsing, as well as in linguistic investigations of syntactic phenomena.
Reference
● Chapter 17 - Speech and Language Processing (3rd Edition)
● Automatic Generation of Python Programs Using Context-Free Grammars: https://arxiv.org/pdf/2403.06503v1
Thank you