Lecture06-Syntax Formal Languages

This document is a lecture on syntax and formal languages in natural language processing, covering the structure of sentences, the role of syntax as an interface between morphology and semantics, and various linguistic theories. It discusses constituents, constituency tests, recursion in language, and context-free grammars, including examples and their applications. The lecture also addresses the limitations of regular grammars in capturing the complexities of natural languages.

Natural Language Processing
Lecture 6: Introduction to Syntax and Formal Languages

09/20/2024

COMS W4705
Daniel Bauer
Sentences: the good, the bad, and the ugly
• Some good sentences:

• the boy likes a girl

• the small girl likes a big girl

• a very small nice boy sees a very nice boy

• Some bad sentences:

• the boy the girl likes

• small boy likes nice girl

• Ugly word salad: very like nice the girl boy


Syntax
• Syntax is the study of the structure of language.

• How words are arranged in a sentence (word order) and the relationships between them.

• Goal: relate surface form to semantics (meaning).


Syntax as an Interface
• Syntax can be seen as the interface between morphology
(structure of words) and semantics.

• Why treat syntax separately from semantics?

• We can judge whether a sentence is grammatical or not, even if it doesn’t make sense semantically.
Colorless green ideas sleep furiously.
*Sleep ideas furiously colorless green.
Types of Linguistic Theories
• Prescriptive: “This is how people ought to talk.”
• (“prescriptive linguistics” is an oxymoron)

• Descriptive: provide a formal account of how people talk.

• Explanatory: explain why people talk a certain way (identify the underlying cognitive or neural mechanisms).

NLP focuses on the descriptive part.

Computational linguistics is interested in finding explanatory theories, but often uses descriptive methods.
The Big Picture
[Figure: diagram connecting Linguistic Theory, Formalisms, and Empirical Observations. Arrow labels visible in the figure: “is about”, “uses”, “predicts”; the theory is marked as a descriptive or explanatory theory.]

• Formalisms: Data Structures & Algorithms, Formal Grammar & Automata, Distributional Models

• Empirical Observations: [Maud expects there to be a riot.], [*Maud promises there to be a riot.], [Colorless green ideas sleep furiously.]

• Example structure shown: a parse tree (nodes S, VP, NP) for “the boy wants the girl to like him”, tagged DT NN VBZ DT NN TO VB PRP.
Constituents
• A constituent is a group of words that behaves as a single unit (within a hierarchical structure).

• Noun-Phrase examples:

• [they], [the woman], [three parties from Brooklyn], [a high-class spot such as Mindy’s], [the horse raced past the barn]

• Noun phrases can appear before verbs (among other things) and they must be complete:

• *from arrive…
• *the is…
• *spot sat…
• *green sleep…
Constituency Tests

• On September seventeenth, I’d like to fly to New York.

• I’d like to fly to New York on September seventeenth.

• I’d like to fly on September seventeenth to New York.

• *On I’d like to fly to New York September seventeenth.

• *On September I’d like to fly seventeenth to New York.

Sentence Structure as Trees
• [the tall girl likes a few very friendly boys]

• [[the tall girl] likes [a few very friendly boys]]

• [[[the] tall girl] likes [[a few] [[very friendly] boys]]]

the tall girl likes a few very friendly boys
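
The bracketings above can be read mechanically as trees. The following is a minimal sketch in Python, assuming whitespace-separated words and square brackets as the only delimiters; the function names `parse_brackets` and `words` are hypothetical helpers, not part of the lecture.

```python
def parse_brackets(text):
    """Turn a bracketed string into nested lists; bare words stay strings."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    stack = [[]]                      # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "[":
            stack.append([])          # open a new constituent
        elif tok == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            done = stack.pop()
            stack[-1].append(done)    # attach the finished constituent to its parent
        else:
            stack[-1].append(tok)     # a plain word
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return stack[0]


def words(tree):
    """Flatten a nested-list tree back into its word sequence."""
    out = []
    for item in tree:
        out.extend(words(item) if isinstance(item, list) else [item])
    return out


tree = parse_brackets("[[the tall girl] likes [a few very friendly boys]]")
print(tree)
print(" ".join(words(tree)))
```

Each nested list corresponds to one constituent, so adding more brackets refines the tree without changing the word sequence that `words` returns.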


Constituent Labels
• Choose constituents so each one has one non-bracketed word: the head.
• Category of constituent: XP, where X is the part-of-speech of the head, e.g. NP, VP, AdjP, AdvP, DetP.

[Figure: constituency tree for the sentence below, with constituents labeled S, VP, NP, DetP, QuantP, and AdjP above the part-of-speech tags.]

Det  N     V      Det  Quant  Adv   Adj       N
the  girl  likes  a    few    very  friendly  boys
Recursion in Language
• One of the most important attributes of natural languages is that they are recursive.

• He made pie
[with apples [from the orchard [near the farm [in …]]]]

• [The mouse [the cat [the dog chased]] ate] died.

• There are infinitely many sentences in a language, but they follow predictable structures.

• How do we model the set of sentences in a language and their structure?
Context Free Grammars (CFG)
S → NP VP      V → saw
VP → V NP      P → with
VP → VP PP     D → the
PP → P NP      N → cat
NP → D N       N → tail
NP → NP PP     N → student

Parse tree for “the student saw the cat with the tail”, built up step by step on the slides:

[S [NP [D the] [N student]] [VP [V saw] [NP [NP [D the] [N cat]] [PP [P with] [NP [D the] [N tail]]]]]]
Context Free Grammars
• A context free grammar is defined by:

• Set of terminal symbols Σ.

• Set of non-terminal symbols N.

• A start symbol S ∈ N.

• Set R of productions of the form A → β, where A ∈ N and β ∈ (Σ ∪ N)*, i.e. β is a string of terminals and non-terminals.
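
The four components of the definition map directly onto a small data structure. This is a sketch only: the class name `CFG`, its field names, and the `validate` method are illustrative assumptions, and the example instance encodes the toy grammar from the earlier slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # Σ
    rules: tuple              # R: pairs (A, beta), beta a tuple of symbols
    start: str                # S

    def validate(self):
        """Check the conditions from the definition; raise ValueError if one fails."""
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        if self.nonterminals & self.terminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            if any(sym not in symbols for sym in rhs):
                raise ValueError(f"unknown symbol in {lhs} -> {rhs}")
        return True


TOY = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "PP", "D", "N", "V", "P"}),
    terminals=frozenset({"the", "cat", "tail", "student", "saw", "with"}),
    rules=(
        ("S", ("NP", "VP")), ("VP", ("V", "NP")), ("VP", ("VP", "PP")),
        ("PP", ("P", "NP")), ("NP", ("D", "N")), ("NP", ("NP", "PP")),
        ("V", ("saw",)), ("P", ("with",)), ("D", ("the",)),
        ("N", ("cat",)), ("N", ("tail",)), ("N", ("student",)),
    ),
    start="S",
)
print(TOY.validate())
```

Note that the grammar symbol "N" (noun) in the toy grammar is unrelated to the set N of non-terminals in the definition.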
Language of a CFG
• Given a CFG G = (N, Σ, R, S):

• Given a string αAγ, where A ∈ N, we can derive αβγ if there is a production A → β ∈ R.

• α ⇒ β means that G can derive β from α in a single step.

• α ⇒* β means that G can derive β from α in a finite number of steps.

• The language of G is defined as the set of all terminal strings that can be derived from the start symbol: L(G) = { w ∈ Σ* | S ⇒* w }.
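
The relations ⇒ and ⇒* can be explored with a small search. The sketch below is an illustration under stated assumptions: it uses the toy grammar from the earlier slides, the names `RULES`, `step`, and `language` are hypothetical, and the length bound only works because no rule in this grammar shortens a string.

```python
from collections import deque

# Toy grammar from the earlier slides: non-terminal -> list of right-hand sides.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "NP": [("D", "N"), ("NP", "PP")],
    "V": [("saw",)], "P": [("with",)], "D": [("the",)],
    "N": [("cat",), ("tail",), ("student",)],
}


def step(form):
    """Yield every string derivable from `form` in a single step (the relation =>)."""
    for i, sym in enumerate(form):
        for beta in RULES.get(sym, []):
            yield form[:i] + beta + form[i + 1:]   # alpha A gamma => alpha beta gamma


def language(start="S", max_len=5):
    """Terminal strings of at most max_len words derivable from `start` (=>*).

    Pruning by length is safe here because every right-hand side has at
    least one symbol, so derived strings never get shorter.
    """
    seen = {(start,)}
    queue = deque(seen)
    sentences = set()
    while queue:
        form = queue.popleft()
        if all(sym not in RULES for sym in form):   # only terminals left
            sentences.add(" ".join(form))
            continue
        for nxt in step(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sentences


for sentence in sorted(language(max_len=5)):
    print(sentence)
```

The full language of this grammar is infinite, so the function can only ever list the part of L(G) below the chosen length bound.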
Derivations and Derived Strings
• CFG is a string rewriting formalism, so the derived objects are strings.

• A derivation is a sequence of rewriting steps.

• CFGs are context free: applicability of a rule depends only on the nonterminal symbol, not on its context.

• Therefore, the order in which multiple non-terminals in a partially derived string are replaced does not matter. Derivations that differ only in this order can be represented by the same derivation tree.

• The derivation tree implies a parse tree.
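
To make the point about rewriting order concrete, the sketch below reads two different derivations off one tree: one always rewrites the leftmost non-terminal, the other the rightmost. The tree encoding (nested tuples of the form `(label, child, ...)` with plain strings as words) and the function names are assumptions made for this illustration.

```python
def label(node):
    """Symbol shown in a derived string: the word itself or the node's category."""
    return node if isinstance(node, str) else node[0]


def derivation(tree, leftmost=True):
    """List the derived strings obtained by expanding `tree` one node at a time."""
    frontier = [tree]
    forms = [[label(tree)]]
    while True:
        pending = [i for i, n in enumerate(frontier) if not isinstance(n, str)]
        if not pending:
            return forms                      # only words remain
        i = pending[0] if leftmost else pending[-1]
        frontier[i:i + 1] = frontier[i][1:]   # apply the rule used at this node
        forms.append([label(n) for n in frontier])


TREE = ("S",
        ("NP", ("D", "the"), ("N", "student")),
        ("VP", ("V", "saw"), ("NP", ("D", "the"), ("N", "cat"))))

left = derivation(TREE, leftmost=True)
right = derivation(TREE, leftmost=False)
for form in left:
    print(" ".join(form))
```

Both derivations apply the same nine rules and end in the same sentence; only the intermediate strings differ, which is why a single tree can stand for both.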


Recursion in CFGs
S → NP VP      V → saw
VP → V NP      P → with
VP → VP PP     D → the
PP → P NP      N → cat
NP → D N       N → tail
NP → NP PP     N → student

Derived strings, starting from NP (built up step by step on the slides):

NP
⇒ NP PP
⇒* the student PP
⇒ the student P NP
⇒ the student with NP
⇒ the student with NP PP
⇒* the student with the cat PP
⇒* the student with the cat with NP
⇒ the student with the cat with NP PP
⇒* the student with the cat with the tail PP

Parse tree at the last step (the final PP is not yet expanded):

[NP [NP [D the] [N student]] [PP [P with] [NP [NP [D the] [N cat]] [PP [P with] [NP [NP [D the] [N tail]] PP]]]]]
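
Because NP → NP PP and PP → P NP refer back to NP, a noun phrase can be nested to any depth. The following sketch builds such trees recursively; the function names `nested_np` and `leaves` and the tuple encoding of trees are assumptions for illustration, and the nouns are restricted to the three in the toy grammar.

```python
def nested_np(nouns):
    """Tree for 'the n1 with the n2 with ...' using NP -> D N, NP -> NP PP, PP -> P NP."""
    simple = ("NP", ("D", "the"), ("N", nouns[0]))
    if len(nouns) == 1:
        return simple                                   # base case: NP -> D N
    pp = ("PP", ("P", "with"), nested_np(nouns[1:]))    # the recursion goes through PP
    return ("NP", simple, pp)                           # NP -> NP PP


def leaves(tree):
    """Words at the leaves of a tuple-encoded tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(leaves(child))
    return out


for depth in range(1, 4):
    nouns = ["student", "cat", "tail"][:depth]
    print(" ".join(leaves(nested_np(nouns))))
```

Nothing in the grammar limits the length of the noun list, which is the sense in which a finite set of rules describes infinitely many phrases.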
Regular Grammars
• A regular grammar is defined by:

• Set of terminal symbols Σ.

• Set of non-terminal symbols N.

• A start symbol S ∈ N.

• Set R of productions of the form A → aB, or A → a, where A, B ∈ N and a ∈ Σ.
Finite State Automata
• Regular grammars can be implemented as finite state automata.

NP → the N
N → student PP
N → cat PP
N → tail PP
PP → with NP
PP → Ɛ

[Figure: finite state automaton with states NP, N, PP, and end. Transitions: NP to N on “the”; N to PP on “student”, “cat”, or “tail”; PP to NP on “with”; PP to end on Ɛ.]

• The set of all regular languages is a strict subset of the set of context-free languages.
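
The regular grammar on this slide translates almost line by line into a transition table. This is a minimal sketch, assuming one state per non-terminal and treating PP → Ɛ as "PP is an accepting state"; the names `TRANSITIONS`, `ACCEPTING`, and `accepts` are hypothetical.

```python
# One entry per production A -> a B: in state A, reading a, move to state B.
TRANSITIONS = {
    ("NP", "the"): "N",
    ("N", "student"): "PP",
    ("N", "cat"): "PP",
    ("N", "tail"): "PP",
    ("PP", "with"): "NP",
}
ACCEPTING = {"PP"}   # PP -> Ɛ: the automaton may stop whenever it is in state PP


def accepts(sentence, start="NP"):
    """Run the automaton over the words; accept if it ends in an accepting state."""
    state = start
    for word in sentence.split():
        state = TRANSITIONS.get((state, word))
        if state is None:        # no transition defined: reject
            return False
    return state in ACCEPTING


print(accepts("the student with the cat with the tail"))
print(accepts("the student with"))
```

The automaton keeps track of a single current state and nothing else, which is the limitation the following slides exploit.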
Are natural languages (such as English, specifically) regular?
No! Example: Center embeddings.
Center Embeddings
[Figure: parse tree with nested CP nodes, shown first in full and then in simplified form:]

[S the mouse [CP the cat [CP the dog chased] ate] died]

Linguistically, this is not a perfect analysis.

Problem: Regular grammars cannot capture long-distance dependencies.
This example follows the pattern $a^n b^n$.
One can show that this language is not regular (using the “pumping lemma”).
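
Recognizing $a^n b^n$ requires remembering how many a's were seen, for arbitrarily large n. The sketch below does this with a stack, in the spirit of a pushdown automaton; it is an illustration of why unbounded memory is needed, not a proof of non-regularity. The word lists and the helper `abstract` are assumptions chosen to match the example sentence.

```python
def is_anbn(s):
    """True iff s consists of n a's followed by exactly n b's, with n >= 1."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":
        stack.append("a")          # remember each a
        i += 1
    if not stack:
        return False
    while i < len(s) and s[i] == "b":
        if not stack:
            return False           # more b's than a's
        stack.pop()                # match one b against one remembered a
        i += 1
    return i == len(s) and not stack


NOUNS = {"mouse", "cat", "dog"}        # hypothetical word lists for the example
VERBS = {"chased", "ate", "died"}


def abstract(sentence):
    """Map each noun to 'a' and each verb to 'b', ignoring other words."""
    return "".join("a" if w in NOUNS else "b"
                   for w in sentence.split() if w in NOUNS | VERBS)


pattern = abstract("the mouse the cat the dog chased ate died")
print(pattern, is_anbn(pattern))
```

The stack grows with n, whereas a finite state automaton has a fixed number of states for all inputs.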
Is Natural Language Context Free?
Probably not. An example from Dutch:

“…because Wim saw Jan help Marie teach the children to swim”
Context free grammars cannot describe crossing dependencies.
For example, it can be shown that $a^n b^m c^n d^m$ is not a context free language.
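
Membership in this language is easy to check with two counters, even though no context free grammar generates it. The sketch below is only a membership test, assuming n, m ≥ 1; it illustrates the crossing pattern (the a's pair with the c's while the b's pair with the d's) and does not prove anything about context-freeness. The function name `is_crossing` is hypothetical.

```python
import re


def is_crossing(s):
    """True iff s = a^n b^m c^n d^m for some n, m >= 1."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    if match is None:
        return False
    a, b, c, d = (len(block) for block in match.groups())
    # The two dependencies cross: a...c spans over the start of b...d.
    return a == c and b == d


print(is_crossing("aabccd"))
print(is_crossing("aabbcd"))
```

A single stack can match the a's against the c's or the b's against the d's, but not both at once, because the two pairings interleave instead of nesting.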
Complexity Classes
• Regular languages (regular grammars, regular expressions, finite state automata).

• Context-free languages (context-free string grammars, pushdown automata).

• “Mildly-context-sensitive languages”.

• Context-sensitive languages.

• Recursively enumerable languages (Turing machines).

This is sometimes called the “Chomsky Hierarchy”.
Complexity Classes
[Figure: nested sets, from innermost to outermost:]
regular languages ⊂ context free languages ⊂ “mildly” context sensitive languages ⊂ context sensitive languages ⊂ recursively enumerable languages
Formal Grammar and Parsing
• Formal grammars are used in linguistics, NLP, programming languages, …

• We want to build a compact model that describes a complete language.

• Need efficient algorithms to determine if a sentence is in the language or not (recognition problem).

• We also want to recover the structure imposed by the grammar (parsing problem).
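
As one possible illustration of the recognition problem, the sketch below applies the CKY dynamic-programming algorithm to the toy grammar from the earlier slides. The slides above do not name an algorithm, so the choice of CKY is an assumption; it works here because every rule of the toy grammar is either binary or rewrites to a single word. The names `BINARY`, `LEXICAL`, `cky_counts`, `recognize`, and `count_parses` are hypothetical.

```python
from collections import defaultdict

# Toy grammar from the earlier slides, split into binary and lexical rules.
BINARY = [
    ("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("PP", "P", "NP"), ("NP", "D", "N"), ("NP", "NP", "PP"),
]
LEXICAL = {
    "saw": {"V"}, "with": {"P"}, "the": {"D"},
    "cat": {"N"}, "tail": {"N"}, "student": {"N"},
}


def cky_counts(words):
    """chart[(i, j)][A] = number of parse trees rooted in A over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICAL.get(word, ()):
            chart[(i, i + 1)][tag] = 1
    for span in range(2, n + 1):              # shorter spans are filled first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):         # split point between the two children
                for lhs, left, right in BINARY:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][lhs] += ways
    return chart


def count_parses(sentence, start="S"):
    words = sentence.split()
    return cky_counts(words)[(0, len(words))][start]


def recognize(sentence, start="S"):
    """Recognition problem: is the sentence in the language of the grammar?"""
    return count_parses(sentence, start) > 0


print(recognize("the student saw the cat with the tail"))
print(count_parses("the student saw the cat with the tail"))
```

Counting trees instead of storing a yes/no answer shows that the example sentence is ambiguous under this grammar: the PP can attach via NP → NP PP or via VP → VP PP. Recovering the trees themselves (the parsing problem) would additionally require storing back-pointers in the chart.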
