
Ch4-Phrase-Structure Grammars and Dependency Grammars PDF

This document discusses phrase-structure grammars and dependency grammars. It provides an overview of context-free grammars and how they are used to model constituency in natural languages. Examples of grammar rules and lexical rules are given, along with explanations of key concepts like non-terminals, terminals, parse trees, and formal languages. Sentence structures like declarative, imperative, and question structures are also briefly covered.


Natural Language Processing

CS 1462

Phrase-Structure Grammars and


Dependency Grammars
Computer Science
3rd semester - 1444
Dr. Fahman Saeed
[email protected]

1
Some slides borrowed from Carl Sable
If you are following along in the book…

 The majority of this topic is related to Chapter 12, titled
"Constituency Grammars"
 The textbook covers dependency grammars in Chapter
14, titled "Dependency Parsing"
 We will discuss dependency grammars briefly toward the
end of the topic
 This topic will focus on English and Arabic
 Our next topic will consider grammars of human
languages more generally
 That topic will be based on excerpts from Pinker’s and
Baker's books (not the textbook)

2
Syntax

 According to our textbook, syntax "refers to the way
that words are arranged together" (e.g., to form
sentences)
 Some previous notions we have seen that can be
relevant to syntax include:
 Regular expressions can be used to represent orderings of
strings of words (but we did not use them that way; we used
them to discuss morphology)
 N-grams represent probabilities of short word sequences
 Part-of-speech (POS) categories represent classes of words
that are important for syntax

3
Phrase Structure Grammars

 The fundamental idea of phrase structure grammars is
that a group of words may behave as a single unit called a
phrase or constituent
 Examples of categories of phrases include noun phrases, verb
phrases, prepositional phrases, sentences, etc.
 Phrase structure grammars are also known as constituency
grammars; this is the term used by our textbook
 We will use context-free grammars (CFGs) as a
formalism for modeling constituency structure
 Recall that it was mentioned during a previous topic that
CFGs are part of the Chomsky hierarchy (as were regular
grammars, which can be described by regular expressions)
 Note that the textbook and some other sources use the term
phrase structure grammar to refer to the formalism

4
Phrases
 Examples of noun phrases: "Harry the Horse", "the Broadway coppers", "they",
"a high-class spot such as Mindy's", "the reason he comes into the Hot Box",
"three parties from Brooklyn"
 One piece of evidence that these words group together (i.e., form constituents
or phrases) is that they appear in similar syntactic environments (for example,
before a verb)
 Another piece of evidence is that each phrase can be moved around together in
some sentences
 However, you can't generally break a phrase into parts and put the parts in
different places
 The book discusses an example involving the prepositional phrase "on
September seventeenth" (see the next slide)
 Many examples in this topic come from the Air Traffic Information System
(ATIS) domain; this involves examples of spoken language by users booking
airline reservations
 Book: "ATIS systems were an early example of spoken language systems for
helping book airline reservations"; users would interact with the system to book
flights
5
Phrase Example

Consider the prepositional phrase "on September 17th"


 Note that it can be placed at several locations in a
sentence:

 However, it is not generally the case that the words
in the phrase can be split and placed in different
locations:

6
Context-free Grammars

 Context-free grammars (CFGs) are the most common grammatical formalism
used by linguists for modeling constituent structure in English and other natural
languages
 A CFG consists of a set of rules, a.k.a. productions, each of which expresses a way
that symbols of the language can be grouped and ordered
 Examples of possible rules include:
NP → Det Nominal
NP → ProperNoun
Nominal → Noun | Nominal Noun
 The book does not state how to read these rules, but I have learned that the arrow is
often read, "can have the form of" or "can consist of"
 The vertical bar is the "or" symbol; it's not technically part of the CFG specification,
but it can be used as a shorthand to represent multiple rules
A CFG also includes lexical rules, defining a lexicon
 Here are some additional possible rules expressing facts about the lexicon:
Det → a
Det → the
Noun → flight

7
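As a concrete aside, the rules on this slide can be sketched in code. The following is a minimal sketch of my own (not a representation from the textbook): a Python dict maps each non-terminal to its alternative right-hand sides, so the vertical-bar shorthand becomes multiple entries.

```python
# Minimal sketch (my own encoding, not the textbook's): CFG rules as a
# dict from a non-terminal to its alternative right-hand sides.
# "Nominal -> Noun | Nominal Noun" becomes two entries in the list.
grammar = {
    "NP":      [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"], ["Nominal", "Noun"]],
    "Det":     [["a"], ["the"]],
    "Noun":    [["flight"]],
}

def expansions(nonterminal):
    """Return every right-hand side this non-terminal can expand to."""
    return grammar.get(nonterminal, [])

print(expansions("NP"))
```

The dict lookup makes the "can have the form of" reading literal: asking for a non-terminal returns the forms it can take.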
CFG Terminology

 The symbols that correspond to words in a language are called terminal
symbols (or terminals)
 Symbols that express abstractions of these (basically, types of phrases) are
called non-terminal symbols (or non-terminals)
 In each rule:
 To the left of the arrow, there is a single non-terminal
 To the right of the arrow, there is an ordered list, or sequence, of one or more terminals and
non-terminals
 Lexical rules include a POS to the left of the arrow and a word (terminal) to
the right of the arrow
 Every grammar must contain a designated start symbol, often called S,
which in linguistics and NLP, is usually interpreted as a sentence node
 This rule shows that a sentence can consist of a noun phrase followed by a
verb phrase:
S → NP VP
 The next two slides show some additional grammar rules and lexical rules;
the textbook refers to this simple grammar as L0

8
Sample Grammar Rules

9
Sample Lexical Rules

10
Parse Trees

 We can view a CFG as a device for generating
sentences or for assigning structure to a sentence
 A sequence of expansions is called a derivation of a
string of words, and it is common to represent a
derivation as a parse tree
 An example of a parse tree is shown on the next slide
 Book: "The problem of mapping from a string of
words to its parse tree is called syntactic parsing"
 We will cover parsing in a future topic

11
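To make the idea of a derivation concrete, here is a hedged sketch (the toy grammar and function names are mine, in the spirit of L0, not taken from the book) that uses a CFG as a generator, repeatedly expanding non-terminals until only words remain:

```python
import random

# Sketch: using a tiny CFG (my own toy grammar) as a sentence generator.
# Each non-terminal is expanded by picking one of its productions; the
# sequence of expansions performed is one derivation of the sentence.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb", "NP"]],
    "Det":  [["the"], ["a"]],
    "Noun": [["flight"], ["morning"]],
    "Verb": [["prefer"], ["book"]],
}

def generate(symbol, rng):
    if symbol not in GRAMMAR:            # a terminal: emit the word itself
        return [symbol]
    rhs = rng.choice(GRAMMAR[symbol])    # choose one production to apply
    words = []
    for sym in rhs:
        words.extend(generate(sym, rng))
    return words

print(" ".join(generate("S", random.Random(0))))
```

With this toy grammar every derivation yields a five-word string of the shape Det Noun Verb Det Noun, and recording which production was applied at each step is exactly the parse tree.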
Parse Tree Example

12
Formal Languages

 A CFG defines a formal language


 When discussing regular expressions, we learned that a formal language is
a set of strings, typically adhering to some set of rules
 Sentences (strings of words) that can be derived by a grammar are in the
formal language defined by the grammar and are called grammatical
sentences
 Strings of words that cannot be derived by a grammar are not in the
language defined by the grammar and are referred to as ungrammatical
 This is a simplification of how natural languages really work
 In reality, determining whether a given sentence is part of a natural
language often depends on context
 In linguistics, the use of formal languages to model natural languages is
called generative grammar
 This is because the language is defined by the set of possible sentences that
can be generated by the grammar

13
Formal Description of CFGs

 A CFG is defined by four parameters:


 N is a set of non-terminal symbols (a.k.a. variables)
 ∑ is a set of terminal symbols (disjoint from N)
R is a set of rules, or productions, each of the form A → B, where A is
a non-terminal and B is a string of symbols from the infinite set of
strings (∑ U N)*
 S is the designated start symbol
 Our textbook uses the following conventions:
 Capital letters represent non-terminals
 S specifically represents the start symbol
 Lower-case Greek letters represent strings drawn from (∑ U N)*
 Lower-case English letters represent strings of terminals

14
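The four parameters can be written down directly. The sketch below (variable names are mine) checks the side conditions from the definition: N and ∑ are disjoint, the start symbol is a non-terminal, and every rule rewrites a single non-terminal into a string over (∑ U N)*.

```python
# Sketch of a CFG as the 4-tuple from the slide; the names are mine.
N     = {"S", "NP", "VP", "Det", "Noun", "Verb"}   # non-terminals
SIGMA = {"the", "a", "flight", "prefer"}           # terminals
R     = [("S", ("NP", "VP")),
         ("NP", ("Det", "Noun")),
         ("VP", ("Verb", "NP")),
         ("Det", ("the",)), ("Det", ("a",)),
         ("Noun", ("flight",)), ("Verb", ("prefer",))]
START = "S"

def well_formed():
    if N & SIGMA:        # terminals and non-terminals must be disjoint
        return False
    if START not in N:   # the start symbol must be a non-terminal
        return False
    for lhs, rhs in R:   # each rule: non-terminal -> string over (SIGMA u N)*
        if lhs not in N or any(s not in N | SIGMA for s in rhs):
            return False
    return True

print(well_formed())
```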
Sample Rules for English Sentences

 There are several possible structures of English sentences


 Some (by no means all) are indicated by the following rules:
S → NP VP (declarative structure)
S → VP (imperative structure)
S → Aux NP VP (yes-no question structure)
S → Wh-NP VP (wh-subject-question structure)
S → Wh-NP Aux NP VP (wh-non-subject-question structure)
S → S CC S (compound sentence with a coordinating conjunction)
 The different types of sentences will be discussed
over the next several slides

15
Declarative Structure

 Sentences with declarative structure have a subject
noun phrase followed by a verb phrase
 This is the only type of sentence recognized by a rule
in the simple grammar introduced earlier
 Sentences of this structure have many different uses
(e.g., stating a fact, a preference, an opinion, etc.)
 Examples include:
"I prefer a morning flight"
"The return flight should leave at around seven p.m."

16
Imperative Structure

 Sentences with imperative structure often begin
with a verb phrase and have no subject
 They are called imperative because they are often
used for commands or suggestions
 Examples include:
 "Show me the cheapest fare that has lunch"
 "Give me Sunday's flights arriving in Las Vegas from New York
City"
 I would add that this is the only type of English
sentence in which the subject is not explicitly stated!
17
Yes-No Question Structure

 Sentences with yes-no question structure are often
(though not always) used to ask a question
 Sometimes they are used for making requests or
suggestions
 Examples include:
"Do any of these flights have stops?" (question)
"Can you give me the same information for United?" (request)
 These sentences begin with an auxiliary verb,
followed by a subject NP, followed by a VP

18
Wh-Subject-Question Structure

 There are at least two types of wh-question structures


 One of their constituents is a Wh-NP including a wh-
word (e.g., "who", "whose", "when", "where", "what",
"which", "how", "why")
 One type, wh-subject-question structure, is nearly
identical to declarative structure, except that the first
noun phrase contains some wh-word
 Examples include:
 "What airlines fly from Burbank to Denver?"
 "Which of these flights have the longest layover in Nashville?"
 Additional rules would be necessary for the words that
make up the Wh-NP constituent
19
Wh-Non-Subject-Question Structure
 For another type, wh-non-subject-question structure, the wh-phrase is
not the subject, and so the sentence contains another subject
 Structures like these include long-distance dependencies
 for example, in the question "What flights do you have from Burbank to
Tacoma Washington?", "flights" is an argument of "have"
 In some models of parsing, this is considered to be a semantic
relationship, and it is determined during semantic interpretation
(which takes place after syntactic parsing)
 In other models, it is considered a syntactic relation, and the grammar
is modified to insert a marker called a trace, a type of empty category,
after the verb.
 One argument I have heard in favor of the existence of traces is that
speakers usually pause where the trace would be

20
Embedded Sentences

 The book also points out that S can also occur on the right-hand
side of grammar rules
 A sentence within a larger sentence can be called an embedded
sentence
 The embedded S then corresponds "to the notion of a clause, which
traditional grammars often describe as forming a complete thought"
[book]
 Although the book does not give any rules for this, one obvious rule
(I included it on a previous slide) combines two S phrases with a
coordinating conjunction
 My made-up example: "I like sitting in an aisle seat, but I do not like
stopovers."
 Another type of embedded sentence occurs in dependent clauses
after certain verbs (we will not look at a rule for this)
 My made-up example: "I think that the meal contains a flight."

21
English Noun Phrases

 The L0 grammar accounts for only three types of noun phrases:
pronouns, proper nouns, and a determiner followed by a nominal
(nominals are discussed more on the next few slides)
 We will now focus more on the determiners that can modify nominals
 Determiners include articles (i.e., "a", "an", "the") and words such as
"this", "those", "any", and "some"
The role of the determiner in an English noun phrase can also be filled by
more complex expressions; for example, "United's flight" or "Denver's
mayor's mother's canceled flight"
 In these examples, the role of the determiner is filled by a possessive
expression consisting of a noun phrase followed by an 's as a possessive
marker; the rule is: Det → NP 's

22
English Noun Phrases

 Note that this rule is recursive, since an NP can start with a Det
(allowing the previous example)
 In some cases, determiners are optional (but the book does not discuss
modifying the rule)
 For example, if the noun is plural: "Show me flights from San Francisco
to Denver on weekdays"
 Mass nouns, including substances and certain abstract nouns, also
don't require a determiner; for example, "Does this flight serve
dinner?"
 Some noun phrases also include a predeterminer before the
determiner; e.g., "all" in the phrase "all the morning flights" (we would
need a new rule for this)

23
English Nominals

 A nominal consists of a head noun along with various modifiers that can occur before or
after the head
 I add: Although our textbook uses the phrase "head noun" several times in the context of
noun phrases, they don't fully define the notion of a "head"
 The head of a phrase is the word that is syntactically the most significant to the phrase;
they often, but don't always, match my intuition
 We will discuss the concept of heads of phrases in more detail in our next topic
 The book shows several examples of pre- and post-head noun modifiers, which they
consider part of the nominal (as is typical); we will look at some of these over the next
two slides
 Some linguistic theories would call the full phrase, including the determiner, a
determiner phrase, with the determiner as the head, as opposed to a noun phrase
 Some other linguistic theories don't distinguish between noun phrases and nominals at
all
 In the simplest case, the nominal consists only of a single noun
 The rule we have seen in L0 recursively allows multiple nouns to combine to form a
nominal (e.g., to allow nominals such as "dinner flight")
24
Pre-head Noun Modifiers

 Pre-head noun modifiers can appear before the head noun in a nominal; these include
cardinal numbers (e.g., "one", "ten"), ordinal numbers, quantifiers (e.g., "many", "few"),
and adjectives
 Ordinal numbers include "first", "second", etc. but also words like "next", "past", and
"another"
 Adjectives can be grouped into an adjective phrase (AP) and placed before the head noun
 APs can include an adverb before an adjective; for example, "the least expensive fare"
 The textbook does not provide rules for this, but we could include something such as:
Nominal → (Card) (Ord) (Quant) (AP) Nominal
 The parentheses here are used to mark optional constituents of the NP; it is really a
shorthand for multiple rules (with and without the optional constituent)
 Of course, each of the optional constituents would require their own rules
 Cardinal numbers, ordinal numbers, and quantifiers may be considered parts of speech,
but the adjective phrase is a type of phrase consisting of one or more adjectives
 Also note that this rule implies that there is a particular order to prenominal modifiers;
other orders would not sound right

25
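Since the parentheses are only shorthand, the rule can be expanded mechanically. This sketch (the helper and names are mine) enumerates the plain CFG rules that Nominal → (Card) (Ord) (Quant) (AP) Nominal abbreviates:

```python
from itertools import product

# Sketch: expanding the parenthesis shorthand into plain CFG rules.
# Each optional constituent is either present or absent, so four
# optional slots abbreviate 2**4 = 16 ordinary rules.
OPTIONAL = ["Card", "Ord", "Quant", "AP"]

def expand():
    rules = []
    for choice in product([False, True], repeat=len(OPTIONAL)):
        rhs = [sym for sym, on in zip(OPTIONAL, choice) if on] + ["Nominal"]
        rules.append(("Nominal", tuple(rhs)))
    return rules

print(len(expand()))
```

Note that the expansion preserves the fixed left-to-right order of the prenominal modifiers; only presence or absence varies.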
Postmodifiers
 A head noun can also be followed by post-head noun modifiers, or postmodifiers,
which are part of the nominal
 Examples include prepositional phrases ("all flights from Cleveland"), non-finite
clauses ("any flights arriving after eleven a.m."), and relative clauses ("a flight that
serves breakfast")
 Prepositional phrases (PPs) can be strung together; for example, "all flights from
Cleveland to Newark" or "a reservation on flight six oh six from Tampa to Montreal"
A possible rule to handle such PPs is: Nominal → Nominal PP
 Three kinds of non-finite (meaning they do not indicate tense) postmodifiers are the
gerundive (-ing), -ed ("the aircraft used by this flight"), and infinitive forms ("the
last flight to arrive in Boston")
 Gerundive postmodifiers consist of a verb phrase that begins with the gerundive (-
ing) form of a verb, optionally followed by other types of phrases; e.g., "any flights
arriving after eleven a.m."
 Possible new rules for gerundive non-finite postmodifiers are:
Nominal → Nominal GerundVP
GerundVP → GerundV NP | GerundV PP | GerundV | GerundV NP PP
GerundV → being | arriving | leaving | …
26
Postnominal Relative Clauses

 A postnominal relative clause often begins with a relative pronoun
(e.g., "who" or "that") which serves as the subject of the embedded
verb
 Examples include "a flight that serves breakfast", "the one that
leaves at ten thirty five", and "the pilot who showed up late"
 Rules that can help deal with these cases might include:
Nominal → Nominal RelClause
RelClause → (who | that) VP
 Without showing rules, the book points out that the relative
pronoun may also function as the object of the embedded verb
 For example, consider the phrase "the earliest American Airlines
flight that I can get"
 Various postnominal modifiers can be combined
 For example, consider "a friend living in Denver that would like to
visit me in DC"

27
Verb Phrases

 Verb phrases can take many different forms; here
are some possible rules with examples:
VP → Verb (e.g., "disappear")
VP → Verb NP (e.g., "prefer a morning flight")
VP → Verb NP PP (e.g., "leave Boston in the morning")
VP → Verb PP (e.g., "leaving on Thursday")
VP → Verb S (e.g., "said there were two flights that were the cheapest")
 Recall that verbs can also be followed by particles,
resembling prepositions; together, the verb and
particle form a phrasal verb
28
Subcategorization Frames

 Traditional grammars distinguish between transitive verbs
and intransitive verbs
 Transitive verbs (e.g., "find") take a direct object NP (e.g., "I
found a flight") while intransitive verbs (e.g., "disappear") do
not
 Modern grammars distinguish as many as 100 subcategories
 We say that a verb like "find" subcategorizes for an NP, while
a verb like "want" subcategorizes for either an NP or a non-
finite VP
 We can call these constituents complements of the verb
 The possible complements for a verb are called
subcategorization frames
 The next slide shows subcategorization frames for some
example verbs
29
Subcategorization Frame Examples

30
Rules for Subcategorization

 To handle this with a CFG, using only techniques
discussed so far requires rules such as these:
Verb-with-no-complement → disappear | …
Verb-with-NP-complement → find | leave | repeat | want | …
Verb-with-S-complement → think | believe | say | …
Verb-with-Inf-VP-complement → want | try | need | …
VP → Verb-with-no-complement
VP → Verb-with-NP-complement NP
VP → Verb-with-S-complement S
 Note that some verbs can fall into multiple categories if
they take different types of complements in different
contexts
 We will see a more reasonable way to handle
subcategorization later in the topic
31
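The verb-class rules above amount to a lookup from each verb to the complement sequences it licenses. A minimal sketch (the frame encoding and names are mine, not the textbook's):

```python
# Sketch: subcategorization frames as a table from verb to the
# complement sequences it licenses. The verbs come from the slide;
# the tuple encoding of frames is my own.
FRAMES = {
    "disappear": [()],                    # no complement
    "find":      [("NP",)],               # direct object
    "want":      [("NP",), ("InfVP",)],   # NP or non-finite VP
    "think":     [("S",)],                # sentential complement
}

def licenses(verb, complements):
    """True if the verb subcategorizes for this complement sequence."""
    return tuple(complements) in FRAMES.get(verb, [])

print(licenses("find", ["NP"]), licenses("disappear", ["NP"]))
```

A verb that takes different complements in different contexts, like "want", simply gets several frames, which is the more reasonable treatment this slide anticipates.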
Auxiliary Verbs

 Discussion of handling auxiliary verbs has been dropped in the
current edition of the textbook
 Auxiliary verbs include:
 Modals ("can", "may", etc.)
 The perfect "have" (e.g., "have booked")
 The progressive "be" (e.g., "am going")
 The passive "be" (e.g., "was delayed")
 A verb phrase can include multiple auxiliary verbs, but they must
appear in a certain order: modal < perfect < progressive < passive
 For example, consider the phrase "might have been prevented"
 Each type of auxiliary verb expects a certain type of complement
 For example, the perfect verb "have" subcategorizes for a VP whose
head verb is a past participle, as in "have booked three flights"

32
Conjunctions

 Major phrase types can be combined with conjunctions to form larger
constructions of the same type
 Possible rules look like this:
NP → NP and NP (e.g., "I need to know the aircraft and the flight number")
Nominal → Nominal and Nominal (e.g., "I need to know the aircraft and flight number")
VP → VP and VP (e.g., "What flights do you have leaving Denver and arriving in San Francisco?")
S → S and S (e.g., "I'm interested in a flight from Dallas to Washington and I'm also interested in going to Baltimore")
 We can replace "and" with "CC" for more generality
 The way the rules are listed above provides examples of rules with both
terminals and nonterminals on the right-hand side
The ability to form coordinate phrases through conjunctions is often used as
a test for constituency
 In fact, this provides evidence for the Nominal constituent

33
Agreement

 We have seen when discussing morphology that nouns and verbs have different
forms; in some cases, the form used for one word depends on the context of the
surrounding words
 Example: Most present tense verbs have a separate form for third-person singular
subjects (e.g., "The flight leaves in the morning", "The flights leave in the morning",
"I leave…", etc.)
 One way to handle this requirement for subject-verb agreement is to expand the
grammar with multiple sets of rules; e.g., the rule S → NP VP might be replaced
with these two rules:
S → 3sgNP 3sgVP
S → Non3sgNP Non3sgVP
 However, handling agreement like this can possibly double the size of the grammar
for every type of agreement (thus increasing the size of the grammar exponentially)
 Other types of agreement include:
 Case agreement; for example, English pronouns have nominative (e.g., "I", "she", "he", "they") and
accusative (e.g., "me", "her", "him", "them") versions
 Some determiners require determiner-noun agreement, meaning that they must agree in number
with the nouns they modify (for example, "this flight" versus "these flights")
 In some languages, there is also a requirement for gender agreement

34
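Rather than doubling the rules, agreement can be checked as a side condition on the rule S → NP VP. The sketch below (the tiny lexicon and the feature values are mine) enforces subject-verb agreement this way:

```python
# Sketch: subject-verb agreement as a feature check instead of doubled
# rules. The tiny lexicon and the "pn" (person/number) values are mine.
LEXICON = {
    "I":      {"pn": "1S"},
    "we":     {"pn": "1P"},
    "she":    {"pn": "3S"},
    "leave":  {"pn": "non3S"},
    "leaves": {"pn": "3S"},
}

def agrees(subject, verb):
    """S -> NP VP applies only when person/number features match."""
    if LEXICON[verb]["pn"] == "3S":
        return LEXICON[subject]["pn"] == "3S"
    return LEXICON[subject]["pn"] != "3S"

print(agrees("she", "leaves"), agrees("I", "leaves"))
```

One feature check replaces a whole duplicated rule set, which is the idea behind the augmented grammars on the next slide.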
Augmented Grammars

 There are various ways to augment grammars to handle various agreement rules
that must be enforced, without exponentially increasing the size of the grammar
 For example, our textbook discusses lexicalized grammars in Section 12.6, but we
will skip this
 Instead, we are going to briefly discuss a notation used in "Artificial Intelligence: A
Modern Approach" by Stuart Russell and Peter Norvig (I use this textbook for my AI
course)
 Consider the following rules representing part of an augmented grammar:
S(head) → NP(Sbj, pn, h) VP(pn, head) | …
NP(c, pn, head) → Pronoun(c, pn, head) | Noun(c, pn, head) | …
VP(pn, head) → VP(pn, head) NP(Obj, pn, h) | …
PP(head) → Prep(head) NP(Obj, pn, h)
Pronoun(Sbj, 1S, I) → I
Pronoun(Sbj, 1P, we) → we
Pronoun(Obj, 1S, me) → me
Pronoun(Obj, 3P, them) → them
 Here, "c" and "pn" represent parameters, or features, for case (subject or object)
and person/number, respectively; "head" and "h" are parameters representing
heads of phrases

35
Interpreting Augmented Grammars

 Certain facts indicated in the snippet of an augmented grammar on the
previous slide are:
 "I" is a first-person singular subject, "them" is a third person plural object, etc.
 A sentence can have the form of a subject noun phrase followed by a verb phrase, but only if
the person and number of the noun phrase and the verb phrase match
 When a sentence is formed this way, the head of the sentence is the head of the verb phrase
 A noun phrase can have the form of a single pronoun or single noun, and the noun phrase
takes the case, person, and number of the pronoun or noun
 The head of each type of phrase comes from one of its words or sub-phrases
 Augmentations like these can also be used for determiner-noun agreement,
gender agreement, and verb subcategorization (discussed earlier)
 All of this can be done without greatly increasing the size of the grammar
 Theoretically, this type of augmentation does not add to the expressive
power of the formalism
 That is, for every augmented grammar of this type, there is an equivalent
CFG (although it would possibly be much larger)

36
Why CFGs?

 Recall again that CFGs are one type of grammar that is part of the Chomsky hierarchy
 We have previously learned that regular expressions can be useful to express morphology rules
 It is typically wise to use the least powerful formalism that can be used to solve a task, because it
generally makes the task easier than using a more powerful formalism
 It has been well established that natural languages cannot be defined by regular grammars
 It has long been debated whether CFGs are capable of defining natural languages
 It has been shown that a few constructs in some languages (but not English) are not context free
 Nonetheless, CFGs have been the most common formalism used by linguists to represent
grammar
 More generally, the previous edition of the textbook discussed some evidence from psychology
experiments supporting the notion that constituents (phrases) may be mentally depicted when
humans process language
 For example, one study started by asking subjects to read sentences that contain a constituent
(e.g., a verb phrase) with a particular structure (e.g., V NP PP)
 If the subjects were then asked to describe a picture, they were more likely to use the same
structure
 One possible explanation for this is that the particular form of the constituent has been primed,
suggesting that the constituent is mentally represented

37
Treebanks

 A treebank is a corpus in which every sentence is syntactically annotated with a parse


The Penn Treebank project has produced treebanks from the Brown, Switchboard, ATIS, and
Wall Street Journal corpora of English (as well as treebanks in Arabic and Chinese)
 We have previously heard about the Penn Treebank when we discussed parts-of-speech (which
are also manually annotated in the treebank); one tagset we looked at came from this treebank
 The Penn Treebank is currently maintained and distributed by the Linguistic Data Consortium
(LDC)
 Unfortunately, it is not freely available, but Cooper Union does have access to it (we recently
obtained it!)
 Treebanks are often created by first having parsers automatically parse each sentence in a
corpus, and then having human linguists hand-correct the parses
 The book discusses formats used by treebanks (note that tagsets, formats, and grammars can
differ between treebanks)
 The next slide shows examples from the Brown and ATIS portions of the Penn Treebank (using
different formats)
 Some treebanks, including the Penn Treebank, include traces in their parses to mark certain
long-distance dependencies (mentioned earlier)

38
Sample Parses from the Penn Treebank

39
Treebanks and Grammars

 Every treebank implicitly constitutes a grammar of the language
that can be expressed as a CFG
 Grammars represented by treebanks often include very many and
very long rules
 For example, one rule represented by the Penn Treebank is: VP 
→ VBP PP PP PP PP PP ADVP PP
 The sentence that would indicate this rule is, "This mostly happens
because we go from football in the fall to lifting in the winter to
football again in the spring."
 The Penn Treebank III Wall Street Journal corpus, which contains
about 1 million words, consists of about 17,500 distinct rule types
 Linguists and NLP practitioners typically don't use grammars
learned from treebanks directly
 However, it can still be useful for linguistic research to search a
treebank for examples of a particular grammatical phenomenon

40
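Reading a grammar off a treebank can itself be sketched in a few lines. The bracketed input below imitates the Penn Treebank style; the parse, the helper names, and the tag choices are mine, for illustration only.

```python
# Sketch: reading off the CFG rules implicit in one treebank-style
# bracketed parse. The example tree and helper names are my own.
def parse_sexpr(s):
    """Parse '(S (NP ...) (VP ...))' into nested [label, child, ...] lists."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def walk(i):
        node = [tokens[i + 1]]          # tokens[i] is "(", next is the label
        i += 2
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = walk(i)
                node.append(child)
            else:
                node.append(tokens[i])  # a word (terminal)
                i += 1
        return node, i + 1
    tree, _ = walk(0)
    return tree

def rules(tree, out=None):
    """Collect lhs -> rhs pairs from every internal node of the tree."""
    if out is None:
        out = []
    if isinstance(tree, list):
        rhs = tuple(c[0] if isinstance(c, list) else c for c in tree[1:])
        out.append((tree[0], rhs))
        for child in tree[1:]:
            rules(child, out)
    return out

tree = parse_sexpr("(S (NP (Pro I)) (VP (V prefer) (NP (Det a) (N flight))))")
for lhs, rhs in rules(tree):
    print(lhs, "->", " ".join(rhs))
```

Running this over every sentence of a treebank (and deduplicating) yields exactly the kind of large implicit grammar described above.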
Dependency Grammars

 Dependency grammars offer an alternative to phrase structure
grammars (a.k.a. constituency grammars)
 We are only going to discuss this topic briefly, but it has become very
important in modern NLP
 Dependency grammars are not based on constituents
 Instead, individual words are related to each other through binary
grammatical relations, each between a head word and one of its
dependents
 The meaning of "head" here is similar to the meaning used with
constituency grammars, but the notion of a phrase, a.k.a. constituent, is not
as important
 I add: I don't think that proponents of dependency grammars deny that phrases exist
 However, they are not labeled as whole units, and there are no explicit rules for how they
can be formed
 As with constituency grammars, text is typically analyzed one sentence at a
time

41
Depicting Dependency Parses

 Two common conventions for graphically representing dependency parses are:


 Draw arrows from each head to its dependents (there may be 0, 1, or many arrows leaving each
word, including a special "root" symbol)
 Draw arrows from each dependent to its head (every dependent is associated with a single head,
including one that points to "root")
 The textbook takes the first approach; here is an example of such a dependency tree:

 The arrow from "root" to "prefer" implies that "prefer" is the head of the entire
sentence
 Two slides from now, we will discuss the labels along the arrows
 Dependency parses can also be depicted as parse trees; the next slide shows an
example

42
Parse Trees

43
Dependency Relations

 The arrows in a graphical depiction of a dependency parse can include
labels indicating the types of dependencies
 Different researchers have come up with different sets of categories with
which dependencies can be labeled
 The example from the book uses labels from the Universal Dependencies
project
 The goal of this project is to come up with a set of labels that can be used
for all the world's languages
The next slide shows a selected subset of relations from the Universal
Dependencies set (including all of those used in the previous example)
 As with constituency grammars, dependency treebanks have been
annotated with dependency relations
 Sometimes an algorithm applies labels first, and then experts correct
mistakes
 Such treebanks can be used to study dependency grammars or to evaluate
dependency parsers

44
Examples of Dependency Relations

45
Constraints on Dependency Parses

 Unlike with constituency grammars, there are no specific rules for how words or phrases can
combine to form other phrases; however, there are constraints imposed on the dependency
parses
 We can consider the constraints to be imposed on the dependency tree, consisting of a set of
vertices V (representing words) and the arcs connecting them (representing dependency
relations)
 Here are three constraints, copied from the textbook:
1. There is a single designated root node that has no incoming arcs.
2. With the exception of the root node, each vertex has exactly one incoming arc.
3. There is a unique path from the root node to each vertex in V.
 Additionally, some dependency grammar formalisms require projectivity
 Book: "An arc from a head to a dependent is said to be projective if there is a path from the head
to every word that lies between the head and the dependent in the sentence"
 It turns out that an equivalent constraint is to impose that arrows cannot cross each other (i.e.,
a sentence is projective if it is possible to draw the dependencies with no crossing arrows)
 It is well-known that some sentences (including in English) violate this constraint (projectivity)
 However, it was a useful constraint to impose for some traditional dependency parsing
algorithms
 We will cover constituency parsing in a future topic, but we will not cover dependency parsing

46
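The three constraints and the projectivity test can be checked mechanically on a parse given as arcs. In this sketch (the encoding and the example arcs are mine) words are numbered from 1 and index 0 is the "root" pseudo-word; the projectivity check uses the equivalent no-crossing-arrows formulation from the slide.

```python
# Sketch: validating a dependency parse given as (head, dependent) arcs;
# index 0 is the "root" pseudo-word. Encoding and example are my own.
def is_valid_tree(n_words, arcs):
    heads = {}
    for head, dep in arcs:
        if dep in heads or dep == 0:     # one incoming arc each; root has none
            return False
        heads[dep] = head
    if len(heads) != n_words:            # every word needs a head
        return False
    for w in range(1, n_words + 1):      # unique path from root: no cycles
        seen, cur = set(), w
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur]
    return True

def is_projective(arcs):
    # Equivalent test: no two arcs may cross when drawn above the sentence.
    for h1, d1 in arcs:
        a, b = sorted((h1, d1))
        for h2, d2 in arcs:
            c, d = sorted((h2, d2))
            if a < c < b < d or c < a < d < b:
                return False
    return True

# "I prefer the morning flight": word 2 ("prefer") heads the sentence.
arcs = [(0, 2), (2, 1), (2, 5), (5, 3), (5, 4)]
print(is_valid_tree(5, arcs), is_projective(arcs))
```

A parse with crossing arcs, such as one containing arcs spanning (0, 2) and (1, 3), fails the projectivity check while possibly still being a valid tree, which matches the point that projectivity is an extra, optional constraint.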
Spoken English

 The previous edition of our textbook discussed the grammar of spoken
English, which differs in several aspects from the grammar of written
English
 In the context of linguistics and NLP, the main unit for spoken English is
sometimes referred to as an utterance as opposed to a sentence
 Differences between spoken utterances and written sentences include:
 Utterances contain a much higher frequency of pronouns
 Utterances often consist of short fragments or phrases
 There are phonological, prosodic, and acoustic characteristics
 There are various kinds of disfluencies (e.g., hesitations, fillers such as "uh" and "um",
restarts, word repetitions, and repairs)
 Pinker emphasizes how complex and non-grammatical spoken English can
be relative to written English in one chapter of "The Language Instinct"
 To back this up, he discusses an excerpt from the Watergate transcripts
(we'll talk about it during our lecture)

47
Wishing you a fruitful educational experience

48
