18men14c U4
SYNTAX
Syntax is the study of the structure of phrases and sentences. Sentences are composed not
directly of words but of constituents, called phrases, which may consist of more than one
word. A phrase is an
expression which is a constituent in a sentence and is the expansion of a head (i.e. key word).
For instance, the constituent the king in (1), or the constituents my brother and an expensive
car in (2) are Noun Phrases, abbreviated as NPs, because their key elements are the nouns (Ns)
king, brother and car, respectively. It can happen that a phrase is realised by a single word,
for example the NPs John, Mary and apples in (3) consist of the Ns John, Mary and apples, and
nothing else. In (4) he is a special NP because its head is a pronoun rather than a noun.
(1) The king laughed.
(2) My brother bought an expensive car.
(3) John gave Mary apples.
(4) He went home.
(1)-(4) are sentences.
The terms sentence and clause can be used synonymously. A sentence or clause is an
expression which minimally contains a subject and a predicate, and which may also contain
other types of elements, viz. complements and adjuncts. For instance, (1) consists of just a
subject and a predicate. The NP the king is the subject, and the Verb Phrase (VP), composed
of a single verb (V) laughed, is the predicate. A complement is a constituent whose presence
is structurally “dictated” (required or licensed) by a particular word. The presence of the
complement “follows” from the presence of the word which it is a complement of. For
instance, in (2) above the NP my brother is the subject, the V bought is the predicate, and the
NP an expensive car is a complement, more particularly a direct object, of the verb bought.
An object is a particular kind of complement. In (3) above the subject is the NP John, the
predicate is the V gave, and there are two complements, the NP Mary, functioning as an
indirect object, and the NP apples functioning as a direct object. In (4) the complement of the
V went is the Adverb Phrase (AdvP) home, consisting of the single adverb (Adv) home. The
subject and the complement(s) together are said to be the arguments of the predicate.
Arguments are the participants (entities) that are necessarily involved in the situation
identified by the predicate. For example, in (2) the predicate bought has two arguments: the
subject (somebody did the buying), and the object (something was bought). In English,
subjects typically occur in the nominative case (I, he, etc.), whereas objects occur in the
accusative case (me, him, etc.), but observable case-marking is restricted to pronouns.
Another difference between subjects and complements is that, in English, verbs agree with
their subjects in person and number but do not agree with their complements. Also, subjects
in English typically precede verbs, while complements follow them. In addition to the
subject, verb and complement(s), the sentence or clause may also contain constituents which
are not structurally required by the verb but add optional information about place, time,
manner, purpose, etc. Such constituents are called adjuncts. Some of these function as
adverbials, e.g. the Prepositional Phrase (PP) on Tuesday in (5) is a time adverbial, the
Adverb Phrase (AdvP) very quickly in (6) is a manner adverbial. Some of the adjuncts
function as attributes within noun phrases, e.g. the Adjective Phrase (AP), realised by a single
Adjective (A) expensive in (5), is an attribute of car.
(5) My brother bought an expensive car on Tuesday.
(6) He went home very quickly.
The terms subject, predicate, object (direct and
indirect), adverbial, attribute; complement and adjunct refer to grammatical functions which
constituents may perform in the sentence, whereas terms such as NP, VP, AP, AdvP, PP, N,
V, A, Adv, P, etc. refer to syntactic categories, i.e. they name the grammatical category to
which the constituent belongs. These two sets of terms are fairly independent of each other,
e.g. an NP can function as subject, or as object, or as the complement of a preposition, or
even as adverbial (e.g. the NP last year). Similarly, the function of adverbial can be
performed by an AdvP (very quickly), a PP (on Tuesday), an NP (last year) or even by an
embedded clause (e.g. when I was writing a letter).
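To make the category/function distinction concrete, the following sketch (in Python, and not part of the original material) encodes the constituent structure of sentence (2) as nested, labelled brackets. The category labels (S, NP, VP, V, AP, A, N) come from the discussion above; grammatical functions appear only in comments, and determiners are left as bare words for simplicity.

# A minimal sketch of constituent structure for sentence (2),
# "My brother bought an expensive car". A phrase is written as a
# (category, [parts]) pair; a bare word is just a string.
sentence_2 = (
    "S", [
        ("NP", ["my", "brother"]),               # functions as subject
        ("VP", [
            ("V", ["bought"]),                   # the predicate
            ("NP", [                             # functions as direct object
                "an",
                ("AP", [("A", ["expensive"])]),  # attribute of "car"
                ("N", ["car"]),
            ]),
        ]),
    ],
)

def bracketing(constituent):
    """Render a constituent as a labelled bracketing, e.g. [NP my brother]."""
    if isinstance(constituent, str):             # a bare word
        return constituent
    label, parts = constituent
    return "[" + label + " " + " ".join(bracketing(p) for p in parts) + "]"

print(bracketing(sentence_2))
# [S [NP my brother] [VP [V bought] [NP an [AP [A expensive]] [N car]]]]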
GRAMMAR
Before considering how grammatical structure can be represented, analyzed and used,
we should ask what basis we might have for considering a particular grammar “correct”, or a
particular sentence “grammatical,” in the first place. Of course, these are primarily questions
for linguistics proper, but the answers we give certainly have consequences for computational
linguistics. Traditionally, formal grammars have been designed to capture linguists' intuitions
about well-formedness as concisely as possible, in a way that also allows generalizations
about a particular language (e.g., subject-auxiliary inversion in English questions) and across
languages (e.g., a consistent ordering of nominal subject, verb, and nominal object for
declarative, pragmatically neutral main clauses). Concerning linguists' specific well-
formedness judgments, it is worth noting that these are largely in agreement not only with
each other, but also with judgments of non-linguists—at least for “clearly grammatical” and
“clearly ungrammatical” sentences (Pinker 2007). Also, the discovery that conventional
phrase structure supports elegant compositional theories of meaning lends credence to the
traditional theoretical methodology.
However, traditional formal grammars have generally not covered any one language
comprehensively, and have drawn sharp boundaries between well-formedness and ill-
formedness, when in fact people's (including linguists') grammaticality judgments for many
sentences are uncertain or equivocal. Moreover, when we seek to process sentences “in the
wild”, we would like to accommodate regional, genre-specific, and register-dependent
variations in language, dialects, and erroneous and sloppy language (e.g., misspellings,
unpunctuated run-on sentences, hesitations and repairs in speech, faulty constituent orderings
produced by non-native speakers, and fossilized errors by native speakers, such as “for you
and I”—possibly a product of schoolteachers inveighing against “you and me” in subject
position). Consequently linguists' idealized grammars need to be made variation-tolerant in
most practical applications.
The way this need has typically been met is by admitting a far greater number of
phrase structure rules than linguistic parsimony would sanction—say, 10,000 or more rules
instead of a few hundred. These rules are not directly supplied by linguists (computational or
otherwise), but rather can be “read off” corpora of written or spoken language that have been
decorated by trained annotators (such as linguistics graduate students) with their basic phrasal
tree structure. Unsupervised grammar acquisition (often starting with POS-tagged training
corpora) is another avenue (see section 9), but results are apt to be less satisfactory. In
conjunction with statistical training and parsing techniques, this loosening of grammar leads
to a rather different conception of what constitutes a grammatically flawed sentence: it is not
necessarily one rejected by the grammar, but one whose analysis requires some rarely used
rules.
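As a rough illustration of how rules can be "read off" annotated trees, the Python sketch below counts the phrase-structure rules occurring in two invented toy trees (not a real treebank) and reports each rule's relative frequency among expansions of the same nonterminal; statistical training operates on counts of exactly this kind, though over vastly larger corpora.

# Toy sketch: extracting phrase-structure rules and their relative
# frequencies from bracketed trees. The trees are invented examples,
# not drawn from any actual annotated corpus.
from collections import Counter

# Each tree is a (label, [children]) pair; leaves are plain words.
TREES = [
    ("S", [("NP", ["he"]), ("VP", [("V", ["laughed"])])]),
    ("S", [("NP", ["she"]), ("VP", [("V", ["bought"]), ("NP", ["apples"])])]),
]

def count_rules(tree, counts):
    """Record the rule used at this node, then recurse into phrasal children."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if isinstance(c, tuple):
            count_rules(c, counts)

counts = Counter()
for t in TREES:
    count_rules(t, counts)

totals = Counter()
for (lhs, _), n in counts.items():
    totals[lhs] += n
for (lhs, rhs), n in sorted(counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}    {n / totals[lhs]:.2f}")
# e.g. "S -> NP VP    1.00", "VP -> V    0.50", "VP -> V NP    0.50", ...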
Similarly, Woods' grammar for LUNAR was based on a certain kind of procedurally
interpreted transition graph (an augmented transition network, or ATN), where the sentence
subgraph might contain an edge labeled NP (analyze an NP using the NP subgraph) followed
by an edge labeled VP (analogously interpreted). In both cases, local feature values (e.g.,
the number and person of an NP and VP) are registered and checked for agreement as a
condition for success. A closely related formalism is that of definite clause grammars (e.g.,
Pereira & Warren 1982), which employ Prolog to assert “facts” such as that if the input word
sequence contains an NP reaching from index I1 to index I2 and a VP reaching from index I2
to index I3, then the input contains a sentence reaching from index I1 to index I3. (Again,
feature agreement constraints can be incorporated into such assertions as well.) Given the
goal of proving the presence of a sentence, the goal-chaining mechanism of Prolog then
provides a procedural interpretation of these assertions.
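The Prolog encoding itself is not reproduced here. The Python sketch below (with an invented two-rule grammar and toy lexicon) illustrates the same idea of rules stated over string positions; for simplicity it applies the rules by bottom-up forward chaining rather than by Prolog's top-down goal chaining.

# Sketch of the DCG idea: rules are statements about string positions,
# e.g. "if there is an NP from i1 to i2 and a VP from i2 to i3, then
# there is an S from i1 to i3". Applied here by forward chaining;
# Prolog would instead prove the goal S(0, n) top-down.

LEXICON = {"he": "NP", "went": "V", "home": "AdvP"}   # toy lexicon (invented)

def derivable_spans(words):
    """Return the set of (label, start, end) spans derivable over `words`."""
    chart = {(LEXICON[w], i, i + 1) for i, w in enumerate(words)}
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, i1, i2) in chart:
            for (b, j, i3) in chart:
                if i2 != j:
                    continue
                if (a, b) == ("V", "AdvP"):   # VP(i1,i3) :- V(i1,i2), AdvP(i2,i3).
                    new.add(("VP", i1, i3))
                if (a, b) == ("NP", "VP"):    # S(i1,i3)  :- NP(i1,i2), VP(i2,i3).
                    new.add(("S", i1, i3))
        if not new <= chart:
            chart |= new
            changed = True
    return chart

words = "he went home".split()
print(("S", 0, len(words)) in derivable_spans(words))   # True: it is a sentence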
PARSING
Natural language analysis in the early days of AI tended to rely on template matching,
for example, matching templates such as (X has Y) or (how many Y are there on X) to the
input to be analyzed. This of course depended on having a very restricted discourse and task
domain. By the late 1960s and early 70s, quite sophisticated recursive parsing techniques
were being employed. For example, Woods' LUNAR system used a top-down recursive parsing
strategy interpreting an ATN in the manner roughly indicated in section 2.2 (though ATNs in
principle allow other parsing styles). It also saved recognized constituents in a table, much
like the class of parsers we are about to describe. Later parsers were influenced by the
efficient and conceptually elegant CFG parsers described by Jay Earley (1970) and
(separately) by John Cocke, Tadao Kasami, and Daniel Younger (e.g., Younger 1967). The
latter algorithm, termed the CYK or CKY algorithm after its three authors, was
particularly simple, using a bottom-up dynamic programming approach to first identify and
tabulate the possible types (nonterminal labels) of sentence segments of length 1 (i.e., words),
then the possible types of sentence segments of length 2, and so on, always building on the
previously discovered segment types to recognize longer phrases. This process runs in cubic
time in the length of the sentence, and a parse tree can be constructed from the tabulated
constituents in quadratic time. The CYK algorithm assumes a Chomsky Normal Form (CNF)
grammar, allowing only productions of form Np → Nq Nr, or Np → w, i.e., generation of
two nonterminals or a word from any given nonterminal. This is only a superficial limitation,
because arbitrary CF grammars are easily converted to CNF.
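For concreteness, here is a minimal CYK recognizer in Python over an invented toy CNF grammar. It fills the table with the possible types of length-1 segments (words), then length-2 segments, and so on, exactly as described above; only recognition is shown, not recovery of the parse tree.

# Minimal CYK recognizer. BINARY maps (B, C) to A for rules A -> B C;
# LEXICAL maps a word w to A for rules A -> w. Grammar and sentence
# are toy examples invented for illustration.
from itertools import product

BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICAL = {"the": "Det", "king": "N", "car": "N", "bought": "V"}

def cyk_recognize(words):
    n = len(words)
    # table[i][j] holds the set of nonterminals deriving words[i..j]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                  # segments of length 1
        table[i][i].add(LEXICAL[w])
    for length in range(2, n + 1):                 # longer segments, bottom up
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                  # split point
                for b, c in product(table[i][k], table[k + 1][j]):
                    if (b, c) in BINARY:
                        table[i][j].add(BINARY[(b, c)])
    return "S" in table[0][n - 1]

print(cyk_recognize("the king bought the car".split()))   # True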
Because of their greater expressiveness, TAGs and CCGs are harder to parse in the
worst case (O(n⁶)) than CFGs and projective DGs (O(n³)), at least with current algorithms
(see Vijay-Shanker & Weir 1994 for parsing algorithms for TAG, CCG, and LIG based on
bottom-up dynamic programming). However, it does not follow that TAG parsing or CCG
parsing is impractical for real grammars and real language, and in fact parsers exist for both
that are competitive with more common CFG-based parsers.
Finally we mention connectionist models of parsing, which perform syntactic analysis
using layered (artificial) neural nets (ANNs, NNs) (see Palmer-Brown et al. 2002; Mayberry
and Miikkulainen 2008; and Bengio 2008 for surveys). There is typically a layer of input
units (nodes), one or more layers of hidden units, and an output layer, where each layer has
(excitatory and inhibitory) connections forward to the next layer, typically conveying
evidence for higher-level constituents to that layer. There may also be connections within a
hidden layer, implementing cooperation or competition among alternatives. A linguistic
entity such as a phoneme, word, or phrase of a particular type may be represented within a
layer either by a pattern of activation of units in that layer (a distributed representation) or by
a single activated unit (a localist representation).
One of the problems that connectionist models need to confront is that inputs are
temporally sequenced, so that in order to combine constituent parts, the network must retain
information about recently processed parts. Two possible approaches are the use of simple
recurrent networks (SRNs) and, in localist networks, sustained activation. SRNs use one-to-
one feedback connections from the hidden layer to special context units aligned with the
previous layer (normally the input layer or perhaps a secondary hidden layer), in effect
storing their current outputs in those context units. Thus at the next cycle, the hidden units
can use their own previous outputs, along with the new inputs from the input layer, to
determine their next outputs. In localist models it is common to assume that once a unit
(standing for a particular concept) becomes active, it stays active for some length of time, so
that multiple concepts corresponding to multiple parts of the same sentence, and their
properties, can be simultaneously active.
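As a minimal illustration of the SRN architecture just described, the NumPy sketch below (with invented layer sizes, random untrained weights, and a stand-in input sequence) shows the forward computation in which the hidden units combine the new input with their own previous outputs held in the context units.

# One-step-at-a-time forward pass of a simple recurrent (Elman-style) net.
# All sizes and weights are invented; no training is performed.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 10, 20, 5                          # assumed layer sizes

W_ih = rng.normal(scale=0.1, size=(n_hidden, n_in))        # input   -> hidden
W_ch = rng.normal(scale=0.1, size=(n_hidden, n_hidden))    # context -> hidden
W_ho = rng.normal(scale=0.1, size=(n_out, n_hidden))       # hidden  -> output

def srn_step(x, context):
    """Hidden units see the new input x plus their own previous outputs
    (stored in `context`); the result feeds the output layer and is
    copied back to serve as the next step's context."""
    hidden = np.tanh(W_ih @ x + W_ch @ context)
    output = W_ho @ hidden
    return output, hidden

context = np.zeros(n_hidden)
for x in rng.normal(size=(3, n_in)):       # three stand-in "word" vectors
    output, context = srn_step(x, context)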
A problem that arises is how the properties of an entity that are active at a given point
in time can be properly tied to that entity, and not to other activated entities. (This is
the variable binding problem, which has spawned a variety of approaches—see Browne and
Sun 1999). One solution is to assume that unit activation consists of pulses emitted at a
globally fixed frequency, and pulse trains that are in phase with one another correspond to the
same entity (e.g., see Henderson 1994). Much current connectionist research borrows from
symbolic processing perspectives, by assuming that parsing assigns linguistic phrase
structures to sentences, and treating the choice of a structure as simultaneous satisfaction of
symbolic linguistic constraints (or biases). Also, more radical forms of hybridization and
modularization are being explored, such as interfacing a NN parser to a symbolic stack, or
using a neural net to learn the probabilities needed in a statistical parser, or interconnecting
the parser network with separate prediction networks and learning networks. For an overview
of connectionist sentence processing and some hybrid methods, see Crocker (2010).
IC ANALYSIS
TRANSFORMATIONAL-GENERATIVE GRAMMAR
The most significant development in linguistic theory and research in the 20th century was
the rise of generative grammar, and, more especially, of transformational-generative grammar,
or transformational grammar, as it came to be known. Two versions of transformational
grammar were put forward in the mid-1950s, the first by Zellig S. Harris and the second
by Noam Chomsky, his pupil. It was Chomsky’s system that attracted the most attention. As
first presented by Chomsky in Syntactic Structures (1957), transformational grammar can be
seen partly as a reaction against post-Bloomfieldian structuralism and partly as a continuation
of it. What Chomsky reacted against most strongly was the post-Bloomfieldian concern with
discovery procedures. In his opinion, linguistics should set itself the more modest and more
realistic goal of formulating criteria for evaluating alternative descriptions of
a language without regard to the question of how these descriptions had been arrived at. The
statements made by linguists in describing a language should, however, be cast within the
framework of a far more precise theory of grammar than had hitherto been the case, and this
theory should be formalized in terms of modern mathematical notions. Within a few years,
Chomsky had broken with the post-Bloomfieldians on a number of other points also. He had
adopted what he called a “mentalistic” theory of language, by which term he implied that the
linguist should be concerned with the speaker’s creative linguistic competence and not his
performance, the actual utterances produced. He had challenged the post-Bloomfieldian
concept of the phoneme (see below), which many scholars regarded as the most solid and
enduring result of the previous generation’s work. And he had challenged the structuralists’
insistence upon the uniqueness of every language, claiming instead that all languages were, to
a considerable degree, cut to the same pattern—they shared a certain number of formal
and substantive universals.