NLP UNIT 1 Part 2
(PART - 2)
● Introduction:
A language model in NLP is a probabilistic statistical model that determines the
probability of a given sequence of words occurring in a sentence based on the previous
words. It helps to predict which word is most likely to appear next in the sentence.
There are two approaches for language modeling:
a) Grammar-based language model
b) Statistical language modelling
● Grammar Based Language model:
Various Computational grammars have been proposed and studied.
1) Generative Grammars:
It is a theory of grammar holding that human language is shaped by a set of
basic principles that are part of the human brain. Language is a relation between
sound and its meaning.
2) Hierarchical Grammar:
Chomsky (1956) described classes of grammars in a hierarchical manner, where each
layer contains the grammars represented by its subclasses. Hence, type 0 (or
unrestricted) grammar contains type 1 (or context-sensitive) grammar, which in
turn contains type 2 (context-free) grammar, which again contains type 3
grammar (regular grammar). Although this relationship has been given for classes
of formal grammars, it can be extended to describe grammars at various levels,
such as in a class-subclass (embedded) relationship.
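The containment between grammar classes can be illustrated with two toy languages: a*b* is regular (type 3), while aⁿbⁿ requires a context-free grammar (type 2), and every regular language is also context-free. A minimal Python sketch (the example languages and function names are illustrative, not from the text):

```python
import re

# Type 3 (regular): recognizable by a finite automaton / regular expression.
# Toy language: a*b* -- any run of a's followed by any run of b's.
def in_regular_language(s):
    return re.fullmatch(r"a*b*", s) is not None

# Type 2 (context-free): generated by a grammar like S -> a S b | epsilon;
# no regular expression can enforce the equal counts in a^n b^n.
def in_context_free_language(s):
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n

# Every string of a^n b^n also lies in a* b*, while the converse fails
# (e.g. "aab"), mirroring the proper containment of the classes.
for n in range(4):
    w = "a" * n + "b" * n
    assert in_context_free_language(w) and in_regular_language(w)
```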
3) Government and binding:
It is an approach to the study of the syntax of human languages based on an
abstract underlying representation and transformations successively altering that
structure.
Transformational grammars assume two levels of existence for sentences: one at
the surface level and the other at the deep level. Government and binding (GB)
theory renames these the s-level and the d-level, and identifies two more levels
of representation (parallel to each other) called phonetic form and logical form.
According to GB theory, language can be considered for analysis at the levels
shown in the figure below.
If we describe language as the representation of some ‘meaning’ in a ‘sound’
form, then, according to the figure, these two ends are the logical form (LF) and
the phonetic form (PF) respectively. GB is concerned with LF, rather than PF.
Transformational grammars have hundreds of rewriting rules, which are generally
language-specific and also construct-specific. Generating a complete set of
coherent rules may not be possible.
Let us take an example of d- and s-structures.
Example: Mukesh was killed.
(i) In transformational grammar, this can be represented as
S → NP AUX VP, as given below.
● Components of GB:
GB comprises a set of theories that map structures from d-structure to s-structure and
then to logical form (LF), leaving aside the phonetic form.
A general transformational rule called ‘Move α’ is applied at the d-structure level as
well as at the s-structure level. It can move constituents anywhere, provided this does
not violate the constraints imposed by several theories and principles:
‘Move α’ means ‘move anything anywhere’.
Hence, GB consists of a series of modules that contain constraints and principles applied
at the various levels of representation, together with the transformation rule Move α.
These modules include X-bar theory, the projection principle, θ-theory and the
θ-criterion, c-command and government, case theory, the empty category
principle (ECP), and binding theory. GB considers all three levels of
representation (d-, s-, and LF) as syntactic, and applies the same Move α
transformation to map the d-level to the s-level and the s-level to the LF level.
Example: Two countries are visited by most travelers. Its two possible logical forms are:
LF1: [S two countries are visited by [NP most travelers]]
LF2 (applying Move α): [NP most travelers]i [S two countries are visited by ei]
In LF1, the interpretation is that most travelers visit the same two countries. In LF2,
where [NP most travelers] has been moved outside the scope of the sentence, the
interpretation can be that most travelers visit two countries, which may differ from
traveler to traveler.
One of the important concepts in GB is that of constraints. Constraints are the part of
the grammar that prohibits certain combinations and movements; without them, Move α
could move anything to any possible position.
X-bar Theory:
X-bar theory (written X̄ theory) is one of the central concepts of GB. It clarifies and
simplifies the core principles and makes them language-independent. Thus, noun
phrase (NP), verb phrase (VP), adjective phrase (AP), and prepositional phrase (PP) are
the maximal projections of noun (N), verb (V), adjective (A), and preposition (P)
respectively, and can be represented in terms of the head X of their corresponding
phrases (where X ∈ {N, V, A, P}). Even the sentence structure (S', which is the
projection of the sentence) can be regarded as the maximal projection of
inflection (INFL). In GB there are projections at two levels: first at the semi-phrasal
level, denoted by X̄ (X-bar), and second at the phrasal level, denoted by X-double-bar.
In GB, for sentences, the first-level projection is denoted by S and the second-level
maximal projection by S'.
Next, we consider the representation of the NP ‘the food in a dhaba’, followed by the
representations of VP, AP, and PP (figures (c) and (e) below); finally, figure (f) shows
the representation of a sentence.
The projection of a sentence is denoted by S' (S-bar), which has the
complementizer (COMP) as its specifier.
● Sub Categorization:
It is used as a filter to permit various heads to select a certain subset of the range of
maximal projections.
● Projection Principle:
It places a constraint on the three syntactic representations and their mappings from one
to the other. The principle states that representations at all syntactic levels (i.e., the
d-level, s-level, and LF-level) are projections from the lexicon. Thus, the lexical
properties of categorical structure (sub-categorization) must be observed at each level.
● Theta theory (θ - theory):
Sub-categorization only restricts the syntactic categories that a head can accept. GB
puts another restriction on lexical heads, through which a head assigns certain roles
to its arguments. These roles are pre-assigned and, as per the projection principle,
cannot be violated at any syntactic level. These role assignments are called
theta-roles (θ-roles) and are related to ‘semantic selection’.
● Theta criterion and theta role:
There are certain thematic roles from which a head can select; these are called θ-roles,
and they are mentioned in the lexicon.
The θ-criterion states that ‘each argument bears one and only one θ-role, and each
θ-role is assigned to one and only one argument’. Thus, each argument has a unique
θ-role and cannot be moved to a position where it would acquire another θ-role.
In GB, the d-structure is conceived as a kind of pure representation of arguments;
hence, θ-roles are assigned at the d-level only, whereas the θ-criterion applies at all
three levels.
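The θ-criterion amounts to a one-to-one correspondence between a head's arguments and its θ-roles. A small Python sketch of that check (the pair format and role names are illustrative assumptions, not from the text):

```python
# The theta-criterion as a one-to-one check between arguments and theta-roles.
def satisfies_theta_criterion(assignments):
    """assignments: list of (argument, theta_role) pairs assigned by a head."""
    args = [a for a, _ in assignments]
    roles = [r for _, r in assignments]
    # each argument bears exactly one theta-role, and each theta-role is
    # assigned to exactly one argument (no duplicates on either side)
    return len(set(args)) == len(args) and len(set(roles)) == len(roles)

# 'give' assigns Agent, Theme, and Goal to three distinct arguments:
ok = satisfies_theta_criterion(
    [("Tara", "Agent"), ("a pen", "Theme"), ("Mounika", "Goal")])
# an argument carrying two theta-roles violates the criterion:
bad = satisfies_theta_criterion([("Tara", "Agent"), ("Tara", "Theme")])
```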
● C - command and Government:
Government is a special case of c-command. C-command defines the scope of a maximal
projection and is the basic mechanism through which many constraints on Move α are
defined.
If any word or phrase (say α or β) falls within the scope of, and is determined by, a
maximal projection, we say that it is dominated by that maximal projection.
Now, if two structures α and β are related in such a way that “every maximal
projection dominating α also dominates β”, we say that α c-commands β; this is the
necessary and sufficient condition (iff) for c-command.
The definition of c-command does not involve all maximal projections dominating β,
only those dominating α. If we add this extra constraint, we get a kind of mutual
c-command called government.
Government:
α governs β iff:
(i) α c-commands β, and
(ii) α is an X (a head, e.g., noun, verb, preposition, adjective, or inflection).
Maximal projections are barriers to government, i.e., no maximal projection can
intervene between governor and governee.
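The definitions above can be sketched over a toy phrase-structure tree. This is a rough illustration under stated assumptions: the tree encoding, node flags, and the example VP are invented for demonstration, and the c-command definition used is the one given in this text (every maximal projection dominating α also dominates β).

```python
# Sketch of c-command and government over a hand-built tree.
class Node:
    def __init__(self, label, children=(), is_max_projection=False, is_head=False):
        self.label = label
        self.children = list(children)
        self.is_max_projection = is_max_projection
        self.is_head = is_head
        self.parent = None
        for c in self.children:
            c.parent = self

def dominators(node):
    """Yield every ancestor of node, bottom-up."""
    while node.parent is not None:
        node = node.parent
        yield node

def dominates(a, b):
    return a in dominators(b)

def c_commands(alpha, beta):
    # neither dominates the other, and every maximal projection
    # dominating alpha also dominates beta
    if dominates(alpha, beta) or dominates(beta, alpha):
        return False
    maximals = [d for d in dominators(alpha) if d.is_max_projection]
    return all(dominates(m, beta) for m in maximals)

def governs(alpha, beta):
    if not (alpha.is_head and c_commands(alpha, beta)):
        return False
    # maximal projections are barriers: none may intervene between them
    return not any(d.is_max_projection and not dominates(d, alpha)
                   for d in dominators(beta))

# [VP [V saw] [NP stars]] -- the head V governs its NP complement
stars = Node("NP-stars", is_max_projection=True)
saw = Node("V-saw", is_head=True)
vp = Node("VP", [saw, stars], is_max_projection=True)
```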
Movement, Empty Category, and Co-indexing:
Move α is described as ‘move anything anywhere’, though GB provides restrictions on
valid movements.
In GB, the active-to-passive transformation is the result of NP movement. Another
movement is wh-movement, where a wh-phrase is moved, as follows:
What did Mukesh eat?
[Mukesh INFL eat what]
Since the projection principle holds at all three levels, applying it to certain cases of
movement leads to the existence of an abstract entity called the empty category. There
are four types of empty categories: two are empty NP positions, called wh-trace and
NP-trace, and the remaining two are pronouns, called small ‘pro’ and big ‘PRO’.
This division is based on two properties, anaphoric (+a or -a) and pronominal (+p or -p):
wh-trace: -a, -p
NP-trace: +a, -p
small ‘pro’: -a, +p
big ‘PRO’: +a, +p
Co-indexing: This is the indexing of the subject NP and AGR (agreement) at d-structure,
which is preserved by move operations at s-structure.
Core grammatical positions (those where the subject, object, indirect object, etc. are
positioned) are called A-positions, and the rest are called A-bar positions.
Binding Theory:
Binding is defined (following Sells) as:
α binds β iff
α c-commands β, and
α and β are co-indexed.
For the sentence ‘Mukesh was killed’:
d-structure: [ei INFL kill Mukeshi]
s-structure: [Mukeshi was killed ei]
The empty category (ei) and Mukesh (NPi) are bound. Binding theory gives the
relationships between NPs (including pronouns and reflexive pronouns).
Now, binding theory can be stated as follows:
(a) An anaphor (+a) is bound in its governing category.
(b) A pronominal (+p) is free in its governing category.
(c) An R-expression (-a, -p) is free.
Ex: A: Mukeshi knows himselfi
B: Mukeshi believes that Amrita knows himi
C: Mukesh believes that Amritaj knows Nupurk
Similar rules apply to empty categories also:
NP-trace (+a, -p): Mukeshi was killed ei
wh-trace (-a, -p): whoi does he like ei
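The binding definition combines two ingredients: c-command and co-indexing. A minimal sketch in Python, where the c-command facts are stipulated by hand for the example sentences above rather than computed from a parse tree (the 'NP_index' label format is an illustrative assumption):

```python
# Binding: alpha binds beta iff alpha c-commands beta and they are co-indexed.
# C-command facts are stipulated for illustration, not derived from a tree.
c_command_facts = {
    ("Mukesh_i", "himself_i"),   # Mukesh knows himself
    ("Mukesh_i", "him_i"),       # Mukesh believes that Amrita knows him
    ("Amrita_j", "Nupur_k"),     # Amrita knows Nupur (different indices)
}

def index_of(np):
    """Read the co-index off an 'NP_index' label."""
    return np.rsplit("_", 1)[-1]

def binds(alpha, beta):
    return (alpha, beta) in c_command_facts and index_of(alpha) == index_of(beta)

# Mukesh_i binds himself_i; Amrita_j does not bind Nupur_k (not co-indexed).
```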
Empty Category Principle (ECP):
First, define proper government:
α properly governs β iff:
α governs β and α is lexical (i.e., N, V, A, or P), or
α locally A-bar-binds β.
The ECP says: “A trace must be properly governed.”
(a) Whati do you think that Mukesh ate ei?
(b) Whati do you think Mukesh ate ei?
Bounding and Control Theory:
Long-distance movement out of a complement clause can be explained by bounding
theory if NP and S are taken to be bounding nodes. The theory says that a single
application of Move α may not cross more than one bounding node. The theory of
control involves syntax, semantics, and pragmatics.
Case Theory and Case Filter:
In GB, case theory deals with the distribution of NPs and requires that every NP be
assigned a case. In English, the nominative, objective, genitive, etc. cases are assigned
to NPs at particular positions.
Case Filter: An NP is ungrammatical if it has phonetic content, or is an argument,
and is not Case-marked.
Lexical Functional Grammar (LFG):
LFG was designed by Kaplan and Bresnan (1982) with a view to providing a
computational formalism for analyzing sentences in natural language. The main problem
it addresses is how to extract grammatical relations from a sentence. It uses two levels
of representation:
1) one based on constituent structure (c-structure), and
2) the other based on grammatical functions (f-structure), such as subject and object.
A major strength of LFG is that it gives explicit algorithms for extracting grammatical
functions. It uses a context-free grammar (CFG) for specifying constituent structure.
A weakness of LFG is that it does not offer any theory regarding lexical ambiguity,
adjuncts, optional theta roles, or the mapping from grammatical relations to theta roles.
Example:
She saw stars in the sky.
The CFG rules to handle this sentence are shown below.
Here, ↑ (up arrow) refers to the f-structure of the mother node, i.e., the node on the
left-hand side of the rule, and ↓ (down arrow) refers to the f-structure of the node under
which it is written.
Hence, in Rule 1, (↑ subj = ↓) indicates that the f-structure of the first NP goes to the
f-structure of the subject of the sentence, while (↑ = ↓) indicates that the f-structure of
the VP node goes directly to the f-structure of the sentence.
Similarly, in Rule 2, the f-structure of the VP is defined by the lexical item V, the two
optional NPs, any number of PPs, and the optional clause (S'). The f-structure of V can
be obtained from the lexicon itself. All terminals in LFG can be thought of as annotated
with ↑ = ↓. The NPs can function as object and object2 of the sentence, and their
f-structures are obtained using the f-structures of obj and obj2. ‘↑ (↓ case) = ↓’ in
Rule 2 indicates that the f-structure of the PP, together with the case of the PP,
determines the f-structure of the VP. ‘comp’ refers to the complement in a sentence,
e.g., ‘He said that she is powerful.’
Let us first see the lexical entries of the various words in the sentence
‘She saw stars.’
Finally, the f-structure is a set of attribute-value pairs, represented as shown below.
It is interesting to note that the final f-structure is obtained through the unification of
the various f-structures for the subject, object, verb, complement, etc. This unification
is based on the functional specification of the verb, which predicts the overall sentence
structure.
LFG imposes three conditions on f-structures:
Consistency: In a given f-structure, a particular attribute may have at most
one value. Hence, while unifying two f-structures, if the attribute NUM has the
value SG in one and PL in the other, the unification is rejected.
Completeness: A function is called governable if it appears within the pred value
of some lexical form, e.g., subj, obj, and obj2; adjunct is not a governable
function. An f-structure is complete when it and all its subsidiary f-structures
contain all the functions that their predicates govern.
Ex: Since the predicate ‘see <(↑ subj) (↑ obj)>’ lists an object among its governable
functions, a sentence like ‘He saw’ is incomplete.
Coherence: Coherence maps the completeness property in the reverse direction.
It requires that all governable functions of an f-structure and of all its subsidiary
f-structures be governed by their respective predicates.
Hence, the f-structure of a sentence cannot take an object if its verb does not
allow that object; thus it will reject the sentence ‘I laughed a book’.
The completeness and coherence conditions are the counterparts of the θ-criterion in
GB theory.
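The unification step, with the Consistency condition enforced, can be sketched as recursive merging of attribute-value dictionaries. This is a minimal sketch; the attribute names and f-structure encoding are illustrative assumptions, not LFG's official notation:

```python
# Unify two f-structures (nested dicts of attribute-value pairs).
# Consistency: an attribute may have at most one value, so a clash fails.
def unify(f1, f2):
    result = dict(f1)
    for attr, val in f2.items():
        if attr not in result:
            result[attr] = val
        elif isinstance(result[attr], dict) and isinstance(val, dict):
            sub = unify(result[attr], val)      # unify subsidiary f-structures
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != val:
            return None                         # Consistency violated
    return result

subj = {"SUBJ": {"PRED": "she", "NUM": "SG"}}
verb = {"PRED": "see<(SUBJ)(OBJ)>", "TENSE": "PAST", "SUBJ": {"NUM": "SG"}}
f = unify(subj, verb)                           # compatible values merge
clash = unify({"NUM": "SG"}, {"NUM": "PL"})     # rejected: two values for NUM
```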
Lexical Rules in GB and LFG:
Different theories have different kinds of lexical rules and constraints for handling
various sentence constructs (active, passive, dative, causative, etc.).
In GB, to express a sentence in its passive form, the verb is changed to its participial
form, and the verb’s ability to assign case and the external (agent) θ-role is taken away.
In LFG, the verb is likewise converted to the participial form, but the
sub-categorization is changed directly.
Consider, for example:
Active: Tara ate the food.
Passive: The food was eaten by Tara.
Active: ↑pred = ‘eat<(↑subj) (↑obj)>’
Passive: ↑pred = ‘eat<(↑oblag) (↑subj)>’
Here, oblag represents the oblique agent phrase. Similar rules apply to the active and
dative constructs of verbs that accept two objects:
Active: Tara gave a pen to Mounika.
Dative: Tara gave Mounika a pen.
Active: ↑pred = ‘give<(↑subj) (↑obj) (↑oblgo)>’
Dative: ↑pred = ‘give<(↑subj) (↑obj2) (↑obj)>’
Here, oblgo stands for the oblique goal phrase.
Paninian Framework (Paninian Grammar-Based Model):
Paninian grammar was written by Panini around 500 BC for Sanskrit; the framework
can be used for other Indian languages and possibly some other Asian languages as well.
Unlike English, these languages are SOV (Subject-Object-Verb) ordered.
Some important features of Indian languages:
Indian languages have traditionally used oral communication for knowledge propagation.
The purpose of these languages is to communicate ideas from the speaker’s mind to the
listener’s mind.
Ex: 1) "मां बच्चे को खाना देती है।"
maan bachche ko khaanaa detii hai
mother child-to food gives
Mother gives food to the child.
2) "उसने खाना खाया।"
usne khaanaa khaayaa
he(subj) food ate
He ate food.
3) "उसने खाना खा लिया।"
usne khaanaa khaa liyaa
he(subj) food eat taken
He ate food. (The action is completed.)
Layered Representation in PG:
The surface and semantic levels are obvious. The other two levels should not be
confused with the levels of GB. Vibhakti literally means inflection, but here it refers to
word groups (noun, verb, or other) formed on the basis of case endings, postpositions,
or compound, main, and auxiliary verbs. Indian languages can be represented at the
vibhakti level.
Karaka (pronounced ‘kaaraka’) means case; in GB, we discussed case theory.
Karaka Theory: This is the central theme of the PG framework. Karaka relations are
assigned based on the roles played by the various participants in the main activity.
The various karakas are Karta (subject/agent), Karma (object), Karana (instrument),
Sampradana (beneficiary), Apadan (separation), and Adhikarana (locus).
Ex: "मां बच्ची को आंगन में हाथ से रोटी खिलाती हैं।"
maan bachchii ko aangan mein haath se rotii khilaatii hai
mother child-to courtyard-in hand-by bread feeds
The mother feeds bread to the child by hand in the courtyard.
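Karaka relations in the example above can be read off the vibhakti (postposition) markers. A rough Python sketch under stated assumptions: the marker-to-karaka table is a simplification invented for this one example (real karaka assignment also consults the verb's demand frame, and unmarked nouns are genuinely ambiguous between karta and karma):

```python
# Simplified vibhakti-marker to karaka mapping (illustrative only).
VIBHAKTI_TO_KARAKA = {
    "ne": "karta (agent)",
    "ko": "sampradana (beneficiary)",
    "se": "karana (instrument)",
    "mein": "adhikarana (locus)",
}

def karaka_relations(word_groups):
    """word_groups: list of (noun, vibhakti_marker); marker may be None."""
    return {noun: VIBHAKTI_TO_KARAKA.get(marker, "karta/karma (unmarked)")
            for noun, marker in word_groups}

# "maan bachchii-ko aangan-mein haath-se rotii khilaatii hai"
groups = [("maan", None), ("bachchii", "ko"), ("aangan", "mein"),
          ("haath", "se"), ("rotii", None)]
rels = karaka_relations(groups)
```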
Issues in PG:
The two problems are:
(i) the computational implementation of PG, and
(ii) the adaptation of PG to Indian and other similar languages.
Statistical Language Model:
A statistical language model is a probability distribution P(S) over all possible word
sequences S. A number of such language models have been proposed; the most widely
used approach to statistical language modeling is the n-gram model.
N-gram model:
It is a type of language model (LM) that defines the probability distribution over word
sequences. This is achieved by decomposing the sentence probability into a product of
conditional probabilities using the chain rule, as follows:
P(S) = P(w1, w2, w3, ..., wn)
     = P(w1) P(w2|w1) P(w3|w1 w2) P(w4|w1 w2 w3) ... P(wn|w1 w2 ... wn-1)
     = ∏(i=1 to n) P(wi|hi)
where hi is the history of word wi, i.e., the words w1 ... wi-1 preceding it.
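The chain-rule decomposition above can be sketched with a bigram approximation, where the history hi is truncated to just the previous word and the conditional probabilities are maximum-likelihood estimates from counts. The toy corpus below is an illustrative assumption:

```python
# Minimal bigram language model: P(S) ~ product of P(w_i | w_{i-1}),
# with maximum-likelihood estimates from bigram and context counts.
from collections import Counter

corpus = [["she", "saw", "stars"], ["she", "saw", "birds"], ["he", "saw", "stars"]]

bigrams = Counter()
contexts = Counter()
for sent in corpus:
    tokens = ["<s>"] + sent          # <s> marks the sentence start
    for w1, w2 in zip(tokens, tokens[1:]):
        bigrams[(w1, w2)] += 1
        contexts[w1] += 1

def p(w2, w1):
    """MLE conditional probability P(w2 | w1)."""
    return bigrams[(w1, w2)] / contexts[w1] if contexts[w1] else 0.0

def sentence_prob(sent):
    """P(S) with the history truncated to one previous word."""
    tokens = ["<s>"] + sent
    prob = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        prob *= p(w2, w1)
    return prob

# P("she saw stars") = P(she|<s>) * P(saw|she) * P(stars|saw) = 2/3 * 1 * 2/3
```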