CHAPTER 2
Language Syntax & Semantics
For each token in the text, the Natural Language API provides information about its internal structure (morphology) and its role in the sentence (syntax).
2.1 MORPHOLOGY
• Morphology is the study of the internal structure of words. Morphology focuses on how the components within a word (stems, root words, prefixes, suffixes, etc.) are arranged or modified to create different meanings.
• English often adds 's' or 'es' to the end of count nouns to indicate plurality, and 'd' or 'ed' to a verb to indicate past tense. The suffix '-ly' is added to adjectives to create adverbs (for example, "happy" (adjective) and "happily" (adverb)).
• The Natural Language API uses morphological analysis to infer grammatical information about words.
• Morphology varies greatly between languages. Languages such as English, which lack affixes indicating case, rely more on word order in a sentence to indicate the respective roles of words.
• Hence morphological analysis depends heavily on the source language, and on an understanding of what is supported within that language.
• In English there are numerous examples, such as "replacement", which is composed of re-, "place", and -ment, and "walked", from the elements "walk" and -ed.
• English morphology supports language elements (grammar, vocabulary) and language skills (reading, writing, speaking).

2.1.1 Survey of English Morphology

1. Morphology is the study of the way words are built up from smaller meaning-bearing units called morphemes. Morphemes are defined as the minimum meaning-bearing units in a language.
2. Morphological parsing is required for such a task. Previously we learned how to accommodate both 'cat' and its plural form 'cats' in a regexp. But how can we represent words such as 'geese', 'foxes' etc., which are also plural forms but do not follow the 'cat'/'cats' conversion rule?
• Words such as 'foxes' are broken down into a stem and an affix. The stem is the root word, while the affix is the extension added to the stem to represent either a different form of the same class or a whole new class.
• In English, morphology can be broadly classified into two types:
(a) Inflectional morphology: It is the combination of a word stem with a grammatical morpheme which results in a word of the same class as the original stem. In English, inflection is simple; only nouns, verbs, and sometimes adjectives can be inflected. E.g. cat → cats, mouse → mice, walk → walking, etc.
(b) Derivational morphology: It is the combination of a word stem with a grammatical morpheme which results in a word of a different class. In English it is very hard to predict the meaning of the derived word from the structure of the stem. E.g. appoint → appointee, clue → clueless, kill → killer, etc.

2.2.1 Classification: (Free and Bound Morphemes)

Every morpheme can be classified as free or bound.
(1) 'Free morphemes' can function independently as words (e.g. town, dog) and can appear within lexemes (e.g. townhall, doghouse).
(2) 'Bound morphemes' appear only as parts of words, always in conjunction with a 'root' and sometimes with other bound morphemes. For example, 'un-' appears only when accompanied by other morphemes to form a word.

2.2.2 Kinds of Morphology

1. Inflectional: Regular, applies to every noun, verb, or whatever, or at least to the majority of them. E.g. all count nouns have a singular/plural distinction, all verbs have tense distinctions, etc. Inflectional morphemes tend to be very productive, i.e. they are found throughout the language: every (count) noun can be pluralized, every verb has a past tense, etc.
2. Derivational: Morphemes usually change the "form class" ("part of speech"), e.g. they make a verb out of a noun, or an adjective out of a verb, etc. Not always very regular, not very productive, but useful in various ways, especially in the formation of abstract nouns, esp. in the development of scientific registers.
Example: photograph (n.) → photograph-y (another kind of noun)
• clear (adj.) + -ance, +ity, +ness: clearance, clarity, clearness; 3 different kinds of nouns.
• -ness, -hood, -ize, -dom, -ling: likeness, likelihood (but not *likehood, *likeliness); kingdom, princeling (but not *kingling, *princedom). -ize is very productive: it can be added to many form classes, e.g. potentialize, manhattanize, losangelize, maximize, miniaturize.
• nation (n.) + -al (adj.) → 'national' + -ize → 'nationalize' (makes a verb) + -ation → 'nationalization' (back to a noun, "the process of making s.t. belong to the nation") + de- → 'denationalization' ("reversing the process of making s.t. belong to the nation").
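The stem-plus-affix analysis above is easy to make concrete in code. The following is a minimal, illustrative Python sketch (not from the text): it strips a small, hand-picked set of inflectional and derivational suffixes and reports which kind was found. A real morphological parser would also need a lexicon and alternation rules (e.g. fox → fox+es).

```python
# Toy morphological analyzer: split a word into stem + affix and label the
# affix as inflectional or derivational. Suffix tables are illustrative only.

INFLECTIONAL = {"s": "plural", "es": "plural", "ed": "past tense", "ing": "progressive"}
DERIVATIONAL = {"ness": "adj -> noun", "ly": "adj -> adverb",
                "er": "verb -> noun", "less": "noun -> adj"}

def analyze(word):
    """Return (stem, affix, kind) for the longest matching suffix, if any."""
    for suffix, label in sorted({**INFLECTIONAL, **DERIVATIONAL}.items(),
                                key=lambda kv: -len(kv[0])):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            kind = "inflectional" if suffix in INFLECTIONAL else "derivational"
            return word[:-len(suffix)], suffix, f"{kind} ({label})"
    return word, "", "no affix found"

for w in ["foxes", "walked", "happily", "clueless", "cat"]:
    print(w, "->", analyze(w))
```

Note how "happily" comes out as stem "happi" plus "-ly": the y → i spelling alternation is exactly the kind of phenomenon a full analyzer has to model explicitly.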
Examples of inflection in English:
o Plural: Bikes, Cars, Trucks, Lions, Monkeys, Buses, Matches
o Possessive: Boy's, Girl's, Man's, Mark's, Robert's, Samantha's, Teacher's, Officer's
o Tense: cooked, played, marked, waited, watched, roasted, grilled, sang, drank, drove
o Comparison: Faster, Slower, Quicker, Taller, Higher, Shorter, Smaller, Weaker, Stronger, Sharper, Bigger
o Superlative: Fastest, Slowest, Quickest, Tallest, Highest, Shortest, Smallest, Biggest, Weakest, Strongest, Sharpest

2.2.4 Types of Morphology

| | Suffix | Function | Example |
|-|-|-|-|
| (1) | -s | 3rd person singular present | She waits |
| (2) | -en | Past participle | She has eaten |
| (3) | -s | Plural | Three tables |
| (4) | 's | Possessive | Holly's cat |
| (5) | -er | Comparative | You are taller |

Examples of derivation:

| Class | Word | Suffix | Result | Class |
|-|-|-|-|-|
| Noun | Vapour | -ize | Vaporize | Verb |
| Verb | Read | -er | Reader | Noun |
| Adjective | Real | -ize | Realize | Verb |
| Noun | Mouth | -ful | Mouthful | Adjective |

(1) Some more examples of words which are built up from smaller parts: black + bird combine to form blackbird; dis + connect combine to form disconnect.
(2) Some more examples of English derivational patterns and their suffixes:
(a) adjective-to-noun: -ness (slow → slowness)
(b) adjective-to-verb: -en (weak → weaken)
(c) adjective-to-adverb: -ly (personal → personally)
(d) noun-to-adjective: -al (recreation → recreational)
2.4.2 Difference between Stemming and Lemmatization

| | Stemming | Lemmatization |
|-|-|-|
| (i) | Stemming is a process that stems or removes the last few characters from a word, often leading to incorrect meanings and spellings. | Lemmatization considers the context and converts the word to its meaningful base form, which is called a Lemma. |
| (ii) | For example, stemming the word 'Caring' would return 'car'. | Lemmatizing the word 'caring' would return 'care'. |
| (iii) | Stemming is used in case of a large dataset where performance is an issue. | Lemmatization is computationally expensive, since it involves look-up tables. |
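The contrast in the table can be tried out with NLTK. This is a small sketch, assuming nltk is installed and the WordNet data has been downloaded (nltk.download('wordnet')); the words chosen are illustrative, and the exact outputs depend on the stemming algorithm used.

```python
# Rule-based stemming (Porter) vs. dictionary-based lemmatization (WordNet).
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for word in ["caring", "studies", "flying"]:
    print(word,
          "| stem:", stemmer.stem(word),                   # crude suffix stripping
          "| lemma:", lemmatizer.lemmatize(word, pos="v")) # WordNet look-up as a verb
```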
2.5 PARSERS AND ITS RELEVANCE IN NLP

• The word 'Parsing' is used to draw exact meaning or dictionary meaning from the text. It is also called syntactic analysis or syntax analysis.
• Syntax analysis checks the text for meaningfulness. A sentence like "Give me hot ice-cream." will be rejected by the parser or syntactic analyser.
• In this sense, we can define parsing or syntactic analysis as follows: it is the process of analyzing the strings of symbols in natural language conforming to the rules of formal grammar.
• Besides checking the input against the grammar, a parser is used:
o to recover from a frequently recurring error, so that the rest of the program may be processed;
o to make a parse tree;
o to make a symbol table;
o for creating intermediate representations (IR).

(Figure: Input → Lexical analyser → Parser, both interacting with a Symbol table)

2.6 SYNTACTIC REPRESENTATION OF NLP

• Syntactic analysis is the third phase of Natural Language Processing (NLP). It is used to analyse syntax, and is sometimes called syntax or parsing analysis.
• Syntax analysis compares the text to formal grammar rules to determine its meaning. The statement "heated ice-cream", for example, would be discarded by a semantic analyser.
• Top-down parsing is a parsing technique that first looks at the highest level of the parse tree and works down the parse tree by using the rules of grammar.
• Bottom-up parsing is a parsing technique that first looks at the lowest level of the parse tree and works up the parse tree by using the rules of grammar.
• We mention below the differences between these two parsing techniques.

| | Top-down parsing | Bottom-up parsing |
|-|-|-|
| 1. | It is a parsing strategy that first looks at the highest level of the parse tree and works down the parse tree by using the rules of grammar. | It is a parsing strategy that first looks at the lowest level of the parse tree and works up the parse tree by using the rules of grammar. |
| 2. | Top-down parsing attempts to find the leftmost derivations for an input string. | Bottom-up parsing can be defined as an attempt to reduce the input string to the start symbol of a grammar. |
| 3. | This parsing technique uses leftmost derivation. | This parsing technique uses rightmost derivation (in reverse). |
| 4. | The main decision is to select what production rule to use in order to construct the string. | The main decision is to select when to use a production rule to reduce the string to get the starting symbol. |
| 5. | Example: recursive descent parser. | Example: shift-reduce parser. |

• A terminal node is a linguistic unit or phrase that has a mother or father node and a part-of-speech tag.
• As an example, "a cat" and "a box beneath the bed" are noun phrases, while "write a letter" and "drive a car" are verb phrases.
• We consider an example sentence: "I shot an elephant in my pajamas." We mention the constituency parse trees graphically as:

(Fig. 2.6.2: Two constituency parse trees for "I shot an elephant in my pajamas". In both, S → NP (Pronoun "I") VP. In tree (I) the PP "in my pajamas" attaches to the Nominal "elephant"; in tree (II) it attaches to the VP.)

• The parse tree on the top (I) represents shooting an elephant that was in pajamas, while the parse tree on the bottom (II) represents shooting an elephant while wearing pajamas.
• The entire sentence is broken down into sub-phrases till only terminal phrases remain.

2.6.2 Modelling Constituency

"Knowledge of language is the doorway to wisdom." - Roger Bacon
• Roger Bacon gave the above quote in the 13th century, and it still holds.
• Today, the way of understanding languages has changed completely.
• Here, we shall be covering some basic concepts of modelling constituency and constituency parsing in natural languages.
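NLTK ships both strategies from the comparison above, so they can be tried side by side. The toy grammar below is written for this illustration only; RecursiveDescentParser works top-down from S, ShiftReduceParser works bottom-up from the words.

```python
# Top-down vs. bottom-up parsing of the same sentence with NLTK.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | 'John' | 'Bill'
VP -> V NP
Det -> 'a'
N -> 'cat'
V -> 'sees'
""")

sentence = "John sees a cat".split()

top_down = nltk.RecursiveDescentParser(grammar)   # expands rules from S downwards
bottom_up = nltk.ShiftReduceParser(grammar)       # shifts words, reduces up to S

for tree in top_down.parse(sentence):
    print("top-down:", tree)
for tree in bottom_up.parse(sentence):
    print("bottom-up:", tree)
```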
• VP stands for Verb Phrase, and NP stands for Noun Phrase.
• A dependency parse ties the words of a sentence together according to their relationships. Every vertex in the tree denotes a word; child nodes denote words that are dependent on the parent, and edges denote relationships.

(Fig. 2.6.3: Dependency parse of "I prefer the morning flight through Denver")

• The relationships between each linguistic unit, or phrase, are expressed by directed arcs. The root of the tree, "prefer", is the head of the preceding sentence.
• A dependency tag indicates the relationship between two phrases.
• For example, the word "Denver" modifies the meaning of the noun "flight": flight → Denver, where flight is the head and Denver is the child or dependent. The relation is represented by nmod, which stands for nominal modifier.
• This distinguishes the scenario for dependency between two phrases, where one serves as the head and the other as the dependent.
• The dependency parse for "John sees Bill" is as:

(Fig. 2.6.5: Dependency parse of "John sees Bill": root "sees", with subject "John" and object "Bill")

2.6.5 Dependency Parsing v/s Constituency Parsing

• If the main objective is to break a sentence into sub-phrases, it is ideal to implement constituency parsing. But dependency parsing is the best method for discovering the dependencies between phrases in a sentence.
• Let us consider an example to note the difference.
• A constituency parse tree denotes the subdivision of a text into sub-phrases. The tree's non-terminals are different sorts of phrases, the terminals are the sentence's words, and the edges are unlabeled.
• A constituency parse for the simple statement "John sees Bill" would be: S → NP ("John") and VP → V ("sees") NP ("Bill").
• One should choose the parser type that is closely related to the objective:
• for sub-phrases inside a sentence, constituency parsing is advisable;
• but for the connections between words, dependency parsing is more convenient.
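A dependency parse like the one in Fig. 2.6.5 can be inspected with spaCy. This is a short sketch, assuming spaCy and its small English model are installed (pip install spacy, then python -m spacy download en_core_web_sm):

```python
# Print each token's dependency relation and its head word.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("John sees Bill")

for token in doc:
    # token.dep_ is the dependency tag, token.head is the governing word
    print(f"{token.text:6} <-{token.dep_:6}- {token.head.text}")
```

For "John sees Bill" the verb "sees" is the root, with "John" and "Bill" attached to it as its subject and object, matching the figure.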
2.7 COCKE-YOUNGER-KASAMI (CYK) ALGORITHM

• Grammar implies syntactical rules for conversation in natural language. But in the theory of formal language, grammar is defined as a set of rules that can generate strings.
• The set of all strings that can be generated from a grammar is called the language of the grammar.

2.7.1 Context Free Grammar

• We have a context free grammar G = (V, X, R, S) and a string w, where:
(i) V is a finite set of variables or non-terminal symbols,
(ii) X is a finite set of terminal symbols,
(iii) R is a finite set of rules,
(iv) S is the start symbol, a distinct element of V, and
(v) V and X are assumed to be disjoint sets.
• The Membership problem is defined as: grammar G generates a language L(G); check whether the given string w is a member of L(G).

2.7.2 Chomsky Normal Form (CNF)

A context free grammar G is in Chomsky Normal Form (CNF) if each rule of G is of one of the forms:
(i) A → BC [at most two non-terminal symbols on the RHS],
(ii) A → a [one terminal symbol on the RHS], or
(iii) S → ε [the null string].

2.7.3 Cocke-Younger-Kasami Algorithm

• This solves the membership problem using a dynamic programming approach.
• The algorithm is based on the principle that the solution to problem [i, j] can be constructed from the solution to subproblem [i, k] and the solution to subproblem [k, j].
• The algorithm requires the grammar G to be in Chomsky Normal Form (CNF).
• Observe that any context-free grammar can be converted to CNF. This restriction is necessary because each problem can only be divided into two subproblems and not more, in order to bound the time complexity.

2.7.4 How CYK Algorithm Works?

• For a string of length N, construct a table T of size N × N. Each cell T[i, j] in the table is the set of all constituents that can produce the substring spanning from position i to j.
• The process involves filling the table with the solutions to the subproblems encountered in the bottom-up parsing process. Therefore, cells are filled from left to right and bottom to top.

| | 1 | 2 | 3 | 4 | 5 |
|-|-|-|-|-|-|
| 1 | [1,1] | [1,2] | [1,3] | [1,4] | [1,5] |
| 2 | | [2,2] | [2,3] | [2,4] | [2,5] |
| 3 | | | [3,3] | [3,4] | [3,5] |
| 4 | | | | [4,4] | [4,5] |
| 5 | | | | | [5,5] |

• In T[i, j], the row number i denotes the start index and the column number j denotes the end index.
• Let us consider the phrase "a very heavy orange book": a (1) very (2) heavy (3) orange (4) book (5).
• We fill up the table from left to right and bottom to top, according to the rules above:

| | 1 a | 2 very | 3 heavy | 4 orange | 5 book |
|-|-|-|-|-|-|
| 1 | Det | - | - | NP | NP |
| 2 | | Adv | AP | Nom | Nom |
| 3 | | | A, AP | Nom | Nom |
| 4 | | | | Nom, A, AP | Nom |
| 5 | | | | | Nom |
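A compact CYK recogniser for the table above can be written as follows. The CNF-style rules are a hand-written sketch chosen to reproduce the "a very heavy orange book" example; a real grammar would be much larger.

```python
# CYK membership test for a grammar in Chomsky Normal Form.
from itertools import product

# Terminal rules (word -> possible non-terminals) and binary rules.
terminal_rules = {"a": {"Det"}, "very": {"Adv"}, "heavy": {"A", "AP"},
                  "orange": {"A", "AP", "Nom"}, "book": {"Nom"}}
binary_rules = {("Adv", "A"): {"AP"}, ("AP", "Nom"): {"Nom"},
                ("A", "Nom"): {"Nom"}, ("Det", "Nom"): {"NP"}}

def cyk(words, start="NP"):
    n = len(words)
    # T[i][j] = set of non-terminals deriving words[i..j] (0-indexed, inclusive)
    T = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        T[i][i] = set(terminal_rules.get(w, ()))
    for span in range(2, n + 1):                 # substring length
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                # split point
                for B, C in product(T[i][k], T[k + 1][j]):
                    T[i][j] |= binary_rules.get((B, C), set())
    return start in T[0][n - 1]

print(cyk("a very heavy orange book".split()))  # True: NP derives the phrase
```

Running it fills exactly the cells shown in the table, ending with NP in the top-right cell, so the phrase is accepted.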
2.8 PROBABILISTIC CONTEXT FREE GRAMMAR (PCFG)

• PCFGs extend context-free grammars similar to how hidden Markov models extend regular grammars. Each production is assigned a probability.
• The probability of a parse (derivation) is the product of the probabilities of the productions used in that derivation. These probabilities can be viewed as parameters of the model.

2.8.1 Some Important Definitions

(i) Derivation: the process of recursive generation of strings from a grammar.
(ii) Parsing: finding a valid derivation using an automaton.
(iii) Parse tree: the alignment of the grammar to a sequence.
• An example of a parser for PCFG grammars is the pushdown automaton. The algorithm parses grammar nonterminals from left to right in a stack-like manner. This brute-force approach is not very efficient.
• Another example of a PCFG parser is the standard statistical parser.

A PCFG is defined as G = (M, T, R, S, P), where:
(i) M is the set of non-terminal symbols,
(ii) T is the set of terminal symbols,
(iii) R is the set of production rules,
(iv) S is the start symbol, and
(v) P is the set of probabilities on production rules.

2.8.3 Relation with Hidden Markov Models

• A PCFG model computes the total probability of all derivations that are consistent with a given sequence, based on some PCFG.
• This is equivalent to the probability of the PCFG generating the sequence. It is a measure of the consistency of the sequence with the given grammar.
• Dynamic programming variants of the CYK algorithm find the Viterbi parse of an RNA sequence for a PCFG model. The parse is the most likely derivation of the sequence by the given PCFG.

2.8.4 Viterbi PCFG Parsing

• The Viterbi PCFG parser is a bottom-up parser that uses dynamic programming to find the single most likely parse for a text.
• It parses texts by iteratively filling in a most-likely-constituents table. This table records the most likely tree structure for each span and node value.

2.8.5 How a PCFG Differs from CFG?

• A PCFG differs from a CFG by augmenting each rule with a conditional probability: A → β [p]. Here p expresses the probability that non-terminal A will be expanded to the sequence β.
• That is, a probability is associated with each grammar rule.

2.8.6 How PCFG Resolves Ambiguity?

PCFG parsers resolve ambiguity by preferring the constituents (and parse trees) with the highest probability.

2.8.7 How is PCFG Used?

The PCFG is used to predict the prior probability distribution of the structures, whereas posterior probabilities are estimated by the inside-outside algorithm, and the most likely structure is found by the CYK algorithm.

2.8.8 Shortcomings of PCFGs

• PCFGs do not take lexical information into account. This makes parse plausibility less than ideal.
• PCFGs have certain biases, i.e., the probability of a smaller tree is greater than that of a larger tree.

2.8.9 Are PCFGs Ambiguous?

• Probabilities in a PCFG can be seen as a filtering mechanism.
• For an ambiguous sentence, the trees bearing maximum probability are singled out, while all others are discarded.
• The level of ambiguity is related to the size of the singled-out set of trees.
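NLTK's ViterbiParser implements the most-likely-parse idea of Sec. 2.8.4 directly. The toy PCFG and its probabilities below are illustrative; note that the probabilities of the alternatives for each non-terminal sum to 1.

```python
# Most likely parse with a probabilistic CFG (NLTK).
from nltk import PCFG
from nltk.parse import ViterbiParser

grammar = PCFG.fromstring("""
S -> NP VP [1.0]
NP -> 'John' [0.5] | 'Bill' [0.5]
VP -> V NP [1.0]
V -> 'sees' [1.0]
""")

parser = ViterbiParser(grammar)
for tree in parser.parse("John sees Bill".split()):
    print(tree, tree.prob())   # parse probability = product of rule probabilities
```

Here the single parse has probability 1.0 × 0.5 × 1.0 × 1.0 × 0.5 = 0.25; with an ambiguous grammar, the parser would return the highest-probability tree, which is exactly how a PCFG resolves ambiguity.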
2.9 SHIFT-REDUCE PARSER

• A shift-reduce parser attempts the construction of a parse in a similar manner as is done in bottom-up parsing, i.e. the parse tree is constructed from the leaves (bottom) to the root (up).
• A more general form of the shift-reduce parser is the LR parser.
• This parser requires some data structures, i.e.:
(i) an input buffer for storing the input string;
(ii) a stack for storing and accessing the production rules.

2.9.1 Basic Operations

(i) Shift: this involves moving symbols from the input buffer onto the stack.
(ii) Reduce: if the handle appears on top of the stack, then its reduction by using an appropriate production rule is done. It means that the RHS of a production rule is popped out of the stack and the LHS of the production rule is pushed onto the stack.
(iii) Accept: if only the start symbol is present in the stack and the input buffer is empty, then the parsing action is called accept. When the accept action is obtained, it implies that parsing has completed successfully.
(iv) Error: this is the situation where the parser can neither perform a shift action, nor a reduce action, nor even an accept action.

2.9.2 Shift Reduce Parsing in Computer

• A shift-reduce parser is a type of bottom-up parser. It generates the parse tree from the leaves to the root.
• In a shift-reduce parser, the input string will be reduced to the starting symbol.
• This reduction can be produced by handling the rightmost derivation in reverse, i.e. from the starting symbol to the input string.

2.9.3 Why Bottom-up Parser is called Shift Reduce Parser?

Bottom-up parsing is also called shift-and-reduce parsing, where shift means to read the next token, and reduce means that a substring matching the right side of a production is replaced by the non-terminal on its left side.

2.9.4 What are the 2 Conflicts in Shift Reduce Parser?

• In shift-reduce parsing, there are two types of conflicts:
(i) shift-reduce (SR) conflict, and
(ii) reduce-reduce (RR) conflict.
• For example, if a programming language contains a terminal for the reserved word "while", only one entry for "while" can exist in the state.
• A shift-reduce conflict is caused when the system does not know whether to 'shift' or 'reduce' for a given token.

2.9.5 Example

Ex. 2.9.1: Consider the grammar E → 2E2, E → 3E3, E → 4. Perform shift-reduce parsing for the input string "32423".

Soln.:

| Stack | Input Buffer | Parsing Action |
|-|-|-|
| $ | 32423 $ | shift |
| $3 | 2423 $ | shift |
| $32 | 423 $ | shift |
| $324 | 23 $ | reduce by E → 4 |
| $32E | 23 $ | shift |
| $32E2 | 3 $ | reduce by E → 2E2 |
| $3E | 3 $ | shift |
| $3E3 | $ | reduce by E → 3E3 |
| $E | $ | accept |
2.10 TOP DOWN PARSER: EARLEY PARSER

• The Earley parser is an algorithm for parsing strings that belong to a given context-free language.
• Depending upon the variant, it may suffer problems with certain nullable grammars.
• The algorithm uses dynamic programming.
• It is mainly used for parsing in computational linguistics.

Earley parser:
- Class: parsing grammars that are context-free
- Data structure: string
- Worst-case performance: O(n³)
- Performance for unambiguous grammars: O(n²)
- Performance for all deterministic context-free grammars: O(n)

2.10.1 Functioning of Earley Parser

• Earley parsers are appealing because they can parse all context-free languages.
• The Earley parser executes in cubic time in the general case, O(n³), where n is the length of the parsed string; in quadratic time for unambiguous grammars, O(n²); and in linear time for all deterministic context-free grammars.
• It performs well when the rules are written left-recursively.

2.10.2 Earley Recogniser

• The following algorithm describes the Earley recogniser.
• The recogniser can be modified to create a parse tree as it recognises, and in that way it can be turned into a parser.

2.10.3 The Algorithm

• Here α, β and γ represent any string of terminals/nonterminals (including the empty string), X and Y represent single nonterminals, and a represents a terminal symbol.
• Earley's algorithm is a top-down dynamic programming algorithm.
• Here we use Earley's dot notation: given a production X → αβ, the notation X → α • β represents a condition in which α has already been parsed and β is expected.
• Input position 0 is the position prior to input. Input position n is the position after accepting the nth token.

• The predictive parser does not suffer from backtracking. To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to the next input symbols.

(Fig. 2.11.2: predictive parsing table, with entries such as M(C, +))

• At each step, the choice of the rule to be expanded is made upon the next terminal symbol.

2.11.5 The Difference between Predictive Parser and Recursive Descent Parser

The main difference between a recursive descent parser and a predictive parser is that recursive descent parsing may or may not require backtracking, while predictive parsing does not require any backtracking.

2.11.6 Drawbacks of Predictive Parsing

Drawbacks or disadvantages of the predictive parser are:
(i) It is inherently a recursive parser, so it consumes a lot of memory as the stack grows.

• Natural language processing aims to achieve human-level accuracy from computers. It is used in tools like machine translation, chatbots, search engines and text analysis.

2.12.2 Syntactic and Semantic Analysis

GQ. What is syntactic and semantic analysis?

• Theoretically, syntactic analysis determines whether an instance of the language is 'well formed' and analyses its grammatical structure.
• Semantic analysis analyses its meaning and finds out whether it 'makes sense'.
• Syntactic analysis depends on the types of words, but not on their meaning.
2.12.3 … Processing

(iii) Frames
(iv) Conceptual dependency (CD)
(v) Rule-based architecture

• The text-corpus method uses the body of texts written in any natural language. It derives the set of abstract rules which govern that language, and provides material for linguistic debate and further study.
2.16.2 Corpus Approach

GQ. What is a corpus approach?

• The corpus approach works with a collection of naturally occurring language data.
• A dictionary is a listing of lexemes from the lexicon of one or more specific languages. They are arranged alphabetically; for ideographic languages, by radical and stroke.
• Dictionaries include information on definitions, usage, etymologies, pronunciations, translation, etc.
• A dictionary is a lexicographical reference that shows interrelationships among the data.
• The word dictionary is usually meant to refer to a general-purpose monolingual dictionary.
• A clear distinction is made between general and specialized dictionaries. Specialized dictionaries include words in specific fields, rather than a complete range of words in the language. Lexical items that describe concepts in specific fields are usually called terms instead of words.
• In theory, general dictionaries are supposed to be semasiological, mapping word to definition, while specialized dictionaries are supposed to be onomasiological: they first identify concepts and then establish the terms used to designate them. In practice, the two approaches are used for both.
(i) A multi-field dictionary broadly covers several subject fields,
(ii) a single-field dictionary covers one particular subject (e.g. law), and
(iii) a sub-field dictionary covers a more specialised field (e.g. constitutional law).
• The 23-language Inter-Active Terminology for Europe is a multi-field dictionary. The American National Biography is a single-field dictionary. The African American National Biography project is a sub-field dictionary.
• Another variant is the glossary, an alphabetical list of defined terms in a specialised field, such as medicine (medical dictionary).
• There are other types of dictionaries that do not fit into the above distinction, for example bilingual (translation) dictionaries, dictionaries of synonyms (thesauri) and rhyming dictionaries.

2.17.3 Defining Dictionaries

• A simple defining dictionary provides a core glossary of the simplest meanings of the simplest concepts.
• From these, other concepts, including idioms and metaphors, can be defined.

2.17.4 Historical Dictionaries

• A historical dictionary is a specific kind of descriptive dictionary. It describes the development of words and senses over time, using original source material to support its conclusions.
• Dictionaries for Natural Language Processing (NLP) are built to be used by computer programs. The direct user is a program, even though the final user is a human being. Such a dictionary does not need to be printed on paper.
• The structure of the content is not linear, ordered entry by entry; it has a complex network form.
• Since most of these dictionaries control machine translation or cross-lingual information retrieval (CLIR), the content is usually multilingual and usually of huge size.
• To allow formalized exchange and merging of dictionaries, an ISO standard called Lexical Markup Framework (LMF) has been defined and is used among the industrial and academic community.

2.18.1 BabelNet

| Type | Multilingual encyclopedic dictionary; linked data |
| License | Attribution-NonCommercial-ShareAlike 3.0 Unported |
| Website | babelnet.org |

(Fig. 2.18.1: BabelNet)

• BabelNet is a multilingual lexicalised semantic network.
• BabelNet was automatically created by linking the most popular computational lexicon of the English language, WordNet.
• The integration is done using an automatic mapping, and by filling in lexical gaps in resource-poor languages by using statistical machine translation.
• The result is an encyclopaedic dictionary. It provides concepts and named entities that are lexicalised in many languages and connected with large numbers of semantic relations.

2.18.2 Statistics of BabelNet

• BabelNet (version 5.0) covers 500 languages. It contains almost 20 million synsets and around 1.4 billion word senses.
• Each Babel synset contains 2 synonyms per language, i.e. word senses, on average.
• Version 5.0 also associates around 51 million images with Babel synsets and provides a Lemon RDF encoding of the resource, available via a SPARQL endpoint. 2.67 million synsets are assigned domain labels.

2.18.3 Applications

• BabelNet has been shown to enable multilingual Natural Language Processing applications.
• The lexicalised knowledge available in BabelNet has been shown to obtain state-of-the-art results in:
(i) semantic relatedness,
(ii) multilingual Word Sense Disambiguation, and
(iii) multilingual Word Sense Disambiguation and Entity Linking.

2.19 RELATIONS AMONG LEXEMES AND THEIR SENSES

We have seen that semantic analysis can be divided into the following two parts:
(1) In the first part of the semantic analysis, the study of the meaning of individual words is performed. This part is called lexical semantics.
(2) In the second part, the individual words are combined to provide meaning to sentences.
(1) Hyponymy
• The generic term is called hypernym and its instances are called hyponyms.
• As an example, the word colour is a hypernym, and the colours red, green, etc. are hyponyms.

(2) Homonymy
• It is defined as words having the same spelling or the same form but having different and unrelated meanings.
• For example, the word "bat" is a homonym: a bat can be used to hit a ball, and a bat is also a flying mammal.

(3) Polysemy
• Polysemy means "many signs"; it is a Greek word.
• It is a word or phrase with different but related senses.
• A polysemous word has the same spelling but different and related meanings.
• For example, the word "bank" is a polysemous word with the following different meanings:
(i) a financial institution;
(ii) the building in which such an institution is located;
(iii) a synonym for "to rely on".

(4) Difference between Polysemy and Homonymy

| Sr. No. | Polysemy | Homonymy |
|-|-|-|
| I | It has the same spelling or syntax. | It also has the same spelling or syntax. |
| II | The meanings of the word are related. | The meanings of the words are not related. |
| III | For example, for the word "Bank" the meanings above are related. | But for the word "Bank" we can write the meanings as a financial institution or a river bank. Here the meanings are not related, so it is an example of homonymy. |

(5) Synonymy
• It is a relation between two lexical items having different forms but expressing the same or a close meaning. Examples are 'author/writer', 'fate/destiny'.

(6) Antonymy
• It is a relation between two lexical items possessing symmetry between their semantic components relative to an axis.
• The scope of antonymy is as follows:
(i) Application of property or not: examples are 'life/death', 'certitude/incertitude'.
(ii) Application of scalable property: examples are 'rich/poor', 'hot/cold'.
(iii) Application of a usage: examples are 'father/son', 'moon/sun'.

2.19.2 Ambiguity and Uncertainty in Language

• 'Ambiguity' refers to the meaning 'double meaning'.
• Ambiguity in natural language processing refers to the ability of being understood in more than one way.
• We can say that ambiguity is the capability of being understood in more than one way. Obviously, natural language is very ambiguous.
• We discuss various types of ambiguities in NLP:

(I) Lexical Ambiguity
The ambiguity of a single word is called lexical ambiguity; for example, the word "walk" as a noun or a verb.

(II) Syntactic Ambiguity
• When a sentence can be parsed in different ways, this type of ambiguity occurs.
• For example, consider the sentence "The man saw the girl with the camera."
• It is ambiguous whether the man saw the girl who was carrying a camera, or whether he saw her through a camera.

(III) Semantic Ambiguity
• When the meaning of the words can be misinterpreted, this kind of ambiguity occurs.
• In short, semantic ambiguity occurs when a sentence contains an ambiguous word or phrase.
• For example, the sentence "The bike hit the pole when it was moving" has semantic ambiguity.
• The interpretation can be "The bike, while moving, hit the pole" or "The bike hit the pole while the pole was moving".

(IV) Anaphoric Ambiguity
• This type of ambiguity arises due to the use of anaphoric entities in discourse.
• For example: "The horse ran up the hill. It was very steep. It soon got tired."
• Here, the anaphoric reference of "it" in the two situations causes ambiguity.

(V) Pragmatic Ambiguity
• When the context of a phrase gives multiple interpretations of the situation, this kind of ambiguity arises.
• Thus, when the statement is not specific, pragmatic ambiguity arises.
• For example, the sentence "I like you too" can have multiple interpretations:
(i) I like you (just as you like me);
(ii) I like you (just like someone else does).
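The relations and sense inventories above can be explored with WordNet through NLTK. This is a small sketch, assuming the 'wordnet' corpus has been downloaded; the example words are illustrative.

```python
# Exploring lexical relations with WordNet (NLTK).
from nltk.corpus import wordnet as wn

# Lexical ambiguity / polysemy: 'bank' has several listed senses.
for syn in wn.synsets("bank")[:4]:
    print(syn.name(), "-", syn.definition())

# Hypernymy/hyponymy: the sense red.n.01 sits under a colour hypernym.
red = wn.synset("red.n.01")
print("hypernyms of red:", [h.name() for h in red.hypernyms()])

# Antonymy is recorded on lemmas, e.g. good vs. bad.
good = wn.synset("good.a.01").lemmas()[0]
print("antonyms of good:", [a.name() for a in good.antonyms()])
```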
2.20 WORD SENSE DISAMBIGUATION (WSD)

• To realise the various usage patterns in the language is important for various Natural Language Processing applications.
• Word Sense Disambiguation (WSD) is an important method of NLP; using it, the meaning of a word can be determined in a particular context.
• The main problem of NLP systems is to identify words properly and to determine the specific usage of a word in a particular sentence.
• Word sense disambiguation resolves the ambiguity that arises while determining the meaning of the same word when it is used in different situations.

2.20.1 Word-Sense Disambiguation Applications

We mention below the various applications of WSD in various text processing and NLP fields.
(i) WSD can be used with lexicography. Much of modern lexicography is corpus-based; WSD in lexicography can provide significant textual indicators.
(ii) WSD can also be used in text mining and Information Extraction tasks. It can be used for the correct labelling of words, because the main aim of WSD is to understand the meaning of a word accurately in a particular sentence.
(iii) From a security point of view, a text system should understand the difference between a coal "mine" and a land "mine".
(iv) We note that the former serves industrial purposes, while the latter is a security threat. Hence a text-mining application must be able to determine the difference between the two.
(v) WSD can be used for Information Retrieval purposes. Information retrieval systems work through text data, and retrieval is based on textual information. Hence knowing the relevance of using a word in any sentence helps.

2.20.2 Challenges in Word Sense Disambiguation

WSD faces many challenges and problems.
(i) The difference between various dictionaries or text corpora is the most common problem. Different dictionaries give different meanings for words, which makes the senses of words be perceived differently. A lot of text information is available, and it is not possible to process everything properly.
(ii) Different algorithms have to be formed for different applications, and that becomes a big challenge for WSD.
(iii) Words cannot be divided into discrete meanings; they have related meanings, and this creates a lot of problems.

2.20.3 Relevance of WSD

• Word Sense Disambiguation is related to part-of-speech tagging and is an important part of the whole Natural Language Processing process.
• The main problem that arises in WSD is the whole meaning of word sense. Word sense is not a numeric quantity that can be measured as true or false and denoted by 1 or 0. The meaning of a word is contextual and depends on its usage.
• Lexicography deals with generalising the corpus and explaining the full and extended meaning of a word. But sometimes these meanings fail to apply to the algorithms or data.
• Still, WSD has immense applications and uses.
• If a computer algorithm can just read a text and come to know the different uses of a word in a text, it will indicate vast improvement in the field of text analytics.

2.21 KNOWLEDGE-BASED APPROACH

A knowledge-based system's behaviour can be designed using the following approaches:

(1) Declarative Approach
• In this approach, starting from an empty knowledge base, the agent can TELL sentences one after another.
• This is continued till the agent has knowledge of how to work with its environment. It stores the required information in the empty knowledge-based system.
• This is known as the declarative approach.

(2) Procedural Approach
• In this method, the required behaviour is converted directly into program code in the empty knowledge-based system.
• Compared to the declarative approach, it is a contrasting approach: here, a coding system is designed.

2.21.1 Lesk Algorithm

• The Lesk algorithm is based on the assumption that words in a given 'neighbourhood' will tend to share a common topic.
• In a simplified manner, the Lesk algorithm compares the dictionary definition of an ambiguous word with the terms contained in its neighbourhood. Versions have been adapted to use WordNet. An implementation looks like this:
1. For every sense of the word being disambiguated, one should count the number of words that are in both the neighbourhood of that word and in the dictionary definition of that sense.
2. The sense that is to be chosen is the sense that has the largest such count.

• We consider an example illustrating this algorithm, for the context "pine cone". The dictionary definitions are:

PINE
1. A kind of evergreen tree with needle-shaped leaves.
2. To waste away through sorrow or illness.

CONE
1. A solid body which narrows to a point.
2. Something of this shape, whether solid or hollow.
3. The fruit of certain evergreen trees.

• We note that the best intersection is pine#1 ∩ cone#3 = 2 (the overlapping words being "evergreen" and "tree").

2.21.2 Simplified Lesk Algorithm

• In the simplified Lesk algorithm, the correct meaning of each word in a given context is determined by choosing the sense that overlaps the most between its dictionary definition and the given context.
• Instead of collectively determining the meanings of all words in a given context, this approach takes each word individually, independent of the meaning of the other words occurring in the same context.
• A comparative evaluation has shown that the simplified Lesk algorithm can outperform the original definition of the algorithm, both in terms of precision and efficiency.
• Evaluating the disambiguation algorithms on the Senseval-2 English all-words data, they measure 58% precision using the simplified Lesk algorithm, compared to only 42% under the original algorithm.
• Dictionary glosses are fairly short and do not provide sufficient vocabulary to relate sense distinctions. This is a significant limitation.
• Different modifications of this algorithm have appeared. These works use other resources for analysis (thesauri, synonym dictionaries, or morphological and syntactic models).
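NLTK provides a ready-made simplified Lesk implementation, nltk.wsd.lesk, which picks the WordNet sense whose definition overlaps most with the context. A short sketch, assuming the WordNet corpus is available:

```python
# Simplified Lesk via NLTK: disambiguate 'bank' from its sentence context.
from nltk.wsd import lesk

context = "I went to the bank to deposit my money".split()
sense = lesk(context, "bank", pos="n")
if sense:
    print(sense.name(), "-", sense.definition())
```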
2.22 DICTIONARIES FOR REGIONAL LANGUAGES

(1) Hindi is the official language of India, while English is the second official language. But there is no national language as per the constitution.
(2) The Oxford dictionary is one of the most famous English-language dictionaries in the world. It has many extra features that augment it as a dictionary tool for language learners, like the ability to make notes on definitions and spellings, a flashcard learning system and a great thesaurus.
(3) Hindi is the official language. In addition to the official language, the constitution recognises 22 regional languages, which do not include English, as scheduled languages.
(4) The Sanskrit language is the oldest language in India. Sanskrit has been spoken since 5000 years before Christ. Sanskrit is still an official language of India, but in the present time, Sanskrit has become a language of worship and ritual instead of a language of speech.
(5) There are 22 official regional languages in India. They are: Assamese, Bengali, Bodo, Dogri, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Maithili, Malayalam, Manipuri, Marathi, Nepali, Oriya, Punjabi, Sanskrit, Santhali, Sindhi, Tamil, Telugu and Urdu.
(6) The youngest language in India is Malayalam. It belongs to the Dravidian language group and is considered the newest and youngest language of that group. The Government of India declared it a classical language of India in 2013.
(7) Currently six languages enjoy 'classical status': Tamil (declared in 2004), Sanskrit (2005), Kannada (2008), Telugu (2008), Malayalam (2013) and Odiya (2014).

2.23 DISTRIBUTIONAL SEMANTICS

(iii) The aim of distributional semantics is to learn the meanings of linguistic expressions from a corpus of text, i.e. from large amounts of language data.
(iv) The core idea, known as the distributional hypothesis, is that the contexts in which a word occurs characterise its meaning.
The underlying idea, that "a word is characterised by the company it keeps", was popularised here as well.
(v) Distributional structure: the distribution of an element will be understood as the sum of all its environments. An environment of an element A is an existing array of its co-occurrents, i.e. the other elements, each in a particular position, with which A occurs to yield an utterance.
(vi) Distributional properties: there are three basic properties of a distribution: location, spread and shape. The location refers to the typical value of the distribution, such as the mean. The spread of the distribution is the amount by which smaller values differ from larger ones.
(vii) Semantic criteria: a verb's meaning has to do with events. Correspondingly, we can say that a noun denotes an entity, adverbs modify events, and so on. One can call the classification of words on the basis of their meaning a semantic criterion.

Latent Semantic Analysis (LSA)

• Topic modelling is recognising the words from the topics present in the document.
• Latent semantic analysis is a natural language processing method that uses a statistical approach to identify the association among the words in a document.
• LSA deals with the following kind of issue. Example: "mobile", "phone" and "cell phone" are all similar, but if we query "The cell phone is ringing", then only the documents which contain "cell phone" are retrieved, whereas the documents containing "mobile" or "telephone" are not retrieved.

Assumptions of LSA
1. The words which are used in the same context are analogous to each other.
2. The hidden semantic structure of the data is unclear due to the ambiguity of the words chosen.

Singular Value Decomposition (SVD)

• SVD is the statistical method that is used to find the latent (hidden) semantic structure of words spread across the document.
• Let C = the collection of documents, d = the number of documents, and n = the number of unique words in the whole collection; M is the d × n word-to-document matrix.
• The SVD decomposes the matrix M into three matrices as follows:

M = U Σ V^T

where U = the distribution of words across the different contexts, Σ = the diagonal matrix of the associations among the contexts, and V^T = the distribution of contexts across the different documents.

2.26 SELF-LEARNING TOPICS

2.26.2 Detail Explanation of Dictionary Lookup

• Morphological parsing is a process by which word forms of a language are associated with corresponding linguistic descriptions. Morphological systems that specify these associations by merely enumerating them case by case do not offer any generalization means. Likewise for systems in which analyzing a word form is reduced to looking it up verbatim in word lists, dictionaries, or databases, unless they are constructed by and kept in sync with more sophisticated models of the language.
• The data structure can be optimized for efficient lookup, and the results can be shared. Lookup operations are relatively simple and usually quick. Dictionaries can be implemented, for instance, as lists, binary search trees, tries, hash tables, and so on.
• Because the set of associations between word forms and their desired descriptions is declared by plain enumeration, the coverage of the model is finite and the generative potential of the language is not exploited. Developing as well as verifying the association list is tedious, liable to errors, and likely inefficient and inaccurate, unless the data are retrieved automatically from large and reliable linguistic resources.
• Despite all that, an enumerative model is often sufficient for the given purpose, deals easily with exceptions, and can implement even complex morphology. For instance, dictionary-based approaches to Korean [35] depend on a large dictionary of all possible combinations of allomorphs and morphological alternations. These approaches do not allow development of reusable morphological rules, though [36].
• The word list or dictionary-based approach has been used frequently in various ad hoc implementations for many languages. We could assume that with the availability of immense online data, extracting a high-coverage vocabulary of word forms is feasible these days [37]. The question remains how the associated annotations are constructed and how informative and accurate they are. References to the literature on the unsupervised learning and induction of morphology, which are methods resulting in structured and therefore non-enumerative models, are provided later in this chapter.

Example for dictionaries

The following table shows the contents of the dictionary countries:

| Base form | Variant |
|-|-|
| Sri Lanka | Ceylon |
| Germany | BRD |
| United States | US; USA |

Since Sri Lanka was formerly known as Ceylon, an annotation of type Countries is created over the covered text "Ceylon", carrying the base form "Sri Lanka" together with the begin and end offsets of the match.

• The type of annotations that are created for dictionary entries within a text have the same name as the dictionary.
• Annotations include the following features:
o Baseform: the base form of the recovered variant;
o Id: the ID of the dictionary entry;
o begin: the offset that indicates the beginning of the covered text;
o end: the offset that indicates the end of the covered text;
o Covered text: the variant of the base form that is found in the text.
• One can also import dictionaries in the design studio dictionary-XML format, or dictionaries that are compatible with the LanguageWare dictionary-resource format.
• One can use dictionaries with the Dictionary Lookup operator. These dictionaries might contain more than one annotation type, and the features for these annotation types might vary from type to type.
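As a sketch of this enumerative approach, the countries dictionary above can be turned into a tiny lookup annotator in Python. The data structures and feature names below only loosely mirror the table and are purely illustrative.

```python
# Dictionary-lookup annotator: map variants to base forms and emit annotations
# with begin/end offsets and the covered text.
COUNTRIES = {"Ceylon": "Sri Lanka", "BRD": "Germany",
             "US": "United States", "USA": "United States"}

def annotate(text, dictionary, ann_type="Countries"):
    annotations = []
    for variant, baseform in dictionary.items():
        start = text.find(variant)
        while start != -1:                       # record every occurrence
            annotations.append({"type": ann_type, "baseform": baseform,
                                "begin": start, "end": start + len(variant),
                                "coveredText": variant})
            start = text.find(variant, start + 1)
    return annotations

print(annotate("Sri Lanka was formerly called Ceylon.", COUNTRIES))
```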
2.26.3 Finite State Morphology

• By finite-state morphological models, we mean those in which the specifications written by human programmers are directly compiled into finite-state transducers. The two most popular tools supporting this approach, which have been cited in the literature and for which example implementations for multiple languages are available online, include XFST (Xerox Finite-State Tool) [9] and LexTools [11].
• Finite-state transducers are computational devices extending the power of finite-state automata. They consist of a finite set of nodes connected by directed edges labeled with pairs of input and output symbols. In such a network or graph, nodes are also called states, while edges are called arcs.
• Traversing the network from the set of initial states to the set of final states along the arcs is equivalent to reading the sequences of encountered input symbols and writing the sequences of corresponding output symbols.
• Words are typically composed of smaller units of meaning, called morphemes. The morphemes that make up a word must be combined in a certain order: piti-less-ness is a word in English, but piti-ness-less is not.
• Most languages build words by concatenation, but some languages also exhibit non-concatenative processes such as interdigitation and reduplication.
• Complex regular expressions can be built up from simpler ones by means of regular expression operators. Square brackets, [ ], are used for grouping expressions.
• Because both regular languages and regular relations are closed under concatenation and union, the following basic operators can be combined with any kind of regular expression:

- A | B : union
- A B : concatenation
- (A) : optionality; equivalent to [A | 0]
- A+ : iteration; one or more concatenations of A
- A* : Kleene star; equivalent to (A+)

2. Morphological Alternations

• The shape of a morpheme often depends on the environment: pity is realised as piti in the context of less, and die as dy in dying.
• The basic claim of the finite-state approach to morphology is that the relation between the surface forms of a language and their corresponding lemmas can be described as a regular relation.
• If the relation is regular, it can be defined using the metalanguage of regular expressions. Then, with a suitable compiler, the regular expression source code can be compiled into a finite-state transducer that implements the relation computationally.
• In the resulting transducer, each path (= a sequence of states and arcs) from the initial state to a final state represents a mapping between a surface form and its lemma, known as the lexical form.
• For example, the comparative of the adjective big, bigger, can be represented in an English lexical transducer by the path in Fig. 2.26.1, where the zeros represent epsilon symbols.

(I) Regular languages are closed under complementation, subtraction, and intersection, but regular relations are not. Hence, the following operators can be combined only with a regular language:

- ~A : complement
- \A : term complement; all single-symbol strings not in A
- A & B : intersection
- A - B : subtraction (minus)

(II) Regular relations can be constructed by means of two basic operators:

- A .x. B : cross product
- A .o. B : composition

• The cross product operator, .x., is used only with expressions that denote a regular language; it constructs a relation between them.
• [A .x. B] designates the relation that maps every string of A to every string of B. The notation a:b is a convenient shorthand for [a .x. b].

Remarks

(1) Replacement and marking expressions in regular expressions have turned out to be very useful for morphology, tokenization and parsing.
(2) Descriptions consisting of regular expressions can be efficiently compiled into finite-state networks, and these can be determinised and minimised in other ways to reduce the size of the network.
(3) They can also be sequentialised, compressed, and optimised to increase the application speed.
(4) Regular expressions have semantics which are clean and declarative.
(5) They constitute a kind of high-level programming language for manipulating strings, languages and relations.
(6) Since regular languages and relations can be encoded as finite automata, they can be more easily manipulated than context-free and more complex languages.
(7) With regular-expression operators, new regular languages and relations can be derived directly, without mentioning the new grammar rules. This is a fundamental advantage over other higher-level formalisms.

2.26.4 Noisy Channel Models

(1) The noisy channel model is a framework used in natural language processing (NLP) to identify the correct word in situations where it is unclear. The framework helps detect intended words for spell checkers, virtual assistants, translation programs, question-answering systems and speech-to-text software.

(2) Difference between Noisy Channel and Noiseless Channel
• The capacity of a noiseless channel is numerically equal to the rate at which it communicates binary digits.
• The capacity of a noisy channel is less than this, because it is limited by the amount of noise in the channel.
• A noiseless channel means that there will not be any kind of disturbance in the path when data is carried forward from sender to receiver.

(3) Capacity of noisy and noiseless channel
• The channel capacity is directly proportional to the power of the signal, as

SNR = (Power of signal) / (Power of noise)

• A signal-to-noise ratio of 1000 is commonly expressed as

10 log10(1000) = 10 log10(10^3) = 10 × 3 × log10(10) = 10 × 3 × 1 = 30 dB

(4) The maximum data rate of a noisy channel
• The amount of thermal noise is measured by the ratio of signal power to noise power, SNR. Since the SNR is a ratio of two powers that varies over a very large range, it is often expressed in decibels, called SNR_dB and calculated as:

SNR_dB = 10 log10(SNR)

• With these characteristics, the channel can never transmit much more than 13 Mbps, no matter how many or how few signal levels are used and no matter how often or how infrequently samples are taken.
• Example: a telephone line normally has a bandwidth of 3000 Hz (300 Hz to 3300 Hz) assigned for data communication.

(5) Noiseless Channel
An idealistic channel in which no frames are lost, corrupted or duplicated. The protocol does not implement error control in this category.

Various Edit Distances

(1) In computational linguistics and computer science, edit distance is a way of quantifying how dissimilar two strings (e.g. words) are to one another, by counting the minimum number of operations required to transform one string into the other.
(2) The maximum edit distance between any two strings (even two identical ones) is infinity, unless we add some restrictions on repetitions of edits. In spite of that, there can be an arbitrarily large edit distance with an arbitrarily large character set.
(3) The minimum edit distance between two strings is defined as the minimum number of editing operations (insertion, deletion, substitution) needed to transform one string into another.
(4) Operations in edit distance. Most commonly, the edit operations for this purpose are:
(i) insert a character into a string;
(ii) delete a character from a string; and
(iii) replace a character of a string by another character.
For these operations, edit distance is sometimes called Levenshtein distance.
(5) The normalised edit distance is one of the distances derived from the edit distance. It is useful in some applications because it takes into account the lengths of the two strings. The normalised edit distance is not defined in terms of edit operations, but rather in terms of the edit path.
(6) Edit distance is usually defined as a parametrisable metric. It is calculated with a specific set of allowed edit operations, and each operation is assigned a cost.
'I
I',
,