
Morp

The document discusses morphological analysis and describes how it can be used to decompose words into morphemes and analyze their morphological structure. It outlines different types of morphological analyzers, including those based on dictionaries and finite-state techniques. Finite-state transducers are presented as a way to model the relationships between surface forms and morphological analyses of words. The document also discusses issues that arise in morphological analysis, such as phonological alterations and morphotactics.

Uploaded by Michael Doley
Copyright © All Rights Reserved

Morphology 1

• Introduction
• Morphology
• Morphological Analysis (MA)
• Using FS techniques in MA
• Automatic learning of the morphology of a language

NLP Morphology 1
Morphology 2
• Morphology
• Structure of a word as a composition of morphemes
• Related to word formation rules
• Functions
• Inflection
• Derivation
• Composition
• Result of morphological analysis
• Morphosyntactic categorization (POS)
• e.g. Parole tagset (VMIP1S0), more than 150 categories for Spanish
• e.g. Penn Treebank tagset (VBD), about 45 categories for English
• Morphological features
• Number, case, gender, lexical functions

Morphology 3
• Morphological analysis
• Decompose a word into a concatenation of morphemes
• Usually some of the morphemes carry the lexical meaning
• One (the root or stem) in inflection and derivation
• More than one in composition
• The others (affixes) provide morphological features
• Problems
• Phonological alterations in morpheme concatenation
• Morphotactics
• Which morphemes can be concatenated with which others

Morphology 4
• Problems
• Affixes
• Suffixes, prefixes, infixes, interfixes
• Inflectional affixes ≠ derivational affixes
• Derivation sometimes implies a semantic change that is not always predictable
• Meaning extensions
• Lexical rules
• A derivational suffix can be followed by an inflectional one
• love => lover => lovers
• Inflection does not change the POS; derivation sometimes does
• Inflection affects other words in the sentence
• agreement

Morphology 5

• Morphotactics
• Word formation rules
• Valid combinations between morphemes
• Simple concatenation
• Complex root/pattern models
• Regularity is language-dependent
• Phonological alterations (Morphophonology)
• Changes when concatenating morphemes
• Source: Phonology, morphology, orthography
• Variable in number and complexity
• e.g. vowel harmony

Morphology 6
Morphemes
• 1 morpheme:
• evitar (verb: to avoid)
• 2 morphemes:
• evitable = evitar + able (adj: can be avoided)
• 3 morphemes:
• inevitable = in + evitar + able
(adj: cannot be avoided)
• 4 morphemes:
• inevitabilidad = in + evitar + able + idad
(noun: the quality of being unavoidable)

Morphology 7
Inflectional Morphology
• number
• house houses
• cheval chevaux
• casa casas
• verbal form
• walk walks walked walking
• amo amas aman ...
• gender
• niño niña
Morphology 8
Derivational Morphology
• Form
• Without change barcelonés
• Prefix inevitable
• Suffix importantísimo
• Source
• verb => adjective tardar => tardío
• verb => noun sufrir => sufrimiento
• noun => noun actor => actorazo
• noun => adjective atleta => atlético
• adjective => adjective rojo => rojizo
• adjective => adverb alegre => alegremente

Morphological Analysis 1
Types of morphological analyzers
Form dictionaries (formaries)
• Dictionaries of word forms
+ Efficiency
+ Suitable for languages with few inflected variants (e.g. English)
+ Extensibility
+ Can be built and maintained from a morphological generator
– Unsuitable for languages with rich inflectional variation
– Do not cover derivation or composition
• FS techniques
• FSA
• 1-level analyzers
• FST
• Analyzers with more than one level

Morphological Analysis 2

Two-level morphological analyzers

• General model for languages with morpheme concatenation
• Independence between lingware (linguistic data) and analyzer
• Valid for both analysis and generation
• Distinction between lexical and surface levels
• Parallel rules for morphophonology
• Simple implementation
Morphological Analysis 3

• Morphological rules
• Define the relations between characters (surface) and morphemes, mapping strings of characters to the morphemic structure of the word.
• Spelling rules
• Operate at the level of the letters forming the word. They can be used to define the valid phonological alterations.
• Ritchie, Pulman, Black, Russell, 1987
Morphological Analysis 4

• Input:
• form
• Output:
• lemma + morphological features

Input      Output
cat        cat + N + sg
cats       cat + N + pl
cities     city + N + pl
merging    merge + V + pres_part
caught     (catch + V + past) or (catch + V + past_part)
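The input/output behaviour in this table can be sketched with the simplest analyzer type listed earlier, a form dictionary (formary) that maps each surface form directly to its analyses. This is a minimal Python sketch, not the course's implementation: the names `FORMARY` and `analyze` are hypothetical, and the entries are just the table's examples. An ambiguous form such as caught simply stores several analyses.

```python
# Minimal sketch of a form-dictionary ("formary") analyzer.
# FORMARY and analyze are hypothetical names; entries reproduce the table above.
FORMARY: dict[str, list[tuple[str, ...]]] = {
    "cat": [("cat", "N", "sg")],
    "cats": [("cat", "N", "pl")],
    "cities": [("city", "N", "pl")],
    "merging": [("merge", "V", "pres_part")],
    # Ambiguous form: two analyses are listed
    "caught": [("catch", "V", "past"), ("catch", "V", "past_part")],
}


def analyze(form: str) -> list[str]:
    """Return every analysis of `form` as 'lemma + feat + ...' (empty list if unknown)."""
    return [" + ".join(analysis) for analysis in FORMARY.get(form.lower(), [])]


if __name__ == "__main__":
    for word in ["cats", "caught", "dogs"]:
        print(word, "->", analyze(word) or "unknown form")
```

Lookup is a single hash access, which is the efficiency advantage; the cost is that every inflected form must be listed explicitly, which is why this approach scales poorly to richly inflected languages.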

Morphological Analysis 5

reg_noun   irreg_pl_noun   irreg_sg_noun   plural
fox        sheep           sheep           -s
cat        mice            mouse
dog

Figure (Morphotactics): FSA over morpheme classes, states 0, 1, 2
0 --reg_noun--> 1 --plural (-s)--> 2
0 --irreg_pl_noun--> 2
0 --irreg_sg_noun--> 2
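The morphotactic FSA can be simulated over sequences of morphemes. This Python sketch assumes that states 1 and 2 are accepting (so a bare regular noun counts as a valid singular word), which the figure does not state; `accepts` is a hypothetical name. Because sheep belongs to two classes, the simulation tracks a set of states, i.e. it runs the automaton nondeterministically.

```python
# Sketch of the morphotactic FSA over morpheme classes (states 0, 1, 2).
# Assumption: states 1 and 2 are accepting (a bare reg_noun is a valid singular).
LEXICON = {
    "reg_noun": {"fox", "cat", "dog"},
    "irreg_pl_noun": {"sheep", "mice"},
    "irreg_sg_noun": {"sheep", "mouse"},
    "plural": {"-s"},
}

# (state, morpheme class) -> next state
TRANSITIONS = {
    (0, "reg_noun"): 1,
    (1, "plural"): 2,
    (0, "irreg_pl_noun"): 2,
    (0, "irreg_sg_noun"): 2,
}
ACCEPTING = {1, 2}


def accepts(morphemes: list[str]) -> bool:
    """True if some class assignment leads from state 0 to an accepting state."""
    states = {0}
    for morpheme in morphemes:
        # A morpheme may belong to several classes (e.g. sheep), so keep a set of states
        classes = [cls for cls, items in LEXICON.items() if morpheme in items]
        states = {TRANSITIONS[(s, c)] for s in states for c in classes if (s, c) in TRANSITIONS}
        if not states:
            return False
    return bool(states & ACCEPTING)
```

For example, `accepts(["cat", "-s"])` and `accepts(["mice"])` succeed, while `accepts(["mouse", "-s"])` fails because no plural arc leaves state 2. The automaton works on whole morphemes; the letter-level expansion comes next.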
Morphological Analysis 6

Figure (Letter Transducers): the noun lexicon (fox, cat, dog, fog, donkey, mouse, mice) expanded letter by letter into a single automaton.
Morphological Analysis 7

upper (lexical) level:   cat +N      cat +N +pl
lower (surface) level:   cat         cats

Symbol pairs for cats:   c:c a:a t:t +N:ε +pl:s
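A path through the transducer is a sequence of lexical:surface symbol pairs; projecting it onto one side recovers the upper or lower string. This Python sketch parses the pair notation used on this slide; `parse_pairs` and `project` are hypothetical helpers, and reading a bare symbol x as x:x is an assumption borrowed from common two-level notation.

```python
# Sketch: a two-level path is a sequence of lexical:surface symbol pairs.
# Projecting onto either side yields the upper (lexical) or lower (surface) string.
EPSILON = "ε"


def parse_pairs(path: str) -> list[tuple[str, str]]:
    """Split 'c:c a:a +N:ε' into pairs; a bare symbol x is read as x:x (assumption)."""
    pairs = []
    for token in path.split():
        upper, sep, lower = token.partition(":")
        pairs.append((upper, lower if sep else upper))
    return pairs


def project(pairs: list[tuple[str, str]], side: int) -> str:
    """Concatenate one side of the pairs (0 = lexical, 1 = surface), dropping ε."""
    return "".join(p[side] for p in pairs if p[side] != EPSILON)
```

With `pairs = parse_pairs("c:c a:a t:t +N:ε +pl:s")`, the lexical projection is `cat+N+pl` and the surface projection is `cats`.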
Morphological Analysis 8
Using FST

• As a recognizer
• Given a pair of strings (one lexical, one surface), determines whether one is a transduction of the other
• As a generator
• Generates pairs of strings
• As a translator
• Given a surface string, produces its lexical translation
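The three uses can be sketched on one toy transducer for cat. In this Python sketch the arc list is hypothetical (it encodes the c:c a:a t:t +N:ε +sg:ε / +pl:s paths), ε is written as the empty string, and the recursion assumes an acyclic FST without ε:ε arcs; a real toolkit would handle cycles and ε-closure properly.

```python
# Toy letter-level FST for 'cat' (hypothetical arcs), written as
# (source state, lexical symbol, surface symbol, target state); "" stands for ε.
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "+N", "", 4),
    (4, "+sg", "", 5),
    (4, "+pl", "s", 5),
]
FINAL = {5}


def recognize(lexical: str, surface: str, state: int = 0) -> bool:
    """Recognizer: is `surface` a transduction of `lexical`?
    Terminates because every arc here consumes a lexical symbol."""
    if not lexical and not surface and state in FINAL:
        return True
    return any(
        recognize(lexical[len(up):], surface[len(low):], dst)
        for src, up, low, dst in ARCS
        if src == state and lexical.startswith(up) and surface.startswith(low)
    )


def generate(state: int = 0, upper: str = "", lower: str = ""):
    """Generator: yield every (lexical, surface) pair of this acyclic FST."""
    if state in FINAL:
        yield upper, lower
    for src, up, low, dst in ARCS:
        if src == state:
            yield from generate(dst, upper + up, lower + low)


def analyze(surface: str, state: int = 0, upper: str = "") -> list[str]:
    """Translator: map a surface string to its lexical strings (assumes no ε-cycles)."""
    results = [upper] if not surface and state in FINAL else []
    for src, up, low, dst in ARCS:
        if src == state and surface.startswith(low):
            results += analyze(surface[len(low):], dst, upper + up)
    return results
```

`analyze("cats")` returns `["cat+N+pl"]`, `recognize("cat+N+sg", "cats")` is false, and `list(generate())` lists both the singular and plural pairs.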
Morphological Analysis 9
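reg_noun   irreg_pl_noun        irreg_sg_noun   plural
fox        sheep                sheep           s
cat        m o:i u:ε s:c e      mouse
dog        g o:e o:e s e        goose

Figure: FST for English nominal inflection
0 --reg_noun--> 1 --+N:ε--> 4 --+sg:ε or +pl:s--> final state
0 --irreg_sg_noun--> 2 --+N:ε--> 5 --+sg:ε--> final state
0 --irreg_pl_noun--> 3 --+N:ε--> 6 --+pl:ε--> final state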
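The lexicon and inflection arcs together define a finite relation between lexical and intermediate strings. This Python sketch enumerates that relation explicitly; the encoding of irregular plurals as pair strings is taken from the table, while the helper names are hypothetical. As an assumption, the regular plural is emitted as ^s (with ^ as morpheme boundary), following the intermediate level on the next slide, rather than the plain s of this figure.

```python
# Hypothetical encoding of the lexicon above. Irregular plurals are written as
# lexical:surface pair strings, so their lexical side is the singular lemma.
REG_NOUNS = {"fox", "cat", "dog"}
IRREG_SG_NOUNS = {"sheep", "mouse", "goose"}
IRREG_PL_NOUNS = ["sheep", "m o:i u:ε s:c e", "g o:e o:e s e"]


def sides(entry: str) -> tuple[str, str]:
    """Return (lexical, surface); a bare symbol x stands for x:x and ε is dropped."""
    tokens = entry.split() if " " in entry else list(entry)
    upper, lower = "", ""
    for token in tokens:
        up, sep, low = token.partition(":")
        upper += up
        lower += (low if sep else up).replace("ε", "")
    return upper, lower


def build_relation() -> set[tuple[str, str]]:
    """All (lexical, intermediate) pairs of the nominal inflection FST."""
    pairs = set()
    for noun in REG_NOUNS:
        pairs.add((noun + "+N+sg", noun))         # +N:ε +sg:ε
        pairs.add((noun + "+N+pl", noun + "^s"))  # +N:ε +pl:^s (^ = morpheme boundary)
    for noun in IRREG_SG_NOUNS:
        pairs.add((noun + "+N+sg", noun))         # +N:ε +sg:ε
    for entry in IRREG_PL_NOUNS:
        lemma, plural = sides(entry)
        pairs.add((lemma + "+N+pl", plural))      # +N:ε +pl:ε
    return pairs


RELATION = build_relation()


def generate(lexical: str) -> list[str]:
    """Lexical -> intermediate direction."""
    return sorted(inter for lex, inter in RELATION if lex == lexical)


def analyze(intermediate: str) -> list[str]:
    """Intermediate -> lexical direction."""
    return sorted(lex for lex, inter in RELATION if inter == intermediate)
```

`analyze("sheep")` returns both `sheep+N+pl` and `sheep+N+sg`, showing that the same relation serves analysis and generation.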
Morphological Analysis 10

lexical level        f o x +N +pl
                     ↓ morphotactics
intermediate level   f o x ^ s
                     ↓ spelling rules
surface level        f o x e s
Morphological Analysis 11

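Figure: the letter-level noun lexicon with the inflectional arcs attached. Regular nouns (fox, cat, dog, fog, donkey) continue with +N:ε and then +sg:ε or +pl:^s; the irregular forms continue with +N:ε and then +sg:ε (mouse) or +pl:ε (mice, spelled with pairs such as o:i and u:ε).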
Morphological Analysis 12
Spelling rules

name                 description                                            example
consonant doubling   single-letter final consonant doubled before -ing/-ed   beg/begging
e deletion           silent e dropped before -ing/-ed                        make/making
e insertion          e added after -s, -z, -x, -ch, -sh before -s            watch/watches
y replacement        -y changes to -ie before -s, to -i before -ed           try/tries
k insertion          verbs ending with vowel + -c add -k                     panic/panicked
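The table's rules can be approximated as ordered rewrite rules over intermediate forms of the shape stem^suffix. This Python sketch uses the standard `re` module; the exact contexts are simplifying assumptions not stated in the table: y replacement only after a consonant (so play^s stays plays), k insertion only before -ed/-ing, and consonant doubling only for one-vowel stems ending in b, d, g, m, n, p, r or t. Real spelling rules (e.g. stress-dependent doubling) need more context than this.

```python
import re

# Hypothetical ordered rewrite rules over intermediate forms "stem^suffix"
# (^ = morpheme boundary). Contexts are simplified assumptions (see lead-in).
RULES = [
    ("e insertion", r"(x|s|z|ch|sh)\^s$", r"\1^es"),
    ("y replacement before -s", r"([^aeiou])y\^s$", r"\1ie^s"),
    ("y replacement before -ed", r"([^aeiou])y\^ed$", r"\1i^ed"),
    ("k insertion", r"([aeiou])c\^(ed|ing)$", r"\1ck^\2"),
    # Doubling runs before e deletion so that hope^ing does not become hopping
    ("consonant doubling", r"^([^aeiou]*[aeiou])([bdgmnprt])\^(ed|ing)$", r"\1\2\2^\3"),
    ("e deletion before -ing", r"([^e])e\^ing$", r"\1^ing"),
    ("e deletion before -ed", r"e\^ed$", r"^ed"),
]


def to_surface(intermediate: str) -> str:
    """Apply each rule once, in order, then erase the boundary symbol."""
    form = intermediate
    for _name, pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form.replace("^", "")


if __name__ == "__main__":
    for word in ["fox^s", "watch^s", "try^s", "make^ing", "panic^ed", "beg^ing"]:
        print(word, "->", to_surface(word))
```

Sequential rewriting makes rule order significant, which is one motivation for the parallel two-level rules that follow.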
Morphological Analysis 13
Spelling rules: e-insertion

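ε:e ⇔ [xsz] ^:ε ___ s#

The ⇔ rule decomposes into two parts:
• ε:e ⇒ [xsz] ^:ε ___ s#   (context restriction: ε:e may occur only in this context)
• ε:ε /⇐ [xsz] ^:ε ___ s#   (surface coercion: in this context ε may not surface as ε, so e must be inserted)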
Morphological Analysis 14

epenthesis

+:e <=> { <{s:s c:c} h:h> s:s x:x z:z } --- s:s
(left context --- right context)

Notation:
• <=> combines => (context restriction) and <= (surface coercion)
• C: {...}
• V: {a, e, i, o, u, y}
• C2: {...}
• =: whatever (any symbol)
Example: lexical box + s, surface box e s (boxes)
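Both halves of this <=> rule can be checked on an aligned lexical/surface pair string. This Python sketch assumes the lexical form always carries the boundary +, aligned one-to-one with the surface string, where 0 marks an unrealized symbol (as in the e:0 and a:0 rules on the following slides); the left-context test on stem endings is a simplification of the rule's pair sets, and the function names are hypothetical.

```python
# Sketch of a two-level check for epenthesis: +:e <=> {ch, sh, s, x, z} --- s:s
# Assumption: lexical and surface strings are aligned symbol by symbol; '0' = not realized.
Pair = tuple[str, str]
SIBILANT_ENDS = ("ch", "sh", "s", "x", "z")


def align(lexical: str, surface: str) -> list[Pair]:
    """Pair lexical and surface symbols position by position."""
    if len(lexical) != len(surface):
        raise ValueError("two-level strings must have equal length")
    return list(zip(lexical, surface))


def in_context(pairs: list[Pair], i: int) -> bool:
    """Left: both sides end in ch, sh, s, x or z; right: the next pair is s:s."""
    left_lex = "".join(lex for lex, _ in pairs[:i])
    left_srf = "".join(srf for _, srf in pairs[:i])
    return (
        left_lex.endswith(SIBILANT_ENDS)
        and left_srf.endswith(SIBILANT_ENDS)
        and pairs[i + 1 : i + 2] == [("s", "s")]
    )


def check(lexical: str, surface: str) -> bool:
    """True if the aligned strings satisfy both halves of the <=> rule."""
    pairs = align(lexical, surface)
    for i, pair in enumerate(pairs):
        if pair == ("+", "e") and not in_context(pairs, i):
            return False  # => violated: +:e outside its context (context restriction)
        if pair[0] == "+" and pair[1] != "e" and in_context(pairs, i):
            return False  # <= violated: + in context but not realized as e (surface coercion)
    return True
```

For example, `check("box+s", "boxes")` holds, `check("box+s", "box0s")` fails surface coercion, and `check("cat+s", "cates")` fails context restriction. Removing the 0s from an accepted surface string gives the written word.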
Morphological Analysis 15

e-deletion

e:0 <=> =:C2 --- <+:0 V:= >
     or <C:C V:V> --- <+:0 e:e>
     or <c:c g:g> --- <+:0 {e:e i:i}>
     or l:0 --- +:0
     or c:c --- <+:0 a:0 t:t b:b>

Examples (lexical → surface):
mov e + ed   →   mov ed    (moved)
agre e + ed  →   agre ed   (agreed)
Morphological Analysis 16

a-deletion

a:0 <=> <c:c e:0 +:0> --- t:t

lexical:  redu c e + a t ion
surface:  redu c t ion            (reduce + ation → reduction)
          left context: c e +   focus: a   right context: t
Morphological Analysis 17

lexical level        f o x +N +pl
                     ↓ Lexicon-FST
intermediate level   f o x ^ s
                     ↓ spelling rules FST1 FST2 ... FSTn
surface level        f o x e s
Morphological Analysis 18

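Figure: combining the transducers
• Lexicon-FST on top of the spelling-rule transducers FST1 .. FSTn
• Intersection: the rule transducers are merged into FSTA = FST1 ∧ ... ∧ FSTn
• Composition: Lexicon-FST ∘ FSTA yields a single transducer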
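Intersection and composition can be illustrated on finite relations, i.e. transducers written out as explicit sets of (input, output) string pairs. In this Python sketch all pairs are hypothetical toy data: each rule "transducer" alone lets through one wrong pair that the other rejects, so only their intersection is correct. Real systems apply these operations to the automata themselves; note that for general transducers intersection is only guaranteed to work for equal-length relations (as in two-level rules with 0 as an explicit symbol), whereas composition always applies.

```python
Relation = set[tuple[str, str]]


def compose(r1: Relation, r2: Relation) -> Relation:
    """r1 ∘ r2: map a to c whenever r1 maps a to b and r2 maps b to c."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}


def intersect(r1: Relation, r2: Relation) -> Relation:
    """Pairs accepted by both relations (parallel rules must all agree)."""
    return r1 & r2


# Hypothetical toy relations standing in for transducers.
LEXICON_FST: Relation = {("fox+N+pl", "fox^s"), ("cat+N+pl", "cat^s")}
# e insertion: requires e in fox^s, but does not care whether ^ surfaces
FST1: Relation = {("fox^s", "foxes"), ("cat^s", "cats"), ("cat^s", "cat^s")}
# boundary deletion: ^ never surfaces, but does not care about e insertion
FST2: Relation = {("fox^s", "foxes"), ("cat^s", "cats"), ("fox^s", "foxs")}

FSTA = intersect(FST1, FST2)
ANALYZER = compose(LEXICON_FST, FSTA)
```

Here `FSTA` keeps only `fox^s:foxes` and `cat^s:cats`, and `ANALYZER` maps `fox+N+pl` to `foxes` and `cat+N+pl` to `cats` in one step, without the intermediate level.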
Automatic morphology learning 1

• Problem
• Paradigm: stem + affixes
• Obtaining the stems
• Classification of stems into models
• Learning part of the morphology (e.g. derivational)
• Two approaches
• No previous morphological knowledge is available
• Goldsmith, 2001
• Brent, 1999
• Snover, Brent, 2001, 2002
• Morphological knowledge can be used
• Oliver et al., 2002
Automatic morphology learning 2

• Automatic morphological analysis
• Identification of borders between morphemes
• Zellig Harris
• {prefix, suffix} conditional entropy
• Bigrams and trigrams with a high probability of forming a morpheme
• Learning patterns or mapping rules between pairs of words
• Global approach (top-down)
• Goldsmith, Brent, de Marcken
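Harris's border-identification idea can be sketched with the conditional entropy of the next letter given a prefix: inside a morpheme the continuation is predictable, while at a border many continuations are possible. This Python sketch is an illustration under assumptions: the toy word list, the end-of-word marker #, and the threshold of 1 bit are hypothetical choices, not values from the slides.

```python
import math
from collections import Counter


def successor_entropy(prefix: str, words: list[str]) -> float:
    """Entropy (bits) of the symbol following `prefix`; '#' marks end of word."""
    nxt = Counter(
        (w[len(prefix)] if len(w) > len(prefix) else "#")
        for w in words
        if w.startswith(prefix)
    )
    total = sum(nxt.values())
    if not total:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in nxt.values())


def segment(word: str, words: list[str], threshold: float = 1.0) -> list[str]:
    """Cut `word` after every prefix whose successor entropy reaches `threshold`."""
    pieces, start = [], 0
    for i in range(1, len(word)):
        if successor_entropy(word[:i], words) >= threshold:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces


WORDS = ["walk", "walks", "walked", "walking", "talk", "talks", "talked", "talking"]
```

On this list, the prefix walk is followed by #, s, e or i with equal frequency (2 bits), so `segment("walked", WORDS)` yields `["walk", "ed"]`. The prefix-side entropy is symmetric, using predecessor letters of suffixes.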
Automatic morphology learning 3

• Goldsmith's system based on MDL (Minimum Description Length)
• Initial partition: word -> stem + suffix
• split-all-words
• A good candidate for a {stem, suffix} split in one word has to be a good candidate in many other words
• MI (mutual information) strategy
• Faster convergence
• Learning signatures
• {signatures, stems, suffixes}
• MDL
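The signature step can be sketched once each word has a stem + suffix split: a stem's signature is the set of suffixes it occurs with, and stems sharing a signature are grouped together. In this Python sketch the initial partition uses a hypothetical fixed suffix list and minimum stem length instead of Goldsmith's heuristics, the empty suffix is written NULL as in Goldsmith's notation, and the MDL evaluation that would accept or reject signatures is omitted.

```python
from collections import defaultdict

NULL = "NULL"  # name for the empty suffix


def split_word(word: str, candidate_suffixes: list[str], min_stem: int = 3) -> tuple[str, str]:
    """Hypothetical initial partition: longest candidate suffix leaving a stem of min_stem letters."""
    for suffix in sorted(candidate_suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: len(word) - len(suffix)], suffix
    return word, ""


def signatures(splits: list[tuple[str, str]]) -> dict[tuple[str, ...], list[str]]:
    """Group stems by the exact set of suffixes they occur with (their signature)."""
    suffixes_of: dict[str, set[str]] = defaultdict(set)
    for stem, suffix in splits:
        suffixes_of[stem].add(suffix or NULL)
    sigs: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for stem, sufs in sorted(suffixes_of.items()):
        sigs[tuple(sorted(sufs))].append(stem)
    return dict(sigs)


WORDS = ["walk", "walks", "walked", "walking", "talk", "talks",
         "talked", "talking", "jump", "jumps", "jumped"]
SIGS = signatures([split_word(w, ["s", "ed", "ing"]) for w in WORDS])
```

Here walk and talk share the signature NULL.ed.ing.s while jump gets NULL.ed.s; in an MDL setting, a signature shared by many stems saves description length because its suffix list is stored only once.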
Automatic morphology learning 4

• Semi-automatic morphological analysis
• Oliver, 2004
• Starts with a set of manually written morphological rules
• TL:TF:Desc
• lemma ending
• form ending
• POS
• Lists of non-inflecting classes, closed classes and irregular words
• Corpora
• Serbo-Croatian: 9 million words
• Russian: 16 million words
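A TL:TF:Desc rule replaces a form ending (TF) with a lemma ending (TL) and assigns a description (Desc). This Python sketch applies such rules to propose lemma candidates; the rules are hypothetical examples for Spanish -ar verbs with PAROLE-style tags, the toy corpus is invented, and the corpus-support score (how many forms generated from a candidate lemma occur in the corpus) is only an assumed way of using corpora, not necessarily the method of the cited work.

```python
# Hypothetical TL:TF:Desc rules (lemma ending : form ending : description)
# for Spanish -ar verbs, with PAROLE-style tags.
RULES = [
    ("ar", "o", "VMIP1S0"),
    ("ar", "as", "VMIP2S0"),
    ("ar", "a", "VMIP3S0"),
    ("ar", "amos", "VMIP1P0"),
]


def candidates(form: str) -> list[tuple[str, str]]:
    """Every (lemma, tag) obtained by swapping a matching form ending for its lemma ending."""
    return [
        (form[: len(form) - len(tf)] + tl, desc)
        for tl, tf, desc in RULES
        if form.endswith(tf) and len(form) > len(tf)
    ]


def corpus_support(lemma: str, corpus: set[str]) -> int:
    """Assumed filter: number of distinct forms generated from `lemma` that occur in the corpus."""
    generated = {lemma[: len(lemma) - len(tl)] + tf for tl, tf, _ in RULES if lemma.endswith(tl)}
    return len(generated & corpus)


CORPUS = {"amo", "amas", "ama", "amamos", "casas", "casa"}  # toy data
```

`candidates("amas")` proposes `("amar", "VMIP2S0")`; the noun form casas also yields the candidate casar, but with weaker corpus support (2 attested forms against 4 for amar).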
